Evolutionary Biology - Concepts, Molecular and Morphological Evolution


Author: Pierre Pontarotti



Pierre Pontarotti Editor


Editor
Dr. Pierre Pontarotti
UMR 6632 Université d’Aix-Marseille/CNRS
Laboratoire Evolution Biologique et Modélisation, case 19
Place Victor Hugo 3
13331 Marseille Cedex 03
France

ISBN 978-3-642-12339-9
e-ISBN 978-3-642-12340-5
DOI 10.1007/978-3-642-12340-5
Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: 2010933958

© Springer-Verlag Berlin Heidelberg 2010

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: WMXDesign GmbH, Heidelberg, Germany
Cover illustration: An antennal tip of a female parasitic wasp (Ichneumonidae: Cryptinae: Latibulus sp.). See Fig. 16.3b

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

The 13th Evolutionary Biology Meeting was held in Marseille on 22–25 September 2009. These meetings aim to bring together leading scientists working in evolutionary biology, to promote the exchange of state-of-the-art knowledge, and to initiate collaborations between groups. Over the past years, this has been rewarded by the publication of several important review articles on the subject. For me personally, the Evolutionary Biology Meeting is a valuable platform for scientific exchange that serves as a booster for the use of evolution-based approaches not only in biology but also in other scientific fields.

In 2009, some 100 presentations (oral talks as well as “fast presentations” and traditional posters) admirably reflected the epistemological nature of the meeting. For this book I selected the most representative fifth of these contributions. The resulting 21 articles are organized into three categories: Evolutionary Biology Concepts, Genome/Molecular Evolution, and Morphological Evolution/Speciation.

I would like to thank the contributors to this book, all the other participants who helped make this meeting such a success, and our sponsors: the Université de Provence, CNRS, GDR BIM, Conseil Général 13, and Ville de Marseille. I gratefully acknowledge the support of the members of the Association pour l’Etude de l’Evolution Biologique (AEEB). In addition, I am indebted to the staff of our publisher, Springer, for their competence and help. Last but not least, I sincerely wish to thank the AEEB coordinator, Axelle Pontarotti, for the excellent organization of the meeting and the production of the book. In terms of collaborative scientific exchange and the publication of these proceedings, the output of the 13th Marseille meeting reflects the high quality not only of the individual contributions but also of the Marseille way of hosting, for which Axelle Pontarotti is an outstanding ambassador.

Marseille, France
May 2010

Pierre Pontarotti


Contents

Part I Evolutionary Biology Concepts

1 Extinct and Extant Reptiles: A Model System for the Study of Sex Chromosome Evolution
Daniel E. Janes

2 Constraints, Plasticity, and Universal Patterns in Genome and Phenome Evolution
Eugene V. Koonin and Yuri I. Wolf

3 Starvation-Induced Reproductive Isolation in Yeast
Eugene Kroll, R. Frank Rosenzweig, and Barbara Dunn

4 Populations of RNA Molecules as Computational Model for Evolution
Michael Stich, Carlos Briones, Ester Lázaro, and Susanna C. Manrubia

5 Pseudaptations and the Emergence of Beneficial Traits
Steven E. Massey

Part II Genome/Molecular Evolution

6 Transferomics: Seeing the Evolutionary Forest Using Phylogenetic Trees
John W. Whitaker and David R. Westhead

7 Comparative Genomics and Transcriptomics of Lactation
Christophe M. Lefèvre, Karensa Menzies, Julie A. Sharp, and Kevin R. Nicholas

8 Evolutionary Dynamics in the Aphid Genome: Search for Genes Under Positive Selection and Detection of Gene Family Expansions
Morgane Ollivier and Claude Rispe

9 Mammalian Chromosomal Evolution: From Ancestral States to Evolutionary Regions
Terence J. Robinson and Aurora Ruiz-Herrera

10 Mechanisms and Evolution of Dorsal–Ventral Patterning
Claudia Mieko Mizutani and Rui Sousa-Neves

11 Evolutionary Genomics for Eye Diversification
Atsushi Ogura

12 Do Long and Highly Conserved Noncoding Sequences in Vertebrates Have Biological Functions?
Yoichi Gondo

Part III Morphological Evolution/Speciation

13 Male-Killing Wolbachia in the Butterfly Hypolimnas bolina
Anne Duplouy and Scott L. O’Neill

14 Evolution of Immunosuppressive Organelles from DNA Viruses in Insects
Brian A. Federici and Yves Bigot

15 The Neogastropoda: Evolutionary Innovations of Predatory Marine Snails with Remarkable Pharmacological Potential
Maria Vittoria Modica and Mandë Holford

16 Antennal Hammers: Echos of Sensillae Past
Nina Laurenne and Donald L.J. Quicke

17 Adaptive Radiation of Neotropical Emballonurid Bats: Molecular Phylogenetics and Evolutionary Patterns in Behavior and Morphology
Burton K. Lim

18 Trends in Rhizobial Evolution and Some Taxonomic Remarks
Julio C. Martínez-Romero, Ernesto Ormeño-Orrillo, Marco A. Rogel, Aline López-López, and Esperanza Martínez-Romero

19 Convergent Evolution of Morphogenetic Processes in Fungi
Sylvain Brun and Philippe Silar

20 Evolution and Historical Biogeography of a Song Sparrow Ring in Western North America
Michael A. Patten

21 Cave Bear Genomics in the Paleolithic Painted Cave of Chauvet-Pont d’Arc
Céline Bon and Jean-Marc Elalouf

Index

Contributors

Yves Bigot, Laboratoire d’Etude des Parasites Génétiques, Parc Grandmont, Université de Tours, U.F.R. des Sciences et Techniques, 37200 Tours, France

Céline Bon, CEA, IBiTec-S, F-91191 Gif-sur-Yvette cedex, France, celine.bon@cea.fr

Sylvain Brun, UFR des Sciences du Vivant, Université de Paris 7 – Denis Diderot, 75205 Paris Cedex 13, France; Institut de Génétique et Microbiologie, UMR CNRS – Université de Paris 11, UPS Bât. 400, 91405 Orsay cedex, France

Barbara Dunn, Department of Genetics, Stanford University, Stanford, CA 94305, USA

Anne Duplouy, School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia

Jean-Marc Elalouf, CEA, IBiTec-S, F-91191 Gif-sur-Yvette cedex, France

Brian A. Federici, Department of Entomology and Interdepartmental Graduate Programs in Genetics and Microbiology, University of California, Riverside, CA 92521, USA; Laboratoire d’Etude des Parasites Génétiques, Parc Grandmont, Université de Tours, U.F.R. des Sciences et Techniques, 37200 Tours, France

Yoichi Gondo, Mutagenesis and Genomics Team, RIKEN BioResource Center, 3-1-1 Koyadai, Tsukuba 305-0074, Japan

Mandë Holford, York College and Graduate Center, and The American Museum of Natural History, The City University of New York, NY, USA, mholford@york.cuny.edu

Daniel E. Janes, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138-3899, USA

Eugene V. Koonin, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA

Eugene Kroll, Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA

Nina Laurenne, Museum of Natural History, Entomology Division, University of Helsinki, P.O. Box 17 (P. Arkadiankatu 13), 00014 Helsinki, Finland

Christophe M. Lefèvre, Institute for Technology Research and Innovation, Deakin University, Waurn Ponds, Geelong, VIC 3217, Australia; CRC for Innovative Dairy Products, Department of Zoology, University of Melbourne, Melbourne, VIC 3010, Australia; Victorian Bioinformatics Consortium, Monash University, Clayton, Melbourne, VIC 3080, Australia

Burton K. Lim, Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario M5S 2C6, Canada

Aline López-López, Centro de Ciencias Genómicas, UNAM, Av. Universidad, Cuernavaca, Morelos 62210, México

Julio C. Martínez-Romero, Centro de Ciencias Genómicas, UNAM, Av. Universidad, Cuernavaca, Morelos 62210, México

Esperanza Martínez-Romero, Centro de Ciencias Genómicas, UNAM, Av. Universidad, Cuernavaca, Morelos 62210, México, esperanzaeriksson@yahoo.com.mx

Steven E. Massey, Biology Department, University of Puerto Rico – Rio Piedras, P.O. Box 23360, San Juan, Puerto Rico 00931, USA

Karensa Menzies, Institute for Technology Research and Innovation, Deakin University, Waurn Ponds, Geelong, VIC 3217, Australia; CRC for Innovative Dairy Products, Department of Zoology, University of Melbourne, Melbourne, VIC 3010, Australia

Claudia Mieko Mizutani, Department of Biology, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 447080, USA; Department of Genetics, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 447080, USA

Maria Vittoria Modica, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185 Rome, Italy

Kevin R. Nicholas, Institute for Technology Research and Innovation, Deakin University, Waurn Ponds, Geelong, VIC 3217, Australia; CRC for Innovative Dairy Products, Department of Zoology, University of Melbourne, Melbourne, VIC 3010, Australia

Scott L. O’Neill, School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia

Atsushi Ogura, Division of Advanced Sciences, Ochadai Academic Production, Ochanomizu University, Ohtsuka 2-1-1, Bunkyo, Tokyo 112-8610, Japan

Morgane Ollivier, INRA, UMR1099 BiO3P, Domaine de la Motte, F-35653 Le Rheu, France

Ernesto Ormeño-Orrillo, Centro de Ciencias Genómicas, UNAM, Av. Universidad, Cuernavaca, Morelos 62210, México

Michael A. Patten, Oklahoma Biological Survey and Department of Zoology, University of Oklahoma, 111 E. Chesapeake Street, Norman, OK 73019, USA

Donald L.J. Quicke, Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, Berkshire SL5 7PY, UK; Department of Entomology, Natural History Museum, London, SW7 5BD, UK

Claude Rispe, INRA, UMR1099 BiO3P, Domaine de la Motte, F-35653 Le Rheu, France

Terence J. Robinson, Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa

Marco A. Rogel, Centro de Ciencias Genómicas, UNAM, Av. Universidad, Cuernavaca, Morelos 62210, México

R. Frank Rosenzweig, Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA

Aurora Ruiz-Herrera, Unitat de Citologia i Histologia, Departament de Biologia Cel·lular, Fisiologia i Inmunologia, Universitat Autònoma de Barcelona, Campus Bellaterra, 08193 Barcelona, Spain; Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Campus Bellaterra, 08193 Barcelona, Spain

Julie A. Sharp, Institute for Technology Research and Innovation, Deakin University, Waurn Ponds, Geelong, VIC 3217, Australia; CRC for Innovative Dairy Products, Department of Zoology, University of Melbourne, Melbourne, VIC 3010, Australia

Philippe Silar, UFR des Sciences du Vivant, Université de Paris 7 – Denis Diderot, 75205 Paris Cedex 13, France; Institut de Génétique et Microbiologie, UMR CNRS – Université de Paris 11, UPS Bât. 400, 91405 Orsay cedex, France

Rui Sousa-Neves, Department of Biology, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 447080, USA

Michael Stich, Dpto de Evolución Molecular, Centro de Astrobiología (CSIC-INTA), Ctra de Ajalvir, km 4, Torrejón de Ardoz, Madrid 28850, Spain

David R. Westhead, Institute of Molecular and Cellular Biology, University of Leeds, Garstang Building, Leeds LS2 9J, UK

John W. Whitaker, Institute of Molecular and Cellular Biology, University of Leeds, Garstang Building, Leeds LS2 9J, UK

Yuri I. Wolf, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA

Part I Evolutionary Biology Concepts

Chapter 1

Extinct and Extant Reptiles: A Model System for the Study of Sex Chromosome Evolution

Daniel E. Janes

Abstract The evolution and functional dynamics of sex chromosomes are a focus of current biological research. Although the morphologies and functions of males and females are broadly similar across amniotes, the underlying sex chromosome organizations and sex-determining mechanisms vary widely. This chapter examines the role that reptiles play in the study of sex chromosome evolution. Reptile studies have described the coevolution of genotypic sex determination and viviparity, the adaptive significance of sex-determining mechanisms, and the shared ancestry of chromosomes. New resources, including whole-genome sequences and mapped sex-linked markers, have allowed researchers to examine sex chromosome evolution in reptiles, a group that is important for this type of study because it is the sister group to mammals. Compared with mammals, reptiles exhibit much more variability in sex chromosome organization, providing raw material for the study of sex chromosome evolution across amniotes.

1.1 Introduction

Embryos develop as either male or female depending on factors that vary widely among amniotes. Broadly speaking, amniotes can be classified as either genotypically sex-determined (GSD) or temperature-dependently sex-determined (TSD). Embryos of GSD species, including all mammals, birds, and snakes and many lizards and turtles, develop as male or female depending on the chromosomal contributions of their parents at conception. Many, but not all, of these species exhibit detectable cytogenetic sex differences (i.e., heteromorphic sex chromosomes). Whether a species has heteromorphic or homomorphic sex chromosomes may be explained by the time elapsed since genotypic sex determination originated in that species (Ohno 1967; Janes et al. 2010b). Sex chromosomes appear to begin diverging from each other only after a new GSD system arises (see Sect. 1.3.1). No such sex difference in karyotype is apparent in TSD amniotes, whose embryos develop as male or female primarily in response to incubation temperature; these include all crocodilians, tuataras, and some turtles and lizards.

In this review, I will describe the variability of sex-determining mechanisms among amniotes. Among TSD species, this variability includes, for example, the temperatures that trigger male or female development and the timing of the temperature effect; among GSD species, it includes the presence or absence and the type of sex chromosomes. Almost all mammals exhibit male heterogamety: females carry two X sex chromosomes of the same size and content, whereas males carry one X sex chromosome and one smaller, degenerated Y sex chromosome. In birds, females are heterogametic: they carry one smaller, degenerated W sex chromosome and one larger, more gene-rich Z sex chromosome, whereas males carry two Z sex chromosomes. This difference in heterogamety affects amniote genomes in ways that are discernible from genome sequencing and experimental evidence. Further, the evolutionary history of sex-determining mechanisms sheds light on the different arrangements of amniote sex chromosomes, which have been studied using techniques that include phylogenetic inference, cytogenetic mapping, and measurements of population genetic parameters.

Recent studies of sex-determining mechanisms, and specifically of sex chromosome evolution, have focused on extinct and extant reptiles for two reasons. First, nonavian reptiles exhibit a greater variety of sex-determining mechanisms and sex chromosomes than birds or mammals. Second, genomic resources for reptiles (including birds) have recently improved to the extent that previously untestable hypotheses are now open to experimentation and comparative analysis (Janes et al. 2008).

D.E. Janes
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138-3899, USA

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_1, © Springer-Verlag Berlin Heidelberg 2010

1.2 Sex-Determining Mechanisms

1.2.1 Patterns and Variability

Amniote sex-determining mechanisms are typically described as either GSD or TSD, but within those categories, functional patterns vary. As described above, GSD species vary in their organization of sex chromosomes [i.e., female heterogamety (ZW system) or male heterogamety (XY system)] (Fig. 1.1a). Phylogenetic inference and comparative chromosome hybridizations suggest that male and female heterogamety have evolved more than once among amniotes, although the exact number of independent origins is debated (Ezaz et al. 2009; Organ and Janes 2008). Likewise, the number of independent origins of temperature-dependent sex determination is not clear. Although the sex-determining mechanisms of two or more species may respond to incubation temperature in a similar manner, the similarity may represent convergence. Three basic patterns of sex-determining response to incubation temperature (Types Ia, Ib, and II) have been described (Fig. 1.1b) (Bull 1983). Species that exhibit Type Ia temperature-dependent sex determination, such as loggerhead (Caretta caretta), green (Chelonia mydas), and leatherback (Dermochelys coriacea) sea turtles, produce more male offspring from eggs incubated at cooler temperatures (Standora and Spotila 1985). Species with Type Ib temperature-dependent sex determination, such as all crocodilians, produce more male offspring from eggs incubated at warmer temperatures (Valenzuela 2004). Species with Type II temperature-dependent sex determination, such as leopard geckos (Eublepharis macularius), produce the highest proportion of males from eggs incubated at an intermediate temperature, whereas cooler or warmer temperatures yield higher proportions of females (Janes and Wayne 2006; Viets et al. 1994).

[Fig. 1.1 (a) Sex chromosome pairs under male heterogamety (XX/XY), female heterogamety (ZZ/ZW), and no heterogamety (AA/AA). (b) % male offspring per clutch (y-axis) versus incubation temperature (x-axis) for Type Ia, Type Ib, and Type II TSD and for GSD.]

Fig. 1.1 (a) Pairs of sex chromosomes that consist of either a male-specific Y chromosome and an X chromosome or a female-specific W chromosome and a Z chromosome. Species that exhibit these sex chromosomes are described as either male heterogametic (XY system) or female heterogametic (ZW system). Other GSD species exhibit no detectable heterogamety or sex difference in karyotype. (b) Influence of incubation temperature on offspring sex ratios among temperature-dependently (TSD) and genotypically sex-determined (GSD) species. The y-axis models the proportion of males yielded per clutch of eggs incubated at different points on the thermal gradient indicated on the x-axis. Sex-determining response to incubation temperature follows one of three patterns (Type Ia, Ib, or II) in TSD species. GSD species produce similarly balanced offspring sex ratios regardless of incubation temperature or type of heterogamety
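The three TSD response patterns, and the flat GSD response, can be sketched as simple curves of male proportion against incubation temperature. The Python below is an illustrative model and is not taken from the chapter or from Bull (1983). It assumes logistic transitions between male- and female-producing temperatures. The function name `male_fraction`, the pivotal temperatures, and the transition width are hypothetical values chosen only to reproduce the qualitative shapes in Fig. 1.1b. They are not measurements for any species.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function, rising smoothly from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def male_fraction(pattern: str, temp_c: float,
                  pivot_c: float = 29.0,
                  low_pivot_c: float = 27.0,
                  high_pivot_c: float = 32.0,
                  width_c: float = 1.0) -> float:
    """Expected proportion of males per clutch at incubation temperature temp_c.

    All temperatures (deg C) and the transition width are hypothetical
    illustration values, not data for any particular species.
    """
    if pattern == "Ia":
        # Type Ia: more males at cooler temperatures.
        return logistic((pivot_c - temp_c) / width_c)
    if pattern == "Ib":
        # Type Ib: more males at warmer temperatures.
        return logistic((temp_c - pivot_c) / width_c)
    if pattern == "II":
        # Type II: males peak at intermediate temperatures; both extremes yield females.
        rising = logistic((temp_c - low_pivot_c) / width_c)
        falling = logistic((high_pivot_c - temp_c) / width_c)
        return rising * falling
    if pattern == "GSD":
        # Genotypic sex determination: balanced ratio regardless of temperature.
        return 0.5
    raise ValueError(f"unknown pattern: {pattern!r}")


if __name__ == "__main__":
    # Print a small table of modeled male proportions across a temperature range.
    for t in range(24, 36, 2):
        row = "  ".join(f"{p}={male_fraction(p, t):.2f}" for p in ("Ia", "Ib", "II", "GSD"))
        print(f"{t} C: {row}")
```

The specific sigmoid does not matter for the qualitative point. Types Ia and Ib are monotonic in temperature, Type II is not, and GSD is flat. In this hypothetical Type II curve the peak stays below 100% male, which loosely echoes the observation below that no 100% male-producing temperature is known for leopard geckos.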


The timing of the temperature effect on sex determination also varies among TSD reptiles. Shine et al. (2007) tested two TSD lizards for the effects of fadrozole, a chemical that blocks the conversion of testosterone to estrogen and thereby causes male development in eggs incubated at female-producing temperatures. In this type of experiment, the developmental stage during which fadrozole affects offspring sex ratios marks the thermally sensitive period, when temperature can influence sex determination. In two TSD reptiles, jacky dragons (Amphibolurus muricatus) and Duperrey’s window-eyed skinks (Bassiana duperreyi), the thermally sensitive period in which sex could be reversed by fadrozole treatment occurred in the first half of the postoviposition incubation period. In turtles and tuataras, the thermally sensitive period has been shown to occur slightly later, only during the middle third of the postoviposition incubation period (Ewert et al. 2004; Mitchell et al. 2006). In crocodilians it occurs even later, during the third quarter of the entire incubation period (Lang and Andrews 1994).

GSD amniotes exhibit a similar degree of variability (Organ and Janes 2008). In birds, snakes, and some turtles and lizards, females are the heterogametic sex. Male heterogamety is found in some turtles and lizards and throughout mammals, with exceptions. The mammalian exceptions include, among others, the mole vole (Ellobius lutescens), which lacks a Y sex chromosome: both males and females of this species carry a single X sex chromosome (Just et al. 1995; Vogel et al. 1998). Within each type of heterogamety, the extent of degeneration of the male-specific Y or the female-specific W sex chromosome varies. For example, the Z and W sex chromosomes of emus (Dromaius novaehollandiae) are virtually homomorphic, whereas in chickens (Gallus gallus), the W sex chromosome is considerably smaller than the Z sex chromosome (Janes et al. 2009; Solari 1994).
Clearly, a single line of demarcation between genotypic and temperature-dependent sex determination is overly simplistic and does not accurately represent the evolutionary history of sex-determining mechanisms in amniotes (Sarre et al. 2004).

1.2.2 Adaptive Significance of Sex-Determining Mechanisms

The variability of reptilian sex-determining mechanisms, and of the type of heterogamety among GSD species, is difficult to explain. Among agamid lizards, for example, species within the same genus with no discernible differences in natural history exhibit different sex-determining mechanisms (Ezaz et al. 2009; Uller et al. 2006). However, the adaptive significance of both genotypic and temperature-dependent sex determination has been explored in theory and experimentation. Fisher (1930) argued that parents should invest equally in sons and daughters. If sons and daughters represent equivalent parental investment, genotypic sex determination is expected to balance offspring sex ratios by matching them to the equal probability of inheriting an X or a Y chromosome from the male parent in a male heterogametic species, or a Z or a W chromosome from the female parent in a female heterogametic species. Charnov and Bull (1977) hypothesized that temperature-dependent sex determination would give parents greater control over offspring sex ratios in environments where the costs of sons and daughters are unequal and fluctuating.

However, the Charnov–Bull hypothesis has not acquired much empirical support. Parents of TSD species do not appear to control offspring sex ratios through nesting behavior. Nevertheless, Freedberg and Wade (2001) suggested that offspring sex ratios may be inherited through nest sites, whose particular exposure to sun and soil temperature is passed on matrilineally. Also, Warner and Shine (2008) demonstrated that incubation temperature can affect reproductive success in jacky dragons. Male jacky dragons hatched from eggs incubated at the optimal male-producing temperature had greater lifetime reproductive success than males hatched from eggs incubated at a different temperature and experimentally masculinized by chemical aromatase inhibition. The same pattern was reported among females: those incubated at the optimal female-producing temperature had greater reproductive success than those incubated at a different temperature. This study provides evidence that, in a TSD species, incubation temperature directly influences reproductive success in a sex-differential manner. Although this study supports the Charnov–Bull hypothesis, it does not explain why some species would benefit from temperature-dependent sex determination while other closely related species with similar life history traits would not.

Reproductive mode, whether a species is oviparous (egg-laying) or viviparous (live-bearing), is associated with the type of sex-determining mechanism. Viviparity appears to be enabled by genotypic but not temperature-dependent sex determination. From a sample of 94 extant amniote species for which sex-determining mechanism, reproductive mode, and phylogenetic position are known, only two, perhaps three, exhibit both temperature-dependent sex determination and viviparity. The southern water skink (Eulamprus tympanum) and its sister species (Eulamprus heatwolei) give live birth and exhibit temperature-dependent sex determination, and some evidence suggests that the spotted skink (Niveoscincus ocellatus) is also TSD and viviparous (Organ et al. 2009). For TSD species, including these skinks, producing both male and female offspring requires exposing different embryos to at least two thermal environments: one optimal for male production and one optimal for female production. For viviparous species, this requirement entails manipulating maternal body temperature, and evidence for such manipulation in TSD, viviparous skinks is debated (Allsop et al. 2006; While and Wapstra 2009). Further, as explained in Sect. 1.4, fluctuations in maternal body temperature are even less likely in thermally consistent environments such as deep oceans. Thermal consistency does not appear to be an issue for oviparous TSD species such as crocodilians and sea turtles, because their nests experience enough thermal variation from top to bottom to explain the mixed sex ratios emerging from clutches of eggs (Georges 1992; but see Warner and Shine 2009).
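Fisher’s point earlier in this section can be checked directly by enumerating gametes. Genotypic sex determination ties the offspring sex ratio to the equal chance of receiving either sex chromosome from the heterogametic parent. This Python sketch is illustrative only. The function name `primary_sex_ratio` is hypothetical. The sketch assumes strict Mendelian segregation and equal viability of all zygotes, so it ignores meiotic drive and sex-biased mortality.

```python
from fractions import Fraction
from itertools import product


def primary_sex_ratio(mother: str, father: str, system: str) -> Fraction:
    """Fraction of zygotes that are male from one cross.

    Parents are written as two-letter sex chromosome genotypes, e.g. "XX",
    "XY", "ZZ", "ZW". Assumes Mendelian segregation (each chromosome of a
    parent enters a gamete with probability 1/2) and equal zygote viability.
    """
    # Every maternal-gamete x paternal-gamete pairing is equally likely.
    zygotes = [m + f for m, f in product(mother, father)]
    if system == "XY":
        # Male heterogamety: zygotes carrying a Y develop as males.
        males = sum("Y" in z for z in zygotes)
    elif system == "ZW":
        # Female heterogamety: zygotes lacking a W (i.e., ZZ) develop as males.
        males = sum("W" not in z for z in zygotes)
    else:
        raise ValueError("system must be 'XY' or 'ZW'")
    return Fraction(males, len(zygotes))


print(primary_sex_ratio("XX", "XY", "XY"))  # 1/2
print(primary_sex_ratio("ZW", "ZZ", "ZW"))  # 1/2
```

In both systems the 1:1 ratio comes entirely from segregation in the heterogametic parent (the XY father or the ZW mother), because the homogametic parent contributes the same chromosome to every gamete.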

1.2.3 Genotype and Environment Interaction

The proximate differences among sex-determining mechanisms remain unclear. Controlled incubation studies in the laboratory have been used to identify species in which incubation temperatures may or may not skew offspring sex ratios. Such experiments are limited by the possibility that a temperature that elicits a sex-determining response is inadvertently left untested. Further, in a tested species, the difference between a temperature that yields a consistent offspring sex ratio and one that is lethal may be too small to tease apart in incubation studies. Given this uncertainty, many experimental characterizations of sex-determining mechanisms are considered tentative (Viets et al. 1994). In addition to incubation studies, the presence or absence of sex chromosomes can distinguish GSD from TSD species. If a species has detectable sex chromosomes, then offspring sex is expected to be determined by genotype. However, a study of central bearded dragons (Pogona vitticeps) has revealed an exception to this rule (Quinn et al. 2007). Central bearded dragons exhibit clear female heterogamety, yet extreme incubation temperatures can feminize genotypically male embryos. This result suggests environmental effects on sex determination in a GSD species. Conversely, genotypic effects have been reported for leopard geckos (Eublepharis macularius). This reptile has been classified as TSD because incubation studies demonstrate a clear and repeatable influence of incubation temperature on its offspring sex ratios (Janes et al. 2007; Viets et al. 1993; Wagner 1980). Nonetheless, a study of the sex-determining response to incubation temperature in different matrilineal lines of leopard geckos shows a clear quantitative genetic effect on temperature-dependent sex determination.
1 Extinct and Extant Reptiles

Janes and Wayne (2006) identified genetically dissimilar females within a captive-bred colony of leopard geckos. Each female was mated to fertile males. The resulting eggs were randomly assigned to one of three environmental chambers set to temperatures known to produce 0%, 50%, or 70% male offspring. No incubation temperature that produces 100% males has been identified in this species. Incubation temperature overwhelmingly influenced offspring sex ratios across family lines. Nevertheless, a genotype × environment interaction was detected: different matrilineal lines incubated at the same temperatures produced different offspring sex ratios. This result suggests that families vary in their sex-determining response to incubation temperature. Genotype × environment interactions also suggest that the trait under study is polygenic (Falconer and MacKay 1996). Polygenic inheritance matters for the conservation of TSD reptiles. These species may be exceptionally vulnerable to climate change because they may no longer be exposed to the temperatures needed to produce both sons and daughters (Huey and Janzen 2008). If sex-determining responses to temperature in TSD reptiles are under polygenic control, then there is opportunity for microevolution and adaptation to changing climates. Recent modeling has suggested that tuataras (Sphenodon guntheri) occupy a habitat in which ambient temperature is expected to change enough to skew offspring sex ratios within the next century (Huey and Janzen 2008). If sex-determining responses to temperature do not change adaptively, the remaining possibilities are extinction or migration to cooler habitats. Migration is unlikely without human intervention, however, because tuataras inhabit small islands off New Zealand.
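One simple way to ask whether matrilines differ in offspring sex ratio at a single incubation temperature is a Pearson chi-square test of homogeneity on a family-by-sex table. This is a sketch, not Janes and Wayne's (2006) analysis, whose statistical model is not described here. A full genotype × environment test would also need the temperature factor, for example as an interaction term in a logistic regression. The counts and the function name below are hypothetical.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table.

    Rows are families (matrilines); columns are offspring sex (males, females).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if all families shared the pooled sex ratio.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical hatchling counts (males, females) from three matrilines
# incubated at the same nominally 50%-male temperature.
clutches = {
    "female_A": (9, 3),
    "female_B": (6, 6),
    "female_C": (2, 10),
}
stat, dof = chi_square_homogeneity(list(clutches.values()))
print(f"chi-square = {stat:.2f} on {dof} df")  # chi-square = 8.25 on 2 df
```

With 2 degrees of freedom, the conventional 5% critical value is about 5.99, so these invented counts would point to family-level variation. Real clutches are often small, and in that case an exact test may be preferable.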

1.3 Sex Chromosomes

1.3.1 Origins and Degeneration of Sex Chromosomes

Heteromorphic sex chromosomes arise when one member of a pair of sex chromosomes degenerates enough that cytogenetic differences between the pair become observable. Several causes of this degeneration have been proposed, including the Hill–Robertson effect, background selection, Muller’s Ratchet, and hitchhiking of deleterious alleles with favored mutations (Charlesworth and Charlesworth 2000; Charlesworth et al. 1987). The Hill–Robertson effect prevents the repair or elimination of deleterious alleles that are closely linked to beneficial alleles. Background selection explains rates of elimination or fixation by the degree to which an allele is deleterious or beneficial: mildly deleterious alleles are more likely to be tolerated than strongly deleterious ones (Charlesworth and Charlesworth 2000). Suppose mildly deleterious alleles accumulate on the Y chromosome because recombination with the X no longer repairs them. The mean fitness of the Y chromosome then declines over time. This accumulation of mildly deleterious alleles, known as Muller’s Ratchet, eventually damages a Y-linked gene, which is then eliminated. Thereafter, mutations in the remaining homologous copy become fixed much faster than in genes that are retained as two copies (Rice 1987). Hitchhiking works together with Muller’s Ratchet to hasten the degeneration of the Y chromosome. Deleterious mutations that hitchhike with favorable alleles on the Y are less likely to be purged, further reducing the overall fitness of the chromosome. These forces drive the degeneration of sex chromosomes after an initial event converts an ancestral pair of autosomes into sex chromosomes. Ohno (1967) described how sex chromosomes originate from ancestral autosomes.
Once a novel sex-determining gene is either exapted from a different function or transposed to a chromosome from elsewhere in the genome, recombination ceases in the general vicinity of the gene. This block to recombination keeps the sex-determining gene on one member of the chromosome pair. As a result, parents transmit it only to sons or only to daughters, depending on the nature of its expression. In mammals, a single-copy gene called the sex-determining region on the Y (Sry) initiates male sexual development (Sinclair et al. 1990). Cessation of recombination around Sry, or around some other ancestral sex-determining gene, speeds up Muller’s Ratchet and causes the degeneration of the mammalian Y chromosome.

D.E. Janes

The evolution of avian sex chromosomes may have followed a different path. In chickens, dosage-dependent effects of a Z-linked gene, Dmrt1, appear to drive male sexual development, rather than the presence or absence of a single-copy W-linked gene (Smith et al. 2009). Reptiles are an excellent model for the process of sex chromosome degeneration because the group includes intermediate stages of chromosomal degeneration. For example, the smooth softshell turtle (Apalone mutica) is GSD, but sex chromosomes have not yet been identified, most likely because they are not sufficiently heteromorphic (Valenzuela et al. 2006). Further, micro-sex chromosomes have been found in central bearded dragons (Pogona vitticeps), common snake-necked turtles (Chelodina longicollis), and Chinese soft-shelled turtles (Pelodiscus sinensis) (Ezaz et al. 2005, 2006; Kawai et al. 2007). The variety of sex chromosome organizations has been mapped onto phylogenetic trees to investigate how many times sex chromosomes originated and which types of heterogamety occur in the group (Janzen and Krenz 2004; Pokorna and Kratochvil 2009). Parsimony, likelihood, Bayesian, and stochastic approaches all reconstruct temperature-dependent sex determination as ancestral to the clade comprising turtles, crocodilians, and birds (Organ and Janes 2008). Turtles are extraordinarily variable in their sex chromosome organizations. Their species exhibit male heterogamety, female heterogamety, no detectable heterogamety, or temperature-dependent sex determination (Organ and Janes 2008). These results indicate multiple independent origins of sex chromosomes within this clade (Fig. 1.2). Also, Matsubara et al. (2006) demonstrated a lack of sequence similarity between the female heterogametic sex chromosomes of birds and those of snakes. This result indicates at least two independent origins of female heterogamety.
Reptiles, with such variability and rapidly improving genomic resources, provide tremendous raw material for studies of the causes and consequences of sex chromosome origination and degeneration.
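The ratchet described in this section can be illustrated with a minimal Wright–Fisher simulation. The sketch assumes a haploid, nonrecombining Y-like chromosome, multiplicative fitness, and Poisson-distributed new mutations. All parameter values are hypothetical, chosen only so that the ratchet turns within a short run, and the function names are illustrative.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def mullers_ratchet(pop_size=100, mut_rate=0.5, sel_coef=0.02, generations=200, seed=1):
    """Wright-Fisher model of nonrecombining (Y-like) chromosomes.

    Each chromosome is summarized by its count of deleterious mutations, and
    fitness is multiplicative: (1 - sel_coef) ** count. With no recombination
    and no back mutation, the least-loaded class can be lost but never restored.
    Returns the minimum mutation count in each generation.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size
    minimum_per_generation = []
    for _ in range(generations):
        # Selection: parents are sampled in proportion to fitness.
        weights = [(1 - sel_coef) ** load for load in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Mutation: each offspring inherits its parent's load plus new hits.
        loads = [load + poisson(rng, mut_rate) for load in parents]
        minimum_per_generation.append(min(loads))
    return minimum_per_generation


history = mullers_ratchet()
print("least-loaded class, every 50 generations:", history[::50])
```

Because there is no recombination or back mutation, the minimum load can rise but never fall; each rise is one click of the ratchet. A classic approximation puts the equilibrium size of the least-loaded class at roughly N·exp(−U/s). For these parameters that is far below one chromosome, so the ratchet should click repeatedly.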

1.3.2 Detection of Sex Chromosomes

[Fig. 1.2 image: phylogeny of Mammals, Tuatara, Geckos, Skinks, Lacertid lizards, Snakes, Iguanids, Birds, Crocodilians, Turtles, and Amphibians on a 0–300 Mya time axis, with F and M columns marking female and male heterogamety]

Fig. 1.2 Presence or absence of male or female heterogamety across amphibians, nonavian and avian reptiles, and mammals (Organ and Janes 2008). Sex chromosomes have not been reported for crocodilians or tuataras, both of which exhibit temperature-dependent sex determination. Snakes exhibit female heterogamety, but it is shaded differently in this figure. The different shading indicates that snake sex chromosomes do not share sequence with avian sex chromosomes, because the two pairs most likely resulted from independent origins of female heterogamety (Matsubara et al. 2006). Current research in reptilian genomics focuses on two questions: how avian sex chromosomes resemble or differ from the female heterogameties found in other reptiles, and how many times sex chromosomes have originated independently (Janes et al. 2010a)

Species to which genotypic sex determination has been ascribed, but in which sex chromosomes have not yet been identified, are an important focus of research on reptile genomics (Janes et al. 2010a). For species like the smooth softshell turtle, sex chromosomes have not been reported. It is unclear whether they are lacking or whether current cytogenetic techniques are not yet sensitive enough to detect them. C-banding is a cytogenetic technique that stains the heterochromatic regions of chromosomes. It has identified female-specific W sex chromosomes in central bearded dragons (P. vitticeps) (Ezaz et al. 2005), as well as in eastern bearded dragons (Pogona barbata), Nobbi dragons (Amphibolurus nobbi), and Mallee dragons (Ctenophorus fordi) (Ezaz et al. 2009). Comparative genomic hybridization, Ag–NOR staining, and fluorescent in situ hybridization (FISH) are also standard techniques for identifying karyotypic sex differences (Kawai et al. 2007). As more sex chromosomes are identified, more sex-linked sequences will be cataloged for reptile species. For example, 18S–28S ribosomal RNA genes are located on both micro-sex chromosomes of the Chinese soft-shelled turtle, with more copies on the W chromosome than on the Z chromosome (Kawai et al. 2007). Comparative FISH mapping of sex-linked markers will help to support or reject hypotheses about the evolutionary history of sex-determining mechanisms. Snake and bird sex chromosomes have little or no sequence in common. However, the similarities and differences among the sex chromosomes of birds, turtles, and possibly TSD reptiles have not yet been characterized (Fig. 1.2) (Janes et al. 2010b). One step in this direction came from Kawagoshi et al. (2009), who identified five Z-linked markers in the Chinese soft-shelled turtle by FISH mapping cDNA fragments of the genes GIT2, NF2, SBNO1, SF3A1, and TOP3B. These markers map to chicken chromosome 15, suggesting a common origin of the turtle ZW pair and chicken chromosome 15.
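At their simplest, the comparative mapping arguments above ask which chicken chromosome most of a species' sex-linked markers map to. The sketch below tallies this for the five turtle markers named in the text and for a hypothetical species. The function name and the second species' data are illustrative. Real studies also weigh marker order, marker number, and mapping resolution.

```python
from collections import Counter


def dominant_homolog(marker_map):
    """Return the chicken chromosome shared by most markers and the fraction mapping to it."""
    if not marker_map:
        raise ValueError("no markers supplied")
    counts = Counter(marker_map.values())
    chromosome, n_markers = counts.most_common(1)[0]
    return chromosome, n_markers / len(marker_map)


# Chinese soft-shelled turtle Z-linked markers and their chicken location,
# as reported in the text (Kawagoshi et al. 2009).
turtle_z = {"GIT2": "15", "NF2": "15", "SBNO1": "15", "SF3A1": "15", "TOP3B": "15"}

# A hypothetical species whose sex-linked markers are scattered across chicken chromosomes.
scattered = {"geneA": "Z", "geneB": "15", "geneC": "4", "geneD": "Z"}

print(dominant_homolog(turtle_z))   # ('15', 1.0)
print(dominant_homolog(scattered))  # ('Z', 0.5)
```

A fraction near 1, as for the turtle markers, is what supports a shared origin. A scattered pattern would instead point to rearrangement or independent origins.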

1.3.3 Heterogamety and Dosage Compensation

Hypotheses are emerging about the differences between male and female heterogamety. For example, dosage compensation appears to function differently in male heterogametic and female heterogametic species. Genes found on the X chromosome in male heterogametic species, and on the Z chromosome in female heterogametic species, occur in different doses in males and females. Mammals balance gene dosage through X-chromosome inactivation, which transcriptionally silences genes on one of the two X chromosomes in females (Payer and Lee 2008). Birds, however, do not globally inactivate a Z chromosome in males. Instead, dosage compensation appears to be limited, affecting relatively few genes in small regions of avian sex chromosomes (Melamed and Arnold 2007). Global dosage compensation has been found only in male heterogametic groups, including therian mammals, fruit flies (Drosophila), and nematodes (Caenorhabditis elegans). Local dosage compensation, by contrast, has been found in female heterogametic groups, including birds and lepidopterans (Mank 2009). So far, this pattern has been described in only three male heterogametic groups and two female heterogametic groups. It has yet to be explored among reptiles (but see King and Lawson 1996). Inactivation or hypertranscription of sex-linked genes and entire chromosomes should be compared between closely related male heterogametic and female heterogametic reptiles. Emydid turtles, chameleons, and geckos are particularly promising because heterogamety differs within each of these families (Organ and Janes 2008).
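The copy-number logic behind dosage compensation can be expressed as an expected male:female expression ratio. In a ZZ/ZW species, suppose expression simply scaled with gene copy number. Uncompensated Z-linked genes would then show a male:female ratio near 2 (log2 ratio of 1), and fully compensated genes would sit near 1 (log2 ratio of 0). The sketch below classifies genes on that basis. The expression values, gene names, and tolerance cutoff are hypothetical, and published analyses often compare Z-linked with autosomal ratios genome-wide instead.

```python
import math


def classify_dosage(male_expr, female_expr, tolerance=0.25):
    """Label Z-linked genes in a ZZ/ZW species by their log2 male:female expression ratio.

    Assumes expression scales with copy number when uncompensated, so ZZ males
    are expected near log2(2/1) = 1 and fully compensated genes near 0.
    The tolerance is an arbitrary illustrative cutoff, not a published threshold.
    """
    labels = {}
    for gene, male in male_expr.items():
        log_ratio = math.log2(male / female_expr[gene])
        if abs(log_ratio) <= tolerance:
            labels[gene] = "compensated"
        elif abs(log_ratio - 1) <= tolerance:
            labels[gene] = "uncompensated"
        else:
            labels[gene] = "partially compensated or other"
    return labels


# Hypothetical expression levels (arbitrary units) for three Z-linked genes.
males = {"geneA": 10.0, "geneB": 20.0, "geneC": 15.0}
females = {"geneA": 10.0, "geneB": 10.0, "geneC": 10.0}
print(classify_dosage(males, females))
```

Global compensation, as in mammals, would push nearly all genes into the first class. The local pattern reported for birds would instead leave many genes near a log2 ratio of 1 or in between.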

1.4 Fossil Evidence

Extinct reptiles are relevant to the study of sex chromosome evolution because of the order in which genotypic sex determination and sex chromosomes evolve. Recombination is first blocked by the novel function or novel location of a sex-determining gene. Sex chromosomes become detectable only after the evolutionary forces that follow this block have altered them sufficiently. Fossils of extinct reptiles allow us to examine the history of sex-determining mechanisms and, in turn, to predict which extinct reptiles exhibited genotypic sex determination. Organ et al. (2009) used a reversible-jump Markov chain Monte Carlo algorithm to estimate a Bayesian posterior probability distribution over models of correlated change between sex-determining mechanisms and reproductive modes in extant amniotes (see Sect. 1.2.2). Reproductive mode describes the means by which parents produce young; amniotes are either viviparous or oviparous. The Bayesian analysis supported correlated evolution of genotypic sex determination and viviparity. Oviparity does not predict any particular sex-determining mechanism, but viviparity predicts genotypic sex determination. As described above, only two, perhaps three, of 94 studied extant amniotes are both viviparous and TSD. This correlation permitted a prediction of genotypic sex determination in extinct species known to be viviparous. Fossil evidence demonstrates viviparity in several extinct marine reptiles, including sauropterygians, mosasaurs, and ichthyosaurs. The study also predicted sex-determining mechanisms for seven species whose mechanisms were known but withheld from the algorithm. This test group included six extant reptiles and an extinct horse (Propalaeotherium) for which pregnant specimens have been found in the fossil record. The study showed that genotypic sex determination could be accurately predicted for viviparous species. All ten marine reptiles examined in the study were assigned a high posterior probability of genotypic sex determination. Organ et al. (2009) argued that this result is meaningful for the natural history of extinct marine reptiles. Oviparity in the open ocean would not have been possible for amniotes like ichthyosaurs because amniotic eggs require gas exchange with the atmosphere (Andrews and Mathies 2000). Extant marine reptiles, including saltwater crocodiles (Crocodylus porosus) and sea turtles, nest on land. Extinct marine reptiles like ichthyosaurs, however, had body plans unlikely to allow terrestrial nesting. Freed by viviparity from the requirement to nest on land, extinct marine reptiles evolved morphologies adapted to a pelagic existence, including fluked tails, dorsal fins, and wing-shaped limbs. Viviparity seems to have been a prerequisite for the pelagic existence of these species (Caldwell and Lee 2001). If genotypic sex determination was in turn a prerequisite for the evolution of viviparity, it may have permitted the adaptive radiation of extinct marine reptiles.
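The correlated-change models explored by Organ et al. (2009) describe two binary traits, sex-determining mechanism and reproductive mode, evolving along a phylogeny. The text does not give the exact model specification, so the sketch below assumes a Pagel-style four-state continuous-time Markov model. Under such a model, the probability of moving between trait combinations over a branch of length t is the matrix exponential of Q·t. The sketch computes those probabilities for one fixed, hypothetical rate matrix in which viviparity is rarely gained under TSD. It does not reproduce the reversible-jump MCMC, which would also sample over which rates are equal or zero and fit them to a tree of real species. All names and rates are illustrative.

```python
STATES = ["GSD/oviparous", "GSD/viviparous", "TSD/oviparous", "TSD/viviparous"]


def rate_matrix(rates):
    """Build a 4x4 instantaneous rate matrix Q from off-diagonal rates {(i, j): q}.

    Only one trait changes per transition, and each row sums to zero.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for (i, j), rate in rates.items():
        q[i][j] = rate
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(m, squarings=10, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[x / 2.0 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled**k / k!
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(r_row, t_row)] for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


# Hypothetical dependent model: viviparity arises readily under GSD (0 -> 1)
# but rarely under TSD (2 -> 3), and TSD/viviparous lineages tend to switch to GSD.
Q = rate_matrix({
    (0, 1): 0.5, (1, 0): 0.1,    # gain/loss of viviparity under GSD
    (2, 3): 0.01, (3, 2): 0.1,   # gain/loss of viviparity under TSD
    (0, 2): 0.05, (2, 0): 0.05,  # GSD <-> TSD while oviparous
    (1, 3): 0.01, (3, 1): 0.2,   # GSD <-> TSD while viviparous
})
t = 10.0
P = expm([[q * t for q in row] for row in Q])
print(f"P({STATES[0]} -> {STATES[1]}) = {P[0][1]:.3f}")
print(f"P({STATES[2]} -> {STATES[3]}) = {P[2][3]:.3f}")
```

In this model, TSD/viviparous lineages are assumed to leave that state quickly. The probability of ending there therefore stays small from any starting state, mirroring the rarity of viviparous TSD amniotes noted above. A library routine such as scipy.linalg.expm could replace the hand-rolled exponential.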

1.5 Impact of Genome Projects and Future Directions

The study of sex chromosome evolution has much to gain from current genome sequencing efforts. At present, only the green anole (Anolis carolinensis) and the painted turtle (Chrysemys picta) are the focus of genome sequencing projects (Janes et al. 2008). However, the recently announced Genome 10K project has targeted a collection of species for whole-genome sequencing, and this collection includes 3,297 nonavian reptiles (Haussler et al. 2009). In particular, the genomes of the 140 turtles, 569 iguanids, and 621 geckos targeted by the project will provide a window into the variability of sex-determining mechanisms and sex chromosome organizations in these three groups. The identities and map locations of sex-linked markers will support or reject current hypotheses of common origins of sex chromosomes. For example, Kawai et al. (2009) suggested a common origin for the sex chromosomes of the gecko lizard (Gekko hokouensis) and the chicken because they share a linkage group of six markers. Once multiple reptile genomes are published, studies of this kind will involve more markers in more species. This will allow more robust conclusions about the number of independent origins of reptilian sex chromosomes.


Until the sequencing and mapping of sex-linked and sex-differentiating markers reach a more advanced stage, studies of reptilian sex chromosomes will be smaller in scope. Nonetheless, sex-linked markers have been identified in birds (Backström et al. 2006; Hillier et al. 2004), snakes (Matsubara et al. 2006), turtles (Kawagoshi et al. 2009), and lizards (Kawai et al. 2009). These sequences provide sufficient raw material for mapping comparisons among pairs of reptilian sex chromosomes. Comparative mapping studies, together with ancestral reconstructions, will directly inform two questions: how many times sex chromosomes originated independently in reptiles, and why sex chromosome systems turn over more rapidly in nonavian reptiles than in either birds or mammals.

Acknowledgments

I would like to thank Miguel Alcaide, Maude Baldwin, Elena Gonzalez, June Yong Lee, Christopher Organ, and Irene Salicini for their critical reviews of this chapter. This work has benefited from conversations with Nicole Valenzuela (NV), Scott V. Edwards (SVE), Tariq Ezaz, Jennifer A.M. Graves, Arthur Georges, and Andrew Sinclair. Christopher Balakrishnan, Charles Chapus, and Andrew Shedlock shared support in the laboratory and valuable discussions. Funding for this work was provided by a grant from the United States National Science Foundation (MCB0817687) to NV and SVE. Last, I would like to thank Pierre Pontarotti for the invitation to contribute to the 13th Evolutionary Biology Meeting at Marseille, where this work was presented.

References

Allsop DJ, Warner DA, Langkilde T, Du W, Shine R (2006) Do operational sex ratios influence sex allocation in viviparous lizards with temperature-dependent sex determination? J Evol Biol 19(4):1175–1182
Andrews RM, Mathies T (2000) Natural history of reptilian development: constraints on the evolution of viviparity. Bioscience 50(3):227–238
Backström N, Brandstrom M, Gustafsson L, Qvarnstrom A, Cheng H, Ellegren H (2006) Genetic mapping in a natural population of collared flycatchers (Ficedula albicollis): conserved synteny but gene order rearrangements on the avian Z chromosome. Genetics 174(1):377–386
Bull JJ (1983) Evolution of sex determining mechanisms. Benjamin/Cummings, Menlo Park, CA
Caldwell MW, Lee MSY (2001) Live birth in Cretaceous marine lizards (mosasauroids). Proc R Soc Lond B Biol Sci 268(1484):2397–2401
Charlesworth B, Charlesworth D (2000) The degeneration of Y chromosomes. Phil Trans Roy Soc Lond B 355(1403):1563–1572
Charlesworth B, Coyne JA, Barton NH (1987) The relative rates of evolution of sex chromosomes and autosomes. Am Nat 130(1):113–146
Charnov EL, Bull J (1977) When is sex environmentally determined. Nature 266(5605):829–830
Ewert BJ, Etchberger CR, Nelson CE (2004) Turtle sex-determining modes and TSD patterns, and some TSD pattern correlates. In: Valenzuela N, Lance VA (eds) Temperature-dependent sex determination in vertebrates. Smithsonian Books, Washington, DC, pp 21–32
Ezaz T, Quinn AE, Miura I, Sarre SD, Georges A, Graves JAM (2005) The dragon lizard Pogona vitticeps has ZZ/ZW micro-sex chromosomes. Chromosome Res 13(8):763–776
Ezaz T, Valenzuela N, Grutzner F, Miura I, Georges A, Burke RL, Graves JAM (2006) An XX/XY sex microchromosome system in a freshwater turtle, Chelodina longicollis (Testudines: Chelidae) with genetic sex determination. Chromosome Res 14(2):139–150


Ezaz T, Quinn AE, Sarre SD, O’Meally D, Georges A, Graves JAM (2009) Molecular marker suggests rapid changes of sex-determining mechanisms in Australian dragon lizards. Chromosome Res 17(1):91–98
Falconer DS, MacKay TFC (1996) Introduction to quantitative genetics. Longman, London, UK
Fisher RA (1930) The genetical theory of natural selection. Oxford University Press, New York, USA
Freedberg S, Wade MJ (2001) Cultural inheritance as a mechanism for population sex-ratio bias in reptiles. Evolution 55(5):1049–1055
Georges A (1992) Thermal characteristics and sex determination in field nests of the pig-nosed turtle, Carettochelys insculpta (Chelonia, Carettochelydidae), from northern Australia. Aust J Zool 40(5):511–521
Haussler D, O’Brien SJ, Ryder OA, Barker FK, Clamp M, Crawford AJ, Hanner R, Hanotte O, Johnson WE, McGuire JA, Miller W, Murphy RW, Murphy WJ, Sheldon FH, Sinervo B, Venkatesh B, Wiley EO, Allendorf FW, Amato G, Baker CS, Bauer A, Beja-Pereira A, Bermingham E, Bernardi G, Bonvicino CR, Brenner S, Burke T, Cracraft J, Diekhans M, Edwards S, Ericson PGP, Estes J, Fjelsda J, Flesness N, Gamble T, Gaubert P, Graphodatsky AS, Graves JAM, Green ED, Green RE, Hackett S, Hebert P, Helgen KM, Joseph L, Kessing B, Kingsley DM, Lewin HA, Luikart G, Martelli P, Moreira MAM, Nguyen N, Orti G, Pike BL, Rawson DM, Schuster SC, Seuanez HN, Shaffer HB, Springer MS, Stuart JM, Sumner J, Teeling E, Vrijenhoek RC, Ward RD, Warren WC, Wayne R, Williams TM, Wolfe ND, Zhang YP (2009) Genome 10K: a proposal to obtain whole-genome sequence for 10 000 vertebrate species. J Hered 100(6):659–674
Hillier LW, Miller W, Birney E, Warren W, Hardison RC, Ponting CP, Bork P, Burt DW, Groenen MAM, Delany ME, Dodgson JB, Chinwalla AT, Cliften PF, Clifton SW, Delehaunty KD, Fronick C, Fulton RS, Graves TA, Kremitzki C, Layman D, Magrini V, McPherson JD, Miner TL, Minx P, Nash WE, Nhan MN, Nelson JO, Oddy LG, Pohl CS, Randall-Maher J, Smith SM, Wallis JW, Yang SP, Romanov MN, Rondelli CM, Paton B, Smith J, Morrice D, Daniels L, Tempest HG, Robertson L, Masabanda JS, Griffin DK, Vignal A, Fillon V, Jacobbson L, Kerje S, Andersson L, Crooijmans RPM, Aerts J, van der Poel JJ, Ellegren H, Caldwell RB, Hubbard SJ, Grafham DV, Kierzek AM, McLaren SR, Overton IM, Arakawa H, Beattie KJ, Bezzubov Y, Boardman PE, Bonfield JK, Croning MDR, Davies RM, Francis MD, Humphray SJ, Scott CE, Taylor RG, Tickle C, Brown WRA, Rogers J, Buerstedde JM, Wilson SA, Stubbs L, Ovcharenko I, Gordon L, Lucas S, Miller MM, Inoko H, Shiina T, Kaufman J, Salomonsen J, Skjoedt K, Wong GKS, Wang J, Liu B, Yu J, Yang HM, Nefedov M, Koriabine M, deJong PJ, Goodstadt L, Webber C, Dickens NJ, Letunic I, Suyama M, Torrents D, von Mering C, Zdobnov EM, Makova K, Nekrutenko A, Elnitski L, Eswara P, King DC, Yang S, Tyekucheva S, Radakrishnan A, Harris RS, Chiaromonte F, Taylor J, He JB, Rijnkels M, Griffiths-Jones S, Ureta-Vidal A, Hoffman MM, Severin J, Searle SMJ, Law AS, Speed D, Waddington D, Cheng Z, Tuzun E, Eichler E, Bao ZR, Flicek P, Shteynberg DD, Brent MR, Bye JM, Huckle EJ, Chatterji S, Dewey C, Pachter L, Kouranov A, Mourelatos Z, Hatzigeorgiou AG, Paterson AH, Ivarie R, Brandstrom M, Axelsson E, Backstrom N, Berlin S, Webster MT, Pourquie O, Reymond A, Ucla C, Antonarakis SE, Long MY, Emerson JJ, Betran E, Dupanloup I, Kaessmann H, Hinrichs AS, Bejerano G, Furey TS, Harte RA, Raney B, Siepel A, Kent WJ, Haussler D, Eyras E, Castelo R, Abril JF, Castellano S, Camara F, Parra G, Guigo R, Bourque G, Tesler G, Pevzner PA, Smit A, Fulton LA, Mardis ER, Wilson RK (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432(7018):695–716
Huey RB, Janzen FJ (2008) Climate warming and environmental sex determination in tuatara: the last of the Sphenodontians? Proc R Soc Lond B Biol Sci 275(1648):2181–2183
Janes DE, Wayne ML (2006) Evidence for a genotype × environment interaction in sex-determining response to incubation temperature in the leopard gecko, Eublepharis macularius. Herpetologica 62(1):56–62


Janes DE, Bermudez D, Guillette LJ, Wayne ML (2007) Estrogens induced male production at a female-producing temperature in a reptile (Leopard Gecko, Eublepharis macularius) with temperature-dependent sex determination. J Herpetol 41(1):9–15
Janes DE, Organ C, Valenzuela N (2008) New resources inform study of genome size, content, and organization in nonavian reptiles. Integr Comp Biol 48(4):447–453
Janes DE, Ezaz T, Graves JAM, Edwards SV (2009) Recombination and nucleotide diversity in the sex chromosomal pseudoautosomal region of the emu, Dromaius novaehollandiae. J Hered 100(2):125–136
Janes DE, Fujita MK, Organ CL, Shedlock AM, Edwards SV (2010a) Genome evolution in Reptilia, the sister group of mammals. Annu Rev Genom Hum Genet (in press)
Janes DE, Organ CL, Edwards SV (2010b) Variability in sex-determining mechanisms influences genome complexity in Reptilia. Cytogenet Genome Res 127(2–4):242–248
Janzen FJ, Krenz JG (2004) Phylogenetics: which was first, TSD or GSD? In: Valenzuela N, Lance VA (eds) Temperature-dependent sex determination in vertebrates. Smithsonian Books, Washington, DC, pp 121–130
Just W, Rau W, Vogel W, Akhverdian M, Fredga K, Graves JAM, Lyapunova E (1995) Absence of Sry in species of the vole Ellobius. Nat Genet 11(2):117–118
Kawagoshi T, Uno Y, Matsubara K, Matsuda Y, Nishida C (2009) The ZW micro-sex chromosomes of the Chinese soft-shelled turtle (Pelodiscus sinensis, Trionychidae, Testudines) have the same origin as chicken chromosome 15. Cytogenet Genome Res 125:125–131
Kawai A, Nishida-Umehara C, Ishijima J, Tsuda Y, Ota H, Matsuda Y (2007) Different origins of bird and reptile sex chromosomes inferred from comparative mapping of chicken Z-linked genes. Cytogenet Genome Res 117(1–4):92–102
Kawai A, Ishijima J, Nishida C, Kosaka A, Ota H, Kohno S, Matsuda Y (2009) The ZW sex chromosomes of Gekko hokouensis (Gekkonidae, Squamata) represent highly conserved homology with those of avian species. Chromosoma 118(1):43–51
King RB, Lawson R (1996) Sex-linked inheritance of fumarate hydratase alleles in natricine snakes. J Hered 87:81–83
Lang JW, Andrews HV (1994) Temperature-dependent sex determination in crocodilians. J Exp Zool 270(1):28–44
Mank JE (2009) The W, X, Y and Z of sex-chromosome dosage compensation. Trends Genet 25(5):226–233
Matsubara K, Tarui H, Toriba M, Yamada K, Nishida-Umehara C, Agata K, Matsuda Y (2006) Evidence for different origin of sex chromosomes in snakes, birds, and mammals and step-wise differentiation of snake sex chromosomes. Proc Natl Acad Sci USA 103(48):18190–18195
Melamed E, Arnold AP (2007) Regional differences in dosage compensation on the chicken Z chromosome. Genome Biol 8(9):R202
Mitchell NJ, Nelson NJ, Cree A, Pledger S, Keall SN, Daugherty CH (2006) Support for a rare pattern of temperature-dependent sex determination in archaic reptiles: evidence from two species of tuatara (Sphenodon). Front Zool 3:9
Ohno S (1967) Sex chromosomes and sex linked genes. Springer, Berlin
Organ CL, Janes DE (2008) Evolution of sex chromosomes in Sauropsida. Integr Comp Biol 48(4):512–519
Organ CL, Janes DE, Meade A, Pagel M (2009) Genotypic sex determination enabled adaptive radiations of extinct marine reptiles. Nature 461(7262):389–392
Payer B, Lee JT (2008) X chromosome dosage compensation: how mammals keep the balance. Annu Rev Genet 42:733–772
Pokorna M, Kratochvil L (2009) Phylogeny of sex-determining mechanisms in squamate reptiles: are sex chromosomes an evolutionary trap? Zool J Linn Soc 156(1):168–183
Quinn AE, Georges A, Sarre SD, Guarino F, Ezaz T, Graves JAM (2007) Temperature sex reversal implies sex gene dosage in a reptile. Science 316(5823):411
Rice WR (1987) Genetic hitchhiking and the evolution of reduced genetic activity of the Y sex chromosome. Genetics 116(1):161–167


Sarre SD, Georges A, Quinn A (2004) The ends of a continuum: genetic and temperature-dependent sex determination in reptiles. Bioessays 26(6):639–645
Shine R, Warner DA, Radder R (2007) Windows of embryonic sexual lability in two lizard species with environmental sex determination. Ecology 88(7):1781–1788
Sinclair AH, Berta P, Palmer MS, Hawkins JR, Griffiths BL, Smith MJ, Foster JW, Frischauf AM, Lovell-Badge R, Goodfellow PN (1990) A gene from the human sex-determining region encodes a protein with homology to a conserved DNA-binding motif. Nature 346(6281):240–244
Smith CA, Roeszler KN, Ohnesorg T, Cummins DM, Fairlie PG, Doran TJ, Sinclair AH (2009) The avian Z-linked gene DMRT1 is required for male sex determination in the chicken. Nature 461:267–271
Solari AJ (1994) Sex chromosomes and sex determination in vertebrates. CRC Press, Boca Raton, FL
Standora EA, Spotila JR (1985) Temperature-dependent sex determination in sea turtles. Copeia 3:711–722
Uller T, Mott B, Odierna G, Olsson M (2006) Consistent sex ratio bias of individual female dragon lizards. Biol Lett 2(4):569–572
Valenzuela N (2004) Introduction. In: Valenzuela N, Lance VA (eds) Temperature-dependent sex determination in vertebrates. Smithsonian Books, Washington, DC, pp 1–4
Valenzuela N, LeClere A, Shikano T (2006) Comparative gene expression of steroidogenic factor 1 in Chrysemys picta and Apalone mutica turtles with temperature-dependent and genotypic sex determination. Evol Dev 8(5):424–432
Viets BE, Tousignant A, Ewert MA, Nelson CE, Crews D (1993) Temperature-dependent sex determination in the leopard gecko, Eublepharis macularius. J Exp Zool 265(6):679–683
Viets BE, Ewert MA, Talent LG, Nelson CE (1994) Sex-determining mechanisms in squamate reptiles. J Exp Zool 270(1):45–56
Vogel W, Jainta S, Rau W, Geerkens C, Baumstark A, Correa-Cerro LS, Ebenhoch C, Just W (1998) Sex determination in Ellobius lutescens: the story of an enigma. Cytogenet Cell Genet 80(1–4):214–221
Wagner E (1980) Temperature-dependent sex determination in a gekko lizard. Q Rev Biol 55:21, appendix
Warner DA, Shine R (2008) The adaptive significance of temperature-dependent sex determination in a reptile. Nature 451(7178):566–568
Warner DA, Shine R (2009) Maternal and environmental effects on offspring phenotypes in an oviparous lizard: do field data corroborate laboratory data? Oecologia 161(1):209–220
While GM, Wapstra E (2009) Snow skinks (Niveoscincus ocellatus) do not shift their sex allocation patterns in response to mating history. Behaviour 146:1405–1422

Chapter 2

Constraints, Plasticity, and Universal Patterns in Genome and Phenome Evolution

Eugene V. Koonin and Yuri I. Wolf

Abstract Evolutionary genomics identifies multiple constraints that affect different parts of the genomes of diverse life forms to different degrees. The selective pressures that shape the evolution of viral, prokaryotic, and eukaryotic genomes differ dramatically, and substantial differences exist even between animal and bacterial lineages. Constraints on protein evolution appear to be more universal and could be determined by the fundamental physics of protein folding. Some key features of the molecular phenome, such as protein abundance, turn out to be unexpectedly conserved and hence strongly constrained. The constraints that shape the evolution of genomes and phenomes are complemented by the plasticity and robustness of genome architecture, expression, and regulation. Several universal “laws” of genome and phenome evolution have been detected. Some of them seem to be dictated by selective constraints, and others by neutral processes.

2.1 Introduction

In principle, the entire genome of any life form can be perceived as evolving under constraints (purifying selection) the strength of which varies from 0 (unconstrained evolution) to 1 (absolute conservation). Moreover, constraints affect evolution at all levels of biological organization, from genome sequence to genome architecture to gene expression to molecular interactions to actual organismal phenotypes (Kimura 1983; Lynch 2007c). Generally, constraints on the rates and paths of evolution can be divided into genomic, those that are manifest at the level of the genome sequence and architecture, and phenomic, those that pertain to phenotypic characteristics (although ultimately realized through genomic changes as well). Comparative E.V. Koonin and Y.I. Wolf National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA e-mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_2, # Springer-Verlag Berlin Heidelberg 2010

19

20

E.V. Koonin and Y.I. Wolf

genomics and systems biology produce massive amounts of diverse data that provide for previously inconceivable insights into the patterns and processes of genome and phenome evolution (Kitano 2002; Medina 2005; Koonin and Wolf 2006; Lynch 2007c; Loewe 2009; Yamada and Bork 2009). Comparative genomics allows us, at least in principle, to measure the strength of constraints that affect different classes of sites in genomes and to elucidate the biological nature of these constraints. However, genome comparison does more than that as it gives us material to address evolutionary constraints beyond the traditional aspect of sequence conservation to higher level questions such as: how constrained in evolution are gene repertoires of organisms, genome architecture, evolution rate itself, and more? The massive influx of data from systems biology takes the study of evolutionary constraints into new dimensions by allowing researchers to ask qualitatively new questions: what are the nature and strength of constraints that affect gene expression, regulatory, and interaction networks, metabolic fluxes and other characteristics of organisms that can be denoted “molecular phenome”? In this article, we present a broad overview of the constraints that affect gene sequences, genome architectures, and molecular phenotypic characteristics such as gene expression level and the structures of protein–protein interaction and regulatory networks. We attempt a genome-wide and organism-wide assessment of different types of constraints operative at different levels and additionally discuss the concepts of robustness and plasticity that are intimately linked to constraints. Of course, the subject we address is vast and cannot be reasonably covered in full in one, relatively brief review. We leave out some important areas such as developmental constraints and only fleetingly touch upon others such as evolution of regulatory networks. 
Nevertheless, it is our hope that even such a sketchy discussion reveals some important general aspects of the constraints that define evolution at diverse levels of biological organization.

2.2 Evolutionary Constraints on Sequence Evolution Across Genomes and Taxa

The origins and characteristic strengths of constraints that affect different classes of sequences in genomes of different life forms are extremely diverse and certainly are not yet known in full. Typically, the constraints on sequences encoding proteins and structural RNAs (such as rRNAs and tRNAs) are stronger than the constraints on noncoding sequences, although, for each type of sequence, there is a broad distribution of constraint strengths, and the ranges of the distributions overlap (Shabalina and Kondrashov 1999; Margulies et al. 2007). Obviously, constraints that affect a particular class of sites can be measured only by comparison with another class of sites that can be construed to evolve neutrally. The choice of an appropriate neutral model is a major problem in molecular evolution. In the pregenomic era, Motoo
Kimura, the founder of the neutral theory, was the first to propose the simple but important idea that pseudogenes, which are numerous in vertebrates, could be used as a neutral baseline for assessing selection pressure (Kimura 1983). Despite some exceptional cases of pseudogene recruitment for specific functions (Khachane and Harrison 2009), this contention still appears to hold true in general (Harrison and Gerstein 2002). Genomics revealed additional sources of (apparently) neutrally evolving sequences, such as introns and intergenic regions in animals (Parsch et al. 2010; Resch et al. 2007). However, a general difficulty with any attempt to define a universal baseline of neutral evolution is that different parts of a genome differ in their mutation rates and, consequently, in the rate of neutral evolution, for which the fixation rate equals the mutation rate (Ellegren et al. 2003). Therefore, for a reliable estimate of the strength of selection or constraints, the neutral model has to be derived from the same gene or region for which selection is being measured. Several such measures have been developed (Nielsen 2005; Charlesworth and Eyre-Walker 2008; Eyre-Walker and Keightley 2009). The most popular gauge of selection pressure for protein-coding sequences follows naturally from the redundancy and nonrandom structure of the genetic code, in which the same amino acid typically is encoded by codons that differ only in their third (or, less commonly, first) positions. This measure, Ka/Ks (dN/dS), is the ratio of the number or rate of nonsynonymous substitutions (those that change an amino acid in the encoded protein) to the number or rate of synonymous substitutions (those that occur in synonymous positions of codons and so do not affect the protein sequence) (Hurst 2002; Ellegren 2008).
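As an illustration of how Ka and Ks can be estimated in practice, the following is a minimal Python sketch of the counting method of Nei and Gojobori (1986), one of several standard estimators; it is not necessarily the method behind the values in Fig. 2.1. It assumes two aligned, gap-free, in-frame coding sequences and the standard genetic code, skips stop codons, counts mutations to stop codons as nonsynonymous, averages multiple differences within a codon over all stop-free mutational pathways, and applies the Jukes–Cantor correction. All function names are hypothetical.

```python
import math
from itertools import permutations, product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_diffs(c1, c2):
    """Synonymous and nonsynonymous differences between two codons,
    averaged over all mutational pathways that avoid stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn = non = 0.0
    n_paths = 0
    for order in permutations(positions):
        cur, s, n, valid = c1, 0, 0, True
        for i in order:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODE[nxt] == "*":
                valid = False  # this pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if valid:
            syn, non, n_paths = syn + s, non + n, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn / n_paths, non / n_paths


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits."""
    if p <= 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # saturated: distance undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and in frame")
    S = N = Sd = Nd = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3], seq2[k:k + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            continue  # ignore stop codons
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S, N = S + s, N + (3 - s)
        sd, nd = codon_diffs(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


# Toy example: a single synonymous change (CTG -> CTC, both leucine).
ka, ks = ka_ks("ATGCTGAAA", "ATGCTCAAA")
print(f"Ka = {ka:.3f}, Ks = {ks:.3f}")
```

A ratio Ka/Ks well below 1 would indicate purifying selection in the sense discussed in the text; on toy sequences this short, the estimates are, of course, dominated by sampling noise.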
[Figure: distributions of dN and dS distances between human and macaque orthologs (log scale, 0.0001–10); distributions of dN/dS ratios for orthologs from Human–Macaque, B. cenocepacia–B. vietnamiensis, and Aspergillus–Neosartorya (log scale, 0.001–10)]

Fig. 2.1 The distributions of evolutionary rates for nonsynonymous and synonymous sites of protein-coding genes in primates and the Ka/Ks ratios for three diverse pairs of species (Wolf et al. 2009)

The assumption that underpins the use of Ka/Ks as a measure of selection is that synonymous sites evolve neutrally, or at least under weak selection compared with nonsynonymous sites, which allows synonymous sites to serve as the baseline for measuring the constraints on protein evolution. As a crude approximation, this assumption holds: for the great majority of protein-coding genes from any organism, Ka/Ks ≪ 1, indicating that, taken as a whole, most proteins are subject to purifying selection of widely differing strength (Fig. 2.1). Moreover, the distribution of Ka spans a substantially wider range of values than the distribution of Ks, indicating that the constraints affecting proteins are qualitatively different from, and much more diverse than, those affecting synonymous sites (Fig. 2.1). For unconstrained, neutral evolution, Ka = Ks, as is the case for most pseudogenes. For a small subset of protein-coding genes, Ka/Ks > 1, which is construed as evidence of evolution under positive selection. Genes evolving under positive selection encode specialized proteins for which rapid change is paramount for a function that typically involves an “arms race” between competing agencies, such as hosts and parasites; examples include bacterial surface proteins (Petersen et al. 2007; Muzzi et al. 2008) and proteins involved in mammalian spermatogenesis, sperm competition, and sperm–egg interaction (Nielsen et al. 2005; Turner et al. 2008). Of course, evolution under positive selection is not unconstrained, as constraints on the overall protein structure still apply (Worth et al. 2009), but evolution along the available trajectories proceeds rapidly. The fact that most protein-coding genes evolve under constraints imposed by purifying selection by no means implies that all amino acid sites are subject to the
same constraints. On the contrary, the evolutionary rates of sites, and by implication the strength of constraints affecting different sites, are well described by a characteristic skewed Gamma distribution (or, more precisely, a mixture of Gamma distributions), with a small fraction of sites that are virtually unconstrained or, in some cases, subject to positive selection, and the majority of sites subject to broadly distributed constraints (Kelly and Churchill 1996; Grishin et al. 2000; Mayrose et al. 2005; Nielsen 2005). The characteristic strengths of constraints that affect evolution of protein-coding genes differ widely between organisms. Typically, prokaryotic proteins are subject to stronger constraints than eukaryotic proteins, especially those of multicellular forms (plants and animals), with characteristic median Ka/Ks values in the range of 0.01–0.1 and 0.1–0.5, respectively (Fig. 2.1) (Jordan et al. 2002;
Novichkov et al. 2009b). The values of Ka/Ks, and by inference the strength of constraints, differ widely between evolutionary lineages, such as diverse lineages of bacteria and archaea, and seem to be related to the specific lifestyles of the respective organisms (Novichkov et al. 2009b). The assumption that synonymous sites in protein-coding genes evolve neutrally is useful for measuring selection acting at the protein level but is in itself a rough approximation at best. The universally observed, significant positive correlation between Ka and Ks (Makalowski and Boguski 1998; Drummond and Wilke 2008, 2009; Ellegren 2008) indicates that evolution of synonymous sites is constrained as well and suggests that the evolutionary forces that shape the evolution of nonsynonymous and synonymous sites are related (see the section on protein evolution below). More accurate and powerful tests for purifying and positive selection affecting different classes of sites are variations of the classic McDonald–Kreitman test, which compares the patterns of substitutions for within-species variation (polymorphisms) with those for between-species divergence, under the assumption that the fraction of nonneutral polymorphisms is negligible (Nielsen 2001, 2005). The overall distributions of constraints across genomes are dramatically different in life forms with distinct genome architectures, in particular, between viruses and prokaryotes, with their “wall-to-wall” genomes that consist mostly of protein-coding and RNA-coding genes, on the one hand, and multicellular eukaryotes, in whose genomes the coding nucleotides are in the minority, on the other hand (Lynch and Conery 2003; Koonin 2009a) (Fig. 2.2). On a per-nucleotide basis, the constraints affecting compact genomes, particularly those of prokaryotes, are orders of magnitude greater than the constraints on the larger genomes of multicellular eukaryotes.
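The logic of the McDonald–Kreitman test can be sketched in a few lines. Its 2 × 2 table contrasts nonsynonymous and synonymous fixed differences between species (Dn, Ds) with nonsynonymous and synonymous polymorphisms within a species (Pn, Ps); under neutrality, the two ratios are expected to be equal. The Python sketch below (assuming Python 3.8+ for math.comb) computes the neutrality index NI = (Pn/Ps)/(Dn/Ds), the derived estimate α = 1 − NI of the proportion of nonsynonymous substitutions driven by positive selection, which is meaningful only under the assumption stated above that nonneutral polymorphisms are negligible, and a two-sided Fisher exact test. The function names and the counts are hypothetical and purely illustrative.

```python
from math import comb


def table_prob(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]]
    given its row and column totals."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact test: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = table_prob(a, b, c, d)
    p = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        px = table_prob(x, row1 - x, col1 - x, row2 - col1 + x)
        if px <= p_obs * (1 + 1e-9):  # tolerance for floating-point ties
            p += px
    return min(p, 1.0)


def mcdonald_kreitman(dn, ds, pn, ps):
    """Return (neutrality index, alpha, Fisher p-value) for fixed
    differences (dn, ds) and polymorphisms (pn, ps)."""
    ni = (pn / ps) / (dn / ds)
    alpha = 1 - ni  # assumes nonneutral polymorphisms are negligible
    return ni, alpha, fisher_two_sided(dn, ds, pn, ps)


# Hypothetical counts: an excess of nonsynonymous fixed differences.
ni, alpha, p = mcdonald_kreitman(dn=7, ds=17, pn=2, ps=42)
print(f"NI = {ni:.2f}, alpha = {alpha:.2f}, p = {p:.4f}")
```

NI < 1 (α > 0) points to an excess of adaptive nonsynonymous fixations; slightly deleterious polymorphisms, which violate the stated assumption, bias α downward.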
[Figure: share of the genome (0–100%) occupied by ORFs, control elements, introns, and “junk” DNA, ordered from strong to weak constraints, for viruses, prokaryotes, unicellular eukaryotes, and multicellular eukaryotes]

Fig. 2.2 Approximate distribution of evolutionary constraints across genomes with different architectures. The fractions of different classes of sequences subject to constraints of varying strength are shown as rough approximations of the values that are typical of the respective class of genomes

Given the characteristically low Ka/Ks values indicative of strongly constrained evolution of protein sequences (Fig. 2.1), there are almost no sequences whose evolution is (effectively) unconstrained in the compact viral and prokaryotic genomes. The notable exceptions are pseudogenes, which are common in some parasitic bacteria such as Rickettsia or Mycobacterium leprae (Harrison and Gerstein 2002; Darby et al. 2007; Monot et al. 2009). In typical genomes of free-living prokaryotes and especially viruses, noncoding regions constitute only 10–15% of the genome, and a considerable fraction of these sequences consists of regulatory elements (promoters, operators, terminators, and translation initiation regions) whose evolution is variably constrained (Molina and van Nimwegen 2008). The genomes of most viruses are even more compact than prokaryotic genomes, with nearly all of the genome sequence taken up by protein-coding genes (Koonin 2009a). Unicellular eukaryotes resemble prokaryotes in their overall genome architecture (notwithstanding important differences such as the absence of operons and the presence of varying numbers of introns) and show a roughly similar distribution of evolutionary constraints, although the fraction of apparently unconstrained noncoding sequences in these genomes is somewhat greater. The genomes of multicellular eukaryotes (plants and especially animals), however, present a stark contrast. These organisms have intron-rich genomes with long intergenic regions, and a substantial, albeit variable, fraction of these noncoding sequences indeed appears to
undergo unconstrained evolution (Fig. 2.2). Using McDonald–Kreitman-based approaches, it is possible to estimate the fraction of the nucleotides in a genome that are subject to evolutionary constraints (Sella et al. 2009). These estimated fractions differ substantially even between animals: in Drosophila, 70% of the sites, including 65% of the noncoding sites, appear to be subject to selection (including positive selection) (Sella et al. 2009), whereas in mammals, this fraction is estimated at only 5–6%, as determined using repeats ancestral to human and mouse as a neutral baseline (Waterston et al. 2002). An independent approach based on the deviations from the expected neutral distribution of insertions and deletions in mammalian genomes led to an even lower value of 3% of sites under constraint (Lunter et al. 2006). It is notable, however, that the absolute numbers of sites subject to selection in these animal genomes of widely different size are quite close. By contrast, in Arabidopsis, a plant that is comparable to Drosophila in terms of genome size and overall architecture, the fraction of constrained noncoding sites appears to be substantially lower. The estimate of 3–6% for the fraction of constrained sites in mammalian genomes is remarkable from two opposite standpoints. On the one hand, it appears that the great majority of mammalian genomic DNA does, after all, fit the early (and much maligned) definition of junk (Doolittle and Sapienza 1980). Of course,
recruitment of “junk” sequences, such as those of diverse transposable elements, for various functions is common (Jordan et al. 2003; Bowen and Jordan 2007), so yesterday’s junk can be today’s essential gene (and vice versa), but at any given time, most of the primate genome evolves without appreciable constraints. On the other hand, the converse aspect of these estimates is that, as protein-coding sequences comprise only 1.2% of the genome (Waterston et al. 2002), the substantial majority of the selected sites do not encode amino acids. We still do not know the actual distribution of the constrained sites among different classes of sequences or the distribution of selection pressures, but some important contributions and their approximate magnitudes have become clear. In particular, the selective pressure on 5′-terminal and especially long 3′-terminal untranslated regions of mammalian genes is comparable to, if not stronger than, that affecting synonymous sites in coding regions (Duret et al. 1993; Shabalina et al. 2004; Drake et al. 2006). An even greater contribution to the noncoding part of the mammalian “selectome” (using the term in the most general sense, as the totality of sites subject to all forms of selection, as opposed to the original usage limited to positive selection; Proux et al. 2009) comes from the ever-growing compendium of noncoding RNA genes present in vertebrate genomes, the RNome (Costa 2005). A major and currently the best characterized part of the RNome consists of thousands of regulatory microRNAs that are subject to a broad range of evolutionary constraints (Shabalina and Koonin 2008; Carthew and Sontheimer 2009). In addition, there are numerous long noncoding (macro) RNAs, the functions of which remain largely unclear, although there is striking anecdotal evidence of roles of these RNAs in gene regulation and development (Ponting et al. 2009).
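The inference that most selected sites in mammalian genomes are noncoding follows from simple arithmetic on the figures quoted above: 3–6% of the genome under constraint and 1.2% protein-coding. The minimal Python sketch below computes a lower bound on the noncoding share of constrained sites under the deliberately generous assumption that every coding nucleotide is constrained; because synonymous sites are in fact only weakly constrained, the true noncoding share would be higher. The function name is hypothetical.

```python
def min_noncoding_share(f_constrained, f_coding=0.012):
    """Lower bound on the share of constrained sites outside protein-coding
    sequence, assuming (generously) that every coding nucleotide is
    constrained. Both arguments are fractions of the whole genome."""
    if not 0 < f_coding <= f_constrained <= 1:
        raise ValueError("need 0 < f_coding <= f_constrained <= 1")
    return (f_constrained - f_coding) / f_constrained


# The 3-6% range of estimates for mammalian genomes quoted in the text.
for f in (0.03, 0.05, 0.06):
    print(f"{f:.0%} constrained -> at least {min_noncoding_share(f):.0%} noncoding")
```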
Approximately 3,000 macroRNAs were found to be conserved in mammals and are subject to a selective pressure that appears to be comparable to the constraints affecting protein-coding genes (Ponjavic et al. 2007). Beyond doubt, the known part of the RNome is the proverbial tip of the iceberg, especially considering the detection of transcripts from nearly all sequences in mammalian genomes (Bertone et al. 2004; Johnson et al. 2005). Comparative-genomic analysis reveals numerous conserved sequences (including the so-called ultraconserved elements, which retained their identity throughout long evolutionary spans, such as the entire course of vertebrate evolution) within introns and intergenic regions of animal and plant genomes (Dermitzakis et al. 2005; Elgar 2009), but so far transcription into a specific functional RNA has been demonstrated for only a few of these (Bejerano et al. 2004; Baira et al. 2008). Nevertheless, it has been shown that the ultraconserved sequences are subject to “ultraselection”, suggesting key functions that remain to be deciphered (Katzman et al. 2007). On the whole, the problem of evolutionarily constrained “dark matter” in animal genomes remains pertinent, as the status of the majority of constrained nucleotides is still unclear, at least in vertebrates, the organisms with the lowest known gene density. In particular, the extent of sequence conservation that is unrelated to transcription but rather caused by requirements of expression regulation, chromatin structure, and other factors is still a wide open question. To succinctly summarize the current understanding of the constraints affecting different types of sites across the known diversity of genomes (Fig. 2.2), some
fundamental, straightforward conclusions appear indisputable, in particular, that nonsynonymous sites in protein-coding sequences and sequences encoding structural RNAs are among the most strongly constrained and that the characteristic distributions of constraints critically depend on genome architecture. Beyond these basic principles, however, and perhaps unexpectedly, the evolutionary regimes seem to differ widely even between rather closely related lineages, and much additional work in diverse organisms is required to develop a comprehensive picture of the constraints and pressures that shape genome evolution.

2.3 Evolutionary Constraints on Gene and Genome Architectures

Beyond sequence evolution, comparative genomics yields massive amounts of data on the evolution of gene and genome organization, or architecture. An aspect of gene architecture that is common to all life forms but is particularly prominent in eukaryotes is the multidomain organization of proteins (Koonin et al. 2000). Numerous proteins consist of multiple “evolutionary domains” that may or may not correspond to structural domains but in either case show varying degrees of evolutionary mobility. The multidomain organization of some key proteins is conserved throughout the entire course of evolution of the three domains of cellular life (archaea, bacteria, and eukaryotes), as in the case of the association of polymerase domains with nuclease domains in different families of DNA polymerases (Aravind and Koonin 1998), to mention just one striking example. More generally, however, domain rearrangements over all ranges of evolutionary distances form an important resource of evolutionary plasticity, which is particularly remarkable in the case of so-called promiscuous domains, which combine with diverse other domains in numerous proteins and often provide connections in interaction and regulatory networks and complexes (Wuchty and Almaas 2005; Basu et al. 2008, 2009). A feature of gene architecture that is almost fully eukaryote-specific is the exon–intron organization of protein-coding genes, which in eukaryotes consist of multiple exons separated by introns. A notable discovery of comparative genomics is the high level of conservation of intron positions over long evolutionary spans: indeed, up to 25–30% of the intron positions are shared between animals and plants, with the implication that most of these introns remained in the same positions throughout eukaryotic evolution (Fedorov et al. 2002; Rogozin et al. 2003; Roy and Gilbert 2006).
Within some animal lineages, in particular vertebrates, there seems to be almost complete intron stasis, with minimal intron loss and virtually no gain. In sharp contrast, evolution of other lineages, such as nematodes, as well as many groups of unicellular eukaryotes, involves extensive turnover of introns (Carmel et al. 2007; Roy and Penny 2007). Thus, evolution of eukaryotic gene architecture shows a complex landscape, with a dynamic evolutionary process in some lineages but much less change in others.
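Statements such as “25–30% of intron positions are shared” presuppose a way of comparing intron positions across orthologs. One simple approach, sketched below in Python under stated assumptions, maps each intron onto a protein alignment and calls an intron shared when both genes have one in the same alignment column and phase. The criteria used in the cited studies may differ; the (residue index, phase) convention, the choice of the union of positions as the denominator, the function names, and the toy alignment are all hypothetical.

```python
def residue_columns(aligned):
    """Alignment column of each residue in a gapped sequence ('-' = gap)."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def shared_intron_fraction(aln_a, introns_a, aln_b, introns_b):
    """Fraction of distinct intron positions (column, phase) found in both
    orthologs, relative to all positions found in either of them.

    introns_a / introns_b hold (residue_index, phase) pairs in ungapped,
    0-based protein coordinates (a hypothetical convention).
    """
    cols_a, cols_b = residue_columns(aln_a), residue_columns(aln_b)
    sites_a = {(cols_a[i], phase) for i, phase in introns_a}
    sites_b = {(cols_b[i], phase) for i, phase in introns_b}
    union = sites_a | sites_b
    return len(sites_a & sites_b) / len(union) if union else 0.0


# Toy alignment of two orthologous proteins with three introns each.
aln_a, introns_a = "MKT-LLVAG", [(2, 0), (4, 1), (6, 2)]
aln_b, introns_b = "MKTSLL-AG", [(2, 0), (5, 1), (6, 0)]
print(shared_intron_fraction(aln_a, introns_a, aln_b, introns_b))
```

In the toy example, two positions coincide in both column and phase, while the third intron of each gene sits in the same column but in a different phase, so it is not counted as shared.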


[Figure: genome rearrangement distance (dY, 0–0.3) plotted against sequence distance (dS, 0–2.5) for strains of Shewanella baltica, Bacillus anthracis, Burkholderia ambifaria, and Yersinia pestis]

Fig. 2.3 Divergence of large-scale genome organization vs. protein sequence conservation. The data are shown for four sets of closely related bacterial strains from the ATGC database (Novichkov et al. 2009a). The rearrangement distance (dY) is calculated as the fraction of (putative) orthologs that do not belong to regions of synteny. The dS value of 1 approximately corresponds to 93–97% identity between the compared sequences (Novichkov et al. 2009b)

Genome architecture refers to all aspects of the mapping of genetic elements onto the genome, including gene order, clustering, and co-regulation of genes with related functions, allocation of genes to individual chromosomes, etc. (Carmel et al. 2007; Lynch 2007c; Roy and Penny 2007; Koonin 2009a). The very first comparisons of the order of genes in sequenced bacterial genomes revealed a remarkable lack of conservation of the long-range gene order, which contrasts with the recurrent presence of partially conserved arrays of co-regulated genes, operons, in diverse prokaryotes (Mushegian and Koonin 1996a; Dandekar et al. 1998). Subsequent analysis has shown that the divergence of long-range gene orders in prokaryotes is roughly proportional to the sequence divergence of protein-coding genes, but gene order evolves so fast that, in many lineages, no long-range conservation is seen even at very low levels of sequence divergence. Beyond this general pattern, the rate of gene order decay differs substantially between prokaryotic lineages (Novichkov et al. 2009b) (Fig. 2.3). The gene order in prokaryotes appears to be disrupted primarily by inversions centered at the origin of replication, the frequency of which differs dramatically among prokaryotes (Eisen et al. 2000). Apparently, origin-centered inversion is a neutral process that is not constrained (or only minimally constrained) by purifying selection and depends primarily on the activity of the relevant recombination machinery. In contrast to the lack of conservation of the long-range gene order, prokaryotic operons are characterized by a combination of evolutionary resilience and plasticity, forming overlapping gene arrays that are partially shared by evolutionarily
distant organisms (Rogozin et al. 2002; Ling et al. 2009). To a large extent, the wide spread of some operons among prokaryotes (the ribosomal superoperon and membrane transport cassette operons being the prime cases in point) is due to horizontal gene transfer (HGT), as captured in the selfish operon concept (Lawrence and Roth 1996; Lawrence 1999): when a transferred piece of DNA includes an entire operon consisting of genes encoding a complete pathway or functional system, the chances of fixation dramatically increase. The lack of long-range gene order conservation notwithstanding, the gross architecture of prokaryotic genomes is not entirely unconstrained: there are substantial biases in gene localization, for instance, the preferential codirectionality of gene transcription with replication, conceivably a result of selection for minimizing the chance of collision between RNA polymerase and replication forks (Rocha 2008). With a few notable exceptions, such as nematodes and trypanosomes, eukaryotes have no operons; those operons that do exist are unrelated to prokaryotic operons and seem to have evolved de novo (Blumenthal 2004; Osbourn and Field 2009). Attempts to identify nonrandomness in the eukaryotic gene order, in the form of clustering of genes with connected functions, similar expression levels and patterns, and other similar characteristics, have led to mixed results (Hurst et al. 2004; Koonin 2009a; Osbourn and Field 2009). With some striking exceptions, such as the strict order of the animal Hox genes (Lemons and McGinnis 2006), the trends in gene clustering tend to be weak, so the gene order can be considered quasirandom (Koonin 2009a). Evolution of gene order in eukaryotes seems to be determined primarily by random chromosomal breaks, and there are no highly conserved gene arrays between distantly related forms, such as different animal phyla, let alone animals and fungi or plants.
On the whole, evolution of genome architecture appears to be shaped by the interplay of strong constraints that determine the conservation of operons, weak constraints on other forms of functional clustering and large-scale gene organization, and extensive dynamics of genome rearrangements and HGT. These dynamics both counteract weak constraints, by disrupting gene associations, and reinforce the effect of stronger constraints, as in the case of the horizontal spread of “selfish” operons.
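The rearrangement distance dY of Fig. 2.3 is the fraction of putative orthologs that lie outside regions of synteny. The synteny criterion of the original analysis is not described here, so the following Python sketch substitutes a deliberately crude, hypothetical one: an ortholog counts as syntenic if at least one of its immediate neighbors is the same in both genomes. Chromosomes are treated as circular, as is typical of prokaryotes, gene orientation is ignored, and gene identifiers are assumed to be unique within each genome. Function names and the toy gene orders are illustrative.

```python
def neighbor_pairs(order):
    """Unordered pairs of adjacent genes on a circular chromosome."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def rearrangement_distance(order_a, order_b):
    """dY-like score: fraction of shared orthologs that have no neighbor
    in common between the two genomes."""
    shared = set(order_a) & set(order_b)
    if len(shared) < 3:
        raise ValueError("need at least three shared orthologs")
    # Restrict both gene orders to the orthologs present in both genomes.
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    conserved = neighbor_pairs(a) & neighbor_pairs(b)
    syntenic = {gene for pair in conserved for gene in pair}
    return 1 - len(syntenic) / len(shared)


# Toy example: heavy shuffling leaves only one conserved adjacency (F-A).
print(round(rearrangement_distance(list("ABCDEF"), list("ADBECF")), 3))
```

Under this criterion, a single inversion leaves every gene with at least one of its original neighbors, so dY rises only once rearrangement breakpoints become dense enough to separate genes from both of their former neighbors; this is one reason real analyses rely on more elaborate definitions of synteny.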

2.4 Evolutionary Constraints on Genome Size, Gene Number, Evolution of Orthologous Gene Lineages, and Gene Repertoires

The number of protein-coding genes in cellular life forms varies within a surprisingly narrow range compared with the genome size and especially considering the difference in organizational complexity between prokaryotes and multicellular eukaryotes. Excluding, on one end of the spectrum, extremely reduced genomes of some intracellular parasitic bacteria that seem to be on their way to becoming
organelles (Nakabachi et al. 2006) and, on the other end, polyploid plant genomes, the number of encoded proteins varies only from 500 to 25,000, less than two orders of magnitude (Koonin 2009a). The largest known bacterial genome contains only about twofold fewer protein-coding genes than the most complex eukaryotic genomes. As already mentioned above, the genome architectures are drastically different between unicellular and multicellular life forms, so that in unicellular organisms, especially prokaryotes, the number of encoded proteins closely correlates with the genome size (roughly constant gene density, around one gene per kilobase of DNA), whereas in multicellular organisms, especially animals, the two are decoupled. What constrains the number of encoded proteins from below and from above? The low threshold of genomic complexity intuitively relates to a “minimal gene set for cellular life”, that is, the minimal set of genes sufficient to maintain a functional cell (in practice, of course, a prokaryotic cell) (Koonin 2003; Moya et al. 2009). The concept of a minimal gene set is intrinsically linked to the concepts of gene orthology, orthologous gene sets, and nonorthologous gene displacement. Orthologs are genes that evolved from a single ancestral gene in the last common ancestor of the compared genomes, in contrast to paralogs, genes that evolved by duplication (Koonin 2005). For the majority of genes, evolution of orthologous gene lineages is constrained within a distinct trajectory, so that such lineages remain unique and distinguishable from each other over long evolutionary spans.
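Because orthologous lineages remain distinguishable in this way, orthologs can often be identified with a simple reciprocal criterion: gene a in genome A and gene b in genome B are paired if each is the other’s best-scoring hit. The minimal Python sketch below assumes that all-against-all similarity scores (for instance, BLAST bit scores) have already been computed for the two proteomes; the input format, function names, and scores are hypothetical, and real ortholog pipelines add further safeguards, for example for ties and lineage-specific paralogs.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject (first one wins ties).

    scores maps (query, subject) -> similarity score, e.g. a bit score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Gene pairs (a, b) in which each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: a2 and a3 are paralogs competing for b2.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90,
          ("a2", "b2"): 180, ("a3", "b2"): 200}
b_vs_a = {("b1", "a1"): 245, ("b2", "a2"): 175,
          ("b2", "a3"): 198, ("b2", "a1"): 88}
print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

Here a2 hits b2 best, but b2 prefers a3, so only (a1, b1) and (a3, b2) qualify; the reciprocity requirement is what filters out the weaker paralog.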
[Figure: number of proteins in each class (log scale, 1–10,000) plotted against the total number of proteins in COGs (log scale, 100–10,000) for transcriptional regulators, signal transduction, metabolism, and translation; fitted exponents shown in the plot: γ = 0.2, 1.0, 1.9, and 1.9]

Fig. 2.4 Differential scaling of four broad classes of genes with the total number of genes in prokaryotic genomes. The data are from (Koonin and Wolf 2008); genes that did not belong to COGs (typically, 15–20% in each genome) were not taken into account

This evolutionary distinctness of orthologous lineages underlies the considerable effectiveness of straightforward methods for identification of orthologous gene sets based on “bidirectional best hits” and is key to comparative genomics, allowing comprehensive comparison of gene repertoires and delineation of core sets of conserved genes and putative minimal gene sets (Tatusov et al. 1997; Altenhoff and Dessimoz 2009). Minimal gene sets for cellular life derived by comparative-genomic and experimental approaches converge at 250–350 genes and seem to encode most of the essential cellular functions (Koonin 2003; Moya et al. 2009). However, an apparent paradox is that a set of 250–350 conserved orthologous genes can be derived only in comparisons of small sets of genomes from not very diverse organisms, as exemplified by the first analysis of this kind, which compared the parasitic bacteria Haemophilus influenzae and Mycoplasma genitalium and yielded a hypothetical minimal gene set of approximately 250 genes (Mushegian and Koonin 1996b). The core set of ubiquitously conserved genes keeps shrinking with the addition of newly sequenced genomes and seems to be limited to approximately 30 genes, all encoding proteins involved in translation and transcription (Charlebois and Doolittle 2004; Koonin and Wolf 2008). The explanation is nonorthologous gene displacement: most of the essential cellular functions can be performed by members of more than one orthologous gene set, and in many cases, genes or systems responsible for the same function are completely unrelated (Koonin et al. 1996; Koonin 2003). The relevant concept for defining a minimal genetic complement of a cell, the low bound of genomic complexity, is not a unique minimal gene set but rather a unique set of indispensable functional niches that can be filled with diverse collections of genes. Minimal requirements for specific lifestyles can be defined similarly, for instance, the minimal gene complement of an autotrophic organism, which includes about 1,000 essential functions (Koonin 2003). Thus, the low bound is defined by the minimal number of functions that are necessary to support a particular lifestyle, but even at this fundamental level of cellular organization, there is notable plasticity in terms of the specific gene complements supporting these functions. The nature of the upper bound of genetic complexity is much less clear. However, the question why the maximum number of genes practically does not grow, despite the accelerating pace of genome sequencing, seems pressing, especially considering the decoupling of gene number and genome size seen in multicellular eukaryotes. One attractive hypothesis is the “bureaucratic ceiling of complexity”. It has been noticed that different functional classes of genes scale differently with the total number of genes in a genome. Some variation notwithstanding, in prokaryotes, there seem to be three fundamental exponents that characterize these dependences: 0, 1, and 2 (van Nimwegen 2003; Koonin and Wolf 2008). Genes for proteins involved in information processing (translation, transcription, and replication) scale with an exponent of 0, i.e., the number of these genes reaches a plateau already in the smallest genomes and effectively does not depend on the overall genomic complexity; metabolic enzymes and transport proteins scale roughly proportionally to the total number of genes, whereas regulators and signal transduction system components scale quadratically (Fig. 2.4). The characteristic exponents of the three broad functional classes of genes show remarkably little variation across prokaryotic lineages, suggesting that the differential evolutionary dynamics of genes with different functions reflect fundamental “laws” of evolution of cellular organization (Molina and van Nimwegen 2009) or, in other words, distinct, strong constraints on the functional composition
of genomes. Eukaryotic genes show similar, if less pronounced, patterns of power-law gene scaling, with the exponent for the regulatory genes being substantially greater than one (van Nimwegen 2003). The deep underlying causes of the superlinear scaling of the regulators remain to be understood. A simple “toolbox” model of evolution of prokaryotic metabolic networks seems to be compatible with the quadratic scaling of regulators (Maslov et al. 2009). Under this model, enzymes for utilizing new metabolites, together with their dedicated regulators, are added (primarily via HGT) to an increasingly versatile reaction network, and because of the growing complexity of the preexisting network that provides enzymes for intermediate reactions, the ratio of regulators to regulated genes steadily grows. Regardless of the exact underlying mechanisms, the superlinear scaling of the regulators could clearly determine the upper limit of the growth of the gene number. At some point (one that is not easy to identify precisely), the cost of adding extra regulation (“inflating bureaucracy”) will inevitably become unsustainable, curbing the growth of genetic complexity. The bureaucratic ceiling hypothesis seems particularly plausible in view of the surprising lack of major gene number expansion in vertebrates, where the coupling between the gene number and genome size is obviously broken (see also below). In these organisms, the cost of replication can be ruled out as the major factor determining the upper limit, and the cost of regulation, possibly along with the cost of expression, is the most likely candidate for the role of the principal constraint.
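The scaling exponents discussed above (0, 1, and 2) are slopes on a log–log plot of the number of genes in a functional class against the total number of genes. The Python sketch below shows one straightforward way to estimate such an exponent, ordinary least squares on log-transformed counts; whether the cited studies used exactly this estimator is not stated here, and the demo counts are synthetic, not the data behind Fig. 2.4. The function name is hypothetical.

```python
import math


def scaling_exponent(total_genes, class_genes):
    """Least-squares slope of log(class size) on log(total gene number),
    i.e. an estimate of gamma in class_genes ~ C * total_genes ** gamma.
    Genomes with a zero count are skipped because log(0) is undefined."""
    pts = [(math.log(t), math.log(c))
           for t, c in zip(total_genes, class_genes) if t > 0 and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two genomes with nonzero counts")
    xs, ys = zip(*pts)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need genomes of different sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic genomes (hypothetical numbers) following exact power laws.
totals = [500, 1000, 2000, 4000, 8000]
for label, prefactor, gamma in [("translation", 60, 0.0),
                                ("metabolism", 0.3, 1.0),
                                ("regulators", 1e-5, 2.0)]:
    counts = [prefactor * t ** gamma for t in totals]
    share = counts[-1] / totals[-1]
    print(f"{label}: fitted gamma = {scaling_exponent(totals, counts):.2f}, "
          f"share of genes at {totals[-1]} genes = {share:.1%}")
```

On exact synthetic power laws, the fit simply recovers the exponents; the printed shares illustrate the “bureaucracy” argument, since a class with γ = 2 takes up a fraction of the genome that grows linearly with gene number. Real genomic counts scatter considerably, and the choice of regression method then matters.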
It is not by chance, then, that vertebrates evolved other, elaborate means of increasing proteomic complexity, such as pervasive alternative splicing and alternative transcription (Nilsen and Graveley 2010), and regulatory complexity (the expansive, still underappreciated regulatory RNome) that do not involve inflation of the number of protein-coding genes. A major process of genome evolution that in eukaryotes could be the principal path to innovation is gene duplication leading to the formation of paralogous gene families (Ohno 1970; Lespinet et al. 2002). The size distribution of paralogous families in each studied genome follows a power-law-like function that is reproduced with high precision by a simple gene birth and death model conditioned on equilibrium (constant size) in genome evolution (Karev et al. 2002; Koonin et al. 2002). This process seems to underlie a fundamental constraint on gene demography that is coupled to the constraint on the total number of genes. Beyond the sheer numbers of genes, comparative genomics yields insights into the constraints on, and plasticity of, gene repertoires. In agreement with the findings on the small and shrinking cores of conserved genes, nonorthologous gene displacement, and extensive redundancy, gene loss has emerged as a major factor of evolution in all life forms. Gene loss dominates over other processes in the evolution of parasites but is extensive in all lineages, in particular in the evolution of many animal taxa, as illustrated by the high level of orthology between vertebrates and primitive animals, such as the sea anemone and Trichoplax, in contrast to the much more limited orthologous relationships between vertebrates and arthropods or nematodes (Putnam et al. 2007; Srivastava et al. 2008). Individual genes show a broad distribution of propensities for gene loss (PGL) (Krylov et al. 2003), and

32

E.V. Koonin and Y.I. Wolf

moreover, it appears that the observed evolutionary and phenomic features of genes are compatible with a steady-state model of genome evolution under which the distribution of PGL as well as the distribution of gene loss rate remain effectively constant over extended evolutionary spans (Wolf et al. 2009). This distribution might be another important constraint governing genome evolution.
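The superlinear scaling of regulators and the power-law-like size distribution of paralogous families described above are both statements about exponents on a log-log scale. The Python sketch below illustrates them under stated assumptions; it is not the procedure used by van Nimwegen (2003) or Karev et al. (2002). `fit_loglog_slope` estimates a scaling exponent by least squares on log-transformed values. `birth_death_equilibrium` computes the equilibrium of a balanced linear birth-and-death chain of family sizes. The rate forms (duplication λ(i + a), deletion λ(i + b) for a family of size i), the parameter names a and b, and both function names are hypothetical choices for illustration. With balanced rates, the equilibrium frequency of size-i families decays roughly as i^-(1 + b - a).

```python
import math


def fit_loglog_slope(xs, ys):
    """Least-squares slope of log(y) on log(x): the exponent b in y ~ c * x**b."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    return sxy / sxx


def birth_death_equilibrium(a, b, max_size):
    """Equilibrium family-size frequencies of a balanced linear birth-and-death chain.

    Assumed (hypothetical) per-family rates for a family of size i:
    duplication lam * (i + a), deletion lam * (i + b), with the same lam.
    Zero net flux between sizes i and i + 1 at equilibrium gives
    f[i + 1] / f[i] = (i + a) / (i + 1 + b); lam cancels out.
    """
    f = [0.0, 1.0]  # index 0 unused; f[1] is an arbitrary weight, normalized below
    for i in range(1, max_size):
        f.append(f[i] * (i + a) / (i + 1 + b))
    total = sum(f)
    return [x / total for x in f]


# Usage: an exact quadratic law recovers exponent 2; the chain's tail decays
# approximately as size ** -(1 + b - a), i.e. size ** -2 for these parameters.
genes = [1000, 2000, 4000, 8000]
print(fit_loglog_slope(genes, [3e-4 * g ** 2 for g in genes]))
freq = birth_death_equilibrium(a=0.5, b=1.5, max_size=10000)
sizes = list(range(1000, 10001, 500))
print(fit_loglog_slope(sizes, [freq[s] for s in sizes]))
```

The quadratic input is an idealized power law, not genomic data. Fitting in log space is what makes an exponent greater than one ("superlinear") directly visible as a slope greater than one.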

2.5

The Causes of Evolution of Protein-Coding Genes

[Fig. 2.5 plot: distributions of relative evolution rate (log scale, 0.01–10) for Burkholderia, Salinispora, Methanococcus, Homo, Aspergillus, and the model]

Fig. 2.5 The universal distribution of evolutionary rates across orthologous gene sets. The evolutionary rates for five pairs of closely related organisms from different branches of life were calculated as nucleotide distances for the complete sets of orthologous genes (Wolf et al. 2009). The relative evolution rate for each gene was obtained by dividing its evolution rate by the median rate for the respective pair of organisms. “Model” refers to estimated transition rates in 134 mutationally connected networks for simulated robustly folding 18-mer protein-like molecules (Lobkovsky et al. 2010). Original model rates were normalized by their median value and scaled to a standard deviation of 0.25 to match the width of the distributions derived from biological data.

2

Constraints, Plasticity, and Universal Patterns in Genome and Phenome Evolution

Protein-coding genes, at least the nonsynonymous positions that determine amino acid identity, are among the most strongly constrained sequences in all genomes. However, the distribution of the rates of evolution among orthologous genes in any pair of compared genomes spans 3–4 orders of magnitude and is much broader than the distribution of the rates for synonymous sites (Fig. 2.1). Remarkably, the shapes of the rate distributions for orthologous proteins are highly similar for all studied cellular life forms, from bacteria to archaea to mammals (Wolf et al. 2009) (Fig. 2.5). Another universal of genomic and phenomic evolution is the anticorrelation between the rate of evolution of a protein-coding gene and its expression level: highly expressed genes evolve slowly, a dependence that has been observed in every model organism for which expression data are available (Pal et al. 2001, 2006; Krylov et al. 2003; Drummond and Wilke 2008). Given the aforementioned positive correlation between Ka and Ks, it is not surprising that both rates show the same dependence; more unexpectedly, this anticorrelation with the evolutionary rate was also detected for 3′ UTRs but not for 5′ UTRs (Jordan et al. 2004). The existence of these universals of genomic evolution and their fundamental link with phenomic characteristics suggest that the primary causes of protein evolution could have more to do with fundamental principles of protein folding than with unique biological functions. It has been proposed that the principal selective factor underlying the evolution of proteins is robustness to misfolding, owing to the deleterious effect of misfolded proteins, which, in addition to wasting energy, can be toxic to the cell (Drummond et al. 2005; Drummond and Wilke 2008, 2009). Moreover, under this model, evolution of synonymous sites is constrained, at least in part, by the same factors as the evolution of proteins, owing to the pressure for the preferential use of optimal codons in highly expressed proteins and at specific sites that are important for protein folding (Drummond and Wilke 2008; Zhou et al. 2009). Evolution of the 3′ UTRs could follow the same trend (Jordan et al. 2004), as these regions are involved in the regulation of translation. A recent modeling study of misfolding-dominated protein evolution (Lobkovsky et al. 2010) employed a simple off-lattice model of protein folding and estimated evolutionary rates under the assumption that protein misfolding was the only source of fitness cost. It reproduced the universal distribution of protein evolutionary rates, as well as the dependence between evolutionary rate and expression, with considerable accuracy (Fig. 2.5). 
These findings suggest that the universal rate distribution might indeed be a consequence of the fundamental physics of proteins. They also provide for a general model of protein evolution under which the evolution of a given protein is determined primarily by its intrinsic robustness to misfolding, which also determines the attainable level of translation (Fig. 2.6) (Wolf et al. 2010). In general, the robustness of a protein to misfolding, and accordingly the rate of evolution, are determined by the size of the (nearly) neutral network, that is, the network of sequences that have approximately the same robustness, and accordingly the same fitness, as the original sequence (Wagner 2008). Under the model (Wolf et al. 2010), the nearly neutral network size is (roughly) inversely proportional to the robustness of the original sequence. In the fitness landscape, robust, highly expressed proteins occupy tall, steep peaks with small areas of high fitness, hence slow evolution; in contrast, proteins with lower robustness occupy lower and wider peaks with larger areas of high fitness, allowing faster evolution (Fig. 2.6). The original hypothesis on misfolding-dominated evolution of protein-coding genes held that misfolding was largely induced by mistranslation of the coding sequence (Drummond and Wilke 2008, 2009). The latest analysis of the relative contributions of structural–functional constraints and translation rate to protein evolution implies that stochastic misfolding of the native sequence could be even more common and consequential than mistranslation-induced misfolding (Wolf et al. 2010). Nevertheless, mistranslation (somatic mutation), which is relatively frequent [10⁻⁴–10⁻⁵ per codon (Kramer and Farabaugh 2007)], is likely to be an important factor affecting the instantaneous shape of the robustness landscape by temporarily expanding the nearly neutral network (Fig. 2.6).
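The Fig. 2.5 caption describes two simple transformations: dividing each gene's rate by the median rate for its organism pair, and rescaling the model rates to a fixed spread. The Python sketch below reproduces both on made-up numbers. It assumes that "scaled to standard deviation of 0.25" refers to log10-transformed relative rates, which the caption does not state explicitly. The function names are hypothetical.

```python
import math
import statistics


def relative_rates(rates):
    """Divide each gene's evolution rate by the median rate for its organism pair."""
    med = statistics.median(rates)
    return [r / med for r in rates]


def log10_sd(values):
    """Population standard deviation of log10-transformed values."""
    return statistics.pstdev([math.log10(v) for v in values])


def rescale_log_spread(rel_rates, target_sd):
    """Stretch log10 relative rates about zero (the median) so their SD equals target_sd.

    Assumption: the caption's rescaling is read here as acting on the log10 scale.
    """
    logs = [math.log10(r) for r in rel_rates]
    factor = target_sd / statistics.pstdev(logs)
    return [10 ** (x * factor) for x in logs]


# Usage with made-up rates (arbitrary units), not data from Fig. 2.5.
rel = relative_rates([0.02, 0.1, 0.4, 1.5, 6.0])
scaled = rescale_log_spread(rel, 0.25)
print(rel, log10_sd(scaled))
```

Median normalization puts every genome pair on a common axis centered at 1, so only the shapes of the distributions are compared. That common shape is what the caption calls universal.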

[Fig. 2.6 schematic: folding robustness/fitness (low to high) over sequence space for protein family X and protein family Y, each shown at high expression (lower evolution rate) and at low expression (higher evolution rate)]

Fig. 2.6 A conceptual model of misfolding-driven protein evolution. The cartoon schematically shows the robustness/fitness landscapes for two protein families at high and low expression levels. The high fitness/robustness area (green) reflects the size of the nearly neutral network in the sequence space

The view of protein evolution under which the primary constraints have to do more with the maintenance of the native folding as well as intermolecular interactions than with unique protein functions seems to be compatible with the recent large-scale analysis of protein family evolution (Worth et al. 2009).
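To give a sense of scale for the mistranslation rate of 10⁻⁴–10⁻⁵ per codon quoted earlier in this section, the Python sketch below computes the fraction of protein molecules that carry at least one mistranslated residue. It assumes errors occur independently at a constant per-codon rate. The 300-codon length is a hypothetical example, not a figure from the text.

```python
def fraction_with_error(length_codons, error_per_codon):
    """Probability that a protein carries at least one mistranslated residue.

    Assumes errors occur independently at a constant per-codon rate.
    """
    return 1.0 - (1.0 - error_per_codon) ** length_codons


# A hypothetical 300-codon protein at the two ends of the quoted range.
for rate in (1e-4, 1e-5):
    print(f"error rate {rate:g}: {fraction_with_error(300, rate):.4f} of molecules affected")
```

At 10⁻⁴ per codon, about 3% of such molecules carry at least one substitution; at 10⁻⁵, about 0.3%. Multiplied by the copy number of a highly expressed protein, either figure is a nontrivial number of aberrant molecules per cell.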

2.6

Constraints on Molecular Phenotypes

The advances of systems biology enable direct evolutionary study of molecular phenomic variables, such as gene expression, protein abundance, and the architecture of interaction networks. In other words, it is now possible to assess evolutionary variance and constraints by directly comparing gene expression profiles and networks, protein abundances, and other features of the molecular phenotype between different organisms and evolutionary lineages.


Molecular phenomic variables, such as gene expression level and number of interaction partners of a protein, show a distinct structure of dependences among themselves and with evolutionary variables such as sequence evolution rate and the rate of gene loss (Wolf et al. 2006). The correlations between phenomic variables are typically positive; that is, highly expressed proteins also tend to interact with many other proteins, to have many paralogs, etc. In contrast, the correlations between phenomic and evolutionary variables are negative: for instance, highly expressed genes on average evolve more slowly than those expressed at a low level. Thus, as exemplified by the model of protein evolution discussed above, constraints on the ranges of phenomic variables appear, in part, to constrain the evolution of gene sequences, gene repertoires, and genome architectures. Several studies suggested that gene expression in animals is not strongly constrained during evolution, or at least has a major neutral component (Jordan et al. 2004; Khaitovich et al. 2004). However, subsequent analyses revealed clear signatures of selective constraints that affect gene expression (Denver et al. 2005; Jordan et al. 2005; Gilad et al. 2006). Recently, it has been shown that the abundances of orthologous proteins are strongly correlated even among distantly related animals. A correlation coefficient greater than 0.8 was observed for approximately 3,000 orthologous genes from the nematode C. elegans and the fly D. melanogaster. This value contrasts sharply with the correlation coefficients in the range of 0.2–0.4 that are typically seen in comparisons of genomic and molecular phenomic variables (Wolf et al. 2006). 
Strikingly, the correlation between protein abundances was found to be substantially greater than the correlation between mRNA expression levels, or between the rates of coding sequence evolution (measured by comparison of orthologous genes from pairs of closely related species), within the same set of genes (Schrimpf et al. 2009; Wolf et al. 2010). Thus, assuming there are no unrecognized biases in the measurements, protein abundance appears to be constrained during evolution to a substantially greater extent than gene expression, and even more strongly than the coding sequence itself. The global architectures of protein interaction and gene coexpression networks appear to be universal across all life forms, with the characteristic power law distribution of the network node degree (number of connections) (Barabasi and Oltvai 2004). Local network structures seem to be much less strongly constrained and differ even among closely related organisms (Bergmann et al. 2004; Tsaparas et al. 2006). However, a different picture emerges from a comparison of gene coexpression networks from the so-called mutation accumulation lineages of C. elegans, in which selective constraints are effectively removed (Denver et al. 2005), with those of the natural isolate. This comparison suggests that it is the local wiring of the coexpression network that is constrained by selection, whereas the global properties are not affected by the removal of constraints (Jordan et al. 2008). Thus, the similar global network properties seen in widely different organisms might reflect “neutral” rather than selective constraints. That is, they could have evolved via simple, stochastic, nonselective processes, as exemplified by birth-and-death models of genome and network evolution (Koonin et al. 2002; Lynch 2007a).
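One stochastic, nonselective process that yields a heavy-tailed (power-law-like) node degree distribution is network growth by preferential attachment. Table 2.1 lists this process for interaction and coexpression networks. The Python sketch below is a generic Barabási–Albert-style growth model, a standard construction assumed here for illustration. It is not the specific model analyzed in the studies cited above, and the function names are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m, seed=0):
    """Grow an undirected network in which each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node is listed once per incident edge, so a uniform draw from this
    # list picks a node with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degree_counts(edges):
    """Number of connections (node degree) for every node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = preferential_attachment(n_nodes=2000, m=2)
deg = degree_counts(edges)
mean_degree = 2 * len(edges) / len(deg)
print(f"mean degree {mean_degree:.2f}, max degree {max(deg.values())}")
```

No selection acts anywhere in this process. Nevertheless, a few early nodes accumulate many times the mean number of connections. This illustrates how a universal global property can arise "neutrally", as the paragraph above suggests.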

2.7

Constraints on Evolutionary Trajectories: What Happens When the Tape of Evolution Is Rewound?

An intriguing, deep question in evolutionary biology is how constrained the course of evolution itself is, or, in other words, to what extent the evolutionary process is free to explore different trajectories between given initial and end states (Kassen 2009). In theory, mutational trajectories in sequence space are considered to be fundamentally stochastic (Mani and Clarke 1990). However, experimental evolution studies indicate that paths of adaptive evolution are substantially constrained by interactions between mutations (epistasis and pleiotropy), although not to the point of becoming deterministic. A series of experiments on the evolution of bacterial antibiotic resistance resulting from five point mutations in the β-lactamase gene examined the 120 (5!) possible orders in which the mutations could accumulate. Of these trajectories, 102 were inaccessible to evolution, and several of the remaining 18 had a negligible probability of realization (Weinreich et al. 2006). Even stronger constraints were identified in a subsequent study that explored a more complex fitness landscape by simultaneously evolving resistance to two antibiotics (Novais et al. 2010). The remarkable long-term study of bacterial evolution under controlled conditions by Lenski and coworkers provides examples of both the parallel emergence of the same mutations under a particular selective pressure and the realization of multiple trajectories (Barrick et al. 2009; Barrick and Lenski 2009; Kassen 2009; Stanek et al. 2009). For instance, it has been explicitly shown that the evolution of the same, extremely rare phenotype, the ability to grow on citrate, proceeded along distinct trajectories in different Escherichia coli populations (Blount et al. 2008). 
Direct studies of evolutionary trajectories in the sequence space are still very limited but they have already made it clear that, although historical contingency is crucial in the evolutionary process (Jacob 1977), the exploration of the sequence space is strongly constrained so that only a minority of theoretically possible trajectories are accessible. The extent of these constraints depends on the shape of the fitness landscape: the more rugged the landscape, the stronger the constraints. The shape of the landscape itself depends on the nature, strength, and interactions of the relevant selective factors and evolves with time, which makes it more of a seascape (Mustonen and Lassig 2009, 2010).
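The trajectory counting described above can be made concrete. With five mutations there are 5! = 120 orders in which they can accumulate. A trajectory is treated as accessible here if fitness increases at every step, which is one common definition and is assumed for this sketch. The Python code below enumerates all orders on two hypothetical toy landscapes: a smooth additive one, and a rugged one with sign epistasis. Neither uses the measured β-lactamase resistance values, and all function names are hypothetical.

```python
from itertools import permutations


def accessible_paths(n_sites, fitness):
    """Count mutational orders (n_sites! in total) along which fitness rises at every step.

    A genotype is the frozenset of sites mutated so far; `fitness` maps genotypes to numbers.
    """
    total = accessible = 0
    for order in permutations(range(n_sites)):
        total += 1
        current = frozenset()
        ok = True
        for site in order:
            nxt = current | {site}
            if fitness(nxt) <= fitness(current):
                ok = False  # a downhill or flat step blocks this trajectory
                break
            current = nxt
        accessible += ok
    return total, accessible


def additive(genotype):
    """Smooth toy landscape: every mutation adds the same benefit."""
    return len(genotype)


def sign_epistatic(genotype):
    """Rugged toy landscape: mutation 0 is deleterious unless mutation 1 is already present."""
    score = len(genotype)
    if 0 in genotype and 1 not in genotype:
        score -= 2
    return score


print(accessible_paths(5, additive))
print(accessible_paths(5, sign_epistatic))
```

A single sign-epistatic interaction closes every trajectory in which mutation 0 precedes mutation 1, removing half of all orders. This mirrors, in miniature, the point that the more rugged the landscape, the fewer trajectories remain open.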

2.8

Robustness, Plasticity, and Evolutionary Constraints

The aspects of evolution that are orthogonal to constraints are the plasticity of genomic and phenomic characteristics and the robustness of molecular phenotypes (Wagner 2005). In many groups of organisms, large-scale genome organization seems to be only weakly constrained, so that gene order differs substantially even between closely related organisms, especially among prokaryotes (Koonin 2009a; Novichkov et al. 2009b) (Fig. 2.6). The gene repertoire of many organisms, especially prokaryotes, shows plasticity that may even exceed the plasticity of genome architecture. This is dramatically illustrated by rapid genome reduction in parasitic bacteria (Darby et al. 2007) and by the acquisition of pathogenicity islands that may comprise over 30% of the recipient genome in bacterial pathogens (Dobrindt et al. 2003). The plasticity of genome organization and composition is paralleled by the evolutionary flexibility of regulatory networks and complements the more strongly constrained evolution of individual genes (Lozada-Chavez et al. 2006; Kazakov et al. 2009). Evolutionary plasticity and the strength of evolutionary constraints are tightly linked to the robustness of biological systems, that is, the resistance of phenotypes to genetic perturbation (mutations, recombination, etc.). Robustness seems to be an evolved property, as demonstrated by studies of specialized buffering mechanisms, for instance, those mediated by molecular chaperones of the HSP90 family. Impairment of these mechanisms (often by environmental stress) reveals hidden genetic variation and accordingly enhances the evolutionary potential of the organism (Queitsch et al. 2002; Wagner 2008; Masel and Siegal 2009). Recently, the concept of variation stabilization has been extended to include numerous genes that are not molecular chaperones but possess extremely diverse functions. It seems that stabilization is a general property of interaction networks, so that disruption of almost any highly connected node reduces the robustness of the system and leads to increased variation (Bergman and Siegal 2003). A comprehensive study of such “capacitor” properties of yeast mutants revealed approximately 300 genes (about 6% of the total) whose disruption significantly decreased the robustness of yeast to environmental perturbations (Levy and Siegal 2008). 
Thus, robustness might be a major, selectable mechanism that counteracts evolutionary constraints, in particular, those caused by the interaction between mutations, and enhances plasticity.

2.9

Effective Population Size as the General Determinant of Evolutionary Constraints and Distinction Between Constraints and Neutral Conservation

The classic population genetics theory asserts that the effectiveness of purifying selection is proportional to the effective population size of the given organism (assuming a uniform mutation rate for simplicity). In other words, during evolution, selection can efficiently fix (if beneficial) or eliminate (if deleterious) only those mutational changes for which |s| > 1/Ne, where s is the selection coefficient and Ne is the effective population size (Lynch 2007c). Conversely, mutations with |s| < 1/Ne are effectively “invisible” to selection. This simple dependence seems to be an important, possibly the primary, determinant of the constraints that affect different aspects of genome and phenome evolution. In particular, differences in Ne seem to underlie the qualitative difference in the genome architectures of unicellular and multicellular organisms described above (Lynch and Conery 2003; Lynch 2007b). Substantial genome expansion seems to be attainable only in organisms with small populations and the attendant weak selection, such as plants and animals. In these organisms, the deleterious effect of propagating nonfunctional sequences is often too small to allow their “detection” and elimination by purifying selection. Accordingly, evolutionary conservation does not automatically imply that the conserved feature is constrained by purifying selection. Rather, somewhat paradoxically, conservation can reflect purifying selection that is too weak to eliminate nonadaptive ancestral features. Evolution of the exon–intron gene structure in eukaryotes provides an excellent case in point for this population-genetic paradigm. Most introns do not appear to possess a distinct function but do require distinct splicing signals for accurate transcript maturation. Thus, approximately 25 nucleotides per intron are subject to purifying selection of varying strength (Lynch 2006a). Because of the associated cost of selection, and also owing to the expenditure of time and energy on replication and transcription of intronic sequences, functionless introns are weakly deleterious for the respective organisms. However, a simple estimate taking into account the characteristic mutation rates in eukaryotes shows that the deleterious effect of introns is “visible” to purifying selection only in relatively large populations, with Ne on the order of 10⁷ or greater. This is the characteristic range of effective population sizes of unicellular eukaryotes, whereas multicellular eukaryotes typically have smaller populations (Lynch and Conery 2003; Lynch 2006a, 2007c). The effect of these differences on the evolution of genome architecture in eukaryotes is dramatic. 
Genomes of unicellular forms typically contain less than one intron per gene and, in many cases, only a few introns in the entire genome. Plant and animal genomes, by contrast, possess numerous introns, up to 8 per gene in vertebrates (Roy and Gilbert 2006). The positions of many introns are conserved in orthologous genes of animals and plants (see above), that is, most likely since the time of the last common ancestor of the extant eukaryotes. However, there seems to be no reason to claim that, in general, the positions of introns are constrained during evolution. The conservation of intron positions appears to be due to weak purifying selection that precludes efficient elimination of introns in organisms with small characteristic values of Ne. Beyond the sheer number of introns, the features of introns themselves differ drastically. All the introns in intron-poor genomes of unicellular eukaryotes are short, with tightly controlled lengths and highly conserved, optimized splice signals at exon–intron junctions (Irimia et al. 2007; Irimia and Roy 2008). By contrast, introns in intron-rich genomes, such as those of plants and animals, are often long (especially in vertebrates). They are also bounded by relatively weak, suboptimal splice signals owing to the relatively weak selection favoring strong splicing signals (Irimia et al. 2009). These long introns with weak splice signals yield relatively inaccurate splicing, which provides for the evolution of alternative splicing and nested gene structures, the crucial factors of structural and regulatory diversification of proteins and RNAs in multicellular eukaryotes.
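The |s| versus 1/Ne threshold and the intron estimate above can be illustrated numerically in Python. The sketch below swaps in Kimura's standard diffusion approximation for the fixation probability of a single new mutation. It assumes a diploid population whose census size equals Ne and reports the result relative to the neutral value 1/(2Ne). The intron part treats an intron's cost as s ≈ n·μ, in the spirit of the mutational-hazard argument; this is an assumption, not a formula given in this chapter. Here n = 25 is the number of constrained nucleotides mentioned in the text, and μ = 4 × 10⁻⁹ per nucleotide per generation is an illustrative value chosen to make the arithmetic easy, not a measured rate. Both function names are hypothetical.

```python
import math


def relative_fixation_probability(s, ne):
    """Kimura's diffusion approximation for a single new mutation in a diploid
    population (census size taken equal to Ne), divided by the neutral value 1/(2*Ne)."""
    p = 1.0 / (2.0 * ne)  # initial frequency of one new mutant copy
    if s == 0:
        return 1.0
    # u = (1 - exp(-4*Ne*s*p)) / (1 - exp(-4*Ne*s)); expm1 keeps small |s| accurate
    u = math.expm1(-4.0 * ne * s * p) / math.expm1(-4.0 * ne * s)
    return u / p


def intron_visibility_threshold(constrained_sites, mu):
    """Ne above which an intron's assumed cost s ~ constrained_sites * mu exceeds 1/Ne."""
    return 1.0 / (constrained_sites * mu)


for s in (1e-9, -1e-3, 1e-3):
    print(s, relative_fixation_probability(s, ne=1e4))
print(f"{intron_visibility_threshold(25, 4e-9):.2e}")
```

With Ne = 10⁴, a mutation with |s| = 10⁻⁹ fixes at essentially the neutral rate. In contrast, |s| = 10⁻³ (Ne·|s| = 10) is almost always purged if deleterious and fixes about 40 times more often than a neutral mutation if beneficial. Under the stated assumptions, the intron cost of about 10⁻⁷ becomes visible only when Ne exceeds about 10⁷.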


The case of intron evolution illustrates the crucial interplay of constraints and plasticity that is central to the evolution of genomes and molecular phenomes (Fig. 2.7). Effective population size determines the background strength of purifying selection (constraints). When Ne is small, as in multicellular eukaryotes, constraints are relatively weak and plasticity is enhanced, so that nonfunctional genomic elements such as introns can be retained. The result is a system that is relatively inefficient and vulnerable to random factors that can cause extinction, but that also possesses a high potential for evolutionary innovation. Conversely, when Ne is large, as in most prokaryotes, many aspects of evolution are strongly constrained. Even so, there is still much plasticity in the evolution of these organisms thanks to dynamic, effectively neutral processes, in particular HGT. Despite its fundamental importance, Ne determines the course of evolution only on a coarse-grained scale. Thus, a comparative analysis of Ka/Ks values among prokaryotic lineages failed to detect the negative correlation between selective constraints and genome size implied by the straightforward population genetic perspective (Lynch 2006b). On the contrary, larger genomes tend to evolve under stronger constraints, even when only free-living microbes are analyzed. This suggests that lifestyle could be a critical determinant of genome evolution, independent of Ne, favoring, in particular, gene acquisition via HGT in variable environments (Jordan et al. 2002; Novichkov et al. 2009b).

[Fig. 2.7 schematic: genomic and phenomic features placed by strength of constraints (strong to weak) and plasticity (low to high) across three levels of organization (molecular structure and dynamics; molecular phenomics; genome architecture). Features shown: functional and folding-critical sites; intron donor and acceptor sites; typical protein sites; typical regulatory sites; protein function; operons and gene clusters; protein abundance; synonymous sites in CDS; gene islands and superoperons; mRNA abundance; gene neighborhoods; disordered segments; introns; functional and regulatory networks; “junk” genome; local genome context; genome-scale gene order]

Fig. 2.7 Genomic and phenomic constraints operative at different levels of biological organization. The scales are rough approximations

2.10

Conclusions: Selective and Neutral Constraints and Evolutionary Universals

The prevailing theme that emerges from the recent advances of evolutionary genomics and evolutionary systems biology is the plurality of constraints that affect the evolution of different types of sequences in any genome, genome architectures, and molecular phenomes (Fig. 2.7), along with major differences in evolutionary regimes between taxa. Nevertheless, beyond this diversity, comparative-genomic and molecular phenomic analyses reveal universal patterns that, at least in some cases, are compatible with relatively simple and general models of evolution. As discussed here, such models begin to suggest simple, fundamental causes underlying important aspects of evolution, such as the constraints on the evolution of proteins and the evolution of gene repertoires (Table 2.1). In this context, it seems appropriate to expand the notion of constraints to include not only selective but also “neutral” constraints. The latter are determined by nonselective, stochastic properties of biological systems and are often amenable to modeling using techniques borrowed from statistical physics (Table 2.1) (Frank 2009; Koonin 2009b). Evolutionary trajectories in the sequence space seem to be strongly constrained, thus substantially limiting the “tinkering potential” of evolution, to use the famous metaphor of Jacob (Jacob 1977). 
The evolutionary process thus appears to be a compromise “between design and bricolage” (Wilkins 2007). The design aspect is brought about by constraints (certainly having nothing to do with any intelligence), and the bricolage stems from the evolved robustness and the ensuing plasticity of evolving organisms. Comparative genomics and systems approaches transform evolutionary biology into a much more complex, but also more precise and quantitative, field than it was in the twentieth century. Next generation sequencing, quantitative proteomics, and other systemic approaches, combined with more specific approaches of experimental evolution, can be expected to reveal the specific, precise constraints affecting diverse aspects of genome and phenome evolution.

Table 2.1 Universals of genome and molecular phenome evolution

| Universal pattern | Putative underlying process/model | Nature of relevant constraints | References |
|---|---|---|---|
| Approximately log-normal distribution of evolutionary rates of protein-coding genes | Protein folding | Selective: protein robustness to misfolding | Wolf et al. 2009; Lobkovsky et al. 2010 |
| Anticorrelation between evolution rate and expression level (translation rate) of protein-coding genes | Protein folding | Selective: protein robustness to misfolding dependent on translation rate | Drummond and Wilke 2008, 2009; Wolf et al. 2010 |
| Distinct scaling laws for different functional classes of genes | “Toolbox”-like growth of metabolic networks | Neutral | van Nimwegen 2003; Maslov et al. 2009; Molina and van Nimwegen 2009 |
| Power-law-like distribution of paralogous gene family size | Birth and death process of gene evolution | Neutral | Karev et al. 2002; Koonin et al. 2002 |
| Power-law-like distribution of node degree in interaction and coexpression networks | Network evolution by preferential attachment | Neutral | Barabasi and Oltvai 2004; Tsaparas et al. 2006 |

References Altenhoff AM, Dessimoz C (2009) Phylogenetic and functional assessment of orthologs inference projects and methods. PLoS Comput Biol 5:e1000262 Aravind L, Koonin EV (1998) Phosphoesterase domains associated with DNA polymerases of diverse origins. Nucleic Acids Res 26:3746–3752 Baira E, Greshock J, Coukos G, Zhang L (2008) Ultraconserved elements: genomics, function and disease. RNA Biol 5:132–134 Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5:101–113 Barrick JE, Lenski RE (2009) Genome-wide mutational diversity in an evolving population of Escherichia coli. Cold Spring Harb Symp Quant Biol 16:345–355 Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF (2009) Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1247 Basu MK, Carmel L, Rogozin IB, Koonin EV (2008) Evolution of protein domain promiscuity in eukaryotes. Genome Res 18:449–461 Basu MK, Poliakov E, Rogozin IB (2009) Domain mobility in proteins: functional and evolutionary implications. Brief Bioinform 10:205–216 Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D (2004) Ultraconserved elements in the human genome. Science 304:1321–1325 Bergman A, Siegal ML (2003) Evolutionary capacitance as a general feature of complex gene networks. Nature 424:549–552 Bergmann S, Ihmels J, Barkai N (2004) Similarities and differences in genome-wide expression data of six organisms. PLoS Biol 2:E9 Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, Gerstein M, Snyder M (2004) Global identification of human transcribed sequences with genome tiling arrays. Science 306:2242–2246 Blount ZD, Borland CZ, Lenski RE (2008) Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. 
Proc Natl Acad Sci USA 105:7899–7906 Blumenthal T (2004) Operons in eukaryotes. Brief Funct Genomic Proteomic 3:199–211 Bowen NJ, Jordan IK (2007) Exaptation of protein coding sequences from transposable elements. Genome Dyn 3:147–162 Carmel L, Rogozin IB, Wolf YI, Koonin EV (2007) Patterns of intron gain and conservation in eukaryotic genes. BMC Evol Biol 7:192 Carthew RW, Sontheimer EJ (2009) Origins and mechanisms of miRNAs and siRNAs. Cell 136:642–655 Charlebois RL, Doolittle WF (2004) Computing prokaryotic gene ubiquity: rescuing the core from extinction. Genome Res 14:2469–2477


Charlesworth J, Eyre-Walker A (2008) The McDonald–Kreitman test and slightly deleterious mutations. Mol Biol Evol 25:1007–1015 Costa FF (2005) Non-coding RNAs: new players in eukaryotic biology. Gene 357:83–94 Dandekar T, Snel B, Huynen M, Bork P (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 23:324–328 Darby AC, Cho NH, Fuxelius HH, Westberg J, Andersson SG (2007) Intracellular pathogens go extreme: genome evolution in the Rickettsiales. Trends Genet 23:511–520 Denver DR, Morris K, Streelman JT, Kim SK, Lynch M, Thomas WK (2005) The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans. Nat Genet 37:544–548 Dermitzakis ET, Reymond A, Antonarakis SE (2005) Conserved non-genic sequences – an unexpected feature of mammalian genomes. Nat Rev Genet 6:151–157 Dobrindt U, Agerer F, Michaelis K, Janka A, Buchrieser C, Samuelson M, Svanborg C, Gottschalk G, Karch H, Hacker J (2003) Analysis of genome plasticity in pathogenic and commensal Escherichia coli isolates by use of DNA arrays. J Bacteriol 185:1831–1840 Doolittle WF, Sapienza C (1980) Selfish genes, the phenotype paradigm and genome evolution. Nature 284:601–603 Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, Hirschhorn JN (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet 38:223–227 Drummond DA, Wilke CO (2008) Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134:341–352 Drummond DA, Wilke CO (2009) The evolutionary consequences of erroneous protein synthesis. Nat Rev Genet 10:715–724 Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH (2005) Why highly expressed proteins evolve slowly. 
Proc Natl Acad Sci USA 102:14338–14343 Duret L, Dorkeld F, Gautier C (1993) Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression. Nucleic Acids Res 21:2315–2322 Eisen JA, Heidelberg JF, White O, Salzberg SL (2000) Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol 1(6):RESEARCH0011 Elgar G (2009) Pan-vertebrate conserved non-coding sequences associated with developmental regulation. Brief Funct Genomic Proteomic 8:256–265 Ellegren H (2008) Comparative genomics and the study of evolution by natural selection. Mol Ecol 17:4586–4596 Ellegren H, Smith NG, Webster MT (2003) Mutation rate variation in the mammalian genome. Curr Opin Genet Dev 13:562–568 Eyre-Walker A, Keightley PD (2009) Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol 26:2097–2108 Fedorov A, Merican AF, Gilbert W (2002) Large-scale comparison of intron positions among animal, plant, and fungal genes. Proc Natl Acad Sci USA 99:16128–16133 Frank SA (2009) The common patterns of nature. J Evol Biol 22:1563–1585 Gilad Y, Oshlack A, Rifkin SA (2006) Natural selection on gene expression. Trends Genet 22:456–461 Grishin NV, Wolf YI, Koonin EV (2000) From complete genomes to measures of substitution rate variability within and between proteins. Genome Res 10:991–1000 Harrison PM, Gerstein M (2002) Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol 318:1155–1174 Hurst LD (2002) The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet 18:486 Hurst LD, Pal C, Lercher MJ (2004) The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet 5:299–310

2

Constraints, Plasticity, and Universal Patterns in Genome and Phenome Evolution

Irimia M, Roy SW (2008) Evolutionary convergence on highly-conserved 3’ intron structures in intron-poor eukaryotes and insights into the ancestral eukaryotic genome. PLoS Genet 4: e1000148 Irimia M, Penny D, Roy SW (2007) Coevolution of genomic intron number and splice sites. Trends Genet 23:321–325 Irimia M, Roy SW, Neafsey DE, Abril JF, Garcia-Fernandez J, Koonin EV (2009) Complex selection on 5’ splice sites in intron-rich organisms. Genome Res 19:2021–2027 Jacob F (1977) Evolution and tinkering. Science 196:1161–1166 Johnson JM, Edwards S, Shoemaker D, Schadt EE (2005) Dark matter in the genome: evidence of widespread transcription detected by microarray tiling experiments. Trends Genet 21:93–102 Jordan IK, Rogozin IB, Wolf YI, Koonin EV (2002) Microevolutionary genomics of bacteria. Theor Popul Biol 61:435–447 Jordan IK, Rogozin IB, Glazko GV, Koonin EV (2003) Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet 19:68–72 Jordan IK, Marino-Ramirez L, Wolf YI, Koonin EV (2004) Conservation and coevolution in the scale-free human gene coexpression network. Mol Biol Evol 21:2058–2070 Jordan IK, Marino-Ramirez L, Koonin EV (2005) Evolutionary significance of gene expression divergence. Gene 345:119–126 Jordan IK, Katz LS, Denver DR, Streelman JT (2008) Natural selection governs local, but not global, evolutionary gene coexpression networks in Caenorhabditis elegans. BMC Syst Biol 2:96 Karev GP, Wolf YI, Rzhetsky AY, Berezovskaya FS, Koonin EV (2002) Birth and death of protein domains: a simple model of evolution explains power law behavior. BMC Evol Biol 2:18 Kassen R (2009) Toward a general theory of adaptive radiation: insights from microbial experimental evolution. Ann N Y Acad Sci 1168:3–22 Katzman S, Kern AD, Bejerano G, Fewell G, Fulton L, Wilson RK, Salama SR, Haussler D (2007) Human genome ultraconserved elements are ultraselected. 
Science 317:915 Kazakov AE, Rodionov DA, Alm E, Arkin AP, Dubchak I, Gelfand MS (2009) Comparative genomics of regulation of fatty acid and branched-chain amino acid utilization in proteobacteria. J Bacteriol 191:52–64 Kelly C, Churchill GA (1996) Biases in amino acid replacement matrices and alignment scores due to rate heterogeneity. J Comput Biol 3:307–318 Khachane AN, Harrison PM (2009) Assessing the genomic evidence for conserved transcribed pseudogenes under selection. BMC Genomics 10:435 Khaitovich P, Weiss G, Lachmann M, Hellmann I, Enard W, Muetzel B, Wirkner U, Ansorge W, Paabo S (2004) A neutral model of transcriptome evolution. PLoS Biol 2:E132 Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge Kitano H (2002) Computational systems biology. Nature 420:206–210 Koonin EV (2003) Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat Rev Microbiol 1:127–136 Koonin EV (2005) Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 39:309–338 Koonin EV (2009a) Evolution of genome architecture. Int J Biochem Cell Biol 41:298–306 Koonin EV (2009b) Darwinian evolution in the light of genomics. Nucleic Acids Res 37:1011–1034 Koonin EV, Wolf YI (2006) Evolutionary systems biology: links between gene evolution and function. Curr Opin Biotechnol 17:481–487 Koonin EV, Wolf YI (2008) Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res 36(21):6688–6719 Koonin EV, Mushegian AR, Bork P (1996) Non-orthologous gene displacement. Trends Genet 12:334–336 Koonin EV, Aravind L, Kondrashov AS (2000) The impact of comparative genomics on our understanding of evolution. Cell 101:573–576

E.V. Koonin and Y.I. Wolf

Koonin EV, Wolf YI, Karev GP (2002) The structure of the protein universe and genome evolution. Nature 420:218–223 Kramer EB, Farabaugh PJ (2007) The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA 13:87–96 Krylov DM, Wolf YI, Rogozin IB, Koonin EV (2003) Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res 13:2229–2235 Lawrence J (1999) Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev 9:642–648 Lawrence JG, Roth JR (1996) Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143:1843–1860 Lemons D, McGinnis W (2006) Genomic evolution of Hox gene clusters. Science 313:1918–1922 Lespinet O, Wolf YI, Koonin EV, Aravind L (2002) The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res 12:1048–1059 Levy SF, Siegal ML (2008) Network hubs buffer environmental variation in Saccharomyces cerevisiae. PLoS Biol 6:e264 Ling X, He X, Xin D (2009) Detecting gene clusters under evolutionary constraint in a large number of genomes. Bioinformatics 25:571–577 Lobkovsky AE, Wolf YI, Koonin EV (2010) Universal distribution of protein evolution rates as a consequence of protein folding physics. Proc Natl Acad Sci USA 107(7):2983–2988, doi: 10.1073/pnas.0910445107 Loewe L (2009) A framework for evolutionary systems biology. BMC Syst Biol 3:27 Lozada-Chavez I, Janga SC, Collado-Vides J (2006) Bacterial regulatory networks are extremely flexible in evolution. Nucleic Acids Res 34:3434–3445 Lunter G, Ponting CP, Hein J (2006) Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput Biol 2:e5 Lynch M (2006a) The origins of eukaryotic gene structure. Mol Biol Evol 23:450–468 Lynch M (2006b) Streamlining and simplification of microbial genome architecture. 
Annu Rev Microbiol 60:327–349 Lynch M (2007a) The evolution of genetic networks by non-adaptive processes. Nat Rev Genet 8:803–813 Lynch M (2007b) The frailty of adaptive hypotheses for the origins of organismal complexity. Proc Natl Acad Sci USA 104(Suppl 1):8597–8604 Lynch M (2007c) The origins of genome architecture. Sinauer Associates, Sunderland, MA Lynch M, Conery JS (2003) The origins of genome complexity. Science 302:1401–1404 Makalowski W, Boguski MS (1998) Synonymous and nonsynonymous substitution distances are correlated in mouse and rat genes. J Mol Evol 47:119–121 Mani GS, Clarke BC (1990) Mutational order: a major stochastic process in evolution. Proc R Soc Lond B Biol Sci 240:29–37 Margulies EH, Cooper GM, Asimenos G, Thomas DJ, Dewey CN, Siepel A, Birney E, Keefe D, Schwartz AS, Hou M, Taylor J, Nikolaev S, Montoya-Burgos JI, Loytynoja A, Whelan S, Pardi F, Massingham T, Brown JB, Bickel P, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Stone EA, Rosenbloom KR, Kent WJ, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VV, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Hinrichs A, Trumbower H, Clawson H, Zweig A, Kuhn RM, Barber G, Harte R, Karolchik D, Field MA, Moore RA, Matthewson CA, Schein JE, Marra MA, Antonarakis SE, Batzoglou S, Goldman N, Hardison R, Haussler D, Miller W, Pachter L, Green ED, Sidow A (2007) Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res 17:760–774 Masel J, Siegal ML (2009) Robustness: mechanisms and consequences. Trends Genet 25:395–403 Maslov S, Krishna S, Pang TY, Sneppen K (2009) Toolbox model of evolution of prokaryotic metabolic networks and their regulation. Proc Natl Acad Sci USA 106:9743–9748

Mayrose I, Friedman N, Pupko T (2005) A gamma mixture model better accounts for among site rate heterogeneity. Bioinformatics 21(Suppl 2):ii151–ii158 Medina M (2005) Genomes, phylogeny, and evolutionary systems biology. Proc Natl Acad Sci USA 102(Suppl 1):6630–6635 Molina N, van Nimwegen E (2008) Universal patterns of purifying selection at noncoding positions in bacteria. Genome Res 18:148–160 Molina N, van Nimwegen E (2009) Scaling laws in functional genome content across prokaryotic clades and lifestyles. Trends Genet 25:243–247 Monot M, Honore N, Garnier T, Zidane N, Sherafi D, Paniz-Mondolfi A, Matsuoka M, Taylor GM, Donoghue HD, Bouwman A, Mays S, Watson C, Lockwood D, Khamispour A, Dowlati Y, Jianping S, Rea TH, Vera-Cabrera L, Stefani MM, Banu S, Macdonald M, Sapkota BR, Spencer JS, Thomas J, Harshman K, Singh P, Busso P, Gattiker A, Rougemont J, Brennan PJ, Cole ST (2009) Comparative genomic and phylogeographic analysis of Mycobacterium leprae. Nat Genet 41:1282–1289 Moya A, Gil R, Latorre A, Pereto J, Pilar Garcillan-Barcia M, de la Cruz F (2009) Toward minimal bacterial cells: evolution vs. design. FEMS Microbiol Rev 33:225–235 Mushegian AR, Koonin EV (1996a) Gene order is not conserved in bacterial evolution. Trends Genet 12:289–290 Mushegian AR, Koonin EV (1996b) A minimal gene set for cellular life derived by comparison of complete bacterial genomes [see comments]. Proc Natl Acad Sci USA 93:10268–10273 Mustonen V, Lassig M (2009) From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation. Trends Genet 25:111–119 Mustonen V, Lassig M (2010) Fitness flux and ubiquity of adaptive evolution. Proc Natl Acad Sci USA 107(9):4248–4253 Muzzi A, Moschioni M, Covacci A, Rappuoli R, Donati C (2008) Pilus operon evolution in Streptococcus pneumoniae is driven by positive selection and recombination. 
PLoS ONE 3:e3660 Nakabachi A, Yamashita A, Toh H, Ishikawa H, Dunbar HE, Moran NA, Hattori M (2006) The 160-kilobase genome of the bacterial endosymbiont Carsonella. Science 314:267 Nielsen R (2001) Statistical tests of selective neutrality in the age of genomics. Heredity 86:641–647 Nielsen R (2005) Molecular signatures of natural selection. Annu Rev Genet 39:197–218 Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, Fledel-Alon A, Tanenbaum DM, Civello D, White TJ, Sninsky JJ, Adams MD, Cargill M (2005) A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol 3(6):e170 Nilsen TW, Graveley BR (2010) Expansion of the eukaryotic proteome by alternative splicing. Nature 463:457–463 Novais A, Comas I, Baquero F, Canton R, Coque TM, Moya A, Gonzalez-Candelas F, Galan JC (2010) Evolutionary trajectories of beta-lactamase CTX-M-1 cluster enzymes: predicting antibiotic resistance. PLoS Pathog 6(1):e1000735 Novichkov PS, Ratnere I, Wolf YI, Koonin EV, Dubchak I (2009a) ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes. Nucleic Acids Res 37:D448–D454 Novichkov PS, Wolf YI, Dubchak I, Koonin EV (2009b) Trends in prokaryotic evolution revealed by comparison of closely related bacterial and archaeal genomes. J Bacteriol 191:65–73 Ohno S (1970) Evolution by gene duplication. Springer-Verlag, Berlin-Heidelberg-New York Osbourn AE, Field B (2009) Operons. Cell Mol Life Sci 66:3755–3775 Pal C, Papp B, Hurst LD (2001) Highly expressed genes in yeast evolve slowly. Genetics 158:927–931 Pal C, Papp B, Lercher MJ (2006) An integrated view of protein evolution. Nat Rev Genet 7:337–348 Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM and Andolfatto P (2010) On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol [Epub ahead of print]

Petersen L, Bollback JP, Dimmic M, Hubisz M, Nielsen R (2007) Genes under positive selection in Escherichia coli. Genome Res 17:1336–1343 Ponjavic J, Ponting CP, Lunter G (2007) Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res 17:556–565 Ponting CP, Oliver PL, Reik W (2009) Evolution and functions of long noncoding RNAs. Cell 136:629–641 Proux E, Studer RA, Moretti S, Robinson-Rechavi M (2009) Selectome: a database of positive selection. Nucleic Acids Res 37:D404–D407 Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, Salamov A, Terry A, Shapiro H, Lindquist E, Kapitonov VV, Jurka J, Genikhovich G, Grigoriev IV, Lucas SM, Steele RE, Finnerty JR, Technau U, Martindale MQ, Rokhsar DS (2007) Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317:86–94 Queitsch C, Sangster TA, Lindquist S (2002) Hsp90 as a capacitor of phenotypic variation. Nature 417:618–624 Resch AM, Carmel L, Marino-Ramirez L, Ogurtsov AY, Shabalina SA, Rogozin IB, Koonin EV (2007) Widespread positive selection in synonymous sites of mammalian genes. Mol Biol Evol 24:1821–1831 Rocha EP (2008) The organization of the bacterial genome. Annu Rev Genet 42:211–233 Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL, Szekely LA, Koonin EV (2002) Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res 30:2212–2223 Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV (2003) Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol 13:1512–1517 Roy SW, Gilbert W (2006) The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet 7:211–221 Roy SW, Penny D (2007) Patterns of intron loss and gain in plants: intron loss-dominated evolution and genome-wide comparison of O. sativa and A. thaliana. 
Mol Biol Evol 24:171–181 Schrimpf SP, Weiss M, Reiter L, Ahrens CH, Jovanovic M, Malmstrom J, Brunner E, Mohanty S, Lercher MJ, Hunziker PE, Aebersold R, von Mering C, Hengartner MO (2009) Comparative functional analysis of the Caenorhabditis elegans and Drosophila melanogaster proteomes. PLoS Biol 7:e48 Sella G, Petrov DA, Przeworski M, Andolfatto P (2009) Pervasive natural selection in the Drosophila genome? PLoS Genet 5:e1000495 Shabalina SA, Kondrashov AS (1999) Pattern of selective constraint in C. elegans and C. briggsae genomes. Genet Res 74:23–30 Shabalina SA, Koonin EV (2008) Origins and evolution of eukaryotic RNA interference. Trends Ecol Evol 23:578–587 Shabalina SA, Ogurtsov AY, Rogozin IB, Koonin EV, Lipman DJ (2004) Comparative analysis of orthologous eukaryotic mRNAs: potential hidden functional signals. Nucleic Acids Res 32:1774–1782 Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, Kuo A, Mitros T, Salamov A, Carpenter ML, Signorovitch AY, Moreno MA, Kamm K, Grimwood J, Schmutz J, Shapiro H, Grigoriev IV, Buss LW, Schierwater B, Dellaporta SL, Rokhsar DS (2008) The Trichoplax genome and the nature of placozoans. Nature 454:955–960 Stanek MT, Cooper TF, Lenski RE (2009) Identification and dynamics of a beneficial mutation in a long-term evolution experiment with Escherichia coli. BMC Evol Biol 9:302 Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278:631–637 Tsaparas P, Marino-Ramirez L, Bodenreider O, Koonin EV, Jordan IK (2006) Global similarity and local divergence in human and mouse gene co-expression networks. BMC Biol 6:70 Turner LM, Chuong EB, Hoekstra HE (2008) Comparative analysis of testis protein evolution in rodents. Genetics 179:2075–2089

van Nimwegen E (2003) Scaling laws in the functional content of genomes. Trends Genet 19:479–484 Wagner A (2005) Robustness, evolvability, and neutrality. FEBS Lett 579:1772–1778 Wagner A (2008) Neutralism and selectionism: a network-based reconciliation. Nat Rev Genet 9:965–974 Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigó R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O’Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, 
Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562 Weinreich DM, Delaney NF, Depristo MA, Hartl DL (2006) Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312:111–114 Wilkins AS (2007) Between “design” and “bricolage”: genetic networks, levels of selection, and adaptive evolution. Proc Natl Acad Sci USA 104(Suppl 1):8590–8596 Wolf YI, Carmel L, Koonin EV (2006) Unifying measures of gene function and evolution. Proc Biol Sci 273:1507–1515 Wolf YI, Novichkov PS, Karev GP, Koonin EV, Lipman DJ (2009) The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci USA 106:7273–7280 Wolf YI, Gopich IV, Lipman DJ, Koonin EV (2010) Relative contributions of intrinsic structural-functional constraints and translation rate to the evolution of protein-coding genes. Genome Biol Evol 2010:190–199 Worth CL, Gong S, Blundell TL (2009) Structural and functional constraints in the evolution of protein families. Nat Rev Mol Cell Biol 10:709–720 Wuchty S, Almaas E (2005) Evolutionary cores of domain co-occurrence networks. BMC Evol Biol 5:24 Yamada T, Bork P (2009) Evolution of biomolecular networks: lessons from metabolic and protein interactions. 
Nat Rev Mol Cell Biol 10:791–803 Zhou T, Weems M, Wilke CO (2009) Translationally optimal codons associate with structurally sensitive sites in proteins. Mol Biol Evol 26:1571–1580

Chapter 3

Starvation-Induced Reproductive Isolation in Yeast

Eugene Kroll, R. Frank Rosenzweig, and Barbara Dunn

Abstract Speciation in eukaryotes is one of the central issues in evolutionary biology. Retrospective studies of existing species may not reveal the molecular events underlying speciation, as it is frequently impossible to distinguish changes that preceded speciation from those that occurred afterward. We propose a model for experimental speciation using a well-studied eukaryotic organism, the yeast Saccharomyces cerevisiae, with starvation as the agent of speciation. Starvation can be viewed as a general and widespread consequence of catastrophic environmental change that leads to a decrease in survival or reproductive success. We find that yeast populations subjected to month-long starvation exhibit a drastic increase in genomic rearrangements but only a modest increase in point mutations. We subsequently find that starved yeast populations become reproductively isolated from their ancestor, which we attribute to chromosomal abnormalities in the genomes of the starved clones. Our model provides direct molecular evidence that speciation can occur rapidly without the precondition of geographic separation or divergent selection.

E. Kroll and R.F. Rosenzweig
Division of Biological Sciences, University of Montana, 32 Campus Dr., Missoula, MT 59812, USA
e-mail: [email protected]

B. Dunn
Department of Genetics, Stanford University, Stanford, CA 94305, USA

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_3, © Springer-Verlag Berlin Heidelberg 2010

3.1 Continuing Uncertainty over Species Definitions Among the Eukarya

Two central questions in eukaryotic evolutionary biology are: how do new species emerge, and how are they perpetuated? We can provisionally define a species as a group of organisms that shares a complex genetic network of interacting alleles and
preserves its integrity by restricting the exchange of genetic material with other such networks (Mayr 1966). The processes by which new networks emerge, i.e., speciation, appear to be diverse, and their relative contributions remain the subject of considerable controversy. While Darwin explicitly linked speciation to the adaptation of organisms to novel environments (Darwin 1859, Ch. 4), neo-Darwinists have emphasized the role of interpopulation isolation (Fisher 1930; Dobzhansky 1937; Muller 1940; Mayr 1942). Uncertainty persists as to which of these emphases is correct (Lande 1989; Vulic et al. 1999; Orr and Presgraves 2000; Schilthuizen 2000; Turelli et al. 2001; Sinervo and Svensson 2002; Herrmann et al. 2003), largely because of the dearth of knowledge about the specific molecular mechanisms that underlie eukaryotic speciation, and because species can be defined in various ways. The most widely used definition of speciation is based on the biological species concept, i.e., the cessation of gene flow between groups of organisms, or “reproductive isolation” (Dobzhansky 1937; Mayr 1942, 1996; Lande 1989; Coyne and Orr 1998). Though this definition is not universally accepted (Darwin 1859, Ch. 8; Schilthuizen 2000 and refs. therein), its inherently measurable nature makes it the framework most amenable to experimental investigation. Relative reproductive isolation between two species is a quantitative trait that can be measured as the ratio of the fertility of interspecific hybrids to that of their conspecific parents. Importantly, “relative reproductive isolation” can be used as a proxy for the degree of divergence between closely related organisms. Reproductive isolation in sexual species can be either prezygotic or postzygotic. To date, efforts to explain incipient speciation in eukaryotes have focused on prezygotic isolation mechanisms such as spatial/temporal or behavioral separation (Orr and Presgraves 2000). 
In many cases, the former arises in allopatry, whereas the latter is viewed as a reinforcing mechanism. That said, much theoretical and experimental work now indicates that postzygotic mechanisms, e.g., the inviability or infertility of interspecific hybrids, can play crucial roles in initiating reproductive isolation, and that prezygotic mechanisms might therefore evolve at a later stage (Lande 1989; Schliewen et al. 1994; Dieckmann and Doebeli 1999; Schilthuizen 2000; Turelli et al. 2001; Via 2001). Hence, it may be difficult to uncover, using existing species as evidence, the important transformative events that initiate speciation, as genetic divergence following reproductive isolation is likely to obscure initial steps in the process. In other words, most of the genetic differences that separate contemporary species by enforcing isolation may not be the differences that originally caused speciation. Overall, our research goal is to elucidate the exact molecular mechanisms that bring about speciation in a model eukaryote and to do so in real time under controlled laboratory conditions. We contend that an experimental rather than a comparative approach is more likely to enable us to clarify the role of postzygotic mechanisms in the initial stages of speciation.
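The fertility ratio used above to quantify relative reproductive isolation can be made concrete with a short calculation. The sketch below is a minimal Python illustration under stated assumptions, not the authors' procedure: fertility is assumed to be expressed as a proportion (for example, the fraction of viable offspring), the parental value is taken as the mean of the conspecific crosses, and isolation is reported as one minus the hybrid-to-parent ratio. The function names and example numbers are hypothetical.

```python
from statistics import mean


def relative_hybrid_fertility(hybrid_fertility: float, parental_fertilities: list[float]) -> float:
    """Ratio of hybrid fertility to the mean fertility of the conspecific parental crosses.

    Fertility values are assumed to be proportions (e.g. fraction of viable offspring).
    """
    if not parental_fertilities:
        raise ValueError("at least one parental fertility value is required")
    parental_mean = mean(parental_fertilities)
    if parental_mean <= 0:
        raise ValueError("mean parental fertility must be positive")
    return hybrid_fertility / parental_mean


def relative_reproductive_isolation(hybrid_fertility: float, parental_fertilities: list[float]) -> float:
    """Assumed index: 0 when hybrids are as fertile as their parents, 1 when hybrids are sterile.

    Negative values would mean the hybrids are more fertile than the parental average.
    """
    return 1.0 - relative_hybrid_fertility(hybrid_fertility, parental_fertilities)


if __name__ == "__main__":
    # Hypothetical numbers: parental crosses at 0.9 and 0.8, hybrid cross at 0.17.
    print(relative_hybrid_fertility(0.17, [0.9, 0.8]))        # about 0.2
    print(relative_reproductive_isolation(0.17, [0.9, 0.8]))  # about 0.8
```

Expressing isolation as one minus the ratio is only one convenient convention; reporting the raw ratio carries the same information.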

3.2 The Nature of Postzygotic Reproductive Isolation in Eukaryotes

Postzygotic reproductive isolation manifests in the inviability or infertility of hybrid progeny (Orr and Presgraves 2000). While hybrid inviability can be caused by developmental incompatibilities or dysgenesis (Hartl et al. 1997), hybrid infertility is likely a consequence of defective hybrid meiosis. The establishment of postzygotic reproductive isolation in eukaryotes has been explained by two competing theories. The first, “the chromosomal theory of speciation,” holds that chromosomal changes (genomic rearrangements) disrupt recombination and segregation of homologues in meiosis I, and/or that fine-scale mutations disrupt meiotic recombination via the action of mismatch repair (White 1978; King 1993; Radman and Wagner 1993; Chambers et al. 1996; Searle 1998; Britton-Davidian et al. 2000; Rieseberg 2001). The second, “the genic theory of speciation” (“speciation genes”), holds that genic changes, e.g., functional incompatibilities between diverged alleles, result in lower hybrid fitness (Bateson 1909; Dobzhansky 1937; Muller 1940; Coyne and Orr 1998). A third possibility, that postzygotic isolation results from a combination of these two mechanisms, has also been proposed (Henikoff et al. 2001; Noor et al. 2001; Rieseberg 2001). The chromosomal theory of speciation did not gain acceptance during the early studies of postzygotic reproductive isolation, for two reasons: first, Dobzhansky’s pioneering experiments in Drosophila appeared to demonstrate the genic nature of postzygotic isolation (Dobzhansky 1933); second, chromosomal speciation appeared to be precluded by “underdominance,” wherein an individual heterozygous for a chromosomal rearrangement is rendered less fertile and thus would be unlikely to found a new species (Livingstone and Rieseberg 2004).
The archetypal experiments by T. Dobzhansky first demonstrated that the sterility of male hybrids from crosses between two races of Drosophila pseudoobscura distinguished by several chromosomal rearrangements was due to mis-segregation of homologous chromosomes in meiosis I (Dobzhansky 1933). Dobzhansky further noted that in the rare instances when tetraploid spermatogonia were found in these interracial hybrids, chromosomes had also mis-segregated in meiosis I. From this observation, he reasoned that because every chromosome in a tetraploid hybrid meiosis is furnished with its exact homologue, tetraploidization should have restored faithful segregation of homologues if mis-segregation had been caused by chromosomal rearrangements rather than by genic incompatibilities. He therefore concluded that genic incompatibilities, not chromosomal changes, caused sterility in the male hybrids of the two races of D. pseudoobscura. The ensuing search for “speciation genes,” or rather incompatible alleles, did yield some tangible results, notably the cloning of Odysseus, a gene encoding a homeobox protein responsible for interspecific incompatibilities in Drosophila (Ting et al. 1998; Greenberg et al. 2003), and of several other genes that control hybrid infertility (Lee et al. 2008; Phadnis and Orr 2009).

However, there is no way to be certain that such incompatible alleles were the actual cause of speciation rather than merely a product of species divergence; in other words, finding speciation genes does not in fact prove that speciation ultimately has a genic basis. Intriguingly, in a footnote to his pioneering paper on the genic nature of reproductive isolation mentioned above, Dobzhansky acknowledged that he did not report the results of the reciprocal cross, which was “different in many important details” and would be “published elsewhere” (Dobzhansky 1933). In stark contrast to these studies of Dobzhansky, Noor and colleagues used the very same species of Drosophila to directly implicate large chromosomal inversions in the reproductive isolation between sympatric D. pseudoobscura and D. persimilis populations (Noor et al. 2001). Indeed, inversions and other small rearrangements that may have a deleterious effect on meiosis have been shown to be abundant between closely related species of yeast (Seoighe et al. 2000; Kellis et al. 2003; Fischer et al. 2006), as well as of roundworms (Hutter et al. 2000), mice (Hauffe and Searle 1998), plants (Blanc et al. 2000), and a variety of other organisms (for a review, see Eichler and Sankoff 2003). Experiments on tetraploidization in several plant species showed that certain types of chromosomal rearrangements were responsible for postzygotic reproductive isolation (Anderson 1949; White 1978; Searle 1998; Pialek et al. 2001; Rieseberg 2001). Chromosomal rearrangements have also been implicated in human evolution, acting to decrease gene flow in the chromosomal regions that harbor inversions (Navarro and Barton 2003). In Saccharomyces cerevisiae, chromosomal inversions have been shown to directly and efficiently impair the progression of meiosis (Dresser et al. 1994; Jinks-Robertson et al. 1997; Chen and Jinks-Robertson 1999). 
As for the concept of underdominance (a reduced or abolished ability to complete meiosis owing to one or more heterozygous rearrangements), which has overshadowed the chromosomal theory of speciation, it is fair to say that different genomic rearrangements may have very different effects on meiosis, ranging from irrelevant to prohibitive, with all shades in between. Clearly, an organism carrying a chromosomal rearrangement that abrogates meiosis is not going to form a new species; however, a partial restriction of gene flow resulting from a rearrangement could allow for faster rates of sequence and functional divergence (Lande 1989; Noor et al. 2001; Rieseberg 2001; Navarro and Barton 2003), increasing the probability of speciation. Finally, by the same logic that is applied to epistasis among speciation genes (Bateson 1909), genomic rearrangements can also form incompatible pairs, further destabilizing meiosis in hybrid organisms. Assuming that the experimental observations supporting both theories of postzygotic isolation are correct, should one conclude that these opposing results reflect differences in experimental technique, or are they more readily explained by variation among diverse taxa? And is it then reasonable to assume that the genic and chromosomal mechanisms (acting in concert, or separately in different taxa) can both contribute to speciation? To address these questions experimentally, we have developed a laboratory assay that uses the yeast S. cerevisiae to recover reproductively isolated clones arising during prolonged starvation.
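The argument above, that rearrangements range from irrelevant to prohibitive in their effect on meiosis and can also form incompatible pairs, can be illustrated with a toy model. The Python sketch below is a hypothetical illustration, not a model proposed in this chapter: it assumes that each heterozygous rearrangement independently reduces meiotic success by a fraction s between 0 (irrelevant) and 1 (prohibitive), that these effects multiply, and that each designated incompatible pair imposes an additional fixed penalty. All names and parameter values are assumptions.

```python
from typing import Optional, Sequence, Tuple


def hybrid_meiotic_success(
    penalties: Sequence[float],
    incompatible_pairs: Optional[Sequence[Tuple[int, int]]] = None,
    pair_penalty: float = 0.0,
) -> float:
    """Hypothetical multiplicative model of meiotic success in a hybrid.

    penalties[i] is the assumed fractional reduction s_i caused by heterozygous
    rearrangement i (0 = irrelevant to meiosis, 1 = prohibitive).
    incompatible_pairs lists index pairs (i, j) of rearrangements assumed to interact
    negatively; each such pair removes a further fraction pair_penalty.
    """
    for s in list(penalties) + [pair_penalty]:
        if not 0.0 <= s <= 1.0:
            raise ValueError("penalties must lie in [0, 1]")
    success = 1.0
    for s in penalties:
        # Independent rearrangements are assumed to act multiplicatively.
        success *= 1.0 - s
    for i, j in incompatible_pairs or []:
        if i == j or not (0 <= i < len(penalties) and 0 <= j < len(penalties)):
            raise ValueError(f"invalid incompatible pair {(i, j)}")
        # Extra epistatic penalty for an incompatible pair (an assumption, by analogy
        # with incompatibilities between speciation genes).
        success *= 1.0 - pair_penalty
    return success


if __name__ == "__main__":
    rearrangements = [0.0, 0.1, 0.5]  # hypothetical: irrelevant, mild, severe
    print(hybrid_meiotic_success(rearrangements))                              # about 0.45
    print(hybrid_meiotic_success(rearrangements, [(1, 2)], pair_penalty=0.2))  # about 0.36
    print(hybrid_meiotic_success([1.0]))                                       # 0.0
```

In this toy setting a single prohibitive rearrangement drives meiotic success to zero, while several mild ones only partially restrict it, which is the distinction the text draws between abrogated meiosis and partial restriction of gene flow.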

3.3 A Starvation-Based Experimental Model May Help Resolve Uncertainties Concerning the Molecular Basis for Speciation

Comparative analyses of existing species may poorly discriminate between changes that cause speciation and those that arise secondarily (Schilthuizen 2000). However, experimental evidence may be difficult to obtain under near-optimal physiological conditions, because such conditions typically result in low and constant mutation rates (Drake et al. 1998) that make speciation less likely to occur (Rice and Hostert 1993). We have therefore developed an experimental laboratory model of speciation that uses prolonged starvation as a proxy for sudden and severe environmental change. This treatment disrupts normal living conditions, effectively dismantling a population’s niche and progressively diminishing its mean fitness, measured as both survivorship and reproductive capacity. Furthermore, starvation is a condition that virtually all species experience and that many contend with regularly in the wild (Koch 1971; Death and Ferenci 1994). All manner of environmental change, such as wildfire, flood, sudden transfer to a new habitat, or even invasion by a competing species, can bring about starvation. We hypothesize that because starvation is experienced universally in the wild under a plethora of circumstances, natural selection has produced mechanisms that respond to this generic signal in ways that may increase population diversity through elevated mutation, including large-scale genome rearrangements.

3.4

Starvation Responses That Could Increase Population Genetic Diversity

Escherichia coli’s SOS system activates multiple responses to DNA damage, nutrient starvation, and low temperature that are both mutagenic and recombinogenic (Witkin and Wermundsen 1979; Dri and Moreau 1993; Friedberg et al. 1995; McKenzie et al. 2000). Following activation of the SOS system, bacteria sustain a high frequency of random mutation, rearrangement, and transposition (Radman 1975; Witkin 1976; Petit et al. 1991; Guerin et al. 2009), revealing a genetic link between the stress caused by highly challenging environmental conditions and genetic variability (Taddei et al. 1997). In fact, starvation-induced mutagenesis in bacteria has been shown to be directly controlled by the SOS system (Taddei et al. 1995; Hastings et al. 2000; McKenzie et al. 2000; Finkel 2006; He et al. 2006) as well as by the global stress response (Zinser and Kolter 1999; Bjedov et al. 2003; Lombardo et al. 2004). Eukaryotes possess a combination of genetic pathways that may be functionally analogous to those of bacteria, such as checkpoint adaptation, translesion synthesis,


E. Kroll et al.

stress signaling, and others (Toczyski et al. 1997; Kai and Wang 2003; Smets et al. 2010). However, although the causal connection between environmental stress and an increase in adaptively significant variation has been well studied, the molecular basis of such a connection in eukaryotes remains obscure. By employing starvation to mimic severe stress, we hope to model conditions in nature with which all populations must contend (Death and Ferenci 1994) and to discover molecular mechanisms that link catastrophic environmental change with the types of genetic variation that could lead to speciation.

3.5

Advantages of Using Yeast as a Model to Study Speciation in Real Time

Several factors contributed to our choice of S. cerevisiae as a model organism. S. cerevisiae is a well-studied organism that possesses most of the major signal transduction (Smets et al. 2010) and DNA maintenance pathways (San Filippo et al. 2008) found in other eukaryotes. Also, the genomes of multiple strains of S. cerevisiae and of more than ten related species have been sequenced. In addition, yeast genetics, especially as it relates to DNA maintenance, the cell cycle, checkpoints, and stress resistance, is well understood. In S. cerevisiae, as in higher eukaryotes, the controlled occurrence of DNA double-strand breaks early in meiotic prophase is essential for maturation of the synaptonemal complex, for chiasma formation in diplotene, and for faithful homologue segregation at anaphase I (Peoples et al. 2002; Page and Hawley 2003). This dependence is reinforced by the pachytene checkpoint (Roeder and Bailis 2000), which ensures that meiotic recombination and homologue synapsis are completed before cells proceed to metaphase I. In contrast, the chromosomes of another well-studied yeast, Schizosaccharomyces pombe, do not form synaptonemal complexes in meiosis (Davis and Smith 2003), while in the popular multicellular model organisms C. elegans and D. melanogaster, double-strand breaks are not required for chromosome synapsis (Dernburg et al. 1998; Jang et al. 2003). Moreover, heterogametic meioses (male in Drosophila and other Diptera, female in Lepidoptera) occur in the complete absence of recombination (Hawley 2002). Thus, among favored model systems, meiosis in S. cerevisiae most closely resembles meiosis in mouse and human spermatocytes (Lichten 2001; Page and Hawley 2003). Finally, in S. cerevisiae, reproductive isolation manifests as a quantitative trait that can be scored as the efficiency of producing viable spores, or spore yield (a combination of sporulation efficiency and spore viability).
We chose an S288c strain [BY4743 (Brachmann et al. 1998)] for our speciation studies because this diploid, unlike other laboratory strains, does not spontaneously sporulate when starved, and thus starved diploids that have not gone through meiosis can be reliably obtained.


3.6


Three Modes of Postzygotic Isolation in Yeast – Sequence, Chromosome, Breakpoint-Recombination

The six nonhybrid species that comprise the sensu stricto group of Saccharomyces (S. cerevisiae, S. paradoxus, S. mikatae, S. cariocanus, S. kudriavzevii, and S. bayanus) show large genomic rearrangements relative to one another, as detected by pulsed-field gel analysis, with the exception of S. cerevisiae and S. paradoxus, which are almost identical. Fischer et al. showed that these rearrangements do not in fact correspond to a phylogenetic tree based on rRNA sequence divergence (Fischer et al. 2000, 2006) and thus concluded that genomic rearrangements were unimportant in yeast speciation. Interestingly, restoring the colinearity of gene order between two sensu stricto species, S. cerevisiae and S. mikatae, did partially restore interspecific hybrid fertility (Delneri et al. 2003), indicating that genomic rearrangements are important for maintaining postzygotic reproductive isolation in yeast. Mutational load and the action of the mismatch repair system also affect, albeit partially, reproductive isolation between S. cerevisiae and S. paradoxus (Chambers et al. 1996; Chen and Jinks-Robertson 1999), because crossing-over in yeast depends on sequence homology between homeologous chromosomes (Hunter et al. 1996). However, the experiments supporting these possibilities were conducted with extant species, in which genetic changes such as sequence divergence (proposed as a possible cause of a reproductive barrier) may actually have occurred after the speciation event and thus need not have caused the initial reproductive barrier. Additionally, tetraploidization experiments (Greig et al. 2002) and direct searches for speciation genes (Greig and Leu 2009) have indicated that dominant epistatic incompatibilities between two sensu stricto species of Saccharomyces are not important for speciation. Although one pair of incompatible alleles has recently been identified between S. cerevisiae and S. bayanus (Lee et al. 2008), it is again unclear whether this incompatibility was a driving force, or a secondary consequence, of the initial speciation event.

Chromosome rearrangements are plentiful in yeast genomes. Genomic rearrangements such as reciprocal translocations, transpositions, insertions, deletions, and inversions are ubiquitous even between closely related species. Studies using pulsed-field gel analysis and hybridization, such as that of Fischer et al. (2000), identified only a small subset of the rearrangements and inversions among the sensu stricto species, as subsequent whole-genome sequencing showed, because smaller rearrangements and inversions simply cannot be resolved on pulsed-field gels. Remarkably, of all the syntenic breakpoints between S. cerevisiae and S. bayanus, fewer than 10% correspond to large-scale rearrangements (Fischer et al. 2001). Sequence data from the S. bayanus, S. mikatae, and S. paradoxus genomes have revealed many more genomic rearrangements than were previously known, especially at chromosome ends (Kellis et al. 2003). The nine inversions that exist between the genomes of these three species and S. cerevisiae are flanked by tRNA genes, usually of the same isoacceptor type (Kellis et al. 2003). This finding suggests that inversions, and perhaps other rearrangements that have accumulated in the genomes of Saccharomyces spp., arose via homologous recombination. An alternative hypothesis is that the rearrangements were caused by yeast retrotransposons (Ty), as tRNA genes are hotspots for Ty1, Ty3, and Ty5 transposition (Natsoulis et al. 1989). In addition, nonhomologous end-joining may have created some of the rearrangements, as has been observed among flor yeasts used in fortified winemaking (Infante et al. 2003). Thus, in our opinion, genomic rearrangements such as small and large inversions, small translocations, and small insertions and deletions, which escape detection by pulsed-field gel analysis but are later revealed by sequencing, may be a ubiquitous feature of evolving genomes. We further suggest that such rearrangements may play a key role in incipient speciation among yeasts and other eukaryotes.
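The pattern reported by Kellis et al. (2003), inversion breakpoints flanked by tRNA genes of the same isoacceptor type, is the kind of signature one would screen for when asking whether rearrangements arose by homologous recombination between repeated sequences. The Python sketch below shows one simple way such a screen could be set up. The data class, the 2-kb search window, and the example coordinates and isoacceptor labels are all hypothetical and are not taken from any published pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TRNAGene:
    chrom: str        # chromosome name (hypothetical labels such as "chrX")
    position: int     # gene coordinate in bp
    isoacceptor: str  # isoacceptor class, e.g. "Ala" (hypothetical)


def nearest_trna(chrom: str, breakpoint: int, trnas: List[TRNAGene],
                 max_distance: int) -> Optional[TRNAGene]:
    """Return the tRNA gene closest to a breakpoint if one lies within max_distance bp."""
    nearby = [t for t in trnas
              if t.chrom == chrom and abs(t.position - breakpoint) <= max_distance]
    return min(nearby, key=lambda t: abs(t.position - breakpoint), default=None)


def flanked_by_same_isoacceptor(chrom: str, left_bp: int, right_bp: int,
                                trnas: List[TRNAGene], max_distance: int = 2000) -> bool:
    """True if both inversion breakpoints lie near tRNA genes of the same isoacceptor type."""
    left = nearest_trna(chrom, left_bp, trnas, max_distance)
    right = nearest_trna(chrom, right_bp, trnas, max_distance)
    return left is not None and right is not None and left.isoacceptor == right.isoacceptor


# Hypothetical annotation and breakpoints, for illustration only
trnas = [TRNAGene("chrX", 10_000, "Ala"), TRNAGene("chrX", 50_000, "Ala"),
         TRNAGene("chrX", 80_000, "Gly")]
print(flanked_by_same_isoacceptor("chrX", 10_500, 49_800, trnas))  # True
print(flanked_by_same_isoacceptor("chrX", 10_500, 80_200, trnas))  # False (Ala vs. Gly)
```

A real screen would also have to consider the orientation of the flanking genes, since recombination between inverted copies is what produces an inversion; that refinement is omitted here.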

3.7

Starved Yeast Cultures Sustain High Frequencies of Genomic Rearrangements

In extant species of Saccharomyces yeast, rates of genomic rearrangement are highly variable (Fischer et al. 2006). We contend that starvation resulting from environmental change can affect the rates of genomic variation. Moreover, we have previously shown that a champagne strain, DB146, sustains massive changes in genomic architecture after prolonged starvation (Coyle and Kroll 2008). To assess the effect of prolonged starvation on genomic change, we starved multiple random clones of the laboratory yeast diploid BY4743 (Brachmann et al. 1998), essentially as described (Coyle and Kroll 2008). Over a 1-month starvation treatment, and accounting for diminished viability, the starving cultures underwent an average of ten generations. At no point did we observe sporulating cells in the starving cultures. As a control, we grew BY4743 cells in rich medium for approximately twice the number of generations that the starved cultures underwent. Because, strictly speaking, the cells obtained at the end of these 20 generations are neither ancestral nor “wild-type” relative to the starved cultures, we call them “nonstarved” cultures.
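The chapter does not state how the number of generations in starving cultures was estimated. A minimal sketch, assuming that each division doubles the total cell count and that dead cells remain countable (so that deaths do not hide divisions), is g = log2(N_final / N_initial); when only a viable count and a viable fraction are available, the total can be back-calculated first. The function names and numbers below are illustrative assumptions.

```python
import math


def total_from_viable(viable_cells: float, viable_fraction: float) -> float:
    """Back-calculate total (live + dead) cells from a viable count and the viable fraction."""
    if not 0 < viable_fraction <= 1:
        raise ValueError("viable_fraction must be in (0, 1]")
    return viable_cells / viable_fraction


def generations(initial_cells: float, final_total_cells: float) -> float:
    """Generations elapsed, assuming every division doubles the total cell number.

    Dead cells must be included in final_total_cells; counting only survivors
    would underestimate the number of divisions in a starving culture.
    """
    if initial_cells <= 0 or final_total_cells < initial_cells:
        raise ValueError("require 0 < initial_cells <= final_total_cells")
    return math.log2(final_total_cells / initial_cells)


# Hypothetical numbers: 1e6 cells end as 2.56e8 viable cells that make up 25% of the total
total = total_from_viable(2.56e8, 0.25)  # 1.024e9 cells in all
print(generations(1e6, total))           # 10.0, i.e. a 2**10 = 1024-fold increase
```

Under this assumption, ten generations corresponds to roughly a thousandfold increase in total cell number; if dead cells lyse and escape counting, the estimate becomes a lower bound.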

3.7.1

Starved Cultures Sporulate at a Lower Level Than the Nonstarved Cultures

Genomic rearrangements may create a reproductive barrier between two populations, as discussed previously. If a reproductive barrier existed between our starved and ancestral populations, it would manifest as decreased fertility in backcrosses between haploid progeny of the starved cultures and the ancestral strain, compared with backcrosses between nonstarved progeny and the ancestor. Both sporulation efficiency (the frequency at which yeast cells form gametes, or spores) and spore viability (colony-forming units per number of spores plated) could be expected to affect hybrid fertility. Generally, only a partial measure of fertility, spore viability, is assessed in crosses between separate yeast species (Naumov et al. 2000). Because different species usually require different conditions for optimal sporulation, the sporulation efficiency of an interspecific hybrid lacks an obvious control. In our case, however, we used only one ancestral strain, and thus we were able to assess both sporulation efficiency and spore viability of the backcross hybrids. To score these traits, we incubated the cells overnight in fresh rich medium to minimize the fraction of dead cells in starved cultures and then sporulated them under conditions optimized for the ancestral strain. We scored sporulation efficiency and the viability of the resulting spores. For all comparisons we used nonparametric statistical tests, as we could not assume a normal distribution for our data. Nonstarved diploid cultures sporulated at the efficiency characteristic of the BY4743 ancestor, and spore viability was nearly 100%. In contrast, starved BY4743 cultures sporulated at about half the frequency of the nonstarved cultures, even after prolonged sporulation (Fig. 3.1; Coyle et al. in preparation). Nevertheless, spore viability among sporulated starved cultures was almost as high as that of spores derived from the nonstarved cultures. The significantly lower sporulation efficiency of starved cultures relative to nonstarved controls suggests that changes accumulated in the genomes of starved cultures alter their fertility. Viable spores derived from such starved cells might be wholly or partially reproductively isolated from each other and from the ancestral population.

Fig. 3.1 Starved and nonstarved cultures of BY4743 sporulated for 2 days. Arrows denote spore sacs (asci) containing three or four spores. (a) Starved diploid culture; only one misshapen ascus is shown (arrow). (b) Nonstarved culture; the majority of cells have formed asci.
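The two components of fertility scored here, and the combined spore yield mentioned in Sect. 3.5, can be written down directly. The sketch below assumes that sporulation efficiency is the fraction of scored cells that formed asci, that spore viability is colonies recovered per spore plated, and that the two are combined multiplicatively into spore yield; the chapter does not specify how the combination was computed, and the class name and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SporulationScore:
    cells_scored: int   # cells examined under the microscope
    asci: int           # cells that formed asci
    spores_plated: int  # spores plated or dissected
    colonies: int       # colony-forming units recovered from those spores

    def __post_init__(self) -> None:
        if self.cells_scored <= 0 or self.spores_plated <= 0:
            raise ValueError("denominators must be positive")
        if not (0 <= self.asci <= self.cells_scored and 0 <= self.colonies <= self.spores_plated):
            raise ValueError("counts cannot exceed their denominators")

    @property
    def sporulation_efficiency(self) -> float:
        return self.asci / self.cells_scored

    @property
    def spore_viability(self) -> float:
        return self.colonies / self.spores_plated

    @property
    def spore_yield(self) -> float:
        # Assumed multiplicative combination of the two fertility components
        return self.sporulation_efficiency * self.spore_viability


# Hypothetical counts, for illustration only
starved = SporulationScore(cells_scored=400, asci=60, spores_plated=80, colonies=76)
print(starved.sporulation_efficiency, starved.spore_viability)  # 0.15 0.95
```

Scoring the two components separately is what lets a drop in sporulation efficiency be distinguished from a drop in spore viability, the distinction the results above rely on.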

3.7.2

A Subset of Starved Backcrosses Show Lower Fertility Than the Nonstarved Backcrosses

To test how reproductive isolation was distributed within starved cultures, we assessed the fertility of backcross hybrids. We isolated rare spores from 1-month-starved cultures, germinated those spores into haploid strains, or “starved isolates,” and performed backcrosses. We then sporulated the resulting backcross hybrids, measured their sporulation efficiency and spore viability, and compared their hybrid fertility with that of the nonstarved isolates. The results recapitulate the previous findings for starved diploids: multiple backcross hybrids exhibited significantly lower average sporulation efficiency than the nonstarved backcrosses (Mann–Whitney U test). Specifically, about one-third of the starved isolates used in the backcross analysis showed a sporulation efficiency significantly lower than that of their respective nonstarved intercrosses (Coyle et al. in preparation). In contrast to sporulation efficiency, spore viability in all cases was indistinguishable from that of the ancestor (Coyle et al. in preparation).
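The Mann–Whitney U test used for these comparisons is rank-based and makes no normality assumption, which is why it suits sporulation data. The standard-library sketch below computes U with average ranks for ties and a two-sided p-value from the normal approximation with tie and continuity corrections. For samples of only a handful of cultures, an exact test (for example `scipy.stats.mannwhitneyu` with `method="exact"`) would be more appropriate; the function name and example values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation with tie correction).

    Returns (U for sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for a run of ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (abs(u1 - mean_u) - 0.5) / math.sqrt(var_u)  # continuity correction
    p = math.erfc(max(z, 0.0) / math.sqrt(2))         # P(|Z| > z) for standard normal Z
    return u1, min(p, 1.0)


# Hypothetical sporulation efficiencies, for illustration only
nonstarved = [0.42, 0.39, 0.45, 0.40, 0.44]
starved = [0.20, 0.18, 0.25, 0.22, 0.30]
u, p = mann_whitney_u(starved, nonstarved)
print(u)  # 0.0: every starved value ranks below every nonstarved value
```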

3.7.3

Starved Isolates Reproductively Isolated from the Ancestral Population Are Self-Fertile

Complete inability to undergo meiosis would prevent the establishment of a new species. Such inability might be caused either by mutations in genes important for meiosis or by a chromosome aberration that prohibits meiosis. To ensure that the starved isolates could found a new lineage capable of sexual reproduction, we selfed starved isolates that exhibited lower fertility in backcrosses. To do this, we made haploid progeny of those starved isolates homothallic and isolated their selfed diploid progeny. After sporulating these selfed diploids, we found that their sporulation efficiency was significantly higher than that of the backcross hybrid (Coyle et al. in preparation). We concluded that starved isolates reproductively isolated from the ancestral population were self-fertile and able to form new sexually reproducing lineages, that is, new biological species. These results confirm bona fide incipient speciation arising in a yeast population within a 1-month period of starvation.

3.7.4

Molecular Basis of the Reproductive Barrier in a Starved Isolate

To discover the molecular mechanism of reproductive isolation, we further studied several of the reproductively isolated starved isolates. Forward mutation frequency was only twofold higher in starved populations than in the nonstarved control, which could not account for the widespread reproductive isolation. In contrast, pulsed-field gel analysis revealed a 6.6% total frequency of new chromosomal variants in the starved BY4743 cultures, with no rearrangements detected in nonstarved cultures (Coyle et al. in preparation). This frequency is orders of magnitude higher than that estimated for a typical laboratory yeast strain (Schmidt et al. 2006). Finally, using microarray-based comparative genomic hybridization (Dunn et al. 2005), we showed that all starved isolates contained deletions and additions of genomic DNA (Coyle et al. in preparation).
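Microarray comparative genomic hybridization reports, for each probe, the ratio of test to reference hybridization signal; gains and losses appear as runs of probes whose log2 ratio departs from zero (in a haploid strain, a whole-chromosome duplication gives log2(2/1) = 1). The sketch below illustrates one simple way to call such runs, using a sliding median followed by a fixed threshold. It is a generic illustration under those assumptions, not the analysis pipeline of Dunn et al. (2005); the window size, threshold, and function name are hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple


def call_copy_changes(test: Sequence[float], ref: Sequence[float],
                      window: int = 5, threshold: float = 0.5) -> List[Tuple[int, int, str]]:
    """Call runs of gained or lost probes along one chromosome.

    test, ref: hybridization intensities for the same probes, ordered by position.
    Returns (first_probe, last_probe, "gain" or "loss") tuples with inclusive indices.
    """
    if len(test) != len(ref) or len(test) == 0:
        raise ValueError("test and ref must be non-empty and of equal length")
    log_ratios = [math.log2(t / r) for t, r in zip(test, ref)]
    half = window // 2
    # Sliding median damps isolated noisy probes before thresholding
    smoothed = [statistics.median(log_ratios[max(0, i - half): i + half + 1])
                for i in range(len(log_ratios))]
    states = ["gain" if s > threshold else "loss" if s < -threshold else "normal"
              for s in smoothed]
    segments: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    for i, state in enumerate(states + ["normal"]):  # sentinel closes a trailing run
        if start is not None and state != states[start]:
            segments.append((start, i - 1, states[start]))
            start = None
        if start is None and state != "normal" and i < len(states):
            start = i
    return segments


# Hypothetical haploid example: probes 5-10 deleted (signal falls to background)
ref = [1.0] * 16
test = [1.0] * 5 + [0.25] * 6 + [1.0] * 5
print(call_copy_changes(test, ref))  # [(5, 10, 'loss')]
```

The median window trades sensitivity for robustness: very short deletions or duplications (narrower than about half the window) are smoothed away, which is one reason such calls are usually confirmed by an independent method.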



In particular, one isolate carried a duplication of the entire Chromosome I (Coyle et al. in preparation). We examined this disomic haploid isolate further to determine whether chromosomal abnormalities that arose during starvation could explain its reproductive isolation. As has been reasoned before, in tetraploid hybrid meioses every chromosome is furnished with its exact homologue (Dobzhansky 1933); therefore, tetraploidization should restore faithful segregation of homologues if mis-segregation in the diploid hybrid were caused by chromosomal rearrangements rather than by genic incompatibilities. In our case, crossing the disomic starved isolate to its haploid ancestor yielded a diploid hybrid trisomic for Chromosome I (two copies of the chromosome from the starved isolate and one from the ancestor). If this trisomy lowered the fertility of the backcross hybrid because no homologue was furnished for the extra Chromosome I, we would expect tetraploidization of the hybrid to restore its fertility. If the fertility of the backcross hybrid were not restored, we would have to assume that an epistatic interaction between incompatible alleles underlies the reproductive barrier between this isolate and its ancestor. To distinguish these possibilities, we obtained tetraploid versions of the trisomic backcross hybrid by deleting one of the two MAT loci in the hybrid. We identified hybrids expressing either MATa or MATalpha and crossed such strains using a micromanipulator to produce several independent tetraploid versions of the backcross hybrid. We repeated this procedure with the nonstarved isolates to obtain control tetraploids. After tetraploidy was confirmed by tetrad dissection, we sporulated the diploid hybrids and their tetraploid derivatives and measured sporulation efficiency as before. The results are shown in Fig. 3.2.

Fig. 3.2 Relative sporulation efficiency of (a) the diploid starved backcross hybrid with an extra Chromosome I, (b) the tetraploid starved backcross hybrid with an extra Chromosome I, (c) the diploid nonstarved backcross hybrid, and (d) the tetraploid nonstarved backcross hybrid. Ancestral sporulation efficiency is set to 100%. Spore viability in all strains was indistinguishable from that of the ancestor.
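The logic of the tetraploidization test (Dobzhansky 1933) reduces to a simple count: in the trisomic diploid hybrid one copy of Chromosome I lacks an exact partner, whereas in a tetraploid built by mating two derivatives of that hybrid every copy has one. The toy Python function below encodes only this counting argument, under the simplifying assumption that copies pair two by two and only with structurally identical copies. It is a cartoon of the reasoning, not a model of meiotic pairing, and the labels are hypothetical.

```python
from collections import Counter
from typing import Iterable


def unpaired_copies(copies: Iterable[str]) -> int:
    """Number of chromosome copies left without an identical partner.

    Each label marks one structural version of the chromosome (hypothetical
    labels). Assumes copies pair two by two and only with identical copies.
    """
    return sum(n % 2 for n in Counter(copies).values())


# Trisomic backcross hybrid: two Chromosome I copies from the starved isolate, one from the ancestor
diploid_hybrid = ["starved", "starved", "ancestral"]
# Tetraploid from mating MATa and MATalpha derivatives of that hybrid: every copy doubled
tetraploid_hybrid = diploid_hybrid * 2

print(unpaired_copies(diploid_hybrid))     # 1
print(unpaired_copies(tetraploid_hybrid))  # 0
```

The same result holds if all three copies carry one label, which corresponds to a simple trisomy without structural change: three copies leave one unpaired, six leave none.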



Independently obtained tetraploid derivatives of the trisomic hybrid showed a dramatic increase in sporulation efficiency compared with the diploid hybrid (Mann–Whitney U test; Coyle et al. in preparation). In contrast, the sporulation efficiency of the tetraploidized nonstarved backcross hybrids was indistinguishable from that of the nonstarved diploid backcross, indicating that tetraploidization does not generally increase sporulation efficiency in nonstarved clones. Our results indicate that reproductive isolation in the starved disomic isolate cannot be a consequence of allelic incompatibilities between the disomic isolate and the nonstarved ancestor. Instead, these results support the hypothesis that chromosomal, not genic, differences underlie the reduced fertility of the starved isolate.
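Fig. 3.2 expresses each strain’s sporulation efficiency relative to the ancestor, which is set to 100%. The short sketch below shows that normalization together with a tetraploid-to-diploid ratio as one possible summary of rescue: a ratio well above 1 for the starved hybrid but close to 1 for the nonstarved control is the pattern described above. The function names and numbers are hypothetical illustrations, not the measured values.

```python
def relative_efficiency(strain_eff: float, ancestor_eff: float) -> float:
    """Sporulation efficiency as a percentage of the ancestor's (ancestor = 100%)."""
    if ancestor_eff <= 0:
        raise ValueError("ancestor efficiency must be positive")
    return 100.0 * strain_eff / ancestor_eff


def rescue_ratio(tetraploid_rel: float, diploid_rel: float) -> float:
    """Fold change on tetraploidization; values near 1 indicate no rescue."""
    if diploid_rel <= 0:
        raise ValueError("diploid value must be positive")
    return tetraploid_rel / diploid_rel


# Hypothetical raw efficiencies (fraction of cells forming asci), for illustration only
ancestor = 0.5
starved_diploid = relative_efficiency(0.125, ancestor)     # 25.0
starved_tetraploid = relative_efficiency(0.375, ancestor)  # 75.0
control_diploid = relative_efficiency(0.5, ancestor)       # 100.0
control_tetraploid = relative_efficiency(0.5, ancestor)    # 100.0

print(rescue_ratio(starved_tetraploid, starved_diploid))   # 3.0
print(rescue_ratio(control_tetraploid, control_diploid))   # 1.0
```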

3.8

Conclusions

The experiments described here provide insight into the phenomenon of starvation-associated genomic rearrangements and its possible role in establishing reproductive isolation. Starvation is a condition that most organisms frequently contend with in the wild. Because a variety of changes in the external milieu can result in starvation, we contend that starvation is a generic “interpreter” of catastrophic environmental change. Organisms that evolved mechanisms to harness starvation as a signal to increase population diversity could be expected to leave more descendants in the wake of such catastrophes. These mechanisms represent a population-level evolutionary response, an alternative to the many individual-level responses that enable organisms to persist under severe stress (e.g., spores, hibernation, aestivation, extreme desiccation resistance). Eukaryotes possess genetic mechanisms able to respond to stressful conditions; however, no connection between starvation, starvation-induced genetic variation, and speciation has been experimentally established in eukaryotes. Our experiments provide evidence for this connection by showing that starved yeast populations sustain genomic rearrangements at a dramatically higher frequency than nonstarved populations, and that certain clones that survive starvation are reproductively isolated from their ancestors. These newly evolved clones may represent incipient species. Genomic rearrangements have been shown to occur in yeast during chemical treatment (Hughes et al. 2000) and during growth in nutrient-limiting conditions (Adams et al. 1992; Dunham et al. 2002). In fact, Dunham et al. (2002) noted that several of their parallel cultures grown in continuous culture under glucose limitation failed to sporulate, a phenomenon similar to the one observed here. That phenotype arose after 250–500 generations of continuous growth, whereas our cultures underwent only about ten generations during starvation.
Recently, another study has shown that adaptation to diverse environments leads to incipient speciation in yeast (Dettman et al. 2007), echoing the classic experiments in Drosophila (Rice and Hostert 1993). The authors attempted to examine the molecular nature of de novo speciation using the correlation between hybrid fitness and fertility. Interestingly, in contrast to findings in extant yeast species (Greig 2009), their yeast hybrids, like ours, retained nearly 100% spore viability but exhibited lower sporulation efficiency (Dettman et al. 2007). We contend that genomic rearrangements arising during starvation may contribute to reproductive isolation, supporting the chromosomal theory of speciation (White 1978). When the rate of genomic rearrangement is very low and the effective population size is large, the chromosomal theory cannot plausibly explain the process of speciation (Rieseberg 2001). However, complete starvation circumvents these problems by dramatically increasing the rate of chromosomal rearrangements in starving populations while simultaneously decreasing the effective population size (because cells are less likely to have the resources needed to mate, and because viability is lower). Thus, environmental conditions leading to starvation may favor the establishment of small, reproductively isolated, inbred subpopulations that harbor restructured genomes poised to undergo rapid speciation without a requirement for any other type of prezygotic isolation.

Acknowledgments We would like to acknowledge technical help from S. Coyle. This work was supported by NSF grant 0134648 to E.K., NASA grant NNX07AJ28G to R.F.R., and NSF ADVANCE grant DBI-0340856 to B.D.

References

Adams J, Puskas-Rozsa S, Simlar J, Wilke CM (1992) Adaptation and major chromosomal changes in populations of Saccharomyces cerevisiae. Curr Genet 22:13–19
Anderson E (1949) Introgressive hybridization. Chapman & Hall, London
Bateson W (1909) Heredity and variation in modern lights. Darwin and modern science. Cambridge University Press, Cambridge, UK
Bjedov I, Tenaillon O, Gerard B, Souza V, Denamur E, Radman M, Taddei F, Matic I (2003) Stress-induced mutagenesis in bacteria. Science 300:1404–1409
Blanc G, Barakat A, Guyot R, Cooke R, Delseny M (2000) Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12:1093–1101
Brachmann CB, Davies A, Cost GJ, Caputo E, Li J, Hieter P, Boeke JD (1998) Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14:115–132
Britton-Davidian J, Catalan J, da Graca Ramalhinho M, Ganem G, Auffray JC, Capela R, Biscoito M, Searle JB, da Luz Mathias M (2000) Rapid chromosomal evolution in island mice. Nature 403:158
Chambers SR, Hunter N, Louis EJ, Borts RH (1996) The mismatch repair system reduces meiotic homeologous recombination and stimulates recombination-dependent chromosome loss. Mol Cell Biol 16:6110–6120
Chen W, Jinks-Robertson S (1999) The role of the mismatch repair machinery in regulating mitotic and meiotic recombination between diverged sequences in yeast. Genetics 151:1299–1313
Coyle S, Kroll E (2008) Starvation induces genomic rearrangements and starvation-resilient phenotypes in yeast. Mol Biol Evol 25:310–318
Coyle S, Dunn B, Rosenzweig RF, Kroll E (in preparation) The molecular basis of starvation-associated reproductive isolation in yeast



Coyne JA, Orr HA (1998) The evolutionary genetics of speciation. Philos Trans R Soc Lond B Biol Sci 353:287–305
Darwin C (1859) On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. J. Murray, London
Davis L, Smith GR (2003) Nonrandom homolog segregation at meiosis I in Schizosaccharomyces pombe mutants lacking recombination. Genetics 163:857–874
Death A, Ferenci T (1994) Between feast and famine: endogenous inducer synthesis in the adaptation of Escherichia coli to growth with limiting carbohydrates. J Bacteriol 176:5101–5107
Delneri D, Colson I, Grammenoudi S, Roberts IN, Louis EJ, Oliver SG (2003) Engineering evolution to study speciation in yeasts. Nature 422:68–72
Dernburg AF, McDonald K, Moulder G, Barstead R, Dresser M, Villeneuve AM (1998) Meiotic recombination in C. elegans initiates by a conserved mechanism and is dispensable for homologous chromosome synapsis. Cell 94:387–398
Dettman JR, Sirjusingh C, Kohn LM, Anderson JB (2007) Incipient speciation by divergent adaptation and antagonistic epistasis in yeast. Nature 447:585–588
Dieckmann U, Doebeli M (1999) On the origin of species by sympatric speciation. Nature 400:354–357
Dobzhansky T (1933) On the sterility of the interracial hybrids in Drosophila pseudoobscura. Proc Natl Acad Sci USA 19:397–403
Dobzhansky T (1937) Genetics and the origin of species. Columbia Press, New York
Drake JW, Charlesworth B, Charlesworth D, Crow JF (1998) Rates of spontaneous mutation. Genetics 148:1667–1686
Dresser ME, Ewing DJ, Harwell SN, Coody D, Conrad MN (1994) Nonhomologous synapsis and reduced crossing over in a heterozygous paracentric inversion in Saccharomyces cerevisiae. Genetics 138:633–647
Dri AM, Moreau PL (1993) Phosphate starvation and low temperature as well as ultraviolet irradiation transcriptionally induce the Escherichia coli LexA-controlled gene sfiA. Mol Microbiol 8:697–706
Dunham MJ, Badrane H, Ferea T, Adams J, Brown PO, Rosenzweig F, Botstein D (2002) Characteristic genome rearrangements in experimental evolution of Saccharomyces cerevisiae. Proc Natl Acad Sci USA 99:16144–16149
Dunn B, Levine RP, Sherlock G (2005) Microarray karyotyping of commercial wine yeast strains reveals shared, as well as unique, genomic signatures. BMC Genomics 6(1):53–57
Eichler EE, Sankoff D (2003) Structural dynamics of eukaryotic chromosome evolution. Science 301:793–797
Finkel SE (2006) Long-term survival during stationary phase: evolution and the GASP phenotype. Nat Rev Microbiol 4:113–120
Fischer G, James SA, Roberts IN, Oliver SG, Louis EJ (2000) Chromosomal evolution in Saccharomyces. Nature 405:451–454
Fischer G, Neuveglise C, Durrens P, Gaillardin C, Dujon B (2001) Evolution of gene order in the genomes of two related yeast species. Genome Res 11:2009–2019
Fischer G, Rocha EP, Brunet F, Vergassola M, Dujon B (2006) Highly variable rates of genome rearrangements between hemiascomycetous yeast lineages. PLoS Genet 2:e32
Fisher RA (1930) The Genetical theory of natural selection. Oxford, UK
Friedberg E, Walker G, Siede W (1995) DNA repair and mutagenesis. Am Soc Microbiol, Washington, DC
Greenberg AJ, Moran JR, Coyne JA, Wu CI (2003) Ecological adaptation during incipient speciation revealed by precise gene replacement. Science 302:1754–1757
Greig D (2009) Reproductive isolation in Saccharomyces. Heredity 102:39–44
Greig D, Leu JY (2009) Natural history of budding yeast. Curr Biol 19:R886–R890
Greig D, Borts RH, Louis EJ, Travisano M (2002) Epistasis and hybrid sterility in Saccharomyces. Proc R Soc Lond B Biol Sci 269:1167–1171



Guerin E, Cambray G, Sanchez-Alberola N, Campoy S, Erill I, Da Re S, Gonzalez-Zorn B, Barbe J, Ploy MC, Mazel D (2009) The SOS response controls integron recombination. Science 324:1034
Hartl DL, Lohe AR, Lozovskaya ER (1997) Regulation of the transposable element mariner. Genetica 100:177–184
Hastings PJ, Bull HJ, Klump JR, Rosenberg SM (2000) Adaptive amplification. An inducible chromosomal instability mechanism. Cell 103:723–731
Hauffe HC, Searle JB (1998) Chromosomal heterozygosity and fertility in house mice (Mus musculus domesticus) from Northern Italy. Genetics 150:1143–1154
Hawley RS (2002) Meiosis: how male flies do meiosis. Curr Biol 12:R660–R662
He AS, Rohatgi PR, Hersh MN, Rosenberg SM (2006) Roles of E. coli double-strand-break-repair proteins in stress-induced mutation. DNA Repair 5:258–273
Henikoff S, Ahmad K, Malik HS (2001) The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293:1098–1102
Herrmann RG, Maier RM, Schmitz-Linneweber C (2003) Eukaryotic genome evolution: rearrangement and coevolution of compartmentalized genetic information. Philos Trans R Soc Lond B Biol Sci 358:87–97, discussion 97
Hughes TR, Roberts CJ, Dai H, Jones AR, Meyer MR, Slade D, Burchard J, Dow S, Ward TR, Kidd MJ, Friend SH, Marton MJ (2000) Widespread aneuploidy revealed by DNA microarray expression profiling. Nat Genet 25:333–337
Hunter N, Chambers SR, Louis EJ, Borts RH (1996) The mismatch repair system contributes to meiotic sterility in an interspecific yeast hybrid. EMBO J 15:1726–1733
Hutter H, Vogel BE, Plenefisch JD, Norris CR, Proenca RB, Spieth J, Guo C, Mastwal S, Zhu X, Scheel J, Hedgecock EM (2000) Conservation and novelty in the evolution of cell adhesion and extracellular matrix genes. Science 287:989–994
Infante JJ, Dombek KM, Rebordinos L, Cantoral JM, Young ET (2003) Genome-wide amplifications caused by chromosomal rearrangements play a major role in the adaptive evolution of natural yeast. Genetics 165:1745–1759
Jang JK, Sherizen DE, Bhagat R, Manheim EA, McKim KS (2003) Relationship of DNA double-strand breaks to synapsis in Drosophila. J Cell Sci 116:3069–3077
Jinks-Robertson S, Sayeed S, Murphy T (1997) Meiotic crossing over between nonhomologous chromosomes affects chromosome segregation in yeast. Genetics 146:69–78
Kai M, Wang TS (2003) Checkpoint activation regulates mutagenic translesion synthesis. Genes Dev 17:64–76
Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES (2003) Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254
King M (1993) Species evolution: the role of chromosome change. Cambridge University Press, Cambridge
Koch AL (1971) The adaptive responses of Escherichia coli to a feast and famine existence. Adv Microb Physiol 6:147–217
Lande R (1989) Fisherian and Wrightian theories of speciation. Genome 31:221–227
Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY (2008) Incompatibility of nuclear and mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073
Lichten M (2001) Meiotic recombination: breaking the genome to save it. Curr Biol 11:R253–R256
Livingstone K, Rieseberg L (2004) Chromosomal evolution and speciation: a recombination-based approach. New Phytol 161:107–112
Lombardo MJ, Aponyi I, Rosenberg SM (2004) General stress response regulator RpoS in adaptive mutation and amplification in Escherichia coli. Genetics 166:669–680
Mayr E (1942) Systematics and the origins of species. Columbia University Press, New York
Mayr E (1966) Animal species and evolution. Harvard University Press, Cambridge
Mayr E (1996) What is a species and what is not? Philos Sci 63:262–277
McKenzie GJ, Harris RS, Lee PL, Rosenberg SM (2000) The SOS response regulates adaptive mutation. Proc Natl Acad Sci USA 97:6646–6651


E. Kroll et al.

Muller HJ (1940) Bearing of the Drosophila work on systematics. In: Huxley J (ed) The new systematics. Clarendon, Oxford, pp 185–268
Natsoulis G, Thomas W, Roghmann MC, Winston F, Boeke JD (1989) Ty1 transposition in Saccharomyces cerevisiae is nonrandom. Genetics 123:269–279
Naumov GI, James SA, Naumova ES, Louis EJ, Roberts IN (2000) Three new species in the Saccharomyces sensu stricto complex: Saccharomyces cariocanus, Saccharomyces kudriavzevii and Saccharomyces mikatae. Int J Syst Evol Microbiol 50(Pt 5):1931–1942
Navarro A, Barton NH (2003) Chromosomal speciation and molecular divergence – accelerated evolution in rearranged chromosomes. Science 300:321–324
Noor MA, Grams KL, Bertucci LA, Reiland J (2001) Chromosomal inversions and the reproductive isolation of species. Proc Natl Acad Sci USA 98:12084–12088
Orr HA, Presgraves DC (2000) Speciation by postzygotic isolation: forces, genes and molecules. Bioessays 22:1085–1094
Page SL, Hawley RS (2003) Chromosome choreography: the meiotic ballet. Science 301:785–789
Peoples TL, Dean E, Gonzalez O, Lambourne L, Burgess SM (2002) Close, stable homolog juxtaposition during meiosis in budding yeast is dependent on meiotic recombination, occurs independently of synapsis, and is distinct from DSB-independent pairing contacts. Genes Dev 16:1682–1695
Petit MA, Dimpfl J, Radman M, Echols H (1991) Control of large chromosomal duplications in Escherichia coli by the mismatch repair system. Genetics 129:327–332
Phadnis N, Orr HA (2009) A single gene causes both male sterility and segregation distortion in Drosophila hybrids. Science 323:376–379
Pialek J, Hauffe HC, Rodriguez-Clark KM, Searle JB (2001) Raciation and speciation in house mice from the Alps: the role of chromosomes. Mol Ecol 10:613–625
Radman M (1975) SOS repair hypothesis: phenomenology of an inducible DNA repair which is accompanied by mutagenesis. Basic Life Sci 5A:355–367
Radman M, Wagner R (1993) Mismatch recognition in chromosomal interactions and speciation. Chromosoma 102:369–373
Rice W, Hostert E (1993) Laboratory experiments on speciation: what have we learned in 40 years? Evolution 47:1637–1653
Rieseberg LH (2001) Chromosomal rearrangements and speciation. Trends Ecol Evol 16:351–358
Roeder GS, Bailis JM (2000) The pachytene checkpoint. Trends Genet 16:395–403
San Filippo J, Sung P, Klein H (2008) Mechanism of eukaryotic homologous recombination. Annu Rev Biochem 77:229–257
Schilthuizen M (2000) Dualism and conflicts in understanding speciation. Bioessays 22:1134–1141
Schliewen UK, Tautz D, Paabo S (1994) Sympatric speciation suggested by monophyly of crater lake cichlids. Nature 368:629–632
Schmidt KH, Pennaneach V, Putnam CD, Kolodner RD (2006) Analysis of gross-chromosomal rearrangements in Saccharomyces cerevisiae. Methods Enzymol 409:462–476
Searle JB (1998) Speciation, chromosomes, and genomes. Genome Res 8:1–3
Seoighe C, Federspiel N, Jones T, Hansen N, Bivolarovic V, Surzycki R, Tamse R, Komp C, Huizar L, Davis RW, Scherer S, Tait E, Shaw DJ, Harris D, Murphy L, Oliver K, Taylor K, Rajandream MA, Barrell BG, Wolfe KH (2000) Prevalence of small inversions in yeast gene order evolution. Proc Natl Acad Sci USA 97:14433–14437
Sinervo B, Svensson E (2002) Correlational selection and the evolution of genomic architecture. Heredity 89:329–338
Smets B, Ghillebert R, De Snijder P, Binda M, Swinnen E, De Virgilio C, Winderickx J (2010) Life in the midst of scarcity: adaptations to nutrient availability in Saccharomyces cerevisiae. Curr Genet 56:1–32
Taddei F, Matic I, Radman M (1995) cAMP-dependent SOS induction and mutagenesis in resting bacterial populations. Proc Natl Acad Sci USA 92:11736–11740
Taddei F, Vulic M, Radman M, Matic I (1997) Genetic variability and adaptation to stress. EXS 83:271–290

3 Starvation-Induced Reproductive Isolation in Yeast


Ting CT, Tsaur SC, Wu ML, Wu CI (1998) A rapidly evolving homeobox at the site of a hybrid sterility gene. Science 282:1501–1504
Toczyski DP, Galgoczy DJ, Hartwell LH (1997) CDC5 and CKII control adaptation to the yeast DNA damage checkpoint. Cell 90:1097–1106
Turelli M, Barton NH, Coyne JA (2001) Theory and speciation. Trends Ecol Evol 16:330–343
Via S (2001) Sympatric speciation in animals: the ugly duckling grows up. Trends Ecol Evol 16:381–390
Vulic M, Lenski RE, Radman M (1999) Mutation, recombination, and incipient speciation of bacteria in the laboratory. Proc Natl Acad Sci USA 96:7348–7351
White MJD (1978) Modes of speciation. W.H. Freeman & Co, San Francisco
Witkin EM (1976) Ultraviolet mutagenesis and inducible DNA repair in Escherichia coli. Bacteriol Rev 40:869–907
Witkin EM, Wermundsen IE (1979) Targeted and untargeted mutagenesis by various inducers of SOS functions in Escherichia coli. Cold Spring Harb Symp Quant Biol 43(Pt 2):881–886
Zinser ER, Kolter R (1999) Mutations enhancing amino acid catabolism confer a growth advantage in stationary phase. J Bacteriol 181:5800–5807

Chapter 4

Populations of RNA Molecules as Computational Model for Evolution

Michael Stich, Carlos Briones, Ester Lázaro, and Susanna C. Manrubia

Abstract We consider populations of RNA molecules as a computational model for molecular evolution. Building on a large body of previous work, we review some recent results. First, we study the sequence–structure map, its implications for the structural repertoire of a pool of random RNA sequences, and its relevance for the RNA world hypothesis of the origin of life. In a scenario where template replication is possible, we discuss the internal organization of evolving populations and its relationship with robustness and adaptability. Finally, we explore how the effect of the mutation rate on fitness changes depends on the degree of adaptation of an RNA population.

4.1 Introduction

Molecular evolution covers a huge area of research, ranging from prebiotic chemistry and questions on the origin of life, through many aspects related to the origin of and the relationships among species and the study of viral and bacterial evolution and its medical implications, up to the artificial design and in vitro selection of molecules, with all their applications in nano- and biotechnology. In this chapter, we do not aim to give a complete overview of that wide research field but focus on the use of populations of RNA molecules as a model to understand the evolution of prebiotic replicators in the RNA world. As RNA viruses share many characteristics with primitive RNA molecules with replicative ability, these studies can also be used to tackle many aspects of viral evolution. Although a large body of our work is inspired by experiments, in this chapter we focus on theoretical approaches to understanding evolutionary processes.

M. Stich, C. Briones, E. Lázaro, and S.C. Manrubia
Dpto de Evolución Molecular, Centro de Astrobiología (CSIC-INTA), Ctra de Ajalvir, km 4, 28850 Torrejón de Ardoz (Madrid), Spain
e-mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_4, © Springer-Verlag Berlin Heidelberg 2010


RNA molecules are a well-suited model for studying evolution because they incorporate, in a single molecular entity, both genotype and phenotype. While errors in the replication process introduce mutations in the RNA sequence (genotype), selection acts upon the function (phenotype) of the molecule. Since in many cases the spatial structure of the molecule is crucial for its biochemical function, the structure of an RNA molecule can be considered a minimal representation of the phenotype. In current biology, RNA viruses are the paradigmatic example of evolving populations: replication is fast, it takes place with a relatively high error rate, and population sizes are large. This has made RNA viruses a frequently used example of quasispecies, a concept originally proposed by Eigen (1971) and developed over the last decades in the context of virology (Domingo 2006). The concept states that a population of replicators, e.g., an RNA virus evolving within an infected host, cannot be represented by a single, fittest genome, but only by the spectrum of related mutants present in the population. The quasispecies evolves under a certain error (mutation) rate, and the cloud of mutants enables the population to adapt quickly to new environmental situations, such as population bottlenecks and changed selective pressures. Under constant external conditions, a quasispecies approaches a dynamic equilibrium between selection of favorable sequences (what we mean by favorable will be specified below) and the diversity constantly introduced by mutation. Therefore, the mutation rate is of crucial importance in the study of such heterogeneous populations in molecular evolution (Huynen et al. 1996; Biebricher and Eigen 2005): if the mutation rate becomes too large, selection becomes inefficient, the correlations between the genomes within the population decay, and the whole population may even become extinct.
There are many reported examples of the extinction of RNA virus populations when replication takes place at increased error rates due to the presence of mutagenic agents (Sierra et al. 2000; Domingo 2005; Cases-González et al. 2008). These results have inspired a promising new antiviral strategy named lethal mutagenesis (Loeb et al. 1999). Another field of research within molecular evolution is the quest to understand the origin and early evolution of life. One of the most appealing theories in this context is the so-called RNA world hypothesis. It is based on the fact that RNA can not only store genetic information, like DNA in present-day cells, but can also act as a catalyst of biochemical reactions, like present-day enzymes. Therefore, a single RNA molecule could have been endowed with the two main features of living matter, providing both the genome (i.e., the blueprint for replication) and the primordial machinery for replication and metabolism. One of the open questions in this context is how the first template-dependent RNA polymerase ribozyme could have emerged. Experimentally, a minimum size of approximately 165 nucleotides has been established for such a molecule (Johnston et al. 1999; Joyce 2004), a length three to four times that of the longest RNA oligomers obtained by random polymerization (Huang and Ferris 2003, 2006). Hence, one of the main challenges within the RNA world scenario is to convincingly bridge this gap. In this chapter, we review some recent results obtained in our lab (Manrubia and Briones 2007; Stich et al. 2007, 2008, 2010; Briones et al. 2009) and put them into the context of the aforementioned issues. The first part of this chapter aims to deepen our understanding of the sequence–structure map, which is relevant for the RNA world model. Then, we discuss the internal organization of evolving populations and its relevance for robustness and adaptability. Subsequently, we explore the relationship between the microscopic mutation rate and the fractions of beneficial and deleterious mutations, as observed in experiments or used in phenomenological models.

4.2 Structural Repertoire of RNA Pools

RNA structure is crucial for the biochemical function of an RNA molecule. Much research effort is dedicated to the folding process that relates RNA sequences to RNA structures. For our purpose, it is sufficient to consider two-dimensional secondary structures as a good approximation of real three-dimensional structures. Two fundamental properties of the sequence–structure map are that (1) the number of different sequences is much higher than the number of structures and (2) not all possible structures are equally probable (Fontana et al. 1993; Schuster et al. 1994). In this context, common structures are those into which many different sequences fold, and rare structures are those into which only a few sequences fold. In this section, we explore the structural repertoire of a pool of random sequences. We first describe the results of folding $10^8$ RNA molecules of length 35 nt consisting of random sequences composed of the four types of nucleotides A, C, G, and U (Stich et al. 2008). As the secondary structure of each molecule, we take the minimum free energy structure as given by the fold() routine of the Vienna RNA Package (Hofacker et al. 1994). RNA secondary structures consist of stems, where base pairing (A–U, G–C, G–U) between nucleotides occurs, and unpaired regions. In standard bracket notation, nucleotides paired with each other are denoted by “(” and “)”, while unpaired nucleotides are represented by “.”. Among unpaired regions, we can distinguish dangling ends and different kinds of loops: hairpin loops, bulges, interior loops, and multiloops. The simplest structure is called a stem–loop; it consists of one hairpin loop and one stem, and possibly one or two dangling ends. While there are $4^n$ sequences of length $n$ (the so-called sequence space), the number $S_n$ of different structures (the structure space) is much smaller.
Based on theoretical studies (Waterman 1978), the estimate $S_n \approx 0.7131\, n^{-3/2} (2.2888)^n$ has been given (Grüner et al. 1996). Consequently, many different sequences must fold into the same secondary structure, grouping into neutral networks of genomes (Grüner et al. 1996; Huynen et al. 1996). Neutral networks are formed by genomes that share the same phenotype, here the secondary structure, and that are connected by single mutational events. The sequence–structure map turns out to be very complex. Two sequences that are just one mutation apart may fold into very different structures. At the same time, almost all common structures can be found in a relatively small neighborhood of any sequence (Fontana et al. 1993).
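To give a sense of the scale of the two spaces for the chain lengths discussed in this chapter, the following minimal Python sketch evaluates $4^n$ and the asymptotic estimate for $S_n$ quoted above. The function names are hypothetical, and because the formula is asymptotic, the values for short chains should be read as orders of magnitude only.

```python
def sequence_space(n: int) -> int:
    """Number of distinct RNA sequences of length n over the alphabet {A, C, G, U}."""
    return 4 ** n


def structure_space_estimate(n: int) -> float:
    """Asymptotic estimate S_n ~ 0.7131 * n**(-3/2) * 2.2888**n (Grüner et al. 1996)."""
    return 0.7131 * n ** (-1.5) * 2.2888 ** n


if __name__ == "__main__":
    for n in (20, 35, 50):
        seqs = sequence_space(n)
        structs = structure_space_estimate(n)
        # Mean number of sequences per structure, if sequences were spread evenly.
        print(f"n = {n}: 4^n = {seqs:.2e}, S_n ~ {structs:.2e}, ratio ~ {seqs / structs:.2e}")
```

Since $4 > 2.2888$, the number of sequences per structure grows exponentially with $n$, which is why neutral networks are unavoidable.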


In our case, $10^8$ sequences folded into 5,163,324 structures (Stich et al. 2008). One way to visualize the uneven distribution of sequences into structures is the frequency–rank diagram. In Fig. 4.1a, we have ranked the structures according to the number of sequences folding into them. One can see that there are around a thousand common structures, each of them obtained from about $10^4$ different sequences. On the other hand, we also find a few million rare structures yielded by only one or two sequences. This has been reported before for much smaller pools (Schuster et al. 1994; Grüner et al. 1996; Schuster and Stadler 1994; Tacker et al. 1996). In order to study the distribution of common vs. rare structures in more detail, we have proposed a classification where we characterize a structure in terms of three numbers (Stich et al. 2008): (a) the number of hairpin loops, $H$, (b) the sum of bulges and interior loops, $I$, and (c) the number of multiloops, $M$. For example, a simple stem–loop structure, denoted as SL, is characterized by $(H,I,M) = (1,0,0)$, and all stem–loop structures found in the pool are grouped into that structure family. Other important families are the hairpin structure family, HP, with one interior loop or bulge, $(1,1,0)$, the double stem–loop, DSL, represented by $(2,0,0)$, and the simple hammerhead structure, HH, by $(2,0,1)$. Of course, more complicated structure families exist, as detailed in Stich et al. (2008). For the pool that we have folded, only 21 structure families are enough to cover all of the 5.2 million structures identified. Our analysis, displayed in Fig. 4.1b, shows that the vast majority of sequences fold into simple structure families. For example, 79.0% of all sequences belong to only three structure families (HP, HP2, SL, in decreasing abundance), and 92.1% of all sequences fold into simple structures with at most three stems (HP, HP2, SL, DSL, DSL2, HH).
Note that 2.1% of all sequences remain open and do not fold. Our data are in agreement with other findings on the structural repertoire of RNA sequence pools, where the influence of the sequence length (Sabeti et al. 1997; Gevertz et al. 2005), the nucleotide composition (Knight et al. 2005; Kim et al. 2007), and the pool size (Gevertz et al. 2005) has been studied.

[Fig. 4.1 (image): log–log plots of Frequency vs. Rank and of Binned absolute frequency vs. Rank (ranks $10^0$ to $10^7$), with curves and labels for the structure families SL, HP, HP2, HP3, DSL, DSL2, HH, open, and rest]

Fig. 4.1 (a) Frequency–rank diagram of the 5,163,324 different secondary structures obtained by folding $10^8$ RNA sequences of length 35 nt. (b) Distribution of the sequences in structure families according to their frequency. Higher-order hairpins, HPx, are defined as $(H,I,M) = (1,x,0)$ with $x \ge 2$, higher-order double stem–loops, DSLx, as $(H,I,M) = (2,x-1,0)$, and higher-order hammerheads, HHx, as $(2,x-1,1)$. (c) Frequency–rank diagram according to the structural family. The upper thick solid curve is the same curve as in (a). Parts (a) and (c) after Stich et al. (2008)

Now we can reconsider the frequency–rank diagram. We sum up all structures of a given structure family within a rank interval. Through this binning procedure, we obtain for each structure family a curve that describes its relative frequency compared with that of the other families. The curves for the most frequent families are shown in Fig. 4.1c. We immediately see that the most frequent structures belong to the stem–loop family, followed by the hairpin family, double stem–loops, higher-order hairpin families, and hammerheads. For low ranks, the SL curve is identical to the curve describing all structures. For ranks between $4 \times 10^3$ and $10^4$, it is the HP curve that practically coincides with the total curve. Interestingly, the position of the bump around rank $10^3$ coincides with the location where the SL and HP families are equally present. Hence, we conclude that the bumps in the frequency–rank diagram correspond to the succession of different structural families and would not be smoothed out by better sampling of the sequence space. What implications do these findings have for the RNA world scenario? The standard view of the RNA world hypothesis states that the first polynucleotide chains consisted of random sequences. Therefore, it is important to study the structural and, subsequently, the functional repertoire of such short sequences. We have seen that a random pool is very rich in simple structures. However, as mentioned above, short molecules cannot perform template-dependent replication. Therefore, we devised a four-step model of modular evolution as a possible pathway for the emergence of functional and progressively longer molecules, starting from a random pool of RNA oligomers (Briones et al. 2009).
The first step is the random polymerization of RNA molecules up to 40-mers. The second step is the folding of these sequences, leading to high fractions of simple structures such as hairpins, as just shown. The third step is based on the observation that simple hairpin structures, similar to those formed in huge amounts by short random sequences, are known to show catalytic activity, leading to RNA–RNA ligation (Puerta-Fernández et al. 2003). If a certain fraction of the resulting hairpin molecules displays ligase activity, longer molecules may be formed. Even though the majority of the long molecules may not perform ligase activity, some of them will keep the modular structure of their building blocks and remain active, catalyzing further RNA–RNA ligations (Manrubia and Briones 2007). This suggests that hairpin ribozymes, both as individual modules and in combined structures, could have catalyzed the synthesis of progressively longer RNA molecules from short and structurally simpler modules (Briones et al. 2009). Finally, the fourth step of the model consists of the maturation of these ligating RNA molecules of intermediate length into self-replicating RNA ligase networks, which could coexist and even compete with each other, eventually leading to a molecule long and complex enough to perform template-dependent RNA replication [further details in Briones et al. (2009)]. It is important to emphasize that the whole model relies strongly on the observation that simple structures like hairpins, with potential ligase activity, are ubiquitous in pools of random RNA sequences.
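The $(H, I, M)$ classification used in this section can be computed directly from a structure in bracket notation. The Python sketch below is our own illustration, not the code used by Stich et al. (2008): it walks the loop closed by each base pair, counts the helices branching off inside it, and labels the loop as a hairpin (no branch), a bulge or interior loop (one branch plus at least one unpaired nucleotide), or a multiloop (two or more branches); stacked pairs and the exterior loop are not counted. Family names follow the definitions given with Fig. 4.1; lumping every other combination into "rest" is a simplification assumed here.

```python
from __future__ import annotations


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner in a dot-bracket string."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner = stack.pop()
            pairs[partner], pairs[pos] = pos, partner
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def loop_counts(structure: str) -> tuple[int, int, int]:
    """Return (H, I, M): hairpin loops, bulges plus interior loops, multiloops."""
    pairs = pair_table(structure)
    hairpins = interior = multi = 0
    for i, symbol in enumerate(structure):
        if symbol != "(":
            continue
        j = pairs[i]
        branches = unpaired = 0
        k = i + 1
        while k < j:  # walk the loop closed by (i, j), jumping over enclosed helices
            if structure[k] == "(":
                branches += 1
                k = pairs[k] + 1
            else:
                unpaired += 1
                k += 1
        if branches == 0:
            hairpins += 1
        elif branches == 1 and unpaired > 0:
            interior += 1
        elif branches >= 2:
            multi += 1
        # one branch and no unpaired nucleotides is a stacked pair, not a loop
    return hairpins, interior, multi


def family(counts: tuple[int, int, int]) -> str:
    """Family name (open, SL, HPx, DSLx, HHx); any other combination is 'rest'."""
    h, i, m = counts
    if counts == (0, 0, 0):
        return "open"
    if h == 1 and m == 0:
        return {0: "SL", 1: "HP"}.get(i, f"HP{i}")
    if h == 2 and m == 0:
        return "DSL" if i == 0 else f"DSL{i + 1}"
    if h == 2 and m == 1:
        return "HH" if i == 0 else f"HH{i + 1}"
    return "rest"
```

For example, the hairpin target used in Fig. 4.2, `..((((((...(((...)))....))))))`, gives $(1,1,0)$ and is therefore classified as HP.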

4.3 Internal Organization of Evolving Populations

Above, we have discussed the static picture of the sequence–structure map. Once replication within a population is possible, evolution through Darwinian selection is triggered. Here, RNA serves as a model to study the interplay between mutation, selection, and the diversity sustained in populations of fast-mutating replicators (Stich et al. 2007). First, we briefly describe the evolutionary algorithm. Our system consists of a population of $N$ replicating RNA sequences, each of length $n$ nucleotides. At the beginning of the simulation, every molecule is initialized with a random sequence. Every time a sequence replicates, each of its nucleotides has a probability $\mu$ (the mutation rate) of being replaced by another nucleotide, randomly chosen among the four possibilities A, C, G, U. At each generation, the sequences are folded into secondary structures as described above. We define a target structure that represents, in a simple way, optimal performance in a given environment. It can be a hairpin, a hammerhead, or any other structure: the qualitative behavior of the system does not depend on this choice. We compare every folded structure with the target structure by means of the base pair distance $d_i$, defined as the number of base pairs that have to be opened and closed to transform a given structure into the target structure (Hofacker et al. 1994). The closer a secondary structure is to the target structure, the higher the probability $p(d_i)$ that the corresponding sequence $i$ replicates:

$$p(d_i) = \frac{\exp(-\beta d_i)}{\sum_{j=1}^{N} \exp(-\beta d_j)} \qquad (4.1)$$
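The replication step described above, with selection according to Eq. (4.1) followed by mutation, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: folding is passed in as a function (`fold` is a hypothetical parameter; with the ViennaRNA Python bindings one could use, e.g., `lambda s: RNA.fold(s)[0]`), offspring are drawn by Wright–Fisher sampling, a mutated nucleotide is assumed always to change to a different one, and the base pair distance is computed as the number of pairs present in only one of the two structures.

```python
import math
import random
from typing import Callable, List, Set, Tuple

NUCLEOTIDES = "ACGU"


def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Set of (i, j) base pairs in a balanced dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def base_pair_distance(s1: str, s2: str) -> int:
    """Number of pairs that must be opened or closed to turn s1 into s2."""
    return len(base_pairs(s1) ^ base_pairs(s2))


def selection_probabilities(distances: List[int], beta: float) -> List[float]:
    """Eq. (4.1): p(d_i) = exp(-beta d_i) / sum_j exp(-beta d_j)."""
    d_min = min(distances)  # shifting every exponent by d_min cancels in the ratio
    weights = [math.exp(-beta * (d - d_min)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def mutate(seq: str, mu: float, rng: random.Random) -> str:
    """Each nucleotide mutates with probability mu to a different, random nucleotide."""
    out = []
    for nt in seq:
        if rng.random() < mu:
            nt = rng.choice([x for x in NUCLEOTIDES if x != nt])
        out.append(nt)
    return "".join(out)


def next_generation(population: List[str], fold: Callable[[str], str], target: str,
                    beta: float, mu: float, rng: random.Random) -> List[str]:
    """One nonoverlapping generation: Wright-Fisher sampling of N parents, then mutation."""
    distances = [base_pair_distance(fold(s), target) for s in population]
    probs = selection_probabilities(distances, beta)
    parents = rng.choices(population, weights=probs, k=len(population))
    return [mutate(s, mu, rng) for s in parents]
```

Iterating `next_generation` with `beta = 2 / n` (the choice made in this chapter) and tracking the distances over time yields the quantities $d$ and $\rho$ discussed below.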

The parameter $\beta$ denotes the selective pressure and is here chosen as $\beta = 2/n$. Generations in our simulations are nonoverlapping, and the offspring generation is calculated according to Wright–Fisher sampling. Two relevant quantities that characterize the state of the population are the average distance $d = \sum_{i=1}^{N} d_i / N$ to the target structure and the fraction $\rho$ of structures in the population folding exactly into the target structure. Because of the persistent action of mutation, both quantities fluctuate in time even after reaching the asymptotic regime. Therefore, we perform averages over long time intervals (and different realizations, starting from distinct initial RNA populations), obtaining mean values denoted by $\bar d$ and $\bar\rho$, respectively.

In order to quantify collective properties of the molecular ensemble, we first determine the consensus sequence of the population, given, for each position along the sequence, by the most frequent type of nucleotide found within the population. In real RNA molecular and viral quasispecies, the consensus sequence is obtained by means of population sequencing (Thurner et al. 2004; Simmonds et al. 2004; Domingo 2006), and it does not necessarily correspond to any of the individual sequences present in the population. It is straightforward to fold the consensus sequence, obtain its structure, and determine whether this structure coincides with the target structure. At each time step we count either one, corresponding to coincidence, or zero otherwise. Averages over time (and realizations) of this binary variable yield $\bar\rho_C$, which corresponds to the probability that, at a randomly chosen time step, the structure of the consensus sequence coincides with the target structure. We further define a consensus structure. It is calculated by determining, for each position along the molecule, the most frequent structural state found within the population, i.e., unpaired “.”, paired upstream “(”, or paired downstream “)”. Due to this definition, the consensus structure does not necessarily represent a valid secondary structure of an RNA molecule. This procedure is hence fundamentally different from assigning a consensus structure to an alignment of sequences (Hofacker et al. 2002). Averages over time (and realizations) of the coincidence between the consensus structure and the target structure yield the probability $\bar\rho_S$.

Within this model, evolution takes place in the following way: sequences that fold into structures similar to the target structure replicate with higher probability, and their fraction in the population increases. Mutation introduces diversity and enables the system to find structures that are closer to the target, and finally to find and fix the target structure. Starting from a random set of sequences, we can distinguish several phases of evolution. In the search phase, $d$ decreases while $\rho = 0$. This phase ends at generation $g_A$, when a molecule folds into the target structure for the first time. Then the fixation phase begins, where, on average, $d$ still decreases and $\rho$ increases. However, due to the stochastic nature of mutation, and hence in particular for the large mutation rates explored further below, the population may lose the target structure again (and $\rho$ drops back to zero).
If $\rho$ does not drop to zero for 500 consecutive generations, we say that the target structure has been fixed at generation $g_F$. Then the asymptotic regime is reached, where $d$ and $\rho$ fluctuate around constant values, corresponding to a mutation–selection equilibrium. If the mutation rate $\mu$ is too large, the population is unable to maintain the target structure. In the absence of an analytic theory for the system we are studying, we determine the fixation threshold as the value $\mu_F$ at which the curve $g_F(\mu)$ diverges.

Having defined the main quantities that describe the population, we show the results in Fig. 4.2. They were obtained from simulations of a system of $N = 1{,}000$ RNA molecules of length $n = 30$ nt evolving toward a hairpin structure. In (a) we show the curves for $\bar\rho$, $\bar\rho_C$, and $\bar\rho_S$. The quantity $\bar\rho$ describes the fundamental property of a quasispecies at mutation–selection equilibrium. For small $\mu$, $\bar\rho$ takes maximal values. This means that a population contains the largest fraction of correctly folded molecules if it evolves at small mutation rates. As $\mu$ increases, $\bar\rho$ decreases monotonically until it approaches zero. To determine the fixation threshold, we look at Fig. 4.2b, where we show the curves of the search time and of the search plus fixation time. The solid curve represents the search time. We observe that for small $\mu$ finding the target structure is difficult because little diversity is introduced and the search process is slow. Therefore, fixation takes a long time. As $\mu$ increases, the diversity introduced into the population becomes larger, and both the search and the search plus fixation times decrease. However, fixation becomes difficult if $\mu$ is too large, and the curves for search and for search plus fixation start to deviate from each other. The search plus fixation time $g_F$ (dotted curve) diverges around $\mu \approx 0.045$, where we approximately locate the fixation threshold for this $n$ and target structure. This means that while the population shows the largest $\bar\rho$ for small $\mu$ and the highest degree of diversity close to the fixation threshold, the search and fixation times are optimized for intermediate mutation rates around $\mu \approx 0.025$, well below the fixation threshold.

[Fig. 4.2 (image): (a) $\bar\rho$, $\bar\rho_C$, and $\bar\rho_S$ (0 to 1) vs. $\mu$ (0 to 0.08); (b) $g_A$ and $g_F$ (0 to 300 generations) vs. $\mu$ (0 to 0.08)]

Fig. 4.2 (a) Asymptotic properties of a population of size $N = 1{,}000$ and molecules of length $n = 30$ nt as a function of the mutation rate $\mu$. Displayed are the average fraction of correctly folded structures $\bar\rho$ and the quantities $\bar\rho_C$ and $\bar\rho_S$. Averaging has been performed over 4,000 generations and 20 realizations, disregarding the first 2,000 generations. (b) Search time $g_A$ and search plus fixation time $g_F$. We locate the fixation threshold where $g_F$ diverges. Averaging has been performed over 200 realizations. The population evolves toward a hairpin target structure given by ..((((((...(((...)))....)))))) in bracket notation

Coming back to Fig. 4.2a, we now look at the curves for $\bar\rho_C$ and $\bar\rho_S$. The curve of $\bar\rho_C$ lies above the curve of $\bar\rho$ for all considered mutation rates. This means that, based on the information of the consensus sequence alone, one may overestimate the evolutionary success. This effect is observed both below and above the fixation threshold. For example, for $\mu = 0.05$, where only 0.5% of the sequences fold into the target structure, and only intermittently, the probability that the consensus sequence folds into the target structure is still 18%. Consequently, the population remains close to sequences that actually fold into the target structure, although it is unable to fix it. This is related to the fact that at least part of the population descends from the same sequence, so these molecules are closely related to each other. Note that the probability that a sequence of the population folds into the target structure differs from the probability that the consensus sequence does. Since consensus sequences are readily obtained from molecular or viral quasispecies, this difference should be taken into account. Considering now the curve for $\bar\rho_S$, we observe a qualitatively different behavior: for $\mu < 0.025$, the probability that the consensus structure coincides with the target structure is practically one, while for $\mu > 0.025$, it approaches zero.
For small $\mu$, this effect is easily explained: the weight of all the correctly folded molecules is strong enough to keep $\bar\rho_S$ high. But in Stich et al. (2007), we showed that even when neglecting the correctly folded molecules, and for large mutation rates, the remaining sequences contain a sufficiently large fraction of molecules whose structure is similar to the target structure. An analogous effect is known for random sequences: in a small neighborhood of a given sequence, the most probable structures are identical or very similar to the structure of the reference sequence (Fontana et al. 1993). Even where $\rho_S = 0$, the distribution of the structural states along the chain may still resemble the target structure, and the positions where the concordance is broken correspond to positions that are actually less stable. While $\bar\rho_C$ captures the similarity among the sequences and $\bar\rho_S$ the similarity among the structures, both quantities take higher values than $\bar\rho$ for most of the mutation rates, in spite of the fact that selection acts upon structure (not sequence) and that the corresponding fitness landscape is rough. This means that the population retains relevant structural information in a distributed fashion even above the fixation threshold. This represents a strong structural robustness and suggests that certain functional RNA secondary structures may effectively withstand high mutation rates (Stich et al. 2007).
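The consensus sequence and the consensus structure defined in this section are both column-wise majorities, so one helper covers both, and the time averages behind $\bar\rho_C$ and $\bar\rho_S$ are fractions of time steps with a match. The following Python sketch is illustrative only: the function names are hypothetical, and breaking ties in favor of the symbol encountered first is our assumption, since the text does not specify a tie rule. The final line shows why a consensus structure need not be a valid secondary structure.

```python
from collections import Counter
from typing import List, Sequence


def column_consensus(strings: Sequence[str]) -> str:
    """Most frequent symbol at each position; ties go to the symbol seen first."""
    length = len(strings[0])
    return "".join(
        Counter(s[pos] for s in strings).most_common(1)[0][0] for pos in range(length)
    )


def coincidence_fraction(structures_over_time: List[str], target: str) -> float:
    """Time average of the 0/1 indicator 'structure equals target' (cf. rho_C, rho_S)."""
    hits = sum(1 for s in structures_over_time if s == target)
    return hits / len(structures_over_time)


if __name__ == "__main__":
    # Three valid structures whose column-wise consensus is unbalanced.
    print(column_consensus(["(...)..", "(.....)", "......."]))  # prints (......
```

Applied to the sequences of a population, `column_consensus` gives the consensus sequence (which is then folded); applied to their folded structures, it gives the consensus structure.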

4.4 Phenotypic Effect of Mutations

In the last section, we discussed the optimal mutation rate to promote adaptation in an evolving system. Here, we calculate the distribution of the effects of mutations on fitness and the relative fractions of beneficial and deleterious mutations (Stich et al. 2010). It is important to recall that the effect of mutations on the phenotype depends on the genomic and population context. We explore two different situations: the mutation–selection equilibrium (equilibrated population) and the first stages of the adaptation process (adapting population). Here, we consider a population of $N = 1{,}000$ molecules of length $n = 50$ nt evolving toward a hairpin target structure. The change in fitness of an RNA sequence under replication is quantified by the change in distance to the target structure, i.e., by $\Delta_{ij} = d_i - d_j$, where $i$ denotes the mother and $j$ the daughter sequence. Hence, for $\Delta_{ij} > 0$ ($\Delta_{ij} < 0$), the mutations lead to an increase (decrease) in fitness and are therefore beneficial (deleterious). If $\Delta_{ij} = 0$, either no mutation occurred or the mutations had no effect on fitness (were neutral). Summing over the $N$ values of $\Delta_{ij}$ at each generation (and over generations and realizations as specified below), we obtain a probability distribution $\Pi(\Delta)$ of the changes in fitness. In Fig. 4.3a, we show the distributions $\Pi(\Delta)$ for three different mutation rates, obtained for populations at mutation–selection equilibrium. The part of the distribution with the largest weight represents replication events with no or neutral mutations ($\Delta = 0$). For a very low mutation rate, negative fitness changes strongly dominate over positive ones, and hence beneficial mutations are rare. As the mutation rate increases, the curves move up for both positive and negative $\Delta$, since there are more mutation events. Although beneficial mutations in particular occur more often, negative fitness effects still dominate in absolute numbers.
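Estimating $\Pi(\Delta)$ amounts to histogramming $\Delta_{ij} = d_i - d_j$ over all recorded replication events and normalizing. A minimal Python sketch, assuming the events have been stored as (mother distance, daughter distance) pairs; the function name and this data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def fitness_change_distribution(events: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Normalized histogram Pi(Delta) of Delta = d_mother - d_daughter.

    Delta > 0: the daughter is closer to the target (beneficial);
    Delta < 0: the daughter is farther from it (deleterious);
    Delta = 0: no mutation or only neutral mutations.
    """
    counts = Counter(d_mother - d_daughter for d_mother, d_daughter in events)
    total = sum(counts.values())
    return {delta: n / total for delta, n in sorted(counts.items())}


if __name__ == "__main__":
    events = [(5, 5), (5, 3), (4, 6), (2, 2)]
    print(fitness_change_distribution(events))  # {-2: 0.25, 0: 0.5, 2: 0.25}
```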

M. Stich et al.

[Fig. 4.3, plot panels: (a, c) Π(Δ) versus Δ (from −30 to 30, logarithmic probability axis) for μ = 5×10⁻⁴, 1×10⁻² and 4×10⁻²; (b, d) p, q and Π(0) versus μ (from 10⁻⁵ to 10⁻¹), with the fixation threshold μ_F marked]

Fig. 4.3 Phenotypic changes of mutations for optimized (a, b) and adapting (c, d) populations. (a) Probability distribution Π(Δ) obtained from 300 generations in the asymptotic regime for three different values of μ. (b) Beneficial (q) and deleterious (p) phenotypic mutation rates as a function of the microscopic mutation rate μ for optimized populations. Replication events without fitness change are given by Π(0). (c) As (a), but for adapting populations (probability distributions obtained from the first 50 generations and 6 different realizations). (d) As (b), but for adapting populations. The thin curves denote the curves from (b). The target structure is ((((.......(((((.(((((......))))).))))).......)))) in bracket notation. After Stich et al. (2010)

From the distribution Π(Δ), the fraction of beneficial changes q and the fraction of deleterious changes p are calculated as follows:

$$q = \int_{0^+}^{\infty} \Pi(\Delta)\,d\Delta, \tag{4.2}$$

$$p = \int_{-\infty}^{0^-} \Pi(\Delta)\,d\Delta. \tag{4.3}$$

These quantities represent the beneficial and deleterious phenotypic mutation rates, which should not be confused with the microscopic mutation rate μ. By definition, p + q + Π(0) = 1.
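Assuming the structure distances d_i are integers (as common secondary-structure distances are), Δ is integer-valued and Eqs. (4.2) and (4.3) reduce to sums over Δ > 0 and Δ < 0. The sketch below assumes Π(Δ) is stored as a dictionary from integer Δ to probability; the function name and the example numbers are hypothetical and do not reproduce results from Stich et al. (2010).

```python
def phenotypic_rates(distribution):
    """Return (p, q, pi0) from a discrete distribution {Delta: probability}.

    p sums the mass at Delta < 0 (deleterious), q the mass at Delta > 0
    (beneficial), and pi0 is the mass at Delta = 0 (no fitness change).
    """
    p = sum(prob for delta, prob in distribution.items() if delta < 0)
    q = sum(prob for delta, prob in distribution.items() if delta > 0)
    pi0 = distribution.get(0, 0.0)
    if abs(p + q + pi0 - 1.0) > 1e-9:
        raise ValueError("distribution must be normalised")
    return p, q, pi0


# Made-up distribution for illustration only.
example = {-3: 0.05, -1: 0.15, 0: 0.75, 1: 0.04, 2: 0.01}
p, q, pi0 = phenotypic_rates(example)
print(f"p = {p:.2f}, q = {q:.2f}, Pi(0) = {pi0:.2f}, p/q = {p / q:.1f}")
# p = 0.20, q = 0.05, Pi(0) = 0.75, p/q = 4.0
```

The normalisation check enforces p + q + Π(0) = 1, so an unnormalised histogram is rejected rather than silently producing rates that do not sum to one.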

4 Populations of RNA Molecules as Computational Model for Evolution


How q and p depend on μ is depicted in Fig. 4.3b. For low mutation rates, p is more than two orders of magnitude larger than q. As μ increases, both p and q increase, although p > q for all μ, in particular for mutation rates below the fixation threshold, which for this n and target structure is located at approximately μ_F = 0.02. As μ increases, the fraction of replication events with no change in fitness, given by Π(0), decreases. The ratio p/q decreases from more than two orders of magnitude to less than one order of magnitude close to μ_F. This reflects the fact that the higher the mutation rate at which a population has reached mutation–selection equilibrium, the lower the fraction of correctly folded molecules, and hence the more probable beneficial mutations become. However, these beneficial mutations do not increase the degree of adaptation of the population, because they have difficulty becoming fixed at high error rates. In Fig. 4.3c,d, we show the distribution Π(Δ) and the functional behavior of (p, q) = f(μ) for adapting populations. In this case, fitness changes are measured before the target structure has been found. The distributions Π(Δ) behave in a qualitatively similar way, although quantitative differences from Fig. 4.3a can be seen, e.g., for μ = 0.0005: the range of negative Δ is smaller than for an equilibrated population, so strongly deleterious mutations are absent, and the overall level of deleterious mutations is also lower. At the same time, beneficial mutations are more common. This can be explained by the fact that, since the population is still relatively far from the target, mutations that drive a sequence even further away are less likely. For the same reason, mutations that have a positive effect on fitness are more probable. Figure 4.3d summarizes the results: in an adapting population, p is smaller and q is larger than at equilibrium, although these differences become much smaller as the error rate increases.
However, in all cases there are still more deleterious mutations than beneficial ones. Again, both phenotypic mutation rates increase as m increases, while replication events without phenotypic change decrease.

4.5 Summary

Here, we have presented recent results obtained with RNA populations as a computational model to explore and understand evolutionary processes, exploiting the complex underlying sequence–structure–function relationship of RNA molecules. In the first section, we presented observations on the structural repertoire of random RNA sequences (Stich et al. 2008). One important result is that simple structures such as stem–loops and hairpins dominate in pools of short sequences. This finding, together with other results and arguments, allowed us to devise a stepwise model of modular evolution for the origin of the RNA world (Briones et al. 2009). In the second section, we introduced an algorithm of RNA evolution in silico (Stich et al. 2007). After characterizing the asymptotic state of the population (at mutation–selection equilibrium), we showed that search and fixation times are optimized for intermediate mutation rates, far both from the fixation threshold, where the creation of diversity is maximal, and from the regime of low mutation rates, where evolutionary success (in terms of correctly folded molecules) is optimized. These results have important implications for the adaptability of virus and replicator populations, which, owing to the continuously changing selective pressures they experience, need the capability to adapt rapidly; this capability can be obtained by the selection of high mutation rates. However, the difficulties for the fixation of beneficial mutations, together with the low fitness values attained when replication takes place at mutation rates close to the error threshold, suggest that viral quasispecies operate at considerably lower mutation rates. Furthermore, close to and even beyond the fixation threshold, RNA populations show clear signatures of the target structure they try to approach (Stich et al. 2007). For example, a population that contains practically no molecule folding into the correct structure may, as a whole, still harbor the target structure as the structure of its consensus sequence. This demonstrates that the evolutionary success of the population is more robust than suggested by the spectrum of its mutants alone. Finally, we have established a connection between the microscopic mutation rate μ and the phenotypic mutation rates p and q (Stich et al. 2010). These mutation rates are used in phenomenological models of population dynamics and also in fitting models to experimental data (Eyre-Walker and Keightley 2007). We find that adapting populations have a much larger fraction of beneficial mutations than equilibrated ones, especially for small mutation rates. Furthermore, we have shown that increases in μ do not cause linearly proportional increases in p and q, as is often assumed in simple models of population evolution. In summary, our results encourage the combined approach of experimental research and computational modeling for studying molecular evolution.
Acknowledgments The authors acknowledge support from Spanish MICINN through projects FIS2008-05273 and BIO2007-67523, from INTA, and from Comunidad Autónoma de Madrid, project MODELICO (S2009/ESP-1691).

References

Biebricher CK, Eigen M (2005) The error threshold. Virus Res 107:117–127
Briones C, Stich M, Manrubia SC (2009) The dawn of the RNA world: Toward functional complexity through ligation of random RNA oligomers. RNA 15:743–749
Cases-González C, Arribas M, Domingo E, Lázaro E (2008) Beneficial effects of population bottlenecks in an RNA virus evolving at increased error rate. J Mol Biol 384:1120–1129
Domingo E (ed) (2005) Virus entry into error catastrophe as a new antiviral strategy. Virus Res 107:115–228
Domingo E (ed) (2006) Quasispecies: concept and implications for virology. Springer, Berlin
Eigen M (1971) Self-organization of matter and the evolution of biological macromolecules. Naturwissenschaften 58:465–523
Eyre-Walker A, Keightley PD (2007) The distribution of fitness effects of new mutations. Nat Rev Genet 8:610–618
Fontana W, Konings DAM, Stadler PF, Schuster P (1993) Statistics of RNA secondary structures. Biopolymers 33:1389–1404
Gevertz J, Gan HH, Schlick T (2005) In vitro RNA random pools are not structurally diverse: a computational analysis. RNA 11:853–863
Grüner W, Giegerich R, Strothmann D, Reidys C, Weber J, Hofacker IL, Stadler PF, Schuster P (1996) Analysis of RNA sequence structure maps by exhaustive enumeration. I. Neutral networks. Monatsh Chem 127:355–374
Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P (1994) Fast folding and comparison of RNA secondary structures. Monatsh Chem 125:167–188
Hofacker IL, Fekete M, Stadler PF (2002) Secondary structure prediction for aligned RNA sequences. J Mol Biol 319:1059–1066
Huang W, Ferris JP (2003) Synthesis of 35–40 mers of RNA oligomers from unblocked monomers. A simple approach to the RNA world. Chem Commun 12:1458–1459
Huang W, Ferris JP (2006) One-step, regioselective synthesis of up to 50-mers of RNA oligomers by montmorillonite catalysis. J Am Chem Soc 128:8914–8919
Huynen MA, Stadler PF, Fontana W (1996) Smoothness within ruggedness: the role of neutrality in adaptation. Proc Natl Acad Sci USA 93:397–401
Johnston WK, Unrau PJ, Lawrence MS, Glasner ME, Bartel DP (1999) RNA-catalyzed RNA polymerization: accurate and general RNA-templated primer extension. Science 292:1319–1325
Joyce GF (2004) Directed evolution of nucleic acid enzymes. Annu Rev Biochem 73:791–836
Kim N, Gan HH, Schlick T (2007) A computational proposal for designing structured RNA pools for in vitro selection of RNAs. RNA 13:478–492
Knight R, De Sterck H, Markel R, Smit S, Oshmyansky A, Yarus M (2005) Abundance of correctly folded RNA motifs in sequence space, calculated on computational grids. Nucleic Acids Res 33:5924–5935
Loeb LA, Essigmann JM, Kazazi F, Zhang J, Rose KD, Mullins JI (1999) Lethal mutagenesis of HIV with mutagenic nucleoside analogs. Proc Natl Acad Sci USA 96:1492–1497
Manrubia SC, Briones C (2007) Modular evolution and increase of functional complexity in replicating RNA molecules. RNA 13:97–107
Puerta-Fernández E, Romero-López C, Barroso-delJesús A, Berzal-Herranz A (2003) Ribozymes: recent advances in the development of RNA tools. FEMS Microbiol Rev 27:75–97
Sabeti PC, Unrau PJ, Bartel DP (1997) Accessing rare activities from random RNA sequences: the importance of the length of molecules in the starting pool. Chem Biol 4:767–774
Schuster P, Stadler PF (1994) Landscapes: complex optimization problems and biopolymer structures. Comput Chem 18:295–324
Schuster P, Fontana W, Stadler PF, Hofacker IL (1994) From sequences to shapes and back: a case study in RNA secondary structures. Proc R Soc Lond B Biol Sci 255:279–284
Sierra S, Dávila M, Lowenstein PR, Domingo E (2000) Response of foot-and-mouth disease virus to increased mutagenesis. J Virol 74:8316–8323
Simmonds P, Tuplin A, Evans DJ (2004) Detection of genome-scale ordered RNA structure (GORS) in genomes of positive-stranded RNA viruses: implication for virus evolution and host persistence. RNA 10:1337–1351
Stich M, Briones C, Manrubia SC (2007) Collective properties of evolving molecular quasispecies. BMC Evol Biol 7:110
Stich M, Briones C, Manrubia SC (2008) On the structural repertoire of pools of short, random RNA sequences. J Theor Biol 252:750–763
Stich M, Lázaro E, Manrubia SC (2010) Phenotypic effect of mutations in evolving populations of RNA molecules. BMC Evol Biol 10:46
Tacker M, Stadler PF, Bornberg-Bauer EG, Hofacker IL, Schuster P (1996) Algorithm independent properties of RNA secondary structure predictions. Eur Biophys J 25:115–130
Thurner C, Witwer C, Hofacker IL, Stadler PF (2004) Conserved RNA secondary structures in flaviviridae genomes. J Gen Virol 85:1113–1124
Waterman MS (1978) Secondary Structure of Single-stranded Nucleic Acids. In: Rota G-C (ed) Studies in Foundation and Combinatorics, vol 1 of: Advances in Mathematics Supplementary Studies. Academic Press, New York, pp 167–212

Chapter 5

Pseudaptations and the Emergence of Beneficial Traits

Steven E. Massey

Abstract There is increasing evidence for the emergence of some beneficial traits in biological systems in the absence of direct selection. Many of these encompass mutational robustness, which increasingly appears to arise as a byproduct of natural selection, as a consequence of the biased incremental change of complex biological systems. Understanding the emergence of robustness in disparate biological systems is facilitated by the use of graph theory and the concept of connectivity. A particular case that is explored here is that of the standard genetic code (SGC). The SGC is arranged so that mutations tend to result in conservative as opposed to radical amino acid changes, a property termed “error minimization”. A commonly cited explanation for this property is the “Adaptive Code” hypothesis, which proposes that error minimization has been directly selected for. However, it is shown that direct selection of the error minimization property is mechanistically difficult. In addition, it is apparent that error minimization may arise simply as a result of code expansion; this is termed the “emergence” hypothesis. The emergence of error minimization in the genetic code is likened to other biological examples where mutational robustness arises from the innate dynamics of complex systems; these include neutral networks and a variety of subcellular networks. The concept of “biased incrementalism” is introduced to account for the emergence of robustness in these diverse systems, while the term “pseudaptation” is used for traits that are beneficial to fitness but are not directly selected for.

S.E. Massey Biology Department, University of Puerto Rico – Rio Piedras, P.O. Box 23360, San Juan, Puerto Rico 00931, USA e-mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_5, © Springer-Verlag Berlin Heidelberg 2010

S.E. Massey

5.1 Adaptive Evolution and Natural Selection

The modern definition of an adaptation is tautological in relation to natural selection: according to Mayr’s book “What Evolution Is” (Mayr 2001), adaptations are beneficial traits that arise by natural selection or, if they occur by chance, are maintained by natural selection. From a panselectionist perspective, all beneficial phenotypes are to be regarded as adaptations arising from natural selection. However, it may be argued that the definition of “adaptation” is not inviolate; indeed, it is worth remembering that until the modern synthesis, natural selection was not widely accepted as the predominant force behind adaptive evolution, a period known as the “eclipse of Darwinism” (Huxley 1942). The aim of this work is to clarify the definition of adaptation in the context of natural selection, and to examine examples of beneficial traits that have arisen in the absence of direct selection and how they should be defined.

5.2 Emergence as a By-Product of Natural Selection

Emergence is a term used in studies of complexity to describe properties that arise from the summation of numerous individual interactions. Diverse examples include the emergence of nonrandom network topologies (in biological networks such as metabolic networks, social interaction networks such as sexual contact and scientific collaboration networks, infrastructure networks such as the Internet and power networks, and chemical networks; Gleiss et al. 2001; Albert and Barabasi 2002), weather features such as hurricanes, and Adam Smith’s “invisible hand” that self-regulates the market. In biological systems, emergent properties may be directly selected for; examples include termite mounds, the shoaling behavior of fish, and the ability of ant colonies to solve geometric problems, such as finding the shortest route to a food source. In contrast, this chapter addresses cases where beneficial traits emerge in the absence of direct selection.

5.3 “Pseudaptation” as a Descriptor of Beneficial Traits That Arise in the Absence of Direct Selection

The term “spandrel” was coined to describe phenotypes that arise without the direct agency of natural selection (Gould and Lewontin 1979; Gould 1997). However, the definition leaves unclear whether these traits are beneficial to fitness. It is therefore proposed that the term “spandrel” be used for phenotypes that arise nonadaptively, as a side-product of natural selection, and that are not clearly beneficial to fitness. Because this work is devoted to beneficial traits that are not directly selected for, a term is required for such phenomena; the term “pseudaptation” is suggested (Massey 2010).

5 Pseudaptations and the Emergence of Beneficial Traits

The prefix “pseud” is used to indicate the potential tendency to misinterpret such traits as true adaptations resulting from natural selection. In contrast, therefore, “adaptations” are beneficial traits that result from the agency of natural selection. The vast majority of beneficial traits are expected to be true adaptations.

5.4 The Genetic Code as a Case Study

The standard genetic code (SGC) will be used as a case study to illustrate how a pseudaptation may emerge in a complex system. The assignment of amino acids to codons in the SGC is such that proteins are remarkably robust to the deleterious effects of mutations and transcriptional/translational errors, compared with randomly generated genetic codes (Alff-Steinberger 1969; Di Giulio 1989; Haig and Hurst 1992; Ardell 1998; Freeland et al. 2000; Gilis et al. 2001; Goodarzi et al. 2004, etc.). This property of the SGC is termed “error minimization” (EM) and results in a tendency for conservative as opposed to radical amino acid substitutions (Fig. 5.1). EM can be expressed mathematically by the “EM value”. This is a

Fig. 5.1 The influence of the structure of the standard genetic code on the proportions of conservative or radical amino acid substitutions. There are 75 different amino acid substitutions that can result from a single point mutation, due to the structure of the SGC. The similarity of the amino acids separated by a single point mutation was defined according to the Grantham matrix. The proportion of substitutions that corresponded to different Grantham values was binned accordingly. The chart shows a strong skew toward conservative substitutions

parameter that calculates the average difference between two amino acids arising from a nonsynonymous mutation and is defined as follows (Massey 2008):

$$\mathrm{EM} = \frac{1}{61}\sum_{i=1}^{61}\left(\frac{1}{N_t}\sum_{N=1}^{N_t} d_{Ni}\right),$$

where i runs over the 61 sense codons, N_t is the total number of sense codons separated by a single point mutation from the ith codon under consideration, and d_Ni is the physicochemical distance between the amino acid coded for by the ith sense codon and that coded for by its Nth sense point-mutation neighbor, according to the 20 × 20 Grantham physicochemical similarity matrix (Grantham 1974). The smaller the value between two amino acids in the Grantham matrix, the more similar they are; thus, the smaller the EM value, the larger the extent of EM in a genetic code. The EM value of the SGC is 60.7, while that of a randomly generated code is, on average, 74.5. Only 0.03% of randomly generated genetic codes possess EM values equal to or better than that of the SGC, an indication of the remarkable optimization of the SGC (Massey 2008a). Thus, the code is near optimal for the property of EM. The EM value of the SGC can be understood as representing the average connectivity of all the codons. Figure 5.2 shows how a typical codon may be represented as the node of a graph, with edges representing point mutations to different codons. Each codon may be represented this way; thus, the SGC may be envisaged as a graph composed of 64 nodes. The EM value represents the average connectivity of the SGC; robustness therefore arises from a maximization of the average connectivity of the code in terms of neutrality. Put another way, amino acids are assigned to codon blocks so that the likelihood of an amino acid substitution being selectively neutral is high. The property of EM is beneficial in that it limits the deleterious effects of mutations and transcriptional/translational errors. Thus, the “Adaptive Code” hypothesis proposes that the EM property is a beneficial trait that has been selected via natural selection (Freeland et al. 2000).
The Adaptive Code hypothesis implies that “code space”, the space of alternative genetic codes, was “searched” by natural selection until a near-optimal code was reached (the SGC). However, there are problems with this scenario, discussed next.
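The EM formula and the graph view of the code can be made concrete with a short sketch. It builds the standard genetic code from the usual TCAG-ordered translation table, enumerates the nine point-mutation neighbours of a codon (the edges of Fig. 5.2), and evaluates EM for any user-supplied amino acid distance. The full 20 × 20 Grantham matrix is not reproduced here, so the SGC value of 60.7 is not recomputed. As assumptions for illustration, only the six tyrosine distances quoted in the Fig. 5.2 caption are used (reproducing that caption's arithmetic, including its division by 8), and a 0/1 distance (0 for identical amino acids, 1 otherwise) stands in for the Grantham matrix in the EM call. Function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (NCBI translation table 1); "*" marks stops.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def point_mutants(codon):
    """The nine codons reachable from `codon` by a single point mutation."""
    return [codon[:pos] + base + codon[pos + 1:]
            for pos in range(3) for base in BASES if base != codon[pos]]


def em_value(code, distance):
    """EM = (1/61) * sum over sense codons i of the mean of d_Ni over the N_t
    sense neighbours of i. `distance(a, a)` must be 0, so synonymous
    neighbours contribute zero, as on the Grantham diagonal."""
    sense = [c for c, aa in code.items() if aa != "*"]
    total = 0.0
    for codon in sense:
        neighbours = [m for m in point_mutants(codon) if code[m] != "*"]
        total += sum(distance(code[codon], code[m]) for m in neighbours) / len(neighbours)
    return total / len(sense)


# Grantham distances from tyrosine, limited to the six quoted in Fig. 5.2.
GRANTHAM_FROM_Y = {"F": 22, "S": 144, "C": 194, "H": 83, "N": 143, "D": 160}
tat_edges = {m: SGC[m] for m in point_mutants("TAT")}
tat_sum = sum(GRANTHAM_FROM_Y.get(aa, 0) for aa in tat_edges.values())  # Y and "*" count 0
print(tat_edges)
print(tat_sum, tat_sum / 8)  # 746 93.25, as in the Fig. 5.2 caption

# Stand-in 0/1 distance (an assumption, not the Grantham matrix).
print(em_value(SGC, lambda a, b: 0 if a == b else 1))
```

Passing the distance as a function keeps the graph traversal separate from the physicochemical scale, so a full Grantham lookup can be dropped in without touching `em_value`.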

5.4.1 Challenges for the Adaptive Code Hypothesis to Explain the Origin of the EM Property of the SGC

There are several challenges for explaining how the EM property could have been directly selected for by natural selection. First, in order to “find” an optimal or near-optimal genetic code for the property of EM, the space of alternative codes (code space) needs to be searched. This necessitates the occurrence of “codon reassignments”, in which the amino acid identity of one or more codons is changed from one of the 20 amino acids to another. While these have occurred in nature, mainly in mitochondrial


Table 5.1 The emergence of EM in simulations of genetic code evolution

[Figure panels (i) to (v): successive stages of code expansion under the 213 model, from the four initial codon blocks V1, A2, D3 and G4 to 20 numbered codon blocks]

Columns: (1) selective criterion (amino acid difference according to the Grantham matrix); (2) average EM value of alternate codes; (3) average percentage optimization of alternate codes compared with the standard genetic code; (4) percentage of alternate codes with error minimization equal or superior to that of the standard genetic code.

(1)     (2)     (3)     (4)
<150    70.2    31.2     0.3
<140    69.6    35.5     0.3
<130    68.8    41.3     0.7
<120    67.8    48.6     1.2
<110    67.4    51.4     1.1
<100    66.9    55.1     2
<90     65.3    66.7     6.8
<80     64.2    74.6    13.6
<70     64.5    72.5    12.2
<60     62.7    85.5    21.9
<50     64      76.1    17.7
<40     65.2    67.4    10.1

Genetic code evolution was simulated according to the 213 model (Massey 2006), using a model of code expansion facilitated by gene duplication of charging enzymes and adaptor molecules (Massey 2008a). The 213 model proposes that the middle (2nd) position of the codon became informational (coding) first, followed by the 1st position and finally the 3rd position of the codon; hence “213”. The simulation was conducted as follows. Starting codons were assigned to valine, alanine, aspartate, and glycine, which were chosen for their likely ancient ancestry (Massey 2008a). Amino acids were randomly added to the expanding code according to their similarity with a parent amino acid and its requisite codon. The similarity criteria were based on the Grantham matrix and were set for each simulation. The figure shows the scheme of code expansion according to the 213 model: there are four initial amino acids and codons, and new amino acids are added to the expanding code via duplication events, randomly but according to the similarity criterion, until the structure of the SGC is achieved. The similarity criteria under which code evolution was simulated are shown in the left-hand column of the table. 10,000 codes were simulated for each selective criterion. The average EM values of the alternative codes are displayed, along with the optimality of the alternative codes compared with the SGC. Table and figure reproduced from Massey (2008a)


Fig. 5.2 Connectivity of a codon in the genetic code. The connectivity of the tyrosine codon TAT is shown; the arrows represent point mutations that lead to a different sense codon. The average value of a point mutation of the TAT codon is 93.25, resulting from (22 + 144 + 0 + 0 + 0 + 83 + 143 + 160 + 194)/8, where mutations to stop codons and synonymous mutations are counted as zero; the values are taken from the Grantham matrix. All codons in the SGC are likewise connected; thus, the SGC may be viewed as a regular network or graph of 64 nodes

genomes, they are extremely rare in free-living genomes. There are two main mechanisms for codon reassignment: the Codon Capture mechanism (Osawa and Jukes 1989) and the Ambiguous Intermediate mechanism (Schultz and Yarus 1994). The Codon Capture mechanism proposes that an AT- or GC-rich codon (or codons) was initially lost under extreme GC/AT pressure and, on its reappearance in the genome after reversal of the GC/AT pressure, was reassigned to code for a different amino acid. The Ambiguous Intermediate mechanism proposes that a codon (or codons) undergoes a period of coding ambiguity while it is being reassigned from one amino acid to another. The searching of code space was simulated computationally to test the plausibility of selecting for an error-minimized genetic code (Massey 2010). An initial alternative genetic code was generated by randomly assigning the 20 amino acids to the 20 codon groupings of the SGC, and its EM value was determined. Then a random codon reassignment was conducted, according to either the Codon Capture or the Ambiguous Intermediate mechanism. If the EM value of the new code was better than that of the old one, the new code was accepted and the process repeated. If not, the new code was rejected, and a new random codon reassignment was conducted on the previous code. The process was continued until the code attained an EM value equal to or better than that of the SGC. In this way, the average number of codon reassignments required to achieve EM on a par with the SGC was determined. It was found that it is very difficult to select an optimal code via the Codon Capture mechanism, with only 1.2–3.2% of searches resulting in success (Table 5.2). This effectively rules out searching of code space via the Codon Capture


Table 5.2 The number of codon reassignments required to produce a robust genetic code

Columns (a) to (c) give the average number of codon reassignments required to produce an optimal genetic code when (a) a single codon, (b) two codons, or (c) one or two codons are reassigned at a time.

Model for searching code space                                   (a)                          (b)                          (c)
1. Selection for superior EM                                     31 (SD = 10); 0 failures     23 (SD = 7); 292 failures    23 (SD = 7); 0 failures
2. Selection for superior EM, with codon adjacency constraint    31 (SD = 10); 3 failures     20 (SD = 7); 668 failures    23 (SD = 7); 0 failures
3. Selection for superior EM, with GC/AT content constraint      16 (SD = 9); 968 failures    not applicable*              not applicable*
4. Selection for superior EM, with adjacency constraint and      16 (SD = 6); 988 failures    not applicable*              not applicable*
   GC/AT content constraint

*Two purine-ending (A and G) or two pyrimidine-ending (T and C) codons cannot be reassigned under the GC/AT constraint.

1,000 random codes were generated, and for each the number of codon reassignments required to produce a code equal to or more optimized than the SGC for EM was determined. Averages were calculated from 1,000 initial codes, except for model (3), where the average was calculated from those of the 1,000 simulations that produced optimal codes. Simulations that failed to achieve an optimized genetic code are described as “failures”. Two constraints were applied to the simulations. The GC/AT constraint is that the reassigned codon should be composed either of G and C only or of A and T only: a GC-rich codon is likely to be lost in an extremely AT-biased genome, while an AT-rich codon is likely to be lost in an extremely GC-biased genome. This is part of the Codon Capture mechanism. The adjacency constraint is that the amino acid should be reassigned to a codon block adjacent to the original codon block; this follows a pattern observed in extant codon reassignments. Reproduced from Massey (2010)

mechanism. In addition, when searching of code space is simulated using the Ambiguous Intermediate mechanism, 20–31 codon reassignments are required on average to achieve an alternative code with EM equivalent to or better than that of the SGC (Table 5.2). This implies that there was a “burst” of codon reassignments up to the last universal ancestor, which possessed the SGC, and stasis since then. As it stands, the Adaptive Code hypothesis does not explain why this should be. In addition, if the code were optimized via a search through code space, then it seems that the search ceased before full optimality had been achieved


Fig. 5.3 Increase in error minimization of random alternative genetic codes, with increasing numbers of codon reassignments. The increase in error minimization of a randomly generated genetic code with increasing numbers of codon reassignments was followed according to the selective models described in Table 5.2. Each codon reassignment resulted in an increase in the EM of the code. (a) One codon reassigned; (b) two codons reassigned; (c) one or two codons reassigned. Data taken from Table 1. Reproduced from Massey (2010)

(Fig. 5.3). Again, the Adaptive Code hypothesis does not explain why this should be. Additional difficulties with the Adaptive Code hypothesis are as follows. First, when searching of code space is simulated via the Ambiguous Intermediate mechanism, the structures of the codes generated differ from that of the SGC: the code resulting from reassignments of single codons is more fragmented than the SGC; the code resulting from reassignments of pairs of codons has three amino acids (M, Q, T) with large numbers of codons distributed throughout the code; and the code resulting from reassignments of single and paired codons displays both features (Fig. 5.4). Second, no present-day codon reassignment displays improved EM (Freeland et al. 2000). It is hard to envisage how a

Fig. 5.4 Typical alternative genetic codes that have undergone optimization. Alternative genetic codes were produced by computational simulation, following the Ambiguous Intermediate mechanism. Typical code structures are displayed as follows: (a) resulting from reassignments of single codons only, having undergone 31 codon reassignments; (b) resulting from reassignments of two codons only, having undergone 23 codon reassignments; (c) resulting from reassignments of a combination of 1 or 2 codons, having undergone 23 codon reassignments. Data taken from Table 1. Reproduced from Massey (2010)


codon reassignment that improves EM can be selected for, given that every occurrence of the codon in every protein would be affected by the reassignment and most fitness-affecting changes would be expected to be deleterious, whereas improved EM, through increased robustness to mutations or to phenotypic errors, would affect only a fraction of the total number of codons. This can be expressed as follows: the maximum number of copies of codon j in the genome affected by improved EM (for genotypic mutations) is 3N_j μ_g, where N_j is the total number of copies of codon j in the genome and μ_g is the genomic mutation rate per bp. The factor of 3 accounts for the three nucleotide positions of each codon. As μ_g is very small, the maximum number of codons j affected by improved EM is also very small. In contrast, the maximum number of codons that may be adversely affected by the reassignment, if it occurs via the Ambiguous Intermediate mechanism, is N_j. Indeed, there is evidence that mitochondrial codon reassignments are deleterious (Massey and Garey 2007; Massey 2008b). This means that the potential benefit is far outweighed by the likely deleterious effect of a codon reassignment. This reasoning also applies to any improvement that a codon reassignment may confer against transcriptional/translational errors. Third, the selective mechanism implies multiple extinctions: all species with earlier, suboptimal codes would have had to undergo mass extinction, given that there are no extant organisms with codes that represent ancestors of the SGC. This problem also applies to the alternative “Emergence” hypothesis, though to a lesser degree.
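The code-space search simulated by Massey (2010) and summarized earlier in this section (random initial code, random codon reassignment, acceptance only when EM improves) can be sketched as a greedy hill climb. This is a toy under loudly stated assumptions: the Grantham matrix is replaced by absolute differences in Kyte-Doolittle hydropathy (a swapped-in distance, so the numbers are not comparable with 60.7 or 74.5), only single-codon reassignments are proposed, an amino acid is never allowed to lose its last codon, and the GC/AT and adjacency constraints of Table 5.2 are not modeled. All function names are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")  # NCBI table 1, TCAG order
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Stand-in distance (assumption, NOT the Grantham matrix): absolute difference
# of Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def point_mutants(codon):
    return [codon[:i] + b + codon[i + 1:] for i in range(3) for b in BASES if b != codon[i]]


def em_value(code):
    """Toy EM: mean hydropathy difference over sense point-mutation neighbours."""
    sense = [c for c, aa in code.items() if aa != "*"]
    total = 0.0
    for c in sense:
        nbrs = [m for m in point_mutants(c) if code[m] != "*"]
        total += sum(abs(KD[code[c]] - KD[code[m]]) for m in nbrs) / len(nbrs)
    return total / len(sense)


def random_code(rng):
    """Shuffle the 20 amino acids over the 20 amino acid codon groupings of the SGC."""
    shuffled = list(KD)
    rng.shuffle(shuffled)
    mapping = dict(zip(KD, shuffled))
    return {c: ("*" if aa == "*" else mapping[aa]) for c, aa in SGC.items()}


def search(rng, max_proposals=2000):
    """Greedy search: reassign one random sense codon to another amino acid and
    keep the change only if EM improves (lower is better). Stops when EM reaches
    the SGC value under the same toy distance, or the proposal budget runs out."""
    target = em_value(SGC)
    code = random_code(rng)
    history = [em_value(code)]
    sense = [c for c, aa in code.items() if aa != "*"]
    for _ in range(max_proposals):
        if history[-1] <= target:
            break
        codon = rng.choice(sense)
        old = code[codon]
        if list(code.values()).count(old) == 1:
            continue  # assumption: an amino acid may not lose its last codon
        code[codon] = rng.choice([aa for aa in KD if aa != old])
        new = em_value(code)
        if new < history[-1]:
            history.append(new)  # accepted reassignment
        else:
            code[codon] = old    # rejected: revert to the previous code
    return history, target


history, em_sgc = search(random.Random(1))
print(f"toy EM {history[0]:.2f} -> {history[-1]:.2f} after {len(history) - 1} "
      f"accepted reassignments; SGC under the same distance: {em_sgc:.2f}")
```

Counting only accepted reassignments mirrors the quantity reported in Table 5.2; whether the toy search reaches the SGC level depends on the seed and budget and says nothing about the Grantham-based results.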

5.4.2 The Emergence Hypothesis as an Explanation for the Origin of EM in the SGC

If selecting for an error-minimized genetic code is problematic, what is the alternative explanation for the EM property? A mechanism for its origin has been proposed based on code expansion (growth) via gene duplication of charging enzymes (aminoacyl-tRNA synthetases) and adaptor molecules (tRNAs; Massey 2008a). Crick (1968) alluded to this mechanism when he proposed that the genetic code expanded via gene duplication of charging enzymes and adaptor molecules, resulting in “similar codons coding for similar amino acids”. The 2008 study demonstrated, by simulating the process of charging enzyme duplication, that a substantial amount of EM might arise neutrally (i.e., emerge) simply as a result of the addition of similar amino acids to similar codons. Simulations were conducted on three different mechanisms of code evolution, and a substantial amount of EM arose in all three. This result implies that, whatever the actual details of code expansion (which remain to be determined), EM is likely an emergent property of code evolution, given the requirement for charging enzymes and adaptor molecules and the observation that gene duplicates are likely to possess physicochemically related substrates. The conclusion is that at least part of the EM property arose neutrally and was not directly selected for, and therefore constitutes a pseudaptation. In the simulations, EM “emerges” from the incremental growth of the code and from the bias whereby similar amino acids are added to codons similar to a parent codon. Thus, a process of “biased incrementalism” is responsible for the emergence of mutational robustness. Biased incrementalism also appears to be responsible for the emergence of mutational robustness in scale-free networks and neutral networks, as discussed below, and may therefore be a universal process in complex systems that leads to the emergence of robustness. It is also potentially significant that all three systems, namely the SGC, scale-free networks, and neutral networks, can be represented as graphs or networks.
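The idea that EM can emerge from biased incrementalism alone can be illustrated with a deliberately crude toy model. This is not the simulation of Massey (2008a): it assumes that every codon eventually carries its own amino acid, represented by a single hypothetical physicochemical value, and that each newly assigned codon is a point-mutation neighbour of a “parent” codon whose value it inherits with Gaussian noise. No selection for EM is applied at any step; the code only measures, afterwards, how the mean squared change over point mutations compares with the expectation for a random shuffle of the same values.

```python
import itertools
import random

BASES = "UCAG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]


def neighbours(codon):
    """Codons reachable from `codon` by a single point mutation."""
    return [codon[:pos] + base + codon[pos + 1:]
            for pos in range(3) for base in BASES if base != codon[pos]]


# Every unordered single-point-mutation pair, listed once (64 * 9 / 2 = 288).
NEIGHBOUR_PAIRS = [(a, b) for a in CODONS for b in neighbours(a) if a < b]
ALL_PAIRS = list(itertools.combinations(CODONS, 2))


def grow_code(rng, sigma=1.0):
    """Hypothetical toy growth: each newly assigned codon is a point-mutation
    neighbour of an already assigned 'parent' codon and receives a property
    value close to the parent's (a crude stand-in for a duplicated charging
    enzyme acquiring a physicochemically similar amino acid)."""
    code = {rng.choice(CODONS): 0.0}
    while len(code) < len(CODONS):
        frontier = sorted({n for c in code for n in neighbours(c) if n not in code})
        child = rng.choice(frontier)
        parent = rng.choice([n for n in neighbours(child) if n in code])
        code[child] = code[parent] + rng.gauss(0.0, sigma)
    return code


def mean_sq_diff(code, pairs):
    return sum((code[a] - code[b]) ** 2 for a, b in pairs) / len(pairs)


def em_ratio(code):
    """Cost over point mutations divided by the expected cost if the same
    values were shuffled at random over codons (which equals the mean over
    all codon pairs). Values below 1 indicate error minimization."""
    return mean_sq_diff(code, NEIGHBOUR_PAIRS) / mean_sq_diff(code, ALL_PAIRS)


rng = random.Random(1)
ratios = [em_ratio(grow_code(rng)) for _ in range(50)]
mean_ratio = sum(ratios) / len(ratios)
print(f"mean EM ratio over 50 grown codes: {mean_ratio:.2f}")
```

In this toy setting the mean ratio should come out below 1, meaning the grown codes are on average more error-minimized than random rearrangements of themselves, even though nothing in `grow_code` rewards robustness. The value of `sigma` and the random seed are arbitrary choices.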

5.5 The Emergence of Robustness in Other Biological Systems

5.5.1 Scale-Free Networks

There are many different types of networks in nature, from protein–protein interaction networks and neuronal networks to ecosystem food webs and social interaction networks. A common property of these networks is that they are usually scale-free. The term scale-free refers to the distribution of connections among the nodes of the network, which follows a power law, $P(k) \sim k^{-\gamma}$, where $P(k)$ is the fraction of nodes in the network having $k$ connections to other nodes and $\gamma$ is the exponent; in empirically observed networks, $\gamma$ is usually between 2 and 3. This type of distribution means that a few nodes have many connections, while most nodes have only a few. Because most nodes have only a few connections, a randomly removed node is unlikely to be a hub, so scale-free networks are robust to the random removal of nodes, although they are vulnerable to the targeted removal of hubs (Albert et al. 2000). In subcellular biological networks such as metabolic and gene networks, this results in mutational robustness, as discussed below. The scale-free property is also widespread in nonbiological systems such as the Internet, electricity distribution networks, and transportation networks. This suggests that, when observed in biological networks, the property is not a product of natural selection but an emergent property, and hence represents a pseudaptation. A widely accepted mechanism for the origin of the scale-free property is preferential attachment during the growth of a network (Barabasi and Albert 1999). According to this model, as nodes and connections are added to a growing network, a well-connected node is more likely to gain further connections. This “rich gets richer” process produces the scale-free property of the network, and it is passive in that the structure of the network is neither designed nor selected for.
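Preferential attachment is straightforward to simulate. The sketch below follows the “rich gets richer” rule described above: each new node links to m existing nodes chosen in proportion to their current degree. The seed network (a complete graph on m + 1 nodes), m = 2, the network size and the random seed are illustrative assumptions, not values taken from Barabasi and Albert (1999).

```python
import random
from collections import Counter
from statistics import median


def preferential_attachment(n_nodes, m, rng):
    """Grow an undirected network: each new node adds m edges to distinct
    existing nodes chosen with probability proportional to their degree."""
    # Seed network: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw from
    # this list is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degree_counts(edges):
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


rng = random.Random(42)
edges = preferential_attachment(2000, 2, rng)
deg = degree_counts(edges)
print("median degree:", median(deg.values()))
print("maximum degree:", max(deg.values()))
```

The many nodes with two or three connections and the few heavily connected hubs are the signature of a heavy-tailed degree distribution; estimating the power-law exponent properly would need a dedicated fitting procedure, which is omitted here.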


Preferential attachment falls within the broader concept of “biased incrementalism”, introduced above to account for the emergent property of robustness in the SGC, because new connections are added incrementally in a biased fashion (preferentially to highly connected nodes). Three biological networks that possess the scale-free property are discussed next. Metabolic networks describe the metabolism of an organism. The nodes of the network represent metabolites, while connections between nodes represent the chemical reactions, catalyzed by enzymes, that convert one metabolite into another. The overall flux of metabolic networks is robust to gene deletion (Edwards and Palsson 1999, 2000). Metabolic networks are typically scale-free (Jeong et al. 2000; Ravasz et al. 2002): most metabolites are connected to only a few other metabolites, while a few serve as “hubs” and are involved in many reactions. The robustness of metabolic networks to gene deletion may be accounted for by the scale-free property (Jeong et al. 2000). The origin of the scale-free property is unclear, but there is some evidence that it arose by preferential attachment (Light et al. 2005). This suggests that new enzymes added to metabolism by gene duplication retain some of the metabolites of the original reaction, which would mean that the most highly connected metabolites are the most ancient; there is some evidence for this (Light et al. 2005; Wagner 2006). Thus, the scale-free property of metabolic networks appears to be an emergent property, and one that benefits the organism by increasing robustness to genetic perturbation. Protein interaction networks attempt to characterize all of the protein–protein interactions present within a cell. High-throughput technologies based on the yeast two-hybrid technique or mass spectrometry allow such networks to be constructed on a large scale.
Protein interaction networks are also typically scale-free (e.g., S. cerevisiae, Wagner 2001; human, Stelzl et al. 2005). Consequently, these networks appear to be tolerant of gene deletions, i.e., they are mutationally robust (Li et al. 2006), although there is some debate as to whether the disruption of highly connected nodes is truly more deleterious than that of less connected nodes (Hahn et al. 2004). The scale-free property appears to have arisen by preferential attachment of new edges to highly connected nodes, without the direct action of natural selection (Wagner 2003; Berg et al. 2004). Transcription factor networks represent the known regulatory interactions at the transcriptional level within a cell and may be derived from sources such as molecular genetics and high-throughput technologies. A transcription factor network consists of two types of nodes, representing transcription factor genes and the genes they regulate; transcription factor nodes are connected by directed edges to regulated gene nodes. These networks are scale-free in humans (Rodriguez-Caso et al. 2005), E. coli, and yeast (Maslov and Sneppen 2006), and as a result they are robust to disruption (Balaji et al. 2006; Krishnan et al. 2007; Guzman-Vargas and Santillan 2008). The precise mechanisms leading to the scale-free property remain to be determined, but they appear to be related to the observation that transcription factors whose transcripts have a short half-life are more highly connected (Wang and Purisima 2005).
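The robustness attributed to scale-free structure concerns random failures; Albert et al. (2000) also showed that such networks are fragile when hubs are removed deliberately. The sketch below compares the two cases on a network grown by degree-proportional attachment. Deleting a node stands in, very loosely, for deleting a gene; the 10% removal fraction, network size and seeds are hypothetical choices, and the size of the largest connected component is only one possible measure of network integrity.

```python
import random
from collections import defaultdict


def preferential_attachment(n_nodes, m, rng):
    """Degree-proportional growth, returned as an adjacency map."""
    adj = defaultdict(set)
    endpoints = []
    for i in range(m + 1):               # seed: complete graph on m + 1 nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return dict(adj)


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                     # iterative depth-first search
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


rng = random.Random(7)
adj = preferential_attachment(2000, 2, rng)
k = len(adj) // 10                       # delete 10% of the nodes
random_failures = rng.sample(sorted(adj), k)
hub_attack = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
lc_random = largest_component(adj, random_failures)
lc_hubs = largest_component(adj, hub_attack)
print("largest component after random failures:", lc_random)
print("largest component after removing hubs:", lc_hubs)
```

Random failures mostly strike the many poorly connected nodes and leave the largest component largely intact, whereas removing the same number of hubs fragments it much more. This asymmetry is why the debate noted above, over whether hub disruption is truly more deleterious, matters for interpreting the biological data.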

5.5.2 Neutral Networks

Neutral networks are hyperdimensional regions of sequence space, theoretically applicable to both RNA and protein sequences, that are represented using graph theory. Nodes in the network represent individual sequences, while connections between nodes represent neutral mutations leading from one sequence to another. Sequences are likely to migrate to highly connected regions of the neutral network simply by chance (van Nimwegen et al. 1999; Wilke 2001; Fig. 5.5). By definition, sequences residing in such regions have a greater number of connections, and thus a greater proportion of potential mutations that are neutral, which increases their mutational robustness. Hence, the stochastic migration of sequences to highly connected regions of a neutral network is expected to result in the passive emergence of mutational robustness in the absence of direct selection; this constitutes a pseudaptation. Natural selection still plays an important role, however, in forming the neutral network in the first place. The migration of sequences through sequence space to highly connected regions is consistent with the concept of biased incrementalism, in that a sequence changes incrementally, usually one point mutation at a time, and the mutations that become fixed are biased toward neutral ones.
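The claim that chance alone carries sequences toward highly connected regions can be made quantitative. For a single lineage that drifts along neutral edges as a simple random walk, the long-run probability of occupying a node is proportional to its degree, so the neutrality it experiences is $\sum d^2 / \sum d$ rather than the plain mean degree. van Nimwegen et al. (1999) found that, for a large population in which non-neutral mutants are removed, the equilibrium mean neutrality approaches the largest eigenvalue (spectral radius) of the network's adjacency matrix. The five-node star network below is a hypothetical example in the spirit of Fig. 5.5, and the power-iteration routine is a minimal illustration rather than a production eigensolver.

```python
def mean_degree(adj):
    """Average neutrality (number of neutral neighbours) over all sequences."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def random_walk_degree(adj):
    """Average neutrality seen by one lineage drifting along neutral edges
    as a simple random walk, whose stationary distribution is proportional
    to degree: sum(d^2) / sum(d)."""
    degs = [len(nbrs) for nbrs in adj.values()]
    return sum(d * d for d in degs) / sum(degs)


def spectral_radius(adj, iterations=1000):
    """Largest eigenvalue of the adjacency matrix by power iteration on A + I
    (the shift keeps bipartite networks from oscillating)."""
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in nodes}
        top = max(y.values())
        x = {v: y[v] / top for v in nodes}
    # Rayleigh quotient of A + I, then remove the shift.
    num = sum(x[v] * (x[v] + sum(x[w] for w in adj[v])) for v in nodes)
    return num / sum(x[v] ** 2 for v in nodes) - 1.0


# Hypothetical toy neutral network: one 'hub' sequence with four neutral
# neighbours, none of which has any other neutral neighbour.
star = {"hub": {"s1", "s2", "s3", "s4"},
        "s1": {"hub"}, "s2": {"hub"}, "s3": {"hub"}, "s4": {"hub"}}
print("mean degree:", mean_degree(star))                    # 8 / 5 = 1.6
print("random-walk degree:", random_walk_degree(star))      # 20 / 8 = 2.5
print("spectral radius:", round(spectral_radius(star), 6))  # sqrt(4) = 2.0
```

Both the drifting lineage and the population experience more neutrality than the network average of 1.6, which is the passive gain in robustness described in the text.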

5.5.2.1 Neutral Networks in RNA

Efficient algorithms exist to predict RNA secondary structures (Tacker et al. 1996). Simulation studies using simple RNA structures have demonstrated the existence of neutral networks in RNA molecules (Schuster et al. 1994). Computer simulations of simple RNA structures have also demonstrated the acquisition of robustness by stochastic migration to highly connected regions of the neutral network, consistent with one of the predictions of neutral network theory (van Nimwegen et al. 1999; Szollosi and Derenyi 2008). There is some evidence that viral RNAs (Huynen et al. 1993; Wagner and Stadler 1999; Sanjuan et al. 2006) and microRNAs (Borenstein and Ruppin 2006; Shu et al. 2008) are mutationally robust. The extent to which the robustness of wild-type RNA molecules has arisen by migration through neutral networks remains to be determined.

Fig. 5.5 Migration to a highly connected region of a neutral network. On the left is a simplified representation of a neutral network structure. Nodes represent sequences, while edges represent neutral point mutations between the sequences. The structure on the right represents the most likely distribution of sequences that will be achieved by simple stochastic change, with larger nodes representing more likely sequences.
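Neutral networks in RNA are usually studied with energy-based structure predictors. As a self-contained stand-in, the sketch below uses Nussinov-style base-pair maximization, a much simpler folding rule, to predict a dot-bracket structure, and then counts the fraction of single point mutants that keep exactly the same structure, a crude measure of mutational robustness (neutrality). The pairing rules (Watson–Crick pairs only), the minimum hairpin loop of three bases, the tie-breaking in the traceback and the example sequence are all assumptions made for illustration.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}  # no G-U wobble
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a pair


def fold(seq):
    """Toy base-pair maximization (Nussinov-style) returning dot-bracket.
    Real predictors minimize free energy; this is only an illustration."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Maximum number of nested pairs within seq[i..j] (inclusive).
        if j - i <= MIN_LOOP:
            return 0
        score = best(i, j - 1)                # case: j is unpaired
        for k in range(i, j - MIN_LOOP):      # case: j pairs with k
            if (seq[k], seq[j]) in PAIRS:
                score = max(score, best(i, k - 1) + 1 + best(k + 1, j - 1))
        return score

    structure = ["."] * n

    def trace(i, j):
        # Rebuild one optimal structure, preferring "j unpaired" on ties.
        if j - i <= MIN_LOOP:
            return
        if best(i, j) == best(i, j - 1):
            trace(i, j - 1)
            return
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS and best(i, j) == (
                    best(i, k - 1) + 1 + best(k + 1, j - 1)):
                structure[k], structure[j] = "(", ")"
                trace(i, k - 1)
                trace(k + 1, j - 1)
                return

    trace(0, n - 1)
    return "".join(structure)


def neutral_fraction(seq):
    """Share of single point mutants whose predicted structure is unchanged."""
    target = fold(seq)
    mutants = [seq[:i] + b + seq[i + 1:]
               for i in range(len(seq)) for b in "ACGU" if b != seq[i]]
    return sum(fold(m) == target for m in mutants) / len(mutants)


hairpin = "GGGAAAUCCC"
print(fold(hairpin))                          # (((....)))
print(round(neutral_fraction(hairpin), 3))
```

In this toy example, mutations in the stem remove a base pair and alter the predicted structure, while some loop mutations leave it unchanged; sequences with more neutral mutants sit in better-connected parts of their neutral network.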

5.5.2.2 Neutral Networks in Proteins

The concept of a neutral network was initially proposed to account for the observation that there is likely to be extensive sequence redundancy among proteins (Maynard Smith 1970); the term “neutral network” was coined by Schuster et al. (1994). There is indirect evidence for the presence of neutral networks in proteins, including the observation that proteins may retain the same structure and function while varying extensively in sequence. The existence of neutral networks has been shown by simulation studies of 2D lattice models (Govindarajan and Goldstein 1997; Bornberg-Bauer and Chan 1999; Xia and Levitt 2002). Babjide et al. (1997) also present evidence for the existence of neutral networks in wild-type proteins, using knowledge-based interaction potentials to calculate the stability of a sequence in a given 3D structure. Simulation studies of 2D lattice proteins suggest that mutational robustness may be acquired in these simplified models (Taverna and Goldstein 2002). A substantial literature demonstrates the robustness of proteins to point mutations (reviewed in Wagner 2005, Chap. 5). Whether extant proteins have acquired robustness from migration through neutral networks is unclear, but this should be a fruitful area of future research.

5.6 The Difficulty of Selecting for Mutational Robustness

Mutational robustness may take two forms: intrinsic and extrinsic (Elena et al. 2006). Intrinsic robustness is an innate property of a sequence or network, while extrinsic robustness arises from the action of an external factor, such as a heat shock protein, DNA repair, or any other homeostatic mechanism. In this chapter, we have been concerned only with intrinsic robustness. Whether natural selection can select for intrinsic mutational robustness is unclear. Theoretically, intrinsic mutational robustness could be directly selected for via group selection, whereby mutationally robust populations would outcompete less robust populations. However, evidence for such a group selection effect is elusive (Okasha 2001). High mutation rates have been proposed to lead to selection for intrinsic mutational robustness (Schuster and Swetina 1988); this has been demonstrated in digital organisms (Wilke et al. 2001), but examples from nature are scarce. The mechanistic difficulties of directly selecting for intrinsic mutational robustness are consistent with the observed emergence of mutational robustness in various biological systems in the absence of direct selection.
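The high-mutation-rate argument of Schuster and Swetina (1988), tested by Wilke et al. (2001) as “survival of the flattest”, can be captured in a back-of-envelope calculation. Assume, for illustration only, that mutations per replication are Poisson-distributed with mean U, that a fraction ν of them are neutral and all others are lethal, and that back mutation can be ignored; a genotype class with replication rate f then grows at roughly $f e^{-U(1-\nu)}$. The two genotype classes and their parameter values below are hypothetical.

```python
import math


def effective_growth(f, nu, u):
    """Replication rate f times the chance that an offspring carries no
    non-neutral (here: lethal) mutation, when mutations are Poisson with
    mean U and a fraction nu of them are neutral: f * exp(-U * (1 - nu))."""
    return f * math.exp(-u * (1.0 - nu))


def crossover_rate(f_peak, nu_peak, f_flat, nu_flat):
    """Mutation rate U at which both classes grow equally fast."""
    return math.log(f_peak / f_flat) / (nu_flat - nu_peak)


# Hypothetical genotype classes: (replication rate f, neutral fraction nu).
PEAK = (2.0, 0.1)   # fitter but mutationally fragile
FLAT = (1.5, 0.5)   # less fit but more robust

for u in (0.1, 2.0):
    wp = effective_growth(*PEAK, u)
    wf = effective_growth(*FLAT, u)
    winner = "peak" if wp > wf else "flat"
    print(f"U = {u}: peak {wp:.3f}, flat {wf:.3f} -> {winner} class grows faster")
print(f"crossover at U = {crossover_rate(*PEAK, *FLAT):.3f}")
```

Below the crossover rate, raw replication rate decides the contest; above it, the flatter, more robust class wins even though it is individually less fit. In this simplified picture, robustness is favoured only once U is high.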


A mechanism that may indirectly produce intrinsic mutational robustness is selection for robustness to phenotypic mutations (defined as errors in transcription and/or translation). Sequences that have been selected to be robust to phenotypic mutations will, as a by-product, also tend to be robust to genotypic mutations, because both kinds of error produce amino acid substitutions in the encoded protein. This process has been demonstrated experimentally (Goldsmith and Tawfik 2009). In this case, although beneficial, the property of mutational robustness is not directly selected for, and so constitutes a pseudaptation.

5.7 Is Mutational Robustness the Only Pseudaptation?

Here, several instances of mutational robustness have been described as pseudaptations, that is, beneficial traits that have not been directly selected for. While these traits may be beneficial in the short term by ameliorating the effects of deleterious mutations, mutational robustness may not be beneficial in the long term, because it reduces phenotypic variability (Lenski et al. 2006). Increased robustness is not only detrimental in the long term, however: robustness also increases the amount of neutrality, which can improve adaptability by increasing the accessibility of sequence space. An interesting example of how the robustness of the SGC may improve adaptability is explored by Zhu and Freeland (2006). This apparent contradiction may be resolved by distinguishing between genotypic and phenotypic robustness: genotypic (sequence) robustness tends to decrease adaptability by reducing variation, while phenotypic robustness tends to increase adaptability by allowing sequences to vary and thus to access novel areas of sequence space (Wagner 2008), for which there is empirical evidence (Bloom et al. 2006; Amitai et al. 2007). As described here, emergent robustness appears to arise by a process of biased incrementalism in a range of biological systems: the genetic code, neutral networks, and cellular networks. It may be worth noting that natural selection is itself a form of biased incrementalism: over successive generations, fitness increases incrementally through the biased fixation of adaptive mutations. Fitness is analogous to survivability, which could be viewed as a form of robustness. Evolvability, or adaptability, is another beneficial trait that is sometimes described as an adaptation. Evolvability refers to the ability to adapt to new environmental challenges; a lineage with enhanced evolvability is therefore more likely to survive in the long term.
Whether evolvability can be selected for is unclear, because selecting for evolvability would require selection to anticipate future challenges. An example is the evolution of sex as a mechanism to increase adaptability (reviewed in Otto and Lenormand 2002). Although enhanced evolvability will improve the success of a population over the long term, the process of selecting for evolvability remains to be demonstrated; thus, some cases may turn out to be pseudaptations that are not selected for at the individual level. For example, lateral gene transfer may increase adaptability, notably in bacteria, but whether it is a trait selected for that purpose is unclear. Likewise, human social behaviors such as worship, music, and morality are presently at the center of a debate as to whether they are adaptations (the product of natural selection) or nonadaptive in origin. The field of evolutionary psychology promises to reveal which of these behaviors were directly selected for. Behaviors that are beneficial to an individual but turn out not to have been selected for may be better described as pseudaptations. While adaptations often typify an individual species, the instances of mutational robustness characterized here as pseudaptations do not appear to: for instance, the mutational robustness of the genetic code is universal to all organisms. In addition, because emergent properties are often invisible to selection, they need not be restricted to beneficial traits but are also likely to encompass deleterious ones. Examples of such deleterious emergent traits await characterization.

Acknowledgments

This work was supported by the Biology Department, University of Puerto Rico – Rio Piedras, Puerto Rico.

References

Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47–94
Albert R, Jeong H, Barabási AL (2000) Error and attack tolerance of complex networks. Nature 406:378–382
Alff-Steinberger C (1969) The genetic code and error transmission. Proc Natl Acad Sci USA 64:584–591
Amitai G, Devi Gupta R, Tawfik DS (2007) Latent evolutionary potentials under the neutral mutational drift of an enzyme. HFSP J 1:67–78
Ardell DH (1998) On error minimization in a sequential origin of the standard genetic code. J Mol Evol 47:1–13
Babjide A, Hofacker IL, Sippl MJ, Stadler PF (1997) Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force. Fold Des 2:261–269
Balaji S, Iyer LM, Aravind L, Babu MM (2006) Uncovering a hidden distributed architecture behind scale-free transcriptional regulatory networks. J Mol Biol 360:204–212
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512
Berg J, Lassig M, Wagner A (2004) Structure and evolution of protein interaction networks: a statistical model for link dynamics. BMC Evol Biol 4:51
Bloom JD, Labthavikul ST, Otey CR, Arnold FH (2006) Protein stability promotes evolvability. Proc Natl Acad Sci USA 103:5869–5874
Bornberg-Bauer E, Chan HS (1999) Modeling evolutionary landscapes: mutational stability, topology and superfunnels in sequence space. Proc Natl Acad Sci USA 96:10689–10694
Borenstein E, Ruppin E (2006) Direct evolution of genetic robustness in microRNA. Proc Natl Acad Sci USA 103:6593–6598
Crick FHC (1968) The origin of the genetic code. J Mol Biol 38:367–379
Di Giulio M (1989) The extension reached by the minimization of polarity distances during the evolution of the genetic code. J Mol Evol 29:288–293
Edwards JS, Palsson BO (1999) Systems properties of the Haemophilus influenzae Rd metabolic genotype. J Biol Chem 274:17410–17416
Edwards JS, Palsson BO (2000) Robustness analysis of the Escherichia coli metabolic network. Biotech Prog 16:927–939


Elena SF, Carrasco P, Daros J-A, Sanjuan R (2006) Mechanisms of genetic robustness in RNA viruses. EMBO Rep 7:168–173
Freeland SJ, Knight RD, Landweber LF, Hurst LD (2000) Early fixation of an optimal genetic code. Mol Biol Evol 17:511–518
Gilis D, Massar S, Cerf NJ, Rooman M (2001) Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biol 2:11
Gleiss PM, Stadler PF, Wagner A (2001) Relevant cycles in chemical reaction networks. Adv Complex Sys 1:1–18
Goldsmith M, Tawfik DS (2009) Potential role of phenotypic mutations in the evolution of protein expression and stability. Proc Natl Acad Sci USA 106:6197–6202
Goodarzi H, Nejad HA, Torabi N (2004) On the optimality of the genetic code, with consideration of termination codons. BioSystems 77:163–173
Gould SJ (1997) The exaptive excellence of spandrels as a term and prototype. Proc Natl Acad Sci USA 94:10750–10755
Gould SJ, Lewontin RC (1979) The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc R Soc Lond B 205:581–598
Govindarajan S, Goldstein RA (1997) Evolution of model proteins on a foldability landscape. Proteins 29:461–464
Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185:862–864
Guzman-Vargas L, Santillan M (2008) Comparative analysis of the transcription-factor gene regulatory networks of E. coli and S. cerevisiae. BMC Syst Biol 2:13
Hahn MW, Conant GC, Wagner A (2004) Molecular evolution in large genetic networks: does connectivity equal constraint? J Mol Evol 58:203–211
Haig D, Hurst LD (1992) A quantitative measure of error minimization in the genetic code. J Mol Evol 33:412–417
Huxley J (1942) Evolution: the modern synthesis. MIT Press, Cambridge, Massachusetts
Huynen MA, Konings DAM, Hogeweg P (1993) Multiple coding and the evolutionary properties of RNA secondary structure. J Theor Biol 185:251–267
Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási AL (2000) The large-scale organization of metabolic networks. Nature 407:651–654
Krishnan A, Tomita M, Giuliani A (2007) Evolution of gene regulatory networks: robustness as an emergent property of evolution. Phys Stat Mech Appl 387:2170–2186
Lenski RE, Barrick JE, Ofria C (2006) Balancing robustness and evolvability. PLoS Biol 4:e428
Li D, Li J, Ouyang S, Wang J, Wu S, Wan P, Zhu Y, Xu X, He F (2006) Protein interaction networks of Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster: large-scale organization and robustness. Proteomics 6:456–461
Light S, Kraulis P, Elofsson A (2005) Preferential attachment in the evolution of metabolic networks. BMC Genom 6:159
Maslov S, Sneppen K (2006) Large-scale topological properties of molecular networks. In: Koonin E, Wolf Y, Karev G (eds) Power laws, scale-free networks and genome biology. Springer, New York
Massey SE (2006) A sequential “2-1-3” model of genetic code evolution that explains codon constraints. J Mol Evol 62:809–810
Massey SE, Garey JR (2007) A comparative genomics analysis of codon reassignments reveals a link with mitochondrial proteome size and a mechanism of genetic code change via suppressor tRNAs. J Mol Evol 64:399–410
Massey SE (2008a) A neutral origin of error minimization in the genetic code. J Mol Evol 67:510–516
Massey SE (2008b) The proteomic constraint and its role in molecular evolution. Mol Biol Evol 25:2557–2565
Massey SE (2010) Searching of code space for an error minimized genetic code via Codon Capture leads to failure, or requires at least 20 improving codon reassignments via the Ambiguous Intermediate mechanism. J Mol Evol 70:106–115


Maynard Smith J (1970) Natural selection and the concept of a protein space. Nature 225:563–564
Mayr E (2001) What evolution is. Basic Books, New York
Okasha S (2001) Why won’t the group selection controversy go away? Brit J Phil Sci 52:25–50
Osawa S, Jukes TH (1989) Codon reassignment (codon capture) in evolution. J Mol Evol 28:271–278
Otto SP, Lenormand T (2002) Resolving the paradox of sex and recombination. Nat Rev Gen 3:252–261
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297:1551–1555
Rodriguez-Caso C, Medina MA, Sole RV (2005) Topology, tinkering and evolution of the human transcription factor network. FEBS J 272:6423–6434
Sanjuan R, Forment J, Elena SF (2006) In silico predicted robustness of viroids RNA secondary structures. I. The effect of single mutations. Mol Biol Evol 23:1427–1436
Schultz DW, Yarus M (1994) Transfer RNA mutation and the malleability of the genetic code. J Mol Biol 235:1377–1380
Schuster P, Swetina J (1988) Stationary mutant distributions and evolutionary optimization. Bull Math Biol 50:635–660
Schuster P, Fontana W, Stadler PF, Hofacker IL (1994) From sequences to shapes and back: a case study in RNA secondary structures. Proc Roy Soc Lond B 255:279–284
Shu W, Ni M, Bo X, Zheng Z, Wang S (2008) In silico genetic robustness analysis of secondary structural elements in the miRNA gene. J Mol Evol 67:560–569
Stelzl U et al (2005) A human protein–protein interaction network: a resource for annotating the proteome. Cell 122:957–968
Szollosi GJ, Derenyi I (2008) The effect of recombination on the neutral evolution of genetic robustness. Math Biosci 214:58–62
Tacker M, Stadler P, Bornberg-Bauer E, Hofacker I, Schuster P (1996) Algorithm independent properties of RNA secondary structure predictions. Eur Biophys J Biophys Lett 25:115–130
Taverna DM, Goldstein RA (2002) Why are proteins so robust to site mutations? J Mol Biol 315:479–484
van Nimwegen E, Crutchfield JP, Huynen M (1999) Neutral evolution of mutational robustness. Proc Natl Acad Sci USA 96:9716–9720
Wagner A (2001) The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol 18:1283–1292
Wagner A (2003) How the global structure of protein interaction networks evolves. Proc R Soc Lond B 270:457–466
Wagner A (2005) Robustness and evolvability in living systems. Princeton University Press, Princeton
Wagner A (2006) The connectivity of large genetic networks: design, history or mere chemistry? In: Koonin E, Wolf Y, Karev G (eds) Power laws, scale-free networks and genome biology. Springer, New York
Wagner A (2008) Robustness and evolvability: a paradox resolved. Proc R Soc B 275:91–100
Wagner A, Stadler PF (1999) Viral RNA and evolved mutational robustness. J Exp Zool 285:119–127
Wang E, Purisima E (2005) Network motifs are enriched with transcription factors whose transcripts have short half-lives. Trends Gen 21:492–495
Wilke CO (2001) Adaptive evolution on neutral networks. Bull Math Biol 63:715–730
Wilke CO, Wang JL, Ofria C, Lenski RE, Adami C (2001) Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature 412:331–333
Xia Y, Levitt M (2002) Roles of mutation and recombination in the evolution of protein thermodynamics. Proc Natl Acad Sci USA 99:10382–10387
Zhu W, Freeland S (2006) The standard genetic code enhances adaptive evolution of proteins. J Theor Biol 239:63–70

Part II Genome / Molecular Evolution

Chapter 6

Transferomics: Seeing the Evolutionary Forest Using Phylogenetic Trees

John W. Whitaker and David R. Westhead

Abstract Horizontal gene transfer (HGT) is the movement of genetic material between species that would otherwise have isolated heritages. The immediate gain of a gene, or set of genes, allows traits to be acquired far more rapidly than through the gradual process of mutation and selection. The entire set of genes within a species that were acquired through HGT is known as its transferome. Studies of prokaryotic transferomes have revealed that the propensity of a gene to be transferred is related to biological network structure. Recent increases in the number of sequenced eukaryotic genomes have made it possible to analyse eukaryotic transferomes, and this has provided new insights into eukaryotic evolution. In this chapter, we review studies that have increased our understanding of transferomes.

6.1 Introduction

Inheritance is the movement of genetic information from one generation to the next. Traditionally, information flows only vertically, from parent to offspring, within the same species. This rule of inheritance is so important that it underlies the commonly used definition of a species: for two groups of organisms to be considered the same species, they must be able to interbreed and their offspring must be fertile. During traditional inheritance, new traits can be acquired within a species through the processes of mutation and selection. This has led to the idea that evolution forms a tree, with the last universal common ancestor at the base and modern-day species at the leaves. Horizontal gene transfer (HGT) breaks the vertical law of inheritance and allows genes to move between species. The gain of a gene through horizontal transfer can be of great advantage, as it allows traits to be acquired far more rapidly than through mutation. The most powerful method of predicting which genes have been acquired through horizontal transfer is the construction of phylogenetic trees (Whitaker et al. 2009c). A phylogenetic tree allows horizontally transferred genes to be identified because, for such genes, the grouping of species within the tree differs from that of the accepted taxonomy. The publication of whole genome sequences brings with it the opportunity to carry out HGT predictions on a genomic scale. The study of HGT at the genomic level is known as transferomics (Whitaker et al. 2009a); it allows the levels of gene transfer to be compared between species. Moreover, it allows gene transfer to be considered in the context of biological systems, such as metabolic networks, thus revealing the underlying evolutionary pressures that influence the process of gene transfer. In this chapter, we review several studies that have analysed transferomes in the context of biological systems. We start with some seminal work that analysed the transferomes of prokaryotes and eukaryotes, and then discuss our recent work on the transferomes of unicellular eukaryotes.

J.W. Whitaker and D.R. Westhead Institute of Molecular and Cellular Biology, Garstang Building, University of Leeds, Leeds LS2 9J, UK e-mail: [email protected]; [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_6, © Springer-Verlag Berlin Heidelberg 2010
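The phylogenetic signal described above, a gene tree whose grouping disagrees with the accepted taxonomy, can be illustrated with a toy rule. The sketch takes a gene tree written as nested tuples, finds the query gene's sister group, and flags the gene as a candidate transfer when none of those closest relatives belongs to the query's own taxonomic group. This is a hypothetical simplification for illustration, not the procedure of Whitaker et al. (2009c); the tree format, leaf names and taxon labels are invented, and a real analysis would also need rooting, branch support values and a reference taxonomy.

```python
def leaves(tree):
    """Leaf names of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def closest_relatives(tree, query):
    """Leaves of the query's sister group: the smallest clade containing the
    query and at least one other leaf, minus the query itself."""
    if isinstance(tree, str):
        return None
    for idx, child in enumerate(tree):
        if query in leaves(child):
            inner = closest_relatives(child, query)
            if inner:
                return inner
            return [leaf for k, other in enumerate(tree) if k != idx
                    for leaf in leaves(other)]
    return None


def flag_candidate_hgt(tree, query, taxon_of):
    """Hypothetical rule: flag the gene if none of the query's closest
    relatives in the gene tree belong to the query's own taxonomic group."""
    relatives = closest_relatives(tree, query)
    if not relatives:
        raise ValueError(f"{query!r} has no relatives in this tree")
    return all(taxon_of[r] != taxon_of[query] for r in relatives)


# Hypothetical example: a eukaryotic gene nested among bacterial homologues.
taxon_of = {"Euk_query": "eukaryote", "Euk_2": "eukaryote", "Euk_3": "eukaryote",
            "Bact_1": "bacteria", "Bact_2": "bacteria"}
transfer_like = ((("Euk_query", "Bact_1"), "Bact_2"), ("Euk_2", "Euk_3"))
vertical_like = (("Euk_query", "Euk_2"), ("Bact_1", "Bact_2"))
print(flag_candidate_hgt(transfer_like, "Euk_query", taxon_of))   # True
print(flag_candidate_hgt(vertical_like, "Euk_query", taxon_of))   # False
```

Because the rule looks only at the nearest clade, a single misplaced sequence or a poorly supported node can trigger a false positive, which is one reason why support values matter in practice.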

6.2 Horizontal Gene Transfer in Prokaryotes

The process of HGT has been most extensively studied in prokaryotes (Beiko et al. 2005; Lerat et al. 2005; Zhaxybayeva et al. 2006). Prokaryotes have a number of mobile genetic elements, including plasmids, transposons and intergrons, which can carry genes from one species to another. To encourage the movement of genetic elements, bacteria are able to swap DNA between cells, through conjugation, or take it up from the environment, through transformation. Furthermore, prokaryotic viruses, phages, can carry genes between prokaryotes through a process known as transduction. As prokaryotes are so well adapted to HGT it is not surprising that it occurs extensively. It has been estimated that since E. coli diverged from the Salmonella lineage, 100 million years ago, 18% of its 4,288 genes have been acquired through HGT (Lawrence and Ochman 1998). The observation of such extensive HGT has led to the suggestion that the ancestry of prokaryotes would be better represented by a network than a tree (William 1999). The genomes of prokaryotes can be split into two parts, a core genome and a dispensable genome, which together are termed the pan-genome (Tettelin et al. 2005). The core genome represents the genes that are common to all members of a species and carry out the core functions. The dispensable genome represents genes that are only partially shared between strains of a species. In Streptococcus agalactiae, the dispensable genome was estimated to make up 20% of the genes within the pan-genome (Tettelin et al. 2005). Analysis of the types of genes that are commonly transferred has brought the “complexity hypothesis”. According to the complexity hypothesis, genes which are involved in complex interactions are unlikely to be transferred, while genes that are in few interactions are more likely

6 Transferomics: Seeing the Evolutionary Forest Using Phylogenetic Trees

103

to be transferred. More particularly, “informational genes” (e.g. genes involved in transcription, translation, and related process) are less likely to be transferred than “operational genes” (e.g. genes encoding metabolic enzymes). Furthermore, analysis of HGT within prokaryote metabolic networks found that genes on the periphery of the network were more likely to be transferred than those at the core (Pal et al. 2005). Taken together, these findings have led to the suggestion that prokaryotic genomes are in a constant state of flux, with new environment specific genes being constantly acquired, allowing rapid adaptation (Thomason and Read 2006). In this process, environmentally specific genes join at the networks edge allowing adaptation of the existing network to the new environment. For example, the gain of an enzyme might enable the breakdown of a rare suger, allowing it to enter glycolysis.
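The core/dispensable partition of a pan-genome described above reduces to set operations over per-strain gene inventories. The following is a minimal sketch, assuming each strain has already been reduced to a set of gene-family identifiers. The strain names and gene inventories are hypothetical, and real pan-genome studies must first cluster genes into families, which is not shown here.

```python
def pan_genome(strains):
    """Partition gene families into core and dispensable sets.

    strains: dict mapping a strain name to the set of gene families it carries.
    Returns (pan, core, dispensable) as sets.
    """
    gene_sets = list(strains.values())
    pan = set().union(*gene_sets)          # every family seen in any strain
    core = set.intersection(*gene_sets)    # families shared by all strains
    dispensable = pan - core               # families missing from at least one strain
    return pan, core, dispensable


# Hypothetical inventories for three strains of one species
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "capX"},
    "strain_B": {"dnaA", "gyrB", "recA", "pilY"},
    "strain_C": {"dnaA", "gyrB", "recA", "capX", "tetM"},
}
pan, core, dispensable = pan_genome(strains)
print(sorted(core))         # ['dnaA', 'gyrB', 'recA']
print(sorted(dispensable))  # ['capX', 'pilY', 'tetM']
print(f"dispensable fraction: {len(dispensable) / len(pan):.0%}")
```

Under this definition, a single strain lacking a gene is enough to move that gene from the core into the dispensable genome. Adding strains can therefore only shrink the core or leave it unchanged.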

6.3

Horizontal Gene Transfer in Eukaryotes

HGT has not been studied as extensively in eukaryotes. In multicellular eukaryotes, where DNA would have to be transferred into the germ line, HGT seems unlikely to occur (Salzberg et al. 2001), although it cannot be ruled out altogether. In unicellular eukaryotes this barrier does not exist; however, they do not possess the same HGT machinery as prokaryotes, so HGT is likely to be less prevalent. Sources of HGT in eukaryotes include viruses, uptake from the environment, phagocytosis and endosymbiosis. Additionally, there are well-characterised examples of gene transfer from prokaryotes to eukaryotes, e.g. the transfer of DNA from the tumour-inducing (Ti) plasmid of Agrobacterium tumefaciens into plant cells (Chilton et al. 1977). With the sequencing of many unicellular eukaryotic genomes, it has recently become feasible to study the extent to which HGT has occurred. HGT that accompanies endosymbiosis is termed endosymbiotic gene transfer (EGT) and was important in establishing the eukaryotic organelles: mitochondria and plastids. In addition to the primary endosymbiosis events that established plastids as eukaryotic organelles, multiple further endosymbioses have occurred in unicellular eukaryotes (Reyes-Prieto et al. 2007; Yoon et al. 2005). An important example is the event, or events, which gave rise to the chromalveolates, in which a heterotrophic eukaryote gained a plastid by engulfing a plastid-containing red alga (Cavalier-Smith 1999). This brought together five genomes in one cell (two nuclear, two mitochondrial and one plastid), and with them came the opportunity for large-scale EGT (Huang et al. 2004a; Tyler et al. 2006) (see Fig. 6.1). A further potential source of EGT in eukaryotes is Chlamydia; such transfer may have occurred during the establishment of the primary plastid (Becker et al. 2008; Huang and Gogarten 2007).
Over the past few years, many studies have examined the levels of HGT in unicellular eukaryotes (Alsmark et al. 2009; Andersson et al. 2007; Carlton et al. 2007; Huang et al. 2004a, b; Nosenko and Bhattacharya 2007; Richards et al. 2006; Striepen et al. 2004). Within these studies, the genes most commonly found to have been gained through transfer are those which encode metabolic enzymes. This finding is in keeping with the complexity hypothesis (Alsmark et al. 2009). Eukaryotes with phagotrophic lifestyles have been shown to have acquired many genes through HGT. The high levels of HGT associated with phagocytosis and endosymbiosis have led to the suggestion that “you are what you eat” (Doolittle 1998): host nuclei are repeatedly exposed to genes from endosymbionts or ingested bacteria, so a gene already present in the host nucleus could be replaced by a foreign gene. When a host gene is replaced by a transferred gene with the same function, the event is termed orthologous replacement. A recent study has identified large-scale HGT from bacteria, fungi and plants into the bdelloid rotifers (Gladyshev et al. 2008). Bdelloid rotifers are asexual multicellular animals that are highly desiccation tolerant. It has been proposed that the gain of so many genes may have been facilitated by repeated desiccation and recovery. Furthermore, genetic exchange between bdelloid rotifers could be occurring by the same mechanism, which could explain how they have survived asexually for millions of years.

J.W. Whitaker and D.R. Westhead

Fig. 6.1 Secondary endosymbiosis and gene transfer. The large cell represents a primordial eukaryote that initially does not possess a plastid. The smaller cell represents an alga that does possess a plastid. The two nuclei are shown by black dotted lines, the two mitochondria are shown by grey ovals with black borders and the plastid is shown by a black oval with a white and grey border. In the image on the left, the two cells are living autonomously but in a symbiotic relationship. In the middle image, the algal cell has been engulfed by the other cell to maximise the surface area between the two cells, allowing more efficient exchange of nutrients. Owing to the close proximity of the cells, DNA from the algal genomes may be transferred into the nuclear genome of the other cell. Over time this leads to a reduction in the size of the algal genomes. In the image on the right, the algal nucleus and mitochondrion have been lost altogether, leaving only the plastid

6.4

metaTIGER and Its Application to Transferomics

In this section, we shall discuss our recent studies of the transferomes of unicellular eukaryotes. We shall begin by describing the construction and functionality of the metabolic evolution resource, metaTIGER (Whitaker et al. 2009b). Then we shall describe how metaTIGER was used to investigate the transferomes of ten groups of unicellular eukaryotes (Whitaker et al. 2009a). Detailed descriptions of these works can be found in the corresponding publications; herein, we provide a brief summary of the studies.


6.4.1


metaTIGER

The reconstruction of metabolic networks is an essential aspect of genome analysis. metaTIGER is the first resource to bring together the reconstructed metabolic networks of 121 eukaryotes with detailed evolutionary information. The construction and functionality of metaTIGER are summarised in Fig. 6.2.

Fig. 6.2 An overview of the metaTIGER website. On the left are the sources of the information that were used in the construction of metaTIGER. In the central grey box are the three main elements of the metaTIGER site. On the right are the ways that a user can interact with the metaTIGER site. Arrows show the flow of information

6.4.1.1

Enzyme Prediction

To construct metaTIGER, the websites of online sequence repositories and sequencing centres were searched for genomic sequence data. The quality of the sequence data used varied from assembled genomes to expressed sequence tags (ESTs). The enzymes present within the genomes were predicted through homology to PRIAM enzyme sequence profiles (Claudel-Renard et al. 2003) using the program SHARKhunt (Pinney et al. 2005). The PRIAM enzyme sequence profiles correspond to the conserved domains of proteins that all share the same enzymatic function. For each profile, the enzymatic function is denoted by an E.C. number. SHARKhunt uses the conserved domains to make position-specific scoring matrices (PSSMs) and hidden Markov models (HMMs), which are used by PSI-BLAST (Altschul et al. 1997) and GeneWise2 (Birney and Durbin 2000), respectively, to search the genomic sequence data. SHARKhunt first runs a PSI-BLAST search that quickly identifies regions of the genome which are similar to the PRIAM PSI-BLAST profile. These regions are then extracted and matched against the corresponding HMM using GeneWise2. The region of the genome that matches the HMM is extracted and used to create a polypeptide sequence. The polypeptide is then tested using PSI-BLAST and the original PRIAM PSI-BLAST profile. The E-values and sequences are then given as output.
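This sketch illustrates what a position-specific scoring matrix (PSSM) is. It builds one from a toy ungapped alignment using base-2 log-odds scores with a flat pseudocount and a uniform background, then scores queries of the same length. This is a textbook construction, not the PSI-BLAST or SHARKhunt implementation: PSI-BLAST additionally weights sequences and derives pseudocounts from a substitution matrix. The alignment is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Log-odds PSSM (base 2) from equal-length, ungapped aligned sequences.

    Assumes a uniform background frequency of 1/20 for every residue.
    Returns a list with one dict per column: residue -> score.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}  # avoid log(0)
        for seq in alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score(pssm, query):
    """Sum of per-column scores for an ungapped query of matching length."""
    return sum(column[aa] for column, aa in zip(pssm, query))

# Hypothetical conserved motif taken from four members of one enzyme family
alignment = ["GHSG", "GHSA", "GHSG", "GYSG"]
pssm = build_pssm(alignment)
print(round(score(pssm, "GHSG"), 2))  # highest: matches the consensus
print(round(score(pssm, "WWWW"), 2))  # negative: residues never observed
```

Positive column scores mark residues seen more often than the background predicts. A homology search rewards sequence regions that accumulate such scores across the conserved domain.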

6.4.1.2

Website Construction

The enzyme predictions were uploaded into the metaTIGER relational database. To allow the enzyme predictions to be searched and interpreted in relation to the metabolic network, parts of the KEGG Ligand database (Kanehisa et al. 2006; Ogata et al. 1999) were also loaded into the metaTIGER database. Furthermore, custom KEGG pathway images were produced for each organism to allow the predicted enzymes to be viewed in the context of metabolic pathways. To allow comparative analysis of pathways, two tools are provided on the metaTIGER website. These tools allow the enzymes present within a particular pathway to be compared between multiple organisms.
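The chapter does not describe how the two comparison tools work internally. A pathway comparison of this kind can be pictured as a presence/absence table: for each E.C. number in a pathway, record which organisms have a predicted enzyme. The sketch below assumes the predictions are available as one set of E.C. numbers per organism. The E.C. numbers are those of the mevalonate route from acetyl-CoA to farnesyl diphosphate, but the organism names and their enzyme sets are hypothetical and not taken from metaTIGER.

```python
def compare_pathway(pathway_ecs, predictions):
    """Build a presence/absence table for one pathway.

    pathway_ecs: ordered list of E.C. numbers in the pathway.
    predictions: dict organism -> set of predicted E.C. numbers.
    Returns dict E.C. number -> dict organism -> bool.
    """
    return {ec: {org: ec in ecs for org, ecs in predictions.items()}
            for ec in pathway_ecs}

# Mevalonate pathway steps, acetyl-CoA to farnesyl diphosphate
mevalonate = ["2.3.1.9", "2.3.3.10", "1.1.1.34", "2.7.1.36", "2.7.4.2",
              "4.1.1.33", "5.3.3.2", "2.5.1.1", "2.5.1.10"]

# Hypothetical predicted enzyme sets for two organisms
predictions = {
    "organism_1": set(mevalonate),
    "organism_2": {"2.3.1.9", "1.1.1.34", "5.3.3.2", "2.5.1.1", "2.5.1.10"},
}

table = compare_pathway(mevalonate, predictions)
for ec, row in table.items():
    marks = "  ".join("+" if row[org] else "-" for org in predictions)
    print(f"{ec:<10}{marks}")
```

A gap ("-") in such a table is only a missing prediction. It may reflect an incomplete genome sequence or a divergent enzyme rather than a true absence.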

6.4.1.3

Phylogenetic Trees

Integrated into the metaTIGER site is evolutionary information in the form of a phylogenetic tree for each of the predicted enzymes. When producing the phylogenetic trees, care was taken to reduce the chance of artefacts and to make them suitable for the prediction of HGTs. The trees can be viewed in the site using the interactive tree viewer iTOL (Letunic and Bork 2007) (see Fig. 6.5 for an example of a phylogenetic tree viewed via the metaTIGER site), or they can be searched for clades of interest using PhyloGenie (Frickey and Lupas 2004). The tree searching functionality is of particular importance when metaTIGER is applied to transferomic analysis, as it allows phylogenetic trees that depict HGT events to be rapidly identified.

6.4.2

Transferome Analysis

The tools and data that are integrated into the metaTIGER website were used to investigate the process of HGT in unicellular eukaryotes (Whitaker et al. 2009a). The transferome analysis was made up of four sections: identification of a high-confidence HGT dataset; comparison of the gene transfer levels and identification of drug targets; connectivity analysis; and enrichment analysis.


6.4.2.1


The Identification of a High-Confidence HGT Dataset

To establish a dataset of putative EGTs and HGTs (E/HGTs), the metaTIGER tree search facility was used. Four different types of gene transfer events were identified: EGT from Cyanobacteria, EGT from Chlamydia, EGT from archaeplastida (land plants, green algae, red algae and glaucophytes) and HGT from bacteria. The gene transfer events were identified in ten groups of unicellular eukaryotes: Plasmodium, Theileria, Toxoplasma, Cryptosporidium, Leishmania, Trypanosoma, Phytophthora, diatoms, Ostreococcus and Saccharomyces. These ten groups were chosen because each of them had more than one completed genome sequence, so contamination in a single genome sequence would not influence the results of the analysis. PhyloGenie tree queries were designed to identify all phylogenetic trees that depicted the corresponding E/HGT event. The tree queries constitute a way of screening a large database of trees for trees of potential interest; subsequent manual checking of the identified trees is necessary and was carried out, and unconvincing examples were rejected. When searching for trees depicting high-confidence HGT events, only clades with bootstrap support of 70% or above were considered. A cut-off of 70% was used because, under the conditions examined by Hillis and Bull (1993), it corresponds to at least a 95% chance that the clade is correct. The E/HGT predictions can therefore be viewed as high-confidence predictions, as three steps were taken to ensure their quality: (1) the use of organism groups with more than one genome sequence; (2) the manual inspection of E/HGT-depicting trees; and (3) a bootstrap cut-off of 70%.
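A tree query of this kind can be sketched as a search for a well-supported clade that contains only the recipient group and the putative donor group. The sketch below is a simplified stand-in for the PhyloGenie queries and does not use their actual query syntax. The Node class, the group labels and the tree are hypothetical. The 70% threshold is the cut-off stated above, and any tree flagged this way would still need the manual inspection described in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in a rooted tree; leaves carry a taxonomic group label."""
    group: str = ""        # e.g. "Bacteria" for leaves, "" for internal nodes
    support: float = 0.0   # bootstrap support (%) for the clade below this node
    children: list = field(default_factory=list)

def leaf_groups(node):
    """Set of group labels of all leaves below `node`."""
    if not node.children:
        return {node.group}
    return set().union(*(leaf_groups(child) for child in node.children))

def depicts_transfer(root, recipient, donor, cutoff=70.0):
    """True if some clade with support >= cutoff contains recipient leaves
    grouped only with donor leaves (a simplified transfer signature)."""
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        if node.support >= cutoff and leaf_groups(node) == {recipient, donor}:
            return True
        stack.extend(node.children)
    return False

def example_tree(support):
    # Hypothetical tree: a Leishmania sequence nested among bacterial sequences
    inner = Node(support=support, children=[
        Node(group="Leishmania"), Node(group="Bacteria"), Node(group="Bacteria")])
    return Node(support=100.0, children=[inner, Node(group="Animals"), Node(group="Fungi")])

print(depicts_transfer(example_tree(85.0), "Leishmania", "Bacteria"))  # True
print(depicts_transfer(example_tree(60.0), "Leishmania", "Bacteria"))  # False
```

Requiring the clade to contain nothing but recipient and donor leaves is what makes the grouping conflict with the accepted taxonomy. Without the support threshold, poorly resolved trees would produce many spurious hits.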

6.4.2.2

Comparison of the Gene Transfer Levels and Identification of Drug Targets

The numbers of predicted E/HGTs were compared between the ten groups of unicellular eukaryotes and are shown in Fig. 6.3. The largest numbers of EGTs were found in the two photosynthetic groups (Ostreococcus and the diatoms), which confirmed that the high-confidence dataset of E/HGT predictions was suitable for revealing gene transfer trends. Organisms that possess a plastid-like organelle but are not photosynthetic (Toxoplasma, Theileria and Plasmodium) were found to retain EGTs, indicating that non-photosynthetic metabolic activities have been gained through EGT. Furthermore, organism groups that are believed to have once possessed a plastid which is now lost (Cryptosporidium and Phytophthora) have also retained EGTs, indicating that enzymes which function outside of the plastid have been gained. No EGTs were found in Saccharomyces, which is not thought to have ancestrally possessed a plastid.

Fig. 6.3 The metabolic transferomes. The total number of enzymes found in metaTIGER with an E-value below 1.0 × 10⁻³⁰ is shown for each of the groups of unicellular eukaryotes. The counts of E/HGTs are indicated by the differently coloured bars

There are two reasons why genes gained through HGT may be good drug targets. First, the acquired genes could have previously been specific to prokaryotes and therefore be absent from the parasite's host. Second, if the acquired gene is present within the pathogen's host but the acquired version is highly divergent from the host's copy (e.g. the acquired gene is of prokaryotic origin), then parasite-specific inhibitors could be produced. The trypanosomatids (Trypanosoma and Leishmania) were found to possess a large number of genes that have been gained through HGT from bacteria. As there is a great need for new therapeutic strategies to combat these parasites, the predicted HGTs were investigated further. This revealed that one of the HGTs, pyruvate decarboxylase, was already a target for the drug omeprazole, which is currently used in the treatment of Leishmania tropica infection. Moreover, three HGTs were suggested as possible new drug targets: isopentenyl pyrophosphate isomerase (see Fig. 6.4), isocitrate dehydrogenase and pyrroline-5-carboxylate reductase.

To investigate the idea that Chlamydia assisted in the establishment of the primary plastid (Becker et al. 2008; Huang and Gogarten 2007), the predicted EGTs were considered. The EGT predictions were compared to identify cases in which a gene had been transferred from Chlamydia into the archaeplastida, and then into a third lineage during secondary endosymbiosis. The following examples were identified: four genes in the diatoms; three genes within Plasmodium and Toxoplasma; and one gene in Theileria. These results support the idea that Chlamydia assisted in the establishment of the primary plastid and show that chlamydial genes have been transferred during secondary endosymbiosis. The metaTIGER phylogenetic tree of enoyl-[acyl-carrier-protein] reductase is shown as an example in Fig. 6.5.
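The comparison used to find chlamydial genes passed on by secondary endosymbiosis amounts to chaining two sets of transfer predictions. The following is a minimal sketch, assuming each prediction set is keyed by a gene identifier such as an E.C. number. Matching on identifiers alone is a simplification, since the trees themselves must show the same gene lineage passing through the archaeplastida. All identifiers below are placeholders, not real predictions.

```python
def chained_transfers(chlamydia_to_plants, plants_to_lineage):
    """Genes transferred Chlamydia -> archaeplastida, then
    archaeplastida -> a lineage formed by secondary endosymbiosis.

    chlamydia_to_plants: set of gene IDs predicted as EGT from Chlamydia
        into the archaeplastida.
    plants_to_lineage: dict lineage -> set of gene IDs predicted as EGT
        from the archaeplastida into that lineage.
    Returns dict lineage -> sorted list of chained gene IDs (empty lineages omitted).
    """
    return {lineage: sorted(chlamydia_to_plants & genes)
            for lineage, genes in plants_to_lineage.items()
            if chlamydia_to_plants & genes}

# Placeholder identifiers, not real predictions
chlamydia_to_plants = {"EC_a", "EC_b", "EC_c"}
plants_to_lineage = {
    "diatoms": {"EC_a", "EC_c", "EC_x"},
    "Plasmodium": {"EC_b", "EC_y"},
    "Saccharomyces": set(),
}
print(chained_transfers(chlamydia_to_plants, plants_to_lineage))
# {'diatoms': ['EC_a', 'EC_c'], 'Plasmodium': ['EC_b']}
```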

6.4.2.3

Connectivity Analysis

The enzymes whose genes have been identified as being gained through E/HGT have successfully integrated into their new hosts' metabolic networks. For the genes to have become fixed within the lineages, they must have provided an evolutionary advantage through enhancement of the metabolic network. If two or more enzyme-encoding genes that are connected within a metabolic pathway are co-transferred, they could provide a greater enhancement to the metabolic network than two enzymes that are not connected. This greater potential for enhancement could create stronger selective pressure for the fixation of co-transferred genes encoding enzymes that are connected within metabolic pathways. To investigate this, the number of connections between enzymes whose genes were acquired via horizontal transfer was considered. The connectivity analysis compared the number of connections between E/HGTs to the number of connections between enzymes picked at random from the species' metabolic network. Enzymes were taken as being connected if they carried out consecutive reactions within the organism's metabolic network. This analysis was carried out separately for EGTs and HGTs. For EGTs, the number of connections was found to be significantly greater than random. In particular, the number of connections between EGTs from the organisms that have now lost their plastids (Cryptosporidium and Phytophthora) was significantly greater than random. This demonstrates the co-transfer of enzyme-encoding genes that are connected within the metabolic network but do not function within the plastid. When the connectivity analysis was applied to the HGTs from bacteria, no organism groups were found to be significantly more connected than random.

Fig. 6.4 The mevalonate isoprenoid biosynthesis pathways in T. cruzi. Enzymes are shown by E.C. number. The enzyme isopentenyl pyrophosphate isomerase (5.3.3.2) is a predicted HGT in the trypanosomatids. The enzyme farnesyl diphosphate synthase (2.5.1.10) has been validated as a drug target in T. cruzi, suggesting that isopentenyl pyrophosphate isomerase could also be an effective drug target

Fig. 6.5 The metaTIGER phylogenetic tree of enoyl-[acyl-carrier-protein] reductase. The phylogenetic tree on the left shows the entire phylogenetic tree of enoyl-[acyl-carrier-protein] reductase (1.3.1.9) as viewed on the metaTIGER website. The phylogenetic tree on the right shows a single clade which has been enlarged and had certain taxa highlighted. Bacterial taxa are highlighted with a grey background and eukaryotic taxa are highlighted with a black background. Certain taxa have been enlarged to make the tree easier to interpret (NB: The diatoms are Phaeodactylum tricornutum and Thalassiosira pseudonana)


However, the two groups with the largest numbers of HGTs (Leishmania and Ostreococcus) approached statistical significance. This suggests that, if a less stringent criterion had been used during HGT prediction, significant connectivity might also have been found for HGTs from bacteria.
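The comparison against randomly picked enzymes can be sketched as a permutation test. First count the network edges (consecutive reactions) among the transferred enzymes. Then count the edges among many random enzyme sets of the same size drawn from the same network. The chapter does not state the exact sampling scheme or number of samples, so the uniform sampling, the 10,000 samples and the add-one empirical p-value below are assumptions. The toy network is hypothetical.

```python
import random

def count_connections(enzymes, edges):
    """Number of network edges whose two endpoints are both in `enzymes`."""
    chosen = set(enzymes)
    return sum(1 for a, b in edges if a in chosen and b in chosen)

def connectivity_test(transferred, network, edges, n_samples=10_000, seed=1):
    """Observed connections among `transferred` and a one-sided empirical
    p-value from random enzyme sets of the same size."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    pool = sorted(network)
    observed = count_connections(transferred, edges)
    k = len(set(transferred))
    at_least = sum(
        count_connections(rng.sample(pool, k), edges) >= observed
        for _ in range(n_samples))
    return observed, (at_least + 1) / (n_samples + 1)

# Hypothetical linear pathway e1 -> e2 -> ... -> e8 (consecutive reactions)
network = [f"e{i}" for i in range(1, 9)]
edges = list(zip(network, network[1:]))
observed, p = connectivity_test(["e1", "e2", "e3"], network, edges)
print(observed, round(p, 3))
```

In this toy network, only 6 of the 56 possible three-enzyme sets are as connected as the observed set. The p-value therefore lands near 0.1, which illustrates how small groups of transfers can fail to reach significance even when they are maximally connected.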

6.4.2.4

Enrichment Analysis

Genes are commonly characterised according to three ontology categories: molecular function, biological process and cellular location. In the case of metabolic enzymes, these categories relate to the chemical reactions they catalyse, the pathways and sub-networks within which they function and the location within the cell where they function. Enrichment analysis was carried out to investigate whether genes encoding enzymes in particular pathways, or of particular molecular function, are more likely to have been acquired through HGT. Enrichment analysis of cellular location was not carried out, as it is likely to be less conserved between organism groups than other aspects of ontology. Of the different enrichment analyses used, only enrichment of KEGG metabolic pathways gave significant results. Pathway enrichment was carried out on three levels: KEGG map groups (large groups of related pathways); KEGG maps (sets of closely related pathways); and KEGG modules (specific metabolic pathways). Aspects of plastid metabolism that are known to occur within specific organism groups were found to be enriched with EGTs, demonstrating that the high-confidence E/HGT predictions can uncover the underlying trends of enrichment. The most significant and unexpected trend at the KEGG map group level was an enrichment of EGTs within carbohydrate metabolism. Several examples of metabolic pathways that could be important to the pathogenicity of some of the parasites being studied were identified: xylose metabolism in Leishmania; 1,3-beta-glucan metabolism in Phytophthora; trehalose metabolism in Phytophthora; and lipopolysaccharide biosynthesis in Phytophthora. Of the pathway enrichments that may be important to pathogenicity, lipopolysaccharide biosynthesis is the most exciting, because it has not been seen before in eukaryotes outside the archaeplastida.
Moreover, lipopolysaccharides are an important pathogenicity factor in Gram-negative bacteria. Thus, lipopolysaccharides may be important to the pathogenicity of Phytophthora, and this discovery may aid the development of new control agents.
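The chapter does not name the statistical test behind the enrichment analysis. A common choice, assumed here, is the one-sided hypergeometric test. It asks how surprising it is to find k E/HGTs among the K enzymes of a pathway, given that n of the N enzymes in the organism's network are E/HGTs:

P(X ≥ k) = Σ_{i=k}^{min(K,n)} C(K, i) C(N − K, n − i) / C(N, n)

The sketch below uses hypothetical counts. Because many pathways are tested, a correction for multiple testing would normally follow. That correction is omitted here.

```python
from math import comb

def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: enzymes in the organism's metabolic network
    K: of those, enzymes assigned to the pathway of interest
    n: enzymes predicted as E/HGTs
    k: E/HGTs that fall in the pathway
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical counts: 500 enzymes, 20 in the pathway, 40 E/HGTs, 8 of them in the pathway
print(f"{enrichment_p_value(500, 20, 40, 8):.2e}")
```

With these counts, about 1.6 E/HGTs would be expected in the pathway by chance (40 × 20 / 500). Observing 8 therefore gives a very small p-value.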

6.5

Conclusions

Transferomics is the study of HGT on a genomic scale and can be used to reveal the underlying trends that influence gene transfer. Large-scale transferomic studies are no longer exclusive to prokaryotes. Transferomic analysis of eukaryotes can provide insight into their evolution, which may be useful in the development of new therapeutic strategies.

References
Alsmark UC, Sicheritz-Ponten T, Foster PG, Hirt RP, Embley TM (2009) Horizontal gene transfer in eukaryotic parasites: a case study of Entamoeba histolytica and Trichomonas vaginalis. In: Gogarten MB, Gogarten JP, Olendzenski L (eds) Horizontal gene transfer: genomes in flux, vol 532, Methods in molecular biology. Springer, Heidelberg, pp 489–500
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Andersson JO, Sjogren AM, Horner DS, Murphy CA, Dyal PL, Svard SG, Logsdon JM Jr, Ragan MA, Hirt RP, Roger AJ (2007) A genomic survey of the fish parasite Spironucleus salmonicida indicates genomic plasticity among diplomonads and significant lateral gene transfer in eukaryote genome evolution. BMC Genomics 8:51
Becker B, Hoef-Emden K, Melkonian M (2008) Chlamydial genes shed light on the evolution of photoautotrophic eukaryotes. BMC Evol Biol 8:203
Beiko RG, Harlow TJ, Ragan MA (2005) Highways of gene sharing in prokaryotes. Proc Natl Acad Sci USA 102:14332–14337
Birney E, Durbin R (2000) Using genewise in the Drosophila annotation experiment. Genome Res 10:547–548
Carlton JM, Hirt RP, Silva JC, Delcher AL, Schatz M, Zhao Q, Wortman JR, Bidwell SL, Alsmark UCM, Besteiro S, Sicheritz-Ponten T, Noel CJ, Dacks JB, Foster PG, Simillion C, Van de Peer Y, Miranda-Saavedra D, Barton GJ, Westrop GD, Muller S, Dessi D, Fiori PL, Ren Q, Paulsen I, Zhang H, Bastida-Corcuera FD, Simoes-Barbosa A, Brown MT, Hayes RD, Mukherjee M, Okumura CY, Schneider R, Smith AJ, Vanacova S, Villalvazo M, Haas BJ, Pertea M, Feldblyum TV, Utterback TR, Shu C-L, Osoegawa K, de Jong PJ, Hrdy I, Horvathova L, Zubacova Z, Dolezal P, Malik S-B, Logsdon JM Jr, Henze K, Gupta A, Wang CC, Dunne RL, Upcroft JA, Upcroft P, White O, Salzberg SL, Tang P, Chiu C-H, Lee Y-S, Embley TM, Coombs GH, Mottram JC, Tachezy J, Fraser-Liggett CM, Johnson PJ (2007) Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 315:207–212
Cavalier-Smith T (1999) Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J Eukaryot Microbiol 46:347–366
Chilton M-D, Drummond MH, Merlo DJ, Sciaky D, Montoya AL, Gordon MP, Nester EW (1977) Stable incorporation of plasmid DNA into higher plant cells: the molecular basis of crown gall tumorigenesis. Cell 11:263
Claudel-Renard C, Chevalet C, Faraut T, Kahn D (2003) Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res 31:6633–6639
Doolittle WF (1998) You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet 14:307–311
Frickey T, Lupas AN (2004) PhyloGenie: automated phylome generation and analysis. Nucleic Acids Res 32:5231–5238
Gladyshev EA, Meselson M, Arkhipova IR (2008) Massive horizontal gene transfer in bdelloid rotifers. Science 320:1210–1213
Hillis DM, Bull JJ (1993) An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol 42:182


Huang J, Gogarten JP (2007) Did an ancient chlamydial endosymbiosis facilitate the establishment of primary plastids? Genome Biol 8:R99
Huang J, Mullapudi N, Lancto CA, Scott M, Abrahamsen MS, Kissinger JC (2004a) Phylogenomic evidence supports past endosymbiosis, intracellular and horizontal gene transfer in Cryptosporidium parvum. Genome Biol 5:R88
Huang J, Mullapudi N, Sicheritz-Ponten T, Kissinger JC (2004b) A first glimpse into the pattern and scale of gene transfer in Apicomplexa. Int J Parasitol 34:265–274
Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 34:D354–D357
Lawrence JG, Ochman H (1998) Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci USA 95:9413–9417
Lerat E, Daubin V, Ochman H, Moran NA (2005) Evolutionary origins of genomic repertoires in bacteria. PLoS Biol 3:e130
Letunic I, Bork P (2007) Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23:127–128
Nosenko T, Bhattacharya D (2007) Horizontal gene transfer in chromalveolates. BMC Evol Biol 7:173
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M (1999) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 27:29–34
Pal C, Papp B, Lercher MJ (2005) Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet 37:1372–1375
Pinney JW, Shirley MW, McConkey GA, Westhead DR (2005) metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella. Nucleic Acids Res 33:1399–1409
Reyes-Prieto A, Weber APM, Bhattacharya D (2007) The origin and establishment of the plastid in algae and plants. Annu Rev Genet 41:147–168
Richards TA, Dacks JB, Jenkinson JM, Thornton CR, Talbot NJ (2006) Evolution of filamentous plant pathogens: gene exchange across eukaryotic kingdoms. Curr Biol 16:1857–1864
Salzberg SL, White O, Peterson J, Eisen JA (2001) Microbial genes in the human genome: lateral transfer or gene loss? Science 292:1903–1906
Striepen B, Pruijssers AJ, Huang J, Li C, Gubbels MJ, Umejiego NN, Hedstrom L, Kissinger JC (2004) Gene transfer in the evolution of parasite nucleotide biosynthesis. Proc Natl Acad Sci USA 101:3154–3159
Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, DeBoy RT, Davidsen TM, Mora M, Scarselli M, Margarit y Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, Khouri H, Radune D, Dimitrov G, Watkins K, O’Connor KJB, Smith S, Utterback TR, White O, Rubens CE, Grandi G, Madoff LC, Kasper DL, Telford JL, Wessels MR, Rappuoli R, Fraser CM (2005) Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc Natl Acad Sci USA 102:13950–13955
Thomason B, Read TD (2006) Shuffling bacterial metabolomes. Genome Biol 7:204
Tyler BM, Tripathy S, Zhang X, Dehal P, Jiang RH, Aerts A, Arredondo FD, Baxter L, Bensasson D, Beynon JL, Chapman J, Damasceno CM, Dorrance AE, Dou D, Dickerman AW, Dubchak IL, Garbelotto M, Gijzen M, Gordon SG, Govers F, Grunwald NJ, Huang W, Ivors KL, Jones RW, Kamoun S, Krampis K, Lamour KH, Lee MK, McDonald WH, Medina M, Meijer HJ, Nordberg EK, Maclean DJ, Ospina-Giraldo MD, Morris PF, Phuntumart V, Putnam NH, Rash S, Rose JK, Sakihama Y, Salamov AA, Savidor A, Scheuring CF, Smith BM, Sobral BW, Terry A, Torto-Alalibo TA, Win J, Xu Z, Zhang H, Grigoriev IV, Rokhsar DS, Boore JL (2006) Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science 313:1261–1266


Whitaker J, McConkey G, Westhead D (2009a) The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes. Genome Biol 10:R36
Whitaker JW, Letunic I, McConkey GA, Westhead DR (2009b) metaTIGER: a metabolic evolution resource. Nucleic Acids Res 37:D531–D538
Whitaker JW, McConkey GA, Westhead DR (2009c) Prediction of horizontal gene transfers in eukaryotes: approaches and challenges. Biochem Soc Trans 37:792–795
William M (1999) Mosaic bacterial chromosomes: a challenge en route to a tree of genomes. Bioessays 21:99–104
Yoon HS, Hackett JD, Van Dolah FM, Nosenko T, Lidie KL, Bhattacharya D (2005) Tertiary endosymbiosis driven genome evolution in dinoflagellate algae. Mol Biol Evol 22:1299–1308
Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT (2006) Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res 16:1099–1108

Chapter 7

Comparative Genomics and Transcriptomics of Lactation

Christophe M. Lefèvre, Karensa Menzies, Julie A. Sharp, and Kevin R. Nicholas

Abstract Lactation is an important characteristic of mammalian reproduction, sometimes referred to as the quintessence of mammals. Comparative genomics and transcriptomics experiments are allowing a more in-depth molecular analysis of the evolution of lactation across mammals, and these recent results are reviewed here. Sequencing-based gene expression analyses of milk cells and the mammary gland have started to reveal conserved and lineage-specific milk proteins and components of the lactation system in the monotreme, marsupial and eutherian lineages. These experiments have confirmed the ancient origin of the complex lactation system and provided useful insight into the function of specific milk proteins in the control of the lactation programme, and into the role of milk in the regulation of growth and development of the young beyond simple nutritive aspects.

C.M. Lefèvre
Institute for Technology Research and Innovation, Deakin University, Waurn Ponds, Geelong, VIC 3217, Australia
CRC for Innovative Dairy Products, Department of Zoology, University of Melbourne, Melbourne, VIC 3010, Australia
Victorian Bioinformatics Consortium, Monash University, Clayton, Melbourne, VIC 3080, Australia

K. Menzies, J.A. Sharp, and K.R. Nicholas
Institute for Technology Research and Innovation, Deakin University, Waurn Ponds, Geelong, VIC 3217, Australia
CRC for Innovative Dairy Products, Department of Zoology, University of Melbourne, Melbourne, VIC 3010, Australia

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_7, © Springer-Verlag Berlin Heidelberg 2010

C.M. Lefèvre et al.

7.1

Introduction: Lactation Evolution and Diversity

Lactation consists of the nourishment of the young with copious milk secreted by the mammary gland. This aspect of mammalian reproduction is a defining characteristic of mammals, and it is often referred to as the quintessence of mammals despite the existence of other differentiating characters (jaw bone structure, fur, etc.). It has also been suggested that a key role of lactation during mammalian evolution has been to allow the development of affectivity with an opportunity for learning, therefore providing a substrate for the development of intelligence in man (Peaker 2002). Thus, owing to its essential role in reproduction, lactation is in part responsible for the evolutionary success of mammals. Milk provision is a complex process, involving changes in milk composition and interactions between parent and young that go beyond the straightforward nutritional function. The precise mechanism of how lactation evolved and its ancestral role are still unclear, but a diversity of lactation strategies has been adopted by mammals. Fossil and molecular evidence point to the appearance of early mammals toward the end of the Triassic on the synapsid branch of the tree of life, which separated mammalian ancestors from other living creatures during the Carboniferous (about 320 million years ago, Fig. 7.1). Comparative genome analysis has recently emphasised, at the molecular level, the ancient origin of the essential components of the lactation system. This complex lactation system gradually evolved during therapsid evolution in the Triassic period and was already well established in the crown mammals, and probably in the preceding mammaliaforms of the Late Triassic. Today, after more than 200 million years of evolution, the diversity of mammalian species and the extreme variations in their reproductive strategies, affecting in particular the lactation cycle, provide numerous examples of lineage- or species-specific adaptations of the lactation system during evolution.
The earliest split in the mammalian phylogeny established the Prototheria (monotremes, or Monotremata), which separated from the Theria between about 166 (Bininda-Emonds et al. 2007) and 220 (Madsen 2009) million years ago. The Theria later split into the Metatheria (marsupials, or Marsupialia) and Eutheria (eutherians, or Placentalia) lineages, as illustrated in Fig. 7.1. Only two families of monotremes survive, in Australasia: the platypus (Ornithorhynchus anatinus) and the echidnas (genera Tachyglossus and Zaglossus). These egg-laying monotremes are often regarded as representative of early mammals, with a more primitive prototherian lactation system. Genomics and transcriptomics approaches have recently enabled the molecular analysis of monotreme (Lefevre et al. 2009), marsupial (Lefevre et al. 2007) and eutherian (Lemay et al. 2009) lactation. Comparative approaches have started to allow a detailed analysis of the functional evolution of specific molecular components of lactation (Menzies et al. 2009c; Topcic et al. 2009). Recent advances in the genome sequencing of a number of mammalian species have provided invaluable resources for the comparative evolutionary analysis of milk proteins and other genes involved in lactation. The release of the draft bovine (Bos taurus) genome has stimulated intense activity in lactation genomics (Elsik et al. 2009). Lactation gene sets have been compiled from mammary gland cDNA libraries at multiple stages of mammary development or lactation status. These sets have been used to identify unique milk proteins or important mammary genes in the cow (Lemay et al. 2009) and other species, including monotremes (Lefevre et al. 2009) and marsupials (Lefevre et al. 2007). Some of these results are reviewed here.

Fig. 7.1 Evolution of mammals and lactation

[Figure: timeline spanning the Paleozoic (Carboniferous, Permian), Mesozoic (Triassic, Jurassic, Cretaceous) and Cenozoic (Tertiary), from about 360 million years ago to the present. It shows the amniotes dividing into sauropsids (turtles, crocodiles, dinosaurs and birds) and synapsids (therapsids, cynodonts, mammaliaforms, mammals), with mammals giving rise to monotremes (prototherians), marsupials (metatherians) and eutherians. Labelled events include the gradual accrual of milk secretion by cutaneous glands, establishment of the therian complex lactation system, oviparity, placentation and viviparity, secretion of complex milk, constant secretion of complex milk and changes in milk throughout lactation.]

7.2

Milk Cell Sequencing and Monotreme Lactation

Egg-laying monotremes are regarded as close representatives of ancient mammals. The tiny hatchlings are highly altricial and depend completely on milk for nutrition during the suckling period, which is long relative to gestation and incubation (Griffiths 1978). Monotremes have no teats. Instead, milk is secreted from a series of ducts opening directly onto the surface of a ventral skin patch, the areola. However, monotreme young exhibit true suckling behaviour and do not simply lick the milk secretion. The role of milk in the development of monotreme young remains to be established. Apart from the initial lactation period, it is still controversial whether monotremes show changes in milk composition similar to those reported in marsupials, with lactation phase-specific changes in milk protein gene expression. Changes in milk fat composition have been described in the echidna, but an effect of diet on milk fat content has been demonstrated. Moreover, milk taken from the same platypus over a 3-month period in the wild showed no significant change in milk fat content (Griffiths 1978). However, changes in milk protein expression profiles have been reported in the echidna (Joseph and Griffiths 1992). The protein composition of monotreme milk has also been investigated. Whey proteins have been characterised, including alpha-lactalbumin and lysozyme (Guss et al. 1997; Messer et al. 1997; Shaw et al. 1993) and, more recently, the whey acidic proteins WAP and WFDC2 (Sharp et al. 2007), together with a complete set of caseins and other proteins (Lefevre et al. 2009).

7.2.1

Milk Cell cDNA Sequencing

A milk cell cDNA sequencing approach has been developed (Lefevre et al. 2009) to collect molecular probes and to analyse lactation non-invasively in protected species such as the platypus (Ornithorhynchus anatinus) and the short-beaked echidna (Tachyglossus aculeatus). Similar approaches may be more generally useful for the comparative analysis of lactation in mammalian species that are protected or not easily bred in the laboratory. In future experiments, non-destructive approaches may also be used for the controlled study of variation in gene expression in the same animal over the full course of lactation. Better knowledge of milk composition in endangered species may help conservation programmes determine the best practice for milk substitution through artificial feeding or cross-species fostering. Milk cell preparations may contain not only cells of mammary epithelial origin but also immune cells or cells from the skin or sebaceous glands. For example, massive infiltration of immune cells into the mammary gland of monotremes has been described at the end of lactation, when milk production stops (Griffiths 1978). The areola also contains sebaceous glands. Thus, milk cells may include skin cells, immune cells, and exfoliated epithelial cells from ducts and from mammary or sebaceous glands. Purification of mRNA from milk fat globules has been proposed as one possible approach for enriching mammary epithelial transcripts from human milk (Maningat et al. 2007). However, shallow milk cell cDNA sequencing during peak lactation in monotremes has provided information about a number of casein and whey protein genes. Milk protein transcripts were detected at high levels, indicating that monotreme milk is enriched in exfoliated mammary epithelial cells (Lefevre et al. 2009). Analysis of the whole milk cell fraction may also reveal changes in gene expression signatures from non-epithelial compartments.
In the future, deep sequencing will be useful for analysing variation in gene expression and in the cell composition of milk during the course of lactation in monotremes and other species. In exploratory experiments, milk protein sequences from the platypus and echidna were characterised, including a full set of caseins. Some of these genes could not be accurately predicted from the current platypus genome sequence annotation alone. Sequence divergence between monotremes and other mammals averages about one change per nucleotide. As a result, neutral or rapidly evolving sequences of monotremes and eutherians cannot easily be aligned for efficient annotation (Warren et al. 2008). The problem is compounded by two further factors. First, milk proteins such as caseins evolve rapidly (Mercier et al. 1976) and are typically encoded by diverse combinations of short exons. Second, the draft platypus genome sequence contains unresolved gaps.
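A small calculation shows why about one change per nucleotide makes alignment difficult. It converts substitutions per site into the expected fraction of sites that actually differ. The sketch below assumes the simple Jukes-Cantor (JC69) substitution model, with equal base frequencies, equal rates for all changes and no insertions or deletions. This model is an illustrative assumption, not the method used by Warren et al. (2008), and real genomes violate it.

```python
import math

# Illustrative assumption: JC69 model (equal base frequencies, equal
# substitution rates, no insertions or deletions).


def jc69_expected_p_distance(d: float) -> float:
    """Expected proportion of sites that differ between two sequences separated
    by d substitutions per site under the Jukes-Cantor (JC69) model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """Invert JC69: estimate substitutions per site from an observed proportion p
    of differing sites. Undefined once p reaches 0.75 (saturation)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for d in (0.1, 0.5, 1.0, 2.0):
    p = jc69_expected_p_distance(d)
    print(f"d = {d:.1f} substitutions/site: {p:.1%} of sites differ, {1 - p:.1%} identical")
```

Under these assumptions, one substitution per site leaves only about 45% of sites identical. That is not far above the 25% identity expected between unrelated random sequences. Because multiple hits at the same site go unseen, the observed difference also understates the true divergence.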

7.2.2

Monotreme Casein and the Ancient Origin of the Casein Gene Cluster

Caseins are major milk proteins with a dual function. They serve as a source of amino acids, and they transport phosphate and calcium to support bone growth in the young. Alpha- and beta-caseins (CSN1 and CSN2) and their variants are also called “calcium-sensitive caseins” because they precipitate readily at low to moderate calcium concentrations. They are secreted as large calcium-dependent aggregates, or casein micelles, which sequester calcium under the stabilising action of kappa-casein (CSN3). It was previously believed that CSN3 was evolutionarily unrelated to the other caseins (Jones et al. 1985). However, this view has been challenged. Gene structure analysis has revealed a similar and distinctive organisation of all casein genes, with short exons that are all in frame. This organisation places the caseins together with other calcium-binding phosphoproteins in a new protein family (Kawasaki and Weiss 2003). Monotreme milk cells express all types of caseins and casein variants (Lefevre et al. 2009), similar to those reported in other mammals (Rijnkels 2002). In Fig. 7.2, the organisation of the monotreme casein locus is compared with that of other mammals, including a marsupial (the opossum Monodelphis domestica) and eutherians. The casein genes are physically linked in all mammalian genomes examined, and the locus has expanded during the mammalian radiation. A recent duplication of beta-casein occurred in the monotreme lineage, and similar duplications have occurred repeatedly along eutherian lineages (Rijnkels 2002). Casein sequences evolve rapidly, in part because of extensive variation in exon usage. As in other mammals, a number of casein splice variants have been identified in monotremes, and the platypus and the echidna may use different exons.
Despite this variability, several features of the locus are similar in all mammalian genome sequences available so far. These include the close genomic proximity of the main alpha- and beta-casein genes in an inverted tail-to-tail orientation, the relative orientation of additional casein-like genes and the position of the more distant kappa-casein gene. This configuration is likely to be important for the concerted expression of casein genes. During mammalian evolution, the casein cluster has expanded by gene duplication within the cluster. Eutherians have expanded it the most, acquiring new genes including caseins or additional calcium-binding phosphoproteins from salivary secretion or enamel matrix (Kawasaki and Weiss 2003). Marsupials seem to possess only one copy each of CSN1 and CSN2 (Lefevre et al. 2007). Interestingly, marsupial beta-caseins are longer than those of other mammals (Lefevre et al. 2007). This suggests that an apparent elongation of the CSN2 sequence may compensate for the absence of a third casein homolog. Two models are presented for the ancient organisation of the casein cluster in crown mammals, with either two or three calcium-sensitive caseins in addition to kappa-casein (Fig. 7.2). Importantly, two observations support the more complex model. First, eutherian CSN1S2 and full-length monotreme CSN2b have a similar gene organisation. Second, the ancient monotreme CSN2b coding region contains a canonical phosphorylation site. This model also implies the deletion of the ancestral CSN1S2 in the marsupial lineage, which is supported by the presence of several retrotransposon-type repeats in the corresponding region of the opossum casein locus. The simpler model is more difficult to explain because it requires two separate events. It implies the opportunistic construction of a strong casein-like phosphorylation site from more ancient, non-duplicated genome sequence, and it implies independent duplications in the eutherian lineage. Thus, it is certain that the ancestral casein locus was already highly organised before the common ancestor of extant mammals. It is also likely that three calcium-sensitive caseins had already arisen by duplication in a more ancient ancestor (Lefevre et al. 2009).

Fig. 7.2 Comparative analysis of the casein locus in mammals

[Figure: maps of the casein locus in platypus (contig ultra362), opossum (chromosome 5), cattle (chromosome 6), mouse (chromosome 5) and human (chromosome 4), drawn to a common scale (0 to 0.3 Mb). Genes shown include CSN1 or CSN1S1, CSN2, CSN1S2 and its variants, the platypus duplicate CSN2b, CSN3 and flanking genes such as ODAM, HIS and STAT.]


Fig. 7.3 Quantitative estimates of gene expression in milk cells from monotremes. (a) Platypus milk cell transcriptome. (b) Echidna milk cell transcriptome

7.2.3

Monotreme Milk Transcriptome

Other genes have also been identified from monotreme milk cells. Fig. 7.3 presents the milk cell transcriptomes of the platypus and echidna, estimated by cDNA sequencing. Transcript frequencies differed markedly between the two species. The platypus transcriptome was largely dominated by beta-lactoglobulin and casein transcripts, whereas echidna milk cell RNA included a higher proportion of whey protein transcripts. This difference is consistent with the observation that platypus milk contains fewer whey proteins than echidna milk (Hopper and McKenzie 1974). A number of whey proteins have been identified, such as alpha-lactalbumin, lactotransferrin, WAP and WFDC2. WAP has undergone extensive rearrangements in mammalian lineages, leading to a reorganisation of the number of exons from monotremes to marsupials and eutherians. Meanwhile, the functional gene has been lost in humans, cattle and goats (Sharp et al. 2007). Interestingly, WAP domains have been shown to carry specific functional activities in different lineages (Topcic et al. 2009). However, the function of WAP is not fully understood. Human C6orf58 is a protein of unknown function, expressed in epithelial cells of the digestive tract of other mammals but not previously identified in milk. Its monotreme ortholog was found to be expressed at high levels in monotreme milk cells. Putative proteins or proteins of unknown function were also identified. These include a gene with high similarity to chondromodulin II, a positive regulator of chondrocyte proliferation (Mori et al. 1997; Yamagoe et al. 1998), a gene with similarity to prolactin-inducible protein PIP (Murphy et al. 1987), and ovostatin.


7.2.4

Ancient Origins and Variability of the Lactation System

Overall, the conservation of the key milk caseins, in particular their consistent genomic organisation, indicates that the fundamental lactation mechanism developed early, before the divergence of monotremes, and is shared by all mammals. In contrast, several features point to lineage-specific change. Gene duplications have occurred within the casein locus of monotremes and eutherians but not of marsupials. More complex rearrangements and losses have occurred in WAP genes (Sharp et al. 2007). Putative lineage-specific milk proteins are also present. Together, these features emphasise independent selection on strategies of milk provision to the young, which are likely to be linked to different developmental strategies. Monotremes therefore provide insight into the ancestral drivers of lactation and into how these have adapted in different lineages, including our own.

7.3

Marsupial Lactation: The Marsupial Lactation Cycle and Mammary Gland Sequencing

Amongst mammals, marsupials exhibit one of the most interesting lactation systems, with complex changes during the lactation cycle.

7.3.1

Marsupial Lactation

After a short gestation period, marsupials give birth to a relatively immature newborn that is totally dependent on milk for growth and development during a relatively long lactation period. Important changes occur during the marsupial lactation cycle in mammary gland development, milk production and milk composition, as well as in the development and behaviour of the young (Green et al. 1983). This is in sharp contrast with eutherians, which invest more during gestation (Tyndale-Biscoe et al. 1988). Eutherians also produce milk of relatively constant composition, apart from the initial colostrum of the immediate postpartum period (Jenness 1986). Marsupial milk provides essential nutrients and putative growth factors for the development of the young, and cross-fostering experiments have shown that milk controls postnatal development (Ballard et al. 1995; Trott et al. 2003; Waite et al. 2005). Endocrine and other factors, potentially intrinsic to the mammary gland, are likely to control milk secretion (Hendry et al. 1998). Marsupial milk also contains autocrine/paracrine regulators of the mammary gland (Brennan et al. 2007; Nicholas et al. 1997). In special circumstances, macropod marsupials such as the tammar wallaby (Macropus eugenii) and the red kangaroo (Macropus rufus) may practise asynchronous concurrent lactation. In this strategy, the mother simultaneously feeds two young of different ages, a newborn pouch young and an animal a few months older, with milk of different composition from adjacent mammary glands (Lemon and Bailey 1966; Nicholas 1988). Teat-sealing experiments in mice have also shown gland-specific involution. Marsupials, however, go further in demonstrating the importance of local control in their complex lactation programme. The molecular mechanisms controlling marsupial milk composition are still not fully known.

7.3.2

The Tammar Wallaby: An Animal Model of Marsupial Lactation

The tammar wallaby (Macropus eugenii) is one of the most studied marsupial models. It is an annual breeder characterised by a short pregnancy lasting 26 days, followed by an extended lactation period of about 300 days. The lactation cycle is divided into three phases of approximately 100 days each, based on milk composition and on the suckling pattern of the young. The young is first permanently attached to the teat, then permanently in the pouch, and finally suckles intermittently, moving in and out of the pouch. Shortly after birth, the single young, weighing only 400 mg, crawls into the pouch and attaches to one of four teats, each associated with a separate mammary gland. The chosen teat provides all the milk for the entire lactation period, and the associated glandular tissue grows massively. The other three glands do not generally produce milk. Changes in the expression levels of milk protein genes have been described for a number of milk proteins in several marsupial species. In particular, lactation stage-specific genes have been characterised in the tammar and other marsupial species (Bird et al. 1994; Demmer et al. 2001; Green et al. 1980, 1991; Nicholas et al. 1987, 2001; Simpson et al. 1998; Trott et al. 2002). These include early lactation protein (ELP), mid-late whey acidic protein (WAP) and the late lactation proteins LLP-A and LLP-B. WAP is also found in the milk of many eutherians (Hennighausen and Sippel 1982), although not in humans, goats or sheep (Hajjoubi et al. 2006). The other phase-specific milk proteins are marsupial-specific and have not been found in eutherian or monotreme milk. Further marsupial-specific milk proteins include trichosurin (Piotte et al. 1998) and the newly identified putative proteins PTMP-1 and PTMP-2 (Lefevre et al. 2007). PTMP-1 does not occur in the genome sequence of the American marsupial, the opossum, and may be specific to the macropod lineage.

7.3.3

Tammar Mammary Transcriptome Sequencing

We have also reported the expression of marsupial genes, quantified by sampling the mammary transcriptome at specific stages of the tammar lactation cycle with shallow and deep cDNA sequencing methods (Lefevre et al. 2007; Fig. 7.4). An estimated 10% of the mammary transcriptome represents marsupial-specific genes and 15% represents mammal-specific genes. These results also identified non-coding RNAs expressed during lactation. PTNC-1 is a novel non-coding RNA derived from a region of the genome that is ultra-conserved in mammals, suggesting an important functional role. Other candidate non-coding RNAs have also been identified, and further work will be required to characterise the function of these molecules. During the course of lactation, the tammar mammary gland expresses a limited number of common or phase-specific milk protein genes at high and increasing levels. These account for over 60% of all transcripts during copious late lactation. The remaining transcripts predominantly represent components of the translational machinery, immune-related products or genes involved in energy production. These results depict the lactating mammary gland as an organ highly specialised in the synthesis of milk. Observations of mammary tissue from late-pregnant animals have shown that the gland is primed for the rapid commencement of milk production after parturition. Similarly, the large increase in the protein content of tammar milk during mid to late lactation is accompanied by an increase in the expression of secreted milk protein genes in the mammary gland. Secreted protein gene expression correlates with growth of the mammary gland, growth of the young, milk production and milk protein synthesis, which all increase steadily during the lactation cycle (Findlay 1982; Green et al. 1980). This global change in mammary gene expression may reflect a combination of changes in cellular gene expression and in the cell type populations within the tissue.

Fig. 7.4 Mammary gland transcriptome from the marsupial tammar wallaby at different stages of the lactation cycle
As the mammary gland increases steadily in size during the lactation cycle, two structural changes have been described (Findlay 1982). The stroma is progressively replaced by alveolar tissue during pregnancy and lactation, and alveolar size increases markedly during late lactation. The increase in the relative abundance of milk protein transcripts may therefore have two explanations. It may reflect increased milk protein gene expression within mammary epithelial cells (lactogenesis). Alternatively, it may reflect an increase in the number and proportion of secretory epithelial cells in the gland (mammogenesis).


The mammary transcriptome most likely reflects a combination of these processes. Transcriptomics of milk cells in this species, as described above for monotremes, would provide interesting complementary data. The combination of cDNA and digital signature sequencing methodologies has highlighted some caveats and limitations of sequencing approaches for studying gene expression in the highly specialised mammary gland. In lactating tissue dominated by milk protein transcripts, sequencing is a less effective method for gene discovery. Next-generation sequencing might overcome these limitations in the future. Digital sequencing has one advantage over microarrays. It estimates the relative levels of different mRNAs directly from transcript counts, whereas microarrays estimate differential expression between samples. However, marsupial microarrays are being developed and will allow detailed analysis of differential expression across a larger gene catalogue. This will help to investigate the molecular, hormonal and cellular mechanisms regulating lactation in marsupials.
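A small calculation can illustrate digital counts and why a transcriptome dominated by milk protein transcripts limits gene discovery. The sketch below assumes that each sequenced cDNA is an independent random draw from the mRNA pool (binomial sampling) and ignores transcript length and cloning biases. The gene names and counts (`milk_protein_1`, `rare_transcript` and so on) are hypothetical placeholders, not data from the tammar study.

```python
"""Hypothetical sketch: relative transcript abundance from digital (cDNA) counts."""


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts per gene into fractions of all sampled transcripts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    return {gene: n / total for gene, n in counts.items()}


def group_fraction(counts: dict[str, int], genes: set[str]) -> float:
    """Fraction of all transcripts contributed by a group of genes (e.g. milk proteins)."""
    freqs = relative_abundance(counts)
    return sum(freqs.get(g, 0.0) for g in genes)


def detection_probability(fraction: float, depth: int) -> float:
    """Probability of sampling at least one read of a transcript present at `fraction`,
    assuming each of `depth` reads is drawn independently (binomial sampling)."""
    return 1.0 - (1.0 - fraction) ** depth


# Hypothetical counts for illustration only (not data from the tammar study).
counts = {
    "milk_protein_1": 4000,
    "milk_protein_2": 2500,
    "ribosomal_protein": 1500,
    "immune_gene": 1999,
    "rare_transcript": 1,
}

milk = group_fraction(counts, {"milk_protein_1", "milk_protein_2"})
rare = relative_abundance(counts)["rare_transcript"]
print(f"milk protein share: {milk:.0%}")
for depth in (1_000, 50_000):
    print(f"P(detect rare transcript | {depth} reads) = {detection_probability(rare, depth):.2f}")
```

With these hypothetical numbers, a transcript making up 0.01% of the pool has only about a 10% chance of appearing among 1,000 sequenced clones. It is almost certain to be seen among 50,000 reads. When most reads are spent on a few abundant milk protein transcripts, the effective depth for everything else shrinks accordingly.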

7.4

Eutherian Lactation: Fur Seal Adaptation and Mammary Gland Involution

Within eutherian diversity, the pinnipeds show a variety of extreme adaptations of the lactation system. These include species with some of the shortest lactation periods and, most interestingly, species with the longest intervals between successive nursing periods.

7.4.1

Adaptations of Lactation in Pinnipeds

The three families of pinnipeds, comprising the phocids (true seals), odobenids (walrus) and otariids (sea lions and fur seals), evolved from a carnivorous ancestor around 25 million years ago and diverged during the middle Miocene (10 million years ago) (Fordyce 2002). Each family adopted a different approach to lactation. The walrus has the lowest reproductive rate of any pinniped. Calves accompany their mother on her foraging trips from birth, nursing on demand, and are not weaned for 2 years or more. Phocid seals evolved large body sizes, which reduce heat loss and predation risk and increase body reserves. This enabled them to adopt a “fasting strategy” of lactation (Oftedal et al. 1987). Under this strategy, amassed body reserves of stored nutrients allow the mother to fast on land while producing milk continuously over relatively short periods (4–42 days, depending on the species). In contrast, otariid seals retained smaller body sizes and insulating fur. They adopted a “foraging lactation” strategy, breeding at rookeries close to local prey resources (Bonner 1984). Reduced prey availability and the need to exploit resources farther offshore led to extended lactation (4–12 months), with fewer but longer foraging trips. Otariid seals produce milk with no detectable lactose. Their lactation strategy alternates between periods of several days of copious milk production on shore and extended periods of maternal foraging at sea (Bonner 1984). Inter-suckling intervals of up to 23 days are among the longest recorded for any mammal (Bonner 1984). In most other mammals, milk accumulates in the mammary gland when suckling is interrupted. This causes rapid downregulation of milk protein gene expression, followed by involution via apoptotic cell loss after a few days (Li et al. 1997). In otariids, however, the mammary gland remains functional despite sustained interruptions in suckling.

7.4.2

The Mammary Transcriptome of an Otariid: The Lactating Fur Seal During the Foraging Period

Fig. 7.5 represents the mammary transcriptome of a foraging Cape fur seal (Arctocephalus pusillus). Milk protein transcripts are less predominant than in the tammar transcriptome in Fig. 7.4. During foraging periods at sea, in the absence of suckling, fur seal mammary glands have been recorded to produce 80% less milk than when lactating on land (Arnould and Boyd 1995), and milk protein gene expression decreases (Sharp et al. 2006). These features are shared with the cessation of suckling in other mammalian species and are characteristic of the reversible initial phase of involution. In other mammals, however, these events are rapidly followed by involution, with marked apoptotic death of mammary cells. The fur seal mammary gland does not proceed to involution at this time (Sharp et al. 2006). Instead, it remains active, ready for the mother's return to land to continue nursing the young.

Fig. 7.5 Mammary gland transcriptome of lactating fur seal during the foraging period

[Figure: labelled transcripts include lysozyme, CSN1S1, CSN1S2, CSN2, CSN3, IGJ and serum amyloid A-3.]

7.4.3

Adaptation of Otariid Lactation Suggests a Key Role for Alpha-Lactalbumin in Mammary Gland Involution

Transcriptomic and genome analyses have been carried out on three otariids and three phocids. The otariids were the Cape fur seal (Arctocephalus pusillus), California sea lion (Zalophus californianus) and Antarctic fur seal (Arctocephalus gazella). The phocids were the grey seal (Halichoerus grypus), ringed seal (Pusa hispida) and harbour seal (Phoca vitulina). These analyses have shown that expression of LALBA has been silenced during otariid evolution by cis-acting mutations in the promoter region (Sharp et al. 2008). LALBA encodes alpha-lactalbumin, a milk protein involved in lactose synthesis, which is consistent with the absence of lactose from otariid milk. There are other examples in nature where lactose is not required for milk production. In tammar wallaby (Macropus eugenii) milk, carbohydrate is low and lactose is absent throughout peak lactation (Messer and Elliott 1987). During this time, other unknown factors act as the major osmole, demonstrating that lactose is not necessary for milk production. LALBA has been reported to cause apoptosis of mouse and human mammary epithelial cell lines and of fur seal primary mammary cells (Sharp et al. 2008). Modified LALBA (combined with oleic acid) also causes apoptosis of tumour cell lines (Tolin et al. 2009). It remains to be established whether the absence of LALBA, alone or in combination with other changes, is responsible for the delayed involution in otariids. Nevertheless, the loss of LALBA expression apparently represents a key event in the evolution of lactation in this family.

7.5

Discussion

Genome analysis has shown that, in general, milk protein genes are not clustered together in the genome, except for the caseins. The caseins show a conserved genomic organisation (Lefevre et al. 2009; Warren et al. 2008), and other milk protein genes co-cluster with mammary genes in the bovine genome (Lemay et al. 2009). Both observations suggest that the need for coordinated expression during lactation may have been an influential factor in shaping mammalian genomes. Compared with other genes of the bovine genome, mammary and milk genes are more conserved among mammals and evolve more slowly in the bovine lineage. The most conserved proteins are associated with secretory processes, especially components of the milk fat globule. The most divergent are associated with the nutritional and immunological components of milk (Lemay et al. 2009). Overall, the high conservation of mammary genes suggests that lactation evolved by co-opting existing structures and pathways for the synthesis and secretion of copious milk (Lemay et al. 2009; Menzies et al. 2009c). It also suggests that a complex lactation system was already fully implemented in early mammals. Milk and mammary genes show apparently strong negative selection and no positive selection. This supports the hypothesis that milk evolution has been constrained to optimise the survival of both mother and offspring. Further analysis of mammalian diversity will be needed to confirm this or to identify differential constraints on the molecular components and biological pathways of lactation. Since the divergence of monotremes and therians, significantly more duplications have occurred in mammary genes than in other bovine genes. This variability in copy number may be partly responsible for the variability in milk composition. The regulation of transcription and other physiological energy-partitioning processes may also play a role. Studies are starting to address this aspect by examining the transcriptional regulation of genes in epithelial cell culture, mammary explants or mammary gland tissue from a number of animal models (Brennan et al. 2008; Lemay et al. 2007; Menzies et al. 2009a, b, c; Rudolph et al. 2003). A number of genus-specific milk proteins have also been identified, especially in marsupials. For ubiquitous milk proteins such as caseins and WAP, lineage-specific recombination of protein domains has been described.
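Negative (purifying) and positive selection on protein-coding genes are commonly summarised by the ratio ω = dN/dS of nonsynonymous to synonymous substitution rates. The sketch below illustrates only this conventional interpretation and is not taken from the cited studies. The function names, the ±0.1 tolerance band and the example rates are hypothetical. A real analysis would estimate dN and dS from codon alignments and test departures from ω = 1 statistically, for example with likelihood ratio tests, rather than use a fixed cutoff.

```python
def omega(dn: float, ds: float) -> float:
    """Ratio of nonsynonymous (dN) to synonymous (dS) substitutions per site."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    return dn / ds


def interpret_omega(w: float, tolerance: float = 0.1) -> str:
    """Conventional reading of omega. The tolerance band is an arbitrary,
    hypothetical stand-in for a proper statistical test."""
    if w < 1.0 - tolerance:
        return "purifying (negative) selection"
    if w > 1.0 + tolerance:
        return "positive selection"
    return "approximately neutral"


# Hypothetical per-gene rates (dN, dS), for illustration only.
examples = {
    "conserved_secretory_gene": (0.02, 0.20),
    "fast_evolving_gene": (0.30, 0.20),
}
for name, (dn, ds) in examples.items():
    w = omega(dn, ds)
    print(f"{name}: omega = {w:.2f} -> {interpret_omega(w)}")
```

On this reading, a gene set dominated by ω well below 1 corresponds to strong negative selection.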
A detailed analysis of the structure of WAP has shown extensive rearrangements of the gene in mammalian lineages, leading to a reorganisation of the number of exons from monotremes to marsupials and eutherians. Meanwhile, the functional gene has been lost in humans, cattle and goats (Sharp et al. 2007). Preliminary experiments suggest that specific WAP domains carry unique functional activities in different lineages (Topcic et al. 2009). These studies provide a broad picture of the evolutionary landscape of lactation. They reveal the importance of conserved metabolic and secretory pathways, alongside the modular reorganisation of existing milk components and the appearance of specific milk proteins. This mix of robustness and flexibility allows the adoption of diverse lactation strategies in response to physiological, behavioural or environmental conditions. Thus, both the ancient, highly conserved molecular components of lactation and the more variable, specific ones are in part responsible for the success of mammals in surviving, adapting and evolving.

7.6

Conclusion

During the mammalian radiation, species have diversified their lactation strategies to promote reproductive success and adapt to the environment. There is much to learn about the genetics of lactation from the natural resource of animal diversity, as illustrated by the comparative analysis of gene expression in a variety of lactating mammalian lineages. Sequencing approaches will enable a broader exploration of lactation diversity. We have shown that milk cells provide easy access to functional data. Comparative genome analysis of the lactation system is also a new and complementary methodology. Together, these approaches should make it possible to study in detail how the evolutionary constraints on lactation vary between lineages, depending on lactation strategies or environmental adaptations. In all mammals, milk provision is a complex process, involving changes in milk composition and interactions between parent and young that go beyond the straightforward nutritional function. The role of milk in regulating the mammary gland and the development of the young is starting to emerge. This understanding comes from studies of mammals with extreme adaptations of the lactation system, such as fur seals or marsupials. Such animal models allow researchers to identify more easily regulatory mechanisms that are present, but not as readily apparent, in eutherian species (Nicholas et al. reviews). Early development of the eutherian young is programmed and regulated in utero, and inappropriate signalling results in abnormal development and adult-onset disease. The marsupial, by contrast, gives birth to an altricial young, and much of its early development is regulated by milk. New roles for milk are now emerging. Future studies using the marsupial and other models will allow researchers to understand more fully the central role of milk in three areas. Milk delivers time-dependent signals for the growth and development of the young, protects the young and the mammary gland from infection, and regulates the development and function of the mammary gland. A better understanding of the temporal delivery of these signals will provide new opportunities for the treatment and prevention of disease.
The results presented here have illustrated how comparative analysis of lactation by genomics and transcriptomics enables a better understanding of the role of milk in the programming of mammalian development.

References
Arnould JPY, Boyd IL (1995) Temporal patterns of milk production in Antarctic fur seals (Arctocephalus gazella). J Zool 237:1–12
Ballard FJ, Grbovac S, Nicholas KR, Owens PC, Read LC (1995) Differential changes in the milk concentrations of epidermal growth factor and insulin-like growth factor-I during lactation in the tammar wallaby, Macropus eugenii. Gen Comp Endocrinol 98:262–268
Bininda-Emonds OR, Cardillo M, Jones KE, MacPhee RD, Beck RM et al (2007) The delayed rise of present-day mammals. Nature 446:507–512
Bird PH, Hendry KA, Shaw DC, Wilde CJ, Nicholas KR (1994) Progressive changes in milk protein gene expression and prolactin binding during lactation in the tammar wallaby (Macropus eugenii). J Mol Endocrinol 13:117–125
Bonner WN (1984) Lactation strategies in pinnipeds: problems for a marine mammalian group. Symp Zool Soc Lond 51:253–272


C.M. Lefèvre et al.

Brennan AJ, Sharp JA, Digby MR, Nicholas KR (2007) The tammar wallaby: a model to examine endocrine and local control of lactation. IUBMB Life 59:146–150
Brennan AJ, Sharp JA, Lefevre CM, Nicholas KR (2008) Uncoupling the mechanisms that facilitate cell survival in hormone-deprived bovine mammary explants. J Mol Endocrinol 41:103–116
Demmer J, Stasiuk SJ, Grigor MR, Simpson KJ, Nicholas KR (2001) Differential expression of the whey acidic protein gene during lactation in the brushtail possum (Trichosurus vulpecula). Biochim Biophys Acta 1522:187–194
Elsik CG, Tellam RL, Worley KC, Gibbs RA, Muzny DM et al (2009) The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science 324:522–528
Findlay L (1982) The mammary glands of the tammar wallaby (Macropus eugenii) during pregnancy and lactation. J Reprod Fertil 65:59–66
Fordyce RE (ed) (2002) Fossil record. Academic Press, San Diego, California, USA, pp 453–471
Green B, Newgrain K, Merchant J (1980) Changes in milk composition during lactation in the tammar wallaby (Macropus eugenii). Aust J Biol Sci 33:35–42
Green B, Griffiths M, Leckie RM (1983) Qualitative and quantitative changes in milk fat during lactation in the tammar wallaby (Macropus eugenii). Aust J Biol Sci 36:455–461
Green B, VandeBerg JL, Newgrain K (1991) Milk composition in an American marsupial (Monodelphis domestica). Comp Biochem Physiol B 99:663–665
Griffiths M (1978) The biology of monotremes. Academic Press, New York, NY
Guss JM, Messer M, Costello M, Hardy K, Kumar V (1997) Structure of the calcium-binding echidna milk lysozyme at 1.9 Å resolution. Acta Crystallogr D Biol Crystallogr 53:355–363
Hajjoubi S, Rival-Gervier S, Hayes H, Floriot S, Eggen A et al (2006) Ruminants genome no longer contains whey acidic protein gene but only a pseudogene. Gene 370:104–112
Hendry KA, Simpson KJ, Nicholas KR, Wilde CJ (1998) Autocrine inhibition of milk secretion in the lactating tammar wallaby (Macropus eugenii). J Mol Endocrinol 21:169–177
Hennighausen LG, Sippel AE (1982) Characterization and cloning of the mRNAs specific for the lactating mouse mammary gland. Eur J Biochem 125:131–141
Hopper KE, McKenzie HA (1974) Comparative studies of alpha-lactalbumin and lysozyme: echidna lysozyme. Mol Cell Biochem 3:93–108
Jenness R (1986) Lactational performance of various mammalian species. J Dairy Sci 69:869–885
Jones WK, Yu-Lee LY, Clift SM, Brown TL, Rosen JM (1985) The rat casein multigene family. Fine structure and evolution of the beta-casein gene. J Biol Chem 260:7042–7050
Joseph M, Griffiths M (1992) Whey proteins in milks of monotremes and wallabies. Australian Mammalogy 14:125–127
Kawasaki K, Weiss KM (2003) Mineralized tissue and vertebrate evolution: the secretory calcium-binding phosphoprotein gene cluster. Proc Natl Acad Sci USA 100:4060–4065
Lefevre CM, Digby MR, Whitley JC, Strahm Y, Nicholas KR (2007) Lactation transcriptomics in the Australian marsupial, Macropus eugenii: transcript sequencing and quantification. BMC Genomics 8:417
Lefevre CM, Sharp JA, Nicholas KR (2009) Characterisation of monotreme caseins reveals lineage-specific expansion of an ancestral casein locus in mammals. Reprod Fertil Dev 21:1015–1027
Lemay DG, Neville MC, Rudolph MC, Pollard KS, German JB (2007) Gene regulatory networks in lactation: identification of global principles using bioinformatics. BMC Syst Biol 1:56
Lemay DG, Lynn DJ, Martin WF, Neville MC, Casey TM et al (2009) The bovine lactation genome: insights into the evolution of mammalian milk. Genome Biol 10:R43
Lemon M, Bailey LF (1966) A specific protein difference in the milk from two mammary glands of a red kangaroo. Aust J Exp Biol Med Sci 44:705–707
Li M, Liu X, Robinson G, Bar-Peled U, Wagner KU et al (1997) Mammary-derived signals activate programmed cell death during the first stage of mammary gland involution. Proc Natl Acad Sci USA 94:3425–3430
Madsen O (2009) Mammals (Mammalia). In: Hedges SB, Kumar S (eds) The timetree of life. Oxford University Press, Oxford, pp 459–461


Maningat PD, Sen P, Sunehag AL, Hadsell DL, Haymond MW (2007) Regulation of gene expression in human mammary epithelium: effect of breast pumping. J Endocrinol 195:503–511
Menzies KK, Lee HJ, Lefevre C, Ormandy CJ, Macmillan KL, Nicholas KR (2009a) Insulin, a key regulator of hormone responsive milk protein synthesis during lactogenesis in murine mammary explants. Funct Integr Genomics 10(1):87–95
Menzies KK, Lefevre C, Macmillan KL, Nicholas KR (2009b) Insulin regulates milk protein synthesis at multiple levels in the bovine mammary gland. Funct Integr Genomics 9:197–217
Menzies KK, Lefevre C, Sharp JA, Macmillan KL, Sheehy PA, Nicholas KR (2009c) A novel approach identified the FOLR1 gene, a putative regulator of milk protein synthesis. Mamm Genome 20:498–503
Mercier JC, Chobert JM, Addeo F (1976) Comparative study of the amino acid sequences of the caseinomacropeptides from seven species. FEBS Lett 72:208–214
Messer M, Elliott C (1987) Changes in alpha-lactalbumin, total lactose, UDP-galactose hydrolase and other factors in tammar wallaby (Macropus eugenii) milk during lactation. Aust J Biol Sci 40:37–46
Messer M, Griffiths M, Rismiller PD, Shaw DC (1997) Lactose synthesis in a monotreme, the echidna (Tachyglossus aculeatus): isolation and amino acid sequence of echidna alpha-lactalbumin. Comp Biochem Physiol B Biochem Mol Biol 118:403–410
Mori Y, Hiraki Y, Shukunami C, Kakudo S, Shiokawa M et al (1997) Stimulation of osteoblast proliferation by the cartilage-derived growth promoting factors chondromodulin-I and -II. FEBS Lett 406:310–314
Murphy LC, Tsuyuki D, Myal Y, Shiu RP (1987) Isolation and sequencing of a cDNA clone for a prolactin-inducible protein (PIP). Regulation of PIP gene expression in the human breast cancer cell line, T-47D. J Biol Chem 262:15236–15241
Nicholas KR (1988) Asynchronous dual lactation in a marsupial, the tammar wallaby (Macropus eugenii). Biochem Biophys Res Commun 154:529–536
Nicholas KR, Messer M, Elliott C, Maher F, Shaw DC (1987) A novel whey protein synthesized only in late lactation by the mammary gland from the tammar (Macropus eugenii). Biochem J 241:899–904
Nicholas K, Simpson K, Wilson M, Trott J, Shaw D (1997) The tammar wallaby: a model to study putative autocrine-induced changes in milk composition. J Mammary Gland Biol Neoplasia 2:299–310
Nicholas KR, Fisher JA, Muths E, Trott J, Janssens PA et al (2001) Secretion of whey acidic protein and cystatin is down regulated at mid-lactation in the red kangaroo (Macropus rufus). Comp Biochem Physiol A Mol Integr Physiol 129:851–858
Oftedal OT, Boness DJ, Tedman RA (1987) The behaviour, physiology, and anatomy of lactation in the Pinnipedia. Curr Mammal 1:175–245
Peaker M (2002) The mammary gland in mammalian evolution: a brief commentary on some of the concepts. J Mammary Gland Biol Neoplasia 7:347–353
Piotte CP, Hunter AK, Marshall CJ, Grigor MR (1998) Phylogenetic analysis of three lipocalin-like proteins present in the milk of Trichosurus vulpecula (Phalangeridae, Marsupialia). J Mol Evol 46:361–369
Rijnkels M (2002) Multispecies comparison of the casein gene loci and evolution of casein gene family. J Mammary Gland Biol Neoplasia 7:327–345
Rudolph MC, McManaman JL, Hunter L, Phang T, Neville MC (2003) Functional development of the mammary gland: use of expression profiling and trajectory clustering to reveal changes in gene expression during pregnancy, lactation, and involution. J Mammary Gland Biol Neoplasia 8:287–307
Sharp JA, Cane KN, Lefevre C, Arnould JP, Nicholas KR (2006) Fur seal adaptations to lactation: insights into mammary gland function. Curr Top Dev Biol 72:275–308
Sharp JA, Lefevre C, Nicholas KR (2007) Molecular evolution of monotreme and marsupial whey acidic protein genes. Evol Dev 9:378–392


Sharp JA, Lefevre C, Nicholas KR (2008) Lack of functional alpha-lactalbumin prevents involution in Cape fur seals and identifies the protein as an apoptotic milk factor in mammary gland involution. BMC Biol 6:48
Shaw DC, Messer M, Scrivener AM, Nicholas KR, Griffiths M (1993) Isolation, partial characterisation, and amino acid sequence of alpha-lactalbumin from platypus (Ornithorhynchus anatinus) milk. Biochim Biophys Acta 1161:177–186
Simpson K, Shaw D, Nicholas K (1998) Developmentally-regulated expression of a putative protease inhibitor gene in the lactating mammary gland of the tammar wallaby, Macropus eugenii. Comp Biochem Physiol B Biochem Mol Biol 120:535–541
Tolin S, De Franceschi G, Spolaore B, Frare E, Canton M et al (2009) The oleic acid complexes of proteolytic fragments of alpha-lactalbumin display apoptotic activity. FEBS J 277(1):163–173
Topcic D, Auguste A, De Leo AA, Lefevre C, Digby MR, Nicholas KR (2009) Characterization of the tammar wallaby (Macropus eugenii) whey acidic protein gene: new insights into the function of the protein. Evol Dev 11:363–375
Trott JF, Wilson MJ, Hovey RC, Shaw DC, Nicholas KR (2002) Expression of novel lipocalin-like milk protein gene is developmentally-regulated during lactation in the tammar wallaby, Macropus eugenii. Gene 283:287–297
Trott JF, Simpson KJ, Moyle RL, Hearn CM, Shaw G et al (2003) Maternal regulation of milk composition, milk production, and pouch young development during lactation in the tammar wallaby (Macropus eugenii). Biol Reprod 68:929–936
Tyndale-Biscoe H, Janssens PA, Australian Academy of Science, Australian Society for Reproductive Biology, Australian Mammal Society (1988) The developing marsupial: models for biomedical research, vol viii. Springer-Verlag, Berlin, p 245
Waite R, Giraud A, Old J, Howlett M, Shaw G et al (2005) Cross-fostering in Macropus eugenii leads to increased weight but not accelerated gastrointestinal maturation. J Exp Zool 303:331–344
Warren WC, Hillier LW, Marshall Graves JA, Birney E, Ponting CP et al (2008) Genome analysis of the platypus reveals unique signatures of evolution. Nature 453:175–183
Yamagoe S, Mizuno S, Suzuki K (1998) Molecular cloning of human and bovine LECT2 having a neutrophil chemotactic activity and its specific expression in the liver. Biochim Biophys Acta 1396:105–113

Chapter 8

Evolutionary Dynamics in the Aphid Genome: Search for Genes Under Positive Selection and Detection of Gene Family Expansions

Morgane Ollivier and Claude Rispe

Abstract Aphids have a high adaptive potential, and their capacity to adapt to various environments could be linked with specific expansions in gene repertoires. Large-scale acquisition of genomic data has recently been undertaken, with the genome of Acyrthosiphon pisum (reference gene set) and EST data from three other species: Myzus persicae, Aphis gossypii and Toxoptera citricida. We identified paralogs through an intra-genomic Reciprocal Best Hit search in A. pisum and highlighted a high and steady level of duplication in A. pisum. We assembled ESTs, predicted coding sequences and identified pairs of orthologs with A. pisum. We identified a fraction of fast-evolving sequences (high ratio of non-synonymous to synonymous rates), including genes shared by aphids but not identified in non-aphid species. Phylogenetic study of fast-evolving genes (Apo, C002, Spaetzle) shows that rate accelerations and duplication events are linked and could favour the emergence of specific biological functions.

8.1

Introduction

Studies of the adaptation of species to their environment have historically focused on analyses of phenotypic variation. The enormous increase in sequence data now makes it possible to detect directly, at the gene level, the processes that contribute to adaptation. In a given population, genes are subject to drift and selection. Selection can act against deleterious mutations or in favour of advantageous ones. Traces of selection can be detected in genomes by comparing homologous sequences from different organisms and by computing maximum likelihood (ML) estimates of synonymous (dS) and non-synonymous (dN) evolutionary rates. The ratio omega (dN/dS) is used as an indicator of variable evolutionary pressures among protein-coding genes: low ratios are typical of highly constrained sequences (under purifying selection), values close to unity reflect relaxed selection, and values above unity result from positive selection. Gene duplications are another important factor to consider. Although most duplicated sequences are eventually removed from a genome, duplications can provide new evolutionary opportunities, as duplicated genes are often subject to particular selective pressures (either relaxation or positive selection).

Aphids (Insecta: Hemiptera) are small insects that feed on plant sap. Some species feed on crops and are considered agricultural pests. Their impact on crops is enhanced by host–plant specialisation (Hawthorne and Via 2001; Hufbauer and Via 1999) and by their rapid population growth through viviparous clonal reproduction (parthenogenesis). Phenotypic plasticity (of reproductive mode and of dispersal) adds to their high adaptive potential. Their life cycle is remarkable in that it alternates asexual and sexual reproduction (Fig. 8.1). The capacity of aphids to adapt to various environments could be linked with shifts in gene repertoires (more genes, specific gene regulation) and with expansions of specific gene families. The genome of the pea aphid Acyrthosiphon pisum (Aphidinae, Macrosiphini) has recently been completely sequenced by the International Aphid Genomics Consortium (2010); it comprises close to 34,000 predicted genes. Collections of ESTs are available for three other aphid species: Myzus persicae, Toxoptera citricida and Aphis gossypii.

M. Ollivier and C. Rispe
INRA, UMR 1099 BiO3P, Domaine de la Motte, F-35653, Le Rheu, France
e-mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_8, © Springer-Verlag Berlin Heidelberg 2010
These data provide substantial material for analysing fine-scale evolution (selective pressures on the different genes, amplification of some gene families) and for relating specific changes in the aphid genome to biological adaptations, both within the pea aphid and between aphid species. To better evaluate the adaptive aspects of gene repertoires and gene sequence evolution in aphids as a group, we developed a two-step approach. The first step evaluated the importance of duplication in the pea aphid genome by comparison with two other insect genomes. The second compared the coding genomes of aphid species from two different tribes (Macrosiphini and Aphidini) with different life cycles and host–plant preferences: A. pisum, M. persicae, A. gossypii and T. citricida. For these comparisons, we used the pea aphid genome and all ESTs available for the three other species. We quantified the fraction of genes shared by different aphid species but unknown from other insects, which could therefore play a special role in aphid biology. We also analysed patterns of divergence among putative orthologs, focusing especially on fast-evolving sequences, whose rapid evolution could result from positive selection and rapid adaptation to environmental changes, or from strong co-evolutionary interactions such as those between insects and their host plants.
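The reading of omega given earlier in this introduction (low: purifying selection; near unity: relaxed selection; above unity: positive selection) can be written as a small helper. This is a minimal sketch only: the function names are hypothetical, and the band of ±0.1 around 1 used to call a value "close to unity" is an illustrative assumption rather than a threshold from this chapter. In practice, calls of positive selection rest on likelihood-ratio tests, not on a single point estimate.

```python
def omega(dn: float, ds: float) -> float:
    """Return the dN/dS ratio (omega) for one pair of sequences."""
    if ds <= 0:
        # omega is undefined when there is no synonymous divergence
        raise ValueError("dS must be positive to compute dN/dS")
    return dn / ds


def interpret_omega(w: float, tolerance: float = 0.1) -> str:
    """Map omega onto the three regimes described in the text.

    The tolerance band around 1 is a hypothetical, illustrative choice.
    """
    if w > 1 + tolerance:
        return "positive selection"
    if w >= 1 - tolerance:
        return "relaxed selection"
    return "purifying selection"


# Hypothetical (dN, dS) pairs, one per regime.
for dn, ds in [(0.01, 0.20), (0.19, 0.20), (0.30, 0.20)]:
    w = omega(dn, ds)
    print(f"omega = {w:.2f}: {interpret_omega(w)}")
```

Note that the chapter's own cut-off for "fast-evolving" genes (dN/dS > 0.4) falls well below unity, which is why such genes are described as being under positive or relaxed selection rather than assigned to one regime.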


[Figure: seasonal cycle (spring, summer, fall, winter) of a “sexual” lineage: viviparous parthenogenetic females produce n clonal generations, followed in fall by one sexual generation (sexuals) whose eggs overwinter]

Fig. 8.1 Life cycle of the pea aphid Acyrthosiphon pisum. A parthenogenetic female generates several clonal generations. In fall, as the photoperiod decreases, parthenogenetic females produce males and oviparous females that can mate. Oviparous females produce eggs that can remain in diapause during the winter. In spring, viviparous females emerge from the eggs

8.2

Dynamics of Duplication over Evolutionary Time (IAGC, PLoS Biology, 2010)

Genome comparisons are an efficient way to detect differences in gene repertoires among species, such as differences in the extent of gene duplication (as has been done, for example, in the genus Drosophila; Zdobnov et al. 2002; Heger and Ponting 2007). Within a group, it is possible to measure the relative importance and the dynamics of duplication across genomes. A “self-blast” of the coding sequences (CDS) of a genome can identify paralogous genes, and the synonymous divergence (dS) between the two copies of each pair then provides a rough measure of the time elapsed since duplication. This method has been used successfully to detect global duplication events in Arabidopsis thaliana (Blanc and Wolfe 2004) and Paramecium tetraurelia (Aury et al. 2006), where these events appeared as very clear peaks in the distribution of dS among all paralogs. Using this method, we studied the A. pisum genome and, for comparison, two other insect genomes: D. melanogaster (Adams et al. 2000), which comprises more than 14,000 predicted genes, and Apis mellifera (Weinstock et al. 2006), which comprises about 9,000 predicted genes. Each coding genome was blasted against itself (BLASTP, E-value = 1.0e-10). Reciprocal Best Hits (RBH) (Hirsh and Fraser 2001; Jordan et al. 2002) within each genome were considered potential gene copies dating back to the most recent duplication event. We then aligned all RBH pairs of sequences in the three genomes and computed the synonymous divergence between them using a codon-based model (codeml from PAML; Yang 1997). Comparison of the dS distributions across the pea aphid, fruit fly and honeybee genomes (Fig. 8.2) shows a particularly high and steady level of duplication in the pea aphid genome, well above that observed in the honeybee and fruit fly genomes.
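The intra-genomic RBH step can be sketched as follows. This is a minimal illustration, not the pipeline used in the chapter: it assumes the self-BLAST hits have already been filtered at the E-value cut-off and reduced to (query, subject, bit score) tuples, and it ranks hits by bit score. The function names and the toy hit table are hypothetical.

```python
from collections.abc import Iterable


def best_hits(hits: Iterable[tuple[str, str, float]]) -> dict[str, str]:
    """Best non-self subject for each query, ranked by bit score."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, bitscore in hits:
        if query == subject:
            continue  # in a self-blast every sequence first hits itself
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits: Iterable[tuple[str, str, float]]) -> set[frozenset[str]]:
    """Unordered pairs {a, b} in which each gene is the other's best hit."""
    best = best_hits(hits)
    return {frozenset((a, b)) for a, b in best.items() if best.get(b) == a}


# Hypothetical hit table for three genes.
hits = [
    ("g1", "g1", 500.0), ("g1", "g2", 300.0), ("g1", "g3", 120.0),
    ("g2", "g2", 480.0), ("g2", "g1", 310.0),
    ("g3", "g3", 400.0), ("g3", "g1", 150.0),
]
print(reciprocal_best_hits(hits))  # only g1 and g2 form a reciprocal pair
```

Here g3 is left out because its best hit, g1, prefers g2. Each retained pair would then be aligned and passed to a codon model to estimate dS.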

[Figure: histogram of the number of pairs of paralogs (y-axis, 0–4,000) in classes of dS (synonymous changes per site; x-axis classes 0–0.25, 0.25–0.50, 0.50–0.75, 0.75–1.00, 1.00–1.25, 1.25–1.50)]

Fig. 8.2 Widespread gene duplication in an ancestor of the pea aphid, as suggested by the distributions of synonymous divergence among pairs of recent paralogs (Reciprocal Best Hits) within the pea aphid, honeybee and Drosophila genomes. Black: Acyrthosiphon pisum; grey: Drosophila melanogaster; white: Apis mellifera
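Fig. 8.2 groups the RBH pairs into dS classes 0.25 wide. A minimal sketch of that binning, assuming the per-pair dS values have already been estimated (for example with codeml). The function names are hypothetical, and dropping values at or above 1.5 simply mirrors the figure's last class.

```python
def bin_ds(ds_values, width=0.25, max_ds=1.5):
    """Count dS values in classes [0, width), [width, 2*width), ... up to max_ds.

    Values outside [0, max_ds) are left out, as in a histogram whose last
    class ends at max_ds.
    """
    n_classes = round(max_ds / width)
    counts = [0] * n_classes
    for ds in ds_values:
        if 0 <= ds < max_ds:
            # min() guards against floating-point rounding at the top edge
            counts[min(int(ds // width), n_classes - 1)] += 1
    return counts


def class_labels(width=0.25, max_ds=1.5):
    """Labels such as '0.00-0.25' for each dS class."""
    n_classes = round(max_ds / width)
    return [f"{i * width:.2f}-{(i + 1) * width:.2f}" for i in range(n_classes)]


# Hypothetical dS values for a handful of paralog pairs.
ds_values = [0.05, 0.12, 0.30, 0.31, 0.80, 1.45, 2.10]
for label, count in zip(class_labels(), bin_ds(ds_values)):
    print(label, count)
```

A sharp peak in one class would point to a burst of duplication at that age; for A. pisum the text instead reports a high and steady level across classes.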


8.3


Comparative Analysis of the A. pisum Genome and EST-Based Genes Sets from Other Aphid Species (Ollivier et al. IMB, Accepted)

Comparing the gene repertoires of related organisms, and the evolutionary rates of their genes, may provide insights into the genes and functions that are particularly significant for the biology of that group of organisms.

8.3.1

Search for Orthologous Genes

We assembled ESTs for three aphid species: Myzus persicae, Aphis gossypii and Toxoptera citricida. From these collections of unigenes, we predicted CDS in each species; they are available in AphidBase (http://www.aphidbase.com; Legeai et al. 2010). We identified putative orthologs using the RBH method and found 259 RBH shared by the four species (a restricted set biased towards highly expressed genes), 4,649 RBH between A. pisum and M. persicae, 1,789 RBH between A. pisum and A. gossypii and 982 RBH between A. pisum and T. citricida. Evolutionary rates (non-synonymous divergence, dN; synonymous divergence, dS; and the dN/dS ratio) were computed for all orthologous pairs of sequences using codeml from PAML.

8.3.2

Pairwise Comparisons and Estimation of Evolutionary Rates

Distributions of dN/dS ratios (Fig. 8.3) were similar for the three pairwise comparisons, A. pisum/M. persicae, A. pisum/A. gossypii and A. pisum/T. citricida. In all three comparisons we observed an L-shaped distribution, with a low mode and a long right tail corresponding to the RBH with the highest ratios. We focused on those sequences, as they might be fast-evolving genes of particular interest, and found 248, 60 and 32 genes with dN/dS > 0.4 in the three comparisons, respectively. We also recorded all sequences that were RBH in each pairwise comparison but had no hit in UniProt (tentative aphid-specific genes). This category comprised about 10% of all pairwise RBH: 445, 159 and 66 genes in the three comparisons, respectively. In these sets, dN and dN/dS ratios were three times higher than in the reference set (P value >10 103, Z-test), and 50% of those genes had a dN/dS > 0.40. This suggests that those genes are evolving particularly fast at the protein level and are under positive or relaxed selection. It may also explain why those genes are only recognised within aphids: they may have diverged too much from related sequences in other animal groups.

[Figure: histogram of the number of RBH (y-axis, 0–2,500) per dN/dS class (x-axis: 0–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, > 0.4) for the three pairwise comparisons]

Fig. 8.3 Distribution of the estimated pairwise ratio of non-synonymous to synonymous divergence for RBH genes between the pea aphid (complete genome) and EST-based gene sets from each of three other aphid species. White: A. pisum/M. persicae; black: A. pisum/T. citricida; grey: A. pisum/A. gossypii
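Two operations in Sect. 8.3.2 can be sketched in a few lines: flagging RBH pairs with dN/dS > 0.4, and comparing mean rates between the tentative aphid-specific set and the reference set with a Z-test. The chapter does not give the exact form of its Z-test, so the large-sample two-sample test of means below is an assumption; the function names and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def fast_evolving(omegas: dict[str, float], cutoff: float = 0.4) -> list[str]:
    """Genes whose dN/dS exceeds the cut-off used in the text (0.4)."""
    return sorted(gene for gene, w in omegas.items() if w > cutoff)


def two_sample_z(a: list[float], b: list[float]) -> tuple[float, float]:
    """Z statistic and two-sided P value for mean(a) - mean(b).

    Uses sample variances and the normal approximation, which is only
    reasonable for large samples.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (fmean(a) - fmean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical dN/dS values.
omegas = {"geneA": 0.55, "geneB": 0.12, "geneC": 0.41, "geneD": 0.08}
print(fast_evolving(omegas))  # ['geneA', 'geneC']

aphid_specific = [0.45, 0.60, 0.38, 0.52, 0.70, 0.41]
reference = [0.10, 0.15, 0.08, 0.20, 0.12, 0.18]
z, p = two_sample_z(aphid_specific, reference)
print(f"z = {z:.2f}, P = {p:.2g}")
```

Because dN/dS distributions are strongly right-skewed (Fig. 8.3), a rank-based test would be a reasonable alternative to the normal approximation.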

8.3.3

Phylogenetic Analyses of Two Fast-Evolving Sequences

8.3.3.1

Gene “Apo”: An Example of Lineage-Specific Duplication

This gene, which has no similarity to any sequence in the UniProt database, showed high dN/dS ratios in pairwise comparisons. We found it in all four aphid species, with four copies in A. pisum. An ML phylogenetic tree (Fig. 8.4) strongly supported the grouping of the four A. pisum copies, suggesting a lineage-specific duplication. A free-ratio model (PAML), which allows a separate dN/dS ratio on each branch, was significant and showed increased dN/dS ratios for Apo2 (1.69), for the Apo3/Apo4 group (1.66) and for the branch ancestral to the A. pisum copies (2.02), whereas the ratios for the M. persicae, T. citricida and A. gossypii branches were below 0.40. The increases in the ratio were thus associated with duplication events. A similar pattern was found for other sequences, such as juvenile hormone acid methyltransferase and glycosyl hydrolase (see Ollivier et al. 2010, IMB). In each case, we found strong increases of the dN/dS ratio consistent with lineage-specific duplications. This indicates that duplication strongly influenced evolutionary rates, possibly as the result of an adaptive process.
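Whether a free-ratio model is significant is usually decided with a likelihood-ratio test (LRT) against the one-ratio model: twice the log-likelihood difference is compared with a chi-square distribution whose degrees of freedom equal the number of extra omega parameters. The sketch below shows that test under stated assumptions. The chapter does not report its log-likelihoods, so the values in the usage line are hypothetical, and the degrees of freedom assume one omega per branch of a 7-tip unrooted tree (2 × 7 - 3 = 11 branches, hence 10 extra parameters). The chi-square tail uses closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
from math import erfc, exp, factorial, pi, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        half = x / 2
        return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))
    # Odd df: normal-tail term plus a finite series in powers of sqrt(x).
    total = erfc(sqrt(x / 2))
    term = 2 * sqrt(x) * exp(-x / 2) / sqrt(2 * pi)  # first series term
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """LRT statistic 2*(lnL_alt - lnL_null) and its chi-square P value."""
    stat = max(2 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: one-ratio (null) versus free-ratio (alternative).
stat, p = likelihood_ratio_test(lnl_null=-2000.0, lnl_alt=-1985.0, df=10)
print(f"2*dlnL = {stat:.1f}, P = {p:.4f}")
```

The free-ratio model is often treated as exploratory because it estimates many parameters from few substitutions per branch; branch-site models are a common follow-up for testing specific branches.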


[Figure: ML tree with tips A. pisum Apo1, Apo2, Apo3 and Apo4, M. persicae, T. citricida and A. gossypii; bootstrap values (95, 100, 100) shown at internal nodes; scale bar 0.05]

Fig. 8.4 Maximum likelihood tree of the “Apo” gene in four aphid species (lnL = -1683.99, Gamma = 2.21; likelihood settings from the best-fit model (TrN+G) selected by AIC in Modeltest). Bootstrap values are indicated under nodes
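The caption reports that the substitution model was chosen by AIC in Modeltest. As a hedged illustration of the criterion alone (not of Modeltest, which also fits each candidate model), AIC = 2k - 2 lnL, where k is the number of free parameters, and the model with the lowest AIC is selected. The model names and values below are hypothetical, and k must be counted the same way for every candidate (for example, consistently including or excluding branch lengths).

```python
def aic(lnl: float, k: int) -> float:
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * k - 2 * lnl


def best_model_by_aic(fits: dict[str, tuple[float, int]]) -> str:
    """Name of the model with the lowest AIC; fits maps name -> (lnL, k)."""
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits: (log-likelihood, number of free parameters).
fits = {"M1": (-1700.0, 10), "M2": (-1690.0, 12), "M3": (-1689.5, 14)}
for name, (lnl, k) in fits.items():
    print(name, aic(lnl, k))
print("selected:", best_model_by_aic(fits))  # selected: M2
```

M3 has the highest likelihood, but its gain of 0.5 log-units does not pay for two extra parameters, so M2 is preferred.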

8.3.3.2

Protein C002: An Aphid-Specific Protein

This protein shows, for example, a high dN/dS ratio between A. pisum and M. persicae (0.57), and the gene has no hit in UniProt. We found it as a single copy in each of the four species considered. The global dN/dS ratio (one-ratio model from PAML) computed on the species tree was exceptionally high, at 0.73. This gene has recently been identified as specific to salivary glands and essential for feeding (Mutti et al. 2008): the protein is transferred from aphid to plant during feeding, and C002 knock-down insects die prematurely. We may thus interpret this very high rate as the result of an adaptive response to a strong interaction with the plant. The absence of homologs in other insect groups also suggests a specific adaptation.

8.3.4

Functional Annotation of Fast-Evolving Genes

We compiled the 5,139 A. pisum sequences found in RBH pairs: 3,141 of them could be annotated with Blast2GO (Conesa et al. 2005; http://www.blast2go.org/), yielding 26,138 GO terms. We thus found an annotation for 60% of A. pisum sequences, but when the “fast-evolving” (dN/dS > 0.40) genes were analysed separately, only 30% were annotated. The sets of annotated A. pisum sequences were too small for statistical comparisons in the A. pisum/A. gossypii and A. pisum/T. citricida comparisons. However, in the A. pisum/M. persicae comparison, we found significant differences in the frequencies of GO categories between the fast-evolving subset of sequences and the rest of the genes: 22 GO terms were over-represented in the subset (P value < 0.01, Fisher's exact test). One significantly enriched category is of particular interest: “defence response to fungus”, which includes the genes cactus and Spaetzle. These genes are involved in development and in innate immunity through the Toll signalling pathway. Genes involved in defence and immunity are relatively few in A. pisum overall (Gerardo et al. 2010). The dN/dS ratios of Cactus and Spaetzle are 0.50 and 0.44, respectively, and while Cactus is single-copy, we found five copies of Spaetzle resulting from serial lineage-specific duplications. These duplications may have promoted the increase in non-synonymous substitution rates in the Spaetzle lineage. Aphids have an unusual immune system, and the genes involved in this function appear to be evolving in a distinctive way; these genes are thus probably under strong selective pressure.
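The over-representation test for a single GO term can be sketched as a one-sided Fisher's exact test on a 2 × 2 table (fast-evolving versus other genes, with versus without the term). This is a minimal standard-library illustration; the chapter does not state whether its tests were one- or two-sided or corrected for multiple testing, and the function name and counts below are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact P value for enrichment of a GO term in a subset.

    a: subset genes with the term      b: subset genes without it
    c: other genes with the term       d: other genes without it
    P is the hypergeometric probability of observing >= a annotated genes
    in the subset, given the table margins.
    """
    n_total = a + b + c + d
    with_term = a + c
    subset_size = a + b
    tail = sum(
        comb(with_term, x) * comb(n_total - with_term, subset_size - x)
        for x in range(a, min(with_term, subset_size) + 1)
    )
    return tail / comb(n_total, subset_size)


# Hypothetical counts: 8 of 60 fast-evolving genes carry the term,
# versus 20 of 2,000 other genes.
print(f"P = {fisher_enrichment_p(8, 52, 20, 1980):.2e}")
```

Working with exact integer binomial coefficients keeps the tail sum free of rounding error until the final division.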

8.4

Conclusion and Prospects

We highlighted an unusually high rate of duplication in the A. pisum genome. This finding offers new opportunities to test theoretical predictions on the relationship between duplication and evolutionary rates (Ohno 1970). Because positive selection often occurs within gene families (Hughes 1994), we expect a large fraction of the pea aphid genome to show patterns of accelerated evolution, which could favour the emergence of new biological functions and adaptations. Comparing the A. pisum genome with EST-based gene sets from three other species, even though these represent only partial genomes, helped highlight two particular sets of genes: fast-evolving genes and/or genes that are aphid-specific. The absence of hits in non-aphid databases for some genes may reflect deep divergence between those genes and their orthologs in non-aphid species. These genes could have evolved specific functions related to aphid biology. We have shown that duplications can strongly influence the evolutionary rates of at least some gene copies, and we have presented several examples of fast-evolving genes, some of them aphid-specific. These genes may be under positive or relaxed selection and could be the result of an adaptive process. However, our study was limited by the relatively small number of homologous genes, and the exact role of duplication in aphid adaptation remains to be demonstrated on a larger scale. In future work, we will pursue two main objectives: (1) a fine-scale study of the high level of duplication and of the influence of duplications on evolutionary rates; and (2) a focus on a particular biological feature of aphids, reproductive polyphenism.

Some aphid species are considered sexual and have, in their life cycle, an asexual and a sexual phase, as described above (Fig. 8.1). Other species have lost the sexual phase and become entirely clonal. Loss of sexuality and recombination is expected to result in an accumulation of deleterious mutations and, ultimately, in the extinction of asexual lineages (Kondrashov 1988). We aim to evaluate the extent to which clonal aphid species are affected by mutation accumulation and to determine how long they persist over evolutionary time. For this project, we have obtained 20,000 EST sequences for six new aphid species, including both taxa that maintain sexual reproduction and taxa that are entirely clonal. Genomic data will therefore soon be available for more aphid species, including one complete genome (A. pisum) and partial genomes (EST-based data or genomic data from low-coverage sequencing projects). In this situation, as our knowledge of genomes across the whole aphid group improves, a relevant strategy is to reconstruct phylomes (complete collections of gene phylogenies). The group of Toni Gabaldón (“Comparative Genomics”, CRG, Barcelona) has, for example, developed a pipeline to generate phylomes from partial or complete genomes of several species (Huerta-Cepas et al. 2008; http://phylomedb.org/). In collaboration with this group, we began in autumn 2009 to generate such phylomes from the existing genomic data for aphids. This will allow us to retrieve all available orthologs among all species. For pairs of asexual and sexual species, we will then be able to compare the accumulation of non-synonymous mutations in sexual and asexual taxa. We will also be able to quantify duplication patterns along the different branches of the aphid species tree. Finally, we will test the correlation between duplication, acceleration of evolution and specific biological features of aphids.

References
Adams MD, Celniker SE et al (2000) The genome sequence of Drosophila melanogaster. Science 287(5461):2185–2195
Aury JM, Jaillon O et al (2006) Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444(7116):171–178
Blanc G, Wolfe KH (2004) Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16(7):1667–1678
Conesa A, Gotz S et al (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21(18):3674–3676
Gerardo NM, Altincicek B et al (2010) Immunity and defense in pea aphids, Acyrthosiphon pisum. Genome Biol 11:R21
Hawthorne DJ, Via S (2001) Genetic linkage of ecological specialization and reproductive isolation in pea aphids. Nature 412(6850):904–907
Heger A, Ponting CP (2007) Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes. Genome Res 17(12):1837–1849
Hirsh AE, Fraser HB (2001) Protein dispensability and rate of evolution. Nature 411(6841):1046–1049


Huerta-Cepas J, Bueno A et al (2008) PhylomeDB: a database for genome-wide collections of gene phylogenies. Nucleic Acids Res 36:D491–D496
Hufbauer RA, Via S (1999) Evolution of an aphid–parasitoid interaction: variation in resistance to parasitism among aphid populations specialized on different plants. Evolution 53(5):1435–1445
Hughes AL (1994) The evolution of functionally novel proteins after gene duplication. Proc Biol Sci 256:119–124
IAGC (2010) Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol. doi:10.1371/journal.pbio.1000313
Jordan IK, Rogozin IB et al (2002) Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res 12(6):962–968
Kondrashov AS (1988) Deleterious mutations and the evolution of sexual reproduction. Nature 336:435–441
Legeai F, Shigenobu S et al (2010) AphidBase: a centralized bioinformatic resource for annotation of the pea aphid genome. Insect Mol Biol 19(2):5–12
Mutti NS, Louis J et al (2008) A protein from the salivary glands of the pea aphid, Acyrthosiphon pisum, is essential in feeding on a host plant. Proc Natl Acad Sci USA 105(29):9965–9969
Ohno S (1970) Evolution by gene duplication. Springer, New York
Ollivier M, Legeai F, Rispe C (2010) Comparative analysis of the Acyrthosiphon pisum genome and EST-based gene sets from other aphid species. Insect Mol Biol 19(2):33–45
Weinstock GM, Robinson GE et al (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature 443(7114):931–949
Yang ZH (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13(5):555–556
Zdobnov EM, von Mering C et al (2002) Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science 298(5591):149–159

Chapter 9

Mammalian Chromosomal Evolution: From Ancestral States to Evolutionary Regions

Terence J. Robinson and Aurora Ruiz-Herrera

Abstract Chromosome painting by fluorescence in situ hybridization (FISH) has allowed the detection of regions of orthology in most orders of mammals, permitting the formulation of ancestral mammalian karyotypes at higher taxonomic levels. We show (1) how the availability of genome sequence data from outgroup species has facilitated the identification of chromosomes and chromosomal segments that define eutherian monophyly, and (2) that FISH, together with in silico analysis of genomic sequences, points to a nonrandom distribution of evolutionary breakpoints that are rich in repeat elements and segmental duplications. These regions may mediate rearrangement by nonallelic homologous recombination between misaligned copies of duplicated regions and lead to breakpoint reuse. Characters that have arisen convergently (i.e., homoplasy) pose a significant challenge in systematics, as does lineage sorting of genetic polymorphisms across successive speciation nodes (hemiplasy). We show how hemiplasy, a theoretically plausible evolutionary phenomenon, can materially affect data sets, and we explore the distinction between homoplasy and hemiplasy based on persistence times of phylogenetic markers.

T.J. Robinson
Evolutionary Genomics Group, Department of Botany & Zoology, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa
e-mail: [email protected]

A. Ruiz-Herrera
Unitat de Citologia i Histologia, Departament de Biologia Cellular, Fisiologia i Inmunologia, Universitat Autònoma de Barcelona, Campus Bellaterra, 08193 Barcelona, Spain
Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Campus Bellaterra, 08193 Barcelona, Spain
e-mail: [email protected]

This manuscript is a synthesis of spoken presentations by:
Robinson TJ: Molecular discoveries at the root of the eutherian tree: Homoplasy, hemiplasy and ancestral states in the phylogenetic reconstructions of mammalian karyotypes.
Ruiz-Herrera A: The genomic puzzle of mammalian evolutionary breakpoints: can we track any trend?

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_9, © Springer-Verlag Berlin Heidelberg 2010

9.1

Introduction

Chromosome reorganization resulting from inversions, translocations, fusions, and fissions, among other structural changes, contributes to the shuffling of the mammalian genome and thus to the generation of new chromosomal forms on which natural selection may act. These rearrangements can be caused by the improper repair of double-strand breaks (DSBs); if the DNA damage occurs in the germ line and the structural rearrangements are transmissible, the modified chromosome(s) have the potential to become established in a population through selection and/or stochastic processes. It is in this context that mammalian phylogenomics (the combination of genomics and phylogenetics that elucidates the phylogenetic relationships among species by analysis of their entire genomes) has become one of the most integrative fields in evolutionary biology. One component of this field, namely how chromosomal rearrangements are involved in speciation and macroevolution, is fundamental for understanding the dynamics of mammalian chromosomal evolution. In this overview, we focus on recent developments related to three topical issues in chromosomal phylogenomics. First, we report on recent attempts to cladistically define chromosomal characters that are consistent with eutherian monophyly by examining the composition of the putative eutherian ancestral karyotype (defined by cross-species chromosome painting, Ferguson-Smith and Trifonov 2007) and the genome assemblies of two outgroup species, the opossum (Monodelphis domestica) and chicken (Gallus gallus). Second, we summarize evidence supporting a causal relationship between segmental duplication, repetitive elements, and evolutionary breakpoints at the junction of conserved syntenies, and the propensity for breakpoint reuse among eutherian species. Finally, we examine the complications attendant on inferring evolutionary relationships from the cladistic analysis of chromosomal characters (so-called rare genomic changes, Rokas and Holland 2000).
We suggest that the critical distinction between homoplasy (convergence or reversal) and hemiplasy (persistence of the rearrangement across speciation nodes, Avise and Robinson 2008) may be resolved in instances where divergence times for nodes are well defined, and the persistence time is less than the divergence time from a common ancestor.

9.2

Chromosomal Evolution

Nadeau and Taylor (1984) proposed the random-breakage model of chromosomal evolution. Their thesis, which extended earlier work by Ohno (1973), emphasized three important points: (1) chromosomal segments are expected to be conserved among species, (2) a diploid number of 48 was likely for the common ancestor of all mammals, and (3) chromosomal rearrangements are randomly distributed within genomes. Almost 40 years later, and given advances resulting from molecular cytogenetics, large-scale genome sequencing projects, and new mathematical algorithms, it is interesting to assess how prescient these early observations were.
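Point (3) makes a quantitative prediction: if breakpoints fall uniformly at random, conserved segment lengths are approximately exponentially distributed, so only a predictable fraction of segments should be very short. The Python sketch below simulates this null expectation and compares it with the exponential approximation. The genome size, breakpoint count, and 1 Mb threshold are illustrative assumptions, not values from the studies cited here, and the function names are hypothetical.

```python
import math
import random
from typing import List


def simulate_segment_lengths(genome_mb: float, n_breaks: int, rng: random.Random) -> List[float]:
    """Drop n_breaks breakpoints uniformly at random on a linear genome of
    genome_mb megabases (random-breakage null model); return segment lengths."""
    cuts = sorted(rng.uniform(0.0, genome_mb) for _ in range(n_breaks))
    edges = [0.0] + cuts + [genome_mb]
    return [right - left for left, right in zip(edges, edges[1:])]


def expected_short_fraction(genome_mb: float, n_breaks: int, threshold_mb: float) -> float:
    """Exponential approximation under random breakage:
    P(segment < threshold) = 1 - exp(-threshold * n_breaks / genome_mb)."""
    rate = n_breaks / genome_mb  # breakpoints per Mb
    return 1.0 - math.exp(-rate * threshold_mb)


if __name__ == "__main__":
    rng = random.Random(42)
    genome_mb, n_breaks, threshold_mb = 3000.0, 300, 1.0  # illustrative values only
    lengths = simulate_segment_lengths(genome_mb, n_breaks, rng)
    observed = sum(1 for x in lengths if x < threshold_mb) / len(lengths)
    expected = expected_short_fraction(genome_mb, n_breaks, threshold_mb)
    print(f"segments: {len(lengths)}, simulated < {threshold_mb} Mb: {observed:.3f}, expected: {expected:.3f}")
```

An observed excess of short blocks over this expectation, of the kind reported by Pevzner and Tesler (2003) and discussed in Sect. 9.2.2, is what motivates the fragile-breakage alternative.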

9.2.1

Ancestral Karyotypes

Ancestral reconstructions are of interest for several reasons: (1) conserved syntenies among species allow the prediction of gene locations based on chromosomal orthologs (with clear application to species for which genomic data are not available), (2) ancestral reconstructions provide a framework for estimating rates and directions of chromosomal change, and (3) mapping karyotypic characters onto evolutionary trees can highlight the importance of chromosomal change in phylogenetic reconstructions. Data derived from cross-species fluorescence in situ hybridization (Zoo-FISH) are useful for inferring the composition of ancestral karyotypes at various taxonomic and hierarchical levels, including Eutheria (Chowdhary et al. 1998; Richard et al. 2003; Yang et al. 2003; Svartman et al. 2004; Ferguson-Smith and Trifonov 2007; Robinson and Ruiz-Herrera 2008), Boreoeutheria (Froenicke et al. 2006; Robinson et al. 2006), Rodentia (Graphodatsky et al. 2008), Primates (Stanyon et al. 2008), Carnivora (Graphodatsky et al. 2002), Cetartiodactyla (Balmus et al. 2007), and Perissodactyla (Trifonov et al. 2008). Of the 46 chromosomes in the putative ancestral eutherian karyotype (Fig. 9.1a), Robinson and Ruiz-Herrera (2008) show that two intact chromosome pairs (corresponding to human chromosomes 13 and 18) and three conserved chromosome segments (10q, 8q, and 19p in the human karyotype) are probably symplesiomorphic for Eutheria because they are also present as unaltered orthologs in one or both of the outgroup species (opossum and chicken). Seven additional syntenies (4q/8p/4pq, 3p/21, 14/15, 10p/12pq/22qt, 19q/16q, 16p/7a, and 12qt/22q), each involving human chromosomal segments that in combination correspond to intact chromosomes in the ancestral eutherian karyotype, are also present in one or both outgroup taxa and are thus probable symplesiomorphies for Eutheria.
However, eight chromosome pairs (corresponding in toto to human chromosomes 1, 5, 6, 9, 11, 17, 20, and the X) and three chromosome segments (2p-q13, 7b, and 2q13-qter) are derived characters that support the monophyly of eutherian mammals. There is also considerable recent support for a 2n = 46 chromosome number in the boreoeutherian ancestor, which is not dissimilar to Ohno’s 2n = 48. The boreoeutherian ancestor originally proposed by Froenicke (2005) is virtually identical to the eutherian karyotype presented by Ferguson-Smith and Trifonov (2007), with both benefiting from refinements by Robinson and Ruiz-Herrera (2008), i.e., the HSA 4q/8p/4pq, HSA 2p-q13, HSA 10p/12pq/22qt, HSA 19q/16q, HSA 16p/7a, and HSA 12qt/22q syntenies (see Table 1 in Robinson and Ruiz-Herrera 2008).
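The outgroup criterion used above to polarize chromosomal characters can be written as a simple rule: a synteny in the inferred eutherian ancestor that is also found unaltered in at least one outgroup is treated as symplesiomorphic, whereas one absent from all outgroups is treated as derived and therefore potentially informative about eutherian monophyly. The Python sketch below assumes syntenies are given as plain labels. The outgroup names and the assignment of each synteny to a particular outgroup are hypothetical placeholders (the text only says "one or both"), and the function name is illustrative.

```python
from typing import Dict, Set


def polarize_syntenies(ancestral: Set[str], outgroups: Dict[str, Set[str]]) -> Dict[str, str]:
    """Classify each synteny of the inferred ancestor by outgroup comparison."""
    labels: Dict[str, str] = {}
    for synteny in sorted(ancestral):
        # Outgroups in which this synteny is observed unaltered.
        seen_in = [name for name, present in outgroups.items() if synteny in present]
        labels[synteny] = "symplesiomorphic" if seen_in else "derived"
    return labels


if __name__ == "__main__":
    # Hypothetical placeholder data: which outgroup carries which synteny is
    # arbitrary; only "present in at least one outgroup" follows the text.
    ancestral = {"HSA13", "HSA18", "HSA3p/21", "HSA1", "HSA17"}
    outgroups = {
        "outgroup_A": {"HSA13", "HSA3p/21"},
        "outgroup_B": {"HSA18"},
    }
    for synteny, label in polarize_syntenies(ancestral, outgroups).items():
        print(f"{synteny}: {label}")
```

A real analysis must also allow for rearrangements within the outgroup lineages themselves, which can make an ancestral eutherian synteny look derived.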


Fig. 9.1 (a) The ancestral eutherian autosomal karyotype based on Ferguson-Smith and Trifonov (2007) with refinements by Robinson and Ruiz-Herrera (2008). The X chromosome is conserved across all eutherian mammals and is not included here. (Asterisk) Analysis of reciprocal chromosome painting data together with genome sequence information indicates that the breakpoint is located in HSA 3p (see Ruiz-Herrera and Robinson 2007). (b) Schematic representation of orthologous blocks detected in different mammals that correspond to human chromosome 3. Species included in the comparison have been studied by reciprocal chromosome painting, providing a rigorous delimitation of the boundaries of synteny. Adapted from Ruiz-Herrera and Robinson (2008)

9.2.2

Evolutionary Breakpoints

In silico analysis led to the formulation of the fragile-breakage model (Bourque and Pevzner 2002; Pevzner and Tesler 2003; Bourque et al. 2004). Contrary to the random-breakage model (Sect. 9.2 above), Pevzner and Tesler (2003) showed that transforming the mouse gene order into that of human would require considerable breakpoint reuse, owing to the large number of syntenic blocks less than 1 Mb in size. This suggests that chromosomal rearrangements are not randomly distributed in the genome but are instead concentrated in certain regions that can be considered “hot spots” for recombination, an observation substantiated by chromosome painting studies (Fig. 9.1b; see also Froenicke 2005; Ruiz-Herrera et al. 2005) indicating that some genomic regions are more prone to breakage and reorganization than others (Bourque et al. 2004; Murphy et al. 2005; Ruiz-Herrera et al. 2005, 2006; Ma et al. 2006; Kemkemer et al. 2009; Larkin et al. 2009). In a phylogenetic context, the term “breakpoint reuse” refers to the recurrence of the same breakpoint in two different species but, based on comparison with an outgroup lineage, not in their common ancestor (Murphy et al. 2005; Larkin et al. 2009; Sankoff 2009). The assumption that some chromosome regions have been reused during mammalian chromosomal evolution raises several intriguing questions: (a) Is any particular DNA configuration or sequence composition driving chromosome evolution? (b) How are these regions organized in the three-dimensional cell nucleus? (c) By which mechanisms are they regulated in the germ line? The chromosomal rearrangements that shape mammalian genomes originate as DSBs. This type of lesion can result from exogenous factors (ionizing radiation and chemical agents), endogenous agents (free radicals or a stalled replication fork), or highly specialized cellular processes that include meiosis and the recombination of immunoglobulin genes in the immune system.
In all instances, however, mammalian cells repair DSBs by homologous recombination (HR) or nonhomologous end joining (NHEJ) (Karran 2000). NHEJ dominates from G1 to the early S phase of the cell cycle, whereas HR occurs mainly in late S and G2 phases. Should either mechanism (HR or NHEJ) fail, DSBs are ineffectively repaired, leading to cell death or to enhanced genomic instability as reflected by large-scale chromosomal alterations (i.e., deletions, duplications, translocations). In somatic cells these rearrangements are often characteristic of neoplasms (see Ruiz-Herrera and Robinson 2008 and references therein). If these new chromosomal forms are produced in the germ line, however, they may coincide with the formation of new species. An interesting aspect to emerge from comparative genomic studies is the finding that breakpoint regions are rich in repetitive elements. These include tandem repeats (Puttagunta et al. 2000; Kehrer-Sawatzki et al. 2005), segmental duplications (SD) (Goidts et al. 2004; Carbone et al. 2006; Bailey and Eichler 2006; Kehrer-Sawatzki and Cooper 2008), and transposable elements (TEs) (Cáceres et al. 1999; Carbone et al. 2009; Longo et al. 2009), each of which is discussed in turn below. Additionally, new data suggest that the permissiveness of some regions of the genome to chromosomal breakage could be determined by changes in chromatin conformation (Carbone et al. 2009; Lemaitre et al. 2009).
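The definition of breakpoint reuse given in this section can be made concrete with adjacencies between conserved blocks: a breakpoint is an adjacency present in a reference genome (here, an outgroup taken to approximate the ancestral state) but absent from a descendant genome, and a reused breakpoint is one disrupted in two species. The Python sketch below is a deliberately simplified illustration that assumes unsigned blocks (orientation ignored), linear chromosomes, and an outgroup order that reflects the common ancestor. The block labels and function names are hypothetical, and published analyses (e.g., Murphy et al. 2005; Larkin et al. 2009) use far more elaborate methods.

```python
from typing import FrozenSet, List, Set

# A genome is a list of chromosomes; each chromosome is an ordered list of
# conserved-block labels. Block orientation is ignored in this sketch.
Genome = List[List[str]]
Adjacency = FrozenSet[str]


def adjacencies(genome: Genome) -> Set[Adjacency]:
    """Unordered pairs of blocks that sit next to each other on a chromosome."""
    adj: Set[Adjacency] = set()
    for chromosome in genome:
        for left, right in zip(chromosome, chromosome[1:]):
            adj.add(frozenset((left, right)))
    return adj


def breakpoints(genome: Genome, reference: Genome) -> Set[Adjacency]:
    """Reference adjacencies that are disrupted in `genome`."""
    return adjacencies(reference) - adjacencies(genome)


def reused_breakpoints(species_a: Genome, species_b: Genome, outgroup: Genome) -> Set[Adjacency]:
    """Adjacencies intact in the outgroup (assumed ancestral) but broken in both species."""
    return breakpoints(species_a, outgroup) & breakpoints(species_b, outgroup)


if __name__ == "__main__":
    outgroup = [["a", "b", "c", "d"]]
    species_a = [["a", "b"], ["c", "d"]]   # fission between b and c
    species_b = [["c", "d", "a", "b"]]     # a different rearrangement, same b|c break
    print(reused_breakpoints(species_a, species_b, outgroup))
```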

9.2.3

Tandem Repeats

Tandem repeats have been regarded as an important source of DNA variation and mutation (Armour 2006), having the capacity to form a variety of secondary structures such as hairpins and bipartite triplexes (Catasti et al. 1999). The instability that characterizes tandem repeats is thought to result from slippage during DNA replication and recombination during meiosis (Usdin and Grabczyk 2000). Expansions of the repeat array occur when an unusual secondary structure forms in the lagging daughter strand during DNA replication. Deletions, on the other hand, occur when an unusual configuration develops in the template for lagging-strand DNA synthesis (Usdin and Grabczyk 2000). It seems probable that just as tandem repeats are affected by deletions and expansions in some well-known human diseases, so too are they implicated in the formation of evolutionary breakpoints. Some simple tandem repeats have been detected in breakpoint regions, for instance the dinucleotide [TA]n (Kehrer-Sawatzki et al. 2005) and [TCTG]n, [CT]n, and [GTCTCT]n (Puttagunta et al. 2000). These early observations led to further investigations of the possible role of tandem repeats in shaping mammalian genome architecture (Ruiz-Herrera et al. 2006). The analysis by Ruiz-Herrera and colleagues (Ruiz-Herrera et al. 2006) of the distribution of tandem repeats in human chromosomes, and of their spatial relationship to evolutionary breakpoints, highlights two important points. The first is the high concentration of tandem repeats at the telomeres and pericentromeric areas (in agreement with recent reports on the distribution of duplicated regions by Schueler and Sullivan 2006 and Riethman 2008). The second is the concentration of tandem repeats at chromosomal bands implicated in evolutionary rearrangements. Although this is by no means ubiquitous, the correspondence is typified by human chromosomes 3 and 7 (Robinson et al. 2006; Ruiz-Herrera and Robinson 2008).
For example, bands with the greatest number of tandem repeats in human chromosome 3 (3p25, 3p21.3, 3p12, 3q13.1, 3q21, and 3q29) are also the chromosomal regions that have been implicated in evolutionary rearrangements (Ruiz-Herrera and Robinson 2008).
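Screening a region for the simple motifs mentioned above ([TA]n, [TCTG]n, [CT]n, [GTCTCT]n) amounts to finding runs of a short unit repeated in tandem. The Python sketch below does this with a regular expression. It assumes perfect, uninterrupted repeats and an arbitrary minimum copy number, whereas dedicated tools such as Tandem Repeats Finder also detect imperfect repeats; the function name and the toy sequence are hypothetical.

```python
import re
from typing import List, Tuple


def find_tandem_repeats(seq: str, unit: str, min_copies: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start, end, copies) for each run of `unit` repeated at least
    min_copies times in a row (0-based, end-exclusive coordinates)."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit.upper()), min_copies))
    hits = []
    for match in pattern.finditer(seq.upper()):
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), copies))
    return hits


if __name__ == "__main__":
    toy = "GGG" + "TA" * 6 + "CCC" + "TCTG" * 4 + "A"  # hypothetical sequence
    for motif in ("TA", "TCTG", "CT"):
        print(motif, find_tandem_repeats(toy, motif))
```

Counting hits per cytogenetic band, rather than per sequence, would then allow the kind of band-level comparison with evolutionary breakpoints described above.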

9.2.4

Segmental Duplications

SD, or large blocks of genomic sequence (from 1 kb to hundreds of kb) that share >90% sequence identity, constitute at least 5% of the human genome (Eichler 2001). They are unevenly distributed among human chromosomes but concentrate mainly in the pericentromeric and subtelomeric regions (Vallente-Samonte and Eichler 2002). From an evolutionary perspective, sequence data have identified SD as an important element in the large-scale genome reorganization that underpins evolutionary lineages (reviewed in Bailey and Eichler 2006). Nonallelic homologous recombination (NAHR, homologous recombination between paralogous sequences) mediated by duplicated sequences can, depending on their orientation, result in deletions, duplications, inversions, and translocations (Bailey and Eichler 2006; Turner et al. 2008; Marques-Bonet et al. 2009). For example, Armengol et al. (2003) found an accumulation of SD at rearrangement breakpoints when comparing the human and mouse whole-genome assemblies; these findings were subsequently extended to the rat genome (Armengol et al. 2005). The presence of SDs in evolutionary breakpoint regions has similarly been shown in primates (Antonell et al. 2005; Nickerson and Nelson 1998; Carbone et al. 2006; Kehrer-Sawatzki and Cooper 2008). Nine pericentric inversions distinguish the human and chimpanzee karyotypes, in addition to the fusion that gave rise to human chromosome 2 (Yunis and Prakash 1982). SDs are located at the breakpoints of six of these pericentric inversions, affecting human chromosomes 1, 9, 12, 15, 16, and 18 (Kehrer-Sawatzki and Cooper 2008). The gorilla-specific translocation t(4;19) also appears to be rich in SDs (Stankiewicz et al. 2004). An analysis of human chromosome 3 typifies how SDs have shaped the evolutionary architecture of mammalian genomes (Ruiz-Herrera and Robinson 2008). This chromosome contains 2,062 duplicated regions (≥90% identity and ≥1 kb in length) accounting for 1.7% (3.3 Mbp) of its length (see http://genome.ucsc.edu).
Of these duplicated regions, 480 (23.28%) comprise 10 kb or more of continuous sequence; 36 of these occur in 3p25 (7.5%), 160 in 3p12 (33.33%), 173 in 3q21 (36.04%), and 89 in 3q29 (18.5%) (Ruiz-Herrera and Robinson 2008). The accumulation of SD in 3q29 is not surprising given that transchromosomal duplications tend to concentrate in subtelomeric and pericentromeric areas (Eichler 2001). Of interest is the fact that three of the four chromosomal bands implicated as evolutionary breakpoints during eutherian evolution (3p25, 3p12, and 3q21; Fig. 9.1b) also have the highest concentration of SDs in HSA3. These values contrast sharply with those for bands not implicated in evolutionary breakpoints, such as 3p14, 3q13.3, and 3q26 (Ruiz-Herrera and Robinson 2008).
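The HSA3 figures above can be reproduced with a few lines of arithmetic, together with the length and identity thresholds that define an SD. The Python sketch below is illustrative only: the `Alignment` record and the function names are hypothetical, the counts are those quoted in the text, and treating the identity cut-off as inclusive is an assumption (the text gives both ">90%" and "90%").

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Alignment:
    """A pairwise alignment between two genomic copies (hypothetical record)."""
    length_bp: int
    identity: float  # fraction of identical aligned bases, 0-1


def is_segmental_duplication(aln: Alignment, min_length_bp: int = 1_000, min_identity: float = 0.90) -> bool:
    """SD thresholds as quoted in the text: at least 1 kb and about 90% identity
    (inclusive boundaries are an assumption of this sketch)."""
    return aln.length_bp >= min_length_bp and aln.identity >= min_identity


def percent_of(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as in the text."""
    return round(100.0 * part / whole, 2)


if __name__ == "__main__":
    all_sds, large_sds = 2_062, 480  # HSA3 counts quoted in the text
    band_counts: Dict[str, int] = {"3p25": 36, "3p12": 160, "3q21": 173, "3q29": 89}
    print(f"{large_sds} of {all_sds} duplicated regions: {percent_of(large_sds, all_sds)}%")
    for band, n in band_counts.items():
        print(f"{band}: {n} ({percent_of(n, large_sds)}% of {large_sds})")
    print("outside these four bands:", large_sds - sum(band_counts.values()))
```

The computed shares (7.5, 33.33, 36.04, and 18.54%) agree with the values given above to rounding, and the four bands account for 458 of the 480 regions.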

9.2.5

Transposable Elements

Other repetitive elements, such as TEs, have been implicated in genomic reorganization and structural variation by mechanisms that include HR and transposition (Gray 2000; Ostertag and Kazazian 2001; Feschotte and Pritham 2007; Cordaux and Batzer 2009). TEs are DNA sequences that are able to move from one locus to another, often duplicating themselves in the process. They are divided into two classes according to their sequence structure and mechanism of transposition (Wicker et al. 2007): Class 1 comprises elements that transpose through reverse transcription of an RNA intermediate (retrotransposons), and Class 2 comprises DNA transposons that move via a DNA intermediate. Retrotransposons have been the most successful TEs in colonizing mammalian genomes: they make up approximately 40 and 50% of the human and opossum genomes, respectively (Lander et al. 2001; Gentles et al. 2007). TEs, as with SDs, have the capacity to influence genome plasticity, for example by (1) altering gene function and regulation, (2) contributing to the creation of new genes, and (3) inducing chromosomal rearrangements (see Feschotte and Pritham 2007 and Cordaux and Batzer 2009 for reviews). TE-triggered chromosomal rearrangements have been extensively recorded in plants and animals such as maize and Drosophila (Walker et al. 1995; Cáceres et al. 1999). In primates, not all the inversion breakpoints between human and chimpanzee map to regions of SDs. The breakpoints of the inversions affecting human chromosomes 4, 5, and 17 are rich in Alu elements (Kehrer-Sawatzki and Cooper 2008). Moreover, a high proportion of Alu elements at the ends of SDs suggests that they were generated by Alu mispairing followed by HR in the human genome (Bailey et al. 2003). There is also evidence in the recent literature for an accumulation of L1 elements in evolutionary breakpoint regions (Zhao and Bourque 2009). Longo and collaborators (Longo et al. 2009), for example, described an accumulation of L1 elements and ERVs (endogenous retroviruses) at an evolutionary breakpoint in the genome of the tammar wallaby, a marsupial.
Gibbons (family Hylobatidae) represent an interesting case among the Hominoidea (which also include humans and the other great apes, i.e., chimpanzee, gorilla, and orang-utan), as they are characterized by a strikingly unstable karyotype, in sharp contrast to the stability observed for the great apes and most of the more distantly related primate species (Muller et al. 2003). In a series of elegant studies, Carbone and co-workers established a physical map containing most of the synteny disruptions in the white-cheeked gibbon (Nomascus leucogenys) (Carbone et al. 2006, 2009). They isolated most of the synteny breakpoints in gibbon BAC clones and subsequently characterized them at high resolution. Their results revealed an enrichment of active Alu elements at the gibbon breakpoints, these CpG-rich regions being less methylated than their orthologous counterparts in the human genome. The authors hypothesized that this epigenetic state could promote a shift to an open chromatin configuration that, in turn, may be responsible for the higher rate of chromosomal breakage characterizing the Hylobatidae (Carbone et al. 2009).
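Methylation itself cannot be read from a DNA sequence, but the CpG richness mentioned above is commonly summarized by the observed/expected CpG ratio, (number of CpG dinucleotides × sequence length) / (number of C × number of G). The Python sketch below computes this standard ratio for a sequence window. It is not the procedure used by Carbone et al. (2009), and the function name and toy windows are hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).

    Returns 0.0 when the window contains no C or no G. Non-ACGT characters
    are counted in the length, so mask or strip them beforehand if needed.
    """
    s = seq.upper()
    c_count, g_count = s.count("C"), s.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(s) / (c_count * g_count)


if __name__ == "__main__":
    for window in ("CGCGCGCG", "CCGG", "CCAAGGTT"):  # hypothetical windows
        print(window, round(cpg_observed_expected(window), 2))
```

Bulk mammalian DNA typically shows ratios well below 1 because methylated CpGs are lost through deamination over evolutionary time, so a relatively high ratio is one sequence-level hint of historically low methylation; direct methylation measurements are still needed to confirm it.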

9.3

Hemiplasy

During the course of comparing the syntenic blocks in eutherian mammals (see Sect. 9.2.1 above), we noticed several candidate examples of hemiplasy (two of which involved chiropterans and afrotherians; Robinson et al. 2008). It was apparent from these comparisons that a complication with using chromosomal characters to infer phylogenetic relationships concerns the distinction between characters that have arisen convergently (i.e., are homoplasic) and those that are due to common ancestry but which result in homoplasy-like outcomes even though the character states themselves are genuinely homologous (i.e., are hemiplasic; Avise and Robinson 2008). A likely outcome of the failure to identify hemiplasy (as with homoplasy) is a misleading phylogenetic interpretation of chromosomal characters; hence attempts to disentangle the effects of homoplasy and hemiplasy in a specific phylogeny are both useful and conceptually interesting.

9.3.1

Defining Hemiplasy

In brief, hemiplasy can arise where character states a, b, and c represent any type of genetic polymorphism (including alternative states of karyotypic features; see Fig. 9.2a). The more persistent the polymorphic state, the greater the probability of an eventual discordance between a species tree and a gene tree. Figure 9.2a illustrates how idiosyncratic lineage sorting can result in gene-tree/species-tree discordance, and how alternative explanations are possible where conflicting hypotheses are suggested by different data sets. For example, sequence-based phylogenies have suggested an association of elephant shrew, tenrec, and golden mole to the exclusion of aardvark (Amrine-Madsen et al. 2003; Murphy et al. 2007a) or, alternatively, of aardvark, tenrec, and golden mole to the exclusion of elephant shrew (Waddell and Shelley 2003). In contrast, molecular cytogenetic data point to a sister relationship between elephant shrew and aardvark to the exclusion of golden mole (Robinson et al. 2004). This latter association would contradict much other phylogenetic evidence, and we have argued (Robinson et al. 2008) that this conflict may be explained by the polymorphic state of the 10q/17 and 3/20 syntenies in an afroinsectiphilian common ancestor, which subsequently sorted idiosyncratically to produce a gene-tree/species-tree discordance (Fig. 9.2b). Both the 10q/17 and 3/20 syntenies in the aardvark and elephant shrew are caused by centric fusions that must have arisen in the common ancestor of Afroinsectiphilia prior to the basal divergence of aardvark 75 mya.
They then became independently fixed in the lineage leading to the elephant shrew (thought to have diverged 73 mya) but were lost in the lineage leading to Afroinsectivora (represented in our analysis only by the golden mole) subsequent to the divergence of this clade 65 mya. This means that the character states themselves would be genuinely homologous and must have persisted as polymorphisms for a minimum of 2 million years.
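The 2-million-year figure follows from simple subtraction: the fusions must predate the first speciation node they span (the aardvark divergence, 75 mya) and must still be polymorphic at the last one (the elephant shrew divergence, 73 mya). The Python sketch below encodes that minimum-persistence calculation; the function name is hypothetical and the node ages are those quoted in the text.

```python
def minimum_persistence_my(oldest_node_mya: float, youngest_node_mya: float) -> float:
    """Minimum time (million years) a polymorphism must persist to be sorted
    across every speciation node between oldest_node_mya and youngest_node_mya."""
    if youngest_node_mya > oldest_node_mya:
        raise ValueError("youngest node cannot be older than the oldest node")
    return oldest_node_mya - youngest_node_mya


if __name__ == "__main__":
    # 10q/17 and 3/20 fusions in Afroinsectiphilia (node ages from the text).
    print(minimum_persistence_my(oldest_node_mya=75.0, youngest_node_mya=73.0))
```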

9.3.2

Distinguishing Hemiplasy

Fig. 9.2 (a) A schematic representation of how a chromosomal polymorphism that traversed successive speciation nodes can become fixed in the descendant species in a pattern that appears discordant with the species phylogeny. Idiosyncratic sorting of a Robertsonian (Rb) fusion polymorphism (a, b, c) into the descendant taxa would result in lineages that are fixed for the karyotypic state prior to fusion (i.e., the 2n is unaltered) and those that are homozygous for the rearrangement (i.e., 2n-2). Note that allele “c” is a derived character state that is shared by two descendant taxa (II and III) that do not constitute a clade at the organismal level. (b) A diagram showing how the Robertsonian fusions 10q/17 and 3/20 arose 75 mya in a common ancestor of Afroinsectiphilia and sorted idiosyncratically (oval), suggesting that these derived chromosomal syntenies must have persisted for at least 2 million years in order to temporally encompass the relevant speciation nodes. (c) A hypothetical phylogeny showing the presence of chromosomal characters A/B, C/D, and C/E in five species (I–V). Two alternative hypotheses can be proposed to accommodate the distribution of the characters among species (see text)

In an attempt to emphasize the distinction between hemiplasy and homoplasy of chromosomal characters, consider the tree presented in Fig. 9.2c. This scheme shows two Robertsonian fusions (A/B and C/E) associated with divergence dates that vary from 15 to 2 mya for pertinent nodes. It is instructive first to examine the A/B adjacent synteny. Hemiplasy would require an unlikely persistence time of 13 million years to account for the presence of A/B in distant parts of the species tree (i.e., species I and II). The alternative, and more likely, explanation is convergence, with A/B arising independently in both lineages (homoplasy). In contrast to this pattern, we argue that chromosomal character C/E most likely reflects an instance of hemiplasy. As with A/B, two mutually exclusive hypotheses can be advanced to explain the pattern shown in Fig. 9.2c. First, it could be argued that the rearrangement (C/E) was present in the common ancestor of II–V (dated at 4 mya) and that its absence in II is due to reversal. Alternatively, it was fixed in the common ancestor of IV + V (2 mya), and convergently so in III. Two “rare genomic changes” would be required in either scenario. Second, and in contrast to the first hypothesis, hemiplasy would suggest the origin of a single rearrangement (= a single “rare genomic change”) at the common node (4 mya), followed by incomplete lineage sorting when the ancestral polymorphism is retained through speciation events, i.e., C/E becomes fixed in the lineages leading to species III–V and is lost in the lineage leading to II. The maximum persistence time required for retention of the chromosomal polymorphism under this scenario is 2 million years. Moreover, this latter explanation most parsimoniously accounts for the presence of the C/D synteny in species II. That is, the C/E rearrangement was present in a polymorphic state in the common ancestor of II–V (i.e., a fused C/E and the unfused homologues C and E), a combination that would permit the independent fusion of C with D. The alternative explanation (de novo fission of C/E on the branch leading to species II, followed by a fusion of C with D) is considered less likely.
This scheme emphasizes a critical distinction between hemiplasy and homoplasy: hemiplasy is generally more likely for near-neutral polymorphisms or those that are overdominant. It is also more likely when the internodal distances in a phylogenetic tree are short (relative to effective population sizes; see Robinson et al. 2008). Homoplasy, on the other hand, is less constrained by narrow divergence times: the greater the temporal distance, the greater the possibility of convergence and reversal of chromosomal rearrangements.
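The dependence of hemiplasy on short internodes relative to effective population size can be illustrated with the standard three-taxon coalescent result for a neutral locus: the probability that the gene genealogy disagrees with the species tree is (2/3)exp(-T/2Ne), where T is the internode length in generations and Ne is the diploid effective population size. The Python sketch below evaluates this expression. It assumes neutrality, so it does not apply directly to underdominant rearrangements, and the generation time and Ne values are hypothetical.

```python
import math


def p_discordant_genealogy(internode_years: float, generation_time_years: float, ne: float) -> float:
    """Neutral three-taxon coalescent: probability that a gene genealogy
    conflicts with the species tree, (2/3) * exp(-T / (2 * Ne)), with T the
    internode length in generations and Ne the diploid effective size."""
    generations = internode_years / generation_time_years
    coalescent_units = generations / (2.0 * ne)
    return (2.0 / 3.0) * math.exp(-coalescent_units)


if __name__ == "__main__":
    # Hypothetical values: a 2-million-year internode and 1-year generations.
    for ne in (1e4, 1e5, 1e6):
        p = p_discordant_genealogy(2e6, 1.0, ne)
        print(f"Ne = {ne:.0e}: P(discordant genealogy) = {p:.3g}")
```

A discordant genealogy is a precondition for hemiplasy rather than its probability: whether it yields a misleading chromosomal character also depends on where the rearrangement arose on that genealogy.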

9.4

Conclusions

Contemporary studies of mammalian chromosome evolution are informed by data from various sources. First, ancestral karyotypes (and the critical distinction between symplesiomorphic and synapomorphic characters, which can only be inferred using appropriate outgroups) usually form the comparative basis for determining the mode and, often, the tempo of karyotypic change. This in turn relies on the correct identification of orthologous blocks (either by FISH or chromosome banding) and is further shaped by knowledge of segmental duplications, repetitive elements, and breakpoint reuse. In turn, these data can have bearing on the phylogenetic distinction between characters that are convergent or reversals (i.e., homoplasious) and those that potentially reflect persistence of characters across species nodes (hemiplasy). Considerable progress has been made in determining the major features of mammalian chromosomal evolution. Moreover, recent gains in sequencing efficiency and expected improvements in annotation technology make it likely that initiatives such as the recent proposal to target 10,000 vertebrate species for whole-genome sequencing (Genome 10K Community of Scientists 2009) will provide a level of resolution and taxonomic scope that is unprecedented for studying vertebrate and, in particular, mammalian evolutionary relationships. It can be anticipated that data generated by the G10KCOS initiative will provide detailed answers on the mechanisms of genomic change, including rearrangements, duplications, and losses, and definitive insights into the origin of mammalian karyotypic diversity.

Acknowledgments Financial support to TJR (National Research Foundation, South Africa) and ARH (Parque Zoológico de Barcelona, Spain) is gratefully acknowledged. Anne Ropiquet is thanked for discussion on chromosomal phylogenies and Clement Gilbert for comments on an earlier version of this manuscript.

References
Amrine-Madsen H, Koepfli K-P, Wayne RK, Springer MS (2003) A new phylogenetic marker, apolipoprotein B, provides compelling evidence for eutherian relationships. Mol Phylogenet Evol 28:225–240
Antonell A, de Luis O, Domingo-Roura X, Pérez-Jurado LA (2005) Evolutionary mechanisms shaping the genomic structure of the Williams-Beuren syndrome chromosomal region at human 7q11.23. Genome Res 15:1179–1188
Armengol L, Pujana MA, Cheung J, Scherer SW, Estivill X (2003) Enrichment of segmental duplications in regions of breaks of synteny between the human and mouse genomes suggest their involvement in evolutionary rearrangements. Hum Mol Genet 12:2201–2208
Armengol L, Marques-Bonet T, Cheung J, Khaja R, González JR, Scherer SW, Navarro A, Estivill X (2005) Murine segmental duplications are hot spots for chromosome and gene evolution. Genomics 86:692–700
Armour JA (2006) Tandemly repeated DNA: why should anyone care? Mutat Res 598:6–14
Avise JC, Robinson TJ (2008) Hemiplasy: a new term in the lexicon of phylogenetics. Syst Biol 57:503–507
Bailey JA, Liu G, Eichler EE (2003) An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet 73(4):823–834
Bailey JA, Eichler EE (2006) Primate segmental duplications: crucibles of evolution, diversity and disease. Nat Rev Genet 7:552–564
Balmus G, Trifonov VA, Biltueva LS, O’Brien PC, Alkalaeva ES, Fu B, Skidmore JA, Allen T, Graphodatsky AS, Yang F, Ferguson-Smith MA (2007) Cross-species chromosome painting among camel, cattle, pig and human: further insights into the putative Cetartiodactyla ancestral karyotype. Chromosome Res 15(4):499–515
Bourque G, Pevzner PA (2002) Reconstructing gene orders in the ancestral genomes. Genome Res 12:26–36

Bourque G, Pevzner PA, Tesler G (2004) Reconstructing the genomic architecture of ancestral mammals: lessons from human, mouse, and rat genomes. Genome Res 14:507–516
Cáceres M, Ranz JM, Barbadilla A, Long M, Ruiz A (1999) Generation of a widespread Drosophila inversion by a transposable element. Science 285:415–418
Carbone L, Vessere GM, ten Hallers BF, Zhu B, Osoegawa K, Mootnick AR, Kofler A, Wienberg J, Rogers J, Humphray S, Scott C, Harris RA, Milosavljevic A, de Jong P (2006) A high-resolution map of synteny disruptions in gibbon and human genomes. PLoS Genet 2:223
Carbone L, Harris RA, Vessere GM, Mootnick AR, Humphray S, Rogers J, Kim SK, Wall JD, Martin D, Jurka J, Milosavljevic A, de Jong PJ (2009) Evolutionary breakpoints in the gibbon suggest association between cytosine methylation and karyotype evolution. PLoS Genet 5:e1000538
Catasti P, Chen X, Mariappan SVS, Bradbury EM, Gupta G (1999) DNA repeats in the human genome. Genetica 106:15–36
Chowdhary BP, Raudsepp T, Froenicke L, Scherthan H (1998) Emerging patterns of comparative genome organization in some mammalian species as revealed by Zoo-FISH. Genome Res 8:577–589
Cordaux R, Batzer MA (2009) The impact of retrotransposons on human genome evolution. Nat Rev Genet 10:691–703
Eichler EE (2001) Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet 17:661–669
Ferguson-Smith MA, Trifonov V (2007) Mammalian karyotype evolution. Nat Rev Genet 8:950–962
Feschotte C, Pritham EJ (2007) DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet 41:331–368
Froenicke L (2005) Origins of primate chromosomes – as delineated by Zoo-FISH and alignments of human and mouse draft genome sequences. Cytogenet Genome Res 108:122–138
Froenicke L, Wienberg J, Stone G, Adams L, Stanyon R (2003) Towards the delineation of the ancestral eutherian genome organization: comparative genome maps of human and the African elephant (Loxodonta africana) generated by chromosome painting. Proc R Soc Lond B Biol Sci 270:1331–1340
Froenicke L, Caldés MG, Graphodatsky A, Müller S, Lyons LA, Robinson TJ, Volleth M, Yang F, Wienberg J (2006) Are molecular cytogenetics and bioinformatics suggesting contradictory models of ancestral mammalian genomes? Genome Res 16:306–310
Genome 10K Community of Scientists (2009) Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species. J Hered 100:659–674
Gentles AJ, Wakefield MJ, Kohany O, Gu W, Batzer MA, Pollock DD, Jurka J (2007) Evolutionary dynamics of transposable elements in the short-tailed opossum Monodelphis domestica. Genome Res 17:992–1004
Goidts V, Szamalek JM, Hameister H, Kehrer-Sawatzki H (2004) Segmental duplication associated with the human-specific inversion of chromosome 18: a further example of the impact of segmental duplications on karyotype and genome evolution in primates. Hum Genet 117:168–176
Graphodatsky AS, Yang F, Perelman PL, O’Brien PC, Serdukova NA, Milne BS, Biltueva LS, Fu B, Vorobieva NV, Kawada SI, Robinson TJ, Ferguson-Smith MA (2002) Comparative molecular cytogenetic studies in the order Carnivora: mapping chromosomal rearrangements onto the phylogenetic tree. Cytogenet Genome Res 96:137–145
Graphodatsky AS, Yang F, Dobigny G, Romanenko SA, Biltueva LS, Perelman PL, Beklemisheva VR, Alkalaeva EZ, Serdukova NA, Ferguson-Smith MA, Murphy WJ, Robinson TJ (2008) Tracking the evolution of genome organization in rodents by ZOO-FISH. Chromosome Res 16:261–274
Gray YH (2000) It takes two transposons to tango: transposable-element-mediated chromosomal rearrangements. Trends Genet 16:461–468


T.J. Robinson and A. Ruiz-Herrera

Karran P (2000) DNA double strand break repair in mammalian cells. Curr Opin Genet Dev 10:144–150
Kehrer-Sawatzki H, Cooper DN (2008) Molecular mechanisms of chromosomal rearrangement during primate evolution. Chromosome Res 16:41–56
Kehrer-Sawatzki H, Szamalek JM, Tanzer S, Platzer M, Hameister H (2005) Molecular characterization of the pericentric inversion of chimpanzee chromosome 11 homologous to human chromosome 9. Genomics 85:542–550
Kemkemer C, Kohn M, Cooper DN, Froenicke L, Hogel J, Hameister H, Kehrer-Sawatzki H (2009) Gene synteny comparisons between different vertebrates provide new insights into breakage and fusion events during mammalian karyotype evolution. BMC Evol Biol 9:84
Korstanje R, O’Brien PCM, Yang F, Rens W, Bosma AA, van Lith HA, van Zutphen LF, Ferguson-Smith MA (1999) Complete homology maps of the rabbit (Oryctolagus cuniculus) and human by reciprocal chromosome painting. Cytogenet Cell Genet 86:317–322
Lander ES and the Int Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
Larkin DM, Pape G, Donthu R, Auvil L, Welge M, Lewin HA (2009) Breakpoint regions and homologous synteny blocks in chromosomes have different evolutionary histories. Genome Res 19:770–777
Lemaitre C, Zaghloul L, Sagot MF, Gautier C, Arneodo A, Tannier E, Audit B (2009) Analysis of fine-scale mammalian evolutionary breakpoints provides new insight into their relation to genome organisation. BMC Genomics 10:335
Li T, O’Brien PCM, Biltueva L, Fu B, Wang J, Nie W, Ferguson-Smith MA, Graphodatsky AS, Yang F (2004) Evolution of genome organizations of squirrels (Sciuridae) revealed by cross-species chromosome painting. Chromosome Res 12:317–335
Longo MS, Carone DM, NISC Comparative Sequencing Program, Green ED, O’Neill MJ, O’Neill RJ (2009) Distinct retroelement classes define evolutionary breakpoints demarcating sites of evolutionary novelty. BMC Genomics 10:334
Ma J, Zhang L, Suh BB, Raney BJ, Burhans RC, Kent WJ, Blanchette M, Haussler D, Miller W (2006) Reconstructing contiguous regions of an ancestral genome. Genome Res 16:1557–1565
Marques-Bonet T, Girirajan S, Eichler EE (2009) The origins and impact of primate segmental duplications. Trends Genet 25:443–545
Muller S, Stanyon R, O’Brien PCM, Ferguson-Smith MA, Plesker R, Wienberg J (1999) Defining the ancestral karyotype of all primates by multidirectional chromosome painting between tree shrews, lemurs and humans. Chromosoma 108:393–400
Muller S, Hollatz M, Wienberg J (2003) Chromosomal phylogeny and evolution of gibbons (Hylobatidae). Hum Genet 113:493–501
Murphy WJ, Larkin DM, Everts-van-der Wind A, Bourque G, Tesler G, Auvil L, Beever JE, Chowdhary BP, Galibert F, Gatzke L, Hitte C, Meyers SN, Milan D, Ostrander EA, Pape G, Parker HG, Raudsepp T, Rogatcheva MB, Schook LB, Skow LC, Welge M, Womack JE, O’Brien SJ, Pevzner PA, Lewin HA (2005) Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science 309:613–617
Murphy WJ, Pringle TH, Crider TA, Springer MS, Miller W (2007a) Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res 17:413–421
Murphy WJ, Davis B, David VA, Agarwala R, Schaffer AA, Pearks Wilkerson AJ, Neelam B, O’Brien SJ, Menotti-Raymond M (2007b) A 1.5-Mb-resolution radiation hybrid map of the cat genome and comparative analysis with the canine and human genomes. Genomics 89:189–196
Nadeau JH, Taylor BA (1984) Lengths of chromosomal segments conserved since divergence of man and mouse. Proc Natl Acad Sci USA 81:814–818
Nickerson E, Nelson DL (1998) Molecular definition of pericentric inversion breakpoints occurring during the evolution of humans and chimpanzees. Genomics 50:368–372
Ohno S (1973) Ancient linkage groups and frozen accidents. Nature 244:259–262


Ostertag EM, Kazazian HH (2001) Twin priming: a proposed mechanism for the creation of inversions in L1 retrotransposition. Genome Res 11:2059–2065
Perelman PL, Graphodatsky AS, Serdukova NA, Nie W, Alkalaeva EZ, Fu B, Robinson TJ, Yang F (2005) Karyotypic conservatism in the suborder Feliformia (Order Carnivora). Cytogenet Genome Res 108:348–354
Pevzner P, Tesler G (2003) Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. Proc Natl Acad Sci USA 100:7672–7677
Puttagunta R, Gordon LA, Meyer GE, Kapfhamer D, Lamerdin JE, Kantheti P, Portman KM, Chung WK, Jenne DE, Olsen AS, Burmeister M (2000) Comparative maps of human 19p13.3 and mouse chromosome 10 allow identification of sequences at evolutionary breakpoints. Genome Res 10:1369–1380
Richard F, Lombard M, Dutrillaux B (2003) Reconstruction of the ancestral karyotype of eutherian mammals. Chromosome Res 11:605–618
Riethman H (2008) Human telomere structure and biology. Annu Rev Genomics Hum Genet 9:1–19
Robinson TJ, Ruiz-Herrera A (2008) Defining the ancestral eutherian karyotype: a cladistic interpretation of chromosome painting and genome sequence assembly data. Chromosome Res 16:1133–1141
Robinson TJ, Fu B, Ferguson-Smith MA, Yang F (2004) Cross-species chromosome painting in the golden mole and elephant shrew: support for the mammalian clades Afrotheria and Afroinsectiphillia but not Afroinsectivora. Proc Biol Sci 271:1477–1484
Robinson TJ, Ruiz-Herrera A, Froenicke L (2006) Dissecting the mammalian genome – new insights into chromosomal evolution. Trends Genet 22:297–301
Robinson TJ, Ruiz-Herrera A, Avise JC (2008) Hemiplasy and homoplasy in the karyotypic phylogenies of mammals. Proc Natl Acad Sci USA 105:14477–14481
Rokas A, Holland PW (2000) Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol 15:454–459
Ruiz-Herrera A, Robinson TJ (2008) Evolutionary plasticity breakpoints in human chromosome 3. BioEssays 30:1126–1137
Ruiz-Herrera A, Garcia F, Mora L, Egozcue J, Ponsà M, Garcia M (2005) Evolutionary conserved chromosomal segments in the human karyotype are bounded by unstable chromosome bands. Cytogenet Genome Res 108:161–174
Ruiz-Herrera A, Castresana J, Robinson TJ (2006) Is mammalian chromosomal evolution driven by regions of genome fragility? Genome Biol 7:R115
Ruiz-Herrera A, Robinson TJ (2007) Chromosomal instability in Afrotheria: fragile sites, evolutionary breakpoints and phylogenetic inference from genome sequence assemblies. BMC Evol Biol 7:199
Sankoff D (2009) The where and wherefore of evolutionary breakpoints. J Biol 8:66
Schueler MG, Sullivan BA (2006) Structural and functional dynamics of human centromeric chromatin. Annu Rev Genomics Hum Genet 7:301–313
Stankiewicz P, Shaw CJ, Withers M, Inoue K, Lupski JR (2004) Serial segmental duplications during primate evolution result in complex human genome architecture. Genome Res 14:2209–2220
Stanyon R, Rocchi M, Capozzi R, Roberto R, Misceo D, Ventura M, Cardone MF, Bigoni F, Archidiacono N (2008) Primate chromosome evolution: ancestral karyotypes, marker order and neocentromeres. Chromosome Res 16:17–39
Svartman M, Stone G, Page JE, Stanyon R (2004) A chromosome painting test of the basal eutherian karyotype. Chromosome Res 12:45–53
Trifonov VA, Stanyon R, Nesterenko AI, Fu B, Perelman PL, O’Brien PC, Stone G, Rubtsova NV, Houck ML, Robinson TJ, Ferguson-Smith MA, Dobigny G, Graphodatsky AS, Yang F (2008) Multidirectional cross-species painting illuminates the history of karyotypic evolution in Perissodactyla. Chromosome Res 16:89–107


Turner DJ, Miretti M, Rajan D, Fiegler H, Carter NP, Blayney ML, Beck S, Hurles ME (2008) Germline rates of de novo meiotic deletions and duplications causing several genomic disorders. Nat Genet 40:90–95
Usdin K, Grabczyk E (2000) DNA repeat expansions and human disease. Cell Mol Life Sci 57:914–931
Vallente-Samonte R, Eichler EE (2002) Segmental duplications and the evolution of the primate genome. Nat Rev Genet 3:65–72
Waddell PJ, Shelley S (2003) Evaluating placental inter-ordinal phylogenies with novel sequences including RAG1, gamma-fibrinogen, ND6, and mt-tRNA, plus MCMC-driven nucleotide, amino acid, and codon models. Mol Phylogenet Evol 28:197–224
Walker EL, Robbins TP, Bureau TE, Kermicle J, Dellaporta SL (1995) Transposon-mediated chromosomal rearrangements and gene duplications in the formation of the maize R-r complex. EMBO J 14:2350–2363
Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH (2007) A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8:973–982
Yang F, Alkalaeva EZ, Perelman PL, Pardini AT, Harrison WR, O’Brien PC, Fu B, Graphodatsky AS, Ferguson-Smith MA, Robinson TJ (2003) Reciprocal chromosome painting among human, aardvark, and elephant (superorder Afrotheria) reveals the likely eutherian ancestral karyotype. Proc Natl Acad Sci USA 100:1062–1066
Yang F, Fu B, O’Brien PCM, Nie W, Ryder OA, Ferguson-Smith MA (2004) Refined genome-wide comparative map of the domestic horse, donkey and human based on cross-species chromosome painting: insight into the occasional fertility of mules. Chromosome Res 12:65–76
Yunis JJ, Prakash O (1982) The origin of man: a chromosomal pictorial legacy. Science 215:1525–1530
Zhao H, Bourque G (2009) Recovering genome rearrangements in the mammalian phylogeny. Genome Res 19:934–942

Chapter 10

Mechanisms and Evolution of Dorsal–Ventral Patterning

Claudia Mieko Mizutani and Rui Sousa-Neves

C.M. Mizutani
Department of Biology, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 447080, USA
Department of Genetics, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 447080, USA

R. Sousa-Neves
Department of Biology, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 447080, USA

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_10, © Springer-Verlag Berlin Heidelberg 2010

Abstract In the last two decades, great progress has been made in the discovery and understanding of conserved signaling pathways, in particular those involved in embryonic dorsal–ventral patterning and the organization of the nervous system. Remarkably, the spatial distribution of these signaling molecules appears to be conserved across a large group of animals that have centralized nervous systems. Despite these achievements, many questions remain about how the organization of the nervous system evolves and responds to variations in organism size. In this review, we trace the progress of the field from early observations made more than a century ago and introduce future challenges regarding the scaling of the nervous system during evolution.

10.1 Introduction

Animal development can give rise to diverse life forms from a relatively limited number of genes. Great progress in our understanding of the mechanisms of development has been made using model organisms amenable to genetic and molecular analyses. These model organisms are likely to continue uncovering mechanisms that are relevant to a wide variety of species and significant for human health. One example is the conservation of the molecular components employed to distinguish neural tissue from epidermis and to subdivide the nervous system into discrete domains of gene expression. More recently, the sequencing of several genomes and the technological advances of the past decade have brought previously intractable organisms under scrutiny. These advances have also made it possible to tackle questions that could not have been answered before and that deserve attention. One of them is how organisms change over time; another is how body plans and organs are rescaled across species. Answers to these questions are essential to our understanding of the evolution of novel body plans.

Broadly, two general mechanisms have been proposed to explain the generation of different body plans and tissues, and in principle both should apply to dorsal–ventral (D/V) axis formation. The first proposes that the evolution of the cis-regulatory sequences that control gene expression plays a significant role in body plan diversity, while the second implicates the evolution of coding sequences in key patterning genes. The first possibility has been tested by transferring previously isolated cis-regulatory sequences from one organism to another and assaying the expression patterns they generate by means of a reporter gene. In many cases, the observed expression patterns largely resemble those of the host (Kassis 1990; Ludwig et al. 1998; Crocker et al. 2008; Liberman and Stathopoulos 2009). That is, despite extensive modifications in the regulatory sequences, the final expression pattern resembles that of the species that interprets the information rather than that of the species that donated the regulatory sequences. In other cases documented so far, the inverse is observed: the patterns generated resemble those of the donor (Wittkopp et al. 2002; Gompel et al. 2005; Crocker et al. 2008). In addition to mutations in cis-regulatory sequences, there is also evidence that changes in coding sequences lead to different developmental programs.
One example is hybrid lethality, which can make the development of two closely related species incompatible. The complementary lethal genes involved are innocuous in individuals of a single species but cause lethality and/or sterility when combined in a hybrid between different species (Sturtevant 1929; Yamamoto et al. 1997; Brideau et al. 2006). Molecular characterization of the hybrid lethal genes isolated so far reveals that differences in their coding sequences are responsible for the observed developmental incompatibilities. Thus, changes in both regulatory and coding sequences can lead to developmentally distinct processes and, consequently, to novel life forms. These results also highlight that gene networks, rather than individual genes, coevolve to accommodate mutations in both coding and noncoding sequences. In this review, we discuss the early molecular events that contribute to germ layer specification, with an emphasis on the establishment of the D/V morphogenetic gradients that regulate patterns of neural gene expression in Drosophila. This problem traces back to the nineteenth century, and recent investigations have led to the identification of key molecular players and a unifying view of neural development. We also discuss the problem of morphogenetic scaling across species and possible mechanisms that could explain how patterns of gene expression are reshaped in response to changes in size.

10.1.1 The Unity of Plan Hypothesis and Body Axis Inversion

From humans to small bees and worms, animals exhibit complex behaviors and social organizations generated by nervous systems of great complexity. Three questions stand out when we observe these complex structures: (1) to what extent do different nervous systems share a similar, conserved molecular architecture; (2) how and when did this organization arise; and (3) how do these structures evolve and become more complex? Over the past 20 years, key findings in developmental biology have answered some of these questions, providing clues to the origins and evolution of the nervous system. The advent of developmental biology as a field combining anatomy, embryology, genetics, and molecular biology brought together two important discoveries made about a century apart. The first was an observation made in 1822 by the French anatomist Étienne Geoffroy Saint-Hilaire, a proponent of the “unity of plan” hypothesis (Geoffroy St. Hilaire 1822). Comparing the anatomy of a lobster with that of vertebrates, he suggested that invertebrates and vertebrates share the same elements of body construction, and that the ventral position of the invertebrate nervous system, as opposed to its dorsal position in vertebrates, could be explained by an inversion of the embryonic D/V axis. The second discovery was the classical neural induction transplantation experiment carried out by Spemann and Mangold a century later, in 1924 (Spemann and Mangold 1924). Their experiment led to the identification of the Spemann organizer, a region of the embryo capable of inducing surrounding cells to differentiate as neural tissue. What are the signals released by the organizer that result in neural induction, and could the D/V inversion be confirmed at the molecular level? Several decades elapsed before these questions were answered, and the final outcome of those efforts was quite remarkable.
At the center of the mechanism of neural induction was the discovery of a gene cassette that functions antagonistically: the invertebrate genes short gastrulation (sog) and decapentaplegic (dpp) and their vertebrate counterparts Chordin (Chd) and BMP-4, respectively. Genetic manipulation of these genes revealed that dpp/BMP-4 encodes a secreted protein belonging to the TGF-β family of transforming growth factors (Padgett et al. 1987), which has a dual function: it signals to cells to promote epidermal specification (Irish and Gelbart 1987; Wharton et al. 1993) and, at the same time, it blocks neural development. In vertebrates, Chd is secreted by the Spemann organizer and promotes neural development by blocking the anti-neural BMP-4 signal (Sasai et al. 1994). Similarly, in flies Sog is an antagonist of Dpp and protects the future neuroectoderm by binding to Dpp and preventing it from activating its receptors (Francois et al. 1994; Biehs et al. 1996). Thus, neural induction is achieved by a double-negative mechanism whereby neural development results from the repression of a repressive signal. The exciting side of this research was that not only were these long-sought morphogens finally isolated, providing a mechanistic basis for neural induction, but they were also shown to be completely interchangeable between vertebrates and invertebrates, and, finally, their opposite expression patterns along the D/V axis were shown to be upside-down in these organisms (Padgett et al. 1993; Francois et al. 1994; Schmidt et al. 1995; Holley et al. 1995). That is, sog is expressed ventrally in invertebrates, while Chd is expressed dorsally in vertebrates, and in both cases their expression domains demarcate the future site of nervous system development. Together, these findings revived Saint-Hilaire's earlier idea of axis inversion and suggested a common ancestry of vertebrates and invertebrates (Arendt and Nübler-Jung 1994; De Robertis and Sasai 1996; Ferguson 1996; Bier 1997).
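The double-negative logic described above can be illustrated with a minimal toy model. It is a sketch under stated assumptions, not a model taken from the cited studies: Sog/Chd is assumed to bind Dpp/BMP-4 one-to-one, and a cell is assumed to become epidermis only when unbound Dpp/BMP-4 reaches a hypothetical threshold. The function names, the threshold, and the concentration values are all illustrative.

```python
# Toy model of double-negative neural induction (illustrative only).
# Assumptions not taken from the chapter: Sog/Chd binds Dpp/BMP-4 one-to-one,
# and a cell adopts epidermal fate only when free Dpp/BMP-4 reaches a
# hypothetical threshold; otherwise it keeps the neural fate by default.

EPIDERMAL_THRESHOLD = 0.5  # hypothetical, arbitrary units


def free_bmp(bmp: float, antagonist: float) -> float:
    """Dpp/BMP-4 left unbound after one-to-one binding by Sog/Chd."""
    return max(0.0, bmp - antagonist)


def cell_fate(bmp: float, antagonist: float) -> str:
    """Free Dpp/BMP-4 promotes epidermis and blocks neural fate (first
    negative); the antagonist removes that block (second negative)."""
    if free_bmp(bmp, antagonist) >= EPIDERMAL_THRESHOLD:
        return "epidermis"
    return "neural"


def pattern(bmp_levels, antagonist_levels):
    """Fate of each cell in a row, given per-cell BMP and antagonist levels."""
    return [cell_fate(b, a) for b, a in zip(bmp_levels, antagonist_levels)]


if __name__ == "__main__":
    bmp = [1.0] * 6                       # Dpp/BMP-4 spread uniformly, for simplicity
    sog = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # antagonist present on one side only
    print(pattern(bmp, sog))              # neural where Sog/Chd is present
    print(pattern(bmp, [0.0] * 6))        # no antagonist: every cell epidermal
    print(pattern([0.0] * 6, [0.0] * 6))  # no BMP: every cell neural
```

In this toy model, removing the antagonist turns every cell into epidermis and removing BMP leaves every cell neural, which is the sense in which neural fate is a default revealed by blocking a repressive signal. Real mutant phenotypes are more graded than this all-or-none sketch.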

10.1.2 Dorsal, a Gene at Odds with the Evolutionary Conservation of D/V Patterning

Before the discovery of neural inducers, a series of studies in Drosophila demonstrated that the early embryo is initially patterned by a ventral-to-dorsal gradient of another protein, Dorsal, an NF-κB-related transcription factor. The Dorsal nuclear gradient is established via a complex proteolytic cascade driven exclusively by maternal information, which culminates in the regulated transport of Dorsal into the nucleus. The result is a nuclear concentration gradient with high levels of Dorsal in the ventralmost nuclei, moderate levels in lateral nuclei, and very low or absent levels in dorsal nuclei (Roth et al. 1989; Rushlow et al. 1989; Steward 1989). Once inside the nucleus, Dorsal can activate or repress the expression of several zygotic target genes that implement the differentiation of the three primary germ layers of the embryo (Ray et al. 1991; Stathopoulos et al. 2002). High levels of Dorsal activate mesodermal genes (e.g., snail and twist) on the ventral side of the embryo, while moderate levels activate neuroectodermal genes (e.g., sog). In contrast, Dorsal represses ectodermal genes (e.g., dpp); as a consequence, these genes are expressed only in the dorsal region of the embryo, where Dorsal levels are low or undetectable (Fig. 10.1). Even though the Dorsal gradient is crucial for defining the three primary germ layers in Drosophila, this does not seem to represent the ancestral role of the Dorsal/NF-κB signaling pathway. Rather, the Dorsal/NF-κB pathway is involved in the immune response in both vertebrates and invertebrates (reviewed by Ferrandon et al. 2007), whereas its recruitment to D/V patterning is likely to be an innovation found in some invertebrates. One can also speculate that this novel role of the Dorsal gradient in D/V patterning is evolving rapidly.
Recent work in divergent insect groups indicates that the mechanisms controlling the formation of the Dorsal gradient are highly variable within insects, possibly reflecting adaptations of this gradient to short germ band (e.g., Tribolium) vs. long germ band (e.g., flies) modes of development (Chen et al. 2000; Nunes da Fonseca et al. 2008). A number of studies indicate that the Dorsal gradient influences the further subdivision of the Drosophila neuroectoderm into restricted domains of neuroectodermal gene expression, as discussed in the next section. This observation stands in contrast to the subdivision of the vertebrate neural tube, which employs the morphogens BMP and Sonic Hedgehog (Shh) (Liem et al. 1995, 2000; Briscoe et al. 1999; Litingtung and Chiang 2000). These differences led to the view that D/V patterning of the nervous system in Drosophila and in vertebrates arose from completely different molecular mechanisms, possibly through convergent evolution.

Fig. 10.1 Formation of dorsal–ventral gradients in the Drosophila embryo. (a) Scheme of an early Drosophila embryo, in lateral view (anterior to the right). The embryo develops as a syncytial blastoderm: nuclei divide and migrate to the periphery of the embryo, where cellularization takes place. (b) Dorsal–ventral gradients emanating from ventral and dorsal regions subdivide the embryo into three primary domains that give rise to the mesoderm (MES), neuroectoderm (NE), and ectoderm (ECT). (c and d) Cross-section view of the embryo. (c) Representation of the Dpp and Dorsal gradients. Small blue dots represent Dpp molecules that form a dorsal-to-ventral gradient in the extracellular domain. The nuclear Dorsal gradient is represented by red nuclei. (d) Expression domains of dorsal–ventral genes that elicit the differentiation of mesoderm, neuroectoderm, and ectoderm
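The threshold readout of the nuclear Dorsal gradient described in this section (high levels activating snail and twist, moderate levels activating sog, and dpp expressed only where Dorsal is too low to repress it) can be sketched as follows. The exponential profile, its decay length, and both threshold values are hypothetical and were chosen only to reproduce the ventral-to-dorsal order of domains shown in Fig. 10.1; positions are fractions of the distance from the ventral to the dorsal midline.

```python
import math

# Illustrative threshold readout of the nuclear Dorsal gradient.
# Assumptions (not from the chapter): nuclear Dorsal declines exponentially
# from the ventral midline, and both thresholds below are hypothetical.
HIGH_THRESHOLD = 0.6  # at or above: mesodermal genes (e.g., snail, twist) on
LOW_THRESHOLD = 0.2   # at or above: neuroectodermal genes (e.g., sog) on, dpp repressed


def nuclear_dorsal(x: float, decay_length: float = 0.15) -> float:
    """Relative nuclear Dorsal at fractional position x (0 = ventral, 1 = dorsal)."""
    return math.exp(-x / decay_length)


def germ_layer(level: float) -> str:
    """Map a nuclear Dorsal level onto the three primary domains."""
    if level >= HIGH_THRESHOLD:
        return "MES"   # mesoderm
    if level >= LOW_THRESHOLD:
        return "NE"    # neuroectoderm
    return "ECT"       # dorsal ectoderm: Dorsal too low to repress dpp


positions = [0.0, 0.05, 0.1, 0.2, 0.3, 0.6]
layers = [germ_layer(nuclear_dorsal(x)) for x in positions]
print(layers)  # ['MES', 'MES', 'NE', 'NE', 'ECT', 'ECT']
```

Changing the decay length in this sketch moves both boundaries at once, which gives a concrete sense of why interspecies differences in the shape of the Dorsal gradient could alter the proportions of the germ layers.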

10.1.3 From Saint-Hilaire and Spemann Toward a Unifying Mechanism for Neural Organization

Recently, interest in the origins of the nervous system has been renewed. First, further analyses of the patterning of the nervous system into organized D/V domains of gene expression became available, along with studies of the upstream signaling events that generate this patterning (Mizutani et al. 2006; Mizutani and Bier 2008). Second, experiments carried out in organisms from other phylogenetic branches, such as hemichordates, annelids, and cnidarians (including the sea anemone), have provided outgroups for valuable comparative studies of the evolution of the nervous system and of axis formation (Samuel et al. 2001; Rentzsch et al. 2006; Lowe et al. 2006; Denes et al. 2007; Lapraz et al. 2009; Nomaksteinsky et al. 2009; Saina et al. 2009). The idea that nervous system patterning predated the split between vertebrates and invertebrates implies that the centralization and organization of the nervous system must have originated long ago, an estimated 500–600 million years ago. Supporting this view, the BMP/dpp signaling pathway has clearly emerged as a conserved pathway in all bilaterian organisms studied so far, and in most cases it has been shown to be involved not only in nervous system centralization but also in its patterning (De Robertis 2008; Mizutani and Bier 2008). In the next section, we focus on the D/V subdivision of the Drosophila nervous system, and we then discuss how the morphogenetic gradients involved in the overall D/V patterning of the primary germ layers may evolve in closely related Drosophila species. The evolution of the nervous system has been reviewed in more detail elsewhere (Lowe et al. 2006; Mizutani and Bier 2008; Arendt et al. 2008; Holland 2009).

10.2 Neural Patterning and Specification of Neuroblasts in Drosophila

Because of its simplicity, the ventral nerve cord of insects has served as a paradigm for the study of the differentiation of neuroblasts, or neural stem cells. At early embryonic stages, once the neural and nonneural ectodermal domains are established by the activity of BMP/dpp and Chd/sog, the neural domain is further subdivided into expression domains of key transcription factors that confer a unique identity on each of the 30 neuroblasts per hemisegment that delaminate from the neuroectoderm (reviewed in Bhat 1999; Technau et al. 2006). Each neuroblast is committed to generating a stereotyped neural cell lineage (Doe and Skeath 1996; Doe 1992, 2008) after receiving “positional information” from genes expressed along both the D/V and anterior–posterior (A/P) axes (Fig. 10.2). Information provided by the D/V axis dictates the formation of the main neural cell types, such as motorneurons, serotonergic neurons, and sensory neurons (Schmid et al. 1999). In the Drosophila embryo, the neural identity genes responsible for D/V patterning are ventral nervous system defective (vnd), intermediate neuroblasts defective (ind), and muscle segment homeobox/Drop (msh/Dr), which are expressed in nonoverlapping domains: vnd in the ventralmost layer of the neuroectoderm, ind in the intermediate region, and msh in the dorsalmost region (Jimenez et al. 1995; Isshiki et al. 1997; McDonald et al. 1998; Weiss et al. 1998; Mellerick and Modica 2002) (Fig. 10.1d).


Fig. 10.2 Neuroblast formation and neural determination in Drosophila. (a) Blastoderm stage embryo. Germ layers are indicated, as well as the D/V neuroectodermal domains. (b) Mesodermal cells invaginate, bringing the two halves of the neuroectoderm together at the ventral midline. (c) Delamination of neuroblasts from their respective neuroectodermal domains. (d) Ventral view of the neuroblast map, roughly representing the 30 neuroblasts per hemisegment. (e) Neuron types formed along the D/V axis. Serotonergic neurons form in the ventral region, sensory neurons in lateral regions, and motoneurons in all three domains. Colors of neuroblasts in (d) and neurons in (e) indicate their ventral (blue), lateral (green), and dorsal (red) identities

The study of nervous system patterning along the D/V axis provided additional evidence for the common ancestry of the nervous system, since the vertebrate homologues of vnd, ind, and msh (Nkx2.2, Gsh, and Msx) are expressed in the same arrangement along the D/V axis of the neural tube once the axis inversion is taken into account (Valerius et al. 1995; Wang et al. 1996; Suzuki et al. 1997; Weiss et al. 1998; Briscoe et al. 1999; Liu et al. 2004; Kriks et al. 2005). BMP signaling, reaching into the adjacent neural domain, has been shown to repress the neural identity genes in a dose-dependent fashion, with the ventrally expressed genes more sensitive to this repression than the dorsally expressed genes. As a result, the vnd/Nkx2.2 and ind/Gsh domains are pushed away from the dorsal source of BMP secretion, while the msh/Msx domain lies more dorsally because this gene can tolerate high levels of BMP before being repressed. In addition to this differential sensitivity to BMP levels, cross-regulatory interactions among the neural identity genes cooperate in this patterning. Namely, the genes repress each other in the ventral-to-dorsal direction, an interaction referred to as “ventral dominance” (Cowden and Levine 2003): vnd represses ind, while both vnd and ind repress msh. This relationship also appears to be at least partially conserved in vertebrates (Mizutani et al. 2006; Illes et al. 2009). Notably, even though the specification of D/V neural cell types also depends on other morphogens in vertebrates and invertebrates (i.e., Shh and Dorsal, respectively), BMP signaling can provide most of the information for neural patterning in the absence of these additional cues (Jacob and Briscoe 2003; Mizutani et al. 2006).
The findings above help reconcile apparent discrepancies posed by some vertebrate and invertebrate lineages with noncentralized nervous systems, which more likely represent highly derived forms, and establish a common unifying mechanism that patterns the nervous system. Further evidence for the ancestral role of BMP signaling in neural patterning came from studies of an outgroup organism, the marine annelid Platynereis dumerilii, which belongs to the second major invertebrate branch, the Lophotrochozoa (Denes et al. 2007). Thus, nervous system evolution appears highly conservative and is likely to have relied on the ancestral BMP signaling pathway to generate a similar architecture of neural cell types arranged along the D/V axis over hundreds of millions of years. The picture that emerges from these studies also suggests that this ancestral signaling cassette can be superimposed on other graded morphogenetic signals, such as Dorsal in Drosophila and Shh in vertebrates. Remarkably, in insects, the whole system must still maintain the vnd, ind, and msh layers of gene expression with similar numbers of cells. This view is supported by the highly stereotyped neuroblast maps of divergent insects such as the grasshopper, Drosophila, and silverfish, and of even more distant arthropods such as crustaceans (Thomas et al. 1984; Doe 1992; Whitington 1996; Ungerer and Scholtz 2008). Genetic experiments in Drosophila have shown that changes in the width of the vnd, ind, and msh expression domains can lead to profound alterations, namely the loss or duplication of specific neuron cell types (Fig. 10.3). Even though some partial modifications in the expression patterns of these neural identity genes may exist, as reported in the beetle Tribolium (Wheeler et al. 2005), in general there seems to be strong pressure to maintain a conserved organization of neuroblast number and types. Therefore, it would not be surprising if a robust mechanism ensured that the same number of cells is maintained in the nervous system of insects despite their differences in embryo size.

[Fig. 10.3 image: (a) early NE domains in wt, vnd and ind mutants; (b–d) late-stage ventral (v) and intermediate (i) neurons, including RP2, with loss of ventral neurons and duplication of RP2 in (c) and loss of RP2 in (d)]

Fig. 10.3 Alterations in width of neuroectodermal domains lead to loss or duplications of specific neurons. (a) Early neuroectodermal expression domains in wt and in vnd and ind mutants. The vnd domain is represented in green, ind in blue, and msh in red. The position of the ventral midline is indicated by an arrowhead. In the vnd mutant, the ind expression domain is expanded, while in the ind mutant, both vnd and ind are expanded. (b–d) Late stage embryos stained for even-skipped, which recognizes neurons of ventral and intermediate fate (v and i). (b) Wild type. (c) vnd mutant displaying loss of ventral neurons and duplication of the RP2 motorneuron, an intermediate neuron. (d) ind mutant with loss of RP2 neurons (red arrows). [Pictures in (b) and (c) were reproduced from McDonald et al. 1998. Picture in (d) was reproduced from Weiss et al. 1998]

10.3 Scaling of Germ Layers During Evolution

It seems intuitive that understanding the evolution of the nervous system will require considering the mechanisms by which animals and tissues scale. One way to address this problem is to investigate related organisms that differ in size. If patterning of the nervous system requires polarized morphogenetic signals that emanate from opposite sides of the embryo, then species with embryos of different sizes, in which the sources of morphogenetic signals lie farther apart, might be expected to show variations in the organization of the nervous system or of other cell fates along the D/V axis. Recent progress on the mechanisms of morphogenetic gradient scaling and evolution has been made for A/P patterning in different fly species (McGregor et al. 2001; Gregor et al. 2005, 2008; Lott et al. 2007). However, there are still a number of gaps regarding scaling in D/V patterning. For instance, comparisons across species that differ in size, using molecular markers for germ layer domains, are necessary to assess the changes that occurred during the evolution of D/V patterning. Quantitative expression profiles might also resolve whether morphogen levels in related organisms that differ in size are similar or significantly different. As a first estimate, divergent Drosophilids appear to display variations in the width of the peak levels of the Dorsal gradient (Crocker et al. 2008), although a more precise quantitative measurement of those differences is still lacking. Such comparisons are important for testing the generality of predictions made by current mathematical models based on D/V morphogenetic activity in a single species (Eldar et al. 2002; Mizutani et al. 2005; Zinzen et al. 2006; Kanodia et al. 2009), and for beginning to elucidate the general principles that control the number of cells allocated to particular tissue types.
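Why the distance between signaling sources matters can be made concrete with a textbook exponential gradient, C(x) = C0·exp(−x/λ), in which a target gene is on wherever C exceeds a threshold T, so that the domain extends a distance λ·ln(C0/T) from the source. The sketch below is an illustrative assumption, not a model taken from the studies cited above, and all parameter values are hypothetical. It contrasts a non-scaling gradient (fixed decay length λ) with one whose λ is assumed to grow in proportion to embryo length L.

```python
import math


def domain_width(c0: float, threshold: float, decay_length: float) -> float:
    """Distance from the source over which an exponential gradient
    C(x) = c0 * exp(-x / decay_length) stays above `threshold`."""
    if threshold >= c0:
        return 0.0  # the gradient never exceeds the threshold
    return decay_length * math.log(c0 / threshold)


def compare_scaling(embryo_lengths, c0=1.0, threshold=0.25, ref_decay=0.2):
    """For each embryo length, return (length, fraction of the axis covered
    with a fixed decay length, fraction covered with a decay length that is
    proportional to embryo length). Units are arbitrary; length 1.0 is the
    reference embryo."""
    rows = []
    for length in embryo_lengths:
        fixed = domain_width(c0, threshold, ref_decay)
        scaled = domain_width(c0, threshold, ref_decay * length)
        rows.append((length, fixed / length, scaled / length))
    return rows


if __name__ == "__main__":
    # Hypothetical relative embryo lengths: small, reference, large.
    for length, frac_fixed, frac_scaled in compare_scaling([0.7, 1.0, 1.3]):
        print(f"L={length:.1f}  fixed decay: {frac_fixed:.3f}  "
              f"scaled decay: {frac_scaled:.3f}")
```

With a fixed decay length, the domain keeps the same absolute width and therefore occupies a shrinking fraction of larger embryos; with a scaled decay length, it keeps a constant proportion. A neuroectoderm of constant cell number in embryos of different sizes fits neither case cleanly, which is one way of stating the scaling question raised above.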
A better understanding of the mechanisms that govern tissue size and pattern is essential for manipulating tissue regeneration, a question of direct relevance to stem cell biology.

10.3.1 Investigation of Drosophila Sibling Species with Embryos That Vary in Size

In addition to such comparative studies, the investigation of closely related species that can hybridize has the potential to clarify mechanisms of scaling by direct experimentation. For instance, Drosophila D/V patterning relies on both maternal (e.g., Dorsal) and zygotic (e.g., Dpp/Sog) cues, which can be completely separated by generating hybrid embryos from crosses between species that produce embryos of different sizes: such hybrids receive maternal information exclusively from the mother's species and zygotic information from both parents. In this regard, the D. melanogaster subgroup of sibling species offers unique advantages for such studies. The lineage leading to D. simulans and D. sechellia separated from that of D. melanogaster approximately 5 million years ago. D. sechellia is believed to have diverged from D. simulans more recently, on the Seychelles Islands, some 0.5 million years ago (Lachaise et al. 1986). The external anatomy of these three sibling species is very similar, and the only reliable way to distinguish them is by differences in the male genitalia and, to a lesser extent, by the pigmentation of the sixth abdominal tergite in females. At both the genomic and chromosomal levels, these species are almost identical (Horton 1939; Lemeunier and Ashburner 1984; Clark et al. 2007). However, D. melanogaster and D. sechellia produce eggs of considerably different sizes (Fig. 10.4) (Lott et al. 2007), and egg size has been shown to be genetically determined and little influenced by environmental factors (Warren 1924). The similarity and discrete differences among these sibling species, coupled with the ability to hybridize them, offer the opportunity to begin addressing the questions raised above. When the D/V partition of these species was analyzed using D/V markers such as snail, the larger D. sechellia embryo was found to have a significantly wider mesoderm than D. melanogaster (Mizutani, C.M., unpublished data). This difference in mesoderm size is remarkable, given the recent divergence of these two species.
Fig. 10.4 Scaling of neuroectodermal domains and germ layers in Drosophila species with embryos of different sizes. (a) D. busckii. (b) D. melanogaster. (c) D. sechellia. The neuroectodermal domains (NE) are maintained in all three species, but the mesodermal domains (MES) vary in size.

However, it is not yet clear how the maternal Dorsal gradient and the zygotic Dpp gradient behave when the scale is modified to sustain similar neuroectodermal domains. It remains to be determined whether larger embryos need to produce higher levels of these morphogens or whether compensatory mechanisms that circumvent variations in distance are at play. These observations also raise the question of whether the maternal gradient of one species can allocate the correct number of cells per germ layer in another species, or whether this process relies on zygotic activity. As mentioned above, hybridization experiments should resolve these and other issues regarding scaling. For instance, if only maternal cues define species-specific D/V patterning, then hybrid embryos between two species should have a D/V subdivision similar to that of the mother's species, since the information for embryo size plus the entire machinery dedicated to establishing the Dorsal gradient is provided solely by the mother. Conversely, if hybrid embryos displayed a D/V subdivision intermediate between those of the two species, this would indicate that zygotic determinants participate in the species-specific partition of the germ layers. Ultimately, such experimental tests might help define more precisely how D/V patterning evolves. If the large embryo of D. sechellia has a wider mesoderm than D. melanogaster, then we should expect a smaller embryo to have a narrower mesodermal domain, compensating for the conserved size of the neuroectoderm observed in several other insects (Whitington 1996). D. busckii lays embryos about one third the size of those of D. melanogaster (Fig. 10.4) (Gregor et al. 2005), and indeed the miniature embryos of this species have proportionally fewer mesodermal cells than D. melanogaster and D. sechellia (Mizutani, C.M., unpublished data). This again agrees with the expectation that the D/V partition of the embryo should be sensitive to the sources of morphogenetic information, the distances between them, and consequently embryo size. However, in contrast to the mesodermal variation, the number of cells allocated to the neuroectoderm is the same in all three species.
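The logic of the proposed hybrid test can be written down as a small decision rule. This is a hypothetical sketch: the function name and the widths used are placeholders, not measurements from the species discussed here, "intermediate" is assumed to mean the midpoint of the two parental values, and the tolerance is arbitrary.

```python
def interpret_hybrid(maternal_width: float, paternal_width: float,
                     hybrid_width: float, tolerance: float = 0.02) -> str:
    """Compare a hybrid's D/V partition with the two hypotheses in the text.

    Widths are any consistent measure of a germ layer, e.g., mesoderm width
    as a fraction of embryo circumference (hypothetical units).
    """
    if abs(maternal_width - paternal_width) <= 2 * tolerance:
        # Parental phenotypes too similar for the test to discriminate.
        return "parents indistinguishable"
    if abs(hybrid_width - maternal_width) <= tolerance:
        # Matches the mother's species: maternal cues suffice.
        return "maternal"
    midpoint = (maternal_width + paternal_width) / 2
    if abs(hybrid_width - midpoint) <= tolerance:
        # Intermediate: zygotic determinants from both parents participate.
        return "zygotic contribution"
    return "unresolved"


if __name__ == "__main__":
    # Placeholder values only; these are not measurements.
    for hybrid in (0.21, 0.25, 0.29):
        print(hybrid, interpret_hybrid(0.20, 0.30, hybrid))
```

Using the midpoint as the expectation for zygotic participation assumes additive contributions from the two parental genomes; dominance of one parental allele set would shift that expectation, which is why results that fit neither prediction are reported as unresolved rather than forced into a category.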
Although this is consistent with the nervous system being under strong pressure to maintain an organization that preserves its function, the mechanisms that limit the number of cells in the neuroectoderm cannot be explained by a D/V partition based merely on either zygotic or maternal morphogenetic gradients. What are the alternatives to explain this paradox? At this juncture, these observations are difficult to reconcile and suggest that we might be entering new avenues of investigation into the evolutionary mechanisms of body plan formation. The ideas we would like to discuss below are still highly speculative and based on a recent discovery of nuclear movements in the Drosophila embryo.

10.3.2 Do Embryos Employ a Cell Counting Mechanism That Couples Nuclear Density and Morphogenetic Activity?

In general, the action of morphogenetic gradients is depicted as involving two static cell populations: one committed to sending a signal and another, naïve, that receives and implements it. This view is convenient for distinguishing the two populations, but in living organisms cell populations are spatially displaced during cell divisions and morphogenetic movements. In the syncytial Drosophila embryo, nuclei move dynamically toward the periphery of the embryo and undergo a few rounds of division while the Dorsal gradient is being established (Roth et al. 1989; DeLotto et al. 2007; Kanodia et al. 2009). Once the embryo enters nuclear cycle 14, a long pause takes place without further divisions, and cellularization occurs through the invagination of cell membranes during blastoderm stage E5 (Fig. 10.1a). During this stage, which lasts about 40 min, most zygotic genes are regulated in response to the D/V and A/P gradients. Contrary to the classical view of the blastoderm as a stationary stage, in which nuclei stop dividing and no major cell movements such as invaginations or germ band extension occur until gastrulation, Keranen and colleagues recently demonstrated that complex and ordered nuclear movements do occur during stage E5, ultimately producing a highly stereotyped pattern of nuclear density packing in the embryo (Keranen et al. 2006). These authors showed that some nuclei move as far as 20 μm (about three cell diameters) in a stereotyped fashion. In normal embryos, lateral nuclei move toward the dorsal region, increasing nuclear density along the dorsal midline. Most ventral nuclei move only a limited distance and reach a lower density than other regions of the embryo by the end of stage E5. Interestingly, these movement patterns are affected in mutants that disrupt formation of the Dorsal gradient. It is well known that mutations in gd7 and Toll3 produce apolar embryos with no nuclear Dorsal (gd7) or with ubiquitous nuclear Dorsal (Toll3) (Konrad et al. 1988; Schneider et al. 1991; Stathopoulos et al. 2002; Mizutani et al. 2006). What escaped previous analyses is that these apolar embryos also exhibit a different distribution of nuclear densities (Keranen et al. 2006), suggesting that Dorsal signaling might be required for the orderly control of the nuclear movements observed in wild-type embryos.
Nuclear movements could be controlled directly or indirectly by Dorsal, and they might also involve zygotic Dpp expression, since Dpp is either ubiquitous or absent in the apolar embryos. In either case, it seems that the D/V morphogenetic gradients can modulate the final number of nuclei that occupy different regions across the D/V axis, including the lateral region that gives rise to the neuroectoderm. If morphogenetic gradients indeed influence nuclear density, as the data suggest, this might be an important piece of information for resolving the scaling paradox of the nervous system. The high levels of Dorsal observed in the ventral nuclei could result not only from increased translocation of Dorsal into nuclei at fixed positions, but also from a prior (or concomitant) control of nuclear density in the ventral region that is about to achieve the highest accumulation of nuclear Dorsal. Such a mechanism could in principle limit the number of prospective neuroectodermal cells that acquire intermediate levels of Dorsal, and potentially explain the constant width of the neuroectoderm in closely related species that vary in mesoderm width (Fig. 10.5). In this case, the delimitation of cells within the ventral, intermediate, and dorsal domains of the neuroectoderm would be under the control of the nuclear clustering activities of D/V morphogens. In the mesoderm, however, Dorsal would function in its well-characterized role of threshold regulation of target genes, and thus remain susceptible to variations in embryo size.
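The proposed link between nuclear density and the number of cells reading a given Dorsal level can be explored with a deliberately simple calculation. In this hypothetical sketch, the function names, the exponential Dorsal profile, its decay length, and the two fate thresholds are all assumptions rather than measured values. Nuclei are placed along the ventral-to-dorsal half of the axis, each reads the local Dorsal level, and nuclei are binned as mesoderm, neuroectoderm, or dorsal ectoderm. The only variable is how densely nuclei are packed near the ventral midline.

```python
import math


def dorsal_level(x: float, peak: float = 1.0, decay: float = 0.15) -> float:
    """Assumed exponential nuclear Dorsal profile. x runs from 0 (ventral
    midline) to 1 (dorsal midline) in arbitrary units; values are hypothetical."""
    return peak * math.exp(-x / decay)


def nucleus_positions(n_nuclei: int, ventral_bias: float = 1.0) -> list:
    """Place n_nuclei on [0, 1). ventral_bias = 1.0 gives uniform spacing;
    values above 1.0 pack nuclei more densely near the ventral midline."""
    return [(i / n_nuclei) ** ventral_bias for i in range(n_nuclei)]


def count_domains(positions, high: float = 0.5, low: float = 0.1, **gradient):
    """Assign each nucleus to a fate by its Dorsal level (hypothetical
    thresholds): >= high -> mesoderm, [low, high) -> neuroectoderm,
    < low -> dorsal ectoderm. Extra keyword arguments go to dorsal_level."""
    counts = {"mesoderm": 0, "neuroectoderm": 0, "dorsal ectoderm": 0}
    for x in positions:
        level = dorsal_level(x, **gradient)
        if level >= high:
            counts["mesoderm"] += 1
        elif level >= low:
            counts["neuroectoderm"] += 1
        else:
            counts["dorsal ectoderm"] += 1
    return counts


if __name__ == "__main__":
    # Same gradient and same number of nuclei; only their packing differs.
    for bias in (1.0, 1.5):
        print(bias, count_domains(nucleus_positions(50, ventral_bias=bias)))
```

With the gradient held fixed, packing nuclei more densely on the ventral side changes how many nuclei fall into each threshold band. The sketch does not show that the neuroectoderm count stays constant across embryo sizes; testing that would require a model in which Dorsal signaling and nuclear packing feed back on each other, which is beyond this illustration.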


Fig. 10.5 Distribution of dorsal–ventral gradients, cell fate positions, and nuclear density packing in different Drosophila species. The graph represents Dorsal (red line) and Dpp (blue line) gradient levels. The abscissa indicates the position of cell fates and gene expression domains along the D/V axis: sna (red), vnd (blue), ind (green), msh (magenta), and dpp (yellow). The colored bar indicates nuclear density packing (orange, low density; black, intermediate density; blue, high density). All three species, D. melanogaster (a), D. sechellia (b), and D. busckii (c), have equally sized neuroectodermal domains (vnd, ind, and msh) but variable mesodermal domains (sna). Peak levels of the Dorsal gradient are higher in D. melanogaster (a) than in D. sechellia (b), while the gradient is wider in D. sechellia than in D. melanogaster (Mizutani, unpublished data). One can speculate that D. busckii has a Dorsal gradient with a higher peak and narrower width than D. melanogaster (c). The nuclear distributions shown for D. sechellia and D. busckii are hypothetical and assume a higher concentration of nuclei in the ventral region in D. sechellia and a lower concentration in D. busckii, compared with D. melanogaster. Such distributions could in principle change the final shape of the Dorsal gradient. Finally, the Dpp gradient is also represented hypothetically, scaling with size in all three species, based on models of BMP gradient scaling in Xenopus (Ben-Zvi et al. 2008).

10.4 Emergence of Novel Nervous System Properties Despite Conservation in Cellular Architecture

In this review, we have argued that the organization of the nervous system is robust and highly conservative. However, it is clear that the system also exploits breaches in this robustness to create novelty. This observation is particularly pertinent in light of the sharp differences in mating behavior and ecology among the D. melanogaster sibling species (Watanabe and Kawanishi 1979; Lachaise et al. 1986). There are many alternative ways to create flexibility in nervous system function without necessarily changing neuronal cell identities in closely related species (reviewed in Katz and Harris-Warrick 1999). For instance, one can speculate that the enlarged mesoderm observed in D. sechellia could lead to changes in peripheral muscle tissue, with the consequence of modifying neuromuscular junctions. Such modifications could in turn produce output responses distinct from those of D. melanogaster in a variety of behaviors, including locomotion or the courtship rituals performed by males. Indeed, the muscle of Lawrence, responsible for wing vibration in males during courtship, has more fibers in D. sechellia than in D. melanogaster, which could explain the difference in love song frequencies between the two species (Orgogozo et al. 2007). Another way to create novel behavioral functions would be through mutations in single genes that regulate some aspect of neural physiology or response. One example is the loss of olfactory and gustatory receptors in D. sechellia, which allowed this species to specialize in feeding on Morinda fruit (Matsuo et al. 2007; McBride 2007). This fruit contains toxic levels of octanoic acid and is avoided by the other sibling species, D. melanogaster, D. simulans, and D. mauritiana, which retained these receptors. With the sequencing of the genomes of twelve Drosophila species (Clark et al. 2007), pairwise genome comparisons within the D. melanogaster subgroup have allowed the discovery of ancestral and fast-evolving alleles with predicted neural functions, including potassium channels and additional gustatory and odorant receptors (Sousa-Neves and Rosas 2010).

10.5 Conclusion

The organization of the nervous system along the D/V axis into distinct domains of gene expression is conserved in most bilaterian organisms and appears to rely on the ancient BMP/dpp and Chd/sog signaling cassette. Previous work in insect embryos has shown that this conservation extends to the cellular level, since the neuroblast maps of divergent insects are very similar in the number and types of cells. Evolutionary changes in embryo size must pose a tremendous challenge for morphogenetic gradients, whose scaling properties must constrain the number of cells within these neural domains; at the same time, novel body plans can be created by altering the determination of other germ layers under lower evolutionary pressure. We speculate that, in addition to their traditional roles in defining cell fate and proliferation, morphogenetic gradients may also coordinate nuclear clustering and distribution, which may function as a cell counting mechanism that allocates the correct number of cells to specific dorsal–ventral domains of Drosophilid embryos. Future experimental and computational modeling studies in closely related Drosophila species might reveal emergent properties of the evolutionary mechanisms of germ layer formation.


References

Arendt D, Nubler-Jung K (1994) Inversion of dorsoventral axis? Nature 371:26
Arendt D, Denes AS, Jekely G, Tessmar-Raible K (2008) The evolution of nervous system centralization. Philos Trans R Soc Lond B Biol Sci 363:1523–1528
Ben-Zvi D, Shilo BZ, Fainsod A, Barkai N (2008) Scaling of the BMP activation gradient in Xenopus embryos. Nature 453:1205–1211
Bhat KM (1999) Segment polarity genes in neuroblast formation and identity specification during Drosophila neurogenesis. Bioessays 21:472–485
Biehs B, Francois V, Bier E (1996) The Drosophila short gastrulation gene prevents Dpp from autoactivating and suppressing neurogenesis in the neuroectoderm. Genes Dev 10:2922–2934
Bier E (1997) Anti-neural-inhibition: a conserved mechanism for neural induction. Cell 89:681–684
Brideau NJ, Flores HA, Wang J, Maheshwari S, Wang X, Barbash DA (2006) Two Dobzhansky–Muller genes interact to cause hybrid lethality in Drosophila. Science 314:1292–1295
Briscoe J, Sussel L, Serup P, Hartigan-O’Connor D, Jessell TM, Rubenstein JL, Ericson J (1999) Homeobox gene Nkx2.2 and specification of neuronal identity by graded Sonic hedgehog signalling. Nature 398:622–627
Chen G, Handel K, Roth S (2000) The maternal NF-kappaB/dorsal gradient of Tribolium castaneum: dynamics of early dorsoventral patterning in a short-germ beetle. Development 127:5145–5156
Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN et al (2007) Evolution of genes and genomes on the Drosophila phylogeny. Nature 450:203–218
Cowden J, Levine M (2003) Ventral dominance governs sequential patterns of gene expression across the dorsal-ventral axis of the neuroectoderm in the Drosophila embryo. Dev Biol 262:335–349
Crocker J, Tamori Y, Erives A (2008) Evolution acts on enhancer organization to fine-tune gradient threshold readouts. PLoS Biol 6:e263
De Robertis EM (2008) Evo-devo: variations on ancestral themes. Cell 132:185–195
De Robertis EM, Sasai Y (1996) A common plan for dorsoventral patterning in Bilateria. Nature 380:37–40
DeLotto R, DeLotto Y, Steward R, Lippincott-Schwartz J (2007) Nucleocytoplasmic shuttling mediates the dynamic maintenance of nuclear dorsal levels during Drosophila embryogenesis. Development 134:4233–4241
Denes AS, Jekely G, Steinmetz PR, Raible F, Snyman H, Prud’homme B, Ferrier DE, Balavoine G, Arendt D (2007) Molecular architecture of annelid nerve cord supports common origin of nervous system centralization in bilateria. Cell 129:277–288
Doe CQ (1992) Molecular markers for identified neuroblasts and ganglion mother cells in the Drosophila central nervous system. Development 116:855–863
Doe CQ (2008) Neural stem cells: balancing self-renewal with differentiation. Development 135:1575–1587
Doe CQ, Skeath JB (1996) Neurogenesis in the insect central nervous system. Curr Opin Neurobiol 6:18–24
Eldar A, Dorfman R, Weiss D, Ashe H, Shilo BZ, Barkai N (2002) Robustness of the BMP morphogen gradient in Drosophila embryonic patterning. Nature 419:304–308
Ferguson EL (1996) Conservation of dorsal–ventral patterning in arthropods and chordates. Curr Opin Genet Dev 6:424–431
Ferrandon D, Imler JL, Hetru C, Hoffmann JA (2007) The Drosophila systemic immune response: sensing and signalling during bacterial and fungal infections. Nat Rev Immunol 7:862–874
Francois V, Solloway M, O’Neill JW, Emery J, Bier E (1994) Dorsal–ventral patterning of the Drosophila embryo depends on a putative negative growth factor encoded by the short gastrulation gene. Genes Dev 8:2602–2616
Geoffroy St.-Hilaire E (1822) Considérations générales sur la vertèbre. Mém Mus Hist Nat 9:89–119
Gompel N, Prud’homme B, Wittkopp PJ, Kassner VA, Carroll SB (2005) Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature 433:481–487
Gregor T, Bialek W, de Ruyter van Steveninck RR, Tank DW, Wieschaus EF (2005) Diffusion and scaling during early embryonic pattern formation. Proc Natl Acad Sci USA 102:18403–18407
Gregor T, McGregor AP, Wieschaus EF (2008) Shape and function of the bicoid morphogen gradient in dipteran species with different sized embryos. Dev Biol 316:350–358
Holland LZ (2009) Chordate roots of the vertebrate nervous system: expanding the molecular toolkit. Nat Rev Neurosci 10:736–746
Holley SA, Jackson PD, Sasai Y, Lu B, De Robertis EM, Hoffmann FM, Ferguson EL (1995) A conserved system for dorsal-ventral patterning in insects and vertebrates involving sog and chordin. Nature 376:249–253
Horton IH (1939) A comparison of the salivary gland chromosomes of Drosophila melanogaster and D. simulans. Genetics 24:234–243
Illes JC, Winterbottom E, Isaacs HV (2009) Cloning and expression analysis of the anterior parahox genes, Gsh1 and Gsh2 from Xenopus tropicalis. Dev Dyn 238:194–203
Irish VF, Gelbart WM (1987) The decapentaplegic gene is required for dorsal–ventral patterning of the Drosophila embryo. Genes Dev 1:868–879
Isshiki T, Takeichi M, Nose A (1997) The role of the msh homeobox gene during Drosophila neurogenesis: implication for the dorsoventral specification of the neuroectoderm. Development 124:3099–3109
Jacob J, Briscoe J (2003) Gli proteins and the control of spinal-cord patterning. EMBO Rep 4:761–765
Jimenez F, Martin-Morris LE, Velasco L, Chu H, Sierra J, Rosen DR, White K (1995) vnd, a gene required for early neurogenesis of Drosophila, encodes a homeodomain protein. EMBO J 14:3487–3495
Kanodia JS, Rikhy R, Kim Y, Lund VK, DeLotto R, Lippincott-Schwartz J, Shvartsman SY (2009) Dynamics of the dorsal morphogen gradient. Proc Natl Acad Sci USA 106:21707–21712
Kassis JA (1990) Spatial and temporal control elements of the Drosophila engrailed gene. Genes Dev 4:433–443
Katz PS, Harris-Warrick RM (1999) The evolution of neuronal circuits underlying species-specific behavior. Curr Opin Neurobiol 9:628–633
Keranen SV, Fowlkes CC, Luengo Hendriks CL, Sudar D, Knowles DW, Malik J, Biggin MD (2006) Three-dimensional morphology and gene expression in the Drosophila blastoderm at cellular resolution II: dynamics. Genome Biol 7:R124
Konrad KD, Goralski TJ, Mahowald AP (1988) Developmental genetics of the gastrulation defective locus in Drosophila melanogaster. Dev Biol 127:133–142
Kriks S, Lanuza GM, Mizuguchi R, Nakafuku M, Goulding M (2005) Gsh2 is required for the repression of Ngn1 and specification of dorsal interneuron fate in the spinal cord. Development 132:2991–3002
Lachaise D, David JR, Lemeunier F, Tsacas L, Ashburner M (1986) The reproductive relationship of Drosophila sechellia with Drosophila mauritiana, Drosophila simulans and Drosophila melanogaster from the afro-tropical region. Evolution 40:262–271
Lapraz F, Besnardeau L, Lepage T (2009) Patterning of the dorsal–ventral axis in echinoderms: insights into the evolution of the BMP-chordin signaling network. PLoS Biol 7:e1000248
Lemeunier F, Ashburner M (1984) Relationships within the melanogaster species subgroup of the genus Drosophila (Sophophora). Chromosoma 89:343–351
Liberman LM, Stathopoulos A (2009) Design flexibility in cis-regulatory control of gene expression: synthetic and comparative evidence. Dev Biol 327:578–589
Liem KF Jr, Tremml G, Roelink H, Jessell TM (1995) Dorsal differentiation of neural plate cells induced by BMP-mediated signals from epidermal ectoderm. Cell 82:969–979
Liem KF Jr, Jessell TM, Briscoe J (2000) Regulation of the neural patterning activity of sonic hedgehog by secreted BMP inhibitors expressed by notochord and somites. Development 127:4855–4866
Litingtung Y, Chiang C (2000) Specification of ventral neuron types is mediated by an antagonistic interaction between Shh and Gli3. Nat Neurosci 3:979–985
Liu Y, Helms AW, Johnson JE (2004) Distinct activities of Msx1 and Msx3 in dorsal neural tube development. Development 131:1017–1028
Lott SE, Kreitman M, Palsson A, Alekseeva E, Ludwig MZ (2007) Canalization of segmentation and its evolution in Drosophila. Proc Natl Acad Sci USA 104:10926–10931
Lowe CJ, Terasaki M, Wu M, Freeman RM Jr, Runft L, Kwan K, Haigo S, Aronowicz J, Lander E, Gruber C et al (2006) Dorsoventral patterning in hemichordates: insights into early chordate evolution. PLoS Biol 4:e291
Ludwig MZ, Patel NH, Kreitman M (1998) Functional analysis of eve stripe 2 enhancer evolution in Drosophila: rules governing conservation and change. Development 125:949–958
Matsuo T, Sugaya S, Yasukawa J, Aigaki T, Fuyama Y (2007) Odorant-binding proteins OBP57d and OBP57e affect taste perception and host-plant preference in Drosophila sechellia. PLoS Biol 5:e118
McBride CS (2007) Rapid evolution of smell and taste receptor genes during host specialization in Drosophila sechellia. Proc Natl Acad Sci USA 104:4996–5001
McDonald JA, Holbrook S, Isshiki T, Weiss J, Doe CQ, Mellerick DM (1998) Dorsoventral patterning in the Drosophila central nervous system: the vnd homeobox gene specifies ventral column identity. Genes Dev 12:3603–3612
McGregor AP, Shaw PJ, Hancock JM, Bopp D, Hediger M, Wratten NS, Dover GA (2001) Rapid restructuring of bicoid-dependent hunchback promoters within and between Dipteran species: implications for molecular coevolution. Evol Dev 3:397–407
Mellerick DM, Modica V (2002) Regulated vnd expression is required for both neural and glial specification in Drosophila. J Neurobiol 50:118–136
Mizutani C, Bier E (2008) EvoD/Vo: the origins of BMP signalling in the neuroectoderm. Nat Rev Genet 9:663–677
Mizutani CM, Nie Q, Wan FY, Zhang YT, Vilmos P, Sousa-Neves R, Bier E, Marsh JL, Lander AD (2005) Formation of the BMP activity gradient in the Drosophila embryo. Dev Cell 8:915–924
Mizutani CM, Meyer N, Roelink H, Bier E (2006) Threshold-dependent BMP-mediated repression: a model for a conserved mechanism that patterns the neuroectoderm. PLoS Biol 4:e313
Nomaksteinsky M, Rottinger E, Dufour HD, Chettouh Z, Lowe CJ, Martindale MQ, Brunet JF (2009) Centralization of the deuterostome nervous system predates chordates. Curr Biol 19:1264–1269
Nunes da Fonseca R, von Levetzow C, Kalscheuer P, Basal A, van der Zee M, Roth S (2008) Self-regulatory circuits in dorsoventral axis formation of the short-germ beetle Tribolium castaneum. Dev Cell 14:605–615
Orgogozo V, Muro NM, Stern DL (2007) Variation in fiber number of a male-specific muscle between Drosophila species: a genetic and developmental analysis. Evol Dev 9:368–377
Padgett RW, St Johnston RD, Gelbart WM (1987) A transcript from a Drosophila pattern gene predicts a protein homologous to the transforming growth factor-beta family. Nature 325:81–84
Padgett RW, Wozney JM, Gelbart WM (1993) Human BMP sequences can confer normal dorsal–ventral patterning in the Drosophila embryo. Proc Natl Acad Sci USA 90:2905–2909
Ray RP, Arora K, Nusslein-Volhard C, Gelbart WM (1991) The control of cell fate along the dorsal–ventral axis of the Drosophila embryo. Development 113:35–54
Rentzsch F, Anton R, Saina M, Hammerschmidt M, Holstein TW, Technau U (2006) Asymmetric expression of the BMP antagonists chordin and gremlin in the sea anemone Nematostella vectensis: implications for the evolution of axial patterning. Dev Biol 296:375–387
Roth S, Stein D, Nusslein-Volhard C (1989) A gradient of nuclear localization of the dorsal protein determines dorsoventral pattern in the Drosophila embryo. Cell 59:1189–1202
Rushlow CA, Han K, Manley JL, Levine M (1989) The graded distribution of the dorsal morphogen is initiated by selective nuclear transport in Drosophila. Cell 59:1165–1177
Saina M, Genikhovich G, Renfer E, Technau U (2009) BMPs and chordin regulate patterning of the directive axis in a sea anemone. Proc Natl Acad Sci USA 106:18592–18597
Samuel G, Miller D, Saint R (2001) Conservation of a DPP/BMP signaling pathway in the nonbilateral cnidarian Acropora millepora. Evol Dev 3:241–250
Sasai Y, Lu B, Steinbeisser H, Geissert D, Gont LK, De Robertis EM (1994) Xenopus chordin: a novel dorsalizing factor activated by organizer-specific homeobox genes. Cell 79:779–790
Schmid A, Chiba A, Doe CQ (1999) Clonal analysis of Drosophila embryonic neuroblasts: neural cell types, axon projections and muscle targets. Development 126:4653–4689
Schmidt J, Francois V, Bier E, Kimelman D (1995) Drosophila short gastrulation induces an ectopic axis in Xenopus: evidence for conserved mechanisms of dorsal–ventral patterning. Development 121:4319–4328
Schneider DS, Hudson KL, Lin TY, Anderson KV (1991) Dominant and recessive mutations define functional domains of Toll, a transmembrane protein required for dorsal–ventral polarity in the Drosophila embryo. Genes Dev 5:797–807
Sousa-Neves R, Rosas A (2010) An analysis of genetic changes during the divergence of Drosophila species. PLoS One 5:e10485. doi:10.1371/journal.pone.0010485
Spemann H, Mangold H (1924) Über Induktion von Embryonalanlagen durch Implantation artfremder Organisatoren. W Roux’ Arch Entw Mech Org 100:599–638
Stathopoulos A, Van Drenth M, Erives A, Markstein M, Levine M (2002) Whole-genome analysis of dorsal–ventral patterning in the Drosophila embryo. Cell 111:687–701
Steward R (1989) Relocalization of the dorsal protein from the cytoplasm to the nucleus correlates with its function. Cell 59:1179–1188
Sturtevant AH (1929) Contributions to the genetics of Drosophila simulans and Drosophila melanogaster. I. The genetics of Drosophila simulans. Publs Carnegie Instn 399:1–62
Suzuki A, Ueno N, Hemmati-Brivanlou A (1997) Xenopus msx1 mediates epidermal induction and neural inhibition by BMP4. Development 124:3037–3044
Technau GM, Berger C, Urbach R (2006) Generation of cell diversity and segmental pattern in the embryonic central nervous system of Drosophila. Dev Dyn 235:861–869
Thomas JB, Bastiani MJ, Bate M, Goodman CS (1984) From grasshopper to Drosophila: a common plan for neuronal development. Nature 310:203–207
Ungerer P, Scholtz G (2008) Filling the gap between identified neuroblasts and neurons in crustaceans adds new support for Tetraconata. Proc Biol Sci 275:369–376
Valerius MT, Li H, Stock JL, Weinstein M, Kaur S, Singh G, Potter SS (1995) Gsh-1: a novel murine homeobox gene expressed in the central nervous system. Dev Dyn 203:337–351
Wang W, Chen X, Xu H, Lufkin T (1996) Msx3: a novel murine homologue of the Drosophila msh homeobox gene restricted to the dorsal embryonic central nervous system. Mech Dev 58:203–215
Warren DC (1924) Inheritance of egg size in Drosophila melanogaster. Genetics 9:41–69
Watanabe TK, Kawanishi M (1979) Mating preference and the direction of evolution in Drosophila. Science 205:906–907
Weiss JB, Von Ohlen T, Mellerick DM, Dressler G, Doe CQ, Scott MP (1998) Dorsoventral patterning in the Drosophila central nervous system: the intermediate neuroblasts defective homeobox gene specifies intermediate column identity. Genes Dev 12:3591–3602
Wharton KA, Ray RP, Gelbart WM (1993) An activity gradient of Decapentaplegic is necessary for the specification of dorsal pattern elements in the Drosophila embryo. Development 117:807–822 Wheeler SR, Carrico ML, Wilson BA, Skeath JB (2005) The Tribolium columnar genes reveal conservation and plasticity in neural precursor patterning along the embryonic dorsal–ventral axis. Dev Biol 279:491–500 Whitington PM (1996) Evolution of neural development in the arthropods. Semin Cell Dev Biol 7:605–614 Wittkopp PJ, Vaccaro K, Carroll SB (2002) Evolution of yellow gene regulation and pigmentation in Drosophila. Curr Biol 12:1547–1556 Yamamoto MT, Kamo M, Yamamoto S, Watanable TK (1997) Cytogenetic mapping of lethal hybrid rescue gene of Drosophila simulans. Genes Genet Syst 72:297–301 Zinzen RP, Senger K, Levine M, Papatsenko D (2006) Computational models for neurogenic gene expression in the Drosophila embryo. Curr Biol 16:1358–1365

Chapter 11

Evolutionary Genomics for Eye Diversification

Atsushi Ogura

Abstract Animal eyes occur in several morphological types, such as the camera eye, compound eye, mirror eye, and single-lens eye, and all of these types have evolved from the same origin, the prototype eye. Although conserved genes and networks underlie eye evolution, little is known about the genetic basis of eye diversification. To discover the genes responsible for morphological diversification, it is essential to develop a platform for comparing genomes and transcriptomes among species. We therefore developed, as an example for evolutionary genomic studies, a microarray that covers genes related to the development, function, and structure of the molluscan eye.

11.1

Evolutionary Genomics for Eye Diversification

11.1.1

Evolution of the Eye

The eye is one of the most elaborate organs in animals, and the study of its evolution is of particular interest. The evolution of animal eyes has been one of the most fundamental and classical subjects in biology since the time of Darwin. However, it has been difficult to understand how this complex organ could arise simply through mutation and selection. Darwin discussed this problem in his “On the Origin of Species by Means of Natural Selection”, in a chapter titled “Difficulties of the Theory”. There he acknowledged that “organs of extreme perfection and complication”, such as the eye, posed a serious difficulty for his theory (Darwin 1859).

A. Ogura Division of Advanced Sciences, Ochadai Academic Production, Ochanomizu University, Ohtsuka 2-1-1, Bunkyo, Tokyo 112-8610, Japan e-mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_11, © Springer-Verlag Berlin Heidelberg 2010


The evolution of animal eyes was also long difficult to study from the viewpoint of molecular evolution and biology. There were only a few molecular theories linking primitive eyes to the elaborate and varied eye organs commonly seen today, and natural selection seemed unable to adequately explain the evolutionary mechanism underlying the development of complex animal eyes. However, studies of the key control genes of eye development have revealed a conserved regulatory network, represented by the Pax6 genes, in almost all animals (Gehring 1996; Fernald 2000). Although no clear evolutionary path connects the various types of animal eyes, the evolutionary history of the eye can be explained in terms of these conserved molecular mechanisms. Recent studies have also reported that, beyond the core gene regulatory network for eye development, genes downstream of the network and other peripheral genes related to eye function and structure have been conserved among animals since at least the origin of the Bilateria (Box 11.1). However, the origin and ancestral prototype of the eye, and the molecular mechanism underlying the diversification of the various eye types, remain unclear. 
Box 11.1

akhirin, apkc, apterous, arf4, arm, arr3, Arrestin, Ascl1, ash1, ath5, Atoh4, Atoh7, atonal, bad, BarH1, BarH2, barhl2, baz, bbs1, bbs2, bHLH, Big brother, blimp1, blue-opsin, Bmp4, Bmp7, BRD-U, brn1, Brn3a, Brn3b, Brother, bunched, c-kit, c-myc, calb2a, calb2b, Calphotin, CG13030, chaoptic, Chx10, cko, Cpsf1, crb, crb1, cre, crx, Cryba4, Cryz, cut, dachshund, daughterless, dbx1/2, delta1, dkk1, Dlx1, Dlx2, drosocrystallin, dynein, ectopic, eli, elk3, En-1, equarin, err2beta, ERRbeta, esrrb, Etv6, extramacrochaete, ey, Eya, flox, flr, Foxa2, FoxC1, FoxC2, FoxD1, FoxE3, FoxF1, FoxG1, FoxK1, FoxL1, FoxM1, FoxN2, FoxN3, FoxN4, Foxn5, FoxO, FoxO3, FoxP1, FoxP2, FoxP4, FoxS1, fzd4/5, gam1, gapeh, glass, gli2, gli3, glu, hairy, Hes1, hes1, hes2, hes5, hmgb3, homothorax, Hoxb1, Ihx, inaF, Islet-1, Jagged1, jmjd2c, jmjN/C, kif3, klingon, Krt1-12, L1cam, lazaro, lfe1, Lhx2, Lhx9, Lmo4, lok, lozenge, lrp5, lrx3, m-opsin, mab21l1, mab21l2, Maf, Math3, Math5, mdka, mdkb, meis2, melanopsin, mirror, Mitf, Mmp9, Mocs3, mts, Munster, musashi, myc, nanog, ncad, ncam1, Necab2, nestin, NeuroD, NeuroD1, ngn3, nkx2, nkx6, Nlz1, Nlz2, noggin, nohp1, notch, nphp, nr2e1, Nr2e3, nr2f1, nr2f2, nr3b2, nrl, nrx, ocelliless, oct4, of, onecut, opsin1, Optix, OTX1, OTX2, otx2, ovl, p27Xic1, p57kip2, par3, par6, patj, Pax2, Pax6, pax6cre, pax7, PaxB, pebbled, peripherin, PhospholipaseD, phyllopod, pi3p, Pias3, pikachurin, Pitx3, PNR, pp2a, pralemmin, Prep1, prospero, Prox1, ptc, pten, Rab, RARa, RARb, RARg, rax, Rb1, recoverin, retp1, Rex1, rhodopsin, ror, rough, rpgrip1, rs1, runt, rx1, rx2, rx3, rxr, sara, scabrous, shaven, Shh, sit, six1, six3, six6, smo, snare, so, sox1, sox11, sox2, sox3, sox4a, sox8, sox9, SoxN, spineless, ssea-1, stardust, sufu, sufuko, sumo1, syn3, tangerinA, target, tbx3, tbx5, teashirt, TFIID, TGIF, TGIF2, timeless, tiptop, to, tramtrack, TRbeta1, TRbeta2, trp-like, trpgamma, trpm1, tsk, tws, ubc9, vax2, vsx1, vsx2, warts, wnt, wnt2b, xhmgb3, xwnt8, zic1, zic2, zic3

These genes were collected from NCBI and PubMed and are considered to be related to the development, structure, and function of animal eyes.

11.1.2

Origin and Prototype of the Eye

Different eye types might have evolved many times as independent events in different animal lineages. However, eyes are soft organs and have rarely been preserved as fossils, which makes it difficult to examine the origin of animal eyes. Eyes can be fossilized only in some animals, such as trilobites, whose eyes consist of calcite lenses. Trilobite fossils with compound eyes have been found that date back to the early Cambrian period, more than 500 million years ago. This suggests that eyes originated before the Cambrian, in the Precambrian period. Cnidarians are among the most primitive animals to possess eyes, and intensive studies of the cnidarian eye have revealed fundamental mechanisms of eye formation and development (Kozmik et al. 2003). Fundamental eye genes and a common type of photoreceptor cell were found to be conserved among animals from a common origin. As a result, the evolution and diversification of different eye types could be considered not as independent events but as divergent events originating from a prototype eye present in an ancestral species. This raises the question of the exact form and structure of the prototype eye. Gehring and Ikeo inferred a two-celled prototype eye consisting of one photoreceptor cell and one pigment cell (Gehring and Ikeo 1999). More recently, Gehring suggested that the eye organelle of dinoflagellates, a group of protists, might be the prototype eye and the origin of all animal eyes (Gehring 2005). The characteristics of the prototype eye can be estimated by comparing the structure and molecular basis of extant animal eyes. The next question for researchers is how the various types of eyes diverged from the prototype eye.

11.1.3

Diversification of the Eye

Photoreceptors, as suggested by Salvini-Plawen and Mayr on the basis of morphological and embryological studies, have evolved independently in 40–65 different lineages (Salvini-Plawen and Mayr 1977). However, studies in molecular biology and evolution have revealed that different eye types share a common molecular basis and arose by divergent evolution, even though they appear to have evolved by different processes (Nilsson 2004; Serb and Eernisse 2008). These phenomena have often been explained by the concepts of convergent and divergent evolution. Convergent evolution is the process by which similar tissue or organ structures evolve from different origins or via different routes. Divergent evolution, on the other hand, is the process by which different types of tissues and organs evolve from the same origin. Despite their outward similarity, the camera eyes of vertebrates and cephalopods can be considered the result of divergent evolution, because they draw on the same gene sources and genetic mechanisms (Ogura et al. 2004). Jumping spiders also possess highly evolved camera-type eyes like those of vertebrates, but phylogenetic analysis has shown that they acquired these eyes independently (Su et al. 2007). Camera-type eyes are also found in a more primitive group, the cubozoan jellyfish of the phylum Cnidaria (Nilsson 2004). Such divergent mechanisms are key to explaining the evolution of the camera eye and, more broadly, the diversification of eye types. Molluscs are among the best subjects for studying this topic, because all eye types can be found within this single lineage (Kozmik et al. 2008). Squid and octopuses have a camera eye, the nautilus has a pinhole eye, the scallop has a mirror eye, and the ark shell has a compound eye (Fig. 11.1).

Fig. 11.1 Eyes of molluscs. The pictures show various types of eyes in molluscs: (a) Loligo vulgaris, a squid belonging to the family Loliginidae. (b) An eyeball extracted from Loligo. (c) Embryo of Idiosepius, the pygmy squid. (d) Nautilus pompilius, which has a pinhole eye. (e) Pecten yessoensis, the Japanese sea scallop, which has about a hundred tiny mirror eyes


The vertebrate camera eye develops from the neural plate, which forms the optic vesicle; this subsequently invaginates to form the optic cup. By contrast, the cephalopod camera eye develops from an invagination of the surface ectoderm rather than from the brain. These different developmental origins result in opposite orientations of the photoreceptor cells: in cephalopods the photoreceptors face the light source, whereas in vertebrates they face away from it. The compound eye, found in many species of arthropods and molluscs, has a complex structure that is very different from that of the camera eye. It consists of hundreds of individual units (ommatidia), each with its own lens and photoreceptors. In Drosophila, for example, the eye primordium forms as an invagination of the embryonic ectoderm that becomes the eye imaginal disc of the larva. During metamorphosis, the eye disc organizes itself to form the compound eye. Its photoreceptor cells extend their axons backwards from the periphery to establish contact with the brain.

11.1.4

Genomic and Transcriptomic Approaches to Eye Evolution

Recent work on the evolutionary genomics of various types of eyes, together with comparative analyses of gene expression among closely related species, has led to the hypothesis of a dynamic mechanism for eye diversification (Wistow 2006; Choy et al. 2006; Bao and Friedrich 2009; Baker et al. 2009). These large-scale genomic and transcriptomic studies analyze the orthologous gene sets involved in eye evolution. Their advantage is that they can trace the evolutionary history both of key regulatory genes such as Pax6 and of genes related to eye function and maintenance. These advances have been achieved through large-scale analyses using microarray technologies and next-generation sequencers. Molluscs provide a good example for the application of evolutionary genomics, as all eye types have evolved within this one lineage. To identify the genes responsible for morphological diversification, it is essential to develop a platform for comparing gene expression among species. In this example, a microarray covering genes related to the development, function, and structure of the molluscan eye was developed by constructing full-length cDNA libraries from the octopus, nautilus, scallop, and two squid species (Fig. 11.2). This strategy provides comparative genomic and transcriptomic approaches to the molecular mechanism of diversification in the molluscan eye. The resulting Molluscan Eye Array is designed for comparative gene expression analysis of molluscan eyes. It carries genes expressed in the eyes of Loligo, octopus, nautilus, and Pecten, genes known to be expressed in vertebrate eyes, and genes expressed in Idiosepius (the pygmy squid) and in brain. We designed the microarray probes from conserved regions of the genes so that they would detect the expression of orthologous genes across species.
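The idea of taking probes from regions conserved among orthologs can be illustrated with a short Python sketch. It is a minimal illustration, not the procedure actually used to build the Molluscan Eye Array. It assumes that the orthologous cDNA sequences have already been aligned to a common length (gaps written as `-`), scores every window of length `k` by its mean pairwise identity, and returns the most conserved window. The function names, the scoring rule, and the toy sequences are hypothetical. Real probe design would also have to weigh GC content, melting temperature, and cross-hybridization, none of which are modelled here.

```python
from itertools import combinations
from typing import List, Tuple


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of aligned columns where a and b carry the same (non-gap) base."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def most_conserved_window(aligned: List[str], k: int) -> Tuple[int, float]:
    """Return (start, score) of the length-k window with the highest mean
    pairwise identity across all aligned orthologs (leftmost window wins ties)."""
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    if not 0 < k <= length:
        raise ValueError("window length out of range")

    best_start, best_score = 0, -1.0
    for start in range(length - k + 1):
        windows = [s[start:start + k] for s in aligned]
        scores = [pairwise_identity(a, b) for a, b in combinations(windows, 2)]
        score = sum(scores) / len(scores)
        if score > best_score:  # strict '>' keeps the leftmost best window
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical toy alignment of three orthologous fragments.
toy = ["ACGTACGTTT", "ACGTACGAAA", "ACGTACGTTA"]
print(most_conserved_window(toy, 7))  # (0, 1.0)
```

Averaging over all species pairs is one simple choice. A stricter alternative would require a minimum identity for every pair, so that no single ortholog falls below the threshold.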


Fig. 11.2 Scheme of the Molluscan Eye Array design. Eyeballs of five species (Loligo, Idiosepius, octopus, nautilus, and Pecten) were extracted and used to construct cDNA libraries

We ran Molluscan Eye Array experiments using RNA samples from the adult eyes of Idiosepius, nautilus, and Pecten. From these, we could estimate which genes are differentially expressed among species and may have played an important role in the diversification of eye structure. More than 88% of the probes designed from the species being tested were detected as expressed genes. In addition, 10–30% of the probes designed from other species, representing previously unknown transcripts, were detected with RNA samples from a different species (Fig. 11.3a). To validate the reliability of the interspecies array, we examined Pax6 expression in Idiosepius with a probe designed from the zebrafish Pax6 gene and confirmed its expression by in situ hybridization. Next, to distinguish stage-specific from camera eye-specific expression of eye genes in cephalopods, we hybridized the array with RNA from the pygmy squid eye at three different embryonic stages. We found that 2,893 genes are expressed in the squid embryonic eye but not in the eyes of nautilus or Pecten. Only 269 of these 2,893 genes (9.3%) showed adult-specific expression in Idiosepius. In addition, 634 of the 2,893 genes (21.9%) are also found in gene expression databases of the vertebrate eye and retina (Fig. 11.3b). These results show that this approach provides an efficient platform and database for identifying candidate genes involved in the acquisition of the camera eye. Furthermore, we examined the expression diversity of eye-related genes in molluscs by calculating the proportion of genes whose expression was shared among species. Statistically, Pecten shows lower gene expression diversity than squid and nautilus (Fig. 11.3c). This result suggests that, since the last common ancestor of the Mollusca, Pecten has tended to retain commonly used genes and to acquire fewer novel genes than other molluscs. Cephalopods, on the other hand, tended to acquire lineage-specific genes in relation to the evolution of the camera eye structure, which makes their expression diversity higher than that of Pecten. Thus, evolutionary genomic and transcriptomic approaches might help elucidate the mechanisms of animal eye diversification by identifying common and unique genes involved in the developmental processes and structures of eyes.

Fig. 11.3 Characteristics of eye gene expression in molluscs. (a) Proportions of probes designed from Idiosepius, nautilus, and Pecten that were detected as expressed genes in the three Molluscan Eye Array experiments (Idiosepius mRNA, nautilus mRNA, and Pecten mRNA). (b) Squid camera eye-specific genes, estimated by comparing mRNA expression on the Molluscan Eye Array. (c) Species-specific expression denotes gene expression found exclusively in one species; conserved expression denotes gene expression observed in more than one species
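The species-specific versus conserved classification behind Fig. 11.3c can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions. Each species is represented by the set of ortholog groups detected as expressed on the array; how detection calls and ortholog groups were actually defined is not described here. A gene then counts as species-specific when only one species expresses it, and as conserved otherwise. The function name and the toy data are hypothetical, and the statistical test used in the chapter is not reproduced.

```python
from collections import Counter
from typing import Dict, Set


def expression_categories(expressed: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Count species-specific and conserved expressed genes for each species.

    `expressed` maps a species name to the set of ortholog groups detected as
    expressed in that species' eye.
    """
    # Number of species in which each ortholog group is expressed.
    occurrences = Counter(g for genes in expressed.values() for g in genes)
    result = {}
    for species, genes in expressed.items():
        specific = sum(1 for g in genes if occurrences[g] == 1)
        result[species] = {"species_specific": specific,
                           "conserved": len(genes) - specific}
    return result


# Hypothetical toy data: ortholog-group IDs, not real array results.
toy = {
    "Idiosepius": {"og1", "og2", "og3", "og4", "og5"},
    "Nautilus": {"og1", "og2", "og6"},
    "Pecten": {"og1", "og2", "og3"},
}
for species, counts in expression_categories(toy).items():
    print(species, counts)
```

A per-species diversity measure could then be taken, for example, as the species-specific share of all expressed genes. That choice is an assumption of this sketch.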

References

Baker RH et al (2009) Genomic analysis of a sexually-selected character: EST sequencing and microarray analysis of eye-antennal imaginal discs in the stalk-eyed fly Teleopsis dalmanni (Diopsidae). BMC Genomics 10(1):361
Bao R, Friedrich M (2009) Molecular evolution of the Drosophila retinome: exceptional gene gain in the higher Diptera. Mol Biol Evol 26(6):1273–1287
Choy KW et al (2006) Genomic annotation of 15,809 ESTs identified from pooled early gestation human eyes. Physiol Genomics 25(1):9–15
Darwin C (1859) On the Origin of Species by Means of Natural Selection. John Murray, London
Fernald RD (2000) Evolution of eyes. Curr Opin Neurobiol 10(4):444–450
Gehring WJ (1996) The master control gene for morphogenesis and evolution of the eye. Genes Cells 1:11–15
Gehring WJ (2005) New perspectives on eye development and the evolution of eyes and photoreceptors. J Hered 96(3):171–184
Gehring WJ, Ikeo K (1999) Pax 6: mastering eye morphogenesis and eye evolution. Trends Genet 15(9):371–377
Kozmik Z et al (2003) Role of Pax genes in eye evolution: a cnidarian PaxB gene uniting Pax2 and Pax6 functions. Dev Cell 5(5):773–785
Kozmik Z et al (2008) Assembly of the cnidarian camera-type eye from vertebrate-like components. Proc Natl Acad Sci USA 105(26):8989–8993
Nilsson DE (2004) Eye evolution: a question of genetic promiscuity. Curr Opin Neurobiol 14(4):407–414
Ogura A et al (2004) Comparative analysis of gene expression for convergent evolution of camera eye between octopus and human. Genome Res 14(8):1555–1561
Salvini-Plawen LV, Mayr E (1977) On the evolution of photoreceptors and eyes. Evol Biol 10:207–263
Serb JM, Eernisse DJ (2008) Charting evolution’s trajectory: using molluscan eye diversity to understand parallel and convergent evolution. Evol Educ Outreach 1(4):439–447
Su KF et al (2007) Convergent evolution of eye ultrastructure and divergent evolution of vision-mediated predatory behaviour in jumping spiders. J Evol Biol 20(4):1478–1489
Wistow G (2006) The NEIBank project for ocular genomics: data-mining gene expression in human and rodent eye tissues. Prog Retin Eye Res 25(1):43–77

Chapter 12

Do Long and Highly Conserved Noncoding Sequences in Vertebrates Have Biological Functions?

Yoichi Gondo

Abstract Only a small fraction of vertebrate genomes consists of protein-coding sequences; the vast majority consists of repetitive and nonrepetitive noncoding sequences. With the completion of whole-genome sequencing of humans and other species, it has become possible to characterize genomic structure directly at the DNA sequence level. As a first approximation, the functional portion of the genome is taken to be the highly conserved portion. On that basis, comparative genomics combined with bioinformatic and experimental tools is now revealing the details of each element in the genome. In this chapter, recent efforts to extract highly conserved sequences are reviewed, with a particular focus on the noncoding and nonrepetitive portions of the human and rodent genomes. Strikingly, the highly conserved noncoding sequences extracted in this way are more conserved across many vertebrate genomes than functional protein-coding sequences are, although they are not conserved in invertebrates. Some testable working hypotheses for how such highly conserved sequences are maintained are also reviewed and discussed.

Abbreviations

LINE    Long interspersed elements
SINE    Short interspersed elements
UTR     Untranslated region
SNP     Single nucleotide polymorphism
CNG     Conserved non-genic sequence
UCE     Ultraconserved element
POLA    DNA polymerase alpha catalytic subunit gene
LCNS    Long conserved noncoding sequence

Y. Gondo Mutagenesis and Genomics Team, RIKEN BioResource Center, 3-1-1 Koyadai, Tsukuba 305-0074, Japan e-mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_12, © Springer-Verlag Berlin Heidelberg 2010


HNRNPD  Heterogeneous nuclear ribonucleoprotein D
HNRPDL  Heterogeneous nuclear ribonucleoprotein D-like
KO      Knockout

12.1

Introduction

Most higher eukaryotes contain noncoding sequences in their genomes. Classically, DNA reassociation kinetics based on the self-hybridization of fragmented genomic DNA, called Cot curve analysis, revealed experimentally that significant portions of the genomes of higher eukaryotes consist of various types of repetitive sequences (e.g., Britten and Kohne 1968; Wetmur and Davidson 1968). The amount of gene-coding sequence was also estimated by various methods, including RNA–DNA reassociation kinetics, or Rot curve analysis. For instance, Chikaraishi et al. (1978) studied the complexity of RNA expression by RNA–DNA association kinetics. They found that 31.2% of the unique (single-copy) fraction of rat genomic DNA was represented in nuclear RNA of the rat brain, the highest RNA complexity among the rat tissues tested. Chikaraishi et al. (1978) combined this with two further figures: the average length of rat nuclear RNA, 4,500 nucleotides (Bantle and Hahn 1976), and the finding that two-thirds (1.9 Gb) of the rat genome consists of unique sequences. From these, they estimated the total number of rat genes to be 130,000. Based on spontaneous mutagenesis studies of viability polygenes in Drosophila melanogaster, Mukai (1978) suggested that most of the functional mutations affecting viability polygenes occur in noncoding sequences. He and others estimated the spontaneous mutation rate of viability polygenes on the second chromosome of D. melanogaster to be at least 0.14 per generation (Mukai 1964; Mukai et al. 1972; Ohnishi 1977). Mukai (1978) estimated the number of protein-coding genes on the second chromosome to be 2,200, based on the “one-band one-gene hypothesis” (Judd et al. 1972), and the average spontaneous mutation rate per protein-coding gene per generation to be 10⁻⁵ or less. From these, he calculated the total mutation rate of the protein-coding genes on the second chromosome to be 0.022 per generation (2,200 × 10⁻⁵). This could explain at most 16% (= 0.022/0.14) of the mutations of viability polygenes. Based on these considerations, Mukai (1978) concluded that most (>84%) mutations affecting viability polygenes occur in noncoding sequences. Since the completion of the human genome project (International Human Genome Sequencing Consortium 2001), it has become possible to identify genomic structure directly at the level of the DNA sequence. Bioinformatic tools and software have been developed not only to detect repetitive sequences but also to identify and predict known and unknown protein-coding sequences in whole-genome DNA sequences. Functional genomicists are now working to reveal the biological functions of protein-coding as well as noncoding sequences.
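The two back-of-the-envelope estimates above can be reproduced in a few lines of Python. The chapter does not spell out the formula used by Chikaraishi et al. This sketch assumes the gene number is simply the transcribed part of the unique DNA divided by the mean nuclear RNA length, which does reproduce their figure of about 130,000. For Mukai's estimate, the per-gene rate is taken as 10⁻⁵ per generation, the value implied by the stated total of 0.022 per generation for 2,200 genes. The function names are illustrative only.

```python
def transcript_number(unique_bp: float, fraction_transcribed: float,
                      mean_rna_length: float) -> float:
    """Distinct transcripts implied by RNA-DNA hybridization (assumed formula)."""
    return unique_bp * fraction_transcribed / mean_rna_length


def coding_share(n_genes: int, rate_per_gene: float, total_rate: float) -> float:
    """Fraction of viability-polygene mutations attributable to coding genes."""
    return n_genes * rate_per_gene / total_rate


# Rat brain nuclear RNA: 31.2% of 1.9 Gb unique DNA, mean RNA length 4,500 nt.
rat_genes = transcript_number(1.9e9, 0.312, 4500)
# D. melanogaster chromosome 2: 2,200 genes x 1e-5 per gene vs 0.14 in total.
share = coding_share(2200, 1e-5, 0.14)

print(f"rat genes ~ {rat_genes:,.0f}")          # rat genes ~ 131,733
print(f"coding share <= {share:.1%}")          # coding share <= 15.7%
print(f"noncoding share >= {1 - share:.1%}")   # noncoding share >= 84.3%
```

The two printed shares correspond to the "at most 16%" and ">84%" figures in the text.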

12.2

Genomic Structure of Human and Mammalian Genome

The initial analysis by the International Human Genome Sequencing Consortium (2001) examined four classes of interspersed repetitive sequences: long interspersed elements (LINEs), short interspersed elements (SINEs), retrovirus-like elements, and DNA transposon fossils. These occupied 21, 13, 8, and 3% of the human genomic DNA sequence, respectively. In short, interspersed repetitive sequences comprise 45% of the human genome. Taking into account chromosomal duplications (3.6%) and other repetitive elements, repetitive sequences were expected to occupy at least 50% of the human genome. The total number of protein-coding genes in the draft human genome sequence was estimated to be 30,000–35,000, with an average coding length of 1,400 bp (International Human Genome Sequencing Consortium 2001). After completing the euchromatic sequencing of the human genome, the International Human Genome Sequencing Consortium (2004) reported the size of the human genome to be 3.08 Gb, with 2.85 Gb of finished sequence and an estimated 0.23 Gb of gaps. Of these gaps, 28 Mb were estimated to be euchromatic, so the total euchromatic human genome was 2.88 Gb. Coding sequences were estimated to make up 1.2% of the euchromatic genome, or 34 Mb in total. With 25,000 protein-coding genes in the human genome, the average length of a protein-coding sequence was expected to be about 1,400 bp. Subsequent whole-genome sequencing of the mouse and other mammals showed that mammalian genomes have a structure similar to that of the human genome (Fig. 12.1) (e.g., International Mouse Sequencing Consortium 2002). As a rough approximation, coding sequences make up 1.2–1.5% of the 3 Gb mammalian genome and encode approximately 25,000–30,000 genes. The euchromatic mouse genome, however, was estimated to be significantly smaller, at 2.5 Gb (International Mouse Sequencing Consortium 2002).

Fig. 12.1 Composition of the human genome deduced from analysis of the whole genomic DNA sequence. The International Human Genome Sequencing Consortium (2001) showed that protein-coding sequences (C) make up only 1.2–1.5% of the genome. Roughly half of the genomic sequence consists of various classes of repetitive sequences: R1, LINEs; R2, SINEs; R3, retrovirus-like elements; R4, DNA transposon fossils; and R5, other repetitive elements. The other half of the genome consists of noncoding, nonrepetitive sequences (N)
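The genome-composition figures quoted in this section are mutually consistent, as a quick Python check shows. The numbers are those given in the text and the variable names are illustrative. The check confirms only the arithmetic, not the underlying estimates.

```python
# Interspersed repeats in the human genome, % (IHGSC 2001).
interspersed = {"LINE": 21, "SINE": 13, "retrovirus-like": 8, "DNA transposon": 3}
total_interspersed = sum(interspersed.values())

# Euchromatic genome size (IHGSC 2004): finished sequence + euchromatic gaps.
finished_gb = 2.85
euchromatic_gaps_gb = 0.028   # 28 Mb of the 0.23 Gb of gaps
euchromatic_gb = finished_gb + euchromatic_gaps_gb

coding_mb = 0.012 * euchromatic_gb * 1000   # 1.2% of the euchromatic genome, in Mb
mean_cds_bp = 34e6 / 25_000                 # 34 Mb shared among 25,000 genes

print(total_interspersed)          # 45
print(round(euchromatic_gb, 2))    # 2.88
print(round(coding_mb, 1))         # 34.5
print(mean_cds_bp)                 # 1360.0
```

The 34.5 Mb and 1,360 bp results agree with the 34 Mb and roughly 1,400 bp given in the text.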

12.3

Noncoding Sequences in Mammalian Genome

Most (98–99%) of the mammalian genomic sequence is, therefore, noncoding. Repetitive and nonrepetitive sequences each occupy roughly half of the mammalian genome. The biological functions of both repetitive and nonrepetitive noncoding sequences in the genome remain to be elucidated. An alternative view, the “junk DNA” hypothesis, holds that most noncoding DNA may have no biological function (e.g., Nowak 1994). Nonfunctional genomic DNA sequences are expected to show a high rate of base substitution over the course of evolution. This is because they lack the evolutionary constraints that would eliminate detrimental base substitutions. For instance, consider noncoding genomic sequences such as pseudogenes, introns, untranslated regions (UTRs) of mRNA, and intergenic sequences, which are assumed to be less functional than protein-coding sequences. These usually contain more single nucleotide polymorphisms (SNPs) within species, and show higher divergence between homologs of different species, than protein-coding sequences do. Conversely, the degree of homology detected by aligning syntenic sequences between different species has been used to find protein-coding sequences and functional regulatory elements (reviewed by O’Brien et al. 1999). For instance, sequences that are more than 80% similar between human and rodents are empirically recognized as good candidates for protein-coding sequences and/or functionally constrained parts of the genome. The overall similarity between the human and mouse genomes was estimated to be 66.7% (International Mouse Sequencing Consortium 2002).
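Flagging candidate functional regions by cross-species similarity, as described above, can be sketched as a sliding-window scan over a pairwise alignment. This is a simplified illustration, not the pipeline used in any of the studies cited in this chapter. It assumes a gap-free human–mouse alignment of two equal-length strings. Positions masked as `N` (for example, repeats or coding sequence) are treated as mismatches, and overlapping windows that pass the threshold are merged into regions. The window length and the 80% cut-off are parameters, so other criteria (such as 100 bp windows at 70% identity) can be plugged in the same way. All function names are hypothetical.

```python
from typing import Iterator, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of positions where a and b match; masked 'N' never counts."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "N") / len(a)


def conserved_windows(seq1: str, seq2: str, window: int = 100,
                      min_identity: float = 0.8) -> Iterator[Tuple[int, float]]:
    """Yield (start, identity) for every window meeting the identity cut-off."""
    if len(seq1) != len(seq2):
        raise ValueError("expected a gap-free alignment of equal length")
    for start in range(len(seq1) - window + 1):
        pid = identity(seq1[start:start + window], seq2[start:start + window])
        if pid >= min_identity:
            yield start, pid


def merge_windows(starts: List[int], window: int) -> List[Tuple[int, int]]:
    """Merge overlapping passing windows into half-open (start, end) regions."""
    regions: List[List[int]] = []
    for start in sorted(starts):
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [(s, e) for s, e in regions]


# Hypothetical toy alignment: a conserved block followed by a divergent one.
human = "A" * 10 + "C" * 10
mouse = "A" * 10 + "G" * 10
hits = list(conserved_windows(human, mouse, window=10, min_identity=0.8))
print([s for s, _ in hits])                      # [0, 1, 2]
print(merge_windows([s for s, _ in hits], 10))   # [(0, 12)]
```

Whole-genome studies rely on alignment tools that handle gaps and rearrangements. The fixed-window scan is used here only because it makes the identity threshold explicit.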

12.4

Conserved Noncoding Sequences

The human genome project and the subsequent whole-genome sequencing of various species have also allowed more precise alignment and comparison of genomic DNA sequences. As a result, conserved sequences have been identified not only in protein-coding sequences but also in noncoding fractions of the genome. Large-scale searches for such highly conserved noncoding sequences between vertebrate species first became possible when the whole mouse genome sequence was released. The initial publication of the mouse genome sequence (Mouse Genome Sequencing Consortium 2002) described a comparison of the whole genomic sequences of human and mouse: approximately 5% of the 50–100 bp windows in the human genome were conserved in the mouse genome. Since protein-coding sequences comprise at most 1.5% of the genome, the noncoding portion of the identified conserved sequences amounts to 3.5% or more of the human genome. In the same issue of Nature, Dermitzakis et al. (2002) reported a more extensive search focusing on human chromosome 21. After masking repetitive sequences, they searched for sequences of 100 bp with 70% identity between human chromosome 21 and the syntenic mouse sequences. By further eliminating known coding sequences, Dermitzakis et al. (2002) finally obtained 2,262 conserved nongenic sequences (CNGs). They further analyzed 220 CNGs in 20 mammalian species and found that CNGs are more conserved than protein-coding sequences and noncoding RNAs (Dermitzakis et al. 2003). Indeed, approximately 80% similarity in protein-coding genomic sequences can be enough to maintain 100% identity of the encoded amino acid sequence. Because of the degeneracy of the genetic code, transitions at the third codon position do not change the encoded amino acid; these are substitutions between purines (A<>G) or between pyrimidines (T<>C). The only exceptions are two codon pairs: ATA/ATG (Ile/Met) and TGA/TGG (stop/Trp). Thus, for two sequences to remain more than 90% identical over a substantial stretch, some mechanism(s) other than the maintenance of protein function must preserve or create such long conserved sequences.
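The degeneracy argument above can be checked directly against the standard genetic code, taking the substitutions in question to be transitions at the third codon position. The Python sketch below encodes NCBI translation table 1 as a 64-letter string, with codons ordered TTT, TTC, TTA, TTG, TCT, and so on. It applies every third-position transition and reports the codon pairs whose translation changes. The stop signal `*` is treated as a symbol of its own, which is a choice made here for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Transitions: purine <-> purine, pyrimidine <-> pyrimidine.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def nonsynonymous_third_position_transitions() -> set:
    """Codon pairs whose meaning changes under a third-position transition."""
    pairs = set()
    for codon, aa in CODE.items():
        partner = codon[:2] + TRANSITION[codon[2]]
        if CODE[partner] != aa:
            pairs.add(tuple(sorted((codon, partner))))
    return pairs


changed = nonsynonymous_third_position_transitions()
print(sorted(changed))  # [('ATA', 'ATG'), ('TGA', 'TGG')]
print(f"{64 - 2 * len(changed)} of 64 codons are unaffected")  # 60 of 64 codons are unaffected
```

Transversions at the third position and substitutions at the first two positions are far more often nonsynonymous. That is why the argument above is restricted to transitions.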

12.4.1

Ultraconserved Elements

Bejerano et al. (2004) extended the extraction of such highly conserved sequences to the whole genomes of human, mouse, and rat, using a more stringent criterion: at least 200 bp with 100% identity. They found 481 ultraconserved elements (UCEs). Most UCEs are also conserved in many other vertebrate species. For instance, 477, 467, and 324 UCEs showed average identities of 99.2, 95.7, and 76.8% in the dog, chicken, and fugu genomes, respectively (Bejerano et al. 2004). The distribution of SNPs was also analyzed within human and chimpanzee populations: compared with genome-wide SNP frequencies, Bejerano et al. (2004) found 20-fold fewer SNPs in the UCEs of both genomes. Bejerano et al. (2004) also analyzed the genomic neighborhood of the 481 UCEs. Of these, 111 were exonic and 256 were nonexonic; the remaining 114 could not be classified unambiguously by this definition. The nonexonic UCEs were further divided into 100 intronic and 156 intergenic UCEs. Thus, at least 211 UCEs (111 exonic plus 100 intronic) are clearly transcribed. Nonexonic UCEs tended to cluster in gene deserts. Further analysis of UCE locations suggested that exonic UCEs tend to lie close to known genes that regulate RNA. Intergenic UCEs, by contrast, tend to flank genes involved in the regulation of transcription, DNA binding, and development. Intronic UCEs were also often found in development-related genes. The longest UCEs (779, 770, and 731 bp) were clustered in introns of the DNA polymerase alpha catalytic subunit gene (POLA) on the X chromosome. In particular, the 779 bp UCE was adjacent to another 275 bp UCE, comprising a total of 1,046 bp of highly conserved sequence (Bejerano et al. 2004).
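The UCE criterion (a run of at least 200 bp that is identical in human, mouse, and rat) can be illustrated on a precomputed multiple alignment. The Python sketch below is a simplification, not Bejerano et al.'s actual pipeline, which worked on whole-genome alignments. It assumes equal-length aligned strings and treats gap (`-`) or masked (`N`) columns as breaking a run. The function name and toy alignment are hypothetical.

```python
from typing import List, Optional, Tuple


def ultraconserved_runs(aligned: List[str], min_length: int = 200) -> List[Tuple[int, int]]:
    """Half-open (start, end) intervals where every aligned sequence is
    identical for at least `min_length` consecutive columns."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    runs: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i in range(length + 1):  # the extra step closes a run at the end
        identical = (i < length and aligned[0][i] not in "-N"
                     and all(s[i] == aligned[0][i] for s in aligned))
        if identical:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                runs.append((run_start, i))
            run_start = None
    return runs


# Hypothetical toy alignment (human, mouse, rat) with one mismatch at column 5.
toy = ["AAAAACGGG", "AAAAATGGG", "AAAAACGGG"]
print(ultraconserved_runs(toy, min_length=3))  # [(0, 5), (6, 9)]
print(ultraconserved_runs(toy, min_length=4))  # [(0, 5)]
```

With the default `min_length=200`, the same function applies the 200 bp, 100% identity criterion described above.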


12.4.2

Long Conserved Noncoding Sequences

12.4.2.1

Discovery of LCNS

Just after the completion of the whole mouse genome sequence (Mouse Genome Sequencing Consortium 2002), we independently began a genome-wide extraction of highly conserved noncoding sequences between human and mouse (Sakuraba et al. 2008). We first masked not only repetitive sequences but also all protein-coding sequences in the human and mouse genomes, thereby retaining only the noncoding and nonrepetitive portion of each genome. Highly homologous sequences at least 500 bp long with at least 95% identity were then extracted by BLAST searches between the human and mouse nonrepetitive noncoding sequences. Because the human and mouse genomic sequence databases have been updated many times, we repeated the extraction of highly conserved noncoding/nonrepetitive sequences three times during 2002–2007 (Sakuraba et al. 2008). A total of 611 long conserved noncoding sequences (LCNS) were found. Although we did not consider synteny when extracting the LCNS, the LCNS pairs nevertheless turned out to be syntenic (Sakuraba et al. 2008).

12.4.2.2

Similarity among 611 LCNS

Despite the repeat masking, minor duplications may exist among the extracted LCNS. We therefore ran a BLAST search of each LCNS against all the others and found that six LCNS pairs exhibited some similarity (Sakuraba et al. 2008). The result is summarized in Table 12.1. Four pairs (LCNS242 and 352, LCNS259 and 580, LCNS269 and 610, and LCNS289 and 340) had 85.0–92.2% similarity, but only over rather short stretches of 84–294 bp, and the two members of each of these four pairs were located on different chromosomes. LCNS501 and 503 were also similar (90.5%) over a short stretch (63 bp), but they were located very close to each other on the same chromosome in human (chromosome 4) as well as in mouse (chromosome 5). LCNS501 and 503 were found in introns of the heterogeneous nuclear ribonucleoprotein D (HNRNPD) and heterogeneous nuclear ribonucleoprotein D-like (HNRPDL) genes, respectively. Thus, LCNS501 and 503 seem to be part of the intrachromosomal duplication that involved the HNRNPD and HNRPDL genes. Since the same structure is present in the mouse (the Hnrnpd and Hnrpdl genes on mouse chromosome 5), the duplication that produced these paralogs seems to have occurred before the human and mouse lineages diverged. Mutations may then have accumulated, reducing the similarity between the paralogous copies to 90.5% over 63 bp. In contrast, the orthologs are extremely similar (95.7% and 97.1%) over very long stretches of 580 and 686 bp for LCNS501 and 503, respectively (Table 12.1), so each was extracted as an LCNS. This large difference in similarity between orthologs and paralogs might be explained as follows. The intrachromosomal duplication around the ancestral HNRNPD gene occurred long before the common ancestor of human and mouse appeared, and the paralogous sequences accumulated many mutations. On the other hand,

Table 12.1 Six pairs of LCNS with sequence similarity

LCNS  Conservation         Similarity           Human (hg18)                         Mouse (mm9)
I.D.  Length (bp)  % Id.   Length (bp)  % Id.   Chr.   Start        End              Chr.   Start        End
242   504          98.6    133          85.0    chr8   37,357,379   37,357,882       chr8   27,787,978   27,788,481
352   719          95.3                         chr10  77,076,129   77,076,843       chr14  23,017,453   23,018,171
259   744          97.8    294          92.2    chr16  53,780,778   53,781,520       chr8   95,069,373   95,070,116
580   767          98.0                         chr5   3,565,400    3,566,166        chr13  72,166,145   72,166,910
269   596          99.7    84           90.5    chr3   138,466,129  138,466,724      chr9   100,171,994  100,172,686
610   835          95.4                         chr13  94,416,647   94,417,478       chr14  118,834,842  118,835,707
289   541          95.4    104          90.5    chr2   58,711,520   58,712,060       chr11  25,908,519   25,909,051
340   788          95.3                         chr14  96,500,728   96,501,514       chr12  107,367,078  107,367,862
501   580          95.7    63           90.5    chr4   83,494,858   83,495,432       chr5   100,390,903  100,391,482
503   686          97.1                         chr4   83,565,412   83,566,096       chr5   100,464,327  100,465,010
522   1,023        95.9    1,019        90.5    chrX   41,093,072   41,094,094       chr6   102,886,532  102,887,550
654   1,061        97.2                         chrX   41,093,034   41,094,094       chrX   12,869,761   12,870,815

Six pairs out of 611 LCNS had some sequence similarity and are shown as consecutive pairs of rows; Similarity gives the length and identity of the alignment between the two members of each pair.

12 Do Long and Highly Conserved Noncoding Sequences in Vertebrates 193


few mutations accumulated in these syntenic regions after the human and mouse lineages separated, so LCNS501 and 503 remained highly conserved. This, however, does not explain why only LCNS501 and 503 were conserved whereas the flanking syntenic regions diverged between human and mouse. It may reflect evolutionary constraint on highly functional sequences in LCNS501 and 503, as discussed further in Sect. 12.5. LCNS522 and 654 form another peculiar pair. Their similarity extends over a much longer stretch (1,019 bp with 90.5% similarity) than in the other five LCNS pairs (Table 12.1). LCNS522 and 654 are located on different chromosomes in the mouse (chromosomes 6 and X) but correspond to almost the same stretch of the human X chromosome. In other words, LCNS522 and 654 are a single sequence in human but were duplicated interchromosomally in the mouse genome. Thus, strictly speaking, human and mouse have 610 and 611 LCNS, respectively.
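Because the paralog and ortholog identities above are measured over very different lengths, it can help to convert them into approximate mismatch counts and per-site divergence. The sketch below is a back-of-the-envelope calculation using the values in Table 12.1; it assumes that both the "similarity" and "identity" columns mean the percentage of identical aligned columns and ignores gaps. The function name is hypothetical.

```python
def divergence_summary(length_bp: int, percent_identity: float) -> tuple[int, float]:
    """Approximate number of mismatched columns, and mismatches per 100 bp,
    implied by a percent identity over an alignment (gaps ignored)."""
    mismatch_fraction = (100.0 - percent_identity) / 100.0
    return round(length_bp * mismatch_fraction), round(100 * mismatch_fraction, 1)


# Length (bp) and percent identity taken from Table 12.1.
COMPARISONS = {
    "LCNS501 vs LCNS503 (paralogs)": (63, 90.5),
    "LCNS501 human vs mouse (orthologs)": (580, 95.7),
    "LCNS503 human vs mouse (orthologs)": (686, 97.1),
}

for label, (length_bp, identity) in COMPARISONS.items():
    mismatches, per_100 = divergence_summary(length_bp, identity)
    print(f"{label}: ~{mismatches} mismatches, {per_100} per 100 bp")
```

On this rough reading the paralogous copies differ at about 9.5 sites per 100 bp, versus about 2.9 to 4.3 per 100 bp between the human and mouse orthologs, which is the contrast the argument above relies on.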

12.4.2.3

Comparison of LCNS with UCE

We compared the 611 LCNS with the 481 UCEs (Bejerano et al. 2004), because the two independent studies extracted similar numbers of conserved elements despite using different extraction criteria. The result is summarized in Fig. 12.2. A total of 138 LCNS (23%) and 150 UCEs (31%) overlap. LCNS are longer than UCEs by definition, and 12 LCNS in fact encompassed two different UCEs. The remaining 473 LCNS, which do not overlap any UCE, were therefore newly identified (Sakuraba et al. 2008). Bejerano et al. (2004) extracted the 481 UCEs from whole-genome comparisons that included protein-coding as well as repetitive sequences; consequently, 69 and 9 UCEs overlap protein-coding and repetitive sequences, respectively (Fig. 12.2), and these are necessarily absent from the 611 LCNS. In addition, the 138 LCNS that contained one or two UCEs also had extra sequences that did not overlap any UCE; these nonoverlapping portions are also newly identified highly conserved noncoding/nonrepetitive sequences. It may be noteworthy that no UCEs were identified on human chromosome 21 (Bejerano et al. 2004), whereas we found three LCNS on human chromosome 21, in the region syntenic to mouse chromosome 16 (see Supplementary Table 1 of Sakuraba et al. 2008).

Fig. 12.2 Sequence comparison between 611 LCNS and 481 UCE. In addition to the 481 UCE, 473 additional LCNS were discovered as highly conserved elements in vertebrates. Non-UCE-overlapping sequences in the 138 LCNS that contain part of a UCE are also newly identified highly conserved sequences

[Fig. 12.2 diagram: 611 LCNS and 481 UCE; 473 LCNS only; 138 LCNS / 150 UCE overlapping; 253 other UCE; 69 UCE overlapping protein-coding and 9 UCE overlapping repetitive sequences]
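The counts in Fig. 12.2 are internally consistent, which a few lines of arithmetic make explicit. This sketch only re-derives numbers already stated; the assumption (not stated in the text) is that every overlapping LCNS other than the 12 "two-UCE" LCNS contains exactly one UCE. Constant names are hypothetical.

```python
LCNS_TOTAL = 611
UCE_TOTAL = 481
LCNS_OVERLAPPING_UCE = 138  # LCNS containing part of at least one UCE
UCE_OVERLAPPING_LCNS = 150  # UCEs overlapped by an LCNS
LCNS_WITH_TWO_UCE = 12      # LCNS that encompass two different UCEs
UCE_IN_CODING = 69          # UCEs overlapping protein-coding sequence
UCE_IN_REPEATS = 9          # UCEs overlapping repetitive sequence

lcns_only = LCNS_TOTAL - LCNS_OVERLAPPING_UCE
uce_other = UCE_TOTAL - UCE_OVERLAPPING_LCNS - UCE_IN_CODING - UCE_IN_REPEATS
# Assumes each of the other overlapping LCNS contains exactly one UCE.
uce_implied = (LCNS_OVERLAPPING_UCE - LCNS_WITH_TWO_UCE) + 2 * LCNS_WITH_TWO_UCE

print(lcns_only)    # 473
print(uce_other)    # 253
print(uce_implied)  # 150
print(f"{100 * LCNS_OVERLAPPING_UCE / LCNS_TOTAL:.0f}% of LCNS, "
      f"{100 * UCE_OVERLAPPING_LCNS / UCE_TOTAL:.0f}% of UCEs")  # 23% of LCNS, 31% of UCEs
```

That the implied count reproduces the 150 overlapping UCEs is a useful sanity check on the figure, although it does not by itself prove the one-UCE assumption.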

12.4.2.4

Length, Identity, and Location of LCNS

Some characteristics of LCNS length and identity are summarized in Tables 12.2 and 12.3. As expected from the extraction criteria, LCNS were much longer than UCEs. Table 12.2 lists the 20 longest and 19 shortest LCNS; both groups had a mean identity of 96.1%. The longest, LCNS146, was 1,865 bp with 95.1% identity, barely satisfying the identity criterion. The second longest, LCNS572, however, exhibited 98.0% identity over 1,768 bp. Forty-five LCNS were longer than 1,000 bp. The mean and median lengths of the 611 LCNS were 685 and 636 bp, respectively. Table 12.3 lists the 20 most and 20 least similar LCNS, whose mean lengths were 617 and 627 bp, respectively. The mean and median identities of the 611 LCNS were 95.6 and 96.2%, respectively. As Bejerano et al. (2004) did for UCEs, we classified the location of each LCNS as UTR, intronic, or intergenic (Table 12.4 and Sakuraba et al. 2008). Because only protein-coding exons were masked and UTRs were retained, 22 LCNS (3.6%) were found in 5′ or 3′ UTRs. A large fraction (41.1%) were located in introns, and more than half (55.3%) were in intergenic regions, often far from the nearest gene; 279 of these intergenic LCNS were more than 10 kb from the closest gene. Despite the intrinsic differences in length and identity, the overall distribution of locations was quite similar between LCNS and UCEs. Like UCEs, LCNS were also clustered (see Fig. 1 of Sakuraba et al. 2008). As noted above, LCNS were found on all chromosomes except the Y chromosome, including human chromosome 21. Their distribution was uneven. For instance, human chromosome 2 carries more than twice the average number of LCNS, and mouse chromosome 2, which shares large syntenic regions with human chromosome 2, was also enriched 2- to 3-fold. In contrast, LCNS were strongly underrepresented on human chromosomes 12, 21, 22, and Y and on mouse chromosomes 10, 15, 16, and Y (Sakuraba et al. 2008).
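The location percentages quoted above follow directly from the counts in Table 12.4. The sketch below recomputes the subtotals and the number of intergenic LCNS more than 10 kb from a gene. It assumes, as Table 12.4 suggests, that the ">10 kb" and ">100 kb" intergenic categories are disjoint; the dictionary layout and names are illustrative only.

```python
# Counts per location category, taken from Table 12.4.
LOCATION_COUNTS = {
    "UTR": {"5' UTR": 6, "3' UTR": 16},
    "Intron": {">100 kb": 3, ">10 kb": 119, "within 10 kb": 129},
    "Intergenic": {">100 kb": 147, ">10 kb": 132, "within 10 kb": 59},
}


def summarize(counts):
    """Return the grand total and (location, subtotal, percent) rows."""
    total = sum(sum(sub.values()) for sub in counts.values())
    rows = [
        (location, sum(sub.values()), round(100 * sum(sub.values()) / total, 1))
        for location, sub in counts.items()
    ]
    return total, rows


total, rows = summarize(LOCATION_COUNTS)
far_intergenic = (LOCATION_COUNTS["Intergenic"][">100 kb"]
                  + LOCATION_COUNTS["Intergenic"][">10 kb"])

print(total)           # 611
for row in rows:
    print(row)         # ('UTR', 22, 3.6), then ('Intron', 251, 41.1), then ('Intergenic', 338, 55.3)
print(far_intergenic)  # 279
```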
Another key similarity between LCNS and UCEs is the degree of conservation in other species. The 611 LCNS were searched for in the genome sequence databases of dog, chicken, frog, fugu, tetraodon, and zebrafish, in which 606, 493, 397, 82, 58, and 83 LCNS were identified, respectively. Three invertebrate species (Ciona intestinalis, Ciona savignyi, and Drosophila melanogaster) were also surveyed, but no LCNS homologs were found. Interestingly, the degree of LCNS conservation in vertebrate species was more or less inversely related to evolutionary distance: both the number of LCNS identified in the six vertebrate species and their average identities were negatively correlated with evolutionary distance. The average identities of the LCNS found in the dog, chicken, frog, fugu, tetraodon, and zebrafish genomes were 95.6%, 94.1%, 91.6%, 90.8%, 90.9%, and 90.8%, respectively (Sakuraba et al. 2008).
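One way to quantify the negative correlation described above is a rank correlation between evolutionary distance and the number (or mean identity) of LCNS found. The chapter does not state which statistic, if any, was used; the sketch below is our own illustration using Spearman's rho. Distance is coded only as a rank class from the well-established branching order relative to mammals (dog, then chicken, then frog, then the three teleosts, which share one class), so no divergence times are assumed. With only six species, no p-value is attempted.

```python
import math

# Genomes surveyed, ordered from closest to most distant relative of mammals.
# DISTANCE_CLASS is a rank class only; the three teleosts share class 4.
SPECIES = ["dog", "chicken", "frog", "fugu", "tetraodon", "zebrafish"]
DISTANCE_CLASS = [1, 2, 3, 4, 4, 4]
LCNS_FOUND = [606, 493, 397, 82, 58, 83]
MEAN_IDENTITY = [95.6, 94.1, 91.6, 90.8, 90.9, 90.8]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


print(round(spearman(DISTANCE_CLASS, LCNS_FOUND), 3))
print(round(spearman(DISTANCE_CLASS, MEAN_IDENTITY), 3))
```

Both coefficients come out strongly negative (around -0.94 and -0.95), consistent with the qualitative statement in the text; the small non-monotonic wiggles among the teleosts barely affect a rank statistic.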

Table 12.2 Twenty longest and nineteen shortest LCNS I.D. Conservation Human (hg18) % Identity Chr. Start End Lengtha LCNS146 1,865 95.1 chr6 99,531,674 99,533,530 LCNS572 1,768 98.0 chr14 25,985,222 25,986,988 LCNS230 1,722 96.9 chr19 35,532,814 35,534,534 LCNS076 1,548 95.5 chr2 156,292,959 156,294,493 LCNS033 1,473 95.1 chr10 23,526,498 23,527,964 LCNS200 1,436 95.1 chr1 91,071,304 91,072,739 LCNS557 1,359 96.2 chr6 86,377,974 86,379,319 LCNS440 1,291 97.4 chrX 24,825,753 24,827,039 LCNS577 1,282 95.2 chr14 25,983,926 25,985,203 LCNS334 1,257 95.1 chr14 35,884,301 35,885,557 LCNS478 1,253 97.2 chr3 159,508,610 159,509,860 LCNS050 1,250 96.2 chr2 143,820,605 143,821,853 LCNS583 1,232 96.4 chr5 91,478,639 91,479,869 LCNS639 1,219 95.1 chr10 103,201,169 103,202,380 LCNS111 1,213 95.4 chr2 176,462,362 176,463,573 LCNS482 1,202 97.6 chr3 159,258,850 159,260,050 LCNS632 1,195 96.4 chr18 70,637,515 70,638,707 LCNS316 1,185 96.6 chr14 28,928,655 28,929,832 LCNS474 1,180 96.4 chr1 97,051,832 97,053,011 LCNS364 1,162 95.7 chr14 56,126,548 56,127,704 LCNS181 503 95.2 chr1 32,282,701 32,283,345 LCNS228 503 95.6 chr19 35,724,274 35,724,775 LCNS062 503 96.8 chr2 144,978,686 144,979,187 LCNS038 503 95.8 chr9 127,775,210 127,775,707 Chr. chr4 chr12 chr7 chr2 chr2 chr5 chr9 chrX chr12 chr12 chr3 chr2 chr13 chr19 chr2 chr3 chr18 chr12 chr3 chr14 chr4 chr7 chr2 chr2

Start 22,228,018 47,799,178 38,435,408 56,308,034 19,372,376 106,987,921 88,347,742 90,654,218 47,797,882 57,488,091 66,995,021 43,833,513 80,139,355 45,520,854 74,328,073 66,739,049 84,583,528 51,220,916 119,421,838 49,075,805 129,390,856 38,271,355 44,952,525 34,022,608

Mouse (mm9) End 22,229,881 47,800,939 38,437,128 56,309,578 19,373,845 106,989,346 88,349,092 90,655,508 47,799,156 57,489,341 66,996,272 43,834,757 80,140,585 45,522,070 74,329,285 66,740,249 84,584,719 51,222,097 119,423,010 49,076,962 129,391,399 38,271,857 44,953,237 34,023,110

Location Intergene 3′ UTR Intron Intergene Intergene Intergene 3′ UTR Intron 3′ UTR Intergene Intron Intron Intergene Intron Intergene Intergene Intron Intergene Intergene Intron Intron Intron Intron Intergene

>10 kb 10 kb

>10 kb >10 kb >10 kb >100 kb >10 kb >10 kb >10 kb >10 kb >100 kb 10 kb

>10 kb

>100 kb 10 kb >10 kb

Distanceb >10 kb


LCNS426 503 95.6 chrX 39,829,216 39,829,717 LCNS358 502 96.2 chr10 77,543,699 77,544,199 LCNS411 502 96.0 chr18 43,323,216 43,323,716 LCNS113 502 95.4 chr2 177,211,340 177,211,838 LCNS114 502 95.8 chr2 177,393,500 177,394,001 LCNS153 502 98.8 chr6 97,651,729 97,652,230 LCNS424 502 95.2 chr9 75,991,328 75,991,829 LCNS433 502 95.2 chrX 147,900,026 147,900,525 LCNS295 501 96.2 chr17 32,268,766 32,269,265 LCNS403 501 99.2 chr18 22,169,478 22,169,978 LCNS291 501 95.0 chr2 57,958,773 57,959,273 LCNS277 501 96.2 chr2 60,709,281 60,709,781 LCNS197 501 95.8 chr4 85,619,277 85,619,777 LCNS264 500 96.2 chr11 115,737,826 115,738,325 LCNS035 500 96.0 chr9 127,959,649 127,960,155 a Length is in bp b Distance from the nearby protein-coding sequence in the mouse genome

chrX chr14 chr18 chr2 chr2 chr4 chr19 chrX chr11 chr18 chr11 chr11 chr5 chr9 chr2

11,645,035 23,468,812 76,812,642 74,976,214 75,155,446 24,596,593 19,511,315 67,145,332 84,424,428 15,003,269 26,714,982 23,915,928 102,074,332 46,468,340 33,888,639

11,645,535 23,469,312 76,813,143 74,976,715 75,155,947 24,597,094 19,511,816 67,145,833 84,424,928 15,003,769 26,715,482 23,916,426 102,074,830 46,468,838 33,889,144

Intron Intron Intergene Intergene Intergene Intron Intergene Intergene Intergene Intron Intergene Intergene Intergene Intergene Intergene >10 kb >10 kb >10 kb >10 kb >100 kb

>10 kb >100 kb >10 kb >100 kb >10 kb >100 kb >10 kb >10 kb


Table 12.3 Twenty most and least similar LCNS I.D. Conservation % Identity Chr. Lengtha LCNS438 962 99.8 chrX LCNS269 596 99.7 chr3 LCNS441 819 99.5 chrX LCNS637 667 99.3 chr10 LCNS344 785 99.2 chr7 LCNS403 501 99.2 chr18 LCNS414 581 99.1 chr18 LCNS103 557 99.1 chr2 LCNS592 551 99.1 chr5 LCNS400 559 98.9 chr18 LCNS152 538 98.9 chr6 LCNS477 616 98.9 chr3 LCNS472 525 98.9 chr9 LCNS039 516 98.8 chr9 LCNS153 502 98.8 chr6 LCNS506 739 98.8 chr7 LCNS640 550 98.7 chr10 LCNS634 603 98.7 chr5 LCNS585 664 98.6 chr5 LCNS242 504 98.6 chr8 LCNS620 627 95.1 chr2 LCNS249 546 95.1 chr16 LCNS328 586 95.1 chr14 LCNS342 586 95.1 chr14 Human (hg18) Start End 24,918,245 24,919,206 138,466,129 138,466,724 24,804,732 24,805,549 102,437,335 102,438,001 20,970,118 20,970,902 22,169,478 22,169,978 43,024,590 43,025,170 174,904,641 174,905,197 77,183,641 77,184,191 20,946,991 20,947,549 97,769,812 97,770,349 181,919,488 181,920,103 134,485,104 134,485,628 127,696,600 127,697,115 97,651,729 97,652,230 114,117,479 114,118,217 102,405,068 102,405,616 139,475,193 139,475,792 81,183,117 81,183,780 37,357,379 37,357,882 44,024,538 44,025,163 49,663,946 49,664,490 33,182,370 33,182,955 98,953,026 98,953,611 Chr. chrX chr9 chrX chr19 chr12 chr18 chr18 chr2 chr13 chr18 chr4 chr3 chr2 chr2 chr4 chr6 chr19 chr18 chr13 chr8 chr17 chr8 chr12 chr12

Start 90,555,381 100,171,994 90,674,418 44,773,397 119,958,510 15,003,269 77,101,560 73,106,824 95,472,039 13,897,326 24,471,724 33,781,632 28,740,382 34,100,300 24,596,593 15,388,331 44,745,421 36,448,133 91,510,428 27,787,978 85,148,509 91,497,355 55,017,916 109,360,726

Mouse (mm9) End 90,556,342 100,172,686 90,675,236 44,774,063 119,959,293 15,003,769 77,102,140 73,107,380 95,472,588 13,897,883 24,472,261 33,782,247 28,740,906 34,100,813 24,597,094 15,389,069 44,745,970 36,448,659 91,511,091 27,788,481 85,149,135 91,497,899 55,018,500 109,361,309

Location Intron Intergene Intron Intergene Intergene Intron Intron Intergene Intergene Intron Intron Intergene Intron Intron Intron 3′ UTR Intergene Intergene Intergene Intergene Intron Intergene Intron Intron

>10 kb >10 kb

>10 kb 10 kb >10kb >100 kb

>10 kb >10 kb

>10 kb

10 kb >10 kb >10 kb

>10 kb >100 kb

>100 kb

Distanceb


LCNS361 889 95.1 chr10 78,060,662 78,061,550 LCNS140 786 95.0 chr8 59,976,155 59,976,939 LCNS350 583 95.0 chr6 1,723,057 1,723,639 LCNS280 603 95.0 chr2 60,370,645 60,371,246 LCNS381 723 95.0 chr13 99,409,271 99,409,993 LCNS406 522 95.0 chr18 35,155,674 35,156,195 LCNS072 522 95.0 chr2 147,006,391 147,006,912 LCNS442 522 95.0 chrX 71,478,279 71,478,798 LCNS067 803 95.0 chr2 146,405,657 146,406,459 LCNS057 542 95.0 chr2 144,471,545 144,472,086 LCNS432 642 95.0 chrX 147,827,152 147,827,793 LCNS291 501 95.0 chr2 57,958,773 57,959,273 LCNS088 601 95.0 chr2 163,813,314 163,813,913 LCNS549 621 95.0 chr15 65,691,994 65,692,614 LCNS621 661 95.0 chr2 44,661,079 44,661,739 LCNS255 680 95.0 chr16 50,492,105 50,492,780 a Length is in bp b Distance from the nearby protein-coding sequence in the mouse genome

chr14 chr4 chr13 chr11 chr14 chr18 chr2 chrX chr2 chr2 chrX chr11 chr2 chr9 chr17 chr8

23,913,918 6,717,506 32,062,621 24,212,663 122,852,051 27,787,891 47,158,094 99,489,220 46,521,971 44,480,335 67,071,319 26,714,982 63,413,486 63,159,161 85,694,024 92,233,969

23,914,806 6,718,290 32,063,203 24,213,264 122,852,772 27,788,411 47,158,615 99,489,740 46,522,769 44,480,871 67,071,960 26,715,482 63,414,085 63,159,778 85,694,683 92,234,648

Intergene Intron Intron Intergene Intergene Intergene Intergene Intergene Intergene Intron Intron Intergene Intergene Intron Intron Intergene >10 kb >100 kb

10 kb >10 kb >10 kb >100 kb 10 kb >10 kb >100 kb >10 kb >100 kb >10 kb >10 kb >10 kb >100 kb


Table 12.4 Summary of LCNS locations

Location     Category        Number   %        Subtotal
UTR          5′ UTR               6    1.0%     3.6%
             3′ UTR              16    2.6%
Intron       >100 kb              3    0.5%     41.1%
             >10 kb             119   19.5%
             within 10 kb       129   21.1%
Intergenic   >100 kb            147   24.1%     55.3%
             >10 kb             132   21.6%
             within 10 kb        59    9.7%
Total                           611

The locations of the 611 LCNS are classified into one of eight categories based on the distance from the nearby protein-coding sequence in the mouse genome

12.5

Working Hypotheses for Genomic Sequence Conservation

To understand the biological function(s) of highly conserved noncoding sequences, it is necessary to consider plausible mechanisms that could keep sequences conserved across many different species. Four working hypotheses that could create and/or maintain highly conserved sequences, in coding as well as noncoding regions, are discussed below. These working hypotheses are not mutually exclusive; two or more mechanisms may act together to maintain the conservation of genomic sequences among species. Of the four working hypotheses, only the first (Sect. 12.5.1) requires a significant biological function to maintain conserved sequences, whereas the other three do not necessarily require such functions to explain highly identical sequences among species.

12.5.1

Functional Constraint

As described above, the primary working hypothesis for the maintenance of LCNS is that evolutionary constraint keeps functionally important genomic DNA sequences from changing. Natural selection may protect such functional sequences from the accumulation of spontaneous and/or induced mutations. Because mutations arise at random with respect to the base-pair sequence of the genome, a mutation in a functional gene (or other functional genomic sequence) usually disrupts its normal function. This is why radiation, chemical mutagens, and other genotoxic agents are usually harmful to organisms, causing disorders such as tumors, genetic diseases, and genetic predisposition to disease in individuals. Such detrimental mutations are eliminated from natural populations by Darwinian selection. Thus, the more important the function of a genomic sequence, the higher the degree of conservation it tends to show among species, owing to evolutionary constraint. To test this hypothesis directly, in vivo assays have been conducted (Poulin et al. 2005; Pennacchio et al. 2006; Visel et al. 2008). Using transgenic mouse enhancer assays with reporter genes, these studies experimentally examined highly conserved elements for cis-regulatory enhancer activity. For instance, Pennacchio et al. (2006) tested 167 highly conserved sequences and found that 45% of them had tissue-specific cis-regulatory activity at mouse embryonic day 11.5. Furthermore, Visel et al. (2008) used the transgenic approach to compare enhancer activity between UCEs and sequences that are highly, but not 100%, conserved. They confirmed enhancer activity not only in UCEs but also in the other highly conserved sequences, suggesting that UCEs may be part of a larger family of enhancers in the genome. Derti et al. (2006) proposed another possible function of UCEs: that UCEs and/or their flanking sequences might help maintain the diploid state through dosage sensitivity, because mammalian UCEs are strongly depleted among segmental duplications and copy number variants. This hypothesis seems concordant with the fact that no UCEs were found on the Y chromosome or human chromosome 21, or in the syntenic regions of the mouse genome. The Y chromosome is the only chromosome that is never present in two copies in mammals. Human chromosome 21, whose trisomy is compatible with survival (causing Down syndrome), might be more tolerant of deviations from diploidy. We, however, found three LCNS on human chromosome 21 and in the syntenic region of the mouse (Sakuraba et al. 2008 and Sect. 12.4.2.3). Knockout (KO) mouse studies of UCEs have produced controversial findings with respect to the functional constraint hypothesis. For instance, Ahituv et al. (2007) deleted four UCEs independently and analyzed the KO mice. None of the four KO mouse strains exhibited any abnormalities, suggesting that these UCEs may be dispensable. However, McLean and Bejerano (2008) found that ultraconserved-like elements were over 300-fold less likely than neutral DNA to have been lost during rodent evolution. If UCEs were dispensable, they would be expected to have been lost from populations at rates similar to those of neutral sequences.
The mutagenesis analysis of highly conserved sequences is also discussed in Sect. 12.5.2.
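Why purifying selection suppresses sequence change so strongly can be illustrated with standard population genetics that is not part of this chapter's analysis. The sketch uses Kimura's diffusion approximation for the fixation probability of a single new mutation with additive selection coefficient s in a diploid population of effective size N_e (assumed equal to census size). Dividing by the neutral fixation probability, 1/(2N_e), gives the substitution rate relative to neutral DNA. The value N_e = 10,000 and the s values are illustrative assumptions, and the function name is hypothetical.

```python
import math


def relative_substitution_rate(n_e: int, s: float) -> float:
    """Substitution rate of new mutations with additive selection coefficient s,
    relative to neutral mutations, in a diploid population of effective size n_e.

    Kimura's fixation probability for a single new copy (frequency 1/(2*n_e)):
        u = (1 - exp(-2s)) / (1 - exp(-4*n_e*s))
    A neutral mutation fixes with probability 1/(2*n_e), so the ratio is 2*n_e*u.
    """
    if s == 0:
        return 1.0
    x = -4 * n_e * s
    if x > 700:
        # expm1(x) would overflow; for large x, expm1(x) is effectively exp(x)
        u = math.expm1(-2 * s) * math.exp(-x)
    else:
        u = math.expm1(-2 * s) / math.expm1(x)
    return 2 * n_e * u


if __name__ == "__main__":
    N_E = 10_000  # assumed effective population size
    for s in (0.0, -1e-5, -1e-4, -1e-3):
        print(f"s = {s:g}: relative rate {relative_substitution_rate(N_E, s):.3g}")
```

In this toy setting, even mildly deleterious mutations (s = -0.001) are essentially never fixed, so a functional sequence would stay conserved; the empirical 300-fold figure of McLean and Bejerano (2008) is a separate observation and is not derived from this formula.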

12.5.2

Mutational Cold Spots

If a genomic sequence is a mutational cold spot, that is, a sequence in which few or no mutations occur, it might retain the same base-pair sequence over many generations and consequently be conserved across many different species. Since many mutagens directly target genomic DNA to modify or break DNA molecules, tightly packed chromatin, e.g., in heterochromatic regions, might shield DNA from mutagens and thereby prevent mutations from accumulating. Alternatively (or in addition), an enhanced DNA repair system acting on particular genomic sequences would be another mechanism that could give rise to mutational cold spots. Whatever the mechanisms might be, any mutational cold spots that exist in the genome would be highly conserved portions of the genome.
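A back-of-the-envelope calculation shows how strong a cold spot would have to be. Under the Jukes-Cantor substitution model, two lineages that split t years ago, each evolving at a neutral rate of μ substitutions per site per year, are expected to differ at p = 3/4 (1 - exp(-8μt/3)) of sites. The neutral rate (2.2e-9 per site per year) and the human-mouse split (80 million years) used below are assumed ballpark values, not figures from this chapter, and the model ignores indels and rate variation among sites. Function names are hypothetical.

```python
import math


def jc_expected_identity(rate_per_site_per_year: float, years: float) -> float:
    """Expected fraction of identical sites between two lineages that split
    `years` ago, each evolving at `rate_per_site_per_year` (Jukes-Cantor):
    p = 3/4 * (1 - exp(-4d/3)) with d = 2 * rate * years."""
    d = 2 * rate_per_site_per_year * years
    return 1 - 0.75 * (1 - math.exp(-4 * d / 3))


def jc_rate_for_identity(identity: float, years: float) -> float:
    """Per-lineage substitution rate that would leave the given expected identity."""
    p = 1 - identity
    d = -0.75 * math.log(1 - 4 * p / 3)
    return d / (2 * years)


RATE = 2.2e-9  # assumed neutral mammalian rate, substitutions/site/year
SPLIT = 80e6   # assumed human-mouse divergence time, years

neutral = jc_expected_identity(RATE, SPLIT)
needed = jc_rate_for_identity(0.95, SPLIT)
print(f"expected neutral identity: {neutral:.2f}")
print(f"rate compatible with 95% identity: {needed:.2e} ({RATE / needed:.1f}-fold lower)")
```

Under these assumptions, neutral sequence would be only about 72% identical, so a cold spot would need a mutation rate roughly sevenfold below the neutral rate to reach the LCNS threshold; this is the kind of difference the ENU experiment described next was able to test.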


Bejerano et al. (2004) found far fewer SNPs in human UCEs than the genome-wide average, but they did find some. Thus, some mutations have occurred in UCEs. Several analyses of genotype data from human SNP projects (Drake et al. 2006; Katzman et al. 2007) indirectly suggested that UCEs and other highly conserved sequences are not mutational cold spots. We therefore tested experimentally whether LCNS are mutational cold spots by using ENU mutagenesis (Sakuraba et al. 2008). We had produced 10,000 ENU-mutagenized G1 mice and extracted genomic DNA from each (Sakuraba et al. 2005). Using a high-throughput mutation discovery system that combines PCR amplification and heteroduplex detection (Sakuraba et al. 2005), we screened several LCNS as well as non-LCNS regions for ENU-induced mutations (Sakuraba et al. 2008). We found 12 and 136 ENU-induced mutations by screening a total of 16.5 and 181.0 Mb of LCNS and non-LCNS, respectively; that is, one ENU-induced mutation per 1.371 Mb of LCNS and per 1.331 Mb of non-LCNS. This nearly identical ENU-induced mutation frequency was reproduced with a new, enhanced mutation discovery system, with which we found 23 and 207 ENU-induced mutations by screening 24.2 and 223.9 Mb of LCNS and non-LCNS, respectively (Sakuraba et al. 2008). Thus, the mutational cold spot hypothesis is unlikely to explain the maintenance of highly conserved sequences in vertebrates during evolution. All the G1 mice examined in the ENU mutagenesis study above have been preserved as frozen sperm (Sakuraba et al. 2005); therefore, it is possible to analyze live mice carrying an ENU-induced mutation in an LCNS. A total of 35 mouse strains carrying an ENU-induced mutation in an LCNS are listed on our website (http://www.brc.riken.go.jp/lab/mutants/genedriven.htm) and are freely available on request from the RIKEN BioResource Center (BRC) (http://www.brc.riken.jp/lab/animal/en/depo.shtml).
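The "nearly identical" mutation frequencies can be made quantitative with a rate ratio and a standard exact test for two Poisson rates. This is our illustration, not the statistical analysis reported by Sakuraba et al. (2008): it assumes ENU-induced mutations arise as a Poisson process whose expected count is proportional to the screened length. Conditioning on the total number of mutations, the LCNS count is then binomial with success probability equal to the LCNS share of the screened megabases. Function and variable names are hypothetical; the counts and lengths are those given above.

```python
import math


def rate_ratio(m1: int, mb1: float, m2: int, mb2: float) -> float:
    """Mutations per screened Mb in set 1 (LCNS) divided by that in set 2 (non-LCNS)."""
    return (m1 / mb1) / (m2 / mb2)


def conditional_binomial_p(m1: int, mb1: float, m2: int, mb2: float) -> float:
    """Two-sided exact test of equal per-Mb rates. Under equal Poisson rates,
    m1 ~ Binomial(m1 + m2, mb1 / (mb1 + mb2)); outcomes no more probable than
    the observed one are summed."""
    n = m1 + m2
    p = mb1 / (mb1 + mb2)
    probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[m1]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# (LCNS mutations, LCNS Mb, non-LCNS mutations, non-LCNS Mb)
SCREENS = {
    "initial": (12, 16.5, 136, 181.0),
    "enhanced": (23, 24.2, 207, 223.9),
}

for name, counts in SCREENS.items():
    print(f"{name}: rate ratio {rate_ratio(*counts):.2f}, "
          f"p = {conditional_binomial_p(*counts):.2f}")
```

Rate ratios close to 1 with large p-values in both screens are what the equal-rate (no cold spot) reading predicts; a genuine sevenfold cold spot would instead push the ratio far below 1.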

12.5.3

Horizontal Transfer

Another mechanism that could make a genomic sequence highly conserved among species is a recent transfer of DNA from one species to others. Active interspecies transposition and retroposition would be plausible mechanisms. If a DNA segment had been horizontally transferred to many species at once, very recently, the transferred portion of genomic DNA would have very similar sequences in the affected species. One discrepancy is that horizontal transfer by transposons, for instance, usually gives rise to multiple copies in the genome, which become part of the repetitive sequences, whereas LCNS were extracted from repeat-masked, nonrepetitive sequences. Also, if horizontal transfer had happened very recently, the degree of conservation should not decrease with evolutionary distance. As described in Sect. 12.4.2.4, however, the degree of LCNS conservation was roughly inversely related to evolutionary distance. Moreover, a simple transposon hypothesis does not explain the syntenic localization of UCE and LCNS pairs in human and mouse. A combination of functional constraint and horizontal transfer may have occurred. At the beginning of the adaptive radiation of vertebrates, horizontal transfer via various transposons might have been very active and spread to many of the radiating ancestral vertebrate lineages. If the transposons originated not from the direct ancestral species but from, e.g., fungi, viruses, and/or bacteria, it would be reasonable that neither UCEs nor LCNS are found in any invertebrate species. In this model, various sequences could have been horizontally transmitted to various loci in the genomes of many vertebrate ancestors. Bottleneck and founder effects then reduced the number of ancestral lineages, and a few lineages may have undergone further adaptive radiations. Each lineage would then maintain the syntenic localization of highly conserved sequences such as UCEs and LCNS in human, mouse, and rat. Functional constraints might have maintained only the highly conserved sequences such as UCEs and LCNS, while the flanking sequences diversified. Bejerano et al. (2006) presented evidence for retroposon-like origins of some UCEs.

12.5.4

Concerted Evolution and Gene Conversion

Some genomic sequences have been homogenized into identical or similar sequences through concerted evolution (Nenoi et al. 1998; Gondo et al. 1998; Nei et al. 2000; Okada Y, Gondo Y, Ikeda JE, unpublished). One example is found in the genomic sequences of the ubiquitin gene across very diverse species of fungi, plants, and animals, including human. Ubiquitin is encoded by gene families, and in the polyubiquitin gene the head-to-tail tandem structure together with unequal crossing over seems to maintain identical genomic DNA sequences (Nenoi et al. 1998; Nei et al. 2000). A deubiquitinase gene encoding USP17 in human (Gondo et al. 1998; Saitoh et al. 2000) was also found to be highly conserved among the mammalian species tested (Gondo et al. 1998; Okada Y, Gondo Y, Ikeda JE, unpublished). In human, the USP17 gene is present in 50–100 head-to-tail tandem copies on chromosome 4 and in a few copies on chromosome 8 (Gondo et al. 1998; Okada et al. 2002). The USP17 gene was also identified in a head-to-tail tandem repeat structure in many mammalian species, except the mouse (Gondo et al. 1998; Okada Y, Gondo Y, Ikeda JE, unpublished). The copy number on human chromosome 4 was highly polymorphic (Gondo et al. 1998), but the 4.7-kb repeat unit containing the USP17 gene and its flanking sequences was nearly identical between copies (99%). This degree of identity (>99%) between the 4.7-kb repeating units is at the level of UCEs and LCNS. The extremely high similarity was found not only within the tandem repeat on chromosome 4 but also in the few copies on chromosome 8. Thus, simple unequal crossing over, which homogenizes units within a tandem array, may not be enough to explain the highly conserved 4.7-kb sequences in human and other mammalian species. Some unknown gene-conversion mechanism might have homogenized the 4.7-kb units both within the tandem repeat on chromosome 4 and between the units on chromosomes 4 and 8. If the mechanism that homogenizes the ubiquitin repeats and the 4.7-kb units containing the USP17 gene is revealed, it might provide another working hypothesis for how highly conserved sequences arise.
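How unequal crossing over alone can homogenize a tandem array is easiest to see in a toy simulation. This is our own illustrative model, not one used by the author or the cited studies: each unit of the array starts as a distinct variant, each event joins the front of one copy of the array to a misaligned back of another copy (duplicating or deleting units), and array length is kept within assumed bounds. Mutation, selection, and gene conversion are deliberately left out, and all names and parameter values are hypothetical.

```python
import random
from collections import Counter


def unequal_crossover(array: list[str], rng: random.Random) -> list[str]:
    """One misaligned exchange between two copies of a tandem array.

    The recombinant keeps units [0, i) from one copy and units [j, n) from the
    other; i > j duplicates units j..i-1, while i < j deletes units i..j-1.
    """
    n = len(array)
    i = rng.randrange(1, n)
    j = rng.randrange(1, n)
    return array[:i] + array[j:]


def homogeneity(array: list[str]) -> float:
    """Fraction of units that carry the most common variant."""
    return Counter(array).most_common(1)[0][1] / len(array)


def simulate(n_units=20, events=2000, min_units=10, max_units=40, seed=1):
    """Apply repeated unequal crossovers, rejecting arrays outside the length bounds."""
    rng = random.Random(seed)
    array = [f"v{k}" for k in range(n_units)]  # every unit starts as a distinct variant
    for _ in range(events):
        candidate = unequal_crossover(array, rng)
        if min_units <= len(candidate) <= max_units:
            array = candidate
    return array


if __name__ == "__main__":
    final = simulate()
    print(len(final), len(set(final)), round(homogeneity(final), 2))
```

Because this toy has no mutation, the number of distinct variants can only stay the same or fall, so the array drifts toward uniformity within a single array. It says nothing about copies on another chromosome, which is exactly the gap the text points out for the chromosome 4 and 8 USP17 units.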

12.6

Conclusions

Highly conserved sequences have been found in vertebrates. The accumulating knowledge of these sequences raises various questions and working hypotheses, but the answers have yet to be determined. One of the most critical issues in this field is the absence of highly conserved sequences like UCEs and LCNS in invertebrate species. Invertebrates may have their own highly conserved sequences, and other clades need to be surveyed to determine whether other classes of highly conserved sequences exist. The horizontal transfer hypothesis emphasizes the importance of genomic sequence data not only from species closely related to vertebrates but also from more distantly related organisms, including fungi, bacteria, and viruses. Even metagenomic data from lower eukaryotes and prokaryotes may provide key sequence data sets for explaining the presence of highly conserved sequences in vertebrates. Next-generation sequencing technologies should facilitate such surveys. Extensive surveys of highly conserved sequences across all kingdoms may provide clues to the nature of highly conserved sequences in the genome, such as their origin, their mechanism of conservation, and their function, if any.

Acknowledgments

The author thanks Dr. Daniel E. Janes for constructive discussions and critical reading of this manuscript. The author also thanks Dr. Yoshiyuki Sakaki and his colleagues at RIKEN Genomic Sciences Center and Dr. Masayuki Yamamura and his colleagues at Tokyo Institute of Technology for the extraction of LCNS and useful discussions. The author also acknowledges Dr. Yoshiyuki Sakuraba and the members of the Population and Quantitative Genomics Team at RIKEN Genomic Sciences Center, where most of the LCNS work described in this chapter was conducted. This work was partly supported by Grants-in-Aid for Scientific Research (A) (KAKENHI 15200032 and KAKENHI 21240043).

References Ahituv N, Zhu Y, Visel A, Holt A, Afzal V, Pennacchio LA, Rubin EM (2007) Deletion of ultraconserved elements yields viable mice. PLoS Biol 5(9):e234 Bantle JA, Hahn WE (1976) Complexity and characterization of polyadenylated RNA in the mouse brain. Cell 8:139–150 Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D (2004) Ultraconserved elements in the human genome. Science 304(5675):1321–1325 Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D (2006) A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90 Britten RJ, Kohne D (1968) Repeated sequences in DNA. Science 161(841):529–540

12

Do Long and Highly Conserved Noncoding Sequences in Vertebrates

205

Chikaraishi DM, Deeb SS, Sueoka N (1978) Sequence complexity of nuclear RNAs in adult rat tissue. Cell 13:111–120
Dermitzakis ET, Reymond A, Lyle R, Scamuffa N, Ucla C, Deutsch S, Stevenson BJ, Flegel V, Bucher P, Jongeneel CV, Antonarakis SE (2002) Numerous potentially functional but nongenic conserved sequences on human chromosome 21. Nature 420(6915):578–582
Dermitzakis ET, Reymond A, Scamuffa N, Ucla C, Kirkness E, Rossier C, Antonarakis SE (2003) Evolutionary discrimination of mammalian conserved non-genic sequences (CNGs). Science 302(5647):1033–1035
Derti A, Roth FP, Church GM, Wu CT (2006) Mammalian ultraconserved elements are strongly depleted among segmental duplications and copy number variants. Nat Genet 38(10):1216–1220
Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, Hirschhorn JN (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet 38(2):223–227
Gondo Y, Okada T, Matsuyama N, Saitoh Y, Yanagisawa Y, Ikeda JE (1998) Human megasatellite DNA RS447: copy-number polymorphisms and interspecies conservation. Genomics 54(1):39–49
International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:932–945
Judd BH, Shen MW, Kaufman TC (1972) The anatomy and function of a segment of the X chromosome of Drosophila melanogaster. Genetics 71(1):139–156
Katzman S, Kern AD, Bejerano G, Fewell G, Fulton L, Wilson RK, Salama SR, Haussler D (2007) Human genome ultraconserved elements are ultraselected. Science 317(5840):915
McLean C, Bejerano G (2008) Dispensability of mammalian DNA. Genome Res 18(11):1743–1751
Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562
Mukai T (1964) The genetic structure of natural populations of Drosophila melanogaster. I. Spontaneous mutation rate of polygenes controlling viability. Genetics 50:1–19
Mukai T (1978) Population genetics. Kodansha Scientific, Tokyo (in Japanese)
Mukai T, Chigusa SI, Mettler LE, Crow JF (1972) Mutation rate and dominance of genes affecting viability in Drosophila melanogaster. Genetics 72(2):335–355
Nei M, Rogozin IB, Piontkivska H (2000) Purifying selection and birth-and-death evolution in the ubiquitin gene family. Proc Natl Acad Sci USA 97(20):10866–10871
Nenoi M, Mita K, Ichimura S, Kawano A (1998) Higher frequency of concerted evolutionary events in rodents than in man at the polyubiquitin gene VNTR locus. Genetics 148(2):867–876
Nowak R (1994) Mining treasures from “junk DNA”. Science 263:608–610
O’Brien SJ, Menotti-Raymond M, Murphy WJ, Nash WG, Wienberg J, Stanyon R, Copeland NG, Jenkins NA, Womack JE, Marshall Graves JA (1999) The promise of comparative genomics in mammals. Science 286(5439):458–481
Ohnishi O (1977) Spontaneous and ethyl methanesulfonate-induced mutations controlling viability in Drosophila melanogaster. II. Homozygous effect of polygenic mutations. Genetics 87(3):529–545
Okada T, Gondo Y, Goto J, Kanazawa I, Hadano S, Ikeda JE (2002) Unstable transmission of the RS447 human megasatellite tandem repetitive sequence that contains the USP17 deubiquitinating enzyme gene. Hum Genet 110(4):302–313
Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD, Plajzer-Frick I, Akiyama J, De Val S, Afzal V, Black BL, Couronne O, Eisen MB, Visel A, Rubin EM (2006) In vivo enhancer analysis of human conserved non-coding sequences. Nature 444(7118):499–502


Y. Gondo

Poulin F, Nobrega MA, Plajzer-Frick I, Holt A, Afzal V, Rubin EM, Pennacchio LA (2005) In vivo characterization of a vertebrate ultraconserved enhancer. Genomics 85(6):774–781
Saitoh Y, Miyamoto N, Okada T, Gondo Y, Showguchi-Miyata J, Hadano S, Ikeda JE (2000) The RS447 human megasatellite tandem repetitive sequence encodes a novel deubiquitinating enzyme with a functional promoter. Genomics 67(3):291–300
Sakuraba Y, Sezutsu H, Takahasi KR, Tsuchihashi K, Ichikawa R, Fujimoto N, Kaneko S, Nakai Y, Uchiyama M, Goda N, Motoi R, Ikeda A, Karashima Y, Inoue M, Kaneda H, Masuya H, Minowa O, Noguchi H, Toyoda A, Sakaki Y, Wakana S, Noda T, Shiroishi T, Gondo Y (2005) Molecular characterization of ENU mouse mutagenesis and archives. Biochem Biophys Res Commun 336(2):609–616
Sakuraba Y, Kimura T, Masuya H, Noguchi H, Sezutsu H, Takahasi KR, Toyoda A, Fukumura R, Murata T, Sakaki Y, Yamamura M, Wakana S, Noda T, Shiroishi T, Gondo Y (2008) Identification and characterization of new long conserved noncoding sequences in vertebrates. Mamm Genome 19(10–12):703–712
Visel A, Prabhakar S, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Afzal V, Rubin EM, Pennacchio LA (2008) Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nat Genet 40(2):158–160
Wetmur J, Davidson N (1968) Kinetics of renaturation of DNA. J Mol Biol 31(3):349–370

Part III Morphological Evolution / Speciation

Chapter 13

Male-Killing Wolbachia in the Butterfly Hypolimnas bolina

Anne Duplouy and Scott L. O’Neill

Abstract Maternally inherited insect symbionts often manipulate host reproduction for their own benefit. Symbionts are transmitted to the next host generation through the female hosts, and as such males represent dead ends for transmission. Natural selection therefore favors symbiont-induced phenotypes that provide a reproductive advantage to infected females, regardless of possible negative selective effects on males. Male-killing (MK) is one such phenotype, in which symbionts kill the male progeny of infected females. Compared with other symbiont-associated reproductive phenotypes, MK is relatively unexplored mechanistically as well as ecologically. A male-killing Wolbachia bacterium strain named wBol1 has been described in the tropical butterfly Hypolimnas bolina. By reviewing the different features of this association it is possible to summarize what is already known about the biology and evolution of MK symbionts, as well as highlight the current gaps in our understanding of this striking reproductive phenotype.

A. Duplouy and S.L. O’Neill School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia e-mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_13, © Springer-Verlag Berlin Heidelberg 2010

13.1 Introduction

There are numerous symbiotic associations in nature; however, few are more complex than those involving endosymbiosis. The study of endosymbionts challenges the scientific community with questions about how the partners of a symbiosis coexist and how each maximizes its reproductive fitness. Endosymbionts are extremely common and have arisen over the course of evolution in very different taxonomic groups. In insects, although endosymbiotic eukaryotic microorganisms are common (e.g., the yeast-like endosymbiont Symbiotaphrina buchneri infecting anobiid beetles, Noda and Kodama 1996; or the fungal symbiont of the brown-banded cockroach Supella longipalpa, Gibson and Hunter 2009), most described endosymbionts are bacteria, including members of the Proteobacteria (e.g., Buchnera and Wolbachia), Flavobacteria (e.g., Blattabacterium), and Mollicutes (e.g., Spiroplasma), among others (Werren and O’Neill 1997; Bourtzis and Miller 2003, 2006). Insect endosymbionts also differ in their mode of transmission: they may be transmitted vertically (maternally), from mother to offspring, or horizontally. In the latter case, the symbiont may be transmitted within a single species or between different species. Occasional horizontal transfer of maternally transmitted symbionts has also been reported (Werren and O’Neill 1997). “Primary endosymbionts” are usually obligate, required for host reproduction and/or survival. For example, Moran et al. (2005) showed that Buchnera aphidicola provides essential nutrients that are lacking from the aphid host’s diet. Some primary endosymbionts display phylogenetic concordance with their hosts over millions of years, demonstrating long-term coevolution (Moran et al. 1993, 1994; Bandi et al. 1994). Facultative endosymbionts, often referred to as “secondary endosymbionts,” infect individuals that already carry a primary symbiont. A classic example is the pea aphid Acyrthosiphon pisum, which harbors several secondary symbionts, such as Hamiltonella defensa, in addition to the primary symbiont Buchnera sp. (Moran et al. 2005; Oliver et al. 2007). The functional roles of secondary symbionts within the host are not always well defined, as their effects can be masked by the action of the primary symbiont (Chen et al. 2000; Moran et al. 2005; Ruan et al. 2006).
Finally, “reproductive symbionts,” also termed “guest microbes” (Bourtzis and Miller 2003), were first described as symbionts able to enhance their own fitness by manipulating host reproduction (Taylor and Hoerauf 1999). Some of these manipulations distort the host sex ratio. Spiroplasma, for example, kills males in Drosophila species (Hurst et al. 1999a), while Cardinium sterilizes certain males of the wasp Encarsia pergandiella (Hunter et al. 2003). However, recent studies have revealed that reproductive symbionts can also enhance their fitness in ways that do not involve the host’s reproductive system (Brownlie et al. 2009).

13.2 Wolbachia pipientis

Wolbachia pipientis is a species of obligate intracellular alpha-Proteobacteria closely related to Rickettsia. Wolbachia were first discovered in the 1920s in the ovaries of the mosquito Culex pipiens (Hertig and Wolbach 1924). On the basis of genetic variation, Wolbachia strains have been divided into eight highly divergent supergroups, named A through H (Bandi et al. 1998; Zhou et al. 1998; Bourtzis and Miller 2003; Lo et al. 2007). The two most studied supergroups, A and B, diverged approximately 50–70 million years ago (Werren et al. 1995; Werren and O’Neill 1997). Wolbachia belonging to these two supergroups, known as the “arthropod Wolbachia,” are mostly harbored by insects but have also been described from other arthropod taxa such as crustaceans and arachnids. Supergroup A and B Wolbachia are mostly parasitic and induce a broad range of reproductive distortions in their hosts. In contrast, Wolbachia of supergroups C and D are mutualists required for the fertility and development of their filarial nematode hosts (Bandi et al. 1998). Within supergroups C and D, Wolbachia phylogeny is concordant with host phylogeny, suggesting long-term coevolution. The remaining four supergroups (E–H) infect various arthropods or nematodes; however, these associations are often poorly described, and the effects of the symbiont are not always known (Vandekerckhove et al. 1999; Lo and Evans 2007; Covacin and Barker 2007). W. pipientis, the most extensively studied reproductive endosymbiont to date, shows the greatest diversity of host interactions, including mutualism and all known types of reproductive manipulation: cytoplasmic incompatibility, feminization, parthenogenesis, and male-killing (O’Neill et al. 1997).

13.2.1 Reproductive Distortions

Maternally transmitted endosymbionts such as Wolbachia can increase their transmission rate by manipulating host reproduction (O’Neill et al. 1997; Bourtzis and Miller 2003). To understand the benefits they gain from these manipulations, it is worth summarizing what is known about the most common symbiont-induced reproductive phenotypes. The first reproductive manipulation attributed to Wolbachia was cytoplasmic incompatibility (CI). In the 1950s, Ghelelovitch (1952) and Laven (1959) described crosses between strains of the mosquito Culex pipiens that sometimes failed to produce progeny. Later, Yen and Barr (1971) showed that Wolbachia was the causative agent of these reproductive failures. Crosses between Wolbachia-infected males and uninfected females failed, whereas all other possible crosses (between uninfected individuals, and between infected females and either uninfected males or males carrying the same infection) produced normal numbers of offspring. Cytological studies have linked this incompatibility between uninfected females and infected males to abnormalities during fertilization (Tram and Sullivan 2002): abnormal behavior of the paternal chromosomes from infected males prevents their normal union with the female pronucleus and later causes the death of the progeny. CI therefore gives infected females an advantage, because they can reproduce successfully with both infected and uninfected males, whereas uninfected females cannot reproduce successfully with infected males. As a result, the maternally transmitted symbiont can spread rapidly through the host population. CI is not unique to Wolbachia: Cardinium also induces CI in the parasitoid wasp Encarsia pergandiella (Hunter et al. 2003; Perlman et al. 2008), and CI has been described as the most common endosymbiont-induced reproductive manipulation in arthropods.
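The crossing rule described above can be written as a compatibility test on the sets of CI-inducing strains carried by each parent. The Python sketch below is illustrative only: it assumes a single CI strain, complete CI, perfect maternal transmission, random mating and no fitness cost of infection, and the function names is_compatible and next_infected_frequency are hypothetical. Under these assumptions it shows where the advantage of infected females comes from: they are compatible with every male, whereas uninfected females lose the offspring of matings with infected males.

```python
from itertools import product


def is_compatible(female_strains: frozenset, male_strains: frozenset) -> bool:
    """Hypothetical CI rule: a cross fails when the male carries a
    CI-inducing strain that the female lacks; all other crosses succeed."""
    return male_strains <= female_strains


def next_infected_frequency(p: float) -> float:
    """Infected fraction among offspring after one generation of random
    mating, assuming one CI strain, complete CI, perfect maternal
    transmission and no fitness cost of infection."""
    infected, uninfected = frozenset({"w"}), frozenset()
    infection_states = [(infected, p), (uninfected, 1.0 - p)]
    infected_offspring = 0.0
    all_offspring = 0.0
    for (female, pf), (male, pm) in product(infection_states, repeat=2):
        if not is_compatible(female, male):
            continue  # CI: this cross leaves no surviving progeny
        weight = pf * pm  # proportion of matings of this type
        all_offspring += weight
        if female:  # infected mothers pass the symbiont to all offspring
            infected_offspring += weight
    return infected_offspring / all_offspring
```

With these assumptions, an infection at 50% rises to about 67% of offspring in one generation. Representing infections as sets lets the same rule cover crosses between strains: if wBol2 is treated as the CI-inducing strain, is_compatible(frozenset({"wBol1"}), frozenset({"wBol2"})) returns False, in line with the incompatible crosses between wBol1-infected females and wBol2-infected males described in Sect. 13.3.2. Adding a fecundity cost or imperfect transmission to such a model introduces a threshold frequency below which the infection is lost.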
Because Wolbachia are maternally transmitted, some strains distort the sex ratio of their host population in favor of females, creating populations in which males are sometimes extremely rare. Three mechanisms, feminization, parthenogenesis, and male-killing (MK), cause such imbalanced sex ratios. Feminizing symbionts such as Cardinium and Wolbachia have been found in numerous arthropod hosts, including the isopod Armadillidium vulgare (Cordaux et al. 2004), the moth Ostrinia furnacalis and the butterfly Eurema hecabe (Narita et al. 2007; Kageyama et al. 2008), and the mite Brevipalpus phoenicis (Weeks et al. 2001). During feminization, genetic males develop and reproduce as functional females and therefore transmit the symbiont to their progeny (Rigaud 1997; Stouthamer et al. 1999). Feminization is often confused with parthenogenesis, as both mechanisms produce female-biased populations. However, whereas feminization still requires sexual reproduction, parthenogenesis allows viable progeny to be produced without a male partner. Two types of parthenogenesis have been described: arrhenotokous parthenogenesis (arrhenotoky), in which unfertilized eggs develop into haploid males while diploid females arise from fertilized eggs, and thelytokous parthenogenesis (thelytoky), in which females are produced from unfertilized eggs. In Trichogramma wasps, thelytoky is induced by Wolbachia (Stouthamer and Kazmer 1994), which restores diploidy by promoting the fusion of the two nuclei of the first mitotic division (Stouthamer and Kazmer 1994; Huigens et al. 2000). Finally, in a wide range of endosymbiont-infected arthropods, infected females produce only daughters because their male offspring die during development. Males are usually killed as embryos, but death can also occur much later, typically in fourth-instar larvae (Hurst 1991). This common reproductive manipulation is known as male-killing (MK). MK is caused by at least nine different bacteria from four taxonomic groups: Mollicutes, Flavobacteria, Rickettsiaceae, and Enterobacteriaceae (Hurst et al. 1997, 2003). However, very few studies have yet investigated the cytogenetic and genetic mechanisms underlying this phenotype.

13.3 Male-Killing Wolbachia in the Butterfly Hypolimnas bolina

Although MK systems are diverse, a review of the association between the MK Wolbachia strain wBol1 and H. bolina provides a general overview of this reproductive phenotype. H. bolina, known as the common or great egg-fly in Australia and as the blue-moon butterfly in New Zealand, was first described by Linnaeus in 1758. This species has a vast subtropical distribution, extending from Sri Lanka to French Polynesia and, in latitude, from Hong Kong to Canberra, Australia. Since the 1970s, occasional reports have described H. bolina in Japan and New Zealand (Ramsay 1971; Clarke and Sheppard 1975; Morishita and Kazuhiko 2002; Patrick 2004), but these regions are not thought to support endemic populations (Common and Waterhouse 1972). Individuals observed in Japan and New Zealand were probably migrants that took advantage of favorable meteorological conditions (Ryan and Harris 1990; Christensen 2004) to arrive from neighboring regions such as South East Asia (SEA) or Australia, where stable populations exist (Gibbs 1961; Ramsay and Ordish 1966; Ramsay 1971; Christensen 2004).

13.3.1 All-Female Broods in the Butterfly H. bolina

A strong female-biased sex ratio has been described in numerous H. bolina populations throughout the species’ wide geographical distribution (Simmonds 1926; Clarke et al. 1975; Dyson et al. 2002; Charlat et al. 2005). All-female broods were first described in the 1920s (Poulton 1923; Simmonds 1926). This trait was shown to be inherited exclusively through females and was therefore attributed to a cytoplasmic factor (Clarke et al. 1975). It was shown not to be due to parthenogenesis, because males died at early stages of development (Clarke et al. 1975, 1983). Dyson et al. (2002) identified W. pipientis as the causative agent of male rarity in H. bolina, using PCR amplification and sequence analysis of a bacterial surface protein gene (Zhou et al. 1998). This Wolbachia strain, termed wBol1, was shown to kill the male progeny of infected female butterflies at an early embryonic stage, before caterpillars hatch from the eggs (Dyson et al. 2002; Fig. 13.1). First identified in Fiji, wBol1 was found in most H. bolina populations across the South Pacific (Charlat et al. 2005). One intriguing feature of the wBol1/H. bolina association is the variation in wBol1 infection prevalence among host populations. wBol1 infections were absent from the Australian and Tubuai (French Polynesia, Austral Islands Archipelago) H. bolina populations, whereas infection frequencies of up to 50% were recorded in Fijian populations and of more than 85% in both the Independent Samoan and Tahitian populations (Fig. 13.2; Charlat et al. 2005, 2006).

Fig. 13.1 Life cycle of wBol1-infected Hypolimnas bolina: (1) a wBol1-infected female mates with an uninfected male; (2) all males die during embryogenesis, and only female eggs hatch, 4 days after being laid; (3) caterpillars develop in 20 days through 5 larval instars; (4) wBol1-infected females emerge from 7-day-old pupae; and (5) 4 days after emerging from the pupae, females are reproductively mature

Fig. 13.2 Wolbachia infection frequencies in 12 H. bolina populations. (1) Philippines, (2) Thailand, (3) Vanuatu, (4) Fiji, (5) New Caledonia: Ile des Pins, (6) Australia: Brisbane, (7) Independent Samoa, (8) American Samoa, French Polynesia: (9) Moorea, (10) Tahiti, (11) Rurutu, and (12) Tubuai. Fewer than 65% of the females are wBol1-infected on islands with low and medium infection frequencies, and 65–100% of the females are wBol1-infected on islands with high infection frequency

13.3.2 Competition Between Wolbachia Infections

A number of possible reasons have been suggested to explain the heterogeneity in wBol1 infection rates (Fig. 13.2, Table 13.1). In the extreme case of Tubuai (Austral Islands Archipelago, French Polynesia), no butterflies were found to be infected by the male-killer strain wBol1, whereas on the nearest neighboring island, Rurutu, only 210 km away, the female wBol1 infection rate was more than 75% (Charlat et al. 2005, 2006). Butterflies from Tubuai were found to be infected with another Wolbachia strain, named wBol2. The wBol2 strain is an A-group Wolbachia that is phylogenetically distant from wBol1, a B-group Wolbachia. Crosses between wBol1-infected females and wBol2-infected males were fully incompatible and produced no viable progeny, as a result of wBol2-induced CI in H. bolina (Charlat et al. 2006). The competition between wBol2 and wBol1, and the strong CI observed between the two Wolbachia strains, make invasion of Tubuai by the MK strain wBol1 extremely unlikely. wBol2 has also been reported on several other South Pacific islands where wBol1 has not been found (Charlat et al. 2006).

Table 13.1 Percentage of males and females in different populations naturally uninfected (column 2) or infected by the different Wolbachia strains (columns 3–5); presence of the MK repressor gene is shown in column 6 (Charlat et al. 2005, 2006, 2007b; Hornett et al. 2006). All percentages are given as male/female.

Population      Uninfected (%)  wBol1-a (%)  wBol1-b (%)  wBol2 (%)  MK repressor gene
Philippines     0/0             100/100      0/0          0/0        Present
Thailand        0/0             100/100      0/0          0/0        Present
Ile des Pins    100/17          0/83         0/0          0/0
Fiji            100/50          0/50         0/0          0/0
Vanuatu         100/70          0/30         0/0          0/0
Australia       100/100         0/0          0/0          0/0
Ind. Samoa      0/0             100/100      0/0          0/0        Present
Am. Samoa       0/0             0/0          0/0          100/100
Moorea          98/17           0/80         0/3          2/0
Tahiti          100/4           0/90         0/6          0/0
Rurutu          98/29           0/69         0/0          2/2
Tubuai          2/2             0/0          0/0          98/98
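The female values in Table 13.1 can be related to the infection categories used in Fig. 13.2 with a few lines of Python. This is an illustrative sketch under stated assumptions: it takes total female wBol1 prevalence to be wBol1-a plus wBol1-b, applies the 65% cut-off given in the Fig. 13.2 caption, and treats 0% as "not infected by wBol1"; whether the published map was drawn from exactly these numbers is not stated. The names FEMALE_WBOL1 and infection_class are hypothetical.

```python
# Female infection percentages transcribed from Table 13.1: (wBol1-a, wBol1-b).
FEMALE_WBOL1 = {
    "Philippines": (100, 0),
    "Thailand": (100, 0),
    "Ile des Pins": (83, 0),
    "Fiji": (50, 0),
    "Vanuatu": (30, 0),
    "Australia": (0, 0),
    "Ind. Samoa": (100, 0),
    "Am. Samoa": (0, 0),
    "Moorea": (80, 3),
    "Tahiti": (90, 6),
    "Rurutu": (69, 0),
    "Tubuai": (0, 0),
}


def infection_class(wbol1_a: int, wbol1_b: int) -> str:
    """Assign a Fig. 13.2-style category from total female wBol1 prevalence."""
    total = wbol1_a + wbol1_b
    if total == 0:
        return "not infected by wBol1"
    if total < 65:
        return "low/medium"
    return "high"  # 65-100% of females infected


classes = {pop: infection_class(a, b) for pop, (a, b) in FEMALE_WBOL1.items()}
```

Under these assumptions, Fiji and Vanuatu fall into the low/medium class, Australia, American Samoa and Tubuai are uninfected, and the remaining seven populations are highly infected.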

13.3.3 When the MK Phenotype Is Repressed, wBol1 Induces CI

At the other extreme, all H. bolina from South East Asian populations, including males, were infected by wBol1 (Charlat et al. 2005; Hornett et al. 2006). Under the strong selection pressure exerted by the wBol1 infection, these butterflies have evolved resistance to the MK phenotype. This resistance, attributed to a host MK repressor gene, allowed male offspring to survive and restored a balanced sex ratio (Hornett et al. 2006; Table 13.1). If wBol1 from host populations carrying the MK repressor gene were shown to retain the ability to induce MK in nonresistant hosts, this would suggest either (1) that the repressor gene arose from an extremely recent mutation in the host or (2) that the MK trait is linked to another trait that benefits the repressed wBol1. Otherwise, long-term evolution in a host population that represses MK might be expected to lead to the loss of wBol1’s MK virulence, because MK would no longer contribute to the symbiont’s spread in that population. Hornett and co-workers (2008) crossed MK-resistant H. bolina from SEA with nonresistant populations from French Polynesia (Moorea and Tahiti, Society Islands Archipelago) and tested whether wBol1 from SEA could induce MK. The SEA wBol1 infection distorted host reproduction when transferred into a French Polynesian background, indicating that wBol1 from SEA can still induce the MK phenotype in nonresistant hosts. The study also revealed complete failure of egg hatch when SEA males carrying both the MK infection and the MK repressor gene(s) were crossed with uninfected females. Control crosses showed that the females were not sterile, suggesting that, in addition to MK, wBol1 also induces CI in this population of H. bolina (Hornett et al. 2008).

13.3.4 MK Wolbachia Diversity in H. bolina

More recently, Charlat et al. (2009) showed that the MK phenotype in H. bolina is induced by two substrains, wBol1-a and wBol1-b. Although the two are extremely closely related (genetic differences between them have been found at only two loci), they show phenotypic differences that make them interesting candidates for comparative analysis (Charlat et al. 2009). First, wBol1-a and wBol1-b appear to differ in their sensitivity to the MK repressor from SEA. Preliminary results suggest that wBol1-a MK was repressed when transferred into a SEA background, whereas wBol1-b maintained the MK phenotype in this novel host background (Charlat pers. comm. 2007). These results suggest that the genetic basis of MK differs slightly between the two substrains. Second, the two substrains differ in their transmission efficiency. The wBol1-b infection, which has been found only in French Polynesia and Vanuatu (Charlat et al. 2009; Table 13.1), was associated with mitochondrial haplotypes (mitotypes) 3 and 6. These mitotypes were also found in Wolbachia-free butterflies, suggesting imperfect vertical transmission of wBol1-b. In contrast, the most common strain, wBol1-a, was present on all the islands where wBol1 had previously been described (Charlat et al. 2005, 2006; Table 13.1) and was strictly associated with mitotype 1. The almost complete absence of uninfected butterflies carrying mitotype 1 suggests that wBol1-a is transmitted very efficiently. More recent investigations of wBol1-a genetic variation found no evidence that differences in wBol1-a prevalence were related to genetic differences between wBol1-a populations (Duplouy et al. 2009). The age of the infection may vary among South Pacific islands; for example, the wBol1-a invasion of Fiji could be more recent than that of Tahiti, where a larger proportion of females carried the infection.

13.3.5 A Rapidly Evolving System

The association between wBol1 and H. bolina has proved to be highly dynamic. In 2001, Samoan H. bolina populations were shown to contain at most one male per hundred females. In a 2006 survey, however, Charlat and colleagues (2007a) found equal sex ratios, reporting a second case of MK repression in the South Pacific. It was not known whether the genetic basis of MK resistance was the same in SEA and Samoan populations. Nonetheless, the shift in population sex ratio from 100:1 to 1:1 in fewer than ten generations appears to be one of the fastest evolutionary changes ever recorded (Charlat et al. 2007a). An older but similar evolution of an MK repressor gene has also been described in butterflies from Malaysian Borneo (Hornett et al. 2009). The spread of wBol1 through SEA and the South Pacific was estimated to have taken less than 3,000 years (Duplouy et al. 2009). However, in some populations, local invasions appear to have occurred more rapidly, on the scale of a century. Museum specimens from different South Pacific islands were tested to determine the infection type and prevalence in earlier butterfly generations. Between 1883 and 2002, infection frequencies on the French Polynesian islands of Ua Huka and Tahiti rose from very low prevalence (0% and less than 20%, respectively) to very high prevalence (more than 80%) (Hornett et al. 2009).

13.4 Open Questions in wBol1 Research

13.4.1 wBol1 Biogeography

The biogeography of the wBol1-a infection in the South Pacific was one of the most intriguing aspects of this system. The presence of butterfly populations on numerous South Pacific islands provided clear evidence of natural migration between islands; however, the extent of these exchanges remained unknown. Butterfly populations infected with the CI-inducing strain wBol2 were almost as common in the South Pacific as wBol1-infected populations. In contrast, populations in which the two infections coexist have rarely been recorded, and doubly infected butterflies have never been found (Charlat et al. 2005). Models predicted that, in this system, a CI-infected population would resist MK invasion. Under the same conditions, an MK-infected population would resist invasion by CI-inducing Wolbachia only as long as the CI strain remained below a threshold frequency (Freeland and McCabe 1997; Engelstädter et al. 2004). Once this threshold was exceeded, the CI-inducing strain became more competitive and spread through the population, driving the MK infection to extinction (Engelstädter et al. 2004). Butterfly populations in which wBol1 and wBol2 were in competition were rare in the South Pacific islands (Charlat et al. 2005; Engelstädter et al. 2008). This rarity suggested a low migration rate between islands, allowing MK-infected populations to resist wBol2 invasion.

13.4.2 Effects of MK Infection on Host Fitness

Endosymbiotic infections are generally costly to maintain, because the symbionts exploit resources that would otherwise be used by their host (Haine 2008). To be maintained and to spread within host populations, symbionts may evolve strategies that enhance the fitness of infected hosts relative to uninfected individuals. Wolbachia strains have evolved very intimate relationships with their hosts, and experiments exposing hosts to stress have shown that some strains are beneficial to their hosts (Hedges et al. 2008; Brownlie et al. 2009). Modeling predicts that fixation of an MK infection would lead to population extinction because of a severe shortage of males (Hamilton 1967; Hurst 1991; Randerson et al. 2000); however, wBol1 infection sometimes exceeds 75% of host individuals in a population. The success of infected individuals over uninfected ones suggests that wBol1 infection may confer a fitness advantage on its hosts, but the nature of this benefit has not yet been characterized in H. bolina.
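Why a male killer needs some compensating benefit can be illustrated with a deliberately simple model, sketched below in Python. The assumptions are ours, not those of any study cited here: every female lays the same number of eggs, so killing sons does not change how many daughters an infected mother produces; a fraction t of her daughters inherit the infection; each of her daughters gains a hypothetical fitness benefit b (for example, from reduced sibling competition); and mate limitation is ignored. The function names are hypothetical.

```python
def next_mk_frequency(p: float, t: float, b: float) -> float:
    """Frequency of infected adult females after one generation.

    p -- current frequency of infected females
    t -- maternal transmission efficiency (fraction of daughters infected)
    b -- hypothetical fitness benefit to daughters of infected mothers
    Male deaths do not change the number of daughters; mate limitation
    is ignored in this sketch.
    """
    daughters_of_infected = p * (1.0 + b)
    daughters_of_uninfected = 1.0 - p
    infected_daughters = daughters_of_infected * t
    return infected_daughters / (daughters_of_infected + daughters_of_uninfected)


def can_invade(t: float, b: float) -> bool:
    """In this model a rare male killer increases only if t * (1 + b) > 1."""
    return t * (1.0 + b) > 1.0


def equilibrium(t: float, b: float) -> float:
    """Stable infected frequency; 0.0 when a rare infection cannot increase."""
    if b <= 0.0 or not can_invade(t, b):
        return 0.0
    return (t * (1.0 + b) - 1.0) / b


# Example: 95% transmission and a 20% benefit to daughters.
p = 0.01
for _ in range(200):
    p = next_mk_frequency(p, t=0.95, b=0.2)
```

In this sketch, with no benefit (b = 0) even perfect transmission only keeps the infection frequency constant, so some advantage to infected females is required for spread; with t = 0.95 and b = 0.2 the frequency settles at 0.7. With t = 1 and any b > 0 the infection goes to fixation, which is exactly the outcome that models including the resulting shortage of males predict would drive the host population extinct.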

13.4.2.1 Benefits from the Infection in Other Host/MK Wolbachia Associations

Direct benefits from the infection, such as increased size, fecundity, or longevity, have been recorded in several associations involving MK Wolbachia (Ikeda 1970; Majerus and Hurst 1997; Fry et al. 2004). These observations contrast with the wBol1/H. bolina system, in which no benefit of this type has been shown (Dyson and Hurst 2004; Charlat et al. 2007b). Similarly, although indirect benefits of MK infection have been described in several other MK systems, none has yet been associated with a fitness increase in wBol1-infected butterflies. Werren (1987) suggested that MK endosymbionts could reduce sibling inbreeding, thereby favoring infected females. This explanation makes sense for species that lay many eggs on the same plant and are not very mobile after hatching. H. bolina, however, lays few eggs per plant and is a strong migrant: individuals have frequently reached New Zealand from Australia, a journey of 2,000 km (Ramsay 1971; Ryan and Harris 1990; Patrick 2004). Majerus and Hurst (1997) suggested that the success of MK strains in ladybirds (e.g., Adalia bipunctata) was correlated with host characteristics including cannibalism at various developmental stages, through which infected females gain nutrition by feeding on their dead brothers, and large clutch sizes, in which MK halves the number of siblings and so reduces competition for food. H. bolina does not exhibit the characteristics of a host in which an MK Wolbachia would be expected to succeed. This butterfly is strictly herbivorous as a larva and as an adult feeds exclusively on nectar. Male death would therefore provide no direct nutritional benefit to infected sisters, and as females lay only 1–2 eggs per plant, food competition would be limited (Nafus 1993; Kemp 1998).

13.4.2.2 Alternative Hypotheses

wBol1-a may confer a “hidden” selective advantage on infected hosts (Duplouy et al. 2009). Insects are often infected with entomopathogenic agents (fungi, viruses, or bacteria). Phytophagous insects, such as butterflies, must also cope with plant defenses, such as toxic compounds produced by their host plants to fight natural enemies (Lindroth 1989; Li et al. 2003; Wen et al. 2006). Caterpillars are common prey for parasitoid or predatory wasps such as Cotesia spp. or Polistes spp. (Stamp and Bowers 1988; Nafus 1993; Beckage et al. 1994; van Nouhuys and Hanski 2005). These selective pressures allow only resistant or adapted individuals to survive (Hochberg 1991; Russell and Moran 2005; Moran 2006; Haine 2008). Wolbachia may benefit hosts exposed to toxins and/or parasites and thereby increase its own prevalence within host populations. Recent studies have found delayed mortality after virus infection in Wolbachia-infected flies (Hedges et al. 2008; Teixeira et al. 2008). Investigating the effect of wBol1 infection in a metacommunity involving the host, the symbiont, and at least one third party, such as a virus or a parasitoid wasp, could reveal any fitness benefit(s) this infection provides to the butterfly host, and such benefit(s) could help explain the striking success of wBol1-a in H. bolina.

13.4.3 Mechanisms of MK

13.4.3.1 Cytology of MK

Two types of MK have been distinguished on the basis of the timing of male death (Hurst 1991). “Early MK” occurs during embryogenesis, whereas “late MK” takes effect during larval or pupal stages. Both early and late MK have been observed in Wolbachia-infected insects (Hurst et al. 1999b; Fialho and Stevens 2000; Jiggins et al. 2000; Dyson et al. 2002; Jaenike 2007); however, the mechanisms underlying either phenomenon have not yet been elucidated. Studies of MK Spiroplasma-infected Drosophila have shown that male embryo death was associated with abnormal mitoses, while later death was caused by degeneration of cell nuclei (pycnosis) (Counce and Poulson 1962). In a similar system, mutations affecting the dosage compensation complex (DCC), which in Drosophila is active only in males, can also rescue males from MK symbionts, indicating that the DCC may be involved in expression of the MK phenotype (Veneti et al. 2005). Although MK in Wolbachia-infected insects must also involve host sex determination, mechanisms similar to those in Spiroplasma associations have not yet been identified. One study showed that treating wBol1-a-infected butterflies with bacteriostatic antibiotics delayed the MK effect, indicating that wBol1 is able to recognize male individuals and induce MK at different time points during host development (Charlat et al. 2007c). However, it is unknown whether the basic mechanisms of the MK phenotype are identical at each time point. As suggested by Hurst and Jiggins (2000), MK could be expressed through different pathways, which would complicate identification of the mechanistic basis of these MK phenotypes.

A. Duplouy and S.L. O’Neill

13.4.3.2

Genomics of MK

To date, the genomes of one mutualistic and three CI-inducing Wolbachia strains have been sequenced (Wu et al. 2004; Foster et al. 2005; Klasson et al. 2008, 2009), and several other sequencing projects are underway. Wolbachia’s intracellular lifestyle has hampered the completion of whole genome sequencing projects. The genome sequence of the MK strain wBol1 is nearing completion (Duplouy pers. comm.), and analysis of the first chromosomal DNA sequence of an MK Wolbachia strain will certainly be of great value. Comparative genomic analysis of wBol1 and the closely related, fully sequenced wPip strain, which induces CI in Culex mosquitoes, should provide a unique opportunity to investigate the evolution of Wolbachia genomes over relatively short evolutionary timescales. This first genomic comparison between an MK strain and a CI-inducing strain offers opportunities to test hypotheses concerning the evolution and induction of the MK phenotype, such as identifying candidate genes involved in both MK and CI. Previous whole genome analyses have attempted to link genetic elements, such as ankyrin coding genes, to the induction of different reproductive manipulations (Iturbe-Ormaetxe et al. 2005; Duron et al. 2007; Walker et al. 2007; Klasson et al. 2008). Ankyrin repeat domains are believed to mediate a range of cellular and molecular functions via protein–protein interactions (Caturegli et al. 2000; Mosavi et al. 2004). Twenty-three, 29, and 60 ankyrin genes have been annotated in the Wolbachia strains wMel, wRi, and wPip, respectively (Wu et al. 2004; Klasson et al. 2009; Walker et al. 2007), while wBm, the mutualistic strain of the nematode Brugia malayi, seems to contain only five ankyrin coding genes (Foster et al. 2005). Because wBol1 is phylogenetically close to wPip, the MK strain is also expected to contain a large number of ankyrin coding genes.
The number and density of ankyrin coding genes in pathogenic strains make them good candidates in the search for genes likely to play a role in the interactions between Wolbachia and its host (Iturbe-Ormaetxe et al. 2005; Duron et al. 2007; Walker et al. 2007). Despite intensive efforts, no method for genetically transforming Wolbachia is currently available. Until an efficient transformation protocol exists, comparisons of Wolbachia genomes may provide extremely valuable data. If the mechanisms of MK are similar across strains, the genetic basis of this phenotype should be conserved between these strains, and putative MK genes could potentially be identified. Comparisons of the genomes of phylogenetically related strains that induce different phenotypes in their hosts, such as wBol1 and wPip or wBol1-a and wBol1-b, may identify highly variable genes or genetic features potentially involved in the observed phenotypic differences. To date, only protein coding genes have been investigated as potential genetic mechanisms underlying Wolbachia-induced phenotypes (Sinkins et al. 2005; Walker et al. 2007; Duron et al. 2007). In other systems, small RNA molecules (sRNAs) are known to regulate the translation of targeted genes, for example by base pairing with target mRNAs in bacteria (Tjaden et al. 2006 and references therein). Similarly, MK Wolbachia could use sRNAs, rather than proteins, to distort their hosts’ reproductive systems. Comparative projects should therefore focus not only on the protein coding genes present in Wolbachia genomes, but also on the diversity of sRNA sequences, as these could also play a key role in the distortion of host reproductive systems.

13

Male-Killing Wolbachia in the Butterfly Hypolimnas bolina

13.4.3.3

Role of the Host in the Expression of MK

We have already described the symbiont-induced effects on different aspects of host biology; however, biological interactions are rarely unidirectional. Hosts can also act to mitigate any negative fitness effects associated with the symbiont. Such interactions have been highlighted in different Wolbachia associations, but the molecular mechanisms that underlie them are not understood. In SEA and Samoa, H. bolina evolved resistance to the MK phenotype of wBol1-a, saving males from embryonic death (Hornett et al. 2006; Charlat et al. 2007a). Although investigation of the butterfly genetics is in progress and should soon provide answers (Hornett pers. comm. 2009), it is not yet known whether the resistance mechanism involves one or several genes, or whether this resistance is identical in the SEA and Samoan populations (Charlat et al. 2007a). This repression nevertheless confirms the active involvement of the host in the phenotype induced by its symbiont. More strikingly, butterfly resistance to MK resulted in wBol1-a shifting to inducing CI (Hornett et al. 2008). In general, the reproductive phenotype observed in the natural host has been maintained in transfected hosts (Braig et al. 1994; Riegler et al. 2004; Sakamoto et al. 2005; McMeniman et al. 2008); however, immediate phenotypic shifts after transfection have also been reported (Sasaki et al. 2002, 2005; Jaenike 2007). Phylogenetic studies of Wolbachia have demonstrated that very closely related strains express different phenotypes in their native hosts (Baldo et al. 2006), suggesting that shifts in phenotype expression are probably more common than originally thought. They also suggest that MK and CI might share a similar molecular basis that is expressed differently depending on host genotype (Jaenike 2007). Both phenotypes could be mechanistically similar, with MK having evolved a more extreme outcome than CI.

13.5

Conclusion

Wolbachia have attracted the attention of a large scientific community hoping to understand the biology of this bacterium, which induces such a wide range of host phenotypes and has great potential as a biological control agent against insect pests and vector-borne human diseases (Brelsfoard et al. 2009; McMeniman et al. 2009; Moreira et al. 2009). Many discoveries have been made in the last decade, but a multitude of questions remain to be answered. MK is one of the least understood Wolbachia phenotypes. Although we have a relatively good understanding of how MK Wolbachia affect host populations, genetics and dynamics, the cytological and genomic aspects underlying the MK phenotype both remain poorly understood. We may come closer to finding answers through projects such as whole genome comparisons of MK strains, but we are still far from having resolved all of Wolbachia’s mysteries.

Acknowledgments We would like to thank Dr. I. Iturbe-Ormaetxe, Dr. M. Woolfit and Dr. P. Cook for very constructive comments on the manuscript. We are grateful to the Australian Research Council (DP0772992) and to The University of Queensland (UQCS and UQIRTA) for providing funding.

References

Baldo L, Hotopp JCD, Jolley KA, Bordenstein SR, Biber SA, Choudhury RR, Hayashi C, Maiden MCJ, Tettelin H, Werren JH (2006) Multilocus sequence typing for Wolbachia. Appl Environ Microbiol 72(11):7098–7110
Bandi C, Damiani G, Magrassi L, Grigolo A, Fani R, Sacchi L (1994) Flavobacteria as intracellular symbionts in cockroaches. Proc Biol Sci 257:43–48
Bandi C, Anderson TJC, Genchi C, Blaxter ML (1998) Phylogeny of Wolbachia in filarial nematodes. Proc Biol Sci 265:2407–2413
Beckage NE, Tan FF, Schleifer KW, Lane RD, Cherubin LL (1994) Characterization and biological effects of Cotesia congregata polydnavirus on host larvae of the tobacco hornworm, Manduca sexta. Arch Insect Biochem Physiol 26:165–195
Bourtzis K, Miller TA (eds) (2003) Insect symbiosis. CRC Press, New York, NY
Bourtzis K, Miller TA (eds) (2006) Insect symbiosis, vol 2. CRC Press, New York, NY
Braig HR, Guzman H, Tesh RB, O’Neill SL (1994) Replacement of the natural Wolbachia symbiont of Drosophila simulans with a mosquito counterpart. Nature 367:453–455
Brelsfoard CL, StClair W, Dobson SL (2009) Integration of irradiation with cytoplasmic incompatibility to facilitate a lymphatic filariasis vector elimination approach. Parasit Vectors 2:38
Brownlie JC, Cass BN, Riegler M, Witsenburg JJ, Iturbe-Ormaetxe I, McGraw EA, O’Neill SL (2009) Evidence for metabolic provisioning by a common invertebrate endosymbiont, Wolbachia pipientis, during periods of nutritional stress. PLoS Pathog 5:6
Caturegli P, Asanovich KM, Walls JJ, Bakken JS, Madigan JE, Popov VL, Dumler JS (2000) ankA: an Ehrlichia phagocytophila group gene encoding a cytoplasmic protein antigen with ankyrin repeats. Infect Immun 68(9):5277–5283
Charlat S, Hornett EA, Dyson EA, Ho PPY, Thi-Loc N, Schilthuizen M, Davies N, Roderick GK, Hurst GDD (2005) Prevalence and penetrance variation of male-killing Wolbachia across Indo-Pacific populations of the butterfly Hypolimnas bolina. Mol Ecol 14:3525–3530
Charlat S, Engelstädter J, Dyson E, Hornett E, Duplouy A, Tortosa P, Davies N, Roderick G, Wedell N, Hurst G (2006) Competing selfish genetic elements in the butterfly Hypolimnas bolina. Curr Biol 16:2453–2458
Charlat S, Hornett EA, Fullard JH, Davies N, Roderick GK, Wedell N, Hurst GDD (2007a) Extraordinary flux in sex ratio. Science 317:214
Charlat S, Reuter M, Dyson EA, Hornett EA, Duplouy A, Davies N, Roderick GK, Wedell N, Hurst GDD (2007b) Male-killing bacteria trigger a cycle of increasing male fatigue and female promiscuity. Curr Biol 17:273–277
Charlat S, Davies N, Roderick GK, Hurst GDD (2007c) Disrupting the timing of Wolbachia-induced male-killing. Biol Lett 3:154–156
Charlat S, Duplouy A, Hornett EA, Dyson EA, Davies N, Roderick GK, Wedell N, Hurst GDD (2009) The joint evolutionary histories of Wolbachia and mitochondria in Hypolimnas bolina. BMC Evol Biol 9:64
Chen D-Q, Montllor CB, Purcell AH (2000) Fitness effects of two facultative endosymbiotic bacteria on the pea aphid, Acyrthosiphon pisum, and the blue alfalfa aphid, A. kondoi. Entomol Exp Appl 95:315–323
Christensen B (2004) Tracking of migrant blue moon butterfly, Hypolimnas bolina nerina, using web-based software. Weta 28:47–48
Clarke C, Sheppard PM (1975) The genetics of the mimetic butterfly Hypolimnas bolina (L.). Philos Trans R Soc Lond B Biol Sci 272(917):229–265
Clarke C, Sheppard P, Scali V (1975) All-female broods in the butterfly Hypolimnas bolina (L.). Proc Biol Sci 189:29–37
Clarke SC, Jonhson G, Jonson B (1983) All-female broods in Hypolimnas bolina (L.). A re-survey of West Fiji after 60 years. Biol J Linn Soc 19:221–235
Common IFB, Waterhouse DF (1972) Butterflies of Australia. Angus and Robertson, Sydney
Cordaux R, Michel-Salzat A, Frelon-Raimond M, Rigaud T, Bouchon D (2004) Evidence for a new feminizing Wolbachia strain in the isopod Armadillidium vulgare: evolutionary implications. Heredity 93:78–84
Counce SJ, Poulson DF (1962) Developmental effects of the sex-ratio agent in embryos of Drosophila willistoni. J Exp Zool 151:17–31
Covacin C, Barker SC (2007) Supergroup F Wolbachia bacteria parasitise lice (Insecta: Phthiraptera). Parasitol Res 100:479–485
Duplouy A, Hurst GDD, O’Neill SL, Charlat S (2009) Rapid spread of male-killing Wolbachia in the butterfly Hypolimnas bolina. J Evol Biol. doi:10.1111/j.1420-9101.2009.01891.x
Duron O, Boureux A, Echaubard P, Berthomieu A, Berticat C, Fort P, Weill M (2007) Variability and expression of ankyrin domain genes in Wolbachia infecting the mosquito Culex pipiens. J Bacteriol 189(12):4442–4448
Dyson EA, Hurst GDD (2004) Persistence of an extreme sex-ratio bias in a natural population. PNAS 101(17):6520–6523
Dyson E, Kamath M, Hurst G (2002) Wolbachia infection associated with all-female broods in Hypolimnas bolina (Lepidoptera: Nymphalidae): evidence for horizontal transmission of a butterfly male killer. Heredity 88:166–171
Engelstädter J, Telschow A, Hammerstein P (2004) Infection dynamics of different Wolbachia types within one host population. J Theor Biol 231:345–355
Engelstädter J, Telschow A, Yamamura N (2008) Coexistence of cytoplasmic incompatibility and male-killing-inducing endosymbionts, and their impact on host gene flow. Theor Popul Biol 73:125–133
Fialho RF, Stevens L (2000) Male-killing Wolbachia in a flour beetle. Proc Biol Sci 267:1469–1474
Foster J, Ganatra M, Kamal I, Ware J, Makarova K, Ivanova N, Bhattacharyya A, Kapatral V, Kumar S, Posfai J, Vincze T, Ingram J, Moran L, Lapidus A, Omelchenko M, Kyrpides N, Ghedin E, Wang S, Goltsman E, Joukov V, Ostrovskaya O, Tsukerman K, Mazur M, Comb D, Koonin E, Slatko B (2005) The Wolbachia genome of Brugia malayi: endosymbiont evolution within a human pathogenic nematode. PLoS Biol 3:599–614
Freeland SJ, McCabe BK (1997) Fitness compensation and the evolution of selfish cytoplasmic elements. Heredity 78:391–402
Fry AJ, Palmer MR, Rand DM (2004) Variable fitness effects of Wolbachia infection in Drosophila melanogaster. Heredity 93:379–389
Ghelelovitch S (1952) Sur le déterminisme génétique de la stérilité dans les croisements entre différentes souches de Culex autogenicus Roubaud. C R Acad Sci III 234:2386–2388
Gibbs GW (1961) New Zealand butterflies. Tuatara J Biol Soc 9:65–76
Gibson CM, Hunter MS (2009) Inherited fungal and bacterial endosymbionts of a parasitic wasp and its cockroach host. Microb Ecol 57(3):542–549
Haine ER (2008) Symbiont-mediated protection. Proc Biol Sci 275:353–361
Hamilton WD (1967) Extraordinary sex ratios. Science 156(3774):477–488
Hedges LM, Brownlie JC, O’Neill SL, Johnson KN (2008) Wolbachia and virus protection in insects. Science 322:702
Hertig M, Wolbach SB (1924) Studies on Rickettsia-like microorganisms in insects. J Med Res 44:329–374
Hochberg ME (1991) Viruses as costs to gregarious feeding behaviors in the Lepidoptera. Oikos 61(3):291–296
Hornett EA, Charlat S, Duplouy AMR, Davies N, Roderick GK, Wedell N, Hurst GDD (2006) Evolution of male-killer suppression in a natural population. PLoS Biol 4(9):e283
Hornett EA, Duplouy AMR, Davies N, Roderick GK, Wedell N, Hurst GDD, Charlat S (2008) You can’t keep a good parasite down: evolution of a male-killer suppressor uncovers cytoplasmic incompatibility. Evolution 62(5):1258–1263
Hornett EA, Charlat S, Wedell N, Jiggins CD, Hurst GDD (2009) Rapidly shifting sex ratio across a species range. Curr Biol 19:1628–1631
Huigens ME, Luck RF, Klaassen RHG, Maas MFPM, Timmermans MJTN, Stouthamer R (2000) Infectious parthenogenesis. Nature 405:178–179
Hunter MS, Perlman SJ, Kelly SE (2003) A bacterial symbiont in the Bacteroidetes induces cytoplasmic incompatibility in the parasitoid wasp Encarsia pergandiella. Proc Biol Sci 270:2185–2190
Hurst LD (1991) The incidences and evolution of cytoplasmic male killers. Proc Biol Sci 244:91–99
Hurst GDD, Jiggins FM (2000) Male-killing bacteria in insects: mechanisms, incidence, and implications. Emerg Infect Dis 6(4):329–336
Hurst GDD, Hurst LD, Majerus MEN (1997) Cytoplasmic sex ratio distorters. In: O’Neill SL, Hoffmann AA, Werren JH (eds) Influential passengers, inherited microorganisms and arthropod reproduction. Oxford University Press Inc, New York, pp 125–154
Hurst GDD, von der Schulenburg JHG, Majerus TMO, Bertrand D, Zakharov IA, Baungaard J, Völkl W, Stouthamer R, Majerus MEN (1999a) Invasion of one insect species, Adalia bipunctata, by two different male-killing bacteria. Insect Mol Biol 8(1):133–139
Hurst GDD, Jiggins FM, von der Schulenburg JHG, Bertrand D, West SA, Goriacheva II, Zakharov IA, Werren JH, Stouthamer R, Majerus MEN (1999b) Male-killing Wolbachia in two species of insect. Proc Biol Sci 266(1420):735–740
Hurst GDD, Jiggins FM, Majerus MEN (2003) Inherited microorganisms that selectively kill male hosts: the hidden players of insect evolution? In: Bourtzis K, Miller TA (eds) Insect symbiosis. CRC Press, New York, NY, pp 177–197
Ikeda H (1970) The cytoplasmic-inherited ‘sex-ratio-condition’ in natural and experimental populations of Drosophila bifasciata. Genetics 65:311–333
Iturbe-Ormaetxe I, Riegler M, O’Neill SL (2005) New names for old strains? Wolbachia wSim is actually wRi. Genome Biol 6:401
Jaenike J (2007) Spontaneous emergence of a new Wolbachia phenotype. Evolution 61(9):2244–2252
Jiggins FM, Hurst GDD, Jiggins CD, von der Schulenburg JHG, Majerus MEN (2000) The butterfly Danaus chrysippus is infected by a male-killing Spiroplasma bacterium. Parasitology 120:439–446
Kageyama D, Narita S, Noda H (2008) Transfection of feminizing Wolbachia endosymbionts of the butterfly, Eurema hecabe, into the cell culture and various immature stages of the silkmoth, Bombyx mori. Microb Ecol 56(4):733–741
Kemp DJ (1998) Oviposition behaviour of post-diapause Hypolimnas bolina (L.) (Lepidoptera: Nymphalidae) in tropical Australia. Aust J Zool 46:451–459
Klasson L, Walker T, Sebaihia M, Sanders MJ, Quail MA, Lord A, Sanders S, Earl J, O’Neill SL, Thomson N, Sinkins SP, Parkhill J (2008) Genome evolution of Wolbachia strain wPip from the Culex pipiens group. Mol Biol Evol 25(9):1877–1887
Klasson L, Westberg J, Sapountzis P, Näslund K, Lutnaes Y, Darby AC, Veneti Z, Chen L, Braig HR, Garrett R, Bourtzis K, Andersson SGE (2009) The mosaic genome structure of the Wolbachia wRi strain infecting Drosophila simulans. PNAS 106(14):5725–5730
Laven H (1959) Speciation by cytoplasmic isolation in the Culex pipiens complex. Cold Spring Harb Symp Quant Biol 24:166–175
Li W, Schuler MA, Berenbaum MR (2003) Diversification of furanocoumarin-metabolizing cytochrome P450 monooxygenases in two papilionids: specificity and substrate encounter rate. PNAS 100(Suppl 2):14593–14598
Lindroth RL (1989) Host plant alteration of detoxication activity in Papilio glaucus glaucus. Entomol Exp Appl 50:29–35
Lo N, Evans TA (2007) Phylogenetic diversity of the intracellular symbiont Wolbachia in termites. Mol Phylogenet Evol 44:461–466
Lo N, Paraskevopoulos C, Bourtzis K, O’Neill SL, Werren JH, Bordenstein SR, Bandi C (2007) Taxonomic status of the intracellular bacterium Wolbachia pipientis. Int J Syst Evol Microbiol 57:654–657
Majerus MEN, Hurst GDD (1997) Ladybirds as a model for the study of male-killing symbionts. Entomophaga 42(1/2):13–20
McMeniman CJ, Lane AM, Fong AW, Voronin DA, Iturbe-Ormaetxe I, Yamada R, McGraw EA, O’Neill SL (2008) Host adaptation of a Wolbachia strain after long-term serial passage in mosquito cell lines. Appl Environ Microbiol 74(22):6963–6969
McMeniman CJ, Lane RV, Cass BN, Fong AWC, Sidhu M, Wang Y-F, O’Neill SL (2009) Stable introduction of a life-shortening Wolbachia infection into the mosquito Aedes aegypti. Science 323:141–144
Moran NA (2006) Symbiosis. Curr Biol 16(20):866–871
Moran NA, Munson MA, Baumann P, Ishikawa H (1993) A molecular clock in endosymbiotic bacteria is calibrated using the insect hosts. Proc Biol Sci 253:167–171
Moran NA, Baumann P, von Dohlen C (1994) Use of DNA sequences to reconstruct the history of the association between members of the Sternorrhyncha (Homoptera) and their bacterial endosymbionts. Eur J Entomol 91:79–83
Moran NA, Dunbar HE, Wilcox JL (2005) Regulation of transcription in a reduced bacterial genome: nutrient-provisioning genes of the obligate symbiont Buchnera aphidicola. J Bacteriol 187(12):4229–4237
Moreira LA, Iturbe-Ormaetxe I, Jeffery JAL, Lu G, Pyke AT, Hedges LM, Rocha BC, Hall-Mendelin S, Day A, Riegler M, Hugo LE, Johnson KN, Kay BH, McGraw EA, van den Hurk AF, Ryan PA, O’Neill SL (2009) A Wolbachia symbiont in Aedes aegypti limits infection with dengue, chikungunya and Plasmodium. Cell 139(7):1268–1278
Morishita and Kazuhiko (2002) A migrant from an oceanic island – Hypolimnas bolina, 6 days stay near Zushi Beach, Kanagawa, Japan. Butterflies 32:24–26
Mosavi LK, Cammett TJ, Desrosiers DC, Peng Z-Y (2004) The ankyrin repeat as molecular architecture for protein recognition. Protein Sci 13:1435–1448
Nafus DM (1993) Movement of introduced biological control agents onto nontarget butterflies, Hypolimnas spp. (Lepidoptera: Nymphalidae). Environ Entomol 22(2):265–272
Narita S, Kageyama D, Nomura M, Fukatsu T (2007) Unexpected mechanism of symbiont-induced reversal of insect sex: feminizing Wolbachia continuously acts on the butterfly Eurema hecabe during larval development. Appl Environ Microbiol 73(13):4332–4341
Noda H, Kodama K (1996) Phylogenetic position of yeast-like endosymbionts of Anobiid beetles. Appl Environ Microbiol 62(1):162–167
O’Neill SL, Hoffmann AA, Werren JH (1997) Influential passengers, inherited microorganisms and arthropod reproduction. Oxford University Press Inc, New York
Oliver KM, Campos J, Moran NA, Hunter MS (2007) Population dynamics of defensive symbionts in aphids. Proc Biol Sci 275:293–299
Patrick BH (2004) Invasion of the blue moon butterfly in Taranaki. Weta 28:45–46
Perlman SJ, Kelly SE, Hunter MS (2008) Population biology of cytoplasmic incompatibility: maintenance and spread of Cardinium symbionts in a parasitic wasp. Genetics 178:1003–1011
Poulton EB (1923) All female families of Hypolimnas bolina, bred in Fiji by HW Simmonds. Proc R Ent Soc Lond 1923:9–12
Ramsay GW (1971) The blue moon butterfly Hypolimnas bolina nerina in New Zealand during autumn, 1971. N Z Entomol 5:73–75
Ramsay GW, Ordish RG (1966) The Australian blue moon butterfly Hypolimnas bolina nerina (F.) in New Zealand. NZ J Sci 9:719–729
Randerson JP, Smith NGC, Hurst LD (2000) The evolutionary dynamics of male-killers and their hosts. Heredity 84:152–160
Riegler M, Charlat S, Stauffer C, Merçot H (2004) Wolbachia transfer from Rhagoletis cerasi to Drosophila simulans: investigating the outcomes of host-symbiont coevolution. Appl Environ Microbiol 70(1):273–279
Rigaud T (1997) Inherited microorganisms and sex determination of arthropod hosts. In: O’Neill SL, Hoffmann AA, Werren JH (eds) Influential passengers, inherited microorganisms and arthropod reproduction. Oxford University Press Inc, New York, pp 81–101
Ruan Y-M, Xu J, Liu S-S (2006) Effects of antibiotics on fitness of the B biotype and a non-B biotype of the whitefly Bemisia tabaci. Entomol Exp Appl 121:159–166
Russell JA, Moran NA (2005) Horizontal transfer of bacterial symbiont: heritability and fitness in a novel aphid host. Appl Environ Microbiol 71(12):7987–7994
Ryan PA, Harris AC (1990) A note of recent records of Australian butterflies in New Zealand. N Z Entomol 13:40–41
Sakamoto H, Ishikawa Y, Sasaki T, Kikuyama S, Tatsuki S, Hoshizaki S (2005) Transinfection reveals the crucial importance of Wolbachia genotypes in determining the type of reproductive alteration in the host. Genet Res 85:205–210
Sasaki T, Kubo T, Ishikawa H (2002) Interspecific transfer of Wolbachia between two lepidopteran insects expressing cytoplasmic incompatibility: a Wolbachia variant naturally infecting Cadra cautella causes male-killing in Ephestia kuehniella. Genetics 162:1313–1319
Sasaki T, Massaki N, Kubo T (2005) Wolbachia variant that induces two distinct reproductive phenotypes in different hosts. Heredity 95:389–393
Simmonds HW (1926) Sex ratio of Hypolimnas bolina in Viti Levu, Fiji. Proc R Ent Soc Lond 1:29–32
Sinkins SP, Walker T, Lynd AR, Steven AR, Makepeace BL, Godfray HC, Parkhill J (2005) Wolbachia variability and host effects on crossing type in Culex mosquitoes. Nature 436:257–260
Stamp NE, Bowers MD (1988) Direct and indirect effects of predatory wasps (Polistes sp.: Vespidae) on gregarious caterpillars (Hemileuca lucina: Saturniidae). Oecologia 75:619–624
Stouthamer R, Kazmer D (1994) Cytogenetics of microbe-associated parthenogenesis and its consequences for gene flow in Trichogramma wasps. Heredity 73:317–327
Stouthamer R, Breeuwer JAJ, Hurst GDD (1999) Wolbachia pipientis: microbial manipulator of arthropod reproduction. Annu Rev Microbiol 53:71–102
Taylor MJ, Hoerauf A (1999) Wolbachia bacteria of filarial nematodes. Parasitol Today 15(11):437–442
Teixeira L, Ferreira A, Ashburner M (2008) The bacterial symbiont Wolbachia induces resistance to RNA viral infections in Drosophila melanogaster. PLoS Biol 6(12):2753–2763
Tjaden B, Goodwin SS, Opdyke JA, Guillier M, Fu DX, Gottesman S, Storz G (2006) Target prediction for small, noncoding RNAs in bacteria. Nucleic Acids Res 34(9):2791–2802
Tram U, Sullivan W (2002) Role of delayed nuclear envelope breakdown and mitosis in Wolbachia-induced cytoplasmic incompatibility. Science 296:1124–1126
van Nouhuys S, Hanski I (2005) Metacommunities of butterflies, their host plant, and their parasitoids. In: Holyoak M, Leibold MA, Holt RD (eds) Metacommunities: spatial dynamics and ecological communities. University of Chicago Press, USA
Vandekerckhove TTM, Watteyne S, Willems A, Swings JG, Mertens J, Gillis M (1999) Phylogenetic analysis of the 16S rDNA of the cytoplasmic bacterium Wolbachia from the novel host Folsomia candida (Hexapoda, Collembola) and its implications for Wolbachia taxonomy. FEMS Microbiol Lett 180:179–286
Veneti Z, Bentley JK, Koana T, Braig HR, Hurst GDD (2005) A functional dosage compensation complex required for male-killing in Drosophila. Science 307:1461–1463
Walker T, Klasson L, Sebaihia M, Sanders MJ, Thomson NR, Parkhill J, Sinkins SP (2007) Ankyrin repeat domain-encoding genes in the wPip strain of Wolbachia from the Culex pipiens group. BMC Biol 5(39):1–9
Weeks AR, Marec F, Breeuwer JAJ (2001) A mite species that consists entirely of haploid females. Science 292:2479–2482
Wen Z, Rupasinghe S, Niu G, Berenbaum MR, Schuler MA (2006) CYP6B1 and CYP6B3 of the Black Swallowtail (Papilio polyxenes): adaptive evolution through subfunctionalization. Mol Biol Evol 23(12):2434–2443
Werren JH (1987) The coevolution of autosomal and cytoplasmic sex ratio factors. J Theor Biol 124:317–334
Werren JH, O’Neill SL (1997) The evolution of heritable symbionts. In: O’Neill SL, Hoffmann AA, Werren JH (eds) Influential passengers, inherited microorganisms and arthropod reproduction. Oxford University Press Inc, New York, pp 1–41
Werren JH, Windsor D, Guo L (1995) Distribution of Wolbachia among neotropical arthropods. Proc Biol Sci 262:197–204
Wu M, Sun LV, Vamathevan J, Riegler M, Deboy R, Brownlie JC, McGraw EA, Martin W, Esser C, Ahmadinejad N, Wiegand C, Madupu R, Beanan MJ, Brinkac LM, Daugherty SC, Durkin AS, Kolonay JF, Nelson WC, Mohamoud Y, Lee P, Berry K, Young MB, Utterback T, Weidman J, Nierman WC, Paulsen IT, Nelson KE, Tettelin H, O’Neill SL, Eisen JA (2004) Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol 2:327–341
Yen JH, Barr AR (1971) New hypothesis of the cause of cytoplasmic incompatibility in Culex pipiens L. Nature 232:657–658
Zhou W, Rousset F, O’Neill SL (1998) Phylogeny and PCR-based classification of Wolbachia strains using wsp gene sequences. Proc Biol Sci 265(1395):509–515

Chapter 14

Evolution of Immunosuppressive Organelles from DNA Viruses in Insects

Brian A. Federici and Yves Bigot

Abstract Endoparasitic wasps inject particles into their lepidopteran hosts that enable these parasitoids to evade or directly suppress the hosts’ innate immune response, especially encapsulation by hemocytes. For decades, these particles have been considered virions produced by DNA viruses known as polydnaviruses (family Polydnaviridae). Structurally, there are two main types of particles, resembling the virions of baculoviruses and ascoviruses, respectively. These particles contain double-stranded DNA in the form of multiple small circular molecules that are transcribed but not replicated in cells of the lepidopteran hosts. Instead, particle DNA is replicated from the wasp genome and selectively amplified for packaging into the particles in the reproductive tract of female wasps. Once assembled and secreted into the calyx lumen, the particles become mixed with eggs and are injected into caterpillars during wasp oviposition. Particle DNA, referred to as the “viral genome,” has now been sequenced for several polydnaviruses. Annotation shows that most of this DNA consists of noncoding DNA or wasp genes, not viral genes. More significantly, recent studies have shown that particle structural proteins are encoded by the wasp genome rather than by particle DNA, but are of viral origin. Together, these findings provide strong evidence that these particles originated from viruses but, through symbiogenesis followed by gene deletion and acquisition, evolved into transducing organelles that shuttle wasp immunosuppressive genes into their hosts, thereby enhancing wasp progeny survival and species radiation.

B.A. Federici
Department of Entomology, University of California, Riverside, 900 University Avenue, Riverside, California 92521, USA
Laboratoire d’Etude des Parasites Génétiques, Parc Grandmont, Université de Tours, U.F.R. des Sciences et Techniques, 37200 Tours, France
e-mail: [email protected]

Y. Bigot
Laboratoire d’Etude des Parasites Génétiques, Parc Grandmont, Université de Tours, U.F.R. des Sciences et Techniques, 37200 Tours, France

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_14, © Springer-Verlag Berlin Heidelberg 2010

14.1

Introduction

14.1.1

Background

George Salt at the University of Cambridge published a series of pioneering studies during the 1960s aimed at understanding how endoparasitic wasps circumvent the innate immune response of their caterpillar hosts. Based on studies of the ichneumonid parasitoid Venturia (then Nemeritis) canescens and its lepidopteran host, larvae of the Mediterranean flour moth, Ephestia kuehniella, he determined that parasitoid eggs gained protection as they passed through the calyx (egg storage region) of the female wasp’s reproductive tract (Salt 1965, 1966, 1968). This protection was due to a coating added to the eggs in the calyx. Subsequently, Susan Rotheram, one of Salt’s graduate students, determined that this coating contained masses of enveloped virus-like particles about 130 nm in diameter. After assembly in calyx cell nuclei, these particles were secreted into the calyx lumen, where they adhered to a fibrillar matrix on the egg surface (Rotheram 1967). In later studies, Rotheram (1973a, b) showed that the particles contained protein and complex sugars, but no DNA. Another of Salt’s graduate students then found that a major particle glycoprotein was responsible for the immunoprotection (Bedwin 1979a, b). Building on these studies, Otto Schmidt and his collaborators in Germany showed that this protein was encoded in the wasp genome, but likely originated from basal lamina proteins found in the caterpillar host (Schmidt and Schuchmann-Feddersen 1989; Schmidt and Theopold 1991; Schmidt et al. 2001). After Salt and Rotheram’s studies, Vinson and colleagues, as well as others, found that particles in the calyx fluid of the endoparasitic wasps Campoletis sonorensis (an ichneumonid) and Cardiochiles nigriceps (a braconid) also suppressed the immune response of their caterpillar hosts (Vinson 1972; Vinson and Scott 1975; Vinson 1990). These particles were also produced in the nuclei of calyx cells but, though morphologically similar to V. canescens particles, they contained DNA.
These findings stimulated numerous investigations of the calyx gland and secretions of many endoparasitic wasps of the families Ichneumonidae and Braconidae, revealing two major particle types, one in ichneumonids and another in braconids (see Stoltz and Vinson 1979, and Vinson 1990; Webb et al. 2005). When first discovered, the ichneumonid particles were not typical of virions of any known type of insect virus (Fig. 14.1). They were bound by two unit membranes, were oblong to globular in shape, and ranged from 130 to 150 nm in diameter by 300–400 nm in length, with a fusiform nucleocapsid (Webb et al. 2005). Later, viruses of a new family, the ascoviruses (family Ascoviridae) were discovered that attacked caterpillars, replicating and produces progeny virions in various host tissues. The virions produced by ascoviruses are structurally similar to the ichneumonid particles and are transmitted by parasitic wasps (Federici 1983; Federici et al. 2005). In contrast to the ichneumonid particles, those produced by braconid wasps resembled nudivirus virions and similar virions of the occluded form of baculoviruses (Burand 1998; Wang and Jehle 2009). They consisted primarily of one or more cylindrical

14 Evolution of Immunosuppressive Organelles from DNA Viruses in Insects

Fig. 14.1 Transmission electron micrographs of immunosuppressive particles produced by endoparasitic braconid and ichneumonid wasps. (a) Bracovirus particles. (b) Ichneumonid particles. The bracovirus particles resemble nudivirus and baculovirus virions, and molecular evidence now indicates that these particles have their origin in an ancestral nudivirus. The ichneumonid particles resemble ascovirus virions, but their origin remains uncertain at present. Bars = 200 nm. Original micrographs by D.B. Stoltz

particles surrounded by a single envelope (Fig. 14.1a). The cylindrical inner particle varied in length from 30 to 100 nm, even within the same wasp species. Similar particles have been identified in more than 50 wasp species. Most insect viruses carry their genome as a single circular molecule. In these particles, by contrast, the DNA occurs as numerous circular molecules. These range in size from a few to many kbp and are referred to as segmented, polydispersed, or multipartite DNA (Stoltz 1993; Webb et al. 2005). Most evidence indicates that these particles do not have a genome per se; rather, their DNA is part of the wasp genome (Espagne et al. 2004; Webb et al. 2006; Desjardins et al. 2008). Genes contained in the particles are expressed in the nuclei of the parasitoid’s caterpillar host cells. As far as is known, however, no particle DNA replication occurs in these cells, and the particles produce no progeny. From the standpoint of a viral life cycle, they are a dead end.

14.1.2 Establishment of the Family Polydnaviridae

A new virus family, Polydnaviridae (“Poly” referring to the polydispersed DNA), was established to accommodate these newly discovered viruses (Stoltz et al. 1984). The decision was based on the unusual physical and biological properties of the particles and on their obligate symbiotic relationship with wasps (Edson et al. 1981). Establishment of this family formalized the recognition of two genera: Ichnovirus (ichnoviruses) for particles produced by ichneumonid wasps, and Bracovirus (bracoviruses) for particles produced by braconids (Webb et al. 2005). At the time these genera were erected, the particles were considered infectious viruses capable of replication, at least in calyx cells, much like other types of viruses. Although molecular data were not

B.A. Federici and Y. Bigot

sufficient at that time to undertake meaningful comparisons of these viruses, the available information suggested that each type of particle became associated with its wasp family independently. The significant structural differences between the particles of the two virus types pointed the same way. Thus, their similar functional roles in parasite biology and success were, and still are, considered the result of convergent evolution.

14.1.3 Particle Function: General Mechanisms of Viral Immunosuppression

Detailed studies of several polydnavirus/parasitoid systems have shown that these virus-like particles are required to suppress the immune systems of the wasps’ hosts. This holds for wasps in the major braconid and ichneumonid lineages (Whitfield 2002a, b) and for all species studied to date (Vinson 1990; Stoltz 1993; Webb et al. 2005, 2006). Depending on the specific system, suppression occurs in one of three ways:

- by molecular mimicry, in which the surface of the egg and early larval instars is coated with particles not recognized as foreign;
- by hemocyte inactivation, through expression of particle genes after oviposition;
- or by both mechanisms.

The products of many of the genes carried by these particles also inhibit components of innate immune pathways, including the Toll and Imd pathways. How well we understand the ways in which particle genes elude or incapacitate innate immune responses varies considerably from one wasp species to another, and our understanding of these processes is still at an early stage. Our purpose in this chapter, therefore, is not to discuss specific particle functions. Instead, we summarize the key data supporting the concept that these particles, though they originated as virions, are a novel type of organelle that arose by lateral gene transfer/symbiogenesis. Readers interested in detailed discussions of particle functions, and of their similarities and differences, are referred to the excellent articles by Webb et al. (2006) and Tanaka et al. (2007).

14.2 Polydnavirus Particles as Organelles Rather Than Virions – the Concept

The structural similarity of braconid particles to baculovirus virions, and of ichneumonid particles to ascovirus virions, made these viruses obvious candidates as the evolutionary sources of the two types of immunosuppressive particles (Federici 1991; Federici and Bigot 2003). When braconid particles were discovered, the baculoviruses consisted of two main types. “Occluded” baculoviruses had virions embedded in a protein matrix, and “nonoccluded” baculoviruses did not. The nonoccluded baculoviruses were later reclassified into a new group, the nudiviruses. The nudiviruses form a small but very diverse group of nonoccluded viruses from insects and crustaceans that share 33 core genes with baculoviruses (out of more than 100), but differ in host range and pathology (Wang and Jehle 2009). Of evolutionary importance, one of these nudiviruses, HzNV-2, replicates in the reproductive tract of the lepidopteran Heliothis zea, a host commonly used by many braconid and ichneumonid wasps. Of particular significance is the recent finding that an ancestral nudivirus is the likely source of the structural proteins that make up the immunosuppressive particles of braconid wasps, proteins now encoded by the wasps themselves (Bézier et al. 2009). The evidence for the origin of the ichneumonid immunosuppressive particles is not nearly as strong as that for the braconids. Even so, recent molecular analyses suggest that these particles originated from ascoviruses or a related ancestral virus (Bigot et al. 2008). Data supporting these origins are discussed in more detail below.

Although the braconid and ichneumonid particles clearly resemble nudivirus and ascovirus virions, even early studies showed that they lacked properties characteristic of all viruses. For example, once within a lepidopteran host cell, the particle DNA was not replicated. Moreover, in no case were progeny virions produced to disseminate the virus and infect the next host or cell. A further line of evidence was that both the so-called infection of host cells and particle production in wasp tissues were strictly under the control of the wasp. In every virus, whatever its interactions with the host cell, it is the virus, not the host cell, that controls the synthesis of viral proteins and the replication of viral DNA.
Yet braconid and ichneumonid particles were produced only in female wasps, only in a narrow region of the reproductive tract, and only in pupal and adult tissues as eggs were being produced (Webb et al. 2006). A second problem for classifying the particles as virions was the existence of similar immunosuppressive particles that contain no DNA. Examples include those of the ichneumonid V. canescens, discussed above (Rotheram 1967), and, more recently, those found in other parasitic wasps (Barratt et al. 1999). Even before the DNA in the particles was sequenced, there was substantial evidence that they were not virions. The question therefore became: what are they? The most obvious analogs were organelles such as mitochondria and plastids. These originated from bacteria through the fusion of genomes, that is, symbiogenesis followed by gene loss and acquisition (Margulis and Fester 1991; Margulis 1992; Khakhina 1992). The evidence is now indisputable that mitochondria and chloroplasts originated from bacteria that became endosymbionts and subsequently evolved into organelles. By analogy, the same evolutionary processes occurred, although much more recently, between endoparasitic braconid and ichneumonid wasps and at least two different types of viruses. For the braconids, the source was an ancestral nudivirus. For the ichneumonids, it was probably an ancestral ascovirus or iridovirus (the latter being the ancestor of the ascoviruses). The molecular evidence that the ichneumonid particles originated from ascoviruses is still weak. By contrast, the evidence that bracoviruses originated from an ancestral nudivirus is now very strong (Bézier et al. 2009).

At present, polydnavirus researchers continue to refer to the braconid and ichneumonid particles as bracovirus and ichnovirus virions, respectively, despite overwhelming evidence to the contrary from their own studies (Webb et al. 2006; Tanaka et al. 2007; Bézier et al. 2009). We take a different view, based on the molecular data on the particles’ evolution, their current genetic complements, and their functions. We argue that these immunosuppressive particles should be recognized for what they are: organelles that evolved from viruses. Continuing to view these organelles as “symbiotic viruses” masks a far more interesting biological and evolutionary phenomenon. It also contravenes the definitions of such fundamental concepts as a virus, a genome, and symbiosis. If these particles are viruses, we have a tripartite system: a virus, a wasp, and its lepidopteran host (Webb et al. 2006). Viewing the particles as organelles makes it a bipartite system: a wasp with a novel organelle encoded in its genome, and a lepidopteran host (Federici and Bigot 2003). We think that this new paradigm better explains the particles’ biological properties and diversity. It also leads to better hypotheses for testing how the particles evolved and how they facilitated the evolution of wasps and their insect hosts.

Below, we elaborate on some of the key evidence for the likely evolutionary pathways that led to these novel organelles. We move from the braconid system, for which the most molecular data are available, to the ichneumonid system. We finish with several other endoparasitic wasp/insect host systems that may represent different phases of the symbiotic evolutionary process. These range from:

1. tripartite systems consisting of a wasp, a true virus, and an insect host;
2. bipartite systems consisting of a wasp with an organelle that has a DNA complement, and an insect host;
3. bipartite systems consisting of a wasp with an organelle lacking a DNA complement, and an insect host.

14.3 The Evolution of Braconid Particles from Nudiviruses

14.3.1 Early Studies of Nudiviruses in Braconid Wasps and Their Hosts

Several viruses that have the structural features of nudiviruses have been known for many years. For example, the nudivirus of the braconid Microplitis croceipes is transmitted vertically, replicates in hemocytes and other tissues, and causes significant pathology and mortality in adult wasps (Hamm et al. 1988). A more interesting nudivirus is the so-called filamentous virus (FV) of the braconid Cotesia marginiventris. CmFV is apparently a benign virus transmitted vertically by C. marginiventris. It replicates in cells of both the wasp’s lateral and common oviducts (the latter near the calyx), and in cells of its lepidopteran hosts, including Helicoverpa zea and Spodoptera frugiperda (Hamm et al. 1990). Structurally, the virions of these wasp-transmitted viruses resemble those of two nudiviruses, Hz-I and the Gonad-Specific Virus. These occur, respectively, in cell lines derived from H. zea and in the gonadal tissues of this species (Burand 1998). The Microplitis and CmFV nudiviruses are apparently maintained in host populations by vertical transmission. An even more interesting nudivirus is Hz-NV1, a large virus with a genome of 228 kbp (Wang and Jehle 2009). This virus integrates into the chromosomes of Trichoplusia ni (TN 368) and S. frugiperda (SF21AE and SF9) cells, where it can establish a latent infection (Lin et al. 1999). This is particularly relevant to symbiogenesis. It shows that a large, circular dsDNA genome can integrate into the chromosomes of its insect host cells, which provides a possible route by which full or partial nudivirus genomes entered wasp genomic DNA. These examples are few. They do, however, illustrate the types of virus/host systems that, over evolutionary time, could lead to the integration of nudivirus or baculovirus genomes into those of their wasp hosts. Owing to the studies of Espagne et al. (2004) and, more recently, Bézier et al. (2009), we now have very strong evidence that such an integration actually occurred. Based on the estimates of Whitfield (2002a), it took place a little less than 100 million years ago.

14.3.2 Molecular Evidence for the Evolution of Braconid Particles from a Nudivirus

A viral paradigm predicts that the DNA in the virions encodes virion structural proteins and the enzymes needed for replication and assembly. An organelle paradigm, on the other hand, predicts a significant reduction in genome size. It also predicts that many, if not most, of the original genes were transferred to the nuclear genome or lost during evolution. Thus, before the DNA of any braconid or ichneumonid particles (the so-called “viral genomes”) was sequenced, we predicted that most of it would consist of wasp genes, that is, DNA originating from wasp chromosomes (Federici 1991; Federici and Bigot 2003). The first significant confirmation of the organelle paradigm came from sequencing the DNA in the particles of the braconid wasp Cotesia congregata (Espagne et al. 2004). This important study showed that fewer than 2% of the genes were related to those of any known virus. Most of the genes encoded proteins with physiological functions, such as protein tyrosine phosphatases, ankyrins, cysteine-rich proteins, and cystatins. Some were related to genes found in the particles of other braconid species, but none was related to any known virion structural protein. Similar findings have since been reported for the particle “genomes” of other braconids, including Glyptapanteles indiensis and G. flavicoxis (Desjardins et al. 2008). In all particles sequenced to date, the DNA consists mostly of noncoding DNA of wasp origin and DNA that codes for wasp proteins. Some of these genes may well have originated from viruses or bacteria. Even so, they have likely been part of wasp genomes for millions of years and are therefore, in essence, wasp genes.

The structural characteristics of the particles made it probable that they originated from a baculovirus or nudivirus. Yet these results made clear that, unlike the genome of any other known virus, the “genomic” DNA could not be used to identify the virus from which the particles evolved. Nor would these “genomes” be very useful for polydnavirus systematics: if the particle DNA is wasp DNA, its sequences would likely reflect the relationships of the wasps. Evidence for this was already apparent years ago from the braconid particle DNAs of several Cotesia species (Whitfield 2002b). It had long been known that the braconid particles are produced in calyx cells. A better way to trace their origin was therefore to clone and sequence transcripts from reproductive tissues at the time of particle production. In another important and insightful paper, Bézier et al. (2009) sequenced 5,000 expressed sequence tags (ESTs) from the ovaries of two braconid wasps, Chelonus inanitus and C. congregata, and one ichneumonid, Hyposoter didymator. The ichneumonid sequences showed no relationship to known viral proteins, but the braconid sequences proved very informative. The authors identified 22 sequences related to nudiviruses, 13 of which were core genes shared with baculoviruses. These genes corresponded to nudivirus and baculovirus virion structural proteins, proteins involved in virion assembly, and subunits of viral RNA polymerases.
No polymerases involved in DNA replication were detected, which indicates that wasp polymerases were likely responsible for synthesizing the braconid particle “genomes.” These data also establish the origin of crucial particle components and of the proteins needed for particle assembly. They show clearly that these proteins are all encoded in the wasp genome and are strictly regulated by it, again a property not characteristic of any known virus.

14.4 Origin and Evolution of Ichneumonid Particles

As with braconid particles, the DNA in ichneumonid particles consists primarily of noncoding ichneumonid wasp DNA and genes coding for ichneumonid proteins involved in immunosuppression. As discussed below, this DNA is of some value for suggesting possible viral origins of these particles. However, we do not yet have the kind of information for ichneumonid wasps that is described above for the braconid particles. The structure of the ichneumonid particles suggests that they originated from ascoviruses. Fortunately, we do have reasonably good molecular data for the evolution of ascoviruses from iridoviruses (Stasiak et al. 2003). We therefore first review the pertinent features of iridoviruses and ascoviruses. We then review the limited molecular evidence that the ichnoviruses evolved from an ascovirus, or from an iridovirus ancestor of the ascoviruses.

14.4.1 Family Iridoviridae

The family Iridoviridae comprises a diverse group of enveloped, double-stranded (ds) DNA viruses that produce large icosahedral virions, typically 125–160 nm in diameter (Fig. 14.2). These viruses are common in invertebrates, particularly insects, but also occur among vertebrates (Chinchar et al. 2005). In insects, iridoviruses have a broad tissue tropism and infect and replicate in most tissues. The unusual exception is the midgut epithelium, a tissue that most insect viruses attack readily. Consistent with this tropism, iridoviruses are poorly infectious per os (Federici 1993). Once within a cell, iridovirus DNA replication, formation of the virogenic stroma, and virion assembly all take place in the cytoplasm. Iridoviruses have been reported from diverse lepidopteran hosts, including the rice stem borer, Chilo suppressalis (Pyralidae), the American armyworm, Heliothis armigera (Noctuidae), and the fall armyworm, S. frugiperda (Noctuidae). One case bears directly on the possibility that an ancestral iridovirus or ascovirus is the source of the ichneumonid particles. The ichneumonid Eiphosoma vitticolle, which parasitizes larvae of the fall armyworm, S. frugiperda, is also infected by an iridovirus and transmits this virus to fall armyworm populations in the field (Lopez et al. 2002).

Fig. 14.2 Electron micrographs of iridovirus and ascovirus virions. Iridovirus virions observed in negatively stained preparations (a) and by transmission electron microscopy (b), respectively. Ascovirus virions as observed in negatively stained preparations (c) and by transmission electron microscopy (d), respectively. Despite the marked difference in virion structure, molecular evidence indicates these two types of viruses are closely related, and that the ascoviruses evolved from iridoviruses. Bar = 100 nm

14.4.2 Family Ascoviridae

The ascoviruses (family Ascoviridae) are dsDNA viruses that attack lepidopterans. They are characterized by large, enveloped virions, 130 × 400 nm, which vary in shape from allantoid to bacilliform depending on the species (Federici et al. 2005). Structural studies suggest that ascovirus virions contain two unit membranes. One is part of the inner particle surrounding the DNA core, and the other forms part of the outer virion envelope (Fig. 14.2). There are significant differences between ascovirus virions and ichneumonid particles, but they nevertheless correspond in size and general morphology (Figs. 14.1 and 14.2). Each ascovirus virion contains a single dsDNA genome, which ranges from 138 to 180 kb depending on the species. Four species of ascoviruses are recognized: S. frugiperda ascovirus (SfAV-1a), Trichoplusia ni ascovirus (TnAV-2a), Heliothis virescens ascovirus (HvAV-3a), and Diadromus pulchellus ascovirus (DpAV-4a). The first three occur in the United States in noctuid species. Their hosts include the cabbage looper, T. ni, cotton budworms and bollworms of Heliothis and Helicoverpa species, and armyworms (Spodoptera species). These viruses are pathogens that kill the wasp’s host and, as a result, the wasp larvae as well. The fourth, DpAV-4a, occurs in France, where it attacks the pupae of the leek moth, Acrolepiopsis assectella (family Yponomeutidae). This ascovirus is a true symbiotic virus that enhances the parasitic success of its wasp vector. All ascoviruses replicate their genomic DNA, producing large numbers of progeny virions in their caterpillar or pupal hosts. Ascoviruses also differ from all other viruses in how they use the infected cell. After invading a cell, they destroy the nucleus and direct the cell to cleave into numerous vesicles, in which virion assembly proceeds. These vesicles are liberated from tissues into the hemolymph. There, female wasps acquire them mechanically during oviposition and transmit them to new caterpillar hosts.
Beyond their structural similarities, ascovirus virions and ichneumonid particles both depend on parasitic wasps for transmission. Much like insect iridoviruses, ascoviruses are very difficult to transmit per os, but are highly infectious when transmitted by parasitoids or by injection (Hamm et al. 1985). Even more important for the organelle paradigm and symbiogenesis is the case of the D. pulchellus ascovirus (DpAV-4a). Its genome is carried in a nonintegrated form in the nuclei of both males and females of its ichneumonid wasp vector, D. pulchellus (Bigot et al. 1997a, b). This is exactly the kind of association one would expect in an evolutionary intermediate between ascoviruses and ichnoviruses.

14.4.3 Molecular Evidence for the Evolution of Ascoviruses from Iridoviruses

As noted above, the molecular evidence that ichnovirus particles evolved from ascoviruses is very limited. We therefore first discuss the existing data for the evolution of ascoviruses from iridoviruses. These data matter for the hypothesis that ichneumonid particles evolved from ascoviruses because ascoviruses differ greatly from iridoviruses in cytopathology and virion morphology. If ascoviruses (which, recall, are transmitted by parasitoids) evolved from iridoviruses despite these differences, an ichnovirus origin from ascoviruses becomes more plausible, since the change in virion structure would be smaller. The evidence that ascoviruses evolved from iridoviruses rests on analyses of four proteins found in a diversity of vertebrate and invertebrate dsDNA viruses: the major capsid protein (MCP), DNA polymerase, thymidine kinase (TK), and ATPase III. Our analyses, performed using parsimony and neighbor-joining programs, indicate that all these viruses evolved from the same viral ancestor (Stasiak et al. 2000, 2003). The tree topologies from these proteins vary, but two significant patterns are apparent. First, ascoviruses and iridoviruses are more closely related to each other than to the algal or vertebrate viruses in this viral lineage. Second, and more significantly, the TK and ATPase trees place the lepidopteran Chilo iridovirus (CIV) closer to the ascoviruses than to any vertebrate iridovirus (Stasiak et al. 2000, 2003). That the CIV and ascovirus MCPs do not cluster on the same branch is not surprising, given the marked differences in virion shape (Fig. 14.2). The analyses also separate the ascoviruses by mode of transmission. Those mechanically vectored by wasps (SfAV-1a, TnAV-2a, and HvAV-3a) cluster together on one branch of the ascovirus tree. DpAV-4a, which is vertically transmitted by its wasp host, falls on a separate branch. This split matches an important biological difference: DpAV-4a has a more intimate association with its wasp vector.
In summary, the data indicating that ascoviruses evolved from iridoviruses must be considered preliminary, as the genes analyzed represent only a small portion of those encoded by these viruses. The results are nevertheless important because the patterns they reveal are consistent with the biology of virus transmission by parasitic wasps. More recent molecular studies, specifically the sequencing of the DpAV-4a genome, suggest that the ichneumonid particles may in fact have originated from an ancestral iridovirus. We noted above that the ichneumonid E. vitticolle, a parasite of noctuid caterpillars, can both transmit and be infected by an iridovirus (Lopez et al. 2002). Annotation of the DpAV-4a genome showed that it shares more core genes with lepidopteran iridoviruses than do the more common, highly pathogenic ascoviruses, such as SfAV-1, TnAV-2, and HzAV-3 (Bigot et al. 2009). These findings again illustrate the need for more genomic sequence data on the iridoviruses and ascoviruses that infect lepidopteran insects.
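The tree-building step referred to above can be made concrete with a minimal Python sketch of the standard neighbor-joining algorithm. This is not the software used in the analyses cited. The taxon labels and the distance matrix are arbitrary placeholders taken from a common textbook example, not distances between real viral proteins. At each step the method joins the pair of nodes that minimizes the Q criterion, then recomputes distances to the new internal node.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Return an unrooted tree in Newick format from a symmetric distance table.

    labels: list of taxon names; dist: dict of dicts, dist[x][y] == dist[y][x].
    """
    nodes = list(labels)
    # Working copy of the distances; internal nodes are added as they are created.
    d = {a: {b: dist[a][b] for b in nodes if b != a} for a in nodes}

    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all other current nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node to a and to b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        # Distance from the new node to every remaining node.
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    # Two nodes remain; the tree is unrooted, so the final edge is placed on one side.
    a, b = nodes
    return f"({a}:{d[a][b]:.3f},{b}:0.000);"


def symmetric(pairs):
    """Expand {(x, y): distance} into a symmetric dict of dicts."""
    dist = {}
    for (x, y), v in pairs.items():
        dist.setdefault(x, {})[y] = v
        dist.setdefault(y, {})[x] = v
    return dist


# Placeholder distances (hypothetical taxa a-e), not real viral data.
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
tree = neighbor_joining(["a", "b", "c", "d", "e"], symmetric(pairs))
print(tree)
```

Real phylogenetic studies would estimate distances from aligned protein sequences under a substitution model and assess support, for example by bootstrapping. The sketch only shows how the joining criterion turns a distance matrix into a tree.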

14.4.4 Molecular Data Supporting an Iridovirus/Ascovirus Origin for Ichneumonid Particles

Though the molecular evidence at this stage is minimal, and despite the findings regarding the DpAV-4 genome noted above, BLAST results obtained with several

Fig. 14.3 Map of the 13-kbp region of the DpAV4 genome (EMBL Acc. No. CU469068 and CU467486) that contains the gene cluster with direct homologs in the genome of the Glypta fumiferanae ichnovirus (GfIV). White arrows show DpAV-4 ORFs with well-characterized direct homologs in other ascovirus and iridovirus genomes. Black arrows show ORFs with homologs among GfIV genes (from Bigot et al. 2008). The scale below the map is in kbp

ORFs in this genome provide evidence that certain ichnovirus ORFs have their closest relatives in ascovirus genomes. Specifically, we identified a 13-kbp region containing a cluster of three genes (Fig. 14.3; ORF90, 91, and 93; Bigot et al. 2008). These genes have close homologs in a seven-member GfIV gene family (Lapointe et al. 2007). All contain a domain similar to a conserved domain of the pox-D5 family of NTPases. This pox-D5 domain is an NTP-binding domain of about 250 amino acid residues. To date, it has been found only in viral proteins encoded by poxvirus, iridovirus, ascovirus, and mimivirus genomes. The gene family seems to be specific to GfIV, since it is absent from the three other sequenced ichnovirus genomes: CsIV, Tranosema rostrale ichnovirus (TrIV), and Hyposoter fugitivus ichnovirus (HfIV).

In DpAV-4, ORF90 encodes a protein of 925 amino acid residues. From position 140 to 925, it is 40% similar to a 972-residue protein encoded by ORF1 of segment C20 in the GfIV genome, so the two proteins can be considered putative orthologs. The 480 C-terminal residues of the DpAV-4 protein show the following similarities:

- 42% to the C-terminal domain of the homologous proteins encoded by ORF1 of GfIV segments D1 and D4;
- 36% to the N-terminal and C-terminal domains of the proteins encoded by ORFs 184R and 128L of the iridoviruses CIV and LCDV;
- 30% to the proteins encoded by ORFs 119, 99, and 78 of the ascoviruses HvAV-3e, SfAV-1a, and TnAV-2c, respectively.

Overall, the DpAV-4 protein is more closely related to the GfIV protein than to those of the other ascovirus and iridovirus genomes currently in databases. ORF91 encodes a protein of 161 amino acid residues. It is similar only to the C-terminal domain of three proteins encoded by ORFs 1, 1, and 3 in GfIV segments D1, D4, and D3, respectively.
In contrast, ORF93 is closer to iridovirus and ascovirus genes than to GfIV genes. Its product, a protein of 849 amino acid residues, is 43% similar over its entire length to the CIV ORF184R orthologs found in all iridoviral and ascoviral genomes. It is only 36% similar, over 350 amino acid residues, to the C-terminal domain of the homologous GfIV proteins. These are encoded by ORFs 1, 2, 1, 1, 1, and 1 in the C20, C21, D1, D2, D3, and D4 segments, respectively. All three DpAV-4 genes have relatives in every ascovirus and iridovirus genome sequenced so far. Their presence in DpAV-4 therefore cannot result from a lateral transfer into DpAV-4 from an ichnovirus genome related to GfIV. These DpAV-4 genes are the closest relatives of the GfIV pox-D5 gene family identified so far. They could thus be considered a landmark of the symbiogenic ascovirus origin of the ichnovirus lineage to which this polydnavirus belongs. An alternative explanation is that GfIV acquired its DpAV-4-like genes by lateral transfer from viral genomes closely related to those of GfIV and DpAV-4. This might have happened when a Glypta wasp was infected by an ancestral virus related to DpAV-4. Nevertheless, the morphology of GfIV virions also supports their symbiogenic origin from ascoviruses (Lapointe et al. 2007). Besides their similar shape, these virions show surface reticulations in negatively stained preparations. This feature is characteristic of the virions of all ascovirus species examined to date (Federici et al. 2005).
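The similarity comparisons above rank candidate homologs by how much of an aligned region they share. The minimal Python sketch below shows the underlying arithmetic under simplifying assumptions. It computes percent identity, which is stricter than the similarity values reported in the text, because similarity also counts conservative substitutions through a scoring matrix. It skips gapped columns, one of several possible conventions. The toy sequences are invented. The ranking uses only the three values reported for the same 480-residue C-terminal region of ORF90, because percentages computed over different regions are not directly comparable.

```python
def percent_identity(seq1, seq2):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are skipped; this denominator
    convention is an assumption, and other tools may count gaps differently.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


# Invented toy alignment: 4 identical residues out of 5 ungapped columns.
toy = percent_identity("MKV-LA", "MKIQLA")
print(toy)

# Similarity values (%) reported in the text for the 480 C-terminal residues
# of DpAV-4 ORF90, compared like with like over the same region.
reported = {
    "GfIV ORF1 (segments D1, D4)": 42,
    "iridovirus CIV 184R / LCDV 128L": 36,
    "ascovirus HvAV-3e / SfAV-1a / TnAV-2c": 30,
}
closest = max(reported, key=reported.get)
print(closest)
```

The ranking simply restates the conclusion drawn in the text: over this region, the GfIV homologs are the closest of those compared.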

14.4.5 Relationships Between Ascovirus Virion and Ichneumonid Particle Proteins

Because ascovirus virions and ichnovirus particles display structural similarities, we developed an approach to search for homologs of virion structural proteins in ichnoviruses. To date, only two virion proteins of the Campoletis sonorensis ichnovirus (CsIV) have been characterized (Webb et al. 2006). The first, P44, is a structural protein that appears to form a layer between the outer envelope and the nucleocapsid. The second, P12, is a capsid protein. More than one hundred ascoviral or iridoviral MCP sequences are currently available in databases. BLAST searches using these sequences detected no similarity between CsIV virion proteins and ascoviral or iridoviral MCPs, or any other proteins. Homology between ichnovirus and ascovirus virion proteins might simply be undetectable by conventional BLASTP searches. To test this possibility, we used a different method, WAPAM (weighted automata pattern matching). The models were based on a previous study (Stasiak et al. 2003) showing that the MCPs encoded by ascovirus, iridovirus, phycodnavirus, and asfarvirus genomes are related. All of these MCPs contain seven conserved domains separated by hinges of highly variable size. We examined these conserved domains further using hydrophobic cluster analysis (HCA). This analysis showed that most conservation occurred at hydrophobic residues, as expected for structural proteins. Two features may therefore limit BLAST searches with iridoviral and ascoviral MCP sequences in detecting MCP orthologs in phycodnavirus and asfarvirus genomes. These are the variable hinge lengths and a conservation restricted largely to hydrophobic residues. We designed two syntactic models that, together, specifically aligned all MCP sequences of the four virus families. Importantly, WAPAM aligned the CsIV ichnovirus P44 structural protein with both models.
Complementary structural analyses and HCA confirmed the presence of the seven conserved domains in this CsIV structural protein (Fig. 14.4a).
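The idea behind such a syntactic model (short conserved motifs separated by hinges whose length may vary, matched on a reduced hydrophobic alphabet rather than on exact residues) can be illustrated with a minimal Python sketch. It is not WAPAM and does not reproduce its weighted automata: it swaps in an unweighted regular expression, and the hydrophobic residue set, the two domain patterns, the hinge bounds, and the toy sequences are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical reduced alphabet loosely inspired by hydrophobic cluster analysis:
# strongly hydrophobic residues become 'h', every other residue becomes 'p'.
HYDROPHOBIC = set("VILFMYW")


def reduce_seq(seq: str) -> str:
    """Map a protein sequence onto the two-letter h/p alphabet."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq.upper())


def build_model(domains: list[str], min_hinge: int, max_hinge: int) -> re.Pattern:
    """Join conserved-domain patterns with hinges of variable length."""
    hinge = f"[hp]{{{min_hinge},{max_hinge}}}"
    return re.compile(hinge.join(f"({d})" for d in domains))


def scan(model: re.Pattern, seq: str):
    """Return the (start, end) span of each domain, or None if the model does not match."""
    match = model.search(reduce_seq(seq))
    if match is None:
        return None
    return [match.span(i) for i in range(1, model.groups + 1)]


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two equal-length segments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical domains (hydrophobic patterns) joined by hinges of 2-10 residues.
MODEL = build_model(["hhph", "hphh"], min_hinge=2, max_hinge=10)

seq1 = "VLSI" + "GGG" + "FKLM"      # hinge of 3 residues
seq2 = "ILTV" + "DNEKSG" + "WEIF"   # hinge of 6 residues, different residues
print(scan(MODEL, seq1))            # [(0, 4), (7, 11)]
print(scan(MODEL, seq2))            # [(0, 4), (10, 14)]
print(identity("VLSI", "ILTV"))     # 0.25
```

Both toy sequences match the same model even though their first domains share only one identical residue and their second domains share none, and their hinges differ in length. This is the kind of similarity that an identity-driven search can miss; the regular expression is used here purely for readability.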

B.A. Federici and Y. Bigot

Fig. 14.4 Sequence (lanes 1–3) and secondary structure (lanes 4–6) comparisons among (a) MCP and (b) SfAV1a ORF061 orthologs from CsIV (lanes 1 and 4, typed in black), DpAV4 (lanes 2 and 5, typed in blue), and SfAV1a (lanes 3 and 6, typed in purple). Positions conserved between the amino acid sequence of CsIV and those of DpAV4 and SfAV1a are highlighted in gray. Secondary structures in the three SfAV1a ORF061 orthologs were calculated with the Network Protein Sequence Analysis tool at the http://npsa-pbil.ibcp.fr/ website, and the statistical relevance of the secondary structures was evaluated with Psipred at the http://bioinf.cs.ucl.ac.uk/psipred/ website. In lanes 4–6, C, E, and H indicate that a given amino acid is involved in a coil, β-sheet, or α-helix structure, respectively. Using the default parameters of Psipred, upper case letters indicate that the predicted secondary structure is statistically significant. Significant secondary structures are highlighted in yellow. In (a), the comparisons were limited to three of the seven conserved domains (2, 5, and 7), because classical in silico methods appeared to be inappropriate for predicting statistically significant secondary structures in conserved structural proteins rich in β strands, such as the iridovirus and ascovirus major capsid proteins. In contrast, a complete and coherent domain comparison was obtained from HCA profiles (see Bigot et al. 2008)
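The lane convention described in this caption (C, E, and H for coil, β sheet, and α helix, with upper case marking a statistically significant prediction) can be read programmatically. The short Python sketch below assumes each lane is available as a plain string aligned position by position; the function name, the `min_len` parameter, and the example lane are hypothetical and are not part of Psipred or the NPSA server.

```python
from itertools import groupby

# Upper-case letters only: lower-case letters mark non-significant predictions.
STATES = {"C": "coil", "E": "beta sheet", "H": "alpha helix"}


def significant_segments(lane: str, min_len: int = 1):
    """Return (start, end, state) for each run of upper-case (significant) C/E/H letters.

    Positions are 0-based and `end` is exclusive; lower-case runs are skipped.
    """
    segments = []
    pos = 0
    for letter, run in groupby(lane):
        length = len(list(run))
        if letter in STATES and length >= min_len:
            segments.append((pos, pos + length, STATES[letter]))
        pos += length
    return segments


lane = "ccEEEhhHHHC"  # hypothetical lane, not taken from Fig. 14.4
print(significant_segments(lane))
# [(2, 5, 'beta sheet'), (7, 10, 'alpha helix'), (10, 11, 'coil')]
print(significant_segments(lane, min_len=2))
# [(2, 5, 'beta sheet'), (7, 10, 'alpha helix')]
```

Running the same function on aligned lanes from different orthologs would give the significant segments that can then be compared position by position, which is what the yellow highlighting in the figure conveys visually.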

In addition to the above analysis, ten syntactic models were developed using proteins conserved in the three sequenced ascovirus species (SfAV-1a, TnAV-2c, and HvAV-3a) and twelve iridoviruses. None of these models detected homologs among the ichnovirus proteins available in databases, except for one. This model was developed from small proteins encoded by ORF041 of DpAV-4, ORF061 of SfAV1a, ORF74 of HvAV-3a, and ORF118 of TnAV-2c in the ascovirus genomes, and by ORF347L of the iridovirus CIV and ORF096R of the mimivirus MIV. Importantly, these proteins have orthologs in vertebrate iridoviruses, phycodnaviruses, and the asfarvirus. In SfAV1a, the peptide encoded by ORF061 is one of the virion components. In ascoviruses, iridoviruses, phycodnaviruses, and the asfarvirus,

14 Evolution of Immunosuppressive Organelles from DNA Viruses in Insects

they have been annotated as thioredoxins, proteins that play a role in initiating viral infection. Database mining with our model revealed four hits with CsIV sequences (accession nos. M80623, S47226, AF236017, and AF362508), each a homolog

1. Chromosomal integration of an ascovirus genome into the genome of the ancestral wasps of the Banchinae and Campopleginae lineages.

2a. Conservation, translocation, and loss of ascovirus genes.

2b. Translocation, duplication, and diversification of host genes in the proviral genome of ascoviral origin.

3a. Resulting proviral ichnovirus genomes (monolocus solution).

3b. Resulting proviral ichnovirus genomes (multilocus solution, obtained after fragmentation of the proviral genome by recombination).

Fig. 14.5 Hypothetical mechanism for the integration and evolution of ascovirus genomes in endoparasitic wasps. Schematic representation of the three-step process of symbiogenesis and of the DNA rearrangements that putatively occurred in the germ line of the wasp ancestors in the Banchinae and Campopleginae lineages, from the integration of an ascoviral genome to the proviral ichnovirus genome. Sequences that originate from the ascovirus are in blue; those of the wasp host and its chromosomes are in pink. Genes of ascoviral origin are outlined by a thin black or white line, depending on their final chromosomal location. Two solutions, monolocus or multilocus, can account for the final chromosomal organization of the proviral ichnovirus genome, because this question has not been fully resolved in either wasp lineage. More complex alternatives to this three-step process might also be proposed; these would involve, for example, the complete de novo creation of a monolocus or multilocus proviral genome by recruiting, through recombination or transposition, ascoviral and host genes located elsewhere in the wasp chromosomes. This model for the chromosomal organization of proviral DNA in polydnaviruses is consistent with published data (Desjardins et al. 2007)

of SfAV-1a ORF061. In fact, these sequences correspond to several variants of a single region contained in the B segment of the CsIV genome. To date, this region has not been annotated in the final CsIV genome sequence, probably because it overlaps a recombination site. HCA confirmed that the hydrophobic cores were conserved (Fig. 14.4b). Confirmation of the apparent relationship among iridoviruses, ascoviruses, and the ichneumonid particles awaits the sequencing of more viral genomes and of the wasp genes that code for at least the structural proteins that make up the ichneumonid particles. Nevertheless, the significant biological relationships of endoparasitic ichneumonid wasps with iridoviruses, ascoviruses, and their caterpillar hosts, and especially the unique relationship of DpAV-4 with its vector, provide all the reagents for the development of symbiotic relationships that lead to symbiogenesis. The evolutionary progression of these relationships, and the benefits that certain lineages of symbiotic viruses provided to the wasps, likely account for the origin of ichneumonid (and braconid) particles. In Fig. 14.5, we illustrate a possible evolutionary scenario and mechanism that may have yielded these immunosuppressive organelles.

Table 14.1 Examples of viruses vertically transmitted by parasitoids and their possible viral origins

| Virus | Evolutionary origin | Parasitoid family | Parasitoid host | Reference |
|---|---|---|---|---|
| Produce virions in parasitoid’s host | | | | |
| Diadromus pulchellus ascovirus^a | Iridovirus^c | Ichneumonidae | Lepidoptera | Bigot et al. 1997a |
| Diachasmimorpha longicaudata poxvirus^a | Poxvirus | Braconidae | Diptera | Lawrence 2002 |
| Microctonus aethiopoides virus^a | Ascovirus^c | Braconidae | Coleoptera | Barratt et al. 1999 |
| Cotesia melanoscela virus | Ascovirus^c | Braconidae | Lepidoptera | Stoltz et al. 1988 |
| Cotesia marginiventris nudivirus | Nudivirus | Braconidae | Lepidoptera | Hamm et al. 1990 |
| Microplitis croceipes nudivirus | Nudivirus | Braconidae | Lepidoptera | Hamm et al. 1988 |
| Diadromus pulchellus cypovirus^b | Reovirus | Ichneumonidae | Lepidoptera | Rabouille et al. 1994 |
| Diachasmimorpha longicaudata rhabdovirus^b | Rhabdovirus | Braconidae | Diptera | Lawrence and Akin 1990 |
| No virions produced in parasitoid’s host | | | | |
| Campoletis sonorensis ichnovirus | Ascovirus^c | Ichneumonidae | Lepidoptera | Webb et al. 2000 |
| Cotesia marginiventris bracovirus | Nudivirus^c | Braconidae | Lepidoptera | Webb et al. 2000 |
| Bathyplectes anurus virus | Poxvirus^c | Ichneumonidae | Coleoptera | Hess et al. 1980 |

^a Involved in immunosuppression; ^b RNA virus; ^c Ancestral viruses from which the respective parasitic particles originated

14.5

Examples of the Diversity of Immunosuppressive Wasp Viruses and Organelles

While the focus here has been on the origin and evolution of braconid and ichneumonid particles, there are several other known endoparasitic wasp/virus associations that range from symbiotic (i.e., involving true viruses) to organelles that likely originated from viruses. These associations, along with several others that have been discussed above, are listed in Table 14.1 to show the diversity of these relationships, most of which have received very little study. Of particular interest are the ascoviruses and poxviruses that replicate in both the parasitoid and its insect host, produce progeny virions, and play a role in immunosuppression. These include the D. pulchellus ascovirus, D. longicaudata entomopoxvirus, the pox-like particles of Bathyplectes anurus, an ichneumonid parasite of a coleopteran, and the asco-like “virus” of M. aethiopoides, a braconid parasite of a coleopteran.

14.6

Summary

During the last 100 million years, the genomes of at least two different types of DNA viruses were integrated into the genomes of endoparasitic braconid and ichneumonid wasps, respectively. These viral genes thus became part of the wasp genome. Over time, many of the original viral genes were deleted from the DNA packaged into the virions and replaced by wasp genes involved in suppressing the immune response of their caterpillar hosts, thereby transforming the original virions into a novel type of transducing immunosuppressive organelle that enhanced the survival of wasp progeny. The principal original viral genes that were selectively maintained in a functional state in the wasp genomes were those involved in producing critical structural proteins and enzymes essential for organelle assembly and for trafficking wasp immunosuppressive genes into caterpillar host cells and nuclei for transcription. There are marked structural differences between the braconid and ichneumonid organelles and their transducing wasp DNAs, yet their common role in immunosuppression demonstrates a high degree of convergent evolution. This relatively recent example of symbiogenesis, through which two DNA viruses evolved into immunosuppressive organelles, likely accounts for much of the species radiation characteristic of endoparasitic braconids and ichneumonids, two of the largest groups of higher eukaryotic organisms.

Acknowledgments

This research was supported by grants from the CNRS and NATO to Y. Bigot, and by U.S. National Science Foundation Grant INT-9726818 to B. A. Federici. The photographs used in Fig. 14.1 are by D.B. Stoltz, of Dalhousie University, Halifax, Canada.

References

Barratt BIP, Evans AA, Stoltz DB, Vinson SB, Easingwood R (1999) Virus-like particles in the ovaries of Microctonus aethiopoides Loan (Hymenoptera: Braconidae), a parasitoid of adult weevils (Coleoptera: Curculionidae). J Invertebr Pathol 73:182–188
Bedwin O (1979a) The particulate basis of the resistance of a parasitoid to the defense reaction of its insect host. Proc Biol Sci 205:267–270
Bedwin O (1979b) An insect glycoprotein; a study of the particles responsible for the resistance of a parasitoid's egg to the defense reactions of its insect hosts. Proc Biol Sci 205:271–286
Bézier A, Annaheim M, Herbinière J, Wetterwald C, Gyapay G, Bernard-Samain S, Wincker P, Roditi I, Heller M, Belghazi M, Pfister-Wilhem R, Periquet G, Dupuy C, Huguet E, Volkoff A-N, Lanzrein B, Drezen J-M (2009) Polydnaviruses of braconid wasps derive from an ancestral nudivirus. Science 323:926–930
Bigot Y, Rabouille A, Sizaret P-Y, Hamelin M-H, Periquet G (1997a) Particle and genomic characterisation of a new member of the Ascoviridae, Diadromus pulchellus ascovirus. J Gen Virol 78:1139–1147
Bigot Y, Rabouille A, Doury G, Sizaret P-Y, Delbost F, Hamelin M-H, Periquet G (1997b) Biological and molecular features of the relationships between Diadromus pulchellus ascovirus, a parasitoid hymenopteran wasp (Diadromus pulchellus) and its lepidopteran host, Acrolepiopsis assectella. J Gen Virol 78:1149–1163
Bigot Y, Samain S, Augé-Gouillou C, Federici BA (2008) Molecular evidence for the evolution of ichnoviruses from ascoviruses by symbiogenesis. BMC Evol Biol. doi:10.1186/1471-2148-8-253
Bigot Y, Renault S, Nicolas J, Moundras C, Demattei MV, Samain S, Bideshi DK, Federici BA (2009) Symbiotic virus at the evolutionary intersection of three types of large DNA viruses: Iridoviruses, Ascoviruses, and Ichnoviruses. PLoS One doi:10.1371/journal.pone.000639
Burand JP (1998) Nudiviruses. In: Miller LK, Ball LA (eds) The insect viruses. Plenum Press, New York, pp 69–90
Chinchar VG, Essbauer S, He JG, Hyatt A, Miyazaki T, Seligy V, Williams T (2005) Family Iridoviridae. In: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA (eds) Virus taxonomy: eighth report of the International Committee on Taxonomy of Viruses. Elsevier/Academic Press, London, pp 145–162
Deng L, Stoltz DB, Webb BA (2000) A gene encoding a polydnavirus structural polypeptide is not encapsidated. Virology 269:440–450
Desjardins CA, Gundersen-Rindal DE, Hostetler JB, Tallon LJ, Fuester RW, Schatz MC, Pedroni MJ, Fadrosh DW, Haas BJ, Toms BS, Chen D, Nene V (2007) Structure and evolution of a proviral locus of Glyptapanteles indiensis bracovirus. BMC Microbiol. doi:10.1186/1471-2180-7-61
Desjardins CA, Gundersen-Rindal DE, Hostetler JB, Tallon LJ, Fadrosh DW, Fuester RW, Pedroni MJ, Haas BJ, Schatz MC, Jones LM, Crabtree J, Forberger H, Nene V (2008) Comparative genomics of mutualistic viruses of Glyptapanteles parasitic wasps. Genome Biol. doi:10.1186/gb-2008-9-12-r183
Edson KM, Vinson SB, Stoltz DB, Summers MD (1981) Virus in a parasitoid wasp: suppression of the cellular immune response in the parasitoid’s host. Science 211:582–583
Espagne E, Dupuy C, Huguet E, Cattolico L, Provost B, Martins N, Poirié M, Periquet G, Drezen JM (2004) Genome sequence of a polydnavirus: insights into symbiotic virus evolution. Science 306:286–289
Federici BA (1983) Enveloped double-stranded DNA insect virus with novel structure and cytopathology. Proc Natl Acad Sci USA 80:7664–7668
Federici BA (1991) Viewing polydnaviruses as gene vectors of endoparasitic hymenoptera. Redia 74:387–392
Federici BA (1993) Viral pathology in relation to insect control. In: Beckage NE, Thompson SN, Federici BA (eds) Parasites and pathogens of insects, vol 2. Academic Press, New York, pp 81–101

Federici BA, Bigot Y (2003) Origin and evolution of polydnaviruses by symbiogenesis of insect DNA viruses in endoparasitic wasps. J Insect Physiol 49:419–432
Federici BA, Bigot Y, Granados RR, Hamm JJ, Miller LK, Newton I, Stasiak K, Vlak JM (2005) Family Ascoviridae. In: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA (eds) Virus taxonomy: eighth report of the International Committee on Taxonomy of Viruses. Elsevier/Academic Press, London, pp 269–274
Hamm JJ, Nordlund DA, Marti OG (1985) Effects of a nonoccluded virus of Spodoptera frugiperda (Lepidoptera: Noctuidae) on the development of a parasitoid, Cotesia marginiventris (Hymenoptera: Braconidae). Environ Entomol 14:258–261
Hamm JJ, Styer EL, Lewis WJ (1988) A baculovirus pathogenic to the parasitoid Microplitis croceipes (Hymenoptera: Braconidae). J Invertebr Pathol 52:189–191
Hamm JJ, Styer EL, Lewis WJ (1990) Comparative virogenesis of filamentous virus and polydnavirus in the female reproductive tract of Cotesia marginiventris (Hymenoptera: Braconidae). J Invertebr Pathol 55:357–360
Hess RT, Poinar GO Jr, Etzel L, Merritt CC (1980) Calyx particle morphology of Bathyplectes anurus and B. curculionis (Hymenoptera: Ichneumonidae). Acta Zool (Stockholm) 61:111–114
Khakhina LN (1992) Concepts of symbiogenesis. In: Margulis L, McMenamin M (eds) Historical and critical study of the research of Russian botanists. Yale University Press, New Haven
Lapointe R, Tanaka K, Barney WE, Whitfield JB, Banks JC, Beliveau C, Stoltz D, Webb BA, Cusson M (2007) Genomic and morphological features of a banchine polydnavirus: comparison with bracoviruses and ichnoviruses. J Virol 81:6491–6501
Lawrence P (2002) Purification and partial characterization of an entomopoxvirus (DLEPV) from a parasitic wasp of tephritid fruit flies. J Insect Sci 2:10
Lin C-L, Lee JC, Chen SS, Wood HA, Li M-L, Li C-F, Chao Y-C (1999) Persistent Hz-1 virus infection in insect cells: evidence for insertion of viral DNA into host chromosomes and viral infection in a latent status. J Virol 73:128–139
Lopez M, Rojas JC, Vandame R, Williams T (2002) Parasitoid-mediated transmission of an iridescent virus. J Invertebr Pathol 80:160–170
Margulis L (1992) Biodiversity: molecular biological domains, symbiosis and kingdom origins. Biosystems 27:39–51
Margulis L, Fester R (1991) Symbiosis as a source of evolutionary innovation. MIT Press, Cambridge, Massachusetts
Rabouille A, Bigot Y, Drezen JM, Sizaret P-Y, Hamelin M-H, Periquet G (1994) A member of the Reoviridae (DpRV) has a ploidy-specific genomic segment in the wasp Diadromus pulchellus (Hymenoptera). Virology 205:228–237
Rotheram S (1967) Immune surface of eggs of a parasitic insect. Nature 214:700
Rotheram S (1973a) The surface of the egg of a parasitic insect. I. The surface of the egg and first instar larvae of Nemeritis. Proc Biol Sci 183:179–194
Rotheram S (1973b) The surface of the egg of a parasitic insect. II. The ultrastructure of the particulate coat on the egg of Nemeritis. Proc Biol Sci 183:195–204
Salt G (1965) Experimental studies in insect parasitism XIII. The haemocytic reaction of a caterpillar to the eggs of its habitual parasite. Proc Biol Sci 162:303–318
Salt G (1966) Experimental studies in insect parasitism XIII. The haemocytic reaction of a caterpillar to the eggs of its habitual parasite. Proc Biol Sci 165:155–178
Salt G (1968) The resistance of insect parasitoids to the defense reactions of their hosts. Biol Rev 43:200–232
Schmidt O, Schuchmann-Feddersen I (1989) Role of virus-like particles in parasitoid-host interaction of insects. Subcell Biochem 15:91–119
Schmidt O, Theopold U (1991) Immune defense and suppression in insects. BioEssays 13:343–346
Schmidt O, Theopold U, Strand M (2001) Innate immunity and its evasion and suppression by hymenopteran endoparasitoids. BioEssays 23:344–351

Stasiak K, Demattei M-V, Federici BA, Bigot Y (2000) Phylogenetic position of the DpAV-4a ascovirus DNA polymerase among viruses with a large double-stranded DNA genome. J Gen Virol 81:3059–3072
Stasiak K, Renault S, Demattei MV, Bigot Y, Federici B (2003) Evidence for the evolution of ascoviruses from iridoviruses. J Gen Virol 84:2999–3009
Stoltz DB (1993) The polydnavirus life cycle. In: Beckage NE, Thompson SN, Federici BA (eds) Parasites and pathogens of insects, vol 1. Academic Press, New York, pp 167–187
Stoltz DB, Faulkner G (1978) Apparent replication of an unusual virus-like particle in both a parasitoid wasp and its host. Can J Microbiol 24:1509–1514
Stoltz DB, Vinson SB (1979) Viruses and parasitism in insects. Adv Virus Res 24:125–171
Stoltz DB, Krell P, Summers MD, Vinson SB (1984) Polydnaviridae – a proposed family of insect viruses with segmented, double-stranded, circular DNA genomes. Intervirology 21:1–4
Stoltz DB, Krell PJ, Cook D, MacKinnon EA, Lucarotti CJ (1988) An unusual virus from the parasitic wasp Cotesia melanoscela. Virology 162:311–320
Tanaka K, Lapointe R, Barney WE, Makkay AM, Stoltz D, Cusson M, Webb BA (2007) Shared and species-specific features among ichnovirus genomes. Virology 263:26–35
Vinson SB (1972) Factors involved in successful attack on Heliothis virescens by the parasitoid Cardiochiles nigriceps. J Invertebr Pathol 20:118–123
Vinson SB (1990) How parasitoids deal with the immune system of their host: an overview. Arch Insect Biochem Physiol 13:2–27
Vinson SB, Scott JR (1975) Particles containing DNA associated with the oocyte of an insect parasitoid. J Invertebr Pathol 25:375–378
Wang Y, Jehle JA (2009) Nudiviruses and other large, double-stranded circular DNA viruses of invertebrates: new insights into an old topic. J Invertebr Pathol 101:187–193
Webb BA, Beckage NE, Hayakawa Y, Lanzrein B, Stoltz DB, Strand MR, Summers MD (2005) Family Polydnaviridae. In: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA (eds) Virus taxonomy: eighth report of the International Committee on Taxonomy of Viruses. Elsevier/Academic Press, London, pp 255–265
Webb BA, Strand MR, Dickey SE, Beck MH, Hilgarth RS, Barney WE, Kadash K, Kromer JA, Lindstrom KG, Rattanadechakul E, Shelby KS, Thoetkiattikul H, Turnbull MS, Witherell RA (2006) Polydnavirus genomes reflect their dual roles as mutualists and pathogens. Virology 347:160–174
Whitfield JB (2002a) Estimating the age of the polydnavirus/braconid wasp symbiosis. Proc Natl Acad Sci USA 99:7508–7513
Whitfield JB (2002b) Phylogeny of microgastroid braconid wasps, and what it tells us about polydnavirus evolution. In: Austin AD, Dowton M (eds) Hymenoptera: evolution, biodiversity and biological control. CSIRO Publishing, Collingwood, Australia, pp 97–105

Chapter 15

The Neogastropoda: Evolutionary Innovations of Predatory Marine Snails with Remarkable Pharmacological Potential

Maria Vittoria Modica and Mandë Holford

Abstract The Neogastropoda include many familiar molluscs, such as cone snails (Conidae), purple dye snails (Muricidae), mud snails (Nassariidae), olive snails (Olividae), oyster drills (Muricidae), tulip shells (Fasciolariidae), and whelks (Buccinidae). Due to their amazing predatory specializations, neogastropods are often dominant members of the benthic community at the top of the food chain. In a dazzling display that ranges from boring holes to darting harpoons, neogastropods have developed several prey hunting innovations with specialized compounds pharmaceutical companies could only dream about. It has been hypothesized that evolutionary innovations related to feeding were the main drivers of the rapid neogastropod radiation in the late Cretaceous. The anatomical, behavioral, and biochemical specializations of neogastropod families that are promising targets in drug discovery and development are addressed within an evolutionary framework in this chapter.

15.1 Introduction

15.1.1 The Neogastropoda

Neogastropoda is an order of gastropod molluscs that are well characterized morphologically and are traditionally viewed as monophyletic (Ponder 1973; Taylor and Morris 1988; Ponder and Lindberg 1996, 1997; Kantor 1996; Strong 2003).

M.V. Modica, Dipartimento di Biologia Animale e dell’Uomo, “La Sapienza”, University of Rome, Viale dell’Università 32, 00185 Rome, Italy

M. Holford, The City University of New York – York College & Graduate Center, and The American Museum of Natural History, 94–20 Guy R. Brewer Blvd, Jamaica, NY 11451, USA

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_15, © Springer-Verlag Berlin Heidelberg 2010

M.V. Modica and M. Holford

This characterization of the Neogastropoda persists even though contrasting interpretations have been proposed (see e.g., Colgan et al. 2007; Kantor and Fedosov 2009). Strong (2003) recently provided the most up-to-date account of potential neogastropod synapomorphies. Anatomical characteristics of neogastropods include a highly distinctive anterior foregut with a proboscis (pleurembolic or intraembolic), a valve of Leiblein, a gland of Leiblein (or a venom gland in Toxoglossa), paired primary and accessory salivary glands, an anal gland, and several radular peculiarities (Ponder 1973; Kantor 2002; Strong 2003). Figure 15.1 illustrates a generalized scheme of neogastropod anatomy. The order Neogastropoda includes up to 25 families (Bouchet and Rocroi 2005), traditionally split into three superfamilies, Cancellarioidea, Conoidea, and Muricoidea, on the basis of anatomical features of the anterior foregut, including the radula. Cancellarioidea, also called Nematoglossa and comprising the single family Cancellariidae, is regarded as the basal offshoot of neogastropods (Kantor 1996; Strong 2003; Oliverio and Modica 2009; Modica et al. 2009). Its members are characterized by a nematoglossan radula with a complex mechanism of interlocking of the distal cusps (viewed as an adaptation to suctorial feeding: Petit and Harasewych 1986) and a mid-oesophageal gland that is generally not separated from the oesophagus (Fig. 15.2a). Conoidea, also referred to as Toxoglossa, include Conidae, Terebridae, and the “turrids”; together they are estimated to have more than 10,000 extant species, and their taxonomy is under revision (Puillandre et al. 2008). In Conoidea, the radula is modified to varying degrees, ultimately forming a harpoon (toxoglossan radula), and the dorsal mid-oesophageal gland is separated from the oesophagus and develops into a venom apparatus, with a muscular bulb and a secretory tubule producing neurotoxins (Fig. 15.2b).
Muricoidea (also termed Rachiglossa) include the vast majority of neogastropod families, and their monophyly is currently debated (Kantor 1996, 2002; Oliverio and Modica 2009). The muricoidean radula is rachiglossate (Fig. 15.2c), and muricoidean anatomy is similar to the generalized model shown in Fig. 15.1, with many modifications at different taxonomic levels. Variations include the presence or absence of the radula, accessory salivary glands, valve and gland of Leiblein, and anal gland, as well as a number of other foregut, renal, and reproductive features. According to the fossil record, the adaptive radiation of neogastropods was particularly rapid (Taylor et al. 1980), and it may be attributed to the evolution of a predatory lifestyle and diversification into a number of different trophic strategies. These attributes allowed neogastropods to diversify their niches fully and to exploit their food resources efficiently. In this scenario, the evolutionary role played by chemical innovations in feeding is unquestionable. The Cancellarioidea, Conoidea, and Muricoidea possess a bountiful reservoir of bioactive compounds routinely used to sedate or capture prey. These compounds are the building blocks for future drug discovery targets. Outlined in this chapter are the anatomical features, specialized feeding strategies, and potential bioactive compounds found in the families of the Neogastropoda. Specific attention is given to the discovery and characterization of bioactive compounds from the Conoidea. Based on the successful characterization and implementation of cone snail toxins in

15 The Neogastropoda: Evolutionary Innovations of Predatory Marine Snails

Fig. 15.1 Generalized scheme of neogastropod anatomy (male). Mantle longitudinally dissected, body wall not shown. Abbreviations are as follows: a anus; ag anal gland; asg accessory salivary gland; ct ctenidium; dg digestive gland; ft foot; hg hypobranchial gland; lg gland of Leiblein; lv valve of Leiblein; mo mouth; op operculum; os osphradium; pe penis; pg prostate gland; pr proboscis; sd salivary duct; sg salivary gland; st stomach; t testis. Modified after Ponder (1998a)

pharmacological approaches (Favreau and Stöcklin 2009; Twede 2009; Olivera and Teichert 2007; Fox and Serrano 2007), several groups within the Neogastropoda are highlighted as potential biodiversity targets for drug discovery.

15.1.2

Discovery and Characterization of Cone Snail Toxins

The gold standard for investigating toxins from marine snails is the discovery and characterization of neurotoxins from cone snails (Conus) (Fig. 15.2b). This extremely diversified group of marine snails comprises active predators that use biochemical substances to subdue their prey. Characterization of cone snail toxins began more than half a century ago (Kohn 1956; Kohn et al. 1960; Endean et al. 1974), starting from empirical observations of envenomation episodes, and has blossomed into a successful research field (reviewed in Norton and Olivera 2006). The characterization of conotoxins provides scientists with new, powerful tools to manipulate the function of ion channels and receptors governing the physiology of the nervous

Fig. 15.2 The Neogastropoda radiation. The three superfamilies of the Neogastropoda are shown: (a) Cancellarioidea, (b) Conoidea, and (c) Muricoidea. The grey triangles are proportional to the number of species included in each lineage. Shown for each superfamily are the radula, a scheme of the foregut, and some representative shells. Shells shown, from left to right, by genus: (a) Scalptia. (b) Conus, Terebra, Thatcheria, Gemmula. (c) Murex, Oliva, Vexillum, Melongena, Cymbiola, Fusinus, Volutopsius. (d) Schematic arrangement of the foregut (modified after Kantor 1996). Shell images courtesy of Guido and Philippe Poppe. Radula pictures courtesy of Yuri Kantor (b) and Alisa Kosyan (c).

system. The use of ion channels and receptors as targets for developing drugs to treat neurological and cardiovascular diseases is rapidly gaining momentum. The discovery of Prialt (ziconotide) (Miljanich 2004), the synthetic form of the Conus magus peptide ω-conotoxin MVIIA, an N-type calcium channel blocker, highlights the potential of toxins from marine snails. Prialt was approved by the United States Food and Drug Administration in December 2004 for analgesic use in HIV and cancer patients. Although Prialt is a significant breakthrough, Conus represents only a very small fraction of the diversity of the Neogastropoda. Conus belongs to only one (Conidae) of the 20–30 recognized neogastropod families and includes ca. 400–500 species out of the 10,000–15,000 estimated in the Conoidea (Bouchet and Rocroi 2005). The potential of neogastropods as a source of pharmacologically useful bioactive compounds remains largely unrealized. Like cone snails, several other neogastropods have evolved specialized compounds as a result of their feeding ecology, and these compounds may have pharmacological applications.

15.2

Feeding Strategies in the Neogastropoda

From what is known about the diets of neogastropod families, the vast majority of neogastropods are carnivorous, with predatory activity that ranges from actively seeking prey, to grazing on sessile invertebrates, to scavenging. Some neogastropod families, such as Buccinidae and Muricidae, include many generalist species that can feed on a variety of living and dead organisms. Most Muricidae feed on living bivalves, gastropods, polychaetes, bryozoans, sipunculids, barnacles, and other small crustaceans, but a few also feed on carrion. A species of Drupa has also been observed feeding on holothurians (Wu 1965), while Drupella (Ergalataxinae) and all Coralliophilinae feed on corals (Taylor 1976; Ward 1965; Haynes 1990) (Fig. 15.4a). Some neogastropod families appear to be highly specialized, such as the Mitridae, which feed exclusively on sipunculids (Taylor et al. 1980) and possess distinctive anatomical adaptations to this kind of prey (Harasewych 2009). An interesting feeding strategy is also displayed by the Volutidae, which have been reported to feed on bivalves, gastropods, and, in some deep-water species, echinoderms (Darragh and Ponder 1998). Members of the Volutidae use their large foot to engulf the prey in a semiclosed environment into which anesthetic substances are apparently released (Bigatti et al. 2009). The following paragraphs describe neogastropod feeding strategies that involve bioactive substances of potential pharmacological utility.

15.2.1

Harpooning

Cone snails, terebrids, and turrids make up the superfamily Conoidea (or Toxoglossa, “poisoned tongue”). Toxoglossans are a megadiverse group of hunting snails in which the rapid evolution of venom peptide genes has led to remarkable molecular diversity. They feed on molluscs, polychaetes, acorn worms, and fish (Kohn 1959, 1968; Kohn and Nybakken 1975; Leviten 1980). The key evolutionary innovation enabling conoideans to hunt prey is a conspicuous venom apparatus made up of highly modified radular teeth (harpoons), a venom duct (a glandular duct connected to the oesophagus), and a muscular venom bulb (Fig. 15.2b). The radular tooth, held at the proboscis tip, is inserted into the prey and used much like a hypodermic needle (Olivera 2002). Envenomation involves contraction of the muscular venom bulb, which forces the secretion of the venom duct through the proboscis to the tooth. A single cone snail specimen may produce between 50 and 200 different peptides, which are known to target different ion channels (Terlau and Olivera 2004).


M.V. Modica and M. Holford

15.2.2

Shell Drilling

Shell drilling is the most common feeding technique in muricids. It relies on the concerted action of the radula and a specialized glandular pad on the sole of the foot, the accessory boring organ (Carriker 1961) (Fig. 15.3a). The drilling process may last up to a week (Palmer 1990; Dietl and Herbert 2005). Drilling is not restricted to muricids and has been observed in other rachiglossans, such as the marginellid genus Austroginella (Ponder and Taylor 1992), the buccinid Cominella (Peterson and Black 1995), and the nassariid Nassarius festivus (Morton and Chan 1997). Muricids have also developed other feeding strategies:

- opening the prey shell with the foot (Wells 1958);
- cracking the prey shell close to the apertural margin and then inserting the proboscis (Radwin and D’Attilio 1976);
- using projections on the outer lip of the shell (labial spines) to force the valves open (Marko and Vermeij 1999).

15.2.3

Shell Wedging and Proboscis Insertion

As noted above, drilling has been reported for a few species of Buccinidae. Most buccinids, however, use the strengthened margin of their own shell to wedge open bivalve shells (Nielsen 1975) and then insert their proboscis (Fig. 15.3b). Buccinids also eat polychaetes and small crustaceans, and some species have been observed feeding on unusual prey (e.g., Neptunea antiqua on priapulids; Taylor 1978). Buccinids can also insert their proboscis into the aperture of gastropod shells. Related families use similar strategies of proboscis insertion, with mild radular rasping or use of the shell margin (Taylor et al. 1980):

- Nassariidae feed on polychaetes, barnacles, and carrion.
- Fasciolariidae feed on bivalves, gastropods, sedentary polychaetes, and carrion.
- Melongenidae feed on gastropods and bivalves.
- Columbellidae feed on ascidians, hydroids, small crustaceans, polychaetes, and algae.

15.2.4

Suctorial Feeding

Suctorial feeding, in which the snail sucks blood or soft tissues from its prey, is an evolutionarily advanced feeding technique found in several neogastropod families. This form of feeding does not always kill the prey, and several neogastropod species live in association with their prey. Two kinds of suctorial feeding are described here: haematophagy and corallivory.


15 The Neogastropoda: Evolutionary Innovations of Predatory Marine Snails


Fig. 15.3 Examples of neogastropod feeding strategies. (a) An ocinebrine Muricidae drilling the shell of a venerid bivalve (photo G. Herbert). (b) A Muricanthus sp. (Muricidae) using the shell margin to wedge open a bivalve shell (photo G. Herbert). (c) Colubraria muricata (Colubrariidae) feeding on a clownfish in an aquarium; the proboscis is inserted under the pectoral fins (photo M. Oliverio). (d) Coralliophila meyendorffi (Coralliophilinae) feeding on Actinia equina (photo P. Mariottini)

15.2.4.1

Haematophagy

Three different neogastropod families, Cancellariidae, Marginellidae, and Colubrariidae, have independently evolved haematophagous feeding on fish (Fig. 15.3c). The buccinoidean family Colubrariidae includes at least six species that live in a parasitic association with different species of fish, mainly of the family Scaridae (Johnson et al. 1995; Bouchet and Perrine 1996). Colubraria specimens can extend their proboscis to more than three times the shell length. Once the extended proboscis touches the skin of the prey, the snail scrapes with its minute radula to reach the fish’s blood vessels. It then apparently uses the fish’s own blood pressure to ingest its meal (Oliverio and Modica 2009). Experimental observations on different Colubraria species (Modica and Oliverio, unpublished) suggest that adaptation to haematophagy involves the use of anesthetic and anticoagulant compounds. Indeed, the fish appears to be anesthetized while the snail is feeding. The anesthesia is reversible, and the fish usually recovers full mobility within a few minutes after contact with the snail ends. The anesthetic compounds are therefore not lethal. This agrees with field observations that Colubraria usually feeds on fish sleeping in reef crevices (M. Oliverio pers. comm.; Bouchet and Perrine 1996; Johnson et al. 1995).

A similar strategy has been reported for the cancellariid Cancellaria cooperi (Cancellarioidea). It has been observed using its proboscis to ingest blood from open wounds on the body of the electric ray Torpedo californica (O’Sullivan et al. 1987). Foregut and radular characteristics suggest that the Cancellariidae may consist entirely of suctorial feeders. Dissection of Cancellaria cooperi revealed a peculiar oesophageal structure (M.V. Modica, J. Biggs, and M. Holford, unpublished observations). Its mid-oesophagus is extremely long (up to 5 times the shell length) and glandular, as in Colubraria, which suggests a convergent adaptation to haematophagy. Haematophagy is also found in very small marginellid species: Kogomea ovata, Hydroginella caledonica, and Tateshia yadai live attached to the pectoral fins of their host (Kosuge 1986; Bouchet 1989).

15.2.4.2

Corallivory

Among Muricidae, feeding on the living tissues of corals and other anthozoans is reported for Drupella (Ergalataxinae) and for the subfamily Coralliophilinae (Taylor 1976; Ward 1965; Haynes 1990). Coralliophilinae includes more than 200 tropical to temperate marine species, living from shallow to deep waters. Dietary preferences are known for only a few species, about 10% of the shallow-water species (Oliverio et al. 2008). All of these feed exclusively on anthozoans (Fig. 15.3d). The group displays a variety of feeding strategies and preferences.

Some species are stenophagous, with very strict host specificity. Most of these are sessile on corals, and many groups have developed interesting eco-morphological adaptations:

- Quoyula has a limpet-like shell suited to living on the surface of stony corals.
- Rhizochilus lives and feeds on antipatharians, and its shell is deformed to grip the branches of the black coral.

Other genera live embedded in the host skeleton. Rapa lives inside alcyonarian octocorals. Magilopsis and Leptoconchus have ovoid shells and bore holes into corals. Magilus is sessile inside corals and has an uncoiled adult shell (Robertson 1970). Still other species are mobile. Latiaxis is probably associated with deep-water gorgonians, and Babelomurex mostly feeds on shallow-water hexacorals. In a few cases, mobile euryphagous species can feed on anthozoans from different orders. Examples are some species of Coralliophila associated with sea anemones, scleractinians, and zoanthids (M. Oliverio, unpublished observations). Some anatomical modifications related to parasitism on corals are widespread among coralliophilines (Richter and Luque 2002). One is the loss of the radula and jaws, viewed as an adaptation to suctorial feeding. Another is the brooding of embryos in capsules kept in the pallial cavity.

Neogastropods can develop such a wide range of feeding strategies because they possess many innovative anatomical features and chemical compounds that they can readily use to overcome prey.

15.3

Neogastropod Specialized Anatomy and Predatory Chemical Substances

Most neogastropod snails have developed specialized glands or other anatomical features that enable them to produce and use chemical substances to subdue their prey. Examples include the venom gland in Conoidea and the salivary and accessory salivary glands in other neogastropod groups. It can be argued that the development of these specialized foregut glands has led to the successful radiation of neogastropods. The chemical weapons produced in the foregut and other glands give neogastropods an evolutionary advantage that has enabled them to thrive.

15.3.1

Foregut Glands

The foregut glands described here include the venom gland and the primary and accessory salivary glands (Figs. 15.1 and 15.2). In most conoideans, toxins are produced in a specific venom gland. In species that lack a venom gland, toxins are produced in the primary and/or accessory salivary glands (Andrews 1991). In some cases, other foregut organs or tissues might also produce toxins, such as the glandular mid-oesophagus of the haematophagous Colubraria and Cancellaria.

15.3.1.1

Venom Gland

The presence of a venom apparatus is characteristic of the Conoidea (Fig. 15.2b). It is generally a conspicuous organ, consisting of a proximal muscular bulb and a very long, convoluted duct (the gland itself). The tubular gland always passes through the nerve ring and opens into the buccal cavity, posterior to the opening of the radular sac. Venom is actively secreted by a single cell type: cuboidal ciliated cells that accumulate venom granules at their apex and then discharge them into the lumen (Smith 1967). In some species, these secretory cells line the whole length of the venom gland. In others, the secretory tissue is confined to the region posterior to the nerve ring, and the anteriormost region is a simple ciliated duct (Taylor et al. 1993). The terminal muscular bulb usually consists of two muscle layers, internal and external, separated by connective tissue. The relative thickness and development of these layers vary between species. According to Ponder (1973), the tubular venom gland originated from the dorsal glandular folds of the oesophagus, while the gland of Leiblein gave rise to the muscular bulb. Some conoideans, mostly radula-less species, do not possess a venom apparatus. All cone snails (Conus) have a venom apparatus, and research on their venom toxins has led the field in characterizing peptide toxins from marine snails. When venom is injected into prey, the conotoxins act together to shut down the prey’s nervous system.

Conotoxins are potent neurotoxins that target ion channels and receptors. The set of peptides found in the venom of one Conus specimen is strikingly different from that found in the venom of any other Conus specimen (Romeo et al. 2008). Thus, across the whole genus, many tens of thousands of distinct active peptides have evolved. This raises the question of why individual cone snails should need so many different peptides. It has been suggested that the peptides in a venom serve at least three general purposes. An individual peptide may play a role in:

1. prey capture, directly or indirectly;
2. defense and escape from predators;
3. other biological processes, such as interaction with potential competitors.

Not all terebrids and turrids have a venom apparatus, but those that do also produce toxins to subdue their prey. Much less is known about terebrid and turrid toxins (teretoxins and turritoxins, respectively) than about conotoxins. Preliminary characterization of these toxins (Imperial et al. 2003, 2007; Watkins et al. 2006; Heralde et al. 2008) shows that they have the same three-domain structure as conotoxins: a highly conserved signal sequence, a more variable pro-region, and a hypervariable mature toxin sequence. Conotoxins have been identified as potent neuropeptides, but no molecular target has yet been identified for teretoxins or turritoxins. Because of their similarity to conotoxins, however, they are expected to be effective modifiers of ion channels and receptors in the nervous system as well.

15.3.1.2

Primary Salivary Glands

Primary salivary glands are usually acinous, with a very small lumen and a system of narrow branched ducts (Fig. 15.1). In some species, the paired glands are fused into a single glandular mass, but two salivary ducts are always present. These ducts run along the oesophagus (or, in some groups, are embedded in its walls) and open into the roof of the buccal cavity. The secretory epithelium contains two intermixed cell types: (1) basal cells with apocrine secretion and (2) superficial ciliated cells that secrete mucus (Andrews 1991). The outer layer of muscle fibers is poorly developed, so the secretion is delivered by ciliary movement (Andrews 1991). Acinous salivary glands are present in all neogastropods. Their role in toxin production may vary, however, depending on whether other secretory structures are present, such as a venom gland or accessory salivary glands. Buccinidae and related families, such as Nassariidae, Melongenidae, Fasciolariidae, and Columbellidae, have only acinous salivary glands and lack accessory salivary glands. Species of the buccinid genus Neptunea (e.g., N. antiqua) have very large salivary glands containing large quantities of tetramine (Fänge 1960; Asano and Itoh 1959, 1960; Saitoh et al. 1983; Fujii et al. 1992; Shiomi et al. 1994; Watson-Wright et al. 1992; Power et al. 2002). Tetramine blocks nicotinic acetylcholine receptors (Emmelin and Fänge 1958). A number of human poisonings caused by eating these snails have been reported (Fleming 1971; Millar and Dey 1987; Reid et al. 1988). Further studies have detected three additional, unidentified toxins in the salivary glands of N. antiqua that appear to inhibit neuronal Ca2+ channels (Power et al. 2002). Other whelks are known to produce histamine, choline, and choline esters (Endean 1972). Nassariidae have three types of secretory cells in their salivary glands. One of these secretes a glycoprotein rich in disulphide groups, like the glycoprotein of the accessory salivary glands of the muricid Nucella lapillus (Fretter and Graham 1994; Minniti 1986; Martoja 1964). Conopeptides are expressed in the salivary gland of Conus pulicarius (Biggs et al. 2008), which suggests that salivary glands may play a role in envenomation. Crude extracts of the salivary glands of the haematophagous Colubraria reticulata have been observed to prolong the coagulation time of human blood (S. Rufini, M.V. Modica, and M. Oliverio, unpublished). Modica and colleagues are currently using cDNA analysis to identify the transcript encoding the anticoagulant.

15.3.1.3

Accessory Salivary Glands

Accessory salivary glands are considered an informative synapomorphy of Neogastropoda, although several families lack them. They are present in the basal family Cancellariidae (Fig. 15.2a) and in several Toxoglossa. In some vermivorous cone snails, they coexist with the venom gland (Marsh 1971). Paired accessory salivary glands are also found in Muricidae, Mitridae, Costellariidae, Volutidae, and Olividae, while Volutomitridae have only one gland. Accessory salivary glands are generally missing in Marginellidae, Harpidae, and the buccinoideans, although Busycon has them (Andrews 1991). Most neogastropods share a common anatomical organization of these glands. The paired glands are tubular. Their lumen is lined by a columnar secretory epithelium, which is surrounded by a richly innervated subepithelial muscle coat. Outside the muscle layer is an outer layer of gland cells with long necks that open into the central lumen (Ponder 1973; Andrews 1991). These cells produce a distinctive granular secretion (Andrews 1991). Exceptions to this model include olives, volutids, and some mitriform species (Marcus and Marcus 1959; Ponder 1970, 1972). The structure is very similar to that of the conoidean venom gland (West et al. 1996). The accessory salivary glands open at the tip of the buccal cavity through nonciliated ducts.

In Muricidae, the accessory salivary glands are usually large and well developed. Nucella lapillus and Stramonita haemastoma are the only muricids studied so far at the biochemical level. In both, the accessory salivary glands produce a cysteine-rich glycoprotein (Martoja 1971; McGraw and Gunter 1972), similar to conotoxins. Gland extracts can induce flaccid paralysis in Mytilus edulis, which is sometimes drilled and sometimes not. Extracts of S. haemastoma glands can also paralyze barnacles, which are never drilled (Carriker 1981; Huang and Mir 1972; Andrews 1991; West et al. 1996; Andrews et al. 1991). S. haemastoma also produces a toxic secretion in the primary salivary glands. This secretion decreases cardiac activity in mammals and induces vasodilatation, hypotension, and smooth muscle contraction (Huang and Mir 1972). A combined primary/accessory salivary gland extract of another muricid, Acanthina spirata, produced a similar response (Hemingway 1978). N. lapillus extracts also disrupt neuromuscular transmission in rat phrenic nerve–hemidiaphragm preparations (West et al. 1996). In some Volutidae, the accessory salivary glands have been reported to produce a narcotizing compound with a very low pH that relaxes the muscles of the prey (Bigatti et al. 2009).

15.3.2

Hypobranchial Gland

The hypobranchial gland is a thickening of the epithelium in the roof of the pallial cavity and produces large amounts of mucus. Its primary function is currently thought to be cleaning the mantle cavity. The mucous secretion binds particulate matter, which is then eliminated from the mantle cavity. However, the hypobranchial gland contains at least three different cell types that may correspond to distinct chemical activities, and these activities have been only partially identified (Naegel and Aguilar-Cruz 2006). In many muricid species, the hypobranchial gland produces chromogens. On exposure to light and oxygen, these develop into a purple pigment that has been used as a dye for centuries (Tyrian purple). In the Mitridae, the hypobranchial secretion likewise changes color on exposure to air, turning yellowish, then purple, and finally dark brown (Harasewych 2009). In Costellariidae it remains predominantly yellow-green (Ponder 1998b). Small compounds have been detected in the hypobranchial gland of several muricid and buccinid species, mainly choline esters but also biogenic amines. These substances cause neuromuscular blockade and paralyze both invertebrates and vertebrates (Roseghini et al. 1996). These toxic compounds occur in the snails at low concentrations, so it is unclear how effective they are in prey capture (West et al. 1996). The functions of the hypobranchial gland, and its role in the evolution and diversification of the Neogastropoda, remain to be clarified. Nevertheless, hypobranchial secretions may have useful pharmacological properties.

15.4

Neurotoxins, Anesthetics, and Anticoagulants: Prominent Bioactive Compounds from Neogastropod Snails

As stated in the introduction to this chapter, the approval of the conotoxin-based analgesic drug Prialt has shown the value of translating basic research on marine snail compounds into drug development. Harvesting the bioactive compounds of the Neogastropoda could be particularly fruitful in three areas: novel neurotoxins, anesthetics, and anticoagulants. The following section highlights the success of conotoxins as neurotoxins and outlines the potential for identifying anesthetic and anticoagulant compounds in neogastropod snails.

15.4.1

Neurotoxins

In the Conoidea, the best-characterized venom components are small, highly structured, disulfide-bonded peptides, each encoded by a separate gene. Every Conus species has its own distinct repertoire of 50–200 venom peptides. Each peptide presumably has a physiologically relevant target in prey or in potential predators or competitors (Olivera 2002). Most conotoxins are small peptides of 6–40 amino acids, and the majority are 12–30 amino acids long (Olivera et al. 1990; Terlau and Olivera 2004). Conotoxins share a highly conserved precursor organization. A signal sequence is followed by a propeptide region and then by the mature toxin, which is cleaved from this prepro-structure. The mature toxins are rich in disulfide bonds and are classified according to their cysteine framework.

In effect, cone snails practice combinatorial drug therapy. The prey is attacked not by one conotoxin but by a cocktail of 50–200 venom peptides that together shut down its nervous system. The cocktail contains ion channel and receptor modifiers that affect neuronal signaling. For example, conotoxins that inhibit Na+ channel function prevent the generation of action potentials. Conotoxins that target Ca2+ channels prevent vesicle fusion and thus impede neurotransmitter release. More than 3,000 different Conus venom proteins are currently reported in the literature (Conoserver: http://research1t.imb.uq.edu.au/conoserver/). Fewer than 10% of the described conotoxins have been functionally characterized. Among those, at least 25 different functions have been described (Olivera 2006; Conoserver). Several conotoxins are at various stages of drug development. The most promising examples include MrIA (active on norepinephrine transporters), Vc1.1 (active on nicotinic receptors), and Conantokin-G (active on NMDA receptors) (Olivera 2006).
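The following short Python sketch illustrates classification by cysteine framework. It reduces a mature toxin sequence to its arrangement of cysteine runs, written for example as "CC-C-C", and looks the pattern up in a small table. The function names and the three-entry table are hypothetical. They do not reproduce ConoServer's interface or its full framework list. The example sequences are the mature peptides of α-conotoxin GI and ω-conotoxin MVIIA (the active ingredient of Prialt), written without their C-terminal amidation.

```python
import re

# Illustrative subset of conotoxin cysteine frameworks (not a complete list).
KNOWN_FRAMEWORKS = {
    "CC-C-C": "I",           # e.g. alpha-conotoxins
    "CC-C-C-CC": "III",      # e.g. mu-conotoxins
    "C-C-CC-C-C": "VI/VII",  # e.g. omega-conotoxins
}


def cysteine_pattern(mature_sequence: str) -> str:
    """Reduce a mature peptide to its cysteine arrangement.

    Each run of adjacent cysteines is kept as-is, and every stretch of
    non-cysteine residues between two runs becomes a single '-'.
    Residues before the first or after the last cysteine are ignored.
    """
    runs = re.findall(r"C+", mature_sequence.upper())
    return "-".join(runs)


def classify_framework(mature_sequence: str) -> str:
    """Return the framework label, or 'unassigned' if the pattern is unknown."""
    return KNOWN_FRAMEWORKS.get(cysteine_pattern(mature_sequence), "unassigned")


if __name__ == "__main__":
    toxins = {
        "alpha-GI": "ECCNPACGRHYSC",
        "omega-MVIIA": "CKGKGAKCSRLMYDCCTGSCRSGKC",
    }
    for name, seq in toxins.items():
        print(name, cysteine_pattern(seq), classify_framework(seq))
```

A cysteine pattern is only a first approximation. Toxins that share a framework can still differ in disulfide connectivity, gene superfamily, and pharmacological target.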
Most conotoxins in therapeutic development are analgesics. Conotoxins are also being considered as candidate therapeutics for epilepsy and myocardial infarction, and for their neuroprotective and cardioprotective properties (Twede et al. 2009).

The corallivorous subfamily Coralliophilinae (Muricidae) is another promising group in which to look for new neurotoxins or for substances that can inactivate toxins. The Cnidaria include the Anthozoa (sea anemones and stony and soft corals), the jellyfishes (Scyphozoa), the sea wasps (Cubozoa), and the hydrocorals and hydromedusae (Hydrozoa). Anthozoans are known to produce a neurotoxin-rich venom as well as other toxic defensive compounds, and the Coralliophilinae appear to be immune to these. Cnidarian envenomation is a significant public health problem. An estimated 40,000–50,000 marine envenomations by various cnidarian species occur each year. Cubozoans alone have caused more than 5,000 human deaths in the last 130 years (Brinkman and Burnell 2009). Antivenom is available for only a very limited number of species. Reported observations suggest that coralliophilines may possess antivenom-like compounds. If so, these compounds might be useful in treating cnidarian envenomations. The immunity of Coralliophilinae raises a number of interesting evolutionary questions:

- What are the physiological adaptations related to corallivory?
- Do corallivorous species secrete bioactive compounds that interact with and inactivate anthozoan toxins?
- Are specialized organs (e.g., salivary glands) involved in producing the antivenom?
- Are host switching in euryphagous species and host specificity in stenophagous species correlated with biochemical variation in the secretion?

Answering these questions may lead to a modern physiological and biochemical understanding of feeding-related innovations in gastropods.

15.4.2

Anesthetic and Anticoagulant Compounds

As pointed out in Sects. 15.2.4.1 and 15.3, three different neogastropod families include haematophagous species. These species produce anesthetic and anticoagulant compounds that may help elucidate cellular communication in the nervous system and may also be useful as antithrombotic agents. In Colubrariidae, anticoagulants are produced in the salivary glands, but the anatomical structures responsible for the anesthetic secretion are not yet known. Besides the salivary glands, the glandular mid-posterior oesophagus may be worth investigating. This peculiar derived structure may be related to the haematophagous lifestyle (Oliverio and Modica 2009). The peculiar mid-oesophagus of Cancellaria cooperi is also a promising tissue in which to test for bioactive compound production, because the cancellariid mid-oesophagus may be homologous to the toxoglossan venom gland (Ponder 1973; Kantor 1996, 2002). Another issue of interest is that Cancellariidae have both primary and accessory salivary glands. The roles these structures play in subduing prey and in producing bioactive substances, and how they interact, remain to be investigated. It is also unknown whether the different haematophagous lineages use the same bioactive substances. Studying and comparing the anticoagulant and anesthetic molecules of Colubrariidae and Cancellariidae may help answer intriguing evolutionary questions.

15.5

Investigating Genetic Evolution and Expression of Neogastropod Toxins

The early evolution and first diversification of venom toxins have been interpreted as the result of neofunctionalization. In this process, gene duplication produces redundant copies, and strong positive selection acting on these copies gives rise to new functions (Ohno 1970). This mechanism has also been reported for conotoxins (Duda and Palumbi 1999). The products of these “specialty genes” act outside the organism, and they are therefore also called exogenes (Olivera 2006). The evolutionary pressure that promotes their variability is linked to a predator–prey arms race, in which the availability of a particular kind of prey may act as an evolutionary force on ecologically important genetic loci. Because they are so small, conotoxins are particularly prone to rapid genetic variation. It is still unclear to what extent the results reported for Conus apply to other neogastropods. It is at least plausible, however, that the same organs produce the same type of bioactive substances across the entire order Neogastropoda. Measuring the amount of variation at different taxonomic levels in neogastropods should help clarify the evolutionary patterns acting at each level. In snakes, toxin gene families have evolved by the same neofunctionalization mechanism, and the genes recruited into the venom proteome have been partly identified (Fry 2005). In neogastropods, including cone snails, the origin of the toxin sequences has yet to be investigated. The role of differential gene expression and posttranscriptional modification in modulating toxin diversity also needs further investigation. This research could be pursued at three levels:

1. Between species. Here the focus should be on host specificity. Remigio and Duda (2008) found an inverse correlation between the degree of specialization and venom diversity in Conus leopardus, and it remains to be tested whether this holds for other neogastropod groups.
2. Among individuals of the same species. The high intraspecific variability observed in Conus ventricosus (Romeo et al. 2008) suggests that fine-scale modulatory mechanisms may act in response to environmental and ecological variation.
3. Across ontogenetic stages. Juvenile neogastropods often have a very different diet from adults, which implies a different suite of toxins. It is not yet known how, and through which mechanisms, venom composition changes during ontogeny.

To address these and other questions of toxin evolution and expression, researchers need a robust phylogenetic hypothesis and an integrated strategy for characterizing bioactive compounds.

15.6

Conclusion: Integrated Strategies for Building a More Robust Evolutionary Framework and Effective Drug Development Methods

The major challenges in characterizing bioactive compounds in snails are the complexity of sampling, the scarcity of biological material, and the absence of databases for determining peptide and protein sequences. Venom profiling may therefore remain an elusive goal unless molecular biology techniques are combined with biochemical analysis of polypeptide composition. A multidisciplinary platform is needed to speed up the pharmacological characterization of new bioactive compounds. Such a platform would combine modern genomic and proteomic techniques with phylogeny and with descriptive approaches to ecology and anatomy. Genomic libraries can be obtained from tissues of interest. Their analysis can be integrated with proteomic techniques such as venom fractionation, peptide purification, mass spectrometry, and sequence analysis by automated Edman degradation. Spider venoms have recently been analyzed with a three-dimensional approach that combines calculated, predicted, and measured data from different techniques, such as cDNA sequencing and LC-MALDI analysis (Escoubas et al. 2006). Such “venom landscapes” may significantly improve venom profiling and can also serve as molecular markers in taxonomic and phylogenetic studies. A similar strategy has been applied to snake venoms (Nascimento et al. 2006). Molecular phylogeny, combined with anatomical and ecological data, can guide researchers through the maze of snail biodiversity toward the species or groups of species most likely to possess bioactive compounds worth investigating as new therapeutics (Fig. 15.4). This strategy has been applied successfully to the Terebridae, where it identified particular genera and species that are important for teretoxin discovery (Holford et al. 2009a, b).

[Figure: diagram with the column headings “Research fields”, “Integrative approach”, and “Output”; boxes labeled Ecology, Anatomy & Physiology, Phylogeny, Chemical ecology, Pharmacology, Comparative phylogeny, Genomics & Proteomics, Integrated evolutionary framework, and Enhanced drug development]

Fig. 15.4 Integrated research strategies for investigating biodiversity. The integration of different approaches to diversity may lead to a more complete evolutionary framework and enhance the rate of drug discovery and development
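Combined cDNA and mass-spectrometry analyses of this kind rest on one core comparison: masses predicted from cDNA-derived mature sequences are checked against masses measured by MS. The following Python sketch is a simplified illustration under stated assumptions, not the pipeline of Escoubas et al. (2006). It sums standard monoisotopic residue masses, subtracts two hydrogens per disulfide bond, and optionally applies C-terminal amidation. It then accepts a match when predicted and measured masses agree within a ppm tolerance. The sketch ignores other post-translational modifications that are common in conotoxins, such as hydroxyproline or γ-carboxyglutamate. The function names, the 10 ppm tolerance, and the "measured" peak values are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # H2O added once for the free N- and C-termini
HYDROGEN = 1.007825   # one H atom; each disulfide bond removes two
AMIDATION = -0.98402  # C-terminal -OH replaced by -NH2


def predicted_mass(sequence, disulfides=0, amidated=False):
    """Neutral monoisotopic mass predicted from a mature peptide sequence."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
    mass -= 2 * HYDROGEN * disulfides
    if amidated:
        mass += AMIDATION
    return mass


def match_masses(measured, predicted, tol_ppm=10.0):
    """Return (measured_mass, peptide_name) pairs that agree within tol_ppm."""
    hits = []
    for observed in measured:
        for name, expected in predicted.items():
            if abs(observed - expected) / expected * 1e6 <= tol_ppm:
                hits.append((observed, name))
    return hits


if __name__ == "__main__":
    # Mature sequences as they might be deduced from cDNA; both peptides
    # are C-terminally amidated in the native toxins.
    predicted = {
        "alpha-GI": predicted_mass("ECCNPACGRHYSC", disulfides=2, amidated=True),
        "omega-MVIIA": predicted_mass(
            "CKGKGAKCSRLMYDCCTGSCRSGKC", disulfides=3, amidated=True
        ),
    }
    # Hypothetical neutral masses read from a spectrum.
    measured = [predicted["alpha-GI"] + 0.005, 1500.0]
    print(match_masses(measured, predicted))
```

In this sketch, the first hypothetical peak falls within a few ppm of the predicted α-GI mass and is reported as a match, while the 1500.0 Da peak matches neither peptide. A real venom-landscape analysis would also have to handle charge states, isotope patterns, and unknown modifications.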


Interestingly, the relationship between drug discovery and phylogeny is a two-way street. Exogenes mostly belong to gene superfamilies with highly conserved sequence elements, so standard molecular techniques can be used to study them. In what has been called a “concerted discovery strategy,” venom toxins serve as useful characters for inferring the taxonomy and phylogenetic relationships of the organisms that produce them (Olivera 2006; Olivera and Teichert 2007; Bulaj 2008). This integrated approach has been used in non-molluscan toxin-producing groups such as snakes. There it has given insight into the molecular evolution of snake venoms and has helped correlate venom evolution with the appearance of morphological novelties (Fry and Wüster 2004). The phylogeny of the Neogastropoda cannot easily be resolved with standard taxonomic approaches, so an integrated approach offers several opportunities for this group. For example, venom proteomics, together with characterization of the venom’s biochemical and functional properties, has distinguished two closely related pit-viper species that cannot be told apart morphologically (Angulo et al. 2007). Genomic analysis and venom profiling, combined with more traditional anatomical and physiological studies, will clarify how venom composition relates to trophic preferences and to the adaptive radiation of the Neogastropoda. This knowledge can form the basis of a modern integrated evolutionary framework and an effective drug discovery strategy (Fig. 15.4).

Acknowledgments

The authors thank Marco Oliverio for invaluable advice and helpful comments on the manuscript. Yuri Kantor, Alisa Kosyan, Gregory Herbert, Paolo Mariottini, Marco Oliverio, and Guido and Philippe Poppe are acknowledged for images used in the figures. MH acknowledges support from NIH grant GM088096-01.

References

Andrews EB (1991) The fine structure and function of the salivary glands of Nucella lapillus (Gastropoda: Muricidae). J Moll Stud 57:111–126
Andrews EB, Elphick MR, Thorndyke MC (1991) Pharmacologically active constituents of the accessory salivary and hypobranchial glands of Nucella lapillus. J Moll Stud 57:136–138
Angulo Y, Escolano J, Lomonte B, Gutiérrez JM, Sanz L, Calvete JJ (2007) Snake venomics of Central American pitvipers: clues for rationalizing the distinct envenomation profiles of Atropoides nummifer and Atropoides picadoi. J Proteome Res 7(2):706–719
Asano M, Itoh M (1959) Occurrence of tetramine and choline compounds in the salivary gland of a marine gastropod Neptunea arthritica (Bernardi). J Agric Res 10:209
Asano M, Itoh M (1960) Salivary poison of a marine gastropod, Neptunea arthritica Bernardi, and the seasonal variation of its toxicity. Ann N Y Acad Sci 90:675–688
Bigatti G, Sanchez Antelo CJM, Miloslavich P, Penchaszadeh PE (2009) Feeding behavior of Adelomelon ancilla (Lighfoot, 1786): a predatory neogastropod (Gastropoda: Volutidae) in Patagonian benthic communities. The Nautilus 123(3):159–165
Biggs JS, Olivera BM, Kantor YI (2008) α-Conopeptides specifically expressed in the salivary gland of Conus pulicarius. Toxicon 52:101–105
Bouchet P (1989) A marginellid gastropod parasitizes sleeping fishes. Bull Mar Sci 45:76–84
Bouchet P, Perrine D (1996) More gastropods feeding at night on parrotfishes. Bull Mar Sci 59(1):224–228


M.V. Modica and M. Holford

Bouchet P, Rocroi JP (2005) Classification and nomenclator of gastropod families. Malacologia 47(1–2):1–397
Brinkman DL, Burnell JN (2009) Biochemical and molecular characterisation of cubozoan protein toxins. Toxicon 54:1162–1173
Bulaj G (2008) Integrating the discovery pipeline for novel compounds targeting ion channels. Curr Opin Chem Biol 12:441–447
Carriker MR (1961) Comparative functional morphology of boring mechanisms in gastropods. Am Zool 1(2):263–266
Carriker MR (1981) Shell penetration and feeding by naticacean and muricacean predatory neogastropods: a synthesis. Malacologia 20:403–422
Colgan DJ, Ponder WF, Beacham E, Macaranas JM (2007) Molecular phylogenetics of Caenogastropoda (Gastropoda: Mollusca). Mol Phylogenet Evol 42(3):717–737
Conoserver: http://research1t.imb.uq.edu.au/conoserver/
Darragh TA, Ponder WF (1998) Family Volutidae. In: Beesley PL, Ross JGB, Wells A (eds) Mollusca: the Southern synthesis. Fauna of Australia, vol 5. CSIRO Publishing, Melbourne, pp 833–835, part B
Dietl GP, Herbert GS (2005) Influence of alternative shell-drilling behaviours on attack duration of the predatory snail Chicoreus dilectus. J Zool 265:201–206
Duda TFJ, Palumbi SR (1999) Molecular genetics of ecological diversification: duplication and rapid evolution of toxin genes of the venomous gastropod Conus. Proc Natl Acad Sci USA 96:6820–6823
Emmelin N, Fänge R (1958) Comparison between biological effects of neurine and a salivary glands extract of Neptunea antiqua. Acta Zool 39:47–52
Endean R (1972) Aspects of molluscan pharmacology. In: Florkin M, Scheer BT (eds) Chemical zoology, vol 7, Mollusca. Academic Press, New York, pp 421–466
Endean R, Parrish G, Gyr P (1974) Pharmacology of the venom of Conus geographus. Toxicon 12:131
Escoubas P, Sollod B, King GF (2006) Venom landscapes: mining the complexity of spider venoms via a combined cDNA and mass spectrometric approach. Toxicon 47:650–663
Fänge R (1960) The salivary gland of Neptunea antiqua. Ann N Y Acad Sci 90:689–694
Favreau P, Stöcklin R (2009) Marine snail venoms: use and trends in receptor and channel neuropharmacology. Curr Opin Pharmacol 9:594–601
Fleming C (1971) Case of poisoning from red whelks. Br Med J 3:250–251
Fox JW, Serrano SM (2007) Approaching the golden age of natural product pharmaceuticals from venom libraries: an overview of toxins and toxin-derivatives currently involved in therapeutic or diagnostic applications. Curr Pharm Des 13:2927–2934
Fretter V, Graham A (1994) British prosobranch molluscs, revised and updated edn. Ray Society, London
Fry BG (2005) From genome to "venome": molecular origin and evolution of the snake venom proteome inferred from phylogenetic analysis of toxin sequences and related body proteins. Genome Res 15:403–420
Fry BG, Wüster W (2004) Assembling an arsenal: origin and evolution of the snake venom proteome inferred from phylogenetic analysis of toxin sequences. Mol Biol Evol 21(5):870–883
Fujii R, Moriwaki N, Tanaka K, Ogawa T, Mori E, Saitou M (1992) Spectrophotometric determination of tetramine in carnivorous gastropods with tetrabromophenolphthalein ethyl ester. J Food Hyg Soc Japan 33(3):237–240
Harasewych MG (2009) Anatomy and biology of Mitra cornea Lamarck, 1811 (Mollusca, Caenogastropoda, Mitridae) from the Azores. Açoreana 6:121–135
Haynes JA (1990) Distribution movement and impact of the corallivorous gastropod Coralliophila abbreviata (Lamarck) in a Panamanian patch. J Exp Mar Biol Ecol 142:25–42
Hemingway GT (1978) Evidence for a paralytic venom in the intertidal snail Acanthina spirata (Neogastropoda: Thaisidae). Comp Biochem Physiol 60C:79–81


Heralde FM, Imperial J, Bandyopadhyay P, Olivera BM, Concepcion GP, Santos AD (2008) A rapidly diverging superfamily of peptide toxins in venomous Gemmula species. Toxicon 51:890–897
Holford M, Puillandre N, Modica MV, Watkins M, Collin R, Bermingham E, Olivera BM (2009a) Correlating molecular phylogeny with venom apparatus occurrence in panamic auger snails (Terebridae). PLoS ONE 4(11):e7667. doi:10.1371/journal.pone.0007667
Holford M, Puillandre N, Terryn Y, Cruaud C, Olivera BM, Bouchet P (2009b) Evolution of the Toxoglossa venom apparatus as inferred by molecular phylogeny of the Terebridae. Mol Biol Evol 26(1):15–25
Huang CL, Mir GN (1972) Pharmacological investigation of salivary gland of Thais haemastoma (Clench). Toxicon 10:111–117
Imperial JS, Watkins M, Chen P, Hillyard DR, Cruz LJ, Olivera BM (2003) The augertoxins: biochemical characterization of venom components from the toxoglossate gastropod Terebra subulata. Toxicon 42:391–398
Imperial JS, Kantor YI, Watkins M, Heralde FM, Stevenson B, Chen P, Hansson K, Stenflo J, Ownby J-P, Bouchet P, Olivera BM (2007) Venomous auger snail Hastula (Impages) hectica (Linnaeus, 1758): molecular phylogeny, foregut anatomy and comparative toxinology. J Exp Zool 308B:744–756
Johnson S, Johnson J, Jazwinski S (1995) Parasitism of sleeping fish by gastropod mollusks in the Colubrariidae and Marginellidae at Kwajalein, Marshall Islands. The Festivus 27(11):121–126
Kantor YI (1996) Phylogeny and relationships of Neogastropoda. In: Taylor J (ed) Origin and evolutionary radiation of the Mollusca. Oxford University Press, Oxford, pp 221–230
Kantor YI (2002) Morphological prerequisite for understanding neogastropod phylogeny. Boll Malacol Suppl 4:161–174
Kantor YI, Fedosov A (2009) Morphology and development of the valve of Leiblein: possible evidence for paraphyly of the Neogastropoda. The Nautilus 123(3):73–82
Kohn AJ (1956) Piscivorous gastropods of the genus Conus. Proc Natl Acad Sci USA 42:168–171
Kohn AJ (1959) The ecology of Conus in Hawaii. Ecol Monogr 29:47–90
Kohn AJ (1968) Microhabitats, abundance and food of Conus (Gastropoda) on atoll reefs in the Maldive and Chagos islands. Ecology 49:1046–1062
Kohn AJ (1978) Ecological shift and release in an isolated reefs: the significance of prey size. Ecology 59:614–631
Kohn AJ, Nybakken JW (1975) Ecology of Conus on eastern Indian ocean fringing reefs: diversity of species and resource utilization. Mar Biol 29:211–234
Kohn AJ, Saunders PR, Wiener S (1960) Preliminary studies on the venom of the marine snail Conus. Ann N Y Acad Sci 90:706–725
Kosuge S (1986) Description of a new species of ecto-parasitic snail on fish. Bull Inst Malacol 2(5):77
Leviten PJ (1980) The foraging strategy of vermivorous conid gastropods. Ecol Monogr 46:157–178
Marcus E, Marcus E (1959) Studies on Olividae. Bol Fac Fil Ciênc Let Univ S Paulo Zool 22:99–188
Marko PB, Vermeij GJ (1999) Molecular phylogenetics and the evolution of labral spines among eastern pacific ocenebrine gastropods. Mol Phylogenet Evol 13(2):275–288
Marsh M (1971) The foregut glands of some vermivorous cone shells. Aust J Zool 19:313–326
Martoja M (1964) Contribution à l'étude de l'appareil digestif et la digestion chez les gastéropodes carnivores de la famille Nassaridés. Cell 64:237–334
Martoja M (1971) Données histologiques sur les glandes salivaires et oesophagiennes de Thais lapillus (L.) (= Nucella lapillus, Prosobranche Néogastropode). Arch Zool Exp Gen 112:249–291
McGraw KA, Gunter G (1972) Observations on killing of the Virginia oyster by the Gulf oyster borer, Thais haemastoma, with evidence for a paralytic secretion. Proc Natl Shellfish Assoc 62:95–97


Miljanich GP (2004) Ziconotide: neuronal calcium channel blocker for treating severe chronic pain. Curr Med Chem 11:3029–3040
Millar JG, Dey A (1987) Food poisoning due to the consumption of red whelks Neptunea antiqua. Comm Dis Scotl Wkly Rep 21(38):5–6
Minniti F (1986) Morphological and histochemical study of pharynx of Leiblein, salivary glands and gland of Leiblein in the carnivorous Gastropoda Amyclina tinei Maravigna and Cyclope neritea Lamarck (Nassariidae: Prosobranchia Stenoglossa). Zool Anz 217:14–22
Modica MV, Kosyan A, Oliverio M (2009) The relationships of the enigmatic gastropod Tritonoharpa: new data on early neogastropod evolution? The Nautilus 123(3):177–188
Morton B, Chan K (1997) The first report of shell-boring predation by a representative of the Nassariidae (Gastropoda). J Moll Stud 63:480–482
Naegel LCA, Aguilar-Cruz CA (2006) The hypobranchial gland from the purple snail Plicopurpura pansa (Gould, 1853) (Prosobranchia, Muricidae). J Shellfish Res 25(2):391–394
Nascimento DG, Rates B, Santos DM, Verano-Braga T, Barbosa-Silva A, Dutra AAA, Biondi I, Martin-Euclaire MF, De Lima ME, Pimenta AMC (2006) Moving pieces in a taxonomic puzzle: venom 2D-LC/MS and data clustering analyses to infer phylogenetic relationships in some scorpions from the Buthidae family (Scorpiones). Toxicon 47:628–639
Nielsen C (1975) Observations on Buccinum undatum L. attacking bivalves and on prey responses, with a short review on attacking methods of other prosobranchs. Ophelia 13:87–108
Norton RS, Olivera BM (2006) Conotoxins down under. Toxicon 48:780–798
O'Sullivan JB, McConnaughey RR, Huber ME (1987) A blood-sucking snail: the Cooper's nutmeg Cancellaria cooperi Gabb, parasitizes the California electric ray, Torpedo californica Ayres. Biol Bull 172:362–366
Ohno S (1970) Evolution by gene duplication. Springer, Berlin
Olivera BM (2002) Conus venom peptides: reflections from the biology of clades and species. Annu Rev Ecol Syst 33:25–47
Olivera BM (2006) Conus peptides: biodiversity-based discovery and exogenomics. J Biol Chem 281:31173–31177
Olivera BM, Teichert RW (2007) Diversity of the neurotoxic Conus peptides: a model for concerted pharmacological discovery. Mol Interv 7(5):253–262
Olivera BM, Rivier J, Clark C, Ramilo CA, Corpuz GP, Abogadie FC, Mena EE, Woodward SR, Hillyard DR, Cruz LJ (1990) Diversity of Conus neuropeptides. Science 249:257–263
Oliverio M, Modica MV (2009) Relationships of the haematophagous marine snail Colubraria (Rachiglossa, Colubrariidae), within the neogastropod phylogenetic framework. Zool J Linn Soc 158:779–800
Oliverio M, Barco A, Modica MV, Richter A, Mariottini P (2008) Ecological barcoding of corallivory by ITS2 sequences: hosts of coralliophiline gastropods detected by the cnidarian DNA in their stomach. Mol Ecol Resour 9(1):94–103
Palmer AR (1990) Effect of crab effluent and scent of damaged conspecifics on feeding, growth, and shell morphology of the Atlantic dogwhelk, Nucella lapillus (L.). Hydrobiologia 193:155–182
Peterson CH, Black R (1995) Drilling by buccinid gastropods of the genus Cominella in Australia. The Veliger 38:37–42
Petit RE, Harasewych MG (1986) New Philippine Cancellariidae (Gastropoda: Cancellariacea), with notes on the fine structure and function of the nematoglossan radula. The Veliger 28(4):436–443
Ponder WF (1970) The morphology of Alcithoe arabica (Mollusca: Volutidae). Malacol Rev 3:127–165
Ponder WF (1972) The morphology of some mitriform gastropods with special reference to their alimentary and reproductive system (Neogastropoda). Malacologia 11(2):295–342
Ponder WF (1973) The origin and evolution of the Neogastropoda. Malacologia 12:295–338
Ponder WF (1998a) Infraorder Neogastropoda. In: Beesley PL, Ross JGB, Wells A (eds) Mollusca: the Southern synthesis. Fauna of Australia, vol 5. CSIRO Publishing, Melbourne, p 819, part B


Ponder WF (1998b) Family Costellariidae. In: Beesley PL, Ross JGB, Wells A (eds) Mollusca: the Southern synthesis. Fauna of Australia, vol 5. CSIRO Publishing, Melbourne, pp 843–845, part B
Ponder WF, Lindberg DR (1996) Gastropod phylogeny – challenges for the 90s. In: Taylor J (ed) Origin and evolutionary radiation of the Mollusca. Oxford University Press, London, pp 135–154
Ponder WF, Lindberg DR (1997) Towards a phylogeny of gastropod molluscs: an analysis using morphological characters. Zool J Linn Soc 119:83–265
Ponder WF, Taylor JD (1992) Predatory shell drilling by two species of Austroginella (Gastropoda: Marginellidae). J Zool 228:317–328
Power AJ, Keegan BF, Nolan K (2002) The seasonality and role of the neurotoxin tetramine in the salivary glands of the red whelk Neptunea antiqua L. Toxicon 40:419–425
Puillandre N, Samadi S, Boisselier M-C, Sysoev AV, Kantor YI, Cruaud C, Couloux A, Bouchet P (2008) Starting to unravel the toxoglossan knot: molecular phylogeny of the "turrids" (Neogastropoda: Conoidea). Mol Phylogenet Evol 47:1122–1134
Radwin GE, D'Attilio A (1976) Murex shells of the world. Stanford University Press, Stanford
Reid TMS, Gould IM, Mackie IM, Ritchie AH, Hobbs G (1988) Food poisoning due to the consumption of red whelks Neptunea antiqua. Epidemiol Infect 101:419
Remigio EA, Duda TFJ (2008) Evolution of ecological specialization and venom of a predatory marine gastropod. Mol Ecol 17:1156–1162
Richter A, Luque AA (2002) Current knowledge on Coralliophilidae (Gastropoda) and phylogenetic implication of anatomical and reproductive characters. Boll Malacol 38:5–19
Robertson R (1970) Review of the predators and parasites of stony corals, with special reference to symbiotic prosobranch gastropods. Pac Sci 24:43–54
Romeo C, Di Francesco L, Oliverio M, Palazzo P, Raybaudi Massilia G, Ascenzi P, Polticelli F, Schininà ME (2008) Conus ventricosus venom peptides profiling by HPLC-MS: a new insight in the intraspecific variation. J Sep Sci 31:488–498
Roseghini M, Severini C, Falconieri Erspamer G, Erspamer V (1996) Choline esters and biogenic amines in the hypobranchial gland of 55 molluscan species of the neogastropod Muricoidea superfamily. Toxicon 34(1):33–55
Saitoh H, Oikawa K, Takano T, Kamimura K (1983) Determination of tetramethylammonium ion in shellfish by ion chromatography. J Chromatogr 281:397
Shiomi K, Mizukami M, Shimakura K, Nagashima Y (1994) Toxins in the salivary gland of some marine carnivorous gastropods. Comp Biochem Physiol 107B:427–432
Smith EH (1967) The neogastropod midgut, with notes on the digestive diverticula and intestine. Trans R Soc Edinburgh 67:23–42
Strong EE (2003) Refining molluscan characters: morphology, character coding and a phylogeny of the Caenogastropoda. Zool J Linn Soc 137:447–554
Taylor JD (1976) Habitats, abundance and diets of muricacean gastropods at Aldabra Atoll. Zool J Linn Soc 59:155–193
Taylor JD (1978) Habitats and diet of predatory gastropods at Addu Atoll, Maldives. J Exp Mar Biol Ecol 31:83–103
Taylor JD, Morris NJ (1988) Relationships of Neogastropoda. Malacol Rev 4:167–179
Taylor JD, Morris NJ, Taylor CN (1980) Food specialization and the evolution of predatory prosobranch gastropods. Palaeontology 23(2):375–409
Taylor JD, Kantor YI, Sysoev AV (1993) Foregut anatomy, feeding mechanisms, relationships and classification of the Conoidea (=Toxoglossa) (Gastropoda). Bull Br Mus Nat Hist 59:125–170
Terlau H, Olivera BM (2004) Conus venoms: a rich source of novel ion channel-targeted peptides. Physiol Rev 84:41–68
Twede VD, Miljanich GP, Olivera BM, Bulaj G (2009) Neuroprotective and cardioprotective conopeptides: an emerging class of drug leads. Curr Opin Drug Discov Dev 12:231–239


Ward J (1965) The digestive tract and its relation to feeding habits in the stenoglossan prosobranch Coralliophila abbreviata (Lamarck). Can J Zool 43:447–464
Watkins M, Hillyard DR, Olivera BM (2006) Genes expressed in a turrid venom duct: divergence and similarity to conotoxins. J Mol Evol 62:247–256
Watson-Wright WM, Sims GG, Smyth C, Gillis M, Maher M, Trottier T, Van Sinclair DE, Gilgan M (1992) Identification of tetramine as toxin causing food poisoning in Atlantic Canada following consumption of whelks Neptunea decemcostata. In: Gopalakrishnakone P, Tan CK (eds) Recent advances in toxinology research, vol 2. University of Singapore, Singapore, pp 551–561
Wells HW (1958) Feeding habits of Murex fulvescens. Ecology 39:556–558
West DJ, Andrews EB, Bowman D, McVean AR, Thorndyke MC (1996) Toxins from some poisonous and venomous marine snails. Comp Biochem Physiol 113C:1–10
Wu SK (1965) Comparative functional studies of the digestive system of the muricid gastropods Drupa ricina and Morula granulata. Malacologia 3:211–233

Chapter 16

Antennal Hammers: Echos of Sensillae Past

Nina Laurenne and Donald L.J. Quicke

Abstract Many hosts of parasitoids live in concealed environments, such as within plant tissue and wood, and are therefore difficult to find. This is likely to be especially true when concealed hosts are in the pupal stage and thus silent and immobile. Cryptine ichneumonids collectively have a wide host range, including members of several insect orders with different degrees of concealment. Many cryptine genera show a morphological adaptation to finding concealed hosts: their antennal tips are modified into hammer-like structures that are used to tap the substrate. This vibrational sounding (= echolocation through solid media) is typical of the tribe Cryptini and has multiple origins within the subfamily. We show that vibrational sounding is associated with antennal modification and the use of wood-boring buprestid and cerambycid beetles as hosts, and suggest, based on an apparent transition series, that the hammers are derived from mechanosensilla within the Cryptinae.

16.1

Introduction

The Ichneumonidae is one of the largest insect families, with more than 20,000 described species (Yu et al. 2005), although, according to Gaston and Gauld (1993), the real number of species may exceed half a million. Ichneumonid wasps are cosmopolitan, and although most species are parasitoids of other insects and in some cases spiders, their ways of life vary remarkably. Unlike ordinary parasites, parasitoids always kill their hosts, which are typically larvae or pupae of various Lepidoptera, Coleoptera and Diptera. Parasitoid life-history strategies are commonly divided into two classes, koinobionts and idiobionts (Askew and Shaw 1986; Godfray 1994; Quicke 1997). These two strategies differ considerably, but the defining difference is that idiobionts do not permit their host to continue developing after parasitisation; when the host is a larva, it is typically paralysed by the female wasp's venom. In contrast, koinobionts allow their hosts to continue developing after parasitisation until they reach a stage suitable to be consumed by the parasitoid larvae. Several other features are associated with these strategies. For example, koinobionts are usually endoparasitoids with relatively narrow host ranges, because they have to adapt to the host's immunological defenses. Idiobionts are typically ectoparasitoids and generalists with a wide host range, although those that attack pupal hosts are usually endoparasitoids. For idiobionts, host range is often largely determined by the potential hosts that are encountered. The hosts of koinobionts are often exposed or only slightly concealed. Paralysed hosts would be very prone to predation if exposed, and therefore the hosts of idiobionts tend to live in concealed situations (e.g. leaf rolls or leaf mines, plant stems, under bark or inside wood). Exploiting concealed hosts is regarded as the ancestral state in the Ichneumonidae, and transitions from idiobiosis to koinobiosis appear to have happened multiple times within the family (Belshaw et al. 1998; Whitfield 1998).

N. Laurenne
Museum of Natural History, Entomology Division, University of Helsinki, P.O. Box 17 (P. Arkadiankatu 13), 00014 Helsinki, Finland
e-mail: [email protected]

D.L.J. Quicke
Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, Berkshire SL5 7PY, UK
Department of Entomology, Natural History Museum, London SW7 5BD, UK

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_16, © Springer-Verlag Berlin Heidelberg 2010

16.1.1

Host Location

According to Vinson (1988), host location consists of several stages, beginning with finding a suitable habitat. The parasitoid must then locate a potential host within it, examine it for suitability (species and developmental stage) and, finally, oviposit. Parasitoids use many modalities in host location, including scent, vision, sound and vibration (Wertheim et al. 2003; Fischer et al. 2004; Fatouros et al. 2005). Wasps are led to a host by several cues; for example, parasitoids can recognise the shape, colour and movement of a host (Fischer et al. 2001, 2004). Volatile chemicals from host frass and damaged plant material have been shown to attract parasitoids (Gohole et al. 2003; Bukovinszky et al. 2005). Some species have even evolved the ability to detect host sex pheromones and other kairomones and use them in host searching (Wertheim et al. 2003; Jumean et al. 2005). In general, multiple cues are involved in the host-searching process, and their efficiency is affected by environmental factors such as temperature (Fischer et al. 2001; Kröder et al. 2007a, b). The female ovipositor and antennae are both important for host examination and acceptance, as they bear various sensilla for assessing the suitability of a potential host (Mackauer et al. 1996; Ignacimuthu and Dorn 2000; Isidoro et al. 2001; Romani et al. 2002).

Many mobile hosts of ichneumonid wasps live in concealed places such as within wood. Such host larvae generate vibrations when they chew wood and move, and some parasitoid groups have evolved the ability to detect these host-generated vibrations. However, not all potential concealed hosts create their own vibrations: pupae, prepupae and larvae about to moult, for example, do not. To locate these, some parasitic wasps have evolved an active, vibration-based method called vibrational sounding. This form of echolocation occurs in one non-apocritan group, the Orussidae, which have highly modified antennae and massively enlarged subgenual (hearing) organs in the forelegs (Vilhelmsen et al. 2001). Females tap the substrate with their antennae and detect the echoes with their subgenual organs. This idea was originally suggested by Cooper (1953), and Powell and Turner (1975) later made similar observations of female behaviour supporting Cooper's conjecture. Vibrational sounding as a means of host location has also evolved on a number of separate occasions within the Ichneumonidae. Amongst the parasitic apocritan wasps, vibrational sounding has been most thoroughly investigated in the pimpline ichneumonid genus Pimpla and its relatives (Henaut and Guerdoux 1982; Henaut 1990; Meyhöfer and Casas 1999; Fischer et al. 2001, 2003). The success of echolocation depends on several factors: Kröder et al. (2006, 2007b) have shown that it is more efficient in warmer conditions, whereas vision plays a more important role in cooler conditions. Parasitoids can adjust the intensity of echolocation according to temperature, which indicates adaptation to environmental conditions in temperate regions. The ability to adjust to the microhabitat and its varying environmental factors therefore involves a complex interaction. According to Otten et al. (2001), larger females are better at finding concealed hosts than smaller ones, because a larger body mass transmits vibrations more effectively.

Apart from the pimplines, females of a number of other ichneumonid genera are hypothesised to use vibrational sounding on the basis of their morphology: their antennal tips are modified into hammer-like structures suitable for "hammering" the substrate, and their fore tibiae contain enlarged subgenual organs for detecting substrate-borne vibrations (Broad and Quicke 2000). Additionally, the antennal pegs of female Xorides (Xoridinae) are solid (Quicke, unpublished observations) and are therefore likely to act as antennal hammers. The largest subfamily of Ichneumonidae is the Cryptinae, with 4,659 species in 394 genera (Yu et al. 2005). The cryptines are an appropriate model group because vibrational sounding has multiple origins and losses within the subfamily and a detailed molecular phylogenetic analysis is available (Laurenne et al. 2006). We tested the association between the occurrence of hammer-like terminal antennal segments within the Cryptinae and the exploitation of wood-boring buprestids and cerambycids within a comparative phylogenetic framework. Traditionally, the Cryptinae has been divided into three tribes, Cryptini, Phygadeuontini and Hemigasterini, and molecular studies largely support this classification (Laurenne et al. 2006; Quicke et al. 2009). Most cryptines are idiobiont ectoparasitoids whose hosts usually belong to the largest insect orders (Coleoptera, Lepidoptera, Hymenoptera and Diptera), but spider egg predation occurs in some cryptine genera, and a few other insect orders are occasionally attacked. Although the subfamily as a whole attacks hosts in several orders, individual cryptine species can be quite host-specific or have a narrow host range (Askew and Shaw 1986; Gauld 1988; Schwarz and Shaw 1998, 2000).

16.2

Material and Methods

We examined the terminal antennal flagellomeres of species representing 122 genera of the subfamily Cryptinae, six genera of the Ichneumoninae and one species each of the Alomyinae, Eucerotinae and Pedunculinae. Scanning electron microscopy (SEM) was used for the vast majority, although light microscopy was occasionally used for larger specimens of some groups. For males, we included 32 genera (26 Cryptini, 2 Hemigasterini and 4 Phygadeuontini). Female antennal tips were classified into five categories according to the degree of modification, ranging from unmodified antennae with a tapered tip to ones forming a large flat surface. In the intermediate stages, individual setae become thicker and form a cluster (Laurenne et al. 2009).

16.2.1

Comparative Analysis

A comparative analysis by independent contrasts (CAIC; Purvis and Rambaut 1995) was carried out to test the statistical significance of the association between antennal modification and the use of wood-boring beetles (buprestids and cerambycids) as hosts. The degree of antennal modification was treated as a continuous variable and the use of coleopteran hosts as a categorical variable. The evolutionary rate was assumed to be the same for each taxon. The trees used in the comparative analysis were based on the molecular study of cryptine phylogeny by Laurenne et al. (2006), which used the length-variable D2 (+D3) region of the nuclear 28S rDNA gene; taxa without host records were pruned from the tree, as missing values are not allowed in CAIC. Two cryptine genera with host records (Mallochia and Schreineria) were added to the tree and, in the absence of molecular data, placed according to Townes's (1969) classification. To avoid results biased by a single alignment, the comparative analyses were carried out with five different gap cost ratios and two different alignment methods (POY and Clustal W + PAUP*). Details of the methods are described in Laurenne et al. (2009).
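The logic of computing contrasts on a phylogeny can be illustrated with a minimal sketch. It assumes a rooted, fully bifurcating tree with equal branch lengths (mirroring the equal-rate assumption above), a continuous response (the antennal-modification score) and a binary predictor (1 = attacks wood-boring beetles). Node values are estimated with Felsenstein's independent-contrasts weighting, and a standardized contrast in the response is taken only at nodes where the two daughter clades differ in the predictor; a clade that has already contributed a contrast is not reused. This is a simplified stand-in inspired by the way CAIC handles a categorical predictor, not a reimplementation of CAIC, and the class and function names and the toy scores are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Node of a rooted, fully bifurcating tree (hypothetical toy structure)."""
    name: str = ""
    children: list[Node] = field(default_factory=list)
    branch: float = 1.0       # equal branch lengths (equal-rate assumption)
    y: float | None = None    # tip response, e.g. antennal-modification score
    x: int | None = None      # tip predictor: 1 = attacks wood-boring beetles, 0 = not


def _walk(node: Node, out: list[float]) -> tuple[float, int | None, float]:
    """Post-order pass returning (estimated y, predictor state, extra branch length).

    A predictor state of None marks a clade that has already supplied a contrast;
    such clades are never used again, which keeps the contrasts independent.
    """
    if not node.children:
        return node.y, node.x, 0.0
    left, right = node.children  # raises ValueError if the node is not bifurcating
    y1, x1, e1 = _walk(left, out)
    y2, x2, e2 = _walk(right, out)
    v1, v2 = left.branch + e1, right.branch + e2
    # Felsenstein's node estimate: average of the daughters weighted by 1/v.
    y_node = (y1 / v1 + y2 / v2) / (1 / v1 + 1 / v2)
    extra = v1 * v2 / (v1 + v2)  # lengthens the branch above this node
    if x1 is None or x2 is None:
        return y_node, None, extra
    if x1 == x2:
        return y_node, x1, extra
    # Daughters differ in the predictor: standardized contrast, signed so that a
    # positive value means a higher score in the clade that attacks beetle hosts.
    diff = (y1 - y2) if x1 == 1 else (y2 - y1)
    out.append(diff / math.sqrt(v1 + v2))
    return y_node, None, extra


def independent_contrasts(root: Node) -> list[float]:
    """Collect standardized contrasts at nodes where the binary predictor changes."""
    contrasts: list[float] = []
    _walk(root, contrasts)
    return contrasts


# Toy tree ((A, B), E) with invented scores: A and B do not attack beetles, E does.
clade_ab = Node("AB", children=[Node("A", y=1, x=0), Node("B", y=3, x=0)])
root = Node("root", children=[clade_ab, Node("E", y=5, x=1)])
print(independent_contrasts(root))  # one contrast: (5 - 2) / sqrt(1.5 + 1), about 1.897
```

In the toy tree, the estimate for clade AB (score 2) sits on a branch lengthened from 1 to 1.5, so the single contrast with E is (5 - 2) / sqrt(2.5). The sign convention matters because, under the hypothesis tested here, contrasts should tend to be positive.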


16.3


Results

The percentage occurrence of each degree of antennal modification in each tribe is shown in Fig. 16.1, and Fig. 16.2 maps antennal modification onto the phylogenetic tree. Figures 16.3 and 16.4 show the transformation series from a simple antennal tip with no especially modified sensilla to a large, united structure with a virtually uniform surface. Surculus (Fig. 16.3a) has a simple antennal tip without obvious modification. Some apical setae are thickened in Latibulus (Fig. 16.3b) and Hidryta (Fig. 16.3c). In Diapetimorpha (Fig. 16.3d) thickened setae form a cluster, in Camera (Fig. 16.3e) truncate structures form a dense patch, and in Cryptanura (Fig. 16.3f) the modified structures have started to fuse in the middle. In females of Acrorichnus (Fig. 16.4a), fused structures form a more or less flat surface, and Buathra (Fig. 16.4b) shows a smooth face of modified and fused structures. The antennal tip of Osprynchotus (Fig. 16.4c) forms a large, uniform flat surface: a truly hammer-like antenna. Some genera show other types of specialisation of the antennal tip; for example, Meringopus (Fig. 16.4d) has thickened "setae" originating from deep sockets in the antennal surface. Terminal antennal structures of cryptines are often sexually dimorphic, as males typically show no particular antennal modification. However, some specialisations do occur in males of a few genera: for example, males of Gabunia (Fig. 16.4e) have two peg-like structures on the antennal tip, and those of Eurycryptus have one smaller structure (Fig. 16.4f).

Fig. 16.1 The percentage of occurrence of each degree of antennal hammer development in each tribe of Cryptinae


Fig. 16.2 The phylogeny of cryptine wasps (Laurenne et al. 2009). Black circles indicate taxa that attack buprestid/cerambycid beetles and have strongly modified antennae (categories 4–5). Grey circles indicate slightly modified antennae (categories 1–3)


Fig. 16.3 Female antennal tips showing antennal modification. (a) Surculus, not modified. (b, c) Some thickened setae on the tip: (b) Latibulus, (c) Hidryta. (d) Diapetimorpha, thickened structures form a cluster. (e) Camera, a dense cluster of truncate structures forms a patch. (f) Cryptanura, a cluster of short, apically flattened structures fused in the middle

The CAIC analysis showed a significant association between the degree of antennal modification and the use of wood-boring buprestid and cerambycid beetles in the Cryptini. Thirteen genera of the tribe Cryptini exploit wood-boring beetle larvae and have modified antennae; within the Phygadeuontini, only five genera show this association. The association was significant (p = 0.0080–0.0397) in all analyses except the one based on the alignment obtained with the highest gap:substitution cost ratio (4:1, p = 0.0707). Detailed results are presented in Laurenne et al. (2009).
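If the predictor had no effect on the response, positive and negative standardized contrasts would be equally likely. One simple way to test this is an exact one-tailed sign test, sketched below with the standard library only. Whether the analyses reported here used a sign test, a t-test of the mean contrast against zero or another statistic is not stated in this chapter (see Laurenne et al. 2009), and the contrast values in the example are invented for illustration.

```python
from __future__ import annotations

from math import comb


def sign_test_one_tailed(contrasts: list[float]) -> float:
    """Exact one-tailed sign test.

    Returns the probability of observing at least as many positive contrasts as
    seen if each sign were + or - with probability 0.5. Contrasts that are exactly
    zero carry no sign information and are dropped.
    """
    signed = [c for c in contrasts if c != 0]
    n = len(signed)
    k = sum(1 for c in signed if c > 0)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Hypothetical contrasts: nine of ten are positive.
example = [1.9, 0.7, 2.1, 0.4, 1.2, -0.3, 0.9, 1.5, 0.2, 2.4]
p = sign_test_one_tailed(example)
print(round(p, 4))  # 0.0107, i.e. 11/1024
```

The sign test uses only the direction of each contrast, so it is robust to outlying contrast magnitudes but less powerful than a parametric test when the contrasts are well behaved.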

16.4

Discussion

Possession of an antennal hammer is clearly a homoplastic character at higher taxonomic levels, as it is also found in other ichneumonid subfamilies (Labeninae, Xoridinae, Claseinae and Pimplinae) (Broad and Quicke 2000) as well as in the Orussidae (Cooper 1953; Broad and Quicke 2000; Vilhelmsen et al. 2001). This structure is associated with deeply concealed cerambycid and buprestid beetle hosts, and we have shown by comparative analysis that it is also highly homoplastic within a single, albeit large, subfamily, the Cryptinae. Behavioural observations of Echthrus and of a Gabunia sp. (Quicke et al. 2003) support the hypothesis that antennal hammers in the Cryptini are associated with host searching. In 2004, we video recorded the host-searching behaviour of a female Echthrus reluctator on a pile of pine logs in Hungary (Quicke 2001). The wasp walked along the log tapping the substrate with its antennae, repeatedly sweeping them symmetrically in inwardly directed arcs. Similar behaviour was also observed in an unidentified Afrotropical species of Gabunia (tribe Cryptini) in Kibale Forest National Park in Uganda.

Fig. 16.4 Antennal tips of females and males. (a) Acrorichnus female, apical structures form a clear patch. (b) Buathra female, structures form a smooth patch. (c) Osprynchotus female, a hammer-like antennal tip. (d) Meringopus female, thickened antennal setae originating from deep sockets. (e) Gabunia male, two pegs on the antennal tip. (f) Eurycryptus male, one antennal peg

16.4.1

Hosts of Cryptine Wasps

Most cryptine wasps are ectoparasitoids and therefore do not need to adapt to their hosts' immunological defences. This may explain why some genera attack hosts from several insect orders. The essential ability in host usage might be finding concealed hosts of a suitable size.

16.4.1.1

Hosts of the Phygadeuontini

Species of the tribe Phygadeuontini typically parasitise exposed or weakly concealed hosts, and this is considered to be the ground-plan biology of the Cryptinae (Gokhman 1996). The comparative analysis using the phylogeny of Laurenne et al. (2006) shows that modified antennal tips have multiple origins within the Phygadeuontini, whose host range covers several insect orders. Antennal modification was found in three genera, all of which attack wood-boring beetles (Fig. 16.4).

16.4.1.2

Hosts of the Cryptini

In the tribe Cryptini, all the taxa that exploit wood-boring beetles have antennal hammers, which is probably the ground plan for the tribe. Strongly modified antennal structures are also found in genera that attack other insect groups, such as aculeate Hymenoptera larvae in their nests. These parasitoids probably locate cells containing a suitable host by vibrational sounding. Aculeate larvae are probably largely silent and do not chew wood, though they move inside a cell when they need to be fed by adults. Members of the genera Acroricnus, Eurycryptus, Messatoporus, Osprynchotus and Photocryptus exploit aculeate larvae (Genaro 1996), and they all have modified antennal tips. According to Gauld (1988), there may have been a host shift from Coleoptera to the young of nest-building aculeate Hymenoptera, but this remains a hypothesis that cannot be tested at present owing to the lack of sufficiently detailed host information for the vast majority of Cryptinae genera. Unlike most other subtribes, the Gabuniini form a well-supported monophyletic group (Laurenne et al. 2006) comprising 12 genera. Ten of these have strong antennal modifications, and the four available host records indicate that these species exploit cerambycid or buprestid beetle hosts. The cylindrical body shape of gabuniines and their long ovipositors probably enable them to reach their hosts and are perhaps constrained by the shape of the host borings (Townes and Townes 1962); the enlarged subgenual organs in the forelegs of females are assumed to detect echoes during host location (Broad and Quicke 2000). Most of the available host records for cryptine wasps concern phygadeuontines, many of which attack rather weakly concealed hosts, especially ones in cocoons, or spider egg masses. The spider egg "parasitoids" attack exposed egg masses, so vibrational sounding probably plays no role in locating them, and the antennal tips of the spider egg "parasitoids" examined are typically simple.
Hyperparasitism of cocooned parasitoid hosts occurs more commonly in the Cryptini than in the Phygadeuontini, though there are numerous examples within the latter. Some of these genera have modified antennal tips, but this could possibly be explained by adaptation to exploit other insect groups as well. Within the Cryptini, males of six of the ten genera examined had either one or two terminal flagellomere pegs. The females of the same genera also had antennal modifications, except in Chrysocryptus. The structures on male terminal flagellomeres are probably not related to the echolocation role of female antennal hammers, but their co-occurrence suggests that there might be a common, homologous genetic control in the tribe Cryptini. Whether, and in what way, the male structures are functional has yet to be determined; field observations of mate location and mating are sadly lacking. Considering the size of the subfamily, very few host records are available for cryptine genera, and when records exist, they are often vague. In particular, records typically lack information about the host's precise developmental stage. Field records are scarce, and host-location behaviour is usually described as "antennation" without specifying which part of the antenna is used. We hope that this paper will encourage more detailed observation and reporting in the future.

16.4.2

Postulated Derivation of Hammers from Sensilla

If the states shown in Fig. 16.3a–f represent various stages in the evolution of antennal hammers, as seems likely, then the individual components of the hammer surface would appear to be derived from sensilla. The unmodified terminal flagellomere of Surculus has many thin, curved sensilla chaetica, a smaller number of more erect, obliquely ended chaetica (on right), and one visible blunt sensillum. In Latibulus (Fig. 16.3b), there are numerous blunt trichoid sensilla in relatively small sockets plus several longer, more pointed chaetica in rather large sockets. In Hidryta (Fig. 16.3c), there is a similar grouping of socketed and less conspicuously socketed blunt sensilla, but with their apices curving towards the antennal tip and interspersed with small trichoid sensilla. In Diapetimorpha (Fig. 16.3d), the apical cluster comprises a dense central area of T-shaped pegs that lack sockets, at least on the basal side, although on the side facing the antennal apex there appears to be a well-developed basal socket; these pegs are surrounded by curved, socketed, robust trichoid sensilla. Socketed trichoid sensilla are typically involved in mechanoreception. If, as the above suggests, the antennal hammers of cryptines, and possibly of other ichneumonid wasps, evolved from mechanoreceptory sensilla, this raises the question of what function the intermediate evolutionary stages served and what substrates the hosts occupied during those intermediate phases. More detailed behavioural, microscopic and ultrastructural observations of living representatives of apparent intermediate stages are clearly needed.

References

Askew RR, Shaw MR (1986) Parasitoid communities: their size, structure and development. In: Waage J, Greathead D (eds) Insect parasitoids. Academic, London, pp 225–264
Belshaw R, Fitton M, Herniou E, Gimeno C, Quicke DLJ (1998) A phylogenetic reconstruction of the Ichneumonoidea (Hymenoptera) based on the D2 variable region of 28S ribosomal RNA. Syst Entomol 23:109–123


Broad GR, Quicke DLJ (2000) The adaptive significance of host location by vibrational sounding in parasitoid wasps. Proc R Soc Lond B Biol 267:2403–2409
Bukovinszky T, Gols R, Posthumus MA, Vet LEM, van Lenteren JC (2005) Variation in plant volatiles and attraction of the parasitoid Diadegma semiclausum (Hellen). J Chem Ecol 31:461–480
Cooper KW (1953) Egg gigantism, oviposition, and genital anatomy: their bearing on the biology and phylogenetic position of Orussus (Hymenoptera: Siricoidea). Proc R Acad Sci 10:38–68
Fatouros NE, Huigens ME, van Loon JJA, Dicke M, Hilker M (2005) Butterfly antiaphrodisiac lures parasitic wasps. Nature 433:704
Fischer S, Samietz J, Wäckers FL, Dorn S (2001) Interaction of vibrational and visual cues in parasitoid host location. J Comp Physiol A 187:785–791
Fischer S, Samietz J, Dorn S (2003) Efficiency of vibrational sounding in parasitoid host location depends on substrate density. J Comp Physiol A 189:723–730
Fischer S, Samietz J, Wäckers FL, Dorn S (2004) Perception of chromatic cues during host location by the pupal parasitoid Pimpla turionellae (L.) (Hymenoptera: Ichneumonidae). Environ Entomol 33:81–87
Gaston KJ, Gauld ID (1993) How many species of pimplines (Hymenoptera: Ichneumonidae) are there in Costa Rica? J Trop Ecol 9:491–499
Gauld ID (1988) Evolutionary patterns of host utilization by ichneumonoid parasitoids (Hymenoptera: Ichneumonidae and Braconidae). Biol J Linn Soc 35:351–378
Genaro JA (1996) Nest parasites (Coleoptera, Diptera, Hymenoptera) of some wasps and bees (Vespidae, Sphecidae, Colletidae, Megachilidae, Anthophoridae) in Cuba. Caribb J Sci 32:239–240
Gohole LS, Overholt WA, Khan ZR, Vet LEM (2003) Role of volatiles emitted by host and nonhost plants in the foraging behaviour of Dentichasmias busseolae, a pupal parasitoid of the spotted stemborer Chilo partellus. Entomol Exp Appl 107:1–9
Godfray HCJ (1994) Parasitoids: behavioral and evolutionary ecology. Princeton University Press, Princeton, NJ
Gokhman VE (1996) Trends of biological evolution in the subfamily Ichneumoninae and related groups (Hymenoptera Ichneumonidae): an attempt of phylogenetic reconstruction. Russ Entomol J 4:91–103
Henaut A, Guerdoux J (1982) Location of a lure by the drumming insect Pimpla instigator (Hymenoptera, Ichneumonidae). Experientia 38:346–347
Henaut A (1990) Study of the sound produced by Pimpla instigator (Hymenoptera, Ichneumonidae) during host selection. Entomophaga 35:127–139
Ignacimuthu S, Dorn S (2000) Mechano- and chemoreceptors and their possible role in host location behaviour of parasitoid Anisopteromalus calandrae Howard (Hymenoptera: Pteromalidae). Entomon 25:179–184
Isidoro N, Romani R, Bin F (2001) Antennal multiporous sensilla: their gustatory features for host recognition in female parasitic wasps (Insecta, Hymenoptera: Platygastroidea). Microsc Res Tech 55:350–358
Jumean Z, Unruh T, Gries R, Gries G (2005) Mastrus ridibundus parasitoids eavesdrop on cocoon-spinning codling moth, Cydia pomonella, larvae. Naturwissenschaften 92:20–25
Kröder S, Samietz J, Dorn S (2006) Effect of ambient temperature on mechanosensory host location in two parasitic wasps of different climatic origin. Physiol Entomol 31:299–305
Kröder S, Samietz J, Dorn S (2007a) Temperature affects interaction of visual and vibrational cues in parasitoid host location. J Comp Physiol 193:223–231
Kröder S, Samietz J, Schneider D, Dorn S (2007b) Adjustment of vibratory signals to ambient temperature in a host-searching parasitoid. Physiol Entomol 32:105–112
Laurenne NM, Broad GR, Quicke DLJ (2006) Direct optimization and multiple alignment of 28S D2–D3 rDNA sequences: problems with indels on the way to a molecular phylogeny of the cryptine ichneumon wasps (Insecta: Hymenoptera). Cladistics 22:442–473


Laurenne NM, Karatolos N, Quicke DLJ (2009) Hammering homoplasy: multiple gains and losses of vibrational sounding in cryptine wasps (Insecta: Hymenoptera: Ichneumonidae). Biol J Linn Soc 96:82–102
Meyhöfer R, Casas J (1999) Vibratory stimuli in host location by parasitic wasps. J Insect Physiol 45:967–971
Mackauer M, Michaud JP, Volkl W (1996) Host choice by aphidiid parasitoids (Hymenoptera: Aphidiidae): host recognition, host quality, and host value. Can Entomol 128:959–980
Otten H, Wäckers F, Battini M, Dorn S (2001) Efficiency of vibrational sounding in the parasitoid Pimpla turionellae is affected by female size. Anim Behav 61:671–677
Powell JA, Turner WJ (1975) Observations on oviposition behaviour and host selection in Orussus occidentalis (Hymenoptera: Siricoidea). J Kans Entomol Soc 48:299–307
Purvis A, Rambaut A (1995) Comparative analysis by independent contrasts (CAIC): an Apple Macintosh application for analysing comparative data. Comput Appl Biosci 11:247–251
Quicke DLJ (1997) Parasitic wasps. Chapman & Hall, London, New York
Quicke DLJ (2001) Movie of host searching Echthrus. http://www.imperial.ac.uk/imedia/vid/fons/biology/quicke//Echthrus.mp4. Accessed 7 Dec 2009
Quicke DLJ, Laurenne NM, Broad GR, Barclay MVL (2003) Host location behaviour and a new host record for Gabunia aff. togoensis Krieger (Hymenoptera: Ichneumonidae: Cryptinae) in Kibale Forest National Park, West Uganda. Afr Entomol 11:308–310
Quicke DLJ, Laurenne NM, Fitton MG, Broad GR (2009) A thousand and one wasps: a 28S rDNA and morphological phylogeny of the Ichneumonidae (Insecta: Hymenoptera) with an investigation into alignment parameter space and elision. J Nat Hist 43:1305–1421
Romani R, Isidoro N, Bin F, Vinson SB (2002) Host recognition in the pupal parasitoid Trichopria drosophilae: a morpho-functional approach. Entomol Exp Appl 105:119–128
Schwarz M, Shaw MR (1998) Western Palaearctic Cryptinae (Hymenoptera: Ichneumonidae) in the National Museums of Scotland, with nomenclatural changes, taxonomic notes, rearing records and special reference to the British check list. Part 1. Tribe Cryptini. Entomologist's Gaz 49:101–127
Schwarz M, Shaw MR (2000) Western Palaearctic Cryptinae (Hymenoptera: Ichneumonidae) in the National Museums of Scotland, with nomenclatural changes, taxonomic notes, rearing records and special reference to the British check list. Part 3. Tribe Phygadeuontini, subtribes Chiroticina, Acrolytina, Hemitelina and Gelina (excluding Gelis), with descriptions of new species. Entomologist's Gaz 51:147–186
Townes H (1969) The genera of Ichneumonidae, part 1. Mem Am Entomol Inst 11:1–300
Townes H, Townes M (1962) Ichneumon-flies of America north of Mexico: 3. Subfamily Gelinae, tribe Mesostenini. United States National Museum Bulletin 216:1–602
Vinson SB (1988) Comparison of host characteristics that elicit host recognition behavior of parasitoid Hymenoptera. In: Gupta VK (ed) Advances in parasitic Hymenoptera research: proceedings of the II conference on the taxonomy and biology of parasitic Hymenoptera. E. J. Brill, Leiden, pp 285–291
Vilhelmsen L, Isidoro N, Romani R, Basibuyuk HH, Quicke DLJ (2001) Host location and oviposition in a basal group of parasitic wasps: the subgenual organ, ovipositor apparatus and associated structures in the Orussidae (Hymenoptera, Insecta). Zoomorphology 121:63–84
Wertheim B, Vet LEM, Dicke M (2003) Increased risk of parasitism as ecological costs of using aggregation pheromones: laboratory and field study of Drosophila–Leptopilina interaction. Oikos 100:269–282
Whitfield JB (1998) Phylogeny and evolution of host–parasitoid interactions in Hymenoptera. Ann Rev Entomol 43:129–151
Yu D, van Achtenberg K, Horstmann K (2005) World Ichneumonoidea 2004. Taxonomy, biology, morphology and distribution. CD/DVD, Taxapad, Vancouver, Canada

Chapter 17

Adaptive Radiation of Neotropical Emballonurid Bats: Molecular Phylogenetics and Evolutionary Patterns in Behavior and Morphology

Burton K. Lim

Abstract A phylogenetic analysis of loci from the four genetic transmission pathways in mammals (mitochondrial, autosomal, X, and Y sex chromosomes) was used to investigate the evolution of bats in the pantropically distributed family Emballonuridae. The nuclear data sets support a monophyletic clade of species found in the New World. Character optimization of distributional areas suggests that the most recent common ancestor colonized South America from Africa. Molecular dating with fossil calibrations estimated that a basal split occurred approximately 27 million years ago followed by primary intergeneric diversification 19.4–18.0 million years ago. An analysis of historical biogeography identified the northern Amazon as the ancestral area where there was speciation by taxon pulses from a stable core area in the Guiana Shield. Range contractions followed by expansions during the Early Miocene suggest an adaptive radiation in cluttered forest and open savannah habitats. A correlation of ear morphology, echolocation, and foraging behavior indicates a phylogenetic basis for these complex character systems.

17.1

Introduction

South America was an insular continent from the Late Cretaceous to the Early Pliocene, but it nevertheless has high levels of biodiversity for many groups of organisms compared with other parts of the world. For example, bats account for 20% of the mammalian faunal diversity (Wilson and Reeder 2005) and are unique in being the only order of mammals that can fly. This gives bats an advantage for over-water dispersal, but there have been no studies investigating the evolutionary mechanisms for the successful radiation of bats, especially in the rainforests of the Amazon. As with most taxa, this has been hindered by a lack of comprehensive species-level phylogenies, a dearth of fossils in the paleontological record, and a paucity of ecological data. Herein, I synthesize data on New World emballonurid bats in the tribe Diclidurini as one of the first detailed studies of an adaptive radiation of mammals in the Neotropics. I begin by giving general background information on the biology of the family Emballonuridae. The primary objective of this study is to hypothesize the processes involved in the biotic diversification of New World emballonurid bats by (1) inferring a robust phylogeny using a molecular phylogenetic approach, (2) estimating times of divergence by molecular dating with fossil calibration points, (3) examining the historical biogeography using both temporal and spatial information, and (4) investigating patterns of evolution in morphology and behavior as inferred from the phylogeny.

B.K. Lim
Department of Natural History, Royal Ontario Museum, 100 Queen's Park, Toronto, Ontario M5S 2C6, Canada

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_17, © Springer-Verlag Berlin Heidelberg 2010

17.1.1

Emballonurid Bats

The family Emballonuridae is characterized by a tail that emerges mid-dorsally from the interfemoral membrane, which is the origin of its common name of sheath-tailed bats. The family is pantropical in distribution, and the New World emballonurids occur from Mexico through Central America into South America as far as southeastern Brazil, including the offshore islands of Trinidad, Tobago, and Grenada (Koopman 1994). Most species are rarely caught in Neotropical rainforests with traditional capture methods such as mist nets set in the understory, because they typically fly in or above the canopy. Consequently, New World emballonurid bats are poorly studied and incompletely sampled in terms of taxonomic and geographic coverage. However, this apparent rarity reflects a sampling bias that may be partially corrected by supplemental surveys using novel methods such as flap trapping (Borissenko 1999; Lim 2009), acoustic monitoring (Jung et al. 2007), and systematic searches for roosts (Simmons and Voss 1998).

17.1.2

Taxonomy

There are 16 genera of emballonurid bats: 13 extant (eight in the New World and five in the Old World) and three extinct (all Old World). These are represented by 63 species: 52 extant (22 New World and 30 Old World) and 11 extinct (all Old World; McKenna and Bell 1997; Simmons 2005; Lim et al. 2010). Four phylogenies have previously been proposed for Emballonuridae, based on cranial morphology (Barghoorn 1977), protein electrophoresis and immunology (Robbins and Sarich 1988), hyoid morphology (Griffiths and Smith 1991), and morphology and behavior (Dunlop 1998). All of these studies were at the taxonomic rank of genus except for the species-level analysis of Dunlop (1998). However, the only taxonomic congruence among the topologies is the higher-level recognition of the subfamilies Emballonurinae and Taphozoinae. The lack of consensus in other parts of these trees resulted from a combination of incomplete taxonomic sampling and poor resolution. A recent molecular phylogenetic analysis of DNA sequence variation supported this classification (Lim et al. 2008). Although the New World emballonurid species were comprehensively sampled, there were only exemplar samples of the two Old World tribes, which are still poorly represented in tissue collections.

17.2

Molecular Phylogenetic Analyses

The data set for New World emballonurid bats included 99 specimens representing all eight recognized genera and 21 of the 22 species (Simmons 2005; Lim et al. 2010). The only missing species is Saccopteryx antioquensis, which is endemic to the northern Andes of Colombia and known from only two specimens, neither with tissue samples (Muñoz and Cuartas 2001). Outgroup taxa included nine specimens representing two genera of Old World emballonurids and four genera of other bats (Lim et al. 2008). Loci from the four genomic components of mammalian transmission genetics were used to hypothesize the evolutionary history of New World emballonurid bats. These genetic transmission pathways differ in effective population size, mutation rate, and recombination, and together they should help recover a robust estimate of phylogeny. The mitochondrial marker was the complete protein-coding gene cytochrome b (Cytb); the autosomal marker was intron 26 of the protein-coding gene Chd1 (found on chromosome 5 in humans); the Y sex chromosome marker was intron 7 of the protein-coding gene Dby; and the X sex chromosome marker was intron 18 of the protein-coding gene Usp9x (Lim et al. 2008). There were a total of 3,176 aligned base pairs (bp): 1,140 bp of Cytb, 624 bp of Chd1, 750 bp of Dby, and 662 bp of Usp9x. The phylogenetic analyses of individual and combined nucleotide data sets used both an explicit model of DNA evolution in a Bayesian framework and a model-free maximum parsimony approach, the latter to corroborate topological robustness. Bayesian inference was implemented in the program MrBayes (Ronquist and Huelsenbeck 2003) and parsimony reconstruction in the program PAUP* (Swofford 2001), as outlined by Lim et al. (2008).
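The 3,176 bp total is the sum of the four marker lengths (1,140 + 624 + 750 + 662). The Python sketch below, which assumes the markers were concatenated in the order listed (the text does not state the order), derives the 1-based site range of each gene and formats them as MrBayes-style charset and partition commands; the partition name by_gene is hypothetical. The combined analyses described below used only the three nuclear genes, which would correspond to a subset of these ranges.

```python
# Marker lengths (bp) as reported in the text. The concatenation order is an
# assumption: the text lists the markers in this order but does not say how
# the matrix was assembled.
MARKERS = [("Cytb", 1140), ("Chd1", 624), ("Dby", 750), ("Usp9x", 662)]


def partition_ranges(markers):
    """Return (total length, {name: (first_site, last_site)}), 1-based and inclusive."""
    ranges, start = {}, 1
    for name, length in markers:
        end = start + length - 1
        ranges[name] = (start, end)
        start = end + 1
    return start - 1, ranges


def mrbayes_lines(ranges, partition_name="by_gene"):
    """Format MrBayes-style commands; partition_name is a hypothetical label."""
    lines = [f"charset {name} = {a}-{b};" for name, (a, b) in ranges.items()]
    lines.append(f"partition {partition_name} = {len(ranges)}: {', '.join(ranges)};")
    lines.append(f"set partition = {partition_name};")
    return lines


if __name__ == "__main__":
    total, ranges = partition_ranges(MARKERS)
    print(total)  # 3176
    print("\n".join(mrbayes_lines(ranges)))
```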
Branch support was assessed with posterior probabilities in the Bayesian analysis and with 1,000 bootstrap replicates in the parsimony analysis. The trees were compared for topological congruence using the Approximately Unbiased (AU) test (Shimodaira and Hasegawa 2001): each data set was reciprocally constrained to each of the individual gene trees to determine whether one topology fitted the data significantly better than another.
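Nonparametric bootstrap support of the kind reported for the parsimony trees comes from resampling alignment columns with replacement, repeating the tree search on each pseudo-replicate, and recording how often each clade is recovered. The Python sketch below covers only the resampling step and the final tally. The tree search itself (run in PAUP* in the study) is stood in for by a hypothetical list of clade sets, and the toy alignment is invented for illustration.

```python
from __future__ import annotations

import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping each column intact."""
    n = len(next(iter(alignment.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Percentage of replicate trees (each given as its set of clades) containing `clade`."""
    return 100 * sum(clade in clades for clades in replicate_clades) / len(replicate_clades)


if __name__ == "__main__":
    rng = random.Random(42)
    toy = {"t1": "ACGTAC", "t2": "ACGAAC", "t3": "TCGAAG"}  # invented alignment
    print(bootstrap_replicate(toy, rng))
    # In practice each replicate would be analysed by a tree search; here the
    # recovered clades are hypothetical.
    ab = frozenset({"t1", "t2"})
    print(clade_support([{ab}, set(), {ab}, {ab}], ab))  # 75.0
```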


17.2.1


Tree Topology

Parsimony and Bayesian analyses of each of the individual data sets gave congruent topologies with high bootstrap proportions and posterior probabilities for monophyletic clades representing the currently recognized genera and species of New World emballonurid bats (Fig. 17.1; Lim et al. 2008). However, the mitochondrial gene had significantly faster rates of nucleotide substitution, higher levels of homoplasy, and a greater degree of saturation of transitions than any of the three nuclear genes. These factors contributed to the loss of phylogenetic signal at deeper branches of the cytochrome b tree, including support for the monophyly of the New World emballonurids. In contrast, the more slowly evolving nuclear introns gave better resolution and branch support. However, the intergeneric relationships within the two subtribes were poorly resolved and supported by only a few nucleotide changes. This suggests a hard polytomy, in which rapid speciation left little phylogenetic signal in any of the genetic transmission pathways, rather than a soft polytomy due to conflicting phylogenetic signal. Based on topological congruence, a linear accumulation of substitutions, and high consistency indices, the three nuclear genes were combined to lessen the effects of random sequence error among nucleotide sites and to improve the recovery of phylogenetic signal for a robust species tree. A monophyletic New World clade was recovered in the individual and combined nuclear data sets, indicating a single origin of emballonurid bats in the Neotropics (Fig. 17.1). Similarly, a basal split in the New World tribe Diclidurini was congruent and well supported across the nuclear trees.

Fig. 17.1 Phylogenetic tree from a Bayesian analysis of combined DNA sequences of three nuclear genes for New World emballonurid bats, tribe Diclidurini (Lim et al. 2008). The first number along each branch is the Bayesian posterior probability (%), and the second is the bootstrap percentage from a parsimony analysis. Numbers in parentheses are the corresponding branch-support values from an analysis after removal of the outgroup taxon Nycteris javanica, which was missing data for two of the genes. Intrageneric support values are the same for both analyses, and branches marked with an asterisk (*) have 100% support. Peropteryx macrotis has two divergent populations, from Central America (CA) and South America (SA)
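The consistency index compares the minimum number of changes a character could require on any tree with the number it actually requires on the tree being evaluated. A small Python sketch, assuming unordered characters and a rooted, fully bifurcating tree written as nested tuples, counts changes with the Fitch parsimony downpass and returns the ensemble CI. The four-taxon tree and the two characters are hypothetical.

```python
from __future__ import annotations


def fitch(tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch downpass on a binary tree given as nested 2-tuples of tip names.

    Returns (candidate state set at this node, number of changes in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    (s1, c1), (s2, c2) = fitch(tree[0], states), fitch(tree[1], states)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # no shared state: one extra change


def consistency_index(tree, characters: list[dict[str, str]]) -> float:
    """Ensemble CI = sum of minimum possible changes / sum of observed changes."""
    minimum = sum(len(set(ch.values())) - 1 for ch in characters)
    observed = sum(fitch(tree, ch)[1] for ch in characters)
    if observed == 0:
        raise ValueError("no variable characters: CI is undefined")
    return minimum / observed


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))                  # hypothetical topology
    congruent = {"A": "G", "B": "G", "C": "T", "D": "T"}
    homoplastic = {"A": "G", "B": "T", "C": "G", "D": "T"}
    print(consistency_index(tree, [congruent, homoplastic]))  # 0.666...
```

A CI of 1 means every character changes the minimum possible number of times on the tree; the homoplastic character above needs two changes instead of one, which pulls the ensemble value below 1.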

17.3

Divergence Times

The combined nuclear data set for the tribe Diclidurini was used in a Bayesian relaxed-clock approach to estimate times of divergence (Thorne and Kishino 2002). Two calibration constraints were used. The first was a minimum age of 13 million years ago (mya) for the split of Cyttarops and Diclidurus, based on the only pre-Pleistocene fossil record of an extant New World emballonurid genus (Czaplewski 1997). The second was a maximum age of 30 mya for the split of the Old and New World emballonurids, based on a molecular dating analysis with fossil calibrations for all families of bats (Teeling et al. 2005). The basal split in the New World emballonurids occurred in the Late Oligocene, approximately 27 mya. Six of the eight currently recognized genera diversified relatively rapidly in the Early Miocene, 19.4–18.0 mya, and most intrageneric differentiation (16 of 21 species) occurred before the Pliocene, about 5 mya (Fig. 17.2; Lim 2007).
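The relaxed-clock method used in the study is not reproduced here. As a much simpler illustration of how a minimum and a maximum calibration jointly constrain node ages, the Python sketch below assumes a strict molecular clock: a minimum age on one node caps the substitution rate, a maximum age on another node sets a floor on it, and the age of any other node then falls within an interval. All distances are hypothetical values chosen to give round numbers, not estimates from the study.

```python
def rate_bounds(d_min_cal, min_age, d_max_cal, max_age):
    """Strict-clock rate bounds (substitutions/site/Myr along each lineage).

    d_* are pairwise distances between the two lineages descending from each
    calibrated node, so under a strict clock node age = d / (2 * rate).
    """
    r_upper = d_min_cal / (2 * min_age)  # age >= min_age  =>  rate <= d / (2 * min_age)
    r_lower = d_max_cal / (2 * max_age)  # age <= max_age  =>  rate >= d / (2 * max_age)
    if r_lower > r_upper:
        raise ValueError("calibrations are incompatible under a strict clock")
    return r_lower, r_upper


def age_interval(d, r_lower, r_upper):
    """Youngest and oldest ages (Myr) for a node consistent with the rate bounds."""
    return d / (2 * r_upper), d / (2 * r_lower)


if __name__ == "__main__":
    # Hypothetical pairwise distances (not data from the study).
    lo_r, hi_r = rate_bounds(d_min_cal=0.02, min_age=13.0, d_max_cal=0.04, max_age=30.0)
    print(age_interval(0.03, lo_r, hi_r))  # roughly (19.5, 22.5)
```

A relaxed clock lets rates vary among branches, so its age estimates are not confined to a single interval of this kind; the sketch only shows why one-sided calibrations on different nodes are informative when combined.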

17.4

Historical Biogeography

Character optimization (Farris 1970) of distributional areas onto the phylogeny for the superfamily Emballonuroidea indicates that the ancestor of New World emballonurid bats has its origins in Africa (Fig. 17.3; Lim 2007). This biogeographic scenario was previously suggested from phylogenetic studies of interfamilial relationships of

288

B.K. Lim

Fig. 17.2 Molecular dating based on a relaxed clock Bayesian analysis with fossil calibrations of New World emballonurid bats (Lim 2007). Nodes are labeled with divergence time estimates (millions of years ago) and standard deviations. Intergeneric and most intrageneric diversification occurred in the Miocene (shaded). Peropteryx macrotis has two divergent populations from Central America (CA) and South America (SA)

bats (Eick et al. 2005; Teeling et al. 2005). The paleoenvironment during the Early Oligocene was drier than today with more open habitats such as woodlands and savannahs as suggested by the prevalence of large hypsodont mammals in the fossil record (Flynn and Wyss 1998). Colonization of South America by trans-Atlantic dispersal and subsequent speciation in allopatry has been reported for three other groups of placental mammals based on fossil records from the Oligocene including molossid bats (Legendre 1984), caviomorph rodents (Wyss et al. 1993), and platyrrhine primates (Takai et al. 2000). These range expansions probably occurred earlier in the Eocene (Poux et al. 2006). The phylogenies of each of the eight genera of New World emballonurid bats were incorporated in an historical biogeographic analysis using the algorithm Phylogenetic Analysis for Comparing Trees (PACT; Wojcicki and Brooks 2005). In constructing the area cladogram, temporal information from the molecular dating

17 Adaptive Radiation of Neotropical Emballonurid Bats: Molecular Phylogenetics

289

Fig. 17.3 Phylogenetic tree for the superfamily Emballonuroidea with the ancestral areas mapped onto each node (AF Africa, EU Europe, NA North America, SA South America) following Lim (2007). Lineage splits, other than the extant New World emballonurids (tribe Diclidurini), are based on the minimum age of the fossil record (black bars). The basal divergence at 52 million years ago (mya) of the families Nycteridae and Emballonuridae is the molecular approximation by Teeling et al. (2005). Extinct taxa are indicated by an asterisk (*)

analysis (Lim 2007) was also used in conjunction with spatial information based on the current distribution of each species (Table 17.1). There were nine biogeographic areas identified in Central and South America for New World emballonurids (Fig. 17.4). The final area cladogram identified the Northern Amazon as the

290

B.K. Lim

Table 17.1 Biogeographic areas identified for species of New World emballonurid bats based on current species distributions (Lim 2008) Species Biogeographic area A B C D E F G H I Balantiopteryx infusca C Balantiopteryx io B Balantiopteryx plicata A Centronycteris centralis B C D H Centronycteris maximiliani F G I Cormura brevirostris B D E F G H Cyttarops alecto B F G Diclidurus albus A B D E F G I Diclidurus ingens E F G Diclidurus isabellus F Diclidurus scutatus F G I Peropteryx kappleri B C D E F G H I Peropteryx leucoptera F G H I Peropteryx macrotis (Central America) A B Peropteryx macrotis (South America) D E F G H I Peropteryx pallidoptera F G Peropteryx trinitatis E F Rhynchonycteris naso A B C D E F G H I Saccopteryx antioquensis D Saccopteryx bilineata A B C D E F G H I Saccopteryx canescens D E F G H Saccopteryx gymnura F G Saccopteryx leptura A B C D E F G H I A ¼ Pacific versant of Central America; B ¼ Atlantic versant of Central America; C ¼ Choco region of northwestern South America; D ¼ northern Andes and valleys of Colombia; E ¼ north coast of Venezuela and offshore islands; F ¼ north of the Amazon River; G ¼ south of the Amazon River; H ¼ eastern slope of the Andes in the western Amazon basin; and I ¼ southeastern South America (Fig. 17.4)

ancestral area for the basal node and for most internal nodes based on character optimization (Fig. 17.5). This indicates that most lineage splits were within-area speciation events. However, there were three range expansions from the Northern Amazon followed by vicariant contractions including (1) a peripheral isolation in the Pacific slope of northwestern South America and subsequent colonization of Proto-Central America during the Middle Miocene; (2) colonization of northern Colombia and vicariant isolation after the uplift of the Andes during the Late Miocene; and (3) overland dispersal into Central America during the Pleistocene after the establishment of the Panamanian land bridge connection, which was followed by extinction in the intervening area of the northern Andes in Colombia, which resulted in allopatric speciation (Lim 2008). As is the case for most species of New World emballonurid bats, widely distributed species typically are not conducive for recovering biogeographic patterns. However, the optimization of the Northern Amazon at most nodes of the area cladogram indicates repeated within-area speciation events. Tectonic uplifting of

17 Adaptive Radiation of Neotropical Emballonurid Bats: Molecular Phylogenetics


Fig. 17.4 Map of the nine biogeographical areas in Central America and South America that were identified based on current species distributions in Table 17.1 (Lim 2008): (A) Pacific versant; (B) Atlantic versant; (C) Choco; (D) Northern Andes; (E) North Coast; (F) Northern Amazon; (G) Southern Amazon; (H) Western Amazon; (I) Southeastern South America

the northern Andes (Hoorn et al. 1995), combined with fluctuations in temperature and sea levels (Haq et al. 1987; Miller et al. 2005) and changes in vegetation (Janis 1993), contributed to a heterogeneous paleoenvironment in South America during the Miocene (Lundberg et al. 1998). This scenario is similar to the taxon-pulse hypothesis of biotic diversification, with recurring adaptive shifts over time to different habitats centered on a stable core area (Erwin 1979, 1981). For New World emballonurid bats, there were repeated episodes of range expansion and contraction from a stable core area such as the ancient Guiana Shield of the Northern Amazon. Mapping the area cladogram (Fig. 17.5) onto the chronogram (Fig. 17.3) suggests that, other than an earlier colonization in the Miocene associated with the genus Balantiopteryx (Lim 2008; Lim et al. 2004), range expansion from South America into Central America probably did not occur until the Pliocene. Although Centronycteris split vicariantly in the Late Miocene, with Centronycteris maximiliani speciating in the Northern Amazon and Centronycteris centralis in the Northern Andes, C. centralis did not colonize Central America until later. Similarly, Saccopteryx bilineata and Saccopteryx leptura split during the Late Miocene in the Northern Amazon before both species became widely distributed throughout the continental mainland. Even more recently, Diclidurus albus and Diclidurus ingens split during the Early Pleistocene in the Northern Amazon before D. albus dispersed into Central America. Although the topology forms a trichotomy with Peropteryx kappleri, the allopatrically distributed Central and South American populations of Peropteryx macrotis split in the Late Pleistocene. Three other


B.K. Lim

Fig. 17.5 Final area cladogram from a historical biogeographic analysis of New World emballonurid bats (Lim 2008). Ancestral areas at nodes are derived from character optimization. Three nodes marked with Roman numerals in parentheses identify biotic expansions followed by vicariant isolation. All other nodes are within-area taxon pulses of biotic diversification in the Northern Amazon (F)

species (Cormura brevirostris, Cyttarops alecto, and Rhynchonycteris naso) are also widely distributed, but their range expansions cannot be discerned from the area cladogram. Likewise, patterns of range expansion from the Northern Amazon southward are not explicitly discernible because no speciation events involve the Southern Amazon. However, C. maximiliani, S. bilineata, S. leptura, Saccopteryx canescens, Saccopteryx gymnura, Diclidurus scutatus, and Peropteryx pallidoptera dispersed from the Northern to the Southern Amazon sometime after they speciated in the Late Miocene. This timing coincides with the uplifting of the eastern


cordillera of the Andes, which created the Amazon River and primary drainage of South America east toward the Atlantic Ocean as we know it today (Hoorn et al. 1995).
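The ancestral areas in Fig. 17.5 are described only as coming from "character optimization." The Python sketch below gives a rough idea of what such an optimization computes. It implements the Fitch downpass, one common parsimony method, on a binary tree whose tips are coded with the area letters of Table 17.1. The tree shape, the function name, and the choice of Fitch parsimony are all illustrative assumptions. The original analysis (Lim 2008) may have used a different procedure.

```python
def fitch(tree):
    """Fitch downpass on a binary tree (illustrative sketch).

    A tree is either a tip, given as a string of area codes (e.g. "F", or
    "DF" for a taxon in areas D and F), or a 2-tuple of subtrees. Returns the
    most-parsimonious area set at the root and the number of area changes.
    """
    if isinstance(tree, str):
        return frozenset(tree), 0
    left, right = tree
    left_set, left_cost = fitch(left)
    right_set, right_cost = fitch(right)
    shared = left_set & right_set
    if shared:
        # The children can share an area, so no extra step at this node.
        return shared, left_cost + right_cost
    # No shared area: keep the union and count one change of area.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical topology: most tips in the Northern Amazon (F), one split into D.
example = (("F", "F"), ("F", ("D", "F")))
root_areas, steps = fitch(example)
print(sorted(root_areas), steps)  # ['F'] 1
```

In this sketch a widespread taxon coded with several areas is treated as a polymorphic tip. That is only one simple convention among several for handling widespread species.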

17.5

Evolutionary Patterns

17.5.1

Morphological Data

The most comprehensive morphological study of the family Emballonuridae incorporated 141 external, cranial, and skeletal characters from 43 of 52 extant species, including 18 of 22 New World species (Dunlop 1998; Lim and Dunlop 2008). However, the phylogeny was poorly supported, with the exception of the genera within the tribe Diclidurini. Tests of topological congruence using the KH (Kishino and Hasegawa 1989), Wilcoxon signed-ranks (Templeton 1983), and winning-sites (Prager and Wilson 1988) tests indicated that the morphological data set constrained to each of the molecular trees was significantly worse than on its own tree (p < 0.02), except for Usp9x (p < 0.07). Similarly, all three molecular data sets were significantly worse (p < 0.01) when constrained to the morphological tree than on their own trees. In terms of character congruence, the incongruence length difference test (Farris et al. 1995) identified the morphological data set as significantly different from the molecular data sets. Taxonomic congruence summarizes these topological and character differences: the three nuclear gene trees corroborate the split of the New World taxa into the subtribes Diclidurina and Saccopterygina, clades that are not recovered by the morphological tree. Except for a collapse to a polytomy at the basal node of the subtribe Saccopterygina in the parsimony tree, combining the morphological and molecular data sets resulted in the same topology as the nuclear tree for both Bayesian and parsimony analyses. This indicates that the morphological data set contains substantial homoplasy and little phylogenetic signal.
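The winning-sites test named above compares two trees character by character. The minimal Python sketch below expresses that idea as a two-tailed sign test. It assumes that the number of steps each character requires on each tree has already been computed by a parsimony program. The function name and the step counts in the example are hypothetical and are not data from this study.

```python
from math import comb


def winning_sites_test(steps_tree_a, steps_tree_b):
    """Two-tailed sign test on per-character step counts for two trees.

    A character "wins" for a tree if it needs fewer steps on that tree.
    Ties carry no information and are dropped, as in a standard sign test.
    Returns (wins for tree A, wins for tree B, two-tailed p-value).
    """
    if len(steps_tree_a) != len(steps_tree_b):
        raise ValueError("both trees must be scored on the same characters")
    wins_a = sum(a < b for a, b in zip(steps_tree_a, steps_tree_b))
    wins_b = sum(b < a for a, b in zip(steps_tree_a, steps_tree_b))
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins for the weaker tree under Binomial(n, 0.5).
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * lower_tail)


# Hypothetical step counts for ten characters scored on two trees.
steps_a = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
steps_b = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print(winning_sites_test(steps_a, steps_b))  # (8, 0, 0.0078125)
```

The KH and Templeton tests cited alongside it also use the magnitude of the per-character differences, either directly or through ranks, rather than just their sign. This sketch does not reproduce them.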

17.5.2

Ecological Data

The most comprehensive ecological study incorporated 28 characters primarily associated with roosting and foraging behavior; however, data for most of the species were unknown (Dunlop 1998; Lim and Dunlop 2008). A phylogenetic analysis of this incomplete dataset resulted in a largely unresolved topology. A combined analysis of morphological and behavioral characters resulted in a slightly better but still poorly resolved consensus tree of 509 equally parsimonious trees. The only higher level relationships recovered were the subfamilies Taphozoinae and Emballonurinae.
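The text does not state which consensus method was applied to the 509 equally parsimonious trees. As an illustration only, the sketch below computes two common consensus summaries, strict and majority-rule, by representing each tree as its set of clades. The nested-tuple tree format and the five-taxon example trees are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Return the taxon sets of all internal nodes of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def strict_consensus(trees):
    """Clades present in every tree."""
    return set.intersection(*(clades(t) for t in trees))


def majority_rule_consensus(trees):
    """Clades present in more than half of the trees."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c for c, n in counts.items() if n > len(trees) / 2}


# Three hypothetical equally parsimonious trees that disagree about a, b, c.
t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
t3 = ((("b", "c"), "a"), ("d", "e"))
for clade in sorted(strict_consensus([t1, t2, t3]), key=len):
    print(sorted(clade))
```

Groups on which the trees conflict, such as the relationships among a, b, and c here, collapse to a polytomy. This is how a consensus of many conflicting trees ends up poorly resolved.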


Although there is a lack of resolving power because of high levels of homoplasy and large amounts of missing data, characters can be optimized onto the robust molecular phylogeny to hypothesize evolutionary patterns in morphology and behavior. Three examples are detailed herein that are associated with the diversification of genera of New World emballonurid bats.

17.5.3

Wing Sacs

Species of Balantiopteryx, Cormura, Peropteryx, and Saccopteryx have a sac-like structure in the propatagium between the shoulder and forearm. This structure is uniquely configured in each genus in terms of its location in the wing membrane, the direction of its opening, and its size. However, only the wing sac of S. bilineata has been studied thoroughly. It is well developed in males and acts as a storage container without glandular cells (Scully et al. 2000) for bodily secretions used in a salting behavior to mark females in the harem (Voigt and von Helversen 1999). Based on both parsimony and likelihood methods of ancestral state reconstruction as implemented in Mesquite (Maddison and Maddison 2006), the wing sac character states map onto the molecular phylogeny as independent origins (Fig. 17.6; Lim and Dunlop 2008). An alternative hypothesis of a single origin of wing sacs for New World emballonurid bats is less parsimonious, because it requires two additional losses. It is also not supported by the likelihood reconstruction, which favors the absence of wing sacs at the base of this clade. However, because sac-like structures occur in several genera, there may be a phylogenetic predisposition (Soltis et al. 1995), whereby the genetic components underlying the structure originated once on the tree (Lim and Dunlop 2008).
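Mesquite's implementation is not reproduced here. To show what a likelihood-based ancestral state reconstruction computes, the sketch below applies Felsenstein's pruning algorithm under a symmetric k-state Mk model with a single rate q for every change. It reports the proportional likelihood of each state at the root. The tree format, branch lengths, rate, and the 0/1 coding (0 = no wing sac, 1 = wing sac) are hypothetical. The five wing sac states of Fig. 17.6 would correspond to k = 5.

```python
import math


def transition_probability(k, q, t, i, j):
    """P(state j at the end of a branch of length t | state i at its start)
    under a symmetric Mk model in which every specific change has rate q."""
    decay = math.exp(-k * q * t)
    if i == j:
        return 1.0 / k + (1.0 - 1.0 / k) * decay
    return (1.0 - decay) / k


def conditional_likelihoods(node, k, q):
    """Felsenstein pruning. A tip is an int state; an internal node is a
    tuple of (child, branch_length) pairs. Returns L[s] = P(tips below | s)."""
    if isinstance(node, int):
        return [1.0 if s == node else 0.0 for s in range(k)]
    result = [1.0] * k
    for child, length in node:
        child_l = conditional_likelihoods(child, k, q)
        for s in range(k):
            result[s] *= sum(
                transition_probability(k, q, length, s, x) * child_l[x]
                for x in range(k)
            )
    return result


def root_proportional_likelihoods(tree, k, q):
    """Root likelihoods normalised to sum to 1 (equal prior on root states)."""
    values = conditional_likelihoods(tree, k, q)
    total = sum(values)
    return [v / total for v in values]


# Hypothetical tree: two tips with wing sacs (1) and one without (0).
tree = ((0, 0.1), (1, 0.1), (1, 0.1))
print(root_proportional_likelihoods(tree, k=2, q=1.0))
```

With equal root priors, these values are proportional to the marginal probability of each root state. Unlike a parsimony count, the result depends on branch lengths and on the rate q.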

17.5.4

Roosts and Pelage

Most emballonurids, and many bats in general, have brown fur. Some genera, however, have an atypical appearance: paler pelage that is white, as in the ghost bat Diclidurus, or gray, as in the smoky bat Cyttarops, or a pelage pattern with two pale dorsal lines, as in Rhynchonycteris and Saccopteryx. In terms of primary roosting sites, most emballonurid bats occupy relatively sheltered areas such as caves and crevices in rocky outcrops, or man-made structures such as tombs and buildings. Some species are found primarily in other concealed roosts, including tree hollows and rotted-out logs. A few genera, however, roost predominantly in more exposed situations. These include leaves at the tops of palm trees (Cyttarops and Diclidurus), sloping tree trunks overhanging rivers (R. naso), vertical tree trunks within forest (S. leptura), and exposed cavities on the outside of the buttressed roots of trees (S. bilineata). Although Saccopteryx is also known to roost in other places such as tree hollows, caves, and man-made


Fig. 17.6 Chronogram of New World emballonurid bats with the primary characters defining the basal diversification during the Late Oligocene and Early Miocene. Echolocation call design: C1 – frequency high (41.3–98.2 kHz), call duration low (4.8–7.6 ms), and pulse interval low (58–119 ms); C2 – frequency low (23.5–42.6 kHz), call duration high (8.1–9.7 ms), and pulse interval high (100–317 ms). Ear morphology: E1 – medial edges of ears arise from between the eyes; E2 – medial edges of ears are connected between the eyes; E3 – medial edges of ears arise above the inner portion of the eyes; E4 – medial edges of ears arise above the middle portion of the eyes; E5 – medial edges of ears arise above the outer portion of the eyes. Pelage pattern: P1 – fur typically a uniform medium or dark brown; P2 – fur with two wavy pale lines on the dorsum; P3 – fur pale gray; and P4 – fur brownish white or white. Roost site: R1 – lives in sheltered areas; R2 – lives in exposed areas on tree trunks; and R3 – lives in exposed areas under palm leaves. Wing sacs: W1 – no wing sacs; W2 – large wing sacs located along the forearm in the propatagium; W3 – medium-sized wing sacs located in the middle of the propatagium; W4 – small, conspicuous wing sacs located near the leading edge of the propatagium; and W5 – small, inconspicuous wing sacs located near the leading edge of the propatagium

structures, they regularly use the exposed surfaces of trees, unlike other genera that occupy sheltered areas (Bradbury and Emmons 1974; Bradbury and Vehrencamp 1976). Pelage and roosting behavior map consistently onto the phylogeny and are correlated. This suggests a phylogenetic basis for these character systems and an association between camouflage and roosting on exposed substrates such as tree trunks and leaves at the tops of palm trees (Fig. 17.6; Lim and Dunlop 2008).


17.5.5


Ear Morphology and Echolocation

Although bats are not the only mammals that echolocate, they have the most sophisticated system of high-frequency emission, sound reception, and neural processing for navigating and foraging in the dark. Ear shape and position are important factors for receiving returning echoes. The position of the medial edge of the ear relative to the eye dictates the degree of forward or lateral orientation of the ear on the head of the bat, and the direction of the ear may in turn influence the ecological adaptation of flying behavior. The more basal nodes for extant bats are equivocal for ear position because of polymorphic states in most families and the lack of comprehensive intrafamilial phylogenies (Lim and Dunlop 2008). Nonetheless, accelerated character transformation yields a possible ancestral state reconstruction at the base of the New World emballonurid tree in which the ear is directed more forward, with its medial edge located between the eyes (Fig. 17.6). More laterally directed ears, as seen in the subtribe Saccopterygina, would be considered derived states. New World emballonurid bats are all aerial insectivores. Their echolocation search call consists of a central quasi-constant-frequency band with short frequency-modulated components and multiple harmonics, with most of the energy in the second harmonic. Peak echolocation frequency increases as flying distance to forest clutter decreases (a negative correlation), whereas pulse interval and call duration decrease as distance to clutter decreases (a positive correlation) (Jung et al. 2007). These acoustic parameters map consistently onto the phylogeny, suggesting that foraging habitat and echolocation call design reflect phylogenetic relationships.
Species within the subtribe Saccopterygina (Centronycteris, Rhynchonycteris, and Saccopteryx) fly in more cluttered environments within the forest or near the forest edge. They have higher frequencies, shorter pulse intervals, and shorter call durations (Fig. 17.6). In contrast, species in the subtribe Diclidurina (Balantiopteryx, Cormura, Cyttarops, Diclidurus, and Peropteryx) fly in less cluttered environments, in open spaces near the forest or above the canopy. They have lower frequencies, longer pulse intervals, and longer call durations. If ear positioning is linked to echolocation parameters and flying behavior, foraging near forest clutter would be considered a derived ecological adaptation for Saccopterygina. This is because forward-directed ears are considered ancestral for New World emballonurids and are also found in Diclidurina.
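The two call designs in Fig. 17.6 are defined by ranges of three acoustic parameters. The sketch below transcribes those ranges from the figure caption and checks which design or designs a measured search call is consistent with. The function name and the example measurements are hypothetical. This is only a range check and not a method used in the cited studies.

```python
# Ranges (min, max) transcribed from the Fig. 17.6 caption.
CALL_DESIGNS = {
    "C1": {"frequency_khz": (41.3, 98.2), "duration_ms": (4.8, 7.6), "interval_ms": (58, 119)},
    "C2": {"frequency_khz": (23.5, 42.6), "duration_ms": (8.1, 9.7), "interval_ms": (100, 317)},
}


def consistent_designs(frequency_khz, duration_ms, interval_ms):
    """Return the call designs whose ranges contain all three measurements."""
    call = {
        "frequency_khz": frequency_khz,
        "duration_ms": duration_ms,
        "interval_ms": interval_ms,
    }
    return [
        name
        for name, ranges in CALL_DESIGNS.items()
        if all(low <= call[key] <= high for key, (low, high) in ranges.items())
    ]


print(consistent_designs(60.0, 6.0, 80))   # ['C1']
print(consistent_designs(30.0, 9.0, 200))  # ['C2']
```

In these ranges, frequency (41.3–42.6 kHz) and pulse interval (100–119 ms) overlap between C1 and C2. Call duration does not overlap: C1 ends at 7.6 ms and C2 starts at 8.1 ms.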

17.6

Conclusions

The most recent common ancestor of New World emballonurid bats colonized an insular South America from Africa during the Early Oligocene, 30 million years ago, when savannah was more prevalent than today. A basal split occurred approximately 27 million years ago in the Northern Amazon, with the divergence of the subtribes Saccopterygina in forested habitats and Diclidurina in savannah. There


was relative stasis until a rapid differentiation of genera 19.4–18.0 million years ago during the Early Miocene. At that time, marine incursions from the Caribbean into the northwestern Amazon region produced heterogeneous environments in a forest-savannah mosaic. The uplands of the Guiana Shield acted as a stable core area during range contractions. Subsequent range expansions back into favorable lowland habitats completed episodes of taxon pulses of biotic diversification. These changing paleoenvironments in the Early Miocene resulted in an adaptive radiation in forested habitats that gave rise to the genera of Saccopterygina. The association of ear morphology and echolocation call design suitable for foraging within cluttered environments supports a phylogenetic basis for the evolution of these complex character systems. A similar radiation occurred in savannah habitats, giving rise to the diversification of genera in Diclidurina that were adapted to foraging in more open environments. More detailed species-level study of the morphology, ecology, and echolocation of emballonurids in a phylogenetic context will give further insights into the remarkable evolutionary history and adaptive radiation of bats.

Acknowledgments I thank Mark Engstrom for critical comments throughout the formulation of the ideas presented herein. Primary funding for fieldwork and research was secured through the generous support of the Royal Ontario Museum Governors and Department of Natural History.

References

Barghoorn SF (1977) New material of Vespertiliavus Schlosser (Mammalia, Chiroptera) and suggested relationships of emballonurid bats based on cranial morphology. Am Mus Novit 2618:1–29
Borissenko AV (1999) A mobile trap for capturing bats in flight. Plecotus et al 2:10–19
Bradbury JW, Emmons LH (1974) Social organization of some Trinidad bats: 1. Emballonuridae. Z Tierpsychol 36:137–183
Bradbury JW, Vehrencamp SL (1976) Social organization and foraging in emballonurid bats. Behav Ecol Sociobiol 1:337–381
Czaplewski NJ (1997) Chiroptera. In: Kay RF, Madden RH, Cifelli RL, Flynn JJ (eds) Vertebrate paleontology in the neotropics: the Miocene fauna of La Venta, Colombia. Smithsonian Institution Press, Washington, DC, pp 410–431
Dunlop JM (1998) The evolution of behavior and ecology in Emballonuridae (Chiroptera). PhD dissertation, York University, North York, Ontario
Eick GN, Jacobs DS, Matthee CA (2005) A nuclear DNA phylogenetic perspective on the evolution of echolocation and historical biogeography of extant bats (Chiroptera). Mol Biol Evol 22:1869–1886
Erwin TL (1979) Thoughts on the evolutionary history of ground beetles: hypotheses generated from comparative faunal analyses of lowland forest sites in temperate and tropical regions. In: Erwin TL, Ball GE, Whitehead DR (eds) Carabid beetles: their evolution, natural history, and classification. Dr W. Junk, The Hague, pp 539–592
Erwin TL (1981) Taxon pulses, vicariance, and dispersal: an evolutionary synthesis illustrated by carabid beetles. In: Nelson G, Rosen DE (eds) Vicariance biogeography: a critique. Columbia University Press, New York, pp 159–196
Farris JS (1970) Methods for computing Wagner trees. Syst Zool 19:83–92


Farris JS, Kallersjo M, Kluge AG, Bult C (1995) Testing significance of incongruence. Cladistics 10:315–319
Flynn JJ, Wyss AR (1998) Recent advances in South American mammalian paleontology. Trends Ecol Evol 13:449–454
Griffiths TA, Smith AL (1991) Systematics of emballonuroid bats (Chiroptera: Emballonuridae and Rhinopomatidae) based on hyoid morphology. Bull Am Mus Nat Hist 206:62–83
Haq BU, Hardenbol J, Vail PR (1987) Chronology of fluctuating sea levels since the Triassic. Science 235:1156–1167
Hoorn C, Guerrero J, Sarmiento GA, Lorente MA (1995) Andean tectonics as a cause for changing drainage patterns in Miocene northern South America. Geology 23:237–240
Janis CM (1993) Tertiary mammal evolution in the context of changing climates, vegetation, and tectonic events. Ann Rev Ecol Syst 24:467–500
Jung K, Kalko EKV, von Helversen O (2007) Echolocation calls in Central American emballonurid bats: signal design and call frequency alternation. J Zool 212:125–137
Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170–179
Koopman KF (1994) Chiroptera: systematics Part 60 of Mammalia, vol 8, Handbook of Zoology. Walter de Gruyter, New York
Legendre S (1984) Étude odontologique des représentants actuels du groupe Tadarida (Chiroptera, Molossidae): implications phylogéniques, systématiques et zoogéographiques. Rev Suisse Zool 91:399–442
Lim BK (2007) Divergence times and origin of neotropical sheath-tailed bats (tribe Diclidurini) in South America. Mol Phylogenet Evol 45:777–791
Lim BK (2008) Historical biogeography of New World emballonurid bats (tribe Diclidurini): taxon pulse diversification. J Biogeogr 35:1385–1401
Lim BK (2009) Environmental assessment at the Bakhuis Bauxite Concession: small-sized mammal diversity and abundance in the lowland humid forests of Suriname. Open Biol J 2:42–57
Lim BK, Dunlop JM (2008) Evolutionary patterns of morphology and behavior as inferred from a molecular phylogeny of New World emballonurid bats (tribe Diclidurini). J Mammal Evol 15:79–121
Lim BK, Engstrom MD, Simmons NB, Dunlop JM (2004) Phylogenetics and biogeography of least sac-winged bats (Balantiopteryx) based on morphological and molecular data. Mamm Biol 69:225–237
Lim BK, Engstrom MD, Bickham JW, Patton JC (2008) Molecular phylogeny of New World emballonurid bats (Tribe Diclidurini) based on loci from the four genetic transmission systems in mammals. Biol J Linn Soc 93:189–209
Lim BK, Engstrom MD, Reid FA, Simmons NB, Voss RS, Fleck DW (2010) A new species of Peropteryx (Chiroptera: Emballonuridae) from western Amazonia with comments on phylogenetic relationships within the genus. Am Mus Novit 3686:1–20
Lundberg JG, Marshall LG, Guerrero J, Horton B, Malabarba MCSL, Wesselingh F (1998) The stage for Neotropical fish diversification: a history of tropical South American rivers. In: Malabarba LR, Reis RE, Vari RP, Lucena ZMS, Lucena CAS (eds) Phylogeny and classification of Neotropical fishes. Edipucrs, Porto Alegre, Brazil, pp 13–48
Maddison WP, Maddison DR (2006) Mesquite: a modular system for evolutionary analysis, version 1.12. http://mesquiteproject.org. Accessed 23 Sept 2006
McKenna MC, Bell SK (1997) Classification of mammals above the species level. Columbia University Press, New York
Miller KG, Kominz MA, Browning JV, Wright JD, Mountain GS, Katz ME, Sugarman PJ, Cramer BS, Christie-Blick N, Pekar SF (2005) The Phanerozoic record of global sea-level change. Science 310:1293–1298


Muñoz J, Cuartas CA (2001) Saccopteryx antioquensis n. sp. (Chiroptera: Emballonuridae) del noroeste de Colombia. Actual Biol 23:53–61
Poux C, Chevret P, Huchon D, de Jong WW, Douzery EJP (2006) Arrival and diversification of caviomorph rodents and platyrrhine primates in South America. Syst Biol 55:228–244
Prager EM, Wilson AC (1988) Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences. J Mol Evol 27:326–335
Robbins LW, Sarich VM (1988) Evolutionary relationships in the family Emballonuridae (Chiroptera). J Mammal 69:1–13
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574
Scully WMR, Fenton MB, Saleuddin ASM (2000) A histological examination of the holding sacs and glandular scent organs of some bat species (Emballonuridae, Hipposideridae, Phyllostomidae, Vespertilionidae, and Molossidae). Can J Zool 78:613–623
Shimodaira H, Hasegawa M (2001) CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247
Simmons NB (2005) Order Chiroptera. In: Wilson DE, Reeder DM (eds) Mammal species of the world: a taxonomic and geographic reference, 3rd edn. Johns Hopkins University Press, Baltimore, pp 312–529
Simmons NB, Voss RS (1998) The mammals of Paracou, French Guiana: a neotropical lowland rainforest fauna. Part 1, bats. Bull Am Mus Nat Hist 237:1–219
Soltis DE, Soltis PS, Morgan DR, Swensen SM, Mullin BC, Dowd JM, Martin PG (1995) Chloroplast gene sequence data suggest a single origin of the predisposition for symbiotic nitrogen fixation in angiosperms. Proc Natl Acad Sci USA 92:2647–2651
Swofford DL (2001) PAUP*: phylogenetic analysis using parsimony (*and other methods), version 4.0b10. Sinauer Associates, Sunderland, MA
Takai M, Anaya F, Shigehara N, Setoguchi T (2000) New fossil materials of the earliest New World monkey, Branisella boliviana, and the problem of platyrrhine origins. Am J Phys Anthropol 111:263–281
Teeling EC, Springer MS, Madsen O, Bates P, O’Brien SJ, Murphy WJ (2005) A molecular phylogeny for bats illuminates biogeography and the fossil record. Science 307:580–584
Templeton AR (1983) Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:221–244
Thorne JL, Kishino H (2002) Divergence time and evolutionary rate estimation with multilocus data. Syst Biol 51:689–702
Voigt CC, von Helversen O (1999) Storage and display of odour by male Saccopteryx bilineata (Chiroptera, Emballonuridae). Behav Ecol Sociobiol 50:29–40
Wilson DE, Reeder DM (eds) (2005) Mammal species of the world: a taxonomic and geographic reference, 3rd edn. Johns Hopkins University Press, Baltimore
Wojcicki M, Brooks DR (2005) PACT: an efficient and powerful algorithm for generating area cladograms. J Biogeogr 32:755–774
Wyss AR, Flynn JJ, Norell MA, Swisher CC, Charrier R, Novacek MJ, McKenna MC (1993) South America’s earliest rodent and recognition of a new interval of mammalian evolution. Nature 365:434–437

Chapter 18

Trends in Rhizobial Evolution and Some Taxonomic Remarks

Julio C. Martínez-Romero, Ernesto Ormeño-Orrillo, Marco A. Rogel, Aline López-López, and Esperanza Martínez-Romero

Abstract Bacteria that establish nitrogen-fixing symbioses in specialized plant structures belong to only three of over 100 bacterial phyla. Among these, rhizobial symbioses are the best known, and nodulation (nod) genes have been described in many species. nodA phylogenies revealed a larger diversity in Bradyrhizobium than in other genera. They suggest that bradyrhizobial nod genes are the oldest, in agreement with the proposal that nod genes evolved in Bradyrhizobium (Plant Soil 161:11–20, 1994). In many cases, rhizobial symbiotic and housekeeping genes have different evolutionary histories as a result of lateral transfer of symbiotic genes among bacteria. Misclassified Rhizobium strains were identified. To identify rhizobial species properly, we propose the use of fragments of the rpoB and dnaK genes, which, according to probability analyses, reflect the behavior of the whole genes. On the basis of these analyses, several rhizobial species related to Agrobacterium tumefaciens may be reclassified to a genus other than Rhizobium.

18.1

Introduction

Legume plants are widespread and diverse, with a large number of species. They profit from symbiosis with nitrogen-fixing bacteria (collectively designated as rhizobia and comprising different, not closely related genera, such as Bradyrhizobium, Mesorhizobium, Azorhizobium, Sinorhizobium, Rhizobium, and others). These bacteria induce the formation of nodules on roots, and rarely on stems, and provide nitrogen that allows the plants to grow in nitrogen-poor soils. Rhizobia have been used as inoculants in agriculture for over a hundred years, replacing fertilizers and in some cases saving millions of dollars (Hungria et al. 2000, 2005).

J.C. Martínez-Romero, E. Ormeño-Orrillo, M.A. Rogel, A. López-López, and E. Martínez-Romero
Centro de Ciencias Genómicas, UNAM, Av. Universidad, Cuernavaca, Morelos 62210, México

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_18, © Springer-Verlag Berlin Heidelberg 2010


J.C. Martínez-Romero et al.


Rhizobial evolution and diversity (reviewed in Terefework et al. 2000; Wang and Martínez-Romero 2000; Sprent 2001; Sessitsch et al. 2002; Provorov and Vorobyov 2008; Martinez-Romero 2009) and the molecular mechanisms mediating their interaction with legume hosts (Barnett and Fisher 2006; Jones et al. 2007) have been studied for only a small proportion of legume-rhizobial symbioses (López-López et al. 2010). The coevolution of Rhizobium and legumes in symbiosis has been critically analyzed (Sprent 1997; Martinez-Romero 2009).

18.2

Nitrogen-Fixing Symbioses with Plants

[Fig. 18.1 phylum labels: Proteobacteria*, Actinobacteria*, Cyanobacteria*, Chlorobi, Firmicutes, Spirochaetes]

In plants with nitrogen-fixing symbioses, specialized structures are involved (Fig. 18.1). This indicates a sort of “convergent evolution” and suggests a need to contain large numbers of selected bacteria in specialized structures, to provide enough nitrogen for the plant and/or to confine, control, or protect the bacteria. Only a few bacterial genera, belonging to just three phyla (out of over 100 current bacterial phyla), are capable of forming these nitrogen-fixing symbioses with plants (Fig. 18.1). There are more phyla with nitrogen-fixing bacteria than with nodulating bacteria, suggesting that nodulating bacteria evolved from nitrogen-fixing bacteria. Other bacteria from the complex community associated with plants, such as Azoarcus (Hurek and Reinhold-Hurek 2003) and Herbaspirillum (Roncato-Maccari et al. 2003), fix low levels of nitrogen not in nodules but as endophytes (inside plants). Perhaps rhizobial nitrogen fixation started similarly, as low-level nitrogen fixation. Applied research with some of these plant-associated bacteria aims to achieve levels of nitrogen fixation in rice, corn, sugarcane, and potatoes similar to those obtained in the well-recognized nitrogen-fixing symbioses of plants. Bacteria induce the formation of nodules on actinorhizal plants, on legumes, and on the nonlegume Parasponia, whereas cyanobacteria do not induce the coralloid roots of cycads (an older symbiosis than those of legumes and actinorhizal plants),

Fig. 18.1 Bacterial phyla; the named phyla contain nitrogen-fixing species. Asterisks (*) indicate phyla containing bacteria that establish symbiosis in specialized structures


and seemingly they do not induce the specialized cavities in Azolla and Gunnera either; these structures are formed normally by the plants and are subsequently colonized by cyanobacteria. Rhizobia and actinobacteria become intracellular in nodules, as do cyanobacteria in Gunnera. Interestingly, a legume symbiotic gene (symRK) has been found in Casuarina glauca, an actinorhizal plant, where it is required for nodulation. This suggests a common genetic basis for nodule formation in legumes and actinorhizal plants (Gherbi et al. 2008). A landmark in symbiotic research in Rhizobium was the discovery of the inducing molecules (Lerouge et al. 1990), the Nod factors (produced by enzymes encoded by nod genes). These molecules have a unique structure in biology, are active at nanomolar concentrations, and are capable of inducing nodules in the absence of bacteria (Dénarié et al. 1996; Relic et al. 1994). Great interest and much effort have been devoted to identifying nodulation factors in actinobacteria, but no results have been reported yet. Genetic approaches in the 1980s led to the discovery of nodulation mutants in Rhizobium, and nod genes were described then (Long et al. 1983; Kondorosi et al. 1984). With the exception of photosynthetic Bradyrhizobium strains that nodulate some Aeschynomene species on stems (Giraud et al. 2007), all other rhizobial species use Nod factors to induce nodules on legume roots. Furthermore, the acquisition of nod genes enables some nonsymbiotic bacteria to form nodules (see later). The nodABC genes constitute an operon in most rhizobia. Exceptions are Rhizobium etli biovar phaseoli, with nodA separated from nodBC (Vazquez et al. 1991), and Mesorhizobium loti, where nodB does not form an operon with nodA and nodC (Sullivan et al. 2002). The nodABC genes encode the enzymes that synthesize the core of the Nod factor: nodC encodes an N-acetylglucosaminyltransferase, nodB a chitooligosaccharide deacetylase, and nodA specifies the N-acylation of the aminosugar backbone by different fatty acids (Atkinson et al. 1994; Debellé et al. 1996a; Roche et al. 1996). Other nod gene products add chemical modifications to the Nod factor (Relic et al. 1994; Ferro et al. 2000), mediate its secretion (Evans and Downie 1986), provide precursors (Baev et al. 1991), or regulate nod gene expression (Mulligan and Long 1985; Kondorosi et al. 1991).

18.3

nod Gene Evolution

Where do nod genes originally come from? A hyaluronate synthase from Streptococcus (hyaluronic acid is a polymer of alternating N-acetylglucosamine and glucuronic acid) has sequence similarities to NodC, to DG42 from Xenopus, and to chitin synthases from yeast. Some bacterial xylanases (which catalyze the hydrolysis of xylose-linked oligomeric and polymeric substrates) contain domains homologous to NodB proteins (Laurie et al. 1997). A Bacillus strain produces a molecule that is seemingly structurally related to Nod factors and that stimulates plant proliferation (Lian et al. 2001). Interestingly, some plant mutants affecting rhizobial nodulation are defective in the mycorrhization process (Oldroyd et al. 2005), and it has been suggested that a common


signaling pathway exists for Nod factor perception and mycorrhizal symbiosis (Catoira et al. 2000; Gianinazzi-Pearson and De´narie´ 1997). Mycorrhizal symbiosis occurs in around 80% of all plants and is considered as old as the first plants that evolved on Earth. The Nod factor may be considered as a very small chitin molecule that subsequently acquired other chemical modifications, some of them involved in protecting the molecule from plant chitinases (Staehelin et al. 1994). Mycorrhiza, being fungi, have chitin. Maybe rhizobia mimicked micorrhizal symbiosis (Debelle´ et al. 1996b). nod gene phylogenies have been reported in Bradyrhizobium, Rhizobium, Mesorhizobium, and Sinorhizobium (Moulin et al. 2004; Steenkamp et al. 2008; Stepkowski et al. 2007; Han et al. 2008; Rincon-Rosales et al. 2009). A host correlation to nod genes has been recognized (Suominen et al. 2001) and Nod factor fucosylation and acetylation have been correlated to bacterial phylogenies and specificities (Moulin et al. 2004); bacteria with sulfate modifications are scattered in rhizobial phylogenies (Martı´nez et al. 1995). We constructed a phylogenetic tree with available reported nodA sequences (Fig. 18.2). There seems to be a larger diversity of nodA sequences in Bradyrhizobium compared with the diversity in b-Proteobacteria or Sinorhizobium. In 1994, we proposed the hypothesis that nod genes evolved in Bradyrhizobium and that they were later transferred to other genera such as Rhizobium (Martinez-Romero 1994). In Bradyrhizobium, an ancestral nod group has been identified from bacteria nodulating several diverse legumes (indicated in Fig. 18.2), supposedly this group of legumes extended over many parts of the world during the Eocene after the origin of legumes north of the Tethys Sea (Steenkamp et al. 2008). Bradyrhizobium are the main nodule bacteria of tropical tree legumes (Qian et al. 2003; Moreira et al. 1998; Parker 2004; Ormen˜o-Orrillo et al. 
2006) with a low degree of specificity and tropical legumes are considered older than temperate legumes. We found 23 novel lineages of Bradyrhizobium in the rain forest of Los Tuxtlas in Veracruz, Mexico, and they exhibited low specificity (Ormen˜o-Orrillo submitted). Specificity is a characteristic of many temperate legumes and few tropical legumes and may have been acquired later in bacteria (Perret et al. 2000; Young et al. 2003). Most nodule forming bacteria belong to the a-Proteobacteria and few to b-Proteobacteria (Moulin et al. 2001; Chen et al. 2003). Lateral transfer of nod genes to b-Proteobacteria was considered to account for the existence of nodulation in Burkholderia and Cupriavidus nodulating species (Moulin et al. 2001; Amadou et al. 2008), in Devosia (Rivas et al. 2002), and in Phyllobacterium (Valverde et al. 2005).
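
The comparison of nodA diversity between genera does not say how diversity was measured. As an illustration only, the Python sketch below computes one common index, the mean pairwise p-distance (proportion of differing sites) within a group of aligned sequences. The choice of index, the function names, and the toy sequences are assumptions for this example, not the authors' method or data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def mean_pairwise_distance(seqs: list[str]) -> float:
    """Average p-distance over all pairs in a group: higher means more diverse."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy alignments, not real nodA sequences.
    groups = {
        "group_1": ["ATGCATGCAT", "ATGAATGCTT", "TTGCATCCAT"],
        "group_2": ["ATGCATGCAT", "ATGCATGCAA", "ATGCATGCAT"],
    }
    for name, seqs in groups.items():
        print(name, round(mean_pairwise_distance(seqs), 3))
```

With real data, the sequences would first have to be aligned, and a model-corrected distance could replace the raw p-distance.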

18.4

Different Evolutionary Histories of Chromosomal and Symbiotic Genes

In Rhizobium, Sinorhizobium, and the β-Proteobacteria, symbiotic genes, including nod and nif (nitrogen fixation) genes, are located on plasmids (Amadou et al. 2008) that may be transferred among species both in the laboratory and in nature.

Fig. 18.2 nodA gene phylogeny in different rhizobial genera [tree figure; clades labelled Bradyrhizobium, Rhizobium/Sinorhizobium, Sinorhizobium, Mesorhizobium, Azorhizobium, and Burkholderia/Cupriavidus, with B. tuberum and M. nodulans marked]

In Mesorhizobium (except Mesorhizobium amorphae; Wang et al. 1999b), Azorhizobium, Methylobacterium, and Bradyrhizobium, symbiotic genes are located on the chromosome. Symbiotic islands have been found to be transferable among mesorhizobia in the environment (Sullivan et al. 1995; Sullivan and Ronson 1998; Nandasena et al. 2007). Evidence that transfer and recombination occur in nature comes from comparing housekeeping and nod gene phylogenies, which reveal different evolutionary histories for symbiotic and housekeeping genes (Haukka et al. 1998; Steenkamp et al. 2008). In the laboratory, plant pathogens such as Agrobacterium tumefaciens and opportunistic human pathogens such as Ochrobactrum may become fully symbiotic by acquiring symbiotic plasmids from Rhizobium tropici, albeit with reduced levels of nitrogen fixation (Martinez et al. 1987; Rogel et al. 2006). Two highly divergent lineages of R. tropici (types A and B) harbor very similar symbiotic plasmids that we suppose are exchanged between these lineages (Martínez-Romero 1996). In Rhizobium, biovars were defined as the different symbiotic specificities (mainly plasmid-encoded) that can be exhibited in a single chromosomal background (species). Accordingly, three biovars were recognized in Rhizobium leguminosarum (viciae, trifolii, and phaseoli) (Jordan 1984); however, a more complicated situation has recently been revealed, and some R. leguminosarum strains have been assigned to different species: Rhizobium pisi (Ramírez-Bahena et al. 2008) and Rhizobium fabae (Tian et al. 2008). The symbiotic plasmid of biovar phaseoli in R. etli is highly conserved (González et al. 2010), possibly reflecting a recent evolutionary origin (Martinez-Romero 2009), perhaps as recent as Phaseolus vulgaris, which dates to around 2–3 million years ago (Delgado-Salinas et al. 2006). We identified a new biovar in R. etli, biovar mimosae, and proposed that its plasmid is more ancient than the phaseoli plasmid (Wang et al. 1999a); nod gene phylogenies seem to support this hypothesis. Nonrandom association between plasmid and chromosomal markers (Young et al. 2003) and limited plasmid transfer (Wernegreen and Riley 1999) have been observed in nodule bacteria; however, different evolutionary histories of symbiotic and metabolic genes or chromosomal markers have been recognized in some rhizobia (Silva et al. 2005; Tian et al. 2007; Han et al. 2008; Rincon-Rosales et al. 2009). Two sympatric Sinorhizobium species nodulating wild Acaciella in Mexico seem to contain the same symbiotic plasmid, and incongruences between symbiotic and housekeeping phylogenies have repeatedly been observed in sinorhizobia (Haukka et al. 1998; Toledo et al. 2003; Lloret et al. 2007). The African Sinorhizobium terangae is a close relative of these American sinorhizobia, though not on the basis of symbiotic genes (Rincon-Rosales et al. 2009) (Fig. 18.3). In symbionts of Galega orientalis and Galega officinalis (two legumes native to the Caucasus), there is evidence of transfer of symbiotic information (Andronov et al. 2003). In Bradyrhizobium japonicum, a biovar carrying symbiotic genes specific for wild genistoid legumes is also found in another species, B. canariense (Vinuesa et al. 2005). Lateral transfer of symbiotic genes is recognized to have occurred in Bradyrhizobium strains nodulating a diversity of wild legumes (Steenkamp et al. 2008).
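
Incongruence between housekeeping and symbiotic gene trees, as in Fig. 18.3, can be quantified in several ways. One widely used measure, which the studies cited above may or may not have used, is the Robinson–Foulds distance: the number of bipartitions (splits) found in one tree but not in the other. The Python sketch below assumes both trees have the same leaf set and are written as nested tuples of leaf names; the function names and toy topologies are hypothetical, and rooting is ignored.

```python
def leaves_of(tree):
    """Collect the leaf names of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves_of(child) for child in tree))


def splits(tree, all_leaves):
    """Return the non-trivial bipartitions of `tree` as frozensets of leaf names.

    Each clade is converted to the side of the split that does NOT contain a
    fixed reference leaf, so the result does not depend on where the tree is rooted.
    """
    ref = min(all_leaves)
    clades = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        clades.append(clade)
        return clade

    walk(tree)
    result = set()
    for side in clades:
        if ref in side:
            side = all_leaves - side
        # A split is non-trivial if each side holds at least two leaves.
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)
    return result


def robinson_foulds(tree1, tree2):
    """Number of splits present in one tree but absent from the other."""
    leaves1, leaves2 = leaves_of(tree1), leaves_of(tree2)
    if leaves1 != leaves2:
        raise ValueError("both trees must have the same leaves")
    return len(splits(tree1, leaves1) ^ splits(tree2, leaves2))


if __name__ == "__main__":
    # Hypothetical topologies for five strains A-E; not taken from Fig. 18.3.
    housekeeping = ((("A", "B"), "C"), ("D", "E"))
    symbiotic = ((("A", "D"), "C"), ("B", "E"))
    print(robinson_foulds(housekeeping, symbiotic))
```

For five leaves the largest possible value is 2(n − 3) = 4, which is what these two toy trees give: they share no internal split. A distance of 0 would indicate identical unrooted topologies.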

Fig. 18.3 Schematic comparison of chromosomal (rpoB) and symbiotic (nodA) gene phylogenies in Sinorhizobium [paired trees; taxa shown include S. americanum, S. fredii (S. fredii bv. mediterranense in the nodA tree), S. saheli, S. mexicanum, S. terangae, S. chiapanecum, S. kostiense, S. arboris, S. meliloti, S. medicae, S. adhaerens, S. morelense, and Mesorhizobium from acacias]

Symbiotic plasmids in rhizobia are repABC plasmids. repABC plasmids are characteristic of the α-Proteobacteria, and differences in the evolution of the repA, repB, and repC genes have been reported (Castillo-Ramirez et al. 2009), supporting high recombination rates in plasmids. Genomic analyses have revealed mosaicism in symbiotic plasmids (Gonzalez et al. 2006). Genetic information on plasmids has been described as the accessory or mobile genome (Young et al. 2006). Plasmid (and perhaps also genomic island) plasticity may have been instrumental in the adaptation of rhizobia to legume evolution and specificity (Martinez-Romero 2009).

18.5

Chromosomal Evolution and Molecular Markers

Rhizobial lineages have been estimated to be nearly as old as plants; for example, the last common ancestor of Rhizobium and Bradyrhizobium was dated to over 400 million years ago, whereas legumes evolved around 100–65 million years ago (Sprent 2001). Nodulation seemingly evolved in only one group of bacteria that were associated with plants (Young and Johnston 1989), possibly as endophytes (Martinez-Romero 2009). Further spread of nod genes by lateral gene transfer may have conferred nodulating capacity on diverse genera.

In 1989, it was suggested that “We will eventually need many genera to accommodate all the root-nodule bacteria” (Young and Johnston 1989); to date, 13 genera and over 50 species have been described as establishing symbioses, even though only a small sample of legumes has been analyzed. Small-subunit ribosomal RNA (16S rRNA) gene sequences have commonly been used to identify and propose species in rhizobia (Wang and Martínez-Romero 2000). It is remarkable that, in spite of the large divergence of nod gene sequences found in Bradyrhizobium, this genus exhibits very limited diversity of 16S rRNA genes (Barrera et al. 1997; Vinuesa et al. 2005), so species delineation is unclear with this marker. Several molecular markers have been used to establish phylogenies and identify new species, not only in Bradyrhizobium but in rhizobia in general. Genomic information provides large numbers of genes for these analyses (Young et al. 2006; Gonzalez et al. 2006; Crossman et al. 2008), and congruent bacterial relationships have been reported using indel analyses (Gupta 2005). Different genes support alternative phylogenetic relationships in multigene analyses of the complete genomes of Agrobacterium, Rhizobium, and Sinorhizobium (Young et al. 2006); this suggests that these lineages diverged within a very short time, as has been concluded for other α-Proteobacteria (Castillo-Ramírez and González 2008).

18.6

Probability Estimates to Distinguish Rhizobial Species

Representative molecular markers are being sought that reflect species phylogenies rather than single-gene phylogenies; in this regard, dnaJ was found to reproduce accepted phylogenetic relationships (Alexandre et al. 2008). rpoB gene sequences have been used for diversity studies in very different habitats or communities (Planet et al. 1995; Dahlloef et al. 2000; Case et al. 2007; Sachman-Ruiz et al. 2009). We have used partial rpoB sequences in the phylogenetic studies that characterized new Sinorhizobium species (Lloret et al. 2007; Rincon-Rosales et al. 2009) and a new Klebsiella species (Rosenblueth et al. 2004). rpoB is a large gene (more than 4,140 bp in Rhizobium), and usually only fragments of its sequence are available. Different studies report sequences of different fragments, hampering direct comparisons; sequencing a common fragment would facilitate comparisons and reduce misclassifications. To date, the genomes of several Rhizobium species have been completely sequenced. One practical use of defining gene divergence ranges is to help decide whether an isolate represents a novel species or belongs to an existing one. When describing Sinorhizobium (Ensifer) mexicanum (Lloret et al. 2007) and Sinorhizobium chiapanecum (Rincon-Rosales et al. 2009), we proposed probability ranges of inter- and intraspecies gene differences that allowed us to distinguish different species from bacteria belonging to the same species. Comparing full rpoB gene sequences from seven Rhizobium genomes, we calculated that the 95% confidence interval for identities within this genus ranges from 0.898 to 1.000. The 0.898 threshold provides a useful criterion for determining whether a new isolate belongs to this genus: an identity of less than 0.898 excludes it from Rhizobium. Nevertheless, this is not a practical approach for classifying new isolates, because the rpoB gene is too large to be fully sequenced in diversity studies that include large numbers of strains. We therefore examined 700 bp fragments covering the entire 4,140 bp sequence and found that the identities for the fragment spanning positions 2,800 to 3,500 closely match the distribution for the entire gene sequence (Kolmogorov–Smirnov test, p = 0.05), in contrast to all other fragments analyzed. This fragment would thus provide a molecular marker for rhizobial phylogenies that is not only dependable but also practical. For both the full gene and the 700 bp fragment (positions 2,800–3,500), it can be stated with 95% confidence that Agrobacterium radiobacter falls within the Rhizobium range, whereas the identities of A. tumefaciens and Agrobacterium vitis to members of the genus fall outside the limits of the distribution of intragenus differences. The same analysis was performed for dnaK, for which the 95% confidence interval for identities within Rhizobium ranges from 0.896 to 1.000. By this interval, A. radiobacter and Agrobacterium rhizogenes fall within the Rhizobium range (and should therefore be considered Rhizobium radiobacter and Rhizobium rhizogenes, as proposed by Young et al. 2001), whereas the identities of A. tumefaciens and A. vitis to members of the group fall outside the genus limits (Fig. 18.4). Thus, according to both the rpoB and the dnaK analyses, Agrobacterium could stand as a genus independent of Rhizobium, as has been claimed before (Farrand et al. 2003); consequently, Rhizobium galegae, Rhizobium huautlense, Rhizobium cellulosilyticum, Rhizobium selenireducens, and Rhizobium daejeonense, all related to A. tumefaciens, should be reclassified.
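
The identity-range procedure described above can be sketched as follows. The chapter does not state how the 95% interval or the Kolmogorov–Smirnov comparison was computed, so this Python sketch rests on assumptions: (i) the interval is the mean of all pairwise identities plus or minus 1.96 standard deviations, clipped at 1.0; (ii) a candidate belongs to the genus when its average identity to the genus members reaches the lower bound (compare the arrows in Fig. 18.4); and (iii) each consecutive 700 bp window is compared with the full-length distribution using the two-sample KS statistic D against its large-sample critical value at α = 0.05. All function names are hypothetical, and with only seven genomes (21 pairs) the asymptotic critical value is a rough approximation.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def identity(a, b):
    """Fraction of identical sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in sites) / len(sites) if sites else 0.0


def pairwise_identities(seqs, start=0, end=None):
    """Identities for every pair of sequences, restricted to the slice [start:end]."""
    return [identity(a[start:end], b[start:end]) for a, b in combinations(seqs, 2)]


def identity_interval(values, z=1.96):
    """Assumed construction: mean +/- z standard deviations, clipped to [0, 1]."""
    m, s = mean(values), stdev(values)
    return max(0.0, m - z * s), min(1.0, m + z * s)


def within_genus(candidate, members, lower_bound):
    """Hypothetical rule: average identity to the genus members must reach the lower bound."""
    avg = mean(identity(candidate, m) for m in members)
    return avg >= lower_bound, avg


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the empirical CDFs."""
    d = 0.0
    for v in list(x) + list(y):
        fx = sum(1 for xi in x if xi <= v) / len(x)
        fy = sum(1 for yi in y if yi <= v) / len(y)
        d = max(d, abs(fx - fy))
    return d


def ks_critical(n, m, c_alpha=1.358):
    """Large-sample critical value of D at alpha = 0.05 (c_alpha = 1.358)."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def scan_windows(seqs, width=700):
    """Compare each consecutive window's identity distribution with the full-length one.

    Returns (start, end, D, not_different) tuples; not_different is True when D does
    not exceed the critical value, i.e. no significant difference was detected.
    """
    full = pairwise_identities(seqs)
    length = len(seqs[0])
    results = []
    for start in range(0, length, width):
        end = min(start + width, length)
        window = pairwise_identities(seqs, start, end)
        d = ks_statistic(full, window)
        results.append((start, end, d, d <= ks_critical(len(full), len(window))))
    return results


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, not real rpoB data).
    genus = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC", "ACGTACCTAC"]
    low, high = identity_interval(pairwise_identities(genus))
    print("interval:", round(low, 3), round(high, 3))
    print("candidate:", within_genus("TCGAACGTTC", genus, low))
    for row in scan_windows(genus, width=4):
        print(row)
```

With a 700 bp step the windows start at 0, 700, ..., 2,800 and 3,500, so the 2,800–3,500 fragment discussed above is one of them (up to the 0- versus 1-based numbering convention), and the last window of a 4,140 bp alignment is only 640 bp long. Note that failing to reject the KS null hypothesis shows only that no difference was detected, which is weaker than demonstrating that two distributions match.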
It is clear from many published phylogenetic trees that Rhizobium is not monophyletic. We found several examples of misclassified Rhizobium strains in a 16S rRNA gene phylogenetic tree (Fig. 18.5), probably because many new isolates are identified only by their 16S rRNA genes and named after their closest relative, frequently just the best BLAST hit, without further characterization. Rhizobium mongolense and Rhizobium lusitanum are polyphyletic (Fig. 18.5). These misclassifications should be corrected.

Fig. 18.4 95% confidence intervals for identities of species within the genus Rhizobium for the rpoB and dnaK genes. The arrows indicate the average identity of Agrobacterium tumefaciens or A. rhizogenes to the members of the genus Rhizobium [panels: rpoB, dnaK; labels: Rhizobium, Agrobacterium tumefaciens, Agrobacterium rhizogenes]

Fig. 18.5 Rhizobium 16S rRNA gene phylogenies. Misclassified strains are indicated by asterisks (*) [tree with bootstrap values and a 0.002 scale bar, showing strains of R. mongolense, R. gallicum, R. sullae, R. leguminosarum, R. etli, R. lusitanum, R. tropici, R. multihospitium, Agrobacterium radiobacter, and A. rhizogenes; asterisked strains: R. mongolense S110 (AY509212), R. mongolense S152 (AY509209), and R. lusitanum CCBAU 03301 (EU074200)]

Acknowledgments We acknowledge PAPIIT IN200709 and thank Michael Dunn for reading the manuscript. Partial financial support for this project came from GEF PNUMA, TSBF-CIAT. E.M. is grateful to DGAPA UNAM for a postdoctoral fellowship during her sabbatical year at UC Davis, California.

References
Alexandre A, Laranjo M, Young JPW, Oliveira S (2008) dnaJ is a useful phylogenetic marker for alphaproteobacteria. Int J Syst Evol Microbiol 58:2839–2849
Amadou C, Pascal G, Mangenot S, Glew M, Bontemps C, Capela D, Carrere S, Cruveiller S, Dossat C, Lajus A, Marchetti M, Poinsot V, Rouy Z, Servin B, Saad M, Schenowitz C, Barbe V, Batut J, Medigue C, Masson-Boivin C (2008) Genome sequence of the beta-Rhizobium Cupriavidus taiwanensis and comparative genomics of rhizobia. Genome Res 18:1472–1483
Andronov EE, Terefework Z, Roumiantseva ML, Dzyubenko NI, Onichtchouk OP, Kurchak ON, Dresler-Nurmi A, Young JPW, Simarov BV, Lindstroem K (2003) Symbiotic and genetic diversity of Rhizobium galegae isolates collected from the Galega orientalis gene center in the Caucasus. Appl Environ Microbiol 69:1067–1074
Atkinson EM, Palcic MM, Hindsgaul O, Long SR (1994) Biosynthesis of Rhizobium meliloti lipooligosaccharide Nod factors: NodA is required for an N-acyltransferase activity. Proc Natl Acad Sci USA 91:8418–8422
Baev N, Endre G, Petrovics G, Banfalvi Z, Kondorosi A (1991) Six nodulation genes of nod box locus 4 in Rhizobium meliloti are involved in nodulation signal production: nodM codes for D-glucosamine synthetase. Mol Gen Genet 228:113–124
Barnett MJ, Fisher RF (2006) Global gene expression in the rhizobial-legume symbiosis. Symbiosis 42:1–24
Barrera LL, Trujillo ME, Goodfellow M, Garcia FJ, Hernandez-Lucas I, Davila G, van Berkum P, Martinez-Romero E (1997) Biodiversity of bradyrhizobia nodulating Lupinus spp. Int J Syst Bacteriol 47:1086–1091
Case RJ, Boucher Y, Dahlloef I, Holmstroem C, Doolittle WF, Kjelleberg S (2007) Use of 16S rRNA and rpoB genes as molecular markers for microbial ecology studies. Appl Environ Microbiol 73:278–288
Castillo-Ramírez S, González V (2008) Factors affecting the concordance between orthologous gene trees and species tree in bacteria. BMC Evol Biol 8:300
Castillo-Ramirez S, Vazquez-Castellanos JF, Gonzalez V, Cevallos MA (2009) Horizontal gene transfer and diverse functional constrains within a common replication-partitioning system in Alphaproteobacteria: the repABC operon. BMC Genomics 10:536
Catoira R, Galera C, De Billy F, Penmetsa RV, Journet E-P, Maillet F, Rosenberg C, Cook D, Gough C, Denarie J (2000) Four genes of Medicago truncatula controlling components of a Nod factor transduction pathway. Plant Cell 12:1647–1666
Chen W-M, Moulin L, Bontemps C, Vandamme P, Bena G, Boivin-Masson C (2003) Legume symbiotic nitrogen fixation by β-Proteobacteria is widespread in nature. J Bacteriol 185:7266–7272
Crossman LC, Castillo-Ramírez S, McAnnula C, Lozano L, Vernikos GS, Acosta JL, Ghazoui ZF, Hernández-González I, Meakin G, Walker AW, Hynes MF, Young JPW, Downie JA, Romero D, Johnston AWB, Dávila G, Parkhill J, González V (2008) A common genomic framework for a diverse assembly of plasmids in the symbiotic nitrogen fixing bacteria. PLoS ONE 3(7):e2567
Dahlloef I, Baillie H, Kjelleberg S (2000) rpoB-based microbial community analysis avoids limitations inherent in 16S rRNA gene intraspecies heterogeneity. Appl Environ Microbiol 66:3376–3380
Debellé F, Plazanet C, Roche P, Pujol C, Savagnac A, Rosenberg C, Prome J-C, Denarie J (1996a) The NodA proteins of Rhizobium meliloti and Rhizobium tropici specify the N-acylation of Nod factors by different fatty acids. Mol Microbiol 22:303–314
Debellé F, Yang GP, Ferro M, Truchet G, Promé JC, Dénarié J (1996b) Rhizobium nodulation factors in perspective. In: Legocki A, Bothe H, Pühler A (eds) Biological fixation of nitrogen for ecology and sustainable agriculture. Springer, Heidelberg, Germany, pp 15–24
Delgado-Salinas A, Bibler R, Lavin M (2006) Phylogeny of the genus Phaseolus (Leguminosae): a recent diversification in an ancient landscape. Syst Bot 31:779–791
Dénarié J, Debellé F, Promé JC (1996) Rhizobium lipo-chitooligosaccharide nodulation factors: signaling molecules mediating recognition and morphogenesis. Annu Rev Biochem 65:503–535
Evans IJ, Downie JA (1986) The nodI gene product of Rhizobium leguminosarum is closely related to ATP-binding bacterial transport proteins; nucleotide sequence analysis of the nodI and nodJ genes. Gene 43:95–101
Farrand SK, van Berkum PB, Oger P (2003) Agrobacterium is a definable genus of the family Rhizobiaceae. Int J Syst Evol Microbiol 53:1681–1687
Ferro M, Lorquin J, Ba S, Sanon K, Promé JC, Boivin C (2000) Bradyrhizobium sp. strains that nodulate the leguminous tree Acacia albida produce fucosylated and partially sulfated Nod factors. Appl Environ Microbiol 66:5078–5082
Gherbi H, Markmann K, Svistoonoff S, Estevan J, Autran D, Giczey G, Auguy F, Peret B, Laplaze L, Franche C, Parniske M, Bogusz D (2008) SymRK defines a common genetic basis for plant root endosymbioses with arbuscular mycorrhiza fungi, rhizobia, and Frankia bacteria. Proc Natl Acad Sci USA 105:4928–4932
Gianinazzi-Pearson V, Dénarié J (1997) Red carpet genetic programmes for root endosymbioses. Trends Plant Sci 2:371–372
Giraud E, Moulin L, Vallenet D, Barbe V, Cytryn E, Avarre J-C, Jaubert M, Simon D, Cartieaux F, Prin Y, Bena G, Hannibal L, Fardoux J, Kojadinovic M, Vuillet L, Lajus A, Cruveiller S, Rouy Z, Mangenot S, Segurens B, Dossat C, Franck WL, Chang W-S, Saunders E, Bruce D, Richardson P, Normand P, Dreyfus B, Pignol D, Stacey G, Emerich D, Vermeglio A, Medigue C, Sadowsky M (2007) Legumes symbioses: absence of nod genes in photosynthetic bradyrhizobia. Science 316:1307–1312
Gonzalez V, Santamaria RI, Bustos P, Hernandez-Gonzalez I, Medrano-Soto A, Moreno-Hagelsieb G, Janga SC, Ramirez MA, Jimenez-Jacinto V, Collado-Vides J, Davila G (2006) The partitioned Rhizobium etli genome: genetic and metabolic redundancy in seven interacting replicons. Proc Natl Acad Sci USA 103:3834–3839
González V, Acosta JL, Santamaría RI, Bustos P, Fernández JL, Hernández González IL, Díaz R, Flores M, Palacios R, Mora J, Dávila G (2010) Conserved symbiotic plasmid DNA sequences in the multireplicon pangenomic structure of Rhizobium etli. Appl Environ Microbiol 76:1604–1614
Gupta RS (2005) Protein signatures distinctive of α-Proteobacteria and its subgroups and a model for α-proteobacterial evolution. Crit Rev Microbiol 31:101–135
Han TX, Wang ET, Han LL, Chen WF, Sui XH, Chen WX (2008) Molecular diversity and phylogeny of rhizobia associated with wild legumes native to Xinjiang, China. Syst Appl Microbiol 31:287–301
Haukka K, Lindstrom K, Young JPW (1998) Three phylogenetic groups of nodA and nifH genes in Sinorhizobium and Mesorhizobium isolates from leguminous trees growing in Africa and Latin America. Appl Environ Microbiol 64:419–426
Hungria M, Vargas MAT, Campo RJ, Chueire LMO, Andrade DS (2000) The Brazilian experience with the soybean (Glycine max) and common bean (Phaseolus vulgaris) symbioses. In: Pedrosa FO, Hungria M, Yates G, Newton WE (eds) Nitrogen fixation: from molecules to crop production. Kluwer Academic Publishers, Netherlands, p 515
Hungria M, Franchini JC, Campo RJ, Graham PH (2005) The importance of nitrogen fixation to soybean cropping in South America. In: Werner D, Newton WE (eds) Nitrogen fixation in agriculture, forestry, ecology, and the environment. Springer, Dordrecht, pp 25–42
Hurek T, Reinhold-Hurek B (2003) Azoarcus sp. strain BH72 as a model for nitrogen-fixing grass endophytes. J Biotechnol 106:169–178
Jones KM, Kobayashi H, Davies BW, Taga ME, Walker GC (2007) How rhizobial symbionts invade plants: the Sinorhizobium-Medicago model. Nat Rev Microbiol 5:619–633
Jordan DC (1984) Family III. Rhizobiaceae Conn 1938, 321AL. In: Krieg NR, Holt JG (eds) Bergey's manual of systematic bacteriology, vol 1. The Williams and Wilkins Co., Baltimore, pp 234–254
Kondorosi E, Banfalvi Z, Kondorosi A (1984) Physical and genetic analysis of a symbiotic region of Rhizobium meliloti: identification of nodulation genes. Mol Gen Genet 193:445–452
Kondorosi E, Pierre M, Cren M, Haumann U, Buire M, Hoffmann B, Schell J, Kondorosi A (1991) Identification of NolR, a negative transacting factor controlling the nod regulon in Rhizobium meliloti. J Mol Biol 222:885–896
Laurie JI, Clarke JH, Ciruela A, Faulds CB, Williamson G, Gilbert HJ, Rixon JE, Millward-Sadler J, Hazlewood GP (1997) The NodB domain of a multidomain xylanase from Cellulomonas fimi deacetylates acetylxylan. FEMS Microbiol Lett 148:261–264
Lerouge P, Roche P, Faucher C, Maillet F, Truchet G, Promé JC, Dénarié J (1990) Symbiotic host-specificity of Rhizobium meliloti is determined by a sulphated and acylated glucosamine oligosaccharide signal. Nature 344:781–784
Lian B, Prithiviraj B, Souleimanov A, Smith DL (2001) Evidence for the production of chemical compounds analogous to nod factor by the silicate bacterium Bacillus circulans GY92. Microbiol Res 156:289–292
Lloret L, Ormeño-Orrillo E, Rincón R, Martínez-Romero J, Rogel-Hernández MA, Martínez-Romero E (2007) Ensifer mexicanus sp. nov. a new species nodulating Acacia angustissima (Mill.) Kuntze in Mexico. Syst Appl Microbiol 30:280–290
Long SR, Buikema WJ, Ausubel FM (1983) Cloning of Rhizobium meliloti nodulation genes by direct complementation of Nod- mutants. Nature 298:485–487
López-López A, Rosenblueth M, Martínez J, Martínez-Romero E (2010) Rhizobial symbioses in tropical legumes and non-legumes. In: Dion P (ed) Soil biology and agriculture in the tropics. Springer, Heidelberg, pp 163–184
Martinez E, Palacios R, Sanchez F (1987) Nitrogen-fixing nodules induced by Agrobacterium tumefaciens harboring Rhizobium phaseoli plasmids. J Bacteriol 169:2828–2834
Martínez E, Laeremans T, Poupot R, Rogel MA, Lopez L, García F, Vanderleyden J, Promé JC, Lara F (1995) Nod metabolites and other compounds excreted by Rhizobium spp. In: Tikhonovich IA, Provorov NA, Romanov VI, Newton WE (eds) Nitrogen fixation: fundamentals and applications. Kluwer Academic Publishers, Dordrecht, pp 281–286
Martinez-Romero E (1994) Recent developments in Rhizobium taxonomy. Plant Soil 161:11–20
Martinez-Romero E (2009) Coevolution in Rhizobium-legume symbiosis? DNA Cell Biol 28:361–370
Martínez-Romero E (1996) Comments on Rhizobium systematics. Lessons from R. tropici and R. etli. In: Stacey G, Mullin B, Gresshoff PM (eds) Biology of plant–microbe interactions. International Society for Molecular Plant–Microbe Interactions, St. Paul, Minnesota, pp 503–508
Moreira FMS, Haukka K, Young JPW (1998) Biodiversity of rhizobia isolated from a wide range of forest legumes in Brazil. Mol Ecol 7:889–895
Moulin L, Munive A, Dreyfus B, Boivin-Masson C (2001) Nodulation of legumes by members of the β-subclass of Proteobacteria. Nature 411:948–950
Moulin L, Bena G, Boivin-Masson C, Stepkowski T (2004) Phylogenetic analyses of symbiotic nodulation genes support vertical and lateral gene co-transfer within the Bradyrhizobium genus. Mol Phylogenet Evol 30:720–732
Mulligan JT, Long SR (1985) Induction of Rhizobium meliloti nodC expression by plant exudate requires nodD. Proc Natl Acad Sci USA 82:6609–6613
Nandasena KG, O'Hara GW, Tiwari RP, Sezmiş E, Howieson JG (2007) In situ lateral transfer of symbiosis islands results in rapid evolution of diverse competitive strains of mesorhizobia suboptimal in symbiotic nitrogen fixation on the pasture legume Biserrula pelecinus L. Environ Microbiol 9:2496–2511
Oldroyd GED, Harrison MJ, Udvardi M (2005) Peace talks and trade deals. Keys to long-term harmony in legume-microbe symbioses. Plant Physiol 137:1205–1210
Ormeño-Orrillo E, Vinuesa P, Zuniga-Davila D, Martinez-Romero E (2006) Molecular diversity of native bradyrhizobia isolated from Lima bean (Phaseolus lunatus L.) in Peru. Syst Appl Microbiol 29:253–262
Parker MA (2004) rRNA and dnaK relationships of Bradyrhizobium sp. nodule bacteria from four Papilionoid legume trees in Costa Rica. Syst Appl Microbiol 27:334–342
Perret X, Staehelin Ch, Broughton WJ (2000) Molecular basis of symbiotic promiscuity. Microbiol Mol Biol Rev 64:180–201
Planet P, Jagoueix S, Bove JM, Garnier M (1995) Detection and characterization of the African citrus greening Liberobacter by amplification, cloning, and sequencing of the rplKAJL-rpoBC operon. Curr Microbiol 30:137–141
Provorov NA, Vorobyov NI (2008) Equilibrium between the “genuine mutualists” and “symbiotic cheaters” in the bacterial population co-evolving with plants in a facultative symbiosis. Theor Popul Biol 74:345–355
Qian J, Kwon S, Parker MA (2003) rRNA and nifD phylogeny of Bradyrhizobium from sites across the Pacific Basin. FEMS Microbiol Lett 219:159–165
Ramírez-Bahena MH, García-Fraile P, Peix A, Valverde A, Rivas R, Igual JM, Mateos PF, Martínez-Molina E, Velázquez E (2008) Revision of the taxonomic status of the species Rhizobium leguminosarum (Frank 1879) Frank 1889AL, Rhizobium phaseoli Dangeard 1926AL and Rhizobium trifolii Dangeard 1926AL. R. trifolii is a later synonym of R. leguminosarum. Reclassification of the strain R. leguminosarum DSM 30132 (=NCIMB 11478) as Rhizobium pisi sp. nov. Int J Syst Evol Microbiol 58:2484–2490
Relic B, Perret X, Estrada-Garcia MT, Kopcinska J, Golinowski W, Krishnan HB, Pueppke SG, Broughton WJ (1994) Nod factors of Rhizobium are a key to the legume door. Mol Microbiol 13:171–178
Rincon-Rosales R, Lloret L, Ponce E, Martinez-Romero E (2009) Rhizobia with different symbiotic efficiencies nodulate Acaciella angustissima in Mexico, including Sinorhizobium chiapanecum sp. nov. which has common symbiotic genes with Sinorhizobium mexicanum. FEMS Microbiol Ecol 68:255–255
Rivas R, Velazquez E, Willems A, Vizcaino N, Subba-Rao NS, Mateos PF, Gillis M, Dazzo FB, Martinez-Molina E (2002) A new species of Devosia that forms a unique nitrogen-fixing root-nodule symbiosis with the aquatic legume Neptunia natans (L.f.) Druce. Appl Environ Microbiol 68:5217–5222
Roche P, Maillet F, Plazanet C, Debelle F, Ferro M, Truchet G, Prome J-C, Denarie J (1996) The common nodABC genes of Rhizobium meliloti are host-range determinants. Proc Natl Acad Sci USA 93:15305–15310
Rogel MA, Torres C, Lloret L, Rosenblueth M, Hernández-Lucas I, Martínez L, Martínez J, Martínez-Romero E (2006) Lateral transfer of Rhizobium symbiotic plasmids leading to genomic innovation. In: Sánchez F, Quinto C, López-Lara IM, Geiger O (eds) Biology of plant–microbe interactions, vol 5. International Society for Molecular Plant–Microbe Interactions, St. Paul, USA, pp 310–318
Roncato-Maccari LDB, Ramos HJO, Pedrosa FO, Alquini Y, Chubatsu LS, Yates MG, Rigo LU, Steffens MBR, Souza EM (2003) Endophytic Herbaspirillum seropedicae expresses nif genes in gramineous plants. FEMS Microbiol Ecol 45:39–47
Rosenblueth M, Martinez L, Silva J, Martinez-Romero E (2004) Klebsiella variicola, a novel species with clinical and plant-associated isolates. Syst Appl Microbiol 27:27–35
Sachman-Ruiz B, Castillo-Rodal AI, López-Vidal Y, Martínez-Romero E, Vinuesa P (2009) Diversity of environmental mycobacteria in Mexican rivers assessed by cultivation and metagenomics approaches. In: 109th General Meeting, American Society for Microbiology, May 17–21, 2009, Philadelphia, Pennsylvania
Sessitsch A, Howieson JG, Perret X, Antoun H, Martinez-Romero E (2002) Advances in Rhizobium research. Crit Rev Plant Sci 21:323–378
Silva C, Vinuesa P, Eguiarte LE, Souza V, Martinez-Romero E (2005) Evolutionary genetics and biogeographic structure of Rhizobium gallicum sensu lato, a widely distributed bacterial symbiont of diverse legumes. Mol Ecol 14:4033–4050
Sprent JI (1997) Co-evolution of legume-rhizobial symbioses: is it essential for either partner? In: Legocki A, Bothe H, Pühler A (eds) Biological fixation of nitrogen for ecology and sustainable agriculture. Springer, Heidelberg, Germany, pp 313–316
Sprent JI (2001) Nodulation in legumes. Royal Botanic Gardens, Kew, UK
Staehelin C, Schultze M, Kondorosi E, Mellor RB, Boller T, Kondorosi A (1994) Structural modifications in Rhizobium meliloti Nod factors influence their stability against hydrolysis by root chitinases. Plant J 5:319–330
Steenkamp ET, Stepkowski T, Przymusiak A, Botha WJ, Law IJ (2008) Cowpea and peanut in southern Africa are nodulated by diverse Bradyrhizobium strains harboring nodulation genes that belong to the large pantropical clade common in Africa. Mol Phylogenet Evol 48:1131–1144
Stepkowski T, Hughes CE, Law IJ, Markiewicz L, Gurda D, Chlebicka A, Moulin L (2007) Diversification of lupine Bradyrhizobium strains: evidence from nodulation gene trees. Appl Environ Microbiol 73:3254–3264
Sullivan JT, Ronson CW (1998) Evolution of rhizobia by acquisition of a 500-kb symbiosis island that integrates into a phe-tRNA gene. Proc Natl Acad Sci USA 95:5145–5149
Sullivan JT, Patrick HN, Lowther WL, Scott DB, Ronson CW (1995) Nodulating strains of Rhizobium loti arise through chromosomal symbiotic gene transfer in the environment. Proc Natl Acad Sci USA 92:8985–8989
Sullivan JT, Trzebiatowski JR, Cruickshank RW, Gouzy J, Brown SD, Elliot RM, Fleetwood DJ, McCallum NG, Rossbach U, Stuart GS, Weaver JE, Webby RJ, de Bruijn FJ, Ronson CW (2002) Comparative sequence analysis of the symbiosis island of Mesorhizobium loti strain R7A. J Bacteriol 184:3086–3095
Suominen L, Roos C, Lortet G, Paulin L, Lindstroem K (2001) Identification and structure of the Rhizobium galegae common nodulation genes: evidence for horizontal gene transfer. Mol Biol Evol 18:907–916
Terefework Z, Lortet G, Suominenl LK (2000) Molecular evolution of interactions between rhizobia and their legume hosts. In: Triplett E (ed) Prokaryotic nitrogen fixation: a model for analysis of a biological process. Horizon Scientific Press, Norfolk, England, pp 187–206
Tian CF, Wang ET, Han TX, Sui XH, Chen WX (2007) Genetic diversity of rhizobia associated with Vicia faba in three ecological regions of China. Arch Microbiol 188:273–282
Tian CF, Wang ET, Wu LJ, Han TX, Chen WF, Gu CT, Gu JG, Chen WX (2008) Rhizobium fabae sp. nov., a bacterium that nodulates Vicia faba. Int J Syst Evol Microbiol 58:2871–2875
Toledo I, Lloret L, Martínez-Romero E (2003) Sinorhizobium americanum sp. nov., a new Sinorhizobium species nodulating native Acacia spp. in Mexico. Syst Appl Microbiol 26:54–64
Valverde A, Velazquez E, Fernandez-Santos F, Vizcaino N, Rivas R, Mateos PF, Martinez-Molina E, Igual JM, Willems A (2005) Phyllobacterium trifolii sp. nov., nodulating Trifolium and Lupinus in Spanish soils. Int J Syst Evol Microbiol 55:1985–1989
Vazquez M, Davalos A, de las Peñas A, Sanchez F, Quinto C (1991) Novel organization of the common nodulation genes in Rhizobium leguminosarum bv. phaseoli strains. J Bacteriol 173:1250–1258
Vinuesa P, León-Barrios M, Silva C, Willems A, Jarabo-Lorenzo A, Pérez-Galdona R, Werner D, Martínez-Romero E (2005) Bradyrhizobium canariense sp. nov., an acid-tolerant endosymbiont that nodulates endemic genistoid legumes (Papilionoideae: Genisteae) from the Canary Islands, along with Bradyrhizobium japonicum bv. genistearum, Bradyrhizobium genospecies alpha and Bradyrhizobium genospecies beta. Int J Syst Evol Microbiol 55:569–575
Wang ET, Martínez-Romero E (2000) Phylogeny of root- and stem-nodule bacteria associated with legumes. In: Triplett E (ed) Prokaryotic nitrogen fixation: a model for analysis of a biological process. Horizon Scientific Press, Norfolk, England, pp 177–186
Wang ET, Rogel MA, García-De los Santos A, Martínez-Romero J, Cevallos MA, Martínez-Romero E (1999a) Rhizobium etli bv. mimosae, a novel biovar isolated from Mimosa affinis. Int J Syst Bacteriol 49:1479–1491
Wang ET, van Berkum P, Sui XH, Beyene D, Chen WX, Martinez-Romero E (1999b) Diversity of rhizobia associated with Amorpha fruticosa isolated from Chinese soils and description of Mesorhizobium amorphae sp. nov. Int J Syst Bacteriol 49:51–65
Wernegreen JJ, Riley MA (1999) Comparison of the evolutionary dynamics of symbiotic and housekeeping loci: a case for the genetic coherence of rhizobial lineages. Mol Biol Evol 16:98–113
Young JPW, Johnston AWB (1989) The evolution of specificity in the legume-Rhizobium symbiosis. Trends Ecol Evol 4:341–349
Young JM, Kuykendall LD, Martinez-Romero E, Kerr A, Sawada H (2001) A revision of Rhizobium Frank 1889, with an emended description of the genus, and the inclusion of all species of Agrobacterium Conn 1942 and Allorhizobium undicola de Lajudie et al. 1998 as new combinations: Rhizobium radiobacter, R. rhizogenes, R. rubi, R. undicola and R. vitis. Int J Syst Evol Microbiol 51:89–103
Young JPW, Mutch LA, Ashford DA, Zézé A, Mutch KE (2003) The molecular evolution of host specificity in the Rhizobium-legume symbiosis. In: Hails R, Godfray HJC, Beringer JE (eds) Genes in the environment. Blackwell Science, Oxford, pp 245–257
Young JPW, Crossman LC, Johnston AWB, Thomson NR, Ghazoui ZF, Hull KH, Wexler M, Curson ARJ, Todd JD, Poole PS, Mauchline TH, East AK, Quail MA, Churcher C, Arrowsmith C, Cherevach I, Chillingworth T, Clarke K, Cronin A, Davis P, Fraser A, Za H, Hauser H, Jagels K, Moule S, Mungall K, Norbertczak H, Rabbinowitsch E, Sanders M, Simmonds M, Whitehead S, Parkhill J (2006) The genome of Rhizobium leguminosarum has recognizable core and accessory components. Genome Biol 7:R34

Chapter 19

Convergent Evolution of Morphogenetic Processes in Fungi

Sylvain Brun and Philippe Silar

Abstract Eumycete fungi are a diverse group of organisms whose evolution is characterized by frequent changes in nutritional strategy and in the corresponding developmental programs. The reasons for this versatility are unknown. We previously discovered that the NADPH oxidase Nox2 and the tetraspanin Pls1 are used in two radically different cell types to achieve the same purpose, exiting from a reinforced cell, suggesting that convergent evolution of morphogenetic processes could account for the repeated switches in trophic mode during fungal evolution. However, we recently observed that saprobic fungi are also able to differentiate appressorium-like structures closely resembling those of phytopathogenic species, arguing that the ability to differentiate such cells is an ancient property of filamentous fungi. The adaptation of parasitic and mutualistic fungi to plants may thus not reside solely in their ability to penetrate their host.

19.1 Introduction

Fungi belonging to the Eumycetes (Opisthokonta) are one of the great successes of evolution. Their ancestors switched from phagotrophy, the original eukaryotic trophic mode, to osmotrophy, probably about a billion years ago (McLaughlin et al. 2009). Since then, they have diversified into hundreds of thousands of species, possibly many more (Hawksworth 1991). They have invaded nearly all biotopes around the globe, from the depths of the oceans to the tops of the highest mountains, and are even found in arctic soils that remain frozen for most of the year (Schadt et al. 2003). Their total biomass is huge and they have a great impact on their environment. They live in parasitic or mutualistic symbiosis with other organisms, or as free-living saprobes. The saprobes participate in the global carbon cycle; in particular, they degrade highly recalcitrant materials that no other organism can break down, and they regulate soil health by producing humic acids. As mutualistic symbionts, mycorrhizal and endophytic fungi increase plant fitness, and the fungi present in the digestive tract of many insects and mammalian herbivores enable these animals to use hard-to-digest plant material as food. Similarly, the mutualistic lichens are an important component of many extreme biotopes. Fungal parasites are known for nearly all groups of organisms (even fungi!), but they are especially important for plants and insects. They have a tremendous impact on the dynamics of natural populations, as well as on domesticated plants and animals. The feeding, dispersal, and “behavioral” diversity of fungi is such that entire books are needed to describe it (Webster 2007). Because of their importance, scientific programs aimed at better understanding the evolution and biology of fungi have been launched. The AFTOL project (Assembling the Fungal Tree of Life) used multigene trees to resolve fungal phylogeny (James et al. 2006) and proposed a new classification (Hibbett et al. 2007). Numerous genome projects have produced sequences for a great diversity of fungi (see, for example, http://genome.jgi-psf.org/, http://www.broadinstitute.org/science/projects/fungal-genome-initiative/current-fgi-sequence-projects, http://www.genoscope.cns.fr/spip/Fungi-sequenced-at-Genoscope.html). The data show that fungi are highly diverse (McLaughlin et al. 2009). For example, the genetic diversity of fungi belonging to related families, or even to the same family, may exceed that of animals from different classes (Dujon 2005; Espagne et al. 2008).

S. Brun and P. Silar
UFR des Sciences du Vivant, Université de Paris 7 – Denis Diderot, 75205 Paris Cedex 13, France
Institut de Génétique et Microbiologie, UMR CNRS – Université de Paris 11, UPS Bât. 400, 91405 Orsay cedex, France
e-mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_19, © Springer-Verlag Berlin Heidelberg 2010

19.2 The Versatility of Fungal Development

An important point that emerges from phylogenetic studies is the versatility with which fungi switch trophic modes and repeatedly “invent” the same structures (James et al. 2006). For instance, saprobic and symbiotic fungi may both occur within the same genus, and saprotrophy, plant pathogenicity, lichen symbiosis, and other trophic modes may all evolve within the same class. Similarly, plant pathogens and mutualists invade their host plants by many means, one of which involves forcibly breaching the plant cuticle and/or cell wall. To do this, fungi differentiate special cells called appressoria (Deising et al. 2000). These vary in size and shape, and their developmental origins may differ considerably. For example, in Magnaporthe grisea, a hemibiotrophic parasite of rice and barley, the appressorium develops at the tip of a dedicated hypha produced by a three-celled spore of asexual origin. In this species, appressoria are heavily melanized, round cells with a well-defined structure, from which the penetration peg emerges (Fig. 19.1). In Botrytis cinerea, appressorium-like structures are also produced at the tip of a hypha originating from an asexual spore, but this spore has only one cell, and the appressorium is no more than a specialized hypha slightly reinforced at its tip, which is able to orient its growth toward the plant wall and to penetrate it by means of a penetration peg (Fig. 19.1).

Fig. 19.1 Ontogeny of ascospores, appressoria, and appressorium-like structures. Sexual reproduction results in one-celled hyaline ascospores in B. cinerea, four-celled hyaline ascospores in M. grisea, and two-celled melanized ascospores with a germ pore in P. anserina. In the latter species, one of the two cells dies during ascospore differentiation. The appressorium is a roundish, heavily melanized structure in M. grisea, whereas in B. cinerea it is no more than a reinforced hypha. The similarity between the ontogenies of the P. anserina ascospore and the M. grisea appressorium is highlighted by arrows.

These structures are therefore called “appressorium-like” rather than true appressoria. M. grisea and B. cinerea belong to two different classes of ascomycetes, the Sordariomycetes and the Leotiomycetes, respectively. In these classes, numerous species live as saprobes, which seemingly do not differentiate appressoria because they do not need to penetrate host plants. The question is thus whether the use of appressoria to penetrate plants is the result of convergent evolution in plant pathogens or whether it reflects an ancient ability of fungi to differentiate penetration structures that was subsequently lost in saprobes. Spores are another fungal structure (along with fruiting bodies) that exhibit many instances of convergent evolution. Spores are produced either by sexual (basidiospores, ascospores, etc.) or by asexual (conidia, etc.) reproduction and constitute an important part of the life cycle, since they enable fungi to disperse efficiently and to resist adverse conditions. They come in many shapes, sizes, and colors and have been used in the past to classify fungi. For example, Podospora anserina, a model ascomycete, produces heavily melanized ascospores that germinate in a regulated manner through a germ pore (Fig. 19.1). These actually consist of two cells, one of which has undergone cell death. Neurospora crassa produces one-celled striated ascospores with two germ pores located at opposite poles, while M. grisea ascospores are composed of four hyaline cells and lack a germ pore (Fig. 19.1). Those of B. cinerea are composed of a single hyaline cell (Fig. 19.1). Yet spore evolution appears to be rife with convergence. For example, in some Sordariomycetes, the morphology of the fruiting body (ascomal) wall is a better predictor of phylogenetic relationships than ascospore morphology (Miller and Huhndorf 2005). Similarly, in both basidiomycetes and ascomycetes, a germ pore is present in some species and absent in others.
The molecular basis for the versatility of fungi in switching trophic modes and developmental programs is unknown. The only documented case concerns a change from mycoparasitism to saprotrophy in the genus Trichoderma. There is evidence for the horizontal transfer of a cluster of genes involved in nitrate assimilation from a basidiomycete related to Ustilago maydis to the ascomycete Trichoderma reesei, whereas the other members of the genus Trichoderma appear to lack this cluster. This has been correlated with the fact that T. reesei is the only Trichoderma species living as a saprobe in woody material, whereas the other members of the genus are mycoparasites (Slot and Hibbett 2007). The nitrate assimilation cluster would enable T. reesei to scavenge nitrogen efficiently in wood, whereas the other Trichoderma species must obtain it from their hosts, which would account for the trophic change. Because Trichoderma species may parasitize basidiomycetes, this lifestyle may have favored the gene transfer in the ancestors of T. reesei.

19.3 Are Appressoria and Appressorium-Like Structures the Result of Convergent Evolution?

We serendipitously discovered a possible case of convergent evolution of morphogenetic processes affecting trophic strategy in filamentous fungi while studying the role of the Pls1 tetraspanin and the Nox2 NADPH oxidase (Nox) in the saprobic fungus P. anserina. Tetraspanins are membrane proteins whose roles are not yet completely clear (Veneault-Fourrey et al. 2006b). In fungi, tetraspanins of the Pls1 family were first identified as virulence factors in three different plant pathogenic species. In M. grisea, B. cinerea, and Colletotrichum lindemuthianum, Pls1 mutants are blocked at the penetration step: the appressorium appears normal, but no penetration peg is produced (Clergeot et al. 2001; Gourgues et al. 2004; Veneault-Fourrey et al. 2005). This was taken as an indication of a specific role for Pls1 tetraspanins in phytopathogenic fungi. Yet, orthologues of Pls1 are present in saprobic fungi, including P. anserina (Lambou et al. 2008). Tetraspanins share the same membrane localization as Nox. Nox are membrane-bound enzymes that generate superoxide ions by consuming NADPH. Several years ago, we proposed that the ancestral role of Nox (and of the reactive oxygen species, ROS, that they produce) was environmental sensing and cell-to-cell communication (Lalucque and Silar 2003). Indeed, these enzymes have since been shown to play key roles in development, pathogenicity, symbiosis, and defense in a broad range of eukaryotes (Lara-Ortiz et al. 2003; Malagnac et al. 2004; Aguirre et al. 2005; Silar 2005; Takemoto et al. 2007). Three Nox isoforms are presently known in fungi (see Table 19.1 for an update on Nox genes in fungal genomes), and all available data indicate that they do not fulfill redundant roles (Takemoto et al. 2007). In particular, in two saprobic fungi, P. anserina and N. crassa, the Nox2 isoform seems to be more specifically dedicated to regulating the germination of melanized ascospores (Malagnac et al. 2004; Cano-Dominguez et al. 2008). Both fungi produce melanized ascospores and, in both species, Nox2 mutant ascospores do not germinate. Furthermore, when melanin is removed from P. anserina ascospores, the Nox2 mutant ascospores germinate efficiently but in an unregulated manner (Malagnac et al. 2004). Accordingly, Nox2 appears dispensable for the germination of B. cinerea ascospores, which are not melanized (Segmuller et al. 2008). When we deleted the PaPls1 gene of P. anserina, we discovered that the ΔPaPls1 mutants had the same ascospore germination defect as the PaNox2 mutants (Lambou et al. 2008). Again, removal of melanin from PaPls1 mutant ascospores suppressed the germination defect, leading to unregulated germination. Interestingly, the Nox2 isoforms are necessary for plant penetration in M. grisea and B. cinerea (Egan et al. 2007; Segmuller et al. 2008). Additionally, Pls1 is dispensable for the germination of the nonmelanized ascospores of M. grisea (Lambou et al. 2008). These data suggest that Pls1 and Nox2 may act together. This hypothesis is supported by the fact that the two proteins are either both present or both absent in fungal genomes (Table 19.1, Fig. 19.2). In lower fungi, this coevolution is not clear. However, Pls1 tetraspanins are small, rapidly evolving proteins, which hampers their detection in highly divergent genomes with standard tools. In the “higher fungi”, i.e., Ascomycetes and Basidiomycetes, the distribution of Pls1 and Nox2 is best explained by at least nine independent losses of both genes during evolution (Fig. 19.2). As the Pls1 and Nox2 genes are not linked in the genomes, these data provide a strong argument for their involvement in the same processes (Loganantharaj and Atwi 2007). Both proteins may act together in a complex located at the plasma membrane and, despite the diversity of fungal habitats and physiologies, the function of this complex might have been conserved in the different lineages.

The second striking conclusion is that the germination of melanized ascospores requires the same proteins as the formation of the penetration peg from appressoria. When compared (Fig. 19.1), these two processes appear remarkably similar in P. anserina and M. grisea. Indeed, the ontogeny of both appressoria and ascospores involves a programmed cell death event (Beckett et al. 1968; Veneault-Fourrey et al. 2006a). Once formed, both structures are heavily melanized and both contain a pore from which a peg is produced (Beckett et al. 1968; Deising et al. 2000). We thus speculated that the two species use the same program to achieve the same end (exiting from a melanized structure). This provides a nice example of the reuse of the same proteins to achieve a similar morphogenetic goal in two different cell types. We also speculated that this process could have been recruited repeatedly during evolution to achieve the same end, i.e., plant penetration. If so, appressoria from different fungi would be the product of convergent evolution. However, we have recently obtained data that call this view into question: Nox2 and Pls1 are involved in a novel developmental stage in P. anserina, namely the development of appressorium-like cells involved in the penetration of plant material (Brun et al. 2009).

Table 19.1 Occurrence of Nox1, Nox2, Nox3, and Pls1 in Eumycota

Fungal species | Nox1/NoxA | Nox2/NoxB | Nox3/NoxC | Pls1

Ascomycota, Pezizomycotina, Sordariomycetes
Podospora anserina | 1 | 1 | 1 | 1
Sporotrichum thermophile | 1 | 1 | 0 | 1
Thielavia terrestris | 1 | 1 | 0 | 1
Chaetomium globosum | 1 | 1 | 0 | 1
Neurospora tetrasperma | 1 | 1 | 0 | 1
Neurospora discreta | 1 | 1 | 0 | 1
Neurospora crassa | 1 | 1 | 0 | 1
Magnaporthe grisea | 1 | 1 | 1 | 1
Cryphonectria parasitica | 1 | 1 | 0 | 1
Grosmannia clavigera | 1 | 1 | 0 | 1
Fusarium graminearum | 1 | 1 | 1 | 1
Fusarium verticillioides | 1 | 1 | 1 | 1
Fusarium oxysporum | 1 | 1b | 1 | 1
Haematonectria (Nectria) haematococca | 2 | 1 | 1 | 1
Epichloë festucae | 1 | 1 | 0 | 1
Trichoderma atroviride | 1 | 1 | 0 | 1
Trichoderma reesei | 1 | 1 | 0 | 1
Trichoderma virens | 1 | 1 | 0 | 1
Verticillium dahliae | 1 | 1 | 1 | 1
Verticillium albo-atrum | 2 | 1 | 1 | 1
Colletotrichum graminicola | 1 | 1 | 1 | 1

Ascomycota, Pezizomycotina, Leotiomycetes
Sclerotinia sclerotiorum | 1 | 1 | 0 | 1
Botrytis cinerea | 1 | 1 | 0 | 1
Blumeria graminis | 1b | 1 | 0 | 1

Ascomycota, Pezizomycotina, Eurotiomycetes
Aspergillus oryzae | 1 | 0 | 0 | 0
Aspergillus flavus | 1 + 1b | 0 | 0 | 0
Aspergillus terreus | 1 | 0 | 1 | 0
Aspergillus carbonarius | 1 | 0 | 0 | 0
Aspergillus niger | 1 | 0 | 0 | 0
Aspergillus fumigatus | 1 | 0 | 0 | 0
Neosartorya fischeri | 1 | 0 | 0 | 0
Aspergillus clavatus | 1 | 0 | 0 | 0
Aspergillus nidulans | 1 | 0 | 0 | 0
Penicillium chrysogenum | 1 | 0 | 0 | 0
Talaromyces stipitatus | 1 | 1 | 0 | 1
Penicillium marneffei | 1 | 1 | 0 | 1
Histoplasma capsulatum | 1 | 1 | 0 | 1
Paracoccidioides brasiliensis | 1 | 1 | 0 | 1
Blastomyces dermatitidis | 1 | 1 | 0 | 1
Uncinocarpus reesii | 1 | 1 | 0 | 1
Coccidioides posadasii | 1 | 1 | 0 | 1
Coccidioides immitis | 1 | 1 | 0 | 1
Arthroderma gypseum | 1 | 1 | 0 | 1
Microsporum canis | 1 | 1 | 0 | 1
Trichophyton tonsurans | 1 | 1 | 0 | 1
Trichophyton rubrum | 1 | 1 | 0 | 1
Trichophyton equinum | 1 | 1 | 0 | 1
Ascosphaera apis | 0 | 0 | 0 | 0

Ascomycota, Pezizomycotina, Dothideomycetes
Mycosphaerella graminicola | 1 | 0 | 1 | 0
Mycosphaerella fijiensis | 1 | 0 | 1 | 0
Cochliobolus heterostrophus | 1 | 1 | 1 | 1
Alternaria brassicicola | 1 | 1 | 0 | 1
Pyrenophora tritici | 1 | 1 | 1 | 1
Stagonospora nodorum | 1b | 1 | 1 | 1

Ascomycota, Saccharomycotina
Saccharomyces cerevisiae | 0 | 0 | 0 | 0
Candida glabrata | 0 | 0 | 0 | 0
Zygosaccharomyces rouxii | 0 | 0 | 0 | 0
Saccharomyces kluyveri | 0 | 0 | 0 | 0
Kluyveromyces thermotolerans | 0 | 0 | 0 | 0
Kluyveromyces lactis | 0 | 0 | 0 | 0
Ashbya gossypii | 0 | 0 | 0 | 0
Candida albicans | 0 | 0 | 0 | 0
Debaryomyces hansenii | 0 | 0 | 0 | 0
Yarrowia lipolytica | 0 | 0 | 0 | 0

Ascomycota, Taphrinomycotina
Schizosaccharomyces japonicus | 0 | 0 | 0 | 0
Schizosaccharomyces pombe | 0 | 0 | 0 | 0
Schizosaccharomyces octosporus | 0 | 0 | 0 | 0
Pneumocystis carinii | 0 | 0 | 0 | 0

Basidiomycota, Ustilaginomycotina
Ustilago maydis | 0 | 0 | 0 | 0
Malassezia globosa | 0 | 0 | 0 | 0

Basidiomycota, Agaricomycotina, Agaricomycetes
Heterobasidion annosum | 1 | 1 | 0 | 1
Schizophyllum commune | 1 | 1 | 0 | 1
Coprinopsis cinerea | 1 | 1 | 0 | 1
Laccaria bicolor | 1 | 1 | 0 | 1
Postia placenta (a) | 1 | 1 | 0 | 1
Pleurotus ostreatus | 1 | 1 | 0 | 1
Phanerochaete chrysosporium | 1 | 1 | 0 | 1

Basidiomycota, Agaricomycotina, Tremellomycetes
Cryptococcus neoformans | 0 | 0 | 0 | 0
Tremella mesenterica | 1 | 0 | 0 | 0

Basidiomycota, Pucciniomycotina
Sporobolomyces roseus | 1 | 0 | 0 | 0
Melampsora larici-populina | 3 | 2 | 0 | 1
Puccinia graminis | 1 | 1 | 0 | 1

“Lower fungi”, Mucoromycotina
Rhizopus oryzae | 0 | 0 | 0 | 1?
Mucor circinelloides | 0 | 0 | 0 | 1?
Phycomyces blakesleeanus | 0 | 0 | 0 | 1?

“Lower fungi”, Microsporidia
Encephalitozoon cuniculi | 0 | 0 | 0 | 0
Antonospora locustae | 0 | 0 | 0 | 0
Nosema ceranae | 0 | 0 | 0 | 0

“Lower fungi”, Blastocladiomycetes
Allomyces macrogynus | 1 (1) | 1 (4) | 0 | ?

“Lower fungi”, Chytridiomycetes
Spizellomyces punctatus | 1 | 1 | 0 | ?
Batrachochytrium dendrobatidis | 1 | 1 | 0 | ?

(a) BLAST analysis detects two very similar copies for this species. However, the P. placenta project sequenced the genome of a dikaryon (http://genome.jgi-psf.org/Pospl1/Pospl1.home.html); the two copies are likely the different alleles present in each haploid genome.
(b) Genome sequence with an incomplete or erroneous gene sequence. Pseudogenes are given in parentheses.

[Fig. 19.2: phylogenetic tree of the Eumycetes with sequenced genomes: Ascomycota (Pezizomycotina: Sordariomycetes, Leotiomycetes, Eurotiomycetes, Dothideomycetes; Saccharomycotina; Taphrinomycotina), Basidiomycota (Agaricomycotina, Ustilaginomycotina, Pucciniomycotina), and lower fungi (Mucoromycotina, Microsporidia, Blastocladiomycota, Chytridiomycota)]

Fig. 19.2 Phylogenetic tree of Eumycetes. The tree shows the fungal groups for which complete genome sequences are available. The nine vertical arrows mark the losses of Pls1 and Nox2. Asterisks (*) indicate the two groups in which the Pls1 and Nox2 proteins have been recruited for the same goal (exiting a reinforced structure) in two different cell types: the ascospores in Sordariales (P. anserina and N. crassa) and the appressorium in Magnaporthales (M. grisea). Appressorium-like structures possibly appeared very early during fungal evolution, although at a yet undefined point. Fungi unable to differentiate appressorium-like structures are indicated by P.c (Penicillium chrysogenum) and R.o (Rhizopus oryzae).
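The co-occurrence argument for Pls1 and Nox2 is a form of phylogenetic profiling (Loganantharaj and Atwi 2007). As a minimal illustration, the Python sketch below encodes the presence (1) or absence (0) of Nox1, Nox2, and Pls1 for ten species read from Table 19.1 and reports how often each Nox gene matches the Pls1 profile. The species subset, the collapsing of copy numbers to presence, and the simple matching score are illustrative assumptions, not the authors' method.

```python
"""Illustrative phylogenetic-profile comparison on a subset of Table 19.1."""

# Hypothetical subset chosen for illustration; values are presence (1) or
# absence (0) read from Table 19.1, with copy numbers > 1 collapsed to 1.
SPECIES = (
    "Podospora anserina",
    "Magnaporthe grisea",
    "Saccharomyces cerevisiae",
    "Schizosaccharomyces pombe",
    "Ustilago maydis",
    "Cryptococcus neoformans",
    "Tremella mesenterica",
    "Sporobolomyces roseus",
    "Laccaria bicolor",
    "Puccinia graminis",
)

PROFILES = {
    "Nox1": (1, 1, 0, 0, 0, 0, 1, 1, 1, 1),
    "Nox2": (1, 1, 0, 0, 0, 0, 0, 0, 1, 1),
    "Pls1": (1, 1, 0, 0, 0, 0, 0, 0, 1, 1),
}


def agreement(a, b):
    """Fraction of species in which two genes are both present or both absent."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def discordant(a, b, species=SPECIES):
    """Species in which exactly one of the two genes is present."""
    return [name for name, x, y in zip(species, a, b) if x != y]


if __name__ == "__main__":
    for gene in ("Nox1", "Nox2"):
        score = agreement(PROFILES[gene], PROFILES["Pls1"])
        mismatches = discordant(PROFILES[gene], PROFILES["Pls1"])
        print(f"{gene} vs Pls1: agreement={score:.2f}, discordant={mismatches}")
```

On this subset, Nox2 matches Pls1 in every species, whereas Nox1 disagrees in Tremella mesenterica and Sporobolomyces roseus. A simple matching score treats species as independent data points; the stronger argument in the text rests on counting independent co-losses on the phylogeny (Fig. 19.2), which this sketch does not attempt.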

19.4 Differentiating Appressorium-Like Structures Could Be an Ancient Property of Fungi

During our studies on Nox2 and Pls1, we noticed that, in addition to their ascospore germination defect, the null mutants of both genes were defective in fruiting body production, specifically when grown on cellulose as the sole carbon source (Malagnac et al. 2008). This prompted us to investigate the cellulose degradation process in P. anserina in more detail (Brun et al. 2009). When cellophane is provided as the food source, P. anserina is able to orient its growth toward the cellophane layer. Upon contacting cellophane, it differentiates a structure that closely resembles the B. cinerea pseudo-appressorium. Even more striking is the similarity between the appressorium-like phenotypes of the B. cinerea and P. anserina Pls1 and Nox2 mutants (Segmuller et al. 2008; Brun et al. 2009). In both species, these mutants are impaired at the step of reorientation toward the substrate (onion skin and cellophane, respectively), which is a prerequisite for penetration. In both species, mutant hyphae seem to “hesitate” about which direction to grow in. They then establish loose contacts with the substrate and are ultimately completely unable to penetrate it. However, the formation of fully functional penetration structures is controlled not only by Nox2 and Pls1; it also requires the Nox1 isoform (Egan et al. 2007; Giesbert et al. 2008; Brun et al. 2009). In view of this new finding, we speculate that the ability to differentiate cellular structures dedicated to penetrating plant material might be an ancient property of filamentous fungi (at least of ascomycetes and basidiomycetes), which is used by saprobes to degrade dead plants efficiently and, more aggressively, by phytopathogens to penetrate their hosts. To test this possibility, we evaluated the ability of several additional fungi to differentiate penetration structures on cellophane (see Fig. 19.3 for an example).
A variety of structures capable of breaching the cellophane were indeed produced by a wide spectrum of fungi (several Sordariomycetes and Agaricomycetes; S. Brun and P. Silar, unpublished data). So far, we have not detected such structures in two species, Penicillium chrysogenum and Rhizopus oryzae (Fig. 19.3). Significantly, both fungi lack Nox2 and Pls1 (Table 19.1, Fig. 19.2), supporting the crucial role of the two proteins in the differentiation of appressorium-like cells. Therefore, a wide range of fungi seem to possess the toolkit necessary to breach the plant cell wall. The patchy phylogenetic distribution of species known to produce appressoria and related structures could thus result from a sampling bias toward parasitic and mutualistic plant symbionts in studies of appressorium formation. However, some species may truly be unable to differentiate these structures: those that have lost Pls1 and Nox2. In other words, there is no need to invoke complex convergent evolution of fungal structures to explain the recurrent changes in trophic lifestyle. Evidence is emerging that confirms a role for ROS and Nox in polarized hyphal growth (Semighini and Harris 2008), and we believe that the ability of fungi to attack and penetrate plant material may simply rely on sensing the glucose gradient created by the enzymatic degradation of the polysaccharides composing the plant cell wall, i.e., cellulose and hemicellulose (Brun et al. 2009). More generally, we believe that if this simple model is true, penetration structures under the control of Nox2/Pls1 should be found not only in phytopathogens and saprobes, but also in entomopathogens (for breaching the cuticle), in fungal parasites such as Trichoderma sp. (for penetrating chitin-based cell walls), and possibly in human pathogens. We thus now need to confirm, on a larger sample, whether the correlation between the ability to build these structures and the conservation of Nox2/Pls1 holds true.

Fig. 19.3 Cellophane breach. Four-day-old mycelia of P. anserina (P. a), a Trichoderma species (T. sp), Penicillium chrysogenum (P. c), and Rhizopus oryzae (R. o) were observed as described (Brun et al. 2009). Numbers indicate the distance (in mm) from the first picture, as depicted by the arrows on the schemes on the right. In the first column, mycelia of all the strains are growing horizontally on the cellophane layer. In the second column, mycelia of P. anserina and the Trichoderma species reorient their growth toward the cellophane and establish bulging contacts (some examples are indicated by arrows). In P. chrysogenum and R. oryzae, there is no reorientation toward the cellophane, although rare contacts may occur. In the third column, needle-like hyphae (some examples are indicated by asterisks) are emitted in P. anserina and the Trichoderma species, which allow both fungi to penetrate into the cellophane layer. In contrast, P. chrysogenum and R. oryzae cannot penetrate cellophane. The fourth column shows a schematic representation of the structures; the arrows point toward the approximate focal plane of the first three columns and the eye indicates the direction of observation.

Acknowledgments

This work was supported by ANR grant no. ANR-05-Blan-0385-02.


References

Aguirre J, Rios-Momberg M, Hewitt D, Hansberg W (2005) Reactive oxygen species and development in microbial eukaryotes. Trends Microbiol 13:111–118
Beckett A, Barton R, Wilson IM (1968) Fine structure of the wall and appendage formation in ascospores of Podospora anserina. J Gen Microbiol 53:89–94
Brun S, Malagnac F, Bidard F, Lalucque H, Silar P (2009) Functions and regulation of the Nox family in the filamentous fungus Podospora anserina: a new role in cellulose degradation. Mol Microbiol 74:480–496
Cano-Dominguez N, Alvarez-Delfin K, Hansberg W, Aguirre J (2008) NADPH oxidases NOX-1 and NOX-2 require the regulatory subunit NOR-1 to control cell differentiation and growth in Neurospora crassa. Eukaryot Cell 7:1352–1361
Clergeot PH, Gourgues M, Cots J, Laurans F, Latorse MP, Pepin R, Tharreau D, Notteghem JL, Lebrun MH (2001) PLS1, a gene encoding a tetraspanin-like protein, is required for penetration of rice leaf by the fungal pathogen Magnaporthe grisea. Proc Natl Acad Sci USA 98:6963–6968
Deising HB, Werner S, Wernitz M (2000) The role of fungal appressoria in plant infection. Microbes Infect 2:1631–1641
Dujon B (2005) Hemiascomycetous yeasts at the forefront of comparative genomics. Curr Opin Genet Dev 15:614–620
Egan MJ, Wang ZY, Jones MA, Smirnoff N, Talbot NJ (2007) Generation of reactive oxygen species by fungal NADPH oxidases is required for rice blast disease. Proc Natl Acad Sci USA 104:11772–11777
Espagne E, Lespinet O, Malagnac F, Da Silva C, Jaillon O, Porcel BM, Couloux A, Aury JM, Segurens B, Poulain J, Anthouard V, Grossetete S, Khalili H, Coppin E, Dequard-Chablat M, Picard M, Contamine V, Arnaise S, Bourdais A, Berteaux-Lecellier V, Gautheret D, de Vries RP, Battaglia E, Coutinho PM, Danchin EG, Henrissat B, Khoury RE, Sainsard-Chanet A, Boivin A, Pinan-Lucarre B, Sellem CH, Debuchy R, Wincker P, Weissenbach J, Silar P (2008) The genome sequence of the model ascomycete fungus Podospora anserina. Genome Biol 9:R77
Giesbert S, Schurg T, Scheele S, Tudzynski P (2008) The NADPH oxidase Cpnox1 is required for full pathogenicity of the ergot fungus Claviceps purpurea. Mol Plant Pathol 9:317–327
Gourgues M, Brunet-Simon A, Lebrun MH, Levis C (2004) The tetraspanin BcPls1 is required for appressorium-mediated penetration of Botrytis cinerea into host plant leaves. Mol Microbiol 51:619–629
Hawksworth DL (1991) The fungal dimension of biodiversity: magnitude, significance, and conservation. Mycol Res 95:641–655
Hibbett DS, Binder M, Bischoff JF, Blackwell M, Cannon PF, Eriksson OE, Huhndorf S, James T, Kirk PM, Lucking R, Thorsten Lumbsch H, Lutzoni F, Matheny PB, McLaughlin DJ, Powell MJ, Redhead S, Schoch CL, Spatafora JW, Stalpers JA, Vilgalys R, Aime MC, Aptroot A, Bauer R, Begerow D, Benny GL, Castlebury LA, Crous PW, Dai YC, Gams W, Geiser DM, Griffith GW, Gueidan C, Hawksworth DL, Hestmark G, Hosaka K, Humber RA, Hyde KD, Ironside JE, Koljalg U, Kurtzman CP, Larsson KH, Lichtwardt R, Longcore J, Miadlikowska J, Miller A, Moncalvo JM, Mozley-Standridge S, Oberwinkler F, Parmasto E, Reeb V, Rogers JD, Roux C, Ryvarden L, Sampaio JP, Schussler A, Sugiyama J, Thorn RG, Tibell L, Untereiner WA, Walker C, Wang Z, Weir A, Weiss M, White MM, Winka K, Yao YJ, Zhang N (2007) A higher-level phylogenetic classification of the fungi. Mycol Res 111:509–547
James TY, Kauff F, Schoch CL, Matheny PB, Hofstetter V, Cox CJ, Celio G, Gueidan C, Fraker E, Miadlikowska J, Lumbsch HT, Rauhut A, Reeb V, Arnold AE, Amtoft A, Stajich JE, Hosaka K, Sung GH, Johnson D, O’Rourke B, Crockett M, Binder M, Curtis JM, Slot JC, Wang Z, Wilson AW, Schussler A, Longcore JE, O’Donnell K, Mozley-Standridge S, Porter D, Letcher PM, Powell MJ, Taylor JW, White MM, Griffith GW, Davies DR, Humber RA, Morton JB, Sugiyama J, Rossman AY, Rogers JD, Pfister DH, Hewitt D, Hansen K, Hambleton S, Shoemaker RA, Kohlmeyer J, Volkmann-Kohlmeyer B, Spotts RA, Serdani M, Crous PW, Hughes KW, Matsuura K, Langer E, Langer G, Untereiner WA, Lucking R, Budel B, Geiser DM, Aptroot A, Diederich P, Schmitt I, Schultz M, Yahr R, Hibbett DS, Lutzoni F, McLaughlin DJ, Spatafora JW, Vilgalys R (2006) Reconstructing the early evolution of fungi using a six-gene phylogeny. Nature 443:818–822
Lalucque H, Silar P (2003) NADPH oxidase: an enzyme for multicellularity? Trends Microbiol 11:9–12
Lambou K, Malagnac F, Barbisan C, Tharreau D, Lebrun MH, Silar P (2008) A crucial role for the Pls1 tetraspanin during ascospore germination of the saprophytic fungus Podospora anserina. Eukaryot Cell 7:1809–1818
Lara-Ortiz T, Riveros-Rosas H, Aguirre J (2003) Reactive oxygen species generated by microbial NADPH oxidase NoxA regulate sexual development in Aspergillus nidulans. Mol Microbiol 50:1241–1255
Loganantharaj R, Atwi M (2007) Towards validating the hypothesis of phylogenetic profiling. BMC Bioinformatics 8(Suppl 7):S25
Malagnac F, Bidard F, Lalucque H, Brun S, Lambou K, Lebrun MH, Silar P (2008) Convergent evolution of morphogenetic processes in fungi: role of tetraspanins and NADPH oxidases 2 in plant pathogens and saprobes. Commun Integr Biol 1:180–181
Malagnac F, Lalucque H, Lepere G, Silar P (2004) Two NADPH oxidase isoforms are required for sexual reproduction and ascospore germination in the filamentous fungus Podospora anserina. Fungal Genet Biol 41:982–997
McLaughlin DJ, Hibbett DS, Lutzoni F, Spatafora JW, Vilgalys R (2009) The search for the fungal tree of life. Trends Microbiol 17:488–497
Miller AN, Huhndorf SM (2005) Multi-gene phylogenies indicate ascomal wall morphology is a better predictor of phylogenetic relationships than ascospore morphology in the Sordariales (Ascomycota, Fungi). Mol Phylogenet Evol 35:60–75
Schadt CW, Martin AP, Lipson DA, Schmidt SK (2003) Seasonal dynamics of previously unknown fungal lineages in tundra soils. Science 301:1359–1361
Segmuller N, Kokkelink L, Giesbert S, Odinius D, van Kan J, Tudzynski P (2008) NADPH oxidases are involved in differentiation and pathogenicity in Botrytis cinerea. Mol Plant Microbe Interact 21:808–819
Semighini CP, Harris SD (2008) Regulation of apical dominance in Aspergillus nidulans hyphae by reactive oxygen species. Genetics 179:1919–1932
Silar P (2005) Peroxide accumulation and cell death in filamentous fungi induced by contact with a contestant. Mycol Res 109:137–149
Slot JC, Hibbett DS (2007) Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study. PLoS ONE 2:e1097
Takemoto D, Tanaka A, Scott B (2007) NADPH oxidases in fungi: diverse roles of reactive oxygen species in fungal cellular differentiation. Fungal Genet Biol 44:1065–1076
Veneault-Fourrey C, Barooah M, Egan M, Wakley G, Talbot NJ (2006a) Autophagic fungal cell death is necessary for infection by the rice blast fungus. Science 312:580–583
Veneault-Fourrey C, Lambou K, Lebrun MH (2006b) Fungal Pls1 tetraspanins as key factors of penetration into host plants: a role in re-establishing polarized growth in the appressorium? FEMS Microbiol Lett 256:179–184
Veneault-Fourrey C, Parisot D, Gourgues M, Lauge R, Lebrun MH, Langin T (2005) The tetraspanin gene ClPLS1 is essential for appressorium-mediated penetration of the fungal pathogen Colletotrichum lindemuthianum. Fungal Genet Biol 42:306–318
Webster J (2007) Introduction to fungi, 3rd edn. Cambridge University Press, U.K.

Chapter 20

Evolution and Historical Biogeography of a Song Sparrow Ring in Western North America

Michael A. Patten

Abstract The Song Sparrow, Melospiza melodia (Aves: Emberizidae), exhibits a greater degree of geographic variation than any other North American bird species. Detailed morphological work has demonstrated that a subset of the 25 diagnosable subspecies forms a classic ring species in the western United States. The ring’s center is the Sierra Nevada and Mojave Desert in California and adjacent Nevada, and its connecting point is in southeastern California, where an olive and black subspecies of the coastal slope interbreeds sporadically with a gray and rufous subspecies of the arid interior. However, song differences associated with habitat segregation lead to assortative mating between these two subspecies, which meet in the Coachella Valley at the southeastern base of San Gorgonio Pass. Moving clockwise around the ring from the connecting point, one finds a gradation of subspecies that become paler, rustier, and grayer. Standard models of ring species evolution imply that the connecting point is the region occupied most recently, in this case after sparrows spread southward down either side of the mountains and desert. This scenario is plausible given molecular evidence of a glacial refugium on the Queen Charlotte Islands, British Columbia, from which ancestral birds could have moved south in this pattern. By contrast, another postulated refugium lies in what is now the arid desert of southeastern California or northeastern Baja California, Mexico. This refugium’s location, coupled with a recent meta-analysis of North American hybrid zones that identifies the San Gorgonio Pass region as an ancestral contact zone of coastal and desert fauna, implies that the connecting point is the region occupied earliest, an alternative that would mean the Song Sparrow ring differs fundamentally from one that evolved via the standard model. Biogeographical and morphological data support the latter, more radical interpretation, but genetic, vocal, ecological, and behavioral data are needed around the ring to determine conclusively which model is best supported.

M.A. Patten Oklahoma Biological Survey and Department of Zoology, University of Oklahoma, 111 E. Chesapeake Street, Norman, OK 73019, USA e‐mail: [email protected]

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_20, © Springer-Verlag Berlin Heidelberg 2010

20.1 Ring Species as a Biogeographic Pattern

A concrete bridge between microevolution and macroevolution, including speciation, continues to elude evolutionary biologists (Mayr 1982; Jablonski 2000; Reznick and Ricklefs 2009). Some researchers have concluded that macroevolution is no more than the accumulated effects of microevolution (Hansen and Martins 1996; Simons 2002), whereas others have concluded that macroevolution requires a fundamentally different mechanism (Stanley 1998; Erwin 2000). Ring species may prove to be that crucial bridge (Irwin et al. 2001b). A ring species consists of multiple subspecies whose contiguous geographic ranges encircle a geographic barrier and whose terminal subspecies behave as good biological species where their ranges meet (Cain 1954; Irwin and Irwin 2002; Coyne and Orr 2004). The subspecies around the ring that connect the terminal subspecies grade into one another to form a continuous series of intermediate forms. Because reproductive isolation evolves in the face of gene flow, Mayr (1942:180) referred to ring species as “the perfect demonstration of speciation”, and Cain (1954:141) referred to them as “the clearest evidence of geographical speciation”. But as Coyne and Orr (2004:102) noted, ring species do not demonstrate geographical (= allopatric) speciation but rather speciation that occurs “through the attenuation of gene flow with distance”. Thus, ring species remain a key to understanding the evolution of reproductive isolation and, therefore, of speciation, and they demonstrate how “small changes can lead to species-level differences” (Irwin et al. 2001b).

What is lost in this argument about whether ring species are examples of geographic speciation is a clear distinction between pattern and process, which are often conflated. To fit the pattern of a ring species, three conditions must be met (Irwin and Irwin 2002; Joseph et al. 2008; Patten and Pruett 2009): (1) the geographic ranges of neighboring subspecies must meet; (2) the phenotypes and genotypes of neighbors must show the effects of intergradation, except for (3) the two subspecies that form the terminal points, which must show a sharp break in phenotype, genotype, ecology, and behavior, sharp enough that these subspecies behave as good biological species where their ranges meet. Few proposed ring species meet these criteria (Irwin et al. 2001b; Coyne and Orr 2004), and even a weaker criterion replacing (1) and (2), namely that “a series of progressively intermediate forms must be arranged in a ring” (Patten and Pruett 2009), still excludes many proposed ring species. Regardless, if a geographically variable species were found to fit these criteria, it would be fair to dub it a ring species, irrespective of how the pattern came to be. It also seems fair to conclude that the pattern of phenotypic variation exhibited by a ring species demonstrates that the microevolutionary processes that lead to population differentiation are akin to those that lead to speciation, any differences being only a matter of degree (Irwin et al. 2001b).
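The three pattern criteria above amount to a checklist that can be applied contact zone by contact zone around a ring. The following is a minimal Python sketch of that checklist, not a method from the source: it assumes each contact zone between neighboring subspecies has already been scored with three hypothetical yes/no flags (ranges meet, intergradation present, sharp break present), and the names `ContactZone` and `fits_ring_pattern` are illustrative only. Like the criteria themselves, it tests pattern and says nothing about the process that produced it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactZone:
    """Hypothetical scoring of the contact between two neighboring subspecies."""
    a: str
    b: str
    ranges_meet: bool   # criterion (1): geographic ranges are in contact
    intergrade: bool    # criterion (2): phenotype and genotype intergrade
    isolated: bool      # criterion (3): sharp break; behave as good species


def fits_ring_pattern(ring: list[str], zones: list[ContactZone]) -> bool:
    """Return True if the scored zones fit the three ring species pattern criteria.

    `ring` lists subspecies in order around the barrier; the last is treated
    as adjacent to the first, so n subspecies give n neighboring pairs.
    """
    n = len(ring)
    neighbor_pairs = {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}
    scored = {frozenset((z.a, z.b)): z for z in zones}
    # Every neighboring pair must be scored, and no non-neighboring pair.
    if set(scored) != neighbor_pairs:
        return False
    zs = list(scored.values())
    # (1) The ranges of all neighbors must meet.
    if not all(z.ranges_meet for z in zs):
        return False
    # (3) Exactly one zone, between the terminal points, shows a sharp break.
    terminal = [z for z in zs if z.isolated]
    if len(terminal) != 1 or terminal[0].intergrade:
        return False
    # (2) All remaining neighbors must intergrade.
    return all(z.intergrade for z in zs if not z.isolated)


if __name__ == "__main__":
    ring = ["A", "B", "C", "D"]  # hypothetical subspecies in ring order
    zones = [
        ContactZone("A", "B", True, True, False),
        ContactZone("B", "C", True, True, False),
        ContactZone("C", "D", True, True, False),
        ContactZone("D", "A", True, False, True),  # the terminal points
    ]
    print(fits_ring_pattern(ring, zones))  # True
```

With the hypothetical subspecies A, B, C, and D scored so that every neighboring pair meets and intergrades except D and A, which behave as good species, the check returns True. Rescoring the D/A zone as intergrading returns False, because the ring then has no terminal points. Whether a given contact zone deserves each flag is, of course, the empirical question the chapter is concerned with.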

20.2 The Evidence for Ring Species

Whether any claimed ring species fits all three criteria outlined above is debatable or unlikely (Coyne and Orr 2004; Martens and Päckert 2007; Joseph et al. 2008). For example, Irwin et al. (2001b) and Irwin and Irwin (2002) reviewed 23 ring species reported in the scientific literature. Almost all were found wanting in some way, often because reproductive isolation of the terminal points had not been studied, but sometimes because gene flow around the ring was unlikely or was known not to occur. In the case of the tsetse fly, Glossina morsitans, the terminal points did not meet in sympatry. Even the two most widely studied putative ring species, the salamander Ensatina eschscholtzii (Stebbins 1957; Wake and Yanev 1986; Wake 2006; Kuchta et al. 2009) and the warbler Phylloscopus trochiloides (Mayr 1942; Irwin et al. 2001a, 2005), do not fully meet the criteria (Coyne and Orr 2004; Martens and Päckert 2007), although they nonetheless display enough of the characteristics to be considered ring species by most evolutionary biologists. Just over half of the ring species Irwin et al. (2001b) considered were birds, although they did not consider Mayr’s (1942) examples of the Zosterops white-eyes in the Lesser Sunda Islands or the Pernis honeyeaters in the Philippines, to say nothing of Stejneger’s (in Jordan 1905) speculation regarding Lanius shrikes around the Baltic Sea. Perhaps no additional pertinent data exist for these systems. To these examples can be added two recently described avian ring species: the Willow Warbler (Phylloscopus trochilus) complex encircling the Baltic Sea (Bensch et al. 2009) and subspecies of the Song Sparrow (Melospiza melodia) encircling the Sierra Nevada and Mojave Desert of the southwestern United States (Patten and Pruett 2009).
The Willow Warbler varies in plumage color, body size, AFLPs (amplified fragment length polymorphisms), microsatellite markers, and migratory behavior to the extent that it “shares many features with the classic examples of ring species”, albeit as one that evolved recently relative to nearly all other examples (Bensch et al. 2009). The Song Sparrow varies considerably in plumage color and pattern around the ring (Table 20.1), with phenotypically intermediate populations present in all contact zones, implying gene flow and intergradation where ranges meet (Fig. 20.1; Patten and Pruett 2009). The terminal points are two subspecies, the pale, rufescent M. m. fallax of the desert Southwest and the dark, olivaceous M. m. heermanni of southern and central California, which meet in the Coachella Valley between San Gorgonio Pass and the Salton Sea. The terminal taxa hybridize only rarely; instead, there is evidence that females choose mates assortatively, that males respond more strongly to songs of their own subspecies, and that song structure is shaped by habitat structure, which differs between the subspecies (Patten et al. 2004b). Although genetic variation has not yet been studied around the ring, the terminal taxa differ in microsatellite allele frequencies, and these differences are associated with plumage differences (Patten et al. 2004b). Moreover, a recent study of Song Sparrows along the whole of the Pacific Coast, from the western Aleutian Islands of Alaska to southernmost California, found that in many cases microsatellite variation and plumage variation (subspecies) were significantly correlated (Pruett et al. 2008; cf. Zink 2010). This finding suggests that a detailed genetic survey around the ring could yield a pattern that corroborates the one evident in plumage variation.

Table 20.1 Patterns of phenotypic variation around the Song Sparrow Melospiza melodia ring in western North America

Subspecies | Mantle color | Mantle fringe | Underparts | Streak color | Streak fringe | Malar | Supercilia
heermanni | Grayish olive-brown | Gray, thin | White | Fuscous | Ruddy | Reddish fuscous | Ashy
gouldii | Reddish olive-brown | Absent | White | Black | Olive | Blackish | Ashy
cleonensis | Dark reddish brown | Gray, thin | Grayish | Dark brown | Brown | Fuscous | Grayish
montana | Grayish brown | Gray, broad | White | Brown | Chestnut | Chestnut brown | Whitish
fallax | Brownish gray | Reddish gray, broad | White | Warm brown | Chestnut | Chestnut | Whitish

Fig. 20.1 The Song Sparrow (Melospiza melodia) ring in western North America (from Patten and Pruett 2009). The northwestern portion of the center of the ring is the Sierra Nevada, the tallest mountain range in the conterminous United States. The remainder of the gap is the Mojave Desert (southern California) and the southern Great Basin desert (southern Nevada). The large lake in southeastern California is the Salton Sea, which sits at the southern edge of the area where the terminal taxa meet; San Gorgonio Pass lies at the northwestern edge of the Coachella Valley

20.3 Models for the Evolution of Ring Species

The two recently proposed ring species need more study, but at least the criteria for establishing the pattern appear to have been met as convincingly as in the two better-studied examples, Ensatina eschscholtzii and Phylloscopus trochiloides. But determining that a species or subspecies complex fits the ring species pattern is only half the battle. How a ring pattern came to be is a question of process, and the stringent criteria Coyne and Orr (2004:103) set forth for determining whether a ring species is valid focused equally on process and pattern. Although these authors agreed that criterion (1) above must hold, they modified (2) to state that geographic continuity must always have been present; i.e., no geographic barriers to gene flow could have existed in the past, during ring formation. They further imposed two criteria related to the process by which the ring formed: (A) there must be historical evidence that the ring was formed by a single population (i.e., not from two or more genetically distinct lines), with all subspecies around the ring descended from that single line, and (B) one of the terminal points must be represented by a population that expanded its range most recently. Criterion (A) may be justified if we wish to hold up a ring species as a solid example of speciation, either in the face of gene flow or with geographic distance. Criterion (B), by contrast, implies that the ring must have formed in a particular way, which ignores other plausible ways in which a ring could evolve. The model inherent in criterion (B) is consistent with the first model put forth for the evolution of a ring species (Stejneger, in Jordan 1905), half a century before the term “ring species” was coined. In one of several published responses to Jordan’s review of geographic speciation, Stejneger postulated that two subspecies might breed in sympatry, but only under specific circumstances.
Using Lanius shrikes in northern Europe as an example, he asked readers to imagine two trajectories of range expansion splitting from a common stock in Asia, one heading west through central Europe to reach the Scandinavian Peninsula by way of Denmark and the other heading northwest through Finland to colonize the Scandinavian Peninsula from the north. The ranges of these subspecies would meet in the southern part of the peninsula. Stejneger (p. 552) proposed that “it is then not unnatural to conclude that in the specimens meeting there the characters might have become so fixed that the two forms would react on each other as two distinct species, though at their original dividing line they might still remain in the imperfectly differentiated stage”. This scenario corresponds to the classic conceptual model of how a ring forms (Fig. 20.2, “classical I”; Martens and Päckert 2007). An alternative model (Fig. 20.2, “classical II”), in which the expanding range encircles the geographic barrier and returns to its starting point, yields the same pattern and still satisfies criterion (B).


Fig. 20.2 Competing models for the evolution of a species ring. The “classical I” model corresponds to Leonard Stejneger’s (in Jordan 1905) conception of how a ring formed (see also Martens and Päckert 2007). A ring may also form in the classical sense by encircling the geographic barrier back to the starting point (see Kuchta et al. 2009 for similar examples). The “in situ” model relies on repeated, simultaneous ecological speciation, whereas the “ecological divergence” model combines aspects of a classical ring model (e.g., differentiation during range expansion) with ecological speciation

Using current snapshots to distinguish between various iterations of these “classical” models can be challenging (Kuchta et al. 2009), but alternative models that would yield the pattern of a ring species and conform to the conceptual specifications of the “ring species hypothesis” (sensu Joseph et al. 2008) have not been explored. Yet there are alternative models in which a ring pattern evolves by a process that retains the concept’s emphasis on divergence with gene flow, a possibility increasingly recognized as plausible (Nosil 2008; Thorpe et al. 2008; Milá et al. 2009). One such model is a simple scenario invoking in situ divergence across various ecotones around a ring (Fig. 20.2, “in situ”), with divergence being especially pronounced across one moderately steep, but not too steep, environmental gradient (Doebeli and Dieckmann 2003; Leimar et al. 2008). Taxa on either side of this gradient diverge by ecological speciation, “the evolution of reproductive isolation between populations by divergent natural selection arising from differences between ecological environments” (Schluter 2009). These taxa become the terminal points of the ring. Because geographic ranges always were and still are continuous, and intergradation persists at the other contact points, where gradients are shallower, a ring species pattern forms in the face of gene flow.

Another model for the evolution of a ring species also invokes ecological speciation across an environmental gradient (Fig. 20.2, “ecological divergence”). In this case, ranges expand around a geographic barrier, just as in the classical models; however, they first split from the parent population across an ecotone with a moderately steep gradient, an area conducive to divergence (Endler 1977). As ranges expand around either side of the barrier, enough time elapses at the initial branch point for divergence to occur there, but the expanding fronts do not diverge at the same rate. Indeed, the two fronts remain undifferentiated enough that when they meet, the populations interbreed readily, forming a broad hybrid zone of secondary contact. The end result would again be a ring species pattern in the face of gene flow. The chief differences from the classical models are that the terminal points occur at an ecotone and lie at the opposite end of the ring from where the expanding fronts met.

A variety of other scenarios may also lead to a ring species pattern. For example, a species may have spread from multiple glacial refugia and in doing so formed multiple zones of secondary contact (Bensch et al. 2009). Or a set of subspecies may have arisen by vicariant (allopatric) divergence, after which all barriers between the resulting forms eroded, leaving a ring of connected forms with intergradation where ranges meet (Joseph et al. 2008). We therefore ought to expect ring species patterns in situations that cannot teach us about speciation in the face of gene flow, an oft-cited hallmark of the ring species hypothesis. Such examples only add to the abundant evidence for allopatric speciation, although they may still prove suitable for studies of the maintenance of geographic variation in the face of gene flow (e.g., hybrid zone dynamics; Barton and Hewitt 1989).

20.4 Evolution of the Song Sparrow Ring

The Song Sparrow currently ranges across North America, north to southwestern Alaska, east across southern Canada to Newfoundland, and, in contiguous populations, south to northwestern Mexico. There are also geographically isolated populations on the Channel Islands off California and on the Islas Coronados off Baja California, and at various locales in mainland Mexico south to the Trans-Mexican Volcanic Belt (Patten and Pruett 2009). So wide a geographic range may hinder interpretation of the evolution of geographic variation. We thus need to consider whether the species was always so widespread or, more likely, whether it expanded its range considerably in the wake of the most recent glaciation, about 12,000 years before present (ybp). For the Song Sparrow, two genetic analyses identified two (Zink and Dittmann 1993) and three (Fry and Zink 1998) Pleistocene refugia. That is, extant populations of the Song Sparrow carry a genetic signature that implies range expansion away from either two or three regions that harbored the species’ ancestors during the last glacial maximum (Fig. 20.3; see Sommer and Zachos 2009). The two refugia identified by mtDNA restriction sites (Zink and Dittmann 1993) were Newfoundland and the Queen Charlotte Islands, British Columbia (Fig. 20.3). Because Newfoundland was covered by an ice sheet during the last glacial maximum, it seems an implausible site for a refugium. This concern was alleviated by a follow-up study of mtDNA sequences (Fry and Zink 1998), which found evidence for a “model of Song Sparrow population history involving multiple Pleistocene refugia and colonization of some formerly glaciated regions from multiple sources”. That study identified three refugia: the Queen Charlotte Islands, the Atlantic Coast of the northeastern United States, and, likely, southern California (Fig. 20.3). Southern California could not be identified conclusively as a refugium because sample size was small. Nevertheless, a genetic survey across a suite of terrestrial vertebrate taxa (not including the Song Sparrow) identified southeastern California as a Pleistocene refugium (Waltari et al. 2007), lending support to Fry and Zink’s (1998) finding. Waltari et al. (2007) also presented evidence for a refugium in the central or southern Baja California peninsula, a location Fry and Zink (1998) could not have detected because they lacked Song Sparrow samples from the peninsula. The Baja California peninsula nonetheless corresponds to a common Pleistocene refugium incorporated into a meta-analysis of North American hybrid zones (Fig. 20.4; Swenson and Howard 2005). That the sparrow currently occurs in all three (or four, if Baja California is counted separately from southern California) putative refugia (Fig. 20.3) raises the possibility of future screening for ancestral haplotypes, preferably in the nuclear genome.

Fig. 20.3 Approximate extent of the North American ice sheets during the last glacial maximum (Ehlers and Gibbard 2004). On the basis of mitochondrial DNA restriction sites and sequences (Zink and Dittmann 1993; Fry and Zink 1998), three glacial refugia (dashed circles) for the Song Sparrow (Melospiza melodia) have been proposed. A fourth (solid circle) was proposed initially but later discarded

Fig. 20.4 Proposed routes of range expansion away from glacial refugia (squares) in North America (after Swenson and Howard 2005)

Hybrid or contact zones are an additional crucial consideration when piecing together the evolutionary and biogeographic history of the Song Sparrow. The contact zone of the terminal points of the sparrow ring lies in the Coachella Valley, at the southeastern base of San Gorgonio Pass (Fig. 20.1). San Gorgonio Pass divides the northern end of the north–south Peninsular Ranges from the east–west Transverse Ranges and is an area of faunal transition (Patten et al. 2004a; Leavitt et al. 2007). It has been identified as a “hot spot” for phylogeographic breaks (Swenson and Howard 2005), locations where there are deep splits in phylogenetic history. The Transverse Ranges themselves figure prominently in phylogeographic breaks: animal taxa (invertebrate and vertebrate) north or south of that line of mountains tend to fall in separate phylogenetic clusters (Calsbeek et al. 2003; Burns et al. 2007), further emphasizing the prominence of the San Gorgonio Pass region as a contact zone hot spot. That the terminal points of the Song Sparrow ring occur in this region of faunal transition is likely not a coincidence.

If we accept that the Song Sparrow’s ancestors persisted in a glacial refugium in southern California or in Baja California and spread north from there (Figs. 20.3 and 20.4), the expanding range would have split at San Gorgonio Pass. The moderately steep environmental gradient in the pass, from a Mediterranean climate at the northwest end to an extreme desert climate at the southeast end, is conceivably ideal for ecological speciation. If speciation occurred at the pass while the expanding fronts differentiated via isolation by distance, enough to be recognized as subspecies but not enough to become reproductively isolated, then the result would be a true ring species that evolved by a process best fitting the “ecological divergence” model (Fig. 20.2). Alternatively, Lapointe and Rissler (2005) examined congruent phylogeographies across California of seven vertebrates, an invertebrate, and a plant and found general patterns that correspond broadly to the ranges of the Song Sparrow subspecies that constitute the ring (Fig. 20.1). If these regions, each of which has a distinct environment (i.e., general climate and vegetation), tend to promote divergence via ecological speciation, then San Gorgonio Pass might still be the site of speciation, while the other contact zones represent areas where locally adapted populations meet. Such a scenario would yield a true ring species, but one that evolved by means of the “in situ” model (Fig. 20.2).

Morphologically, the California subspecies of the Song Sparrow form a distinct group, as do the subspecies of the desert Southwest and those of the mesic Pacific Northwest (Patten and Pruett 2009). It therefore seems unlikely that postglacial range expansion was solely from the Queen Charlotte refugium, a requisite for the ring to conform to a “classical I” model (Fig. 20.2). Evolution by means of a “classical II” model may be more likely if the ancestral taxon expanded north to encircle the Sierra Nevada and Mojave Desert counterclockwise, yet such a pattern would not jibe with general tracks of postglacial expansion in other species (Fig. 20.4; Swenson and Howard 2005). Moreover, the subspecies M. m. rivularis of Baja California Sur is morphologically most like M. m. fallax of the Sonoran Desert, one of the terminal points of the ring; indeed, the two are nearly identical in plumage, the principal difference being the diagnostically longer bill of M. m. rivularis (Patten and Pruett 2009). If phenotype corresponds to evolutionary relatedness and the Pleistocene refugium was in the Baja California peninsula, then the ancestral form expanded northward only on the east side of the peninsula, an unlikely scenario given the presumably spotty suitable habitat in the far more xeric portion of Baja California east of the Peninsular Ranges.

20.5 Conclusions

Morphological variation in the Song Sparrow in the southwestern United States creates a ring species pattern around the Sierra Nevada and Mojave Desert (Patten and Pruett 2009). A detailed study of the two subspecies that differ most strikingly in plumage implies that they are the terminal points of the ring (Patten et al. 2004b). These subspecies meet at the base of San Gorgonio Pass, a well-known area of faunal transition (Leavitt et al. 2007). Yet prima facie evidence suggests that neither of the classical models for the evolution of a ring species (Fig. 20.2) holds in this case. A glacial refugium for the Song Sparrow likely existed in the desert Southwest (Fry and Zink 1998), and postglacial range expansion from this region tended to follow a northward trajectory (Swenson and Howard 2005). It thus appears that an “ecological divergence” model is the most plausible. This model requires ecological speciation between M. m. heermanni and M. m. fallax, the terminal points, across San Gorgonio Pass while the species expanded its range northward on either side of the Sierra Nevada and Mojave Desert (Fig. 20.5). At this stage an “in situ” model cannot be ruled out, and distinguishing between these models requires detailed genetic, ecological, and behavioral research around the ring. Even so, Occam’s razor would favor the “ecological divergence” model, if only because it invokes ecological speciation (or subspeciation) at only one location instead of a minimum of four (the number of contact zones between the Song Sparrow subspecies that form the ring).

There are additional wrinkles in the formation of the Song Sparrow ring. For example, M. m. cleonensis is morphologically intermediate between subspecies in the “California group” and those in the “Alaska and Pacific Northwest group” (sensu Patten and Pruett 2009).
I suggest that this intermediacy reflects a historical merging of a northward-expanding front from the refugium in southern California and a southward-expanding front from the Queen Charlotte Islands. That M. m. montana, the northern “cap” of the species ring, shares characters of both California and “Eastern” subspecies also implies extensive gene flow, but it remains to be determined whether eastward- and southward-expanding fronts merged to leave a ring species pattern without divergence, either in the face of gene flow or by distance. Only in-depth studies that combine morphology, genetics (especially nuclear DNA), ecology, and geological history will be able to distinguish among the various models for the evolution of a ring species or to confirm the “ring species hypothesis” (Joseph et al. 2008; Bensch et al. 2009). Regardless, an important starting point for any investigation of a putative ring species is full consideration of all plausible models that could have led to its evolution, not just an expectation of conformity to classical models. Considering alternative models not only promises deeper insight into how ring species evolve but also promises to build a stronger bridge between micro- and macroevolution.

Fig. 20.5 Hypothesized postglacial expansion of the Song Sparrow (Melospiza melodia) from an identified (but nonetheless postulated) glacial refugium in the Sonoran Desert (dashed circle). Such range expansion would yield a ring species pattern, but in this species’ case the terminal points are in the vicinity of San Gorgonio Pass, meaning the ring evolved by a combination of “divergence by distance” and ecological speciation (the “ecological divergence” model of Fig. 20.2), a process heretofore not considered in studies of ring species


Acknowledgments I thank Pierre Pontarotti for the opportunity to speak at the 13th Evolutionary Biology Meeting and Axelle Pontarotti for her excellent guidance both before and after the meeting. John T. Rotenberry, Leonard Nunney, and Marlene Zuk advised during early stages of this study, and Christin L. Pruett has been a sounding board during later stages. I am grateful to Lukas F. Keller and his research group and colleagues at Universität Zürich for their feedback following my September 2008 seminar there. Brenda D. Smith-Patten has been a limitless source of support throughout this research; she also helped prepare Fig. 20.2 and commented on a draft of this chapter.


M.A. Patten


Chapter 21

Cave Bear Genomics in the Paleolithic Painted Cave of Chauvet-Pont d’Arc

Céline Bon and Jean-Marc Elalouf

Abstract Caves are reservoirs of fossils, some of which belong to species that are now extinct. Paleogenetics explores the ancient DNA that may have survived in these fossils in order to better understand the phylogeny of Pleistocene species and the paleoenvironment. The Chauvet-Pont d’Arc Cave, which displays the earliest known human drawings, contains thousands of animal remains, making it a rich resource for genetic analysis. We focused on the extinct cave bear, Ursus spelaeus, and showed that Chauvet-Pont d’Arc samples still contain enough DNA for genetic studies. One of them yielded well-preserved DNA, which allowed us to sequence the complete cave bear mitochondrial genome. We used this molecular information to establish the phylogeny of bears and the tempo of Ursidae speciation. Extending our analysis to cave bear samples from Chauvet-Pont d’Arc and a nearby cave, we showed that the Pleistocene cave bear population was highly homogeneous in its mitochondrial DNA at the regional level.

21.1 The Chauvet-Pont d’Arc Cave, a Well-Preserved Paleolithic Site

21.1.1 The Earliest Rock Art Recorded to Date

In 1994, three cavers, Jean-Marie Chauvet, Eliette Brunel, and Christian Hillaire, made a major archeological discovery: a cave containing hundreds of Paleolithic rock art pictures. This cave, located near Vallon-Pont d’Arc (Ardèche, southeastern France) at the entrance of the Ardèche Gorge, is now known as Chauvet-Pont d’Arc, after one of its discoverers, Jean-Marie Chauvet.

C. Bon and J-M. Elalouf CEA, IBiTec-S, F-91191 Gif-sur-Yvette cedex, France

P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_21, © Springer-Verlag Berlin Heidelberg 2010


Since some of the pictures were drawn with charcoal, they could be dated by the radiocarbon method. Several paintings returned radiocarbon ages between 30,000 and 32,000 years Before Present (BP), making them about twice as old as the age currently proposed for the Lascaux Cave paintings. The Chauvet-Pont d’Arc rock art is the oldest Paleolithic art known to date (Valladas et al. 2001). The cave displays three kinds of rock art: charcoal drawings, ochre drawings, and engravings. As only the charcoal pictures can be dated, some of the other pictures might be older than 32,000 years BP. The cave also contains other traces of human occupation. The footprints of a male child were found in a deep part of the cave, in the Gallery of the Crosshatches. Along his way, the child regularly rubbed his torch against the wall, leaving numerous sooty marks. These marks were radiocarbon dated to 26,000 years BP (Garcia 2005). Large hearths were found in other sectors of the cave and were most probably used by the Paleolithic artists to produce charcoal pencils. The cave also contains about 20 flint tools as well as an ivory assegai point (Geneste 2005). Other traces of human activity, such as stone blocks grouped together by humans or a cave bear skull deposited on a large rock, remain enigmatic.

Because of its rich archeological content and, especially, the great age of its rock art, the Chauvet-Pont d’Arc Cave has been protected from the very day of its discovery (Baffier 2005). As soon as they saw the first pictures, the three discoverers took care to protect the ancient cave floor. Footbridges were later installed throughout the cave. Access is restricted to a handful of people authorized by the prefect. Permanent monitoring was set up to detect microbial contamination as well as changes in the local climate. Even scientific research is strictly regulated to ensure the preservation of the site: there are only two short research campaigns each year, no more than 12 people are allowed inside the cave, no direct contact with the archeological remains or the walls is permitted, and sampling requires special authorization from the curator (Baffier 2005). Despite these constraints, the cave offers a unique basis for scientific research, because its preservation gives access to a Paleolithic site that has remained untouched since its entrance collapsed some 20,000 years ago.
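The radiocarbon ages quoted above are conventional ages: they are derived from the sample's remaining 14C activity through the exponential decay law with the Libby half-life of 5,568 years, are counted back from AD 1950, and are not calibrated to calendar years. The sketch below only illustrates that relation. It assumes the measured activity is already expressed as a fraction of the modern standard and corrected for isotopic fractionation, and its function names are hypothetical, not part of any dating laboratory's software.

```python
import math

# Conventional radiocarbon ages use the Libby half-life (5568 years) by
# convention, i.e. a mean-life of 5568 / ln(2), about 8033 years.
LIBBY_MEAN_LIFE_YEARS = 5568.0 / math.log(2)


def radiocarbon_age_bp(fraction_modern: float) -> float:
    """Conventional (uncalibrated) radiocarbon age in years BP (before AD 1950).

    fraction_modern is the sample's 14C activity relative to the modern
    standard, assumed here to be already corrected for isotopic fractionation.
    """
    if fraction_modern <= 0.0:
        raise ValueError("fraction_modern must be positive")
    return -LIBBY_MEAN_LIFE_YEARS * math.log(fraction_modern)


def fraction_modern_for_age(age_bp: float) -> float:
    """Inverse relation: expected fraction of modern 14C at a given age."""
    return math.exp(-age_bp / LIBBY_MEAN_LIFE_YEARS)


if __name__ == "__main__":
    # Ages quoted in the text for the paintings and the torch marks.
    for age in (26_000, 30_000, 32_000):
        f = fraction_modern_for_age(age)
        print(f"{age} years BP corresponds to {100 * f:.2f}% of modern 14C")
```

At the 26,000 to 32,000-year ages discussed here, only about 2 to 4% of the original 14C activity remains.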

21.1.2 The Chauvet-Pont d’Arc Cave, a Bear Cave

Even without its anthropogenic remains, Chauvet-Pont d’Arc would have been a major paleontological discovery, since it contains thousands of animal remains, most of them Ursus spelaeus bones (Fig. 21.1) (Fosse and Philippe 2005). Of the 3,844 bones scattered over the cave floor, 3,703 are ascribed to the cave bear. The brown bear (Ursus arctos) is represented by a single skull, in contrast to the 200 cave bear skulls present in Chauvet-Pont d’Arc. Other species, such as the wolf, the extinct cave hyena, the fox, the ibex, and deer, are represented by only a few remains. Canid coprolites and footprints are also present in the cave.


Fig. 21.1 Topography of the Chauvet-Pont d’Arc Cave. Blue areas correspond to places with cave bear wallows; purple circles indicate cave bear footprints; thick green lines on walls indicate the presence of cave bear claw marks. Radiocarbon ages are given in years BP. Topography: Y. Le Guillou and F. Maksud. Paleontological data: P. Fosse and M. Philippe


The cave is not only a bear graveyard, however; it also shows abundant evidence of occupation by living animals. The floor is pitted with numerous wallows in which bears hibernated, the walls are scored with claw marks and polished by the bears’ passage, and bear footprints can be seen in every chamber. Whereas the brown bear is still extant, the cave bear became extinct about 25,000 years ago (Pacher and Stuart 2009). Ursus spelaeus was a robustly built bear that weighed 200 kg more than the sturdiest extant bears, i.e., the Kodiak and polar bears. Sexual dimorphism was strong, as was intraspecific variability (Kurtén 1976). The cave bear is currently thought to have been confined to Europe, although bears resembling it, which may belong to cave bear subspecies, have been found in Crimea, the Caucasus, and Siberia (Knapp et al. 2009). The cave bear has long been considered mostly herbivorous, but two recent studies (Richards et al. 2008; Peigne et al. 2009) showed that it was omnivorous, at least during the prehibernation period.

Since the cave bear is extinct, its phylogenetic relationships with other bears have long been known only from paleontological data. The direct ancestor of the cave bear is Ursus deningeri, as Ursus spelaeus succeeds Ursus deningeri continuously in the fossil record (Mazza and Rustioni 1994). The transition between the two species is estimated to have occurred around the beginning of the last interglacial, but drawing a boundary between these two chronospecies can be difficult (Argant 2001). Views differ on the origins of the Ursus arctos and Ursus spelaeus lineages. Whereas most paleontologists assume that both lineages emerged from Ursus etruscus, Mazza and Rustioni proposed that Ursus etruscus is a dead end and that Ursus deningeri arose from among extremely polymorphic Ursus arctos lineages. This issue was first addressed in 1994 by analyzing mitochondrial DNA fragments from Pleistocene remains (Hanni et al. 1994). This initial study and subsequent work (Loreille et al. 2001) yielded sequence data for the mitochondrial control region and the cytochrome b (CYTB) gene. However, when we began our studies, the available information covered less than 10% of the mitochondrial genome. As growing evidence suggests that long sequences are necessary to obtain robust phylogenies and to date divergence events between lineages accurately (Rohland et al. 2007), a complete cave bear mitochondrial genome sequence was highly desirable (Bon et al. 2008).

21.2 Sequencing the Mitochondrial Genome of the Extinct Cave Bear

21.2.1 The Challenge of Retrieving and Sequencing Ancient DNA

Studying ancient DNA is difficult. In the living cell, enzymatic processes continuously repair DNA, but from the death of an organism onward, endogenous nucleases and exogenous fungi or bacteria begin degrading it. Under rare circumstances (such as rapid desiccation or adsorption onto a mineral matrix), the DNA may escape this onslaught, its only remaining source of deterioration being chemical processes (Hofreiter et al. 2001b; Paabo et al. 2004). Ancient DNA is therefore scarce and displays a number of chemical alterations, with several consequences. Strand breaks reduce the length of the DNA molecules. In addition, depurination and crosslinking, either between strands or between a DNA strand and another molecule, impede PCR amplification. As the initial amount of ancient DNA is extremely low, the amplification step is sensitive to contamination, not only from modern DNA but also from previously amplified products. Another problem is the deamination of cytosine and adenine, which leads to changes such as T instead of C, and G instead of A, in the retrieved sequence. Finally, the samples often contain a variety of organic molecules that may act as PCR inhibitors, which prevents the use of large amounts of extract in the PCR mix.

Given the care taken to protect the Chauvet-Pont d’Arc Cave from contamination, we turned to this cave to select a suitable cave bear sample for sequencing the mitochondrial genome. After screening several samples, we chose US18 because of its good biomolecular preservation: it still contained enough collagen for radiocarbon dating, and its extent of amino-acid racemization was low. After DNA extraction, a 117 bp mitochondrial sequence was amplified over a wide range of extract quantities (from 0.1 to 2% of the extract), which shows that we retrieved large amounts of DNA and few PCR inhibitors. Since independent replication is required in ancient DNA studies, another group of investigators at a different institute independently performed extraction and analysis. Using the same primer pair and a second, overlapping one, they confirmed the sequence initially obtained. Both extracts were used in the subsequent experiments.
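Because deamination produces characteristic substitutions, independent amplifications of the same fragment can disagree at a few positions, which is why replication matters so much in this work. The sketch below shows one way such disagreements might be tallied, assuming a clone or PCR sequence already aligned to a reference of the same length. The damage-like pairs default to the two named in the text (C read as T, A read as G); since a C-to-T change on one strand appears as G-to-A on the complementary strand, that pair can be passed in as well. The function name is hypothetical.

```python
from typing import FrozenSet, List, Tuple

# (reference base, observed base) pairs treated as damage-like. The default
# follows the text: C read as T, and A read as G. This choice is an assumption;
# pass a different set (e.g. adding ("G", "A")) to cover other patterns.
DAMAGE_LIKE_PAIRS: FrozenSet[Tuple[str, str]] = frozenset({("C", "T"), ("A", "G")})


def classify_mismatches(
    reference: str,
    read: str,
    damage_pairs: FrozenSet[Tuple[str, str]] = DAMAGE_LIKE_PAIRS,
) -> Tuple[List[int], List[int]]:
    """Split mismatch positions between two aligned sequences into
    (damage_like, other). Gaps ('-') and unknown bases ('N') are skipped."""
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    damage_like: List[int] = []
    other: List[int] = []
    for i, (ref, obs) in enumerate(zip(reference.upper(), read.upper())):
        if ref == obs or ref in "-N" or obs in "-N":
            continue
        if (ref, obs) in damage_pairs:
            damage_like.append(i)
        else:
            other.append(i)
    return damage_like, other


if __name__ == "__main__":
    reference = "ACGTCCATGA"
    clone_read = "ATGTTCGTCA"  # hypothetical clone sequence
    damage, other = classify_mismatches(reference, clone_read)
    print("damage-like positions:", damage)
    print("other mismatches:", other)
```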

21.2.2 Obtaining the Complete Cave Bear Mitochondrial Sequence

When this analysis began, only a few fragments of the cave bear mitochondrial genome were known: a portion of the control region had been sequenced from several samples (Hanni et al. 1994; Hofreiter et al. 2002, 2007; Orlando et al. 2002; Rohland et al. 2004), and a single gene, CYTB, had been characterized throughout its coding region from one sample found in the Balme-à-Collomb Cave (Loreille et al. 2001). We therefore designed an iterative experimental strategy to determine the cave bear mitochondrial genome. First, we aligned the mitochondrial genomes of the extant brown bear (Ursus arctos), polar bear (Ursus maritimus), and American black bear (Ursus americanus) (Delisle and Strobeck 2002). From this alignment, conserved regions were identified and used to design a first series of 147 primer pairs, spanning the entire genome, for amplifying DNA fragments of 100 to 200 bp. Only 64 of these 147 primer pairs succeeded; the 83 failures may result from mismatches between the cave bear template DNA and the primers. In the following rounds, we therefore used the sequence obtained in previous rounds to design cave bear specific primers. In the end, nine rounds were required, and 245 primer pairs were used successfully.

To avoid contamination, pre-PCR steps were carried out in a dedicated laboratory facility, in a building free from molecular biology research. Each primer pair was designed to amplify DNA fragments shorter than 200 bp. For each fragment, at least two PCR amplifications were performed. As differences caused by ancient DNA damage were usually detected, a third amplification was often carried out, and the consensus sequence was retained. In the worst-case scenario, this strategy is expected to yield a 0.06% error rate (Hofreiter et al. 2001a). PCR products were cloned, and a minimum of 12 colonies was sequenced on both strands. In total, 570 successful PCR amplifications and more than 14,000 sequencing reactions were required to cover the entire mitochondrial genome.

To check the accuracy of the sequence, we analyzed each fragment individually by BLAST to confirm that the best GenBank match was an Ursidae sequence. Specifically, we verified that previously published cave bear mitochondrial sequences (control region and CYTB gene) gave the best BLAST score with our corresponding sequences. The control region sequence of the US18 cave bear belongs to the B haplogroup as defined in Orlando et al. (2002) and is identical to Scladina Cave samples SC3500 and SC3800. Our CYTB sequence and the published one differ by only four transitions (0.35% of all CYTB nucleotides), two of which are located at the third codon position. Furthermore, as the two specimens belong to different mitochondrial haplotypes, these differences may reflect intraspecific polymorphism.

We obtained a 16,810 bp mitochondrial genome, which is within the length range of extant Ursidae mitochondrial genomes. These vary between 16,723 bp (Ursus maritimus) (Arnason et al. 2002) and 17,044 bp (Ursus thibetanus formosanus). This length variation is mainly due to a domain of the control region that contains a highly variable number of repeats of a 10 bp motif (Yu et al. 2007). This domain is longer than 200 bp and therefore cannot be retrieved through a single PCR from ancient cave bear extracts. We therefore designed two primer pairs targeting the 5′ and 3′ ends of the domain, and the resulting fragments were assembled into a 350 bp repeat sequence.

Another group has sequenced a second cave bear mitochondrial genome from a sample found in Gamssulzen Cave, Austria (Krause et al. 2008). This sample is a 44,000-year-old bone, and its sequence belongs to the D haplogroup as defined in Orlando et al. (2002). Their experimental strategy differed slightly from ours, as they used a two-step multiplex PCR approach. Like us, they confirmed their data by at least two independent amplifications, cloning of the PCR products, and sequencing of multiple clones. The two cave bear sequences are very similar: excluding the 350 bp repeat region, 16,227 of 16,448 bp are identical. As expected, transitions (216) greatly outnumber transversions (5) among the 221 differences, giving a transition/transversion ratio of 43.2. As the two sequences belong to different haplogroups, it is not surprising that they differ at 1.3% of positions.
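The transition/transversion ratio and percentage difference quoted above follow from a position-by-position tally of the two aligned genomes. The sketch below shows that tally. It assumes two pre-aligned sequences of equal length from which the repeat domain has already been removed, as in the text; gaps and ambiguous bases are simply skipped, and the helper names are hypothetical.

```python
from typing import Dict

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID_BASES = PURINES | PYRIMIDINES


def is_transition(a: str, b: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {a, b}
    return a != b and (pair <= PURINES or pair <= PYRIMIDINES)


def compare_aligned(seq1: str, seq2: str) -> Dict[str, float]:
    """Tally identities, transitions and transversions between two aligned
    sequences, skipping gaps and ambiguous bases."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue  # gap or ambiguous base
        compared += 1
        if a == b:
            identical += 1
        elif is_transition(a, b):
            transitions += 1
        else:
            transversions += 1
    return {
        "compared": compared,
        "identical": identical,
        "transitions": transitions,
        "transversions": transversions,
        "ts_tv": transitions / transversions if transversions else float("inf"),
        "percent_difference": (
            100.0 * (compared - identical) / compared if compared else 0.0
        ),
    }


if __name__ == "__main__":
    # Totals reported in the text for the two complete cave bear genomes.
    print("Ts/Tv:", 216 / 5)
    print("percent difference:", round(100 * 221 / 16_448, 2))
```

Applied to the totals reported above, 216 transitions against 5 transversions give a ratio of 43.2, and 221 differences over 16,448 compared positions amount to about 1.3%.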


Our aim was to determine the phylogenetic position of the cave bear, especially with respect to the two main brown bear lineages (Taberlet and Bouvet 1994). As only one brown bear mitochondrial genome had been published, we decided to sequence the mitochondrial genome of a brown bear belonging to the western lineage. We analyzed a submodern bone sample from a French Pyrenean site (Guzet, Ariège, France). To avoid cross-species contamination, this work was conducted in a third building, and only after the cave bear mitochondrial genome had been obtained. The same experimental strategy was followed, except that the first series of primers (designed on brown bear sequence) was already highly specific and, as submodern DNA is still well preserved, fewer primer pairs were needed (only 52). As for the cave bear sequence, each PCR was performed at least twice, several clones were sequenced, and the consensus sequence was checked using BLAST.
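For both the cave bear and the brown bear, each fragment was amplified at least twice, several clones were sequenced, and a consensus sequence was retained. The text does not spell out the exact consensus rule, so the sketch below assumes a simple column-wise majority rule over already aligned clone sequences, reporting 'N' wherever no base reaches a strict majority. The threshold and the function name are illustrative assumptions.

```python
from collections import Counter
from typing import List


def majority_consensus(reads: List[str], min_fraction: float = 0.5) -> str:
    """Column-wise majority-rule consensus of aligned, equal-length reads.

    A column is called only if its most frequent base occurs in strictly more
    than min_fraction of the reads; otherwise it is reported as 'N'.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*(r.upper() for r in reads)):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) > min_fraction else "N")
    return "".join(consensus)


if __name__ == "__main__":
    clones = ["ACGTAC", "ACGTAC", "ATGTAC"]  # hypothetical clone sequences
    print(majority_consensus(clones))
    print(majority_consensus(["ACGT", "ATGT"]))
```

In the first example, a single damage-like C-to-T change in one of three clones is outvoted by the other two. In the second, a one-to-one split is left unresolved, which is the kind of situation in which the text describes carrying out an additional amplification.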

21.2.3 Resolving the Phylogeny of the Extinct Cave Bear

To reconstruct the Ursidae phylogeny, we aligned the cave bear and Pyrenean brown bear mitochondrial sequences (EU327344 and EU497665, respectively) with sequences of other bear species retrieved from GenBank, using the MEGA 4.0.2 alignment tool with default parameters. The giant panda was used as the outgroup. The domain of the control region containing the 10 bp repeat motif was removed prior to the phylogenetic analyses. First, we tested our dataset for mutational saturation, to check that homoplasy remains low and does not alter the results. We calculated patristic distances using the Patristic software (Fourment and Gibbs 2006) and plotted genetic distances against patristic distances. These distances are almost equal, indicating that mutational saturation is weak and that few reversions affect the dataset. We also calculated the transition/transversion ratio, which is 19:1; such a high ratio further indicates that saturation is rare. Phylogenetic trees were reconstructed from this dataset by Neighbor Joining (NJ), Maximum Parsimony (MP), and Maximum Likelihood (ML), using PhyML (Guindon and Gascuel 2003) and MEGA 4.0.2 (Tamura et al. 2007) as appropriate. PhyML was run with a GTR + G4 substitution model with a proportion of invariable sites, and for the NJ reconstruction we used the Tamura 3-parameter model with the gamma-distribution shape parameter estimated with PhyML. The robustness of the trees was assessed by bootstrapping (1,000 replicates for NJ and MP, 100 replicates for ML). Almost the same topology was recovered whatever the algorithm used (Fig. 21.2), the only difference concerning the relationships among Ursus thibetanus subspecies. Our results confirm the basal position of the spectacled bear, Tremarctos ornatus (Waits et al. 1999; Yu et al. 2004, 2007; Pages et al. 2008). Ursinae is a monophyletic group in which Melursus ursinus is the most basal bear.
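The saturation check compares, for every pair of taxa, the observed genetic distance with the patristic distance, that is, the sum of the branch lengths separating the two taxa on the reconstructed tree. If many sites had been hit repeatedly, observed distances would level off below the patristic ones. The sketch below computes both quantities. It assumes a hypothetical child-to-parent dictionary for the tree (not the input format of the Patristic program), and it uses the uncorrected p-distance as the observed distance for simplicity, because the text does not specify which measure was plotted.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation of a rooted tree with branch lengths:
# node name -> (parent name, length of the branch to that parent).
# The root's parent is None and its branch length is ignored.
Tree = Dict[str, Tuple[Optional[str], float]]


def distances_to_ancestors(tree: Tree, node: str) -> Dict[str, float]:
    """Distance from `node` to itself and to each of its ancestors."""
    distances = {node: 0.0}
    total = 0.0
    current = node
    while tree[current][0] is not None:
        parent, length = tree[current]
        total += length
        distances[parent] = total
        current = parent
    return distances


def patristic_distance(tree: Tree, a: str, b: str) -> float:
    """Sum of branch lengths on the path between a and b, which passes
    through their most recent common ancestor."""
    up_from_a = distances_to_ancestors(tree, a)
    current, total = b, 0.0
    while current not in up_from_a:  # climb from b until we meet a's lineage
        parent, length = tree[current]
        if parent is None:
            raise ValueError(f"{a!r} and {b!r} are not in the same tree")
        total += length
        current = parent
    return total + up_from_a[current]


def p_distance(seq1: str, seq2: str) -> float:
    """Observed proportion of differing sites, ignoring gap columns."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    toy_tree: Tree = {
        "root": (None, 0.0),
        "X": ("root", 0.1),
        "A": ("X", 0.2),
        "B": ("X", 0.3),
        "C": ("root", 0.5),
    }
    print(patristic_distance(toy_tree, "A", "B"))
    print(patristic_distance(toy_tree, "A", "C"))
```

Plotting the observed distance against patristic_distance for all pairs of taxa gives the comparison described in the text. Points lying close to the line y = x correspond to the weak saturation reported for this dataset.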
Fig. 21.2 Molecular phylogeny inferred from complete mitochondrial genomes. Tree reconstruction was performed by NJ analysis using the giant panda (Ailuropoda melanoleuca) as the outgroup. The same tree topology was obtained using the two other methods, except for the relationships between Ursus thibetanus subspecies. Bootstrap values are indicated for NJ (regular), MP (bold), and ML (italic) analyses. The two sequences from this study are displayed in bold. GenBank accession numbers for the other sequences are: Ailuropoda melanoleuca, FM177761, EF212882, EF196663, and AM711896; Tremarctos ornatus, FM177764 and EF196665; Melursus ursinus, EF196662; Ursus thibetanus, EF1966362, EF667005, FM177759, EF587265, EF076773, and EF196661; Ursus americanus, AF303109; Helarctos malayanus, FM177765 and EF196664; Ursus maritimus, AF303111 and AJ428577; Ursus arctos (east), AF303110; Ursus spelaeus, FM177760

Then Ursinae split into two clades, one leading to Ursus spelaeus, Ursus arctos, and Ursus maritimus and the other leading to Ursus thibetanus, Ursus americanus, and Helarctos malayanus. Whereas the first clade is highly robust (all bootstrap values equal 100%), the second is less well supported statistically. Moreover, this second clade is not always recovered in analyses of shorter datasets (Talbot and Shields 1996a; Waits et al. 1999; Yu et al. 2004, 2007; Bon et al. 2008; Pages et al. 2008). As most of the internal branches are very short, we conclude that ursine speciation
was very rapid. Because of this rapid radiation, the branching order is difficult to resolve, except within the brown–polar–cave bear clade. Relationships within this group are always consistent and are supported by maximal bootstrap values. The cave bear stands as a sister species to the brown and polar bear clade. The brown bear is paraphyletic with respect to Ursus maritimus, as the polar bear emerges from within the western brown bear lineage (Talbot and Shields 1996b). Therefore, the mitochondrial genome data disagree with Mazza and Rustioni’s late speciation hypothesis and confirm that the cave bear and brown bear lineages split before the radiation of the brown bear species.

The robust phylogeny obtained with a complete mitochondrial genome offers the opportunity to evaluate divergence times between species. We used the BEAST software (Drummond et al. 2005; Drummond and Rambaut 2007) with the complete mitochondrial genome dataset. Calibration was based on the divergence between the giant panda and Ursidae and between Ursinae and Tremarctinae, set at 12 ± 1 MY and 6 ± 0.5 MY (million years), respectively, each modeled as a normal distribution. We chose a relaxed uncorrelated lognormal molecular clock, a GTR + G4 substitution model with a proportion of invariable sites, and a Yule process of speciation. Two independent chains of 10,000,000 points each were run, with the burn-in set to 10,000. To highlight the benefits of analyzing long DNA sequences for molecular dating, we randomly created alignments of various lengths from the whole mitochondrial genome sequences and calculated node ages with the parameters described above. As expected, short alignments yield node ages that differ from those of longer ones, with wider credibility intervals; the alignment has to reach at least 10 kb for node ages to stabilize. A long sequence alignment is therefore required for accurate molecular dating (Bon et al. 2008).
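The dating itself relied on BEAST's relaxed uncorrelated lognormal clock and Yule prior, which cannot be reproduced in a few lines. As a deliberately simplified stand-in, the sketch below uses a strict molecular clock, which is not the model used in the study, to show the underlying logic. A calibrated split fixes a substitution rate, which converts other pairwise distances into ages, and drawing the calibration age from a normal distribution (as with the 6 ± 0.5 MY calibration) propagates its uncertainty into an interval. All distances in the example are hypothetical placeholders, not values from the dataset.

```python
import random
from typing import Tuple


def strict_clock_age(distance: float, calibration_distance: float,
                     calibration_age: float) -> float:
    """Node age under a strict clock. A pairwise distance accumulates along
    two lineages, so rate = d_cal / (2 * T_cal) and age = d / (2 * rate)."""
    rate = calibration_distance / (2.0 * calibration_age)  # per lineage per MY
    return distance / (2.0 * rate)


def age_interval(distance: float, calibration_distance: float,
                 calibration_mean: float, calibration_sd: float,
                 n_draws: int = 10_000, seed: int = 1) -> Tuple[float, float, float]:
    """Propagate a normally distributed calibration age into a
    (2.5th percentile, median, 97.5th percentile) age summary."""
    rng = random.Random(seed)
    ages = sorted(
        strict_clock_age(distance, calibration_distance,
                         rng.gauss(calibration_mean, calibration_sd))
        for _ in range(n_draws)
    )
    return ages[int(0.025 * n_draws)], ages[n_draws // 2], ages[int(0.975 * n_draws)]


if __name__ == "__main__":
    # Hypothetical placeholder distances (substitutions per site), not study data.
    calibration_distance = 0.12  # pairwise distance across the calibrated split
    query_distance = 0.032       # pairwise distance across the node to be dated
    print("point estimate (MY):",
          strict_clock_age(query_distance, calibration_distance, 6.0))
    low, median, high = age_interval(query_distance, calibration_distance, 6.0, 0.5)
    print(f"median {median:.2f} MY, 95% interval {low:.2f}-{high:.2f} MY")
```

A strict clock assumes the same substitution rate on every branch. A relaxed clock, as used in the study, lets rates vary among branches, which is why its estimates and credibility intervals can differ from this simplified calculation.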
Fig. 21.3 Phylogeny and divergence times determined using the mitochondrial genome sequences of the cave bear and of eight extant bears. Divergence times were calculated using the BEAST software, with the splits between the giant panda and Ursidae and between Ursinae and Tremarctinae set to 12 and 6 MY, respectively. Age and 95% credibility interval for each node: 1, 6.3 MY (5.4–7.2); 2, 3.0 MY (2.2–3.8); 3, 2.8 MY (2.1–3.5); 4, 2.4 MY (1.7–3.0); 5, 2.1 MY (1.4–2.7); 6, 1.6 MY (1.0–2.1); 7, 0.6 MY (0.3–0.8); and 8, 0.4 MY (0.2–0.5). The extinct cave bear is represented by a picture from Chauvet-Pont d’Arc

According to the results obtained with complete mitochondrial genomes (Fig. 21.3), Tremarctinae diverged from Ursinae 6.3 MY ago, shortly before the appearance of Ursus boeckhi, the first ursine representative. The ursine radiation occurred about 4 million years later, between 2 and 3 MY ago. The short interval during which five bear lineages appeared explains the difficulty of determining their branching order. These speciation events took place during the Pliocene, when Ursus minimus was the most common bear in Europe. As this fossil species is assumed to be the last common ancestor of Ursus spelaeus, Ursus arctos, and Ursus thibetanus, our results agree with paleontological data. We date the divergence between the arctoid and speleoid lineages to 1.6 MY, during the Villafranchian stage, when Ursus etruscus was the main bear in Europe. Most paleontologists consider Ursus etruscus to be the last common ancestor of the brown and cave bears.

In conclusion, our approach proved successful for sequencing the complete mitochondrial genome of a species extinct for more than 20,000 years. The cave bear mitochondrial genome is highly similar to other bear mitochondrial genomes. In addition, the phylogenetic analysis robustly confirms that the cave bear is a sister species to the brown and polar bear clade. The amount of data obtained made it possible to evaluate the tempo of bear history during the Pliocene and Pleistocene and to compare our conclusions with paleontological ones. The cave bear mitochondrial genome sequence also opens up new possibilities for DNA analysis of extinct bears. First, this sequence will help rescue poorly preserved samples by targeting different regions of the mitochondrial genome. We studied Chauvet-Pont d’Arc bear samples that failed to yield any DNA when
analyzed for the mitochondrial control region. By targeting 112 bp of the 16S rRNA gene, we obtained successful amplification for 48% of the 23 samples, compared with 17% when the control region was targeted. Second, sequence data from extant bears may not be sufficient to analyze DNA sequences of species that existed before Ursus spelaeus, such as Ursus deningeri. The availability of the cave bear mitochondrial genome is expected to provide a better template for exploring very ancient bear species.

21.3 Genetic Diversity Among Chauvet-Pont d’Arc Cave Bears

We explored the genetic diversity of the cave bears of Chauvet-Pont d’Arc Cave by analyzing several samples from the cave. For comparison, we turned to another cave in the same area, the Deux-Ouvertures Cave. This cave is located toward the end of the Ardèche Gorge, approximately 15 km from Chauvet-Pont d’Arc, and also contains rock art. It holds numerous cave bear remains and is, after Chauvet-Pont d’Arc, the most remarkable bear cave in the area. We collected 39 and 17 samples from the Chauvet-Pont d’Arc and Deux-Ouvertures caves, respectively. DNA was extracted, and we attempted to amplify a 117 bp fragment of the mitochondrial control region. Most of the Chauvet-Pont d’Arc samples (32/39) and some of the Deux-Ouvertures samples (3/17) failed to yield this fragment. We conclude that either the fragment was no longer preserved or the samples contained too many PCR inhibitors to be amplified successfully. The samples that gave positive results belong to the same haplogroup (haplogroup B) and to two different haplotypes, which we named HT1 and HT2. HT1 is also found in the Scladina (AY149268, AY149267) and Gigny (AY149264) caves (Orlando et al. 2002). HT2 differs from HT1 only at position 16,550 and is found in the Cova-Linares Cave (AY149271, AY149272) (Loreille et al. 2001). Finding haplogroup B in these two caves is not surprising, since it is widespread throughout Western Europe. Both HT1 and HT2 were found in Chauvet-Pont d’Arc: two samples displayed the HT2 haplotype (US08 and US21), and the five samples that yielded the HT1 haplotype are US17, US18, US19, US34, and US39. In contrast, all Deux-Ouvertures Cave samples gave the same haplotype, HT1. To verify that this homogeneity was not due to biased sampling, with different bones belonging to the same individual, we sampled five humeri from five different individuals. We obtained the HT1 sequence for each of them, confirming that HT1 is widespread in this cave. Thus, we observed high genetic homogeneity within the bear population of each cave, as well as from one cave to the other. This indicates frequent female-mediated genetic exchange along the Ardèche Gorge and contrasts with the hypothesis of a highly subdivided cave bear population (Hofreiter et al. 2002, 2007).
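The homogeneity described above can be summarized with a standard statistic. As an illustration not used in the chapter, the sketch below computes Nei's unbiased haplotype diversity, h = n/(n − 1) (1 − Σ p_i²), from the haplotype counts reported in the text: five HT1 and two HT2 at Chauvet-Pont d’Arc, and, assuming that all 14 successfully amplified Deux-Ouvertures samples are counted, 14 HT1. Because several bones may come from the same individual, these values are only indicative.

```python
from collections import Counter
from typing import Iterable


def haplotype_diversity(haplotypes: Iterable[str]) -> float:
    """Nei's unbiased haplotype diversity: n / (n - 1) * (1 - sum(p_i ** 2))."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sequences")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Counts taken from the text; the 14 Deux-Ouvertures samples are the
    # 17 collected minus the 3 that failed to amplify (an assumption).
    chauvet = ["HT1"] * 5 + ["HT2"] * 2
    deux_ouvertures = ["HT1"] * 14
    for name, sample in [("Chauvet-Pont d'Arc", chauvet),
                         ("Deux-Ouvertures", deux_ouvertures),
                         ("pooled", chauvet + deux_ouvertures)]:
        print(f"{name}: h = {haplotype_diversity(sample):.2f}")
```

With these counts, Chauvet-Pont d’Arc alone gives h ≈ 0.48, Deux-Ouvertures gives 0, and the pooled sample about 0.18.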


At the same time, several Chauvet-Pont d’Arc samples were dated and returned radiocarbon ages between 37,300 ± 340 years BP and 29,560 ± 160 years BP. Most of them range from 30,000 to 32,000 years BP, indicating that cave bears were present at Chauvet-Pont d’Arc for a relatively brief period. It is worth noting that the Scladina and Cova-Linares samples carrying the HT1 and HT2 haplotypes are of roughly the same age as the Chauvet-Pont d’Arc samples: the Scladina bones come from an archeological layer dated to 40,000–45,000 years, and the Cova-Linares bones from a 35,000-year-old layer.

In conclusion, the genetic studies carried out in Chauvet-Pont d’Arc provided a complete mitochondrial genome for the extinct cave bear, which enabled us to obtain robust phylogenetic trees for Ursidae. The amount of data also made it possible to estimate divergence dates between species and to compare genetic and paleontological results. Extending our studies to several samples from this cave and from another cave allowed us to explore the genetic diversity of the area. We established that the mitochondrial genetic landscape of two caves 15 km apart in the Ardèche Gorge is almost homogeneous. As there are other bear caves along the river, extending such analyses to additional sites may allow a more precise description of the genetic pattern of the area. This study also demonstrates that well-preserved DNA still remains in the Chauvet-Pont d’Arc Cave and establishes this painted cave as a reservoir for ancient DNA research. Other species from the Chauvet-Pont d’Arc Cave can now be analyzed to better characterize the Pleistocene environment.

References

Argant A (2001) Los antepasados del oso de las cavernas. Cad Lab Xeol Laxe 26:9
Arnason U, Adegoke JA, Bodin K, Born EW, Esa YB, Gullberg A, Nilsson M, Short RV, Xu X, Janke A (2002) Mammalian mitogenomic relationships and the root of the eutherian tree. Proc Natl Acad Sci USA 99:8151–8156
Baffier D (2005) La Grotte Chauvet: conservation d’un patrimoine. Bulletin de la société préhistorique française 102:11–16
Bon C, Caudy N, de Dieuleveult M, Fosse P, Philippe M, Maksud F, Beraud-Colomb E, Bouzaid E, Kefi R, Laugier C, Rousseau B, Casane D, van der Plicht J, Elalouf JM (2008) Deciphering the complete mitochondrial genome and phylogeny of the extinct cave bear in the paleolithic painted cave of Chauvet. Proc Natl Acad Sci USA 105:17447–17452
Delisle I, Strobeck C (2002) Conserved primers for rapid sequencing of the complete mitochondrial genome from carnivores, applied to three species of bears. Mol Biol Evol 19:357–361
Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214
Drummond AJ, Rambaut A, Shapiro B, Pybus OG (2005) Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol 22:1185–1192
Fosse P, Philippe M (2005) La faune de la grotte Chauvet: paléobiologie et anthropozoologie. Bulletin de la société préhistorique française 102:89–102
Fourment M, Gibbs MJ (2006) PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change. BMC Evol Biol 6:1

21

Cave Bear Genomics in the Paleolithic Painted Cave of Chauvet-Pont D’Arc


Garcia MA (2005) Ichnologie générale de la grotte Chauvet. Bulletin de la société préhistorique française 102:103–108
Geneste JM (2005) L’archéologie des vestiges matériels dans la grotte Chauvet-Pont-d’Arc. Bulletin de la société préhistorique française 102:135–144
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704
Hanni C, Laudet V, Stehelin D, Taberlet P (1994) Tracking the origins of the cave bear (Ursus spelaeus) by mitochondrial DNA sequencing. Proc Natl Acad Sci USA 91:12336–12340
Hofreiter M, Jaenicke V, Serre D, von Haeseler A, Paabo S (2001a) DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res 29:4793–4799
Hofreiter M, Serre D, Poinar HN, Kuch M, Paabo S (2001b) Ancient DNA. Nat Rev Genet 2:353–359
Hofreiter M, Capelli C, Krings M, Waits L, Conard N, Munzel S, Rabeder G, Nagel D, Paunovic M, Jambresic G, Meyer S, Weiss G, Paabo S (2002) Ancient DNA analyses reveal high mitochondrial DNA sequence diversity and parallel morphological evolution of late pleistocene cave bears. Mol Biol Evol 19:1244–1250
Hofreiter M, Munzel S, Conard NJ, Pollack J, Slatkin M, Weiss G, Paabo S (2007) Sudden replacement of cave bear mitochondrial DNA in the late Pleistocene. Curr Biol 17:R122–R123
Knapp M, Rohland N, Weinstock J, Baryshnikov G, Sher A, Nagel D, Rabeder G, Pinhasi R, Schmidt HA, Hofreiter M (2009) First DNA sequences from Asian cave bear fossils reveal deep divergences and complex phylogeographic patterns. Mol Ecol 18:1225–1238
Krause J, Unger T, Nocon A, Malaspinas AS, Kolokotronis SO, Stiller M, Soibelzon L, Spriggs H, Dear PH, Briggs AW, Bray SC, O’Brien SJ, Rabeder G, Matheus P, Cooper A, Slatkin M, Paabo S, Hofreiter M (2008) Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene–Pliocene boundary. BMC Evol Biol 8:220
Kurtén B (1976) The cave bear story: life and death of a vanished animal. Columbia University Press, New York
Loreille O, Orlando L, Patou-Mathis M, Philippe M, Taberlet P, Hanni C (2001) Ancient DNA analysis reveals divergence of the cave bear, Ursus spelaeus, and brown bear, Ursus arctos, lineages. Curr Biol 11:200–203
Mazza P, Rustioni M (1994) On the phylogeny of Eurasian bears. Palaeontographica 230:38
Orlando L, Bonjean D, Bocherens H, Thenot A, Argant A, Otte M, Hanni C (2002) Ancient DNA and the population genetics of cave bears (Ursus spelaeus) through space and time. Mol Biol Evol 19:1920–1933
Paabo S, Poinar H, Serre D, Jaenicke-Despres V, Hebler J, Rohland N, Kuch M, Krause J, Vigilant L, Hofreiter M (2004) Genetic analyses from ancient DNA. Annu Rev Genet 38:645–679
Pacher M, Stuart AJ (2009) Extinction chronology and palaeobiology of the cave bear (Ursus spelaeus). Boreas 38:189–206
Pages M, Calvignac S, Klein C, Paris M, Hughes S, Hanni C (2008) Combined analysis of fourteen nuclear genes refines the Ursidae phylogeny. Mol Phylogenet Evol 47:73–83
Peigne S, Goillot C, Germonpre M, Blondel C, Bignon O, Merceron G (2009) Predormancy omnivory in European cave bears evidenced by a dental microwear analysis of Ursus spelaeus from Goyet, Belgium. Proc Natl Acad Sci USA 106:15390–15393
Richards MP, Pacher M, Stiller M, Quiles J, Hofreiter M, Constantin S, Zilhao J, Trinkaus E (2008) Isotopic evidence for omnivory among European cave bears: late pleistocene Ursus spelaeus from the Pestera cu Oase, Romania. Proc Natl Acad Sci USA 105:600–604
Rohland N, Siedel H, Hofreiter M (2004) Nondestructive DNA extraction method for mitochondrial DNA analyses of museum specimens. Biotechniques 36:814–816, 818–821
Rohland N, Malaspinas AS, Pollack JL, Slatkin M, Matheus P, Hofreiter M (2007) Proboscidean mitogenomics: chronology and mode of elephant evolution using mastodon as outgroup. PLoS Biol 5:e207


Taberlet P, Bouvet J (1994) Mitochondrial DNA polymorphism, phylogeography, and conservation genetics of the brown bear Ursus arctos in Europe. Proc Biol Sci 255:195–200
Talbot SL, Shields GF (1996a) A phylogeny of the bears (Ursidae) inferred from complete sequences of three mitochondrial genes. Mol Phylogenet Evol 5:567–575
Talbot SL, Shields GF (1996b) Phylogeography of brown bears (Ursus arctos) of Alaska and paraphyly within the Ursidae. Mol Phylogenet Evol 5:477–494
Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596–1599
Valladas H, Clottes J, Geneste JM, Garcia MA, Arnold M, Cachier H, Tisnerat-Laborde N (2001) Palaeolithic paintings. Evolution of prehistoric cave art. Nature 413:479
Waits LP, Sullivan J, O’Brien SJ, Ward RH (1999) Rapid radiation events in the family Ursidae indicated by likelihood phylogenetic estimation from multiple fragments of mtDNA. Mol Phylogenet Evol 13:82–92
Yu L, Li QW, Ryder OA, Zhang YP (2004) Phylogeny of the bears (Ursidae) based on nuclear and mitochondrial genes. Mol Phylogenet Evol 32:480–494
Yu L, Li YW, Ryder OA, Zhang YP (2007) Analysis of complete mitochondrial genome sequences increases phylogenetic resolution of bears (Ursidae), a mammalian family that experienced rapid speciation. BMC Evol Biol 7:198

Index

A Accessory, 250, 251, 254, 257–260, 262 Actinobacteria, 303 Actinorhizal plants, 303 Adaptations, 8, 50, 53, 60, 82, 83, 95, 96 Adaption, 82, 84–90, 95 Adaptive radiation, 13, 283–297 Aeschynomene, 303 Ag–NOR staining, 10 Agrobacterium radiobacter, 309 rhizogenes, 309 tumefaciens, 306, 309 vitis, 309 Allopatry, 50 Alpha, 119 Alpha-lactalbumin, 118, 121, 127 Alternative splicing, 31, 38 Amazon, 284, 289–293, 296, 297 Amines, 260 Amniotes, 3, 4, 6, 7, 12, 13 Ancestral area, 289, 290, 292 Ancestral karyotype, 144–146, 153 Ancient DNA, 346–348, 354 Andes, 285, 290, 291, 293 Anesthetic, 253, 255, 260–262 Antarctic fur seal (Arctocephalus gazella), 127 Antennal modification antennal hammer, 271–280 Anticoagulant, 255, 259–262 Aphid Acyrthosiphon pisum, 133–136 Aphis gossypii, 133, 134, 137 Myzus persicae, 133, 134, 137 Apparatus, 250, 253, 257, 258

Appressorium ascospores, 319, 324 Area cladogram, 288, 289, 291, 292 Aromatase, 7 Ascoviruses Diadromus pulchellus, 238, 244, 245 Heliothis virescens, 238 Spodoptera frugiperda, 237, 238 Trichoplusia ni, 238 Azoarcus, 302 Azolla, 303 Azorhizobium, 301, 306 B Background selection, 9 Baculoviruses, 230, 232, 233, 236 Bats, 283–297 Bayesian, 10, 12 Bayesian inference, 285 Bdelloid rotifers, 104 Behavior, 283–297 Beta, 119, 120 Beta-lactoglobulin, 121 Biased incrementalism, 91–93, 95 Birth and death model, 31, 35, 40 BLAST, 192 Bootstrap, 107 Bovine (Bos taurus), 116, 127, 128 Bracoviruses Chelonus inanitus, 236 Cotesia congregata, 235, 236 Glyptapanteles flavicoxis, 235 Glyptapanteles indiensis, 235 Bradyrhizobium canariense, 306 japonicum, 306


Brown bear, 344, 346, 347, 349, 351 Buccinidae, 253, 254, 258 Buccinids, 254, 258, 260 C California sea lion (Zalophus californianus), 127 Cancellariid, 256, 262 Cancellariidae, 250, 255, 256, 259, 262 Cancellarioidea, 250, 252, 256 Cape fur seal (Arctocephalus pusillus), 126, 127 Caseins, 118–122, 127, 128 Cave bear, 343–354 C-banding, 10 Charnov–Bull hypothesis, 7 Chauvet-Pont d’Arc, 343–354 Chdl, 285 Chemical alterations, 347 Choline, 259, 260 Chromogens, 260 Chromosomal inversions, 52, 55 Chromosomal rearrangements, 51, 52, 55, 58, 59, 61 Chromosomal theory of speciation, 51, 61 Chromosome rearrangements, 55 CNGs. See Conserved nongenic sequences Codon reassignments ambiguous intermediate mechanism, 86–90 codon capture mechanism, 86, 87 Coevolution, 302 Colinearity, 55 Colubrariidae, 255, 262 Columbellidae, 254, 258 Comparative analysis CAIC, 277 Comparative genomics, 10, 19–20, 25, 26, 29, 31, 40, 41 Complexity hypothesis, 102 Concerted evolution, 203–204 Conidae, 250 Connectivity analysis, 106, 108, 109, 111 Conoidea, 250, 252, 253, 257, 259, 261 Conopeptides, 259 Conotoxins, 251, 252, 257–261, 263 Conserved nongenic sequences (CNGs), 191 Constraints, 19–41 Convergence, 5 Convergent evolution, 302, 317–326 Coralliophilinae, 253, 255, 256, 261, 262 Corallivory, 254, 256, 262 Costellariidae, 259

Cot curve analysis, 188 Cow, 117, 121, 128 Cryptinae, 273–275, 278, 279 Cryptosporidium, 107, 109 Cyanobacteria, 302, 303 Cycads, 302 Cytb, 285 D Dby, 285 Deletions, 55, 56, 58 Deux-Ouvertures Cave, 353 Developmental biology, 161 Diatoms, 107, 108, 110 Diclidurini, 284, 286, 287, 289, 293 Divergence times, 351, 352 Diversity, 252, 253, 256, 263, 264 Dmrt1, 10 Dobzhansky, T., 50–52, 59 Dosage compensation, 12 Dosage sensitivity, 201 Drug targets, 106–110 Duplication Genome duplication, 134 Lineage specific duplications, 138 Paralogs, 133, 136 E Early lactation protein (ELP), 123 Ear morphology, 295–297 Echidnas (Tachyglossus and Zaglossus), 116 Echolocation, 295–297 E.C. number, 105, 109 Ecotones, 334, 335 Efficiency of sporulation, 54, 56 ELP. See Early lactation protein EM. See Error minimization Emballonuridae, 284, 289, 293 Embryos, 3, 7, 8 Emergence, 81–96 Endoparasitic wasps Braconidae Chelonus inanitus, 236 Cotesia congregata, 235, 236 Cotesia marginiventris, 234, 244 Glyptapanteles flavicoxis, 235 Glyptapanteles indiensis, 235 Microplitis croceipes, 234 Ichneumonidae Campoletis sonorensis, 230, 241, 244 Cardiochiles nigriceps, 230 Eiphosoma vitticolle, 237, 239 Hyposoter didymator, 236

Hyposoter fugitivus, 240 Venturia canescens, 230 Endosymbiont bacteria, 212 eukaryote, 209 facultative, 210 obligate, 210 primary, 210 reproductive, 211 secondary, 210 Endosymbiosis, 103, 104, 108 Enrichment analysis, 106, 111 ENU mutagenesis, 202 Environmental stress, 54 Enzymes, 103–109, 111 Epistasis, 36, 52 Ergalataxinae, 253, 256 Error minimization (EM), 83–91 Esters, 259, 260 Estrogen, 6 Eukaryotes, 102–108, 111 Eumycetes and Fungi Botrytis cinerea, 318, 322 Magnaporthe grisea, 322 Neurospora crassa, 320, 322 Penicillium chrysogenum, 322, 324–326 Podospora anserina, 320, 322 Rhizopus oryzae, 323–326 Trichoderma reesei, 320, 322 Trichoderma species, 326 Eutheria (eutherian or placentalia), 116 Evolution, 249–265 convergent, 182 divergent, 182 Evolutionary breakpoints, 144, 147–150 Evolutionary constraints, 190, 194, 200 Evolutionary rates Divergence time, 144 Mutations, 133 Omega ratio (dN/dS), 134, 137–140 Nonsynonymous substitution rate (dN), 134, 135, 137 Synonymous substitution rate (dS), 134, 135, 137 Evolvability, 95 Exogenes, 263, 265 Exons, 26, 38 Extinction, 9 Eye camera, 182–185 compound, 181–183 mirror, 182 pinhole, 182

F Fadrozole, 6 Fasciolariidae, 254, 258 Feeding, 250, 252–256, 262 Fitness change, 75–77 Fitness landscape, 33, 34, 36 Fluorescent in situ hybridization (FISH), 10, 11 Forest, 294, 296, 297 Functional constraints, 200–203 G Gene architecture, 26 Gene-conversion, 203–204 Gene duplication, 29, 31 Gene expression, 160, 163, 171 Gene identity intervals interspecies, 308 intraspecies, 308 Gene markers dnaJ, 308 dnaK, 309 rpoB, 308, 309 Genes, 253, 261–263, 265 Genetic code adaptive code hypothesis, 84–90 emergence hypothesis, 90–91 Genetic code evolution, 85, 90, 91 Genetic diversity, 353–354 Gene transfer lateral, horizontal, 232 Genic theory of speciation, 51 Genome architecture, 19, 20, 23, 26–29, 35, 37, 38, 40 Genome 10K, 13 Genome sequence, 19, 23 Genomic, 56, 58 Genomic rearrangements, 51, 52, 55–61 Genomic structure, 188–190 Genotype × environment, 8–9 Gland, 250, 251, 257–260, 262 Goats, 121, 123, 128 Grey seal (Halichoerus grypus), 127 Guiana Shield, 291, 297 Gunnera, 303 H Haematophagous, 255, 257, 259, 262 Haematophagy, 254–256 Haplogroup, 348, 353 Haplotypes, 348, 353, 354 Harbour seal (Phoca vitulina), 127 Harpidae, 259

Harpooning, 253 Hemiplasy, 144, 150–154 Herbaspirillum, 302 Heterogamety, 4–8, 10–12 Heteromorphic sex chromosomes, 4, 9 Hill–Robertson effect, 9 Histamine, 259 Historical biogeography, 284, 287–293 Hitchhiking, 9 Homoplasy, 144, 151, 153 Horizontal gene transfer (HGT), 101–104, 106–109, 111 Horizontal transfer, 202–204 Host location, 272–274, 279, 280 Hosts, 272–274, 278–280 Human chromosome 2, 195 Human chromosome 21, 191, 194, 195, 201 Hybrid fertility, 55, 57, 58 Hybridization, 4, 10 Hypobranchial gland, 251, 260 Hypolimnas bolina Hypolimnas bolina resistance, 221 I Ichneumonidae, 271–273 Ichnoviruses Campoletis sonorensis, 240–244 Cardiochiles nigriceps, 230 Hyposoter fugitivus, 240 Tranosema rostrales, 240 Immunosuppressive genes Imd, 232 Toll, 232 Inactivation, 12 Incipient, 50, 56, 58, 60 Incubation, 4–8 Insertions, 55, 56 Interaction, 8–9 Introns, 21, 23, 25, 26, 38, 39 Inversions, 52, 55, 56 Iridoviruses Chilo suppressalis, 237 Isolation, 49–61 J Junk DNA, 190 K Kappa, 119, 120 Karyotype, 4, 5 KEGG, 106, 111

L Lactotransferrin, 121 LALBA, 127 Lateral transfer, 304, 306 Legume plants Phaseolus vulgaris, 306 Leishmania, 107, 108, 111 Lepidopterans Chilo suppressalis, 237 Ephestia kuehniella, 230 Heliothis armigera, 237 Heliothis zea, 233 Spodoptera frugiperda, 234 Trichoplusia ni, 235 Likelihood, 10 LINEs. See Long interspersed elements Lipopolysaccharides, 111 LLP-A, 123 LLP-B, 123 Long conserved noncoding sequences (LNCS), 192 Long interspersed elements (LINEs), 189 M Mammaliaforms, 116 Mammals, 116–122, 124, 126–129 Marginellid, 254 Marginellidae, 255, 256, 259 Markov-chain Monte Carlo, 12 McDonald–Kreitman test, 23, 24 Melongenidae, 254, 258 Melospiza melodia, 331, 332, 340 Mesorhizobium amorphae, 306 loti, 303 Metabolic enzymes, 103, 104, 111 Metatheria (marsupials or Marsupialia), 116 metaTIGER, 104–111 Methylobacterium, 306 Microarray interspecies array, 183, 184 Microevolution, 8 Migration, 9 Milk proteins, 116–119, 122–128 Minimal gene set, 29, 30 Miocene, 287, 288, 290–292, 295, 297 Misfolding, 33, 34, 40 Mismatch repair, 51, 55 Mitochondria, 103, 104 Mitochondrial genome, 346–354 Mitridae, 253, 259, 260 Molecular dating, 284, 287, 288

Molecular evolution, 67, 68, 78 Molluscs cephalopod, 182–184 nautilus, 182–185 octopus, 182–184 pecten, 182–185 squid, 182–185 Morphogenetic gradient dorsal gradient, 162, 163, 167, 169–171 dpp gradient, 168, 171 gradient, 160, 164, 166, 167, 169–172 Morphology, 283–297 Mouse chromosome 2, 195 Muller’s Ratchet, 9, 10 Muricidae, 253, 255, 256, 259, 261 Muricids, 254, 259, 260 Muricoidea, 250, 252 Mutation, 188, 192, 194, 200–202 beneficial, 69, 75–78 deleterious, 69, 75–77 neutral, 75 Mutational cold spot, 201–202 Mutational load, 55 Mutation robustness error minimization (EM), 83, 90, 91 extrinsic, 94 intrinsic, 94, 95 Mutation-selection equilibrium, 73, 75, 77 Mycorrhizal symbiosis, 304 N NADPH oxidase, 320, 321, 325 Nassariid, 254 Nassariidae, 254, 258, 259 Natural science, Natural selection, 82–84, 91–96 Neotropics, 283–297 Nervous system neural, 159–167, 172 neuroblast, 164–167, 172 Networks, 20, 26, 31–35, 37, 40 Neurotoxins, 250, 251, 258, 260–262 Neutral networks, 91, 93–95 New World emballonurid bats, 284–288, 290, 292, 294–296 Nitrogen fixation, 302, 304, 306 Nodulation factors nodB, 303 nodC, 303 Noncoding sequences, 20, 23 Nonorthologous gene displacement, 29, 31 Nonsynonymous substitutions, 21 Northern Amazon, 289–292, 296 Nudiviruses, 231, 233–236, 244

O Odobenids, 125 Oligocene, 287, 288, 295, 296 Olividae, 259 One-band-one-gene hypothesis, 188 Operons, 23, 27, 28 Organelles immunosuppressive, 229–245 Origin of life, 67, 68 Ortholog, 134, 137 Orthologous, 28–32, 35, 38 Ostreococcus, 107, 111 Otariids (sea lions, fur seals), 125, 127 Oviparity, 12, 13 P Paleolithic, 343–354 Pan-genome, 102 Paralogs, 29, 31, 35, 40 Parsimony, 10, 285, 286, 293, 294 Particles immunosuppressive, 231–234 Patterning, 159–172 Pelage, 294–295 Peptides, 252, 253, 257, 258, 261, 263, 264 Phenomic, 19, 32–36, 39, 40 Phocids (true seals), 125, 127 Photoreceptors, 181, 183 Phylogenetic trees, 101–112 Phylogenies, 304, 306–310 Phylogeny, 284, 285, 287, 293–296, 349–353 Phytophthora, 107, 109, 111 Pinniped, 125–126 Plasmodium, 107, 108 Plasticity, 19–41 Plastids, 103, 104, 107–109, 111 Platypus (Ornithorhynchus anatinus), 116, 118, 119, 121 Pleiotropy, 36 Pleistocene, 287, 290, 291, 346, 352, 354 Pleistocene refugia, 335, 336 Pleistocene refugium, 336, 337 Pliocene, 283, 287, 291 Polydnaviruses, 232–234, 236, 241, 243 Polygenic inheritance, 8 Polymorphisms, 151–153 Positive selection, 21–25 Poxviruses Diachasmimorpha longicaudata poxvirus, 244 Preferential attachment, 91, 92 Prezygotic, 50, 61 Prialt, 252, 260 PRIAM, 105, 106

Primary, 250, 257–260, 262 Production, 257, 258, 260, 262 Profiling, 263–265 Prokaryotes, 23, 27–30, 36, 37, 39, 102–103, 107, 108, 111 Promiscuous domains, 26 Proteomics, 264, 265 Prototheria (monotreme or Monotrema), 116 Pseudaptation, 81–96 Pseudogenes, 21, 23 PSI-BLAST, 105, 106 PTMP-1, 123 PTMP-2, 123 Q Quasispecies, 68, 72–74, 78 R RAC 2 (myoblast fusion), Radiation, 351 Radiocarbon age, 345, 354 Radula, 250, 253–257 Rearrangements, 51–53, 55–61 Reciprocal Best Hit, 133, 136 Recombination, 9, 12 Red kangaroo (Macropus rufus), 122 Regulators, 30, 31 Reinforcing mechanism, 50 Relative reproductive isolation, 50 Repeat masking, 192 Replication, 68, 71, 72, 75–78 Reproductive, 49–61 Reproductive barrier, 55, 56, 58–60 Reproductive isolation, 49–61, 331, 338 Rhizobia, 301–310 Rhizobium R. cellulosilyticum, 309 R. daejeonense, 309 R. etli, 303, 306 R. fabae, 306 R. galegae, 309 R. huautlense, 309 R. leguminosarum, 306 R. lusitanum, 309 R. mongolense, 309 R. pisi, 306 R. selenireducens, 309 R. tropici, 306 Ringed seal (Pusa hispida), 127 Ring species Ensatina eschscholtzii, 331, 333 Glossina morsitans, 331 Lanius, 331, 333 Melospiza melodia, 331, 332, 340

Phylloscopus trochiloides, 331 Phylloscopus trochilus, 331 Zosterops, 331 RNA folding, 69–72 sequence-structure map, 68, 69, 72 world, 67–69, 71, 77 RNA complexity, 188 RNome, 25, 31 Robustness, 20, 32–34, 36–37, 40, 41 Roosts, 284, 293–295 Rot curve analysis, 188 S Saccharomyces, 107 Saccharomyces cerevisiae, 52, 54, 55 Salivary glands, 250, 251, 257–260, 262 Savannahs, 288, 296, 297 Scale free networks, 91–92 Scaling, 30, 31, 40 Scaling, size, 160, 167–172 SDs. See Segmental duplications Secretion, 253, 257–260, 262 Segmental duplications (SDs), 144, 147–150, 153 Selection Adaptation, 133, 140 Fast-evolving genes, 137, 139–140 Positive selection, 134, 140 Relaxed selection, 134, 137, 140 Selective pressure, 68, 72, 78 Selfish operon, 28 Sequence data Coding sequence (CDS), 135 Expressed sequence tag (ESTs), 133, 137 Pea aphid genome, 134, 136, 140 Sequences, 10, 11, 13, 14 Sequencing, 116–125, 129 Sex determination, 4–8, 10–13 SHARKhunt, 105, 106 Shell drilling, 254 Shell wedging, 254 Short interspersed elements (SINEs), 189 Signaling pathways BMP signaling pathway/BMP signaling, 165, 166 SINEs. See Short interspersed elements Single nucleotide polymorphisms (SNPs), 190, 191, 202 Sinorhizobium S. chiapanecum, 308 S. mexicanum, 308 S. terangae, 306 SNPs. See Single nucleotide polymorphisms

Song Sparrow, 329–340 SOS, 53 South America, 283, 284, 286, 288–291, 293, 296 Spandrel, 82 Speciation, 50–56, 58, 60, 61 allopatric, 330, 335 ecological, 334, 338–340 Species, 49–58, 60, 61 Sporulation efficiency, 54, 56–61 16S rRNA, 308–310 Sry, 9 Starvation, 49–61 Stochastic approaches, 10 Subspecies, 330, 331, 333, 335, 338, 339 Symbiogenesis genome fusion, 233 Symbiosis, 234, 301, 302, 304 Symbiotic genes nif, 304 nodA, 303–305 nodABC, 303 nodB, 303 nodBC, 303 nodC, 303 Symbiotic islands, 306 Symbiotic plasmids, 306, 307 repA, 307 repABC, 307 repB, 307 repC, 307 Synapsid, 116 Synonymous sites, 21, 22, 23, 25, 26, 32, 33 Synonymous substitutions, 21 Syntenies, 144, 145, 151, 152 T Tandem repeats, 147, 148 Taxon-pulse, 291, 292, 297 Terebridae, 250, 264 Terebrids, 253, 258 Teretoxins, 258, 264 TEs. See Transposable elements Testosterone, 6 Tetramine, 258 Tetraploid, 51, 59, 60 Tetraploidization, 51, 52, 55, 59 Tetraspanin, 320, 321 Theileria, 107, 108 Therapsid, 116 Theria, 116 Toxins, 250–252, 257–259, 261, 262–263, 265 Toxoplasma, 107, 108

Transferomics, 101–112 Translocations, 55, 56 Transposable elements (TEs), 25, 147, 149–150 Transpositions, 53, 55, 56 Trichosurin, 123 Trypanosoma, 107, 108 Turrids, 250, 253, 258 Turritoxins, 258 U UCEs. See Ultraconserved elements Ultraconserved elements (UCEs), 191, 194–195, 201–204 Underdominance, 51, 52 Ursidae, 348, 349, 351, 352, 354 Ursinae, 349, 351, 352 Ursus spelaeus, 344, 346, 349–351, 353 Usp9x, 285, 293 V Venom, 250, 253, 257–259, 261–265 Vibrational sounding, 273, 279 Viviparity, 4, 12, 13 Volutidae, 253, 259, 260 Volutomitridae, 259 W Wallaby (Macropus eugenii), 122–124, 127 Walrus, 125 WAP. See Whey acidic protein WDC2, 121 Whey acidic protein (WAP), 118, 121, 122, 123, 128 Whole-genome, 13 Wing sac, 294, 295 Within-area speciation events, 290 Wolbachia cytoplasmic incompatibility (CI), 211, 215, 217, 220 male-killing (MK), 209–222 supergroup, 210, 211 transmission, 211, 216 wBol1, 212–216, 218–221 wBol2, 214, 215, 217 wPip, 220 Wood boring beetles Wood-boring, 273, 277 X X chromosome, 5, 12 Y Y chromosome, 195, 201