Pathogenomics: Genome Analysis of Pathogenic Microbes


Author: Werner Goebel (Foreword) | Jörg Hacker (Editor) | Ulrich Dobrindt (Editor)


Pathogenomics Edited by Jörg Hacker and Ulrich Dobrindt

Related Titles

Schumann, W. Dynamics of the Bacterial Chromosome: Structure and Function. 2006. ISBN 3-527-30496-7
Gellissen, G. (ed.) Hansenula polymorpha: Biology and Applications. 2002. ISBN 3-527-30341-3
Sensen, C. W. (ed.) Essentials of Genomics and Bioinformatics. 2002. ISBN 3-527-30541-6
Bahl, H., Dürre, P. Clostridia: Biotechnology and Medical Applications. 2001. ISBN 3-527-30175-5
Streips, U. N., Yasbin, R. E. (eds.) Modern Microbial Genetics. 2002. ISBN 0-471-38665-0
Black, J. G. Microbiology: Principles and Explorations. 2005. ISBN 0-471-42084-0
Zhou, J., Thompson, D. K., Xu, Y., Tiedje, J. M. Microbial Functional Genomics. 2004. ISBN 0-471-07190-0
Grandi, G. (ed.) Genomics, Proteomics and Vaccines. 2004. ISBN 0-470-85616-5
Singleton, P., Sainsbury, D. Dictionary of Microbiology and Molecular Biology. 2001. ISBN 0-471-94150-6
Dale, J. W., Park, S. Molecular Genetics of Bacteria. 2004. ISBN 0-470-85084-1

Pathogenomics: Genome Analysis of Pathogenic Microbes
Edited by Jörg Hacker and Ulrich Dobrindt

Series Editors

Prof. Dr. Dr. h. c. mult. Jörg Hacker
Universität Würzburg, Institut für Molekulare Infektionsbiologie, Röntgenring 11, 97070 Würzburg, Germany

Dr. Ulrich Dobrindt
Universität Würzburg, Institut für Molekulare Infektionsbiologie, Röntgenring 11, 97070 Würzburg, Germany


All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from the British Library.

Bibliographic information published by Die Deutsche Bibliothek: Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet.

© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form, nor transmitted or translated into machine language, without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Printed in the Federal Republic of Germany. Printed on acid-free paper.

Typesetting: Kühn & Weyh, Satz und Medien, Freiburg
Printing: betz-druck GmbH, Darmstadt
Bookbinding: J. Schäffer GmbH i. G., Grünstadt

ISBN-13: 978-3-527-31265-8
ISBN-10: 3-527-31265-X


Foreword

The determination of the genome sequences of many prokaryotic and eukaryotic organisms, together with high-throughput techniques (transcriptomics, proteomics, metabolomics, interactomics, etc.) and the powerful tools of bioinformatics, has opened up new perspectives for a deeper understanding of the basic mechanisms of life. Because of the small size of bacterial genomes and the collinearity of their genetic information, prokaryotes are particularly suited to genome-based in-depth analysis of the essential processes which allow these microorganisms to survive and replicate in many different environments. Some of them can even multiply in those parts of the human body which are normally well protected by highly developed antimicrobial defense mechanisms. This latter property is the most outstanding evolutionary achievement of microbial pathogens that are capable of causing infectious diseases in humans.

The present monograph summarizes the state of genome-based research on some of the most important bacterial (and to a lesser extent viral and fungal) human pathogens today. The term pathogenomics has been coined for this new branch of microbiology. Not surprisingly, the major focus of pathogenomics was first on those human bacterial pathogens that (a) can cause major epidemics, especially in developing countries (e.g., Shigella spp. or Vibrio cholerae); (b) represent major health problems in almost all human societies in the form of food contaminants and/or agents of nosocomial infections (e.g., pathogenic Enterobacteriaceae, staphylococci, and streptococci); and (c) owing to their frequent occurrence in humans (e.g., Helicobacter pylori and Mycobacterium tuberculosis), represent life-threatening problems not only for many people in developing countries, but also for immune-compromised and elderly persons in all the industrialized countries. These microorganisms are therefore also the major subjects of this book.

A primary goal of pathogenomics is to use the new experimental tools in combination to unravel those genes, and gene arrangements, of pathogens that are essential for causing disease, and thereby to shed new light on the evolution, interbacterial gene transfer, and distribution of these virulence genes among bacterial populations. This area of pathogenomics, which is already highly advanced, makes up the largest part of this monograph.


The other, certainly no less important, goal is to study the functional significance of the pathogen-specific genes in the infection process. In the years to come, the enormous amount of genetic information that has piled up in the last 10 years from the sequencing of the genomes of most of the important bacterial pathogens (and their closely related nonpathogenic environmental counterparts) will need to be functionally analyzed. It is anticipated that this functional pathogenomics will finally unravel the mechanisms of differential and coordinated regulation of the virulence genes and the structural, molecular, and physiological functions of their gene products, which will lead to a more comprehensive view of the pathogenic microorganisms. Some aspects of this line of microbial research, which can be expected to become the mainstream of future pathogenomics, are already addressed in some of the chapters of this book.

Infections are the outcome of the encounter between the microbial pathogen and its host. Describing microbial infections in molecular terms therefore requires, among other things, a profound understanding of the host cell responses. The availability today of the genome sequences of man and some of the major model hosts in which microbial infections can be experimentally studied (the mouse in particular, but also alternative hosts such as amebae, the nematode Caenorhabditis elegans, Drosophila melanogaster, or the slime mold Dictyostelium), together with the new genetic and bioinformatics tools, opens up new avenues towards uncovering at least some of the basic host cell responses. Whilst cellular microbiology has already delivered an enormous set of valuable host-response information at the cellular level, the new in vivo imaging techniques, siRNA technology, and the routine genetic manipulations of some of the above-mentioned model hosts may now allow such molecular infection studies to be performed in real hosts. This monograph devotes several chapters to this important aspect of pathogenomics.

Science and, especially, the public and political worlds expect pathogenomics to provide novel ideas and strategies to combat infectious diseases through the rational design of better diagnostic tools and novel anti-infectives and vaccines. Indeed, some promising new developments deriving from pathogenomics are already visible and are in part outlined in some chapters of this monograph. These new approaches will undoubtedly help in fighting the most dangerous infectious agents, which still claim the highest death toll in mankind. However, scientists should be modest in their promises relating to the possibilities that arise from this exciting new field of science, since one major lesson which the attentive and critical reader of this monograph will quickly learn is the enormous genetic flexibility and adaptation potential of pathogenic microorganisms, and that means they will keep us busy for decades to come despite all the obvious successes of pathogenomics.

The book thus offers a representative view of the present state of the art in the new research area of pathogenomics. It is a valuable and reasonably comprehensive source of information for scientists and advanced students who wish to become acquainted with this most exciting field of modern microbiology.

Würzburg, October 2005

Werner Goebel


Contents

Foreword
Preface
List of Contributors
Color Plates

I Methods

1 Bioinformatics: Data Mining Among Genome Sequences
Susanne Kneitz and Thomas Dandekar
1.1 Systematic Genome Analysis of Pathogens as a Basis for Pharmacogenomic Strategies
1.2 Direct Sequence Annotation Tools for Functional Genomics
1.3 Identification of Protein Function
1.4 Obtaining Protein Information from a Domain Server
1.5 Pathway Analysis
1.6 Network Analysis
1.7 Adaptation in Time and to Stimuli
1.7.1 Experimental Design for Microarray Analysis
1.7.2 Data Analysis
1.8 Pathogen-Specific Challenges
1.9 Pathogen Adaptation Potential
1.10 The Fight Against Resistance
1.11 Drug Design and Antibiotics
1.12 Annotation Platforms Suitable for Pathogenomics
1.13 Conclusions


2 Transcriptome Analysis: Towards a Comprehensive Understanding of Global Transcription Activity
Ben Sidders and Neil Stoker
2.1 Introduction
2.2 Development of Transcriptomics
2.2.1 From Genomics to Functional Genomics
2.2.2 From Gene to Whole Genome
2.3 Introducing the Microarray
2.3.1 What Is a Microarray?
2.3.2 The Affymetrix Gene Chip
2.3.3 The Spotted Microarray
2.4 Microarray Methods
2.4.1 Experimental Design
2.4.1.1 Type of Experiment
2.4.1.2 Replicates
2.4.2 RNA Extraction
2.4.3 Labeling/Reverse Transcription
2.4.4 Hybridization
2.4.5 Scanning
2.5 Data Normalization and Analysis
2.5.1 Image Quantification
2.5.2 Data Processing
2.5.3 Data Analysis
2.5.3.1 Detection of Differential Expression
2.5.3.2 Pattern Recognition
2.5.3.3 Graphical Representations
2.5.6 Microarray Analysis Tools
2.5.7 Microarray Follow-Up
2.5.8 Data Storage and Reanalysis
2.6 Transcriptomics: Where We Are Now and What’s to Come

3 Physiological Proteomics of Bacillus subtilis and Staphylococcus aureus: Towards a Comprehensive Understanding of Cell Physiology and Pathogenicity
Michael Hecker and Susanne Engelmann
3.1 Introduction
3.2 Proteomics of Bacillus subtilis: The Gram-positive Model Organism
3.2.1 The Vegetative Proteome
3.2.2 Proteomes of Nongrowing Cells: Proteomic Signatures of Stress/Starvation Stimuli
3.3 Physiological Proteomics of Staphylococcus aureus
3.3.1 The Postgenome Era of S. aureus
3.3.2 Proteomes of Growing and Nongrowing Cells
3.3.3 Extracellular Proteins and Pathogenicity Networks
3.4 Outlook: Second Generation Proteomics and New Fields in S. aureus Physiology and Infection Biology

4 Impact of Genome Sequences on Mutational Analysis of Fungal and Bacterial Pathogens
Vladimir Pelicic and Xavier Nassif
4.1 The Long Road from Sequence to Function
4.2 Classical Genetics Still at the Forefront in the Postgenome Era
4.2.1 Reverse Genetics
4.2.2 Transposon Mutagenesis
4.3 Genome-Scale Mutational Analyses
4.3.1 Saccharomyces cerevisiae
4.3.2 Bacterial Workhorses: E. coli and Bacillus subtilis
4.3.3 Bacterial Pathogens
4.3.3.1 Mycoplasma Species
4.3.3.2 Pseudomonas aeruginosa
4.3.3.3 Staphylococcus aureus
4.3.3.4 Neisseria meningitidis
4.4 Conclusion

II Genomics of Pathogenic Bacteria

5 Pathogenomics of Escherichia coli and Shigella Species
Ulrich Dobrindt and Jörg Hacker
5.1 Introduction
5.2 Comparative Genomics of Shigella
5.3 Comparative Genomics of Escherichia coli
5.3.1 Comparison of Complete Genome Sequences
5.3.2 Comparative Genomics Using DNA Arrays
5.3.3 Mobile Genetic Elements and Evolution of Pathogenic E. coli
5.3.4 Genomic Islands/Pathogenicity Islands
5.3.5 Plasmids and Bacteriophages
5.3.6 Genetic Diversity Among Extraintestinal Pathogenic E. coli
5.4 Conclusions

6 Pathogenomics of Salmonella Species
Helene Andrews-Polymenis and Andreas J. Bäumler
6.1 Introduction
6.2 Salmonella Signature Genes
6.3 Subspecies I Signature Genes
6.4 Host Restriction

7 Pathogenomics of Enterococcus faecalis
Janet M. Manson and Michael S. Gilmore
7.1 Introduction
7.2 Enterococcal Pathogenesis
7.3 Genome Sequence of E. faecalis
7.3.1 Mobile Elements, Acquired DNA, and Antimicrobial Resistance
7.3.2 Environmental Adaptation and Stress Response
7.3.3 Survival In Vivo
7.3.4 Potential Virulence Factors
7.3.4.1 Hemolysins, Proteases, and other Enzymes
7.3.4.2 Cell-Wall-Associated Virulence Factors
7.3.5 Pathogenicity Island of E. faecalis
7.4 Conclusions and Future Perspectives

8 Genomics of Streptococci
Joseph J. Ferretti and W. Michael McShan
8.1 Introduction
8.2 Bacterial Genomes
8.2.1 Pyogenic Group
8.2.1.1 Streptococcus pyogenes
8.2.1.1.1 Virulence Factors
8.2.1.1.2 Horizontal Gene Transfer
8.2.2.1 Streptococcus agalactiae
8.2.2.2 Group C (GCS) and Group G Streptococci
8.2.2.3 Streptococcus uberis
8.2.2 Bovis Group
8.2.2.1 Streptococcus bovis and Streptococcus suis
8.2.3 Mitis Group
8.2.3.1 Streptococcus pneumoniae
8.2.3.2 Streptococcus mitis, Streptococcus sanguis, and Streptococcus gordonii
8.2.4 Anginosus and Salivarius Group
8.2.4.1 Streptococcus salivarius
8.2.4.2 Streptococcus thermophilus
8.2.5 Mutans Group
8.2.5.1 Streptococcus mutans and Streptococcus sobrinus
8.2.6 Other Organisms: Enterococcus faecalis
8.2.7 Comparative Genomics
8.3 Streptococcal Genomic Bacteriophages
8.3.1 Prophages and Streptococcal Genomes
8.3.2 GAS Genome Prophages
8.3.2.1 Prophages and Virulence Factors
8.3.2.2 Prophage Attachment Sites and Host Biology
8.3.2.3 Prophage Diversity
8.3.3 Prophages Associated with other Streptococcal Species

9 Pathogenic Staphylococci: Lessons from Comparative Genomics
Knut Ohlsen, Martin Eckart, Christian Hüttinger, and Wilma Ziebuhr
9.1 Introduction
9.2 Comparative Genomics of S. aureus
9.2.1 Overall Genome Structure
9.2.2 Core Genome
9.2.2.1 Metabolism
9.2.2.2 Information Pathways
9.2.2.3 Virulence Factors
9.2.3 Accessory Genome
9.2.3.1 Pathogenicity Islands
9.2.3.2 Staphylococcal Cassette Chromosome
9.2.3.3 Bacteriophages
9.2.3.4 Plasmids
9.3 Staphylococcus epidermidis
9.3.1 Genomic Islands
9.3.2 Phage SPβ and other Bacillus Genes
9.3.3 Virulence Factors
9.3.4 Staphylococcal Cassette Chromosome
9.3.5 Adherence and Biofilm Formation
9.3.6 Insertion Sequences
9.4 Concluding Remarks

10 Pathogenomics: Insights into Tuberculosis and Related Mycobacterial Diseases
Alexander S. Pym, Stephen V. Gordon, and Roland Brosch
10.1 Introduction
10.2 Molecular Basis of Pathogenicity
10.3 Evolution of the M. tuberculosis Complex
10.4 Some Metabolic Insight from the Genome Sequences
10.5 Other Major Mycobacterial Human Pathogens
10.5.1 Mycobacterium leprae
10.5.2 Mycobacterium ulcerans
10.6 Concluding Remarks

11 Genomes of Pathogenic Neisseria Species
Christoph Schoen, Heike Claus, Ulrich Vogel, and Matthias Frosch
11.1 Introduction
11.2 Genomes of Pathogenic Neisseria Species
11.2.1 The Flexible Genome Pool
11.2.2 Repetitive DNA Sequence Elements Govern Neisserial Biology
11.2.2.1 DNA Uptake Sequences, Horizontal Gene Transfer, and Antigenic Diversity
11.2.2.2 Simple Sequence Repeats and Phase Variation
11.2.2.3 Insertion Sequences and the Regulation of Gene Expression
11.2.3 Genome-Wide Mutational Analyses
11.2.4 Comparative Genomics
11.2.5 Novel Virulence Factors of Meningococci Identified by Genomic Approaches
11.3 Future Perspectives

12 Genomics of Pathogenic Clostridia and Bacilli
Armin Ehrenreich, Gerhard Gottschalk, and Holger Brüggemann
12.1 Genomics of Pathogenic Clostridia spp.
12.1.1 Introduction
12.1.2 C. perfringens
12.1.3 C. tetani
12.1.4 C. botulinum
12.1.5 C. difficile
12.1.6 Conclusions and Perspectives
12.2 Genomics of Pathogenic Bacilli
12.2.1 Introduction
12.2.2 Pathogenic Properties of Bacilli not Belonging to the B. cereus Group
12.2.3 Pathogenicity of B. cereus
12.2.4 Pathogenicity of B. anthracis
12.2.4.1 Course of Anthrax
12.2.4.2 Virulence Factors of B. anthracis
12.2.5 Genome of B. anthracis
12.2.5.1 Chromosomal Genes
12.2.5.2 Genes Located on Plasmids pXO1 and pXO2
12.2.5.3 Regulation of Virulence Genes
12.2.5.4 Molecular Diversity in B. anthracis Genomes
12.2.5.5 Genome of a Highly Virulent B. cereus Strain Resembling B. anthracis in Pathogenesis
12.2.6 Comparison of B. cereus Group Genomes: How Did Pathogenicity Evolve?

13 The Genomes of Pathogenic Bartonella Species
Carolin Frank, Eva Berglund, and Siv G. E. Andersson
13.1 Introduction
13.1.1 Bartonella in a Phylogenetic Context
13.1.2 Hosts and Vectors for Bartonella Species
13.2 Bartonella Species and Pathogenicity
13.2.1 Infection of Reservoir and Incidental Host
13.2.2 Bartonella Species as Human Pathogens
13.3 The Bartonella Genomes
13.4 Genomic Islands and Phages
13.5 Genomic Islands and Phages in Bartonella Species
13.5.1 The B. henselae Prophage
13.5.2 B. henselae Genomic Islands and Islets
13.5.3 B. quintana Harbors Remnants of the B. henselae Islands
13.5.4 Role of Phages and Islands in the Evolution of Bartonella
13.6 The Chromosome II-Like Segment in Bartonella
13.6.1 Type IV Secretion Systems in Bartonella Species
13.6.2 The virB-D4 Operon
13.6.3 The trw Operon
13.7 B. quintana’s Evolution into a Human Pathogen
13.8 Conclusions and Future Perspectives

14 Pathogenomics of Gastric and Enterohepatic Helicobacter Species
Sebastian Suerbaum, Sandra Schwarz, and Christine Josenhans
14.1 Introduction
14.2 Helicobacter pylori
14.2.1 Key Features of the H. pylori Genome Related to Pathogenesis
14.2.1.1 Colonization Factors: Urease and Motility
14.2.1.2 Phase Variation
14.2.1.3 The H. pylori Outer Membrane Protein Family
14.2.1.4 Intraspecies Variation of H. pylori Genomes
14.2.1.5 The cag Pathogenicity Island
14.2.1.6 Nucleotide Sequence Variation in H. pylori
14.3 Helicobacter hepaticus
14.3.1 The HHGI1 Genomic Island
14.3.2 Other Putative H. hepaticus Virulence Factors
14.4 Genome Comparisons of Gastric and Enterohepatic Helicobacter Species with Related Bacteria
14.5 Outlook

15 Genomics of the Opportunistic Pathogen Legionella pneumophila
Christel Cazalet and Carmen Buchrieser
15.1 The Genus Legionella: Epidemiology, Life Cycle, and Pathogenesis
15.2 Genomics of Legionella pneumophila
15.3 Specific Features of the Legionella Genomes
15.3.1 Eukaryotic-like Proteins in Legionella pneumophila: Modulation of Host Functions?
15.3.2 Secretion Machineries of L. pneumophila: Central to Its Life and to Pathogenesis
15.3.2.1 Type IV Secretion Systems in Legionella
15.3.2.1.1 The dot/icm Type IVB Secretion System
15.3.2.1.2 The lvh Type IVA Secretion System
15.3.2.2 A Putative Type I Secretion System in Legionella
15.3.2.3 A Type II Secretion System in Legionella
15.3.2.4 Secretion Across the Cytoplasmic Membrane
15.3.2.5 A Putative Type V Secretion System (Autotransporter) Specific to Strain Paris
15.3.3 Comparative Genomics: Diversity of the Species L. pneumophila
15.3.3.1 Genomic (Pathogenicity) Islands in the L. pneumophila Genomes
15.3.3.2 Plasmids and Genetic Diversity of L. pneumophila
15.4 Conclusions

16 Genomics of Listeria monocytogenes
Michael Kuhn and Werner Goebel
16.1 Introduction: From Pregenomics to Postgenomics
16.2 Listeria monocytogenes: A Facultative Intracellular Pathogen
16.3 Listeria monocytogenes Genetics in the Pregenomic Era: Identification and Characterization of Important Virulence Factors
16.3.1 Internalins and the Invasion of Nonprofessional Phagocytic Cells
16.3.2 Listeriolysin O and Two Listerial Phospholipases Allow Escape from the Phagocytic Vacuole
16.3.3 Intracellular Motility and Cell-to-Cell Spread: The Surface Protein ActA
16.3.4 PrfA and the Regulation of Virulence Gene Expression
16.4 Genome Sequence of L. monocytogenes and Its Comparison with the Closely Related L. innocua
16.5 Genomic Approaches to Studying the other Members of the Genus Listeria
16.6 Evolutionary Aspects
16.7 Identification of Listerial Virulence Factors in the Postgenomic Era
16.7.1 Internalins and Other Surface Proteins
16.7.2 Growth in the Host Cell Cytoplasm
16.7.3 Resistance to Bile
16.7.4 Two-component Systems and the Regulation of Virulence Gene Expression
16.7.5 Vitamin B12 Biosynthesis and Anaerobic Use of Ethanolamine
16.8 Proteomics
16.9 Transcriptomics
16.10 Conclusions

III Genomics of Pathogens and Their Hosts: Applications

17 Genomics of Viruses
Esteban Domingo, Alejandro Brun, José Ignacio Núñez, Juan Cristina, Carlos Briones, and Cristina Escarmís
17.1 Introduction: Wide Scope of Virogenomics
17.2 Retrieving Information
17.3 Applications of Data Banks to Virology
17.4 Beyond Reference Strains: Towards a Second-Generation Virogenomics?
17.5 Virogenomics Through Microarrays

18 Genomics of Pathogenic Fungi
Gerwald A. Köhler, Alan Kuo, George Newport, and Nina Agabian
18.1 Introduction
18.2 Genomics of Primary Fungal Pathogens
18.2.1 Histoplasma
18.2.2 Coccidioides
18.2.3 Blastomyces and Paracoccidioides
18.3 Genomics of Opportunistic Fungal Pathogens
18.3.1 Aspergillus
18.3.2 Cryptococcus
18.3.3 Pneumocystis
18.3.4 Microsporidia
18.3.5 Candida
18.4 The Tool Box for Functional Genomics
18.4.1 Expression Analysis
18.4.2 Transformation and Mutagenesis
18.5 Fungal Virulence – From the Genomic Point of View
18.6 Conclusion

19 Genomics of Pathogenic Parasites
Gabriele Pradel and Thomas James Templeton
19.1 Exploring the Genomes of Pathogenic Protozoans
19.2 The Shaping of the Proteomes of the Pathogenic Protists
19.3 Role of Horizontal Gene Transfer in Protozoan Genome Plasticity
19.4 The Apicomplexa
19.4.1 Plasmodium, the Malaria Parasite
19.4.2 Cryptosporidium
19.4.3 Toxoplasma
19.5 The Pathogenic Kinetoplastids
19.5.1 Trypanosoma
19.5.2 Leishmania
19.6 The Pathogenic Diplomonad Giardia and the Parabasalid Trichomonas
19.7 Postgenomic Strategies and the Search for Cure
19.7.1 Gene Expression Analysis
19.7.2 Proteomics
19.7.3 Drug and Vaccine Development
19.7.4 Vector Genetics

20 Model Host Systems: Tools for Comprehensive Analysis of Host–Pathogen Interactions
Michael Steinert and Gernot Glöckner
20.1 Introduction
20.2 Host–Pathogen Interactions
20.3 Arabidopsis thaliana: A Plant as a Model for Human Disease
20.4 Dictyostelium discoideum: Perspectives from a Social Amoeba
20.5 Caenorhabditis elegans: Answers from a Worm
20.6 Drosophila melanogaster: A Fruitful Model
20.7 Danio rerio: Fishing for Knowledge
20.8 Mus musculus: Of Mice and Men
20.9 Clean Models and Dirty Reality

21 Expression Analysis of Human Genes During Infection
Erwin Bohn and Ingo B. Autenrieth
21.1 Introduction
21.2 Comparison of Gene Expression Profiles of Macrophages and Dendritic Cells In Vitro Upon Infection with Different Pathogens
21.3 Septicemia
21.4 Gene Expression in Epithelial Cells Modulated by Bacteria
21.4.1 Helicobacter pylori
21.4.2 Yersinia enterocolitica
21.4.3 Pseudomonas aeruginosa
21.4.4 Bartonella henselae
21.5 Common Signatures
21.6 Genetic Polymorphisms and Mutations Affect Gene Expression: Impact on Infection Susceptibility and Infection Course
21.7 Concluding Remarks

22 Pathogenomics: Application and New Diagnostic Tools
Sören Schubert and Jürgen Heesemann
22.1 Introduction: “In Our Hands”
22.2 Microbiological Diagnostics of Bacterial Pathogens: Aims, Tasks, and Current Limitations
22.3 The Pregenomic Era: Conventional and Molecular Methods in Microbiological Diagnostics
22.3.1 Conventional Culture-Based Methods in Microbiological Diagnostics
22.3.2 Molecular Microbiological Diagnostic Methods
22.3.2.1 Typing of Bacterial Isolates Using 16S-rRNA
22.3.2.2 Fluorescence In Situ Hybridization
22.3.2.3 PCR Methods for Microbial Diagnostics
22.4 The Postgenomic Era: Use of DNA Microarrays in the Diagnosis of Infectious Diseases in Humans and Animals
22.4.1 DNA Arrays: Platforms, Techniques and Targets
22.4.2 Detection and Typing of Microbial Pathogens
22.4.3 Pathoarrays
22.4.4 16S-/23S-rDNA Arrays
22.4.5 Detection of Antibiotic Resistance in Microbial Pathogens Using Microarray Technology
22.5 Microarray Technology in Bacteria: Further Areas of Applications
22.5.1 Gene Expression Microarrays and Host–Pathogen Interaction
22.5.2 DNA Microarray Technology in Food Technology
22.5.3 DNA Microarray Technology in Environmental Microbiology
22.5.4 Pathogenomic Tools (Microarrays) in the Diagnosis of Microbiologic Agents as Bioweapons
22.6 Current Limitations on the Use of DNA Microarrays in Diagnostics in Medical Microbiological Laboratories
22.7 Final Remarks

23 The Search for New Antibiotics
Harald Labischinski, Christoph Freiberg, and Heike Brötz-Oesterhelt
23.1 The Need for Novel Antibiotics
23.2 Where Will the New Antibiotics Come From?
23.2.1 The Past
23.2.2 The Present
23.2.3 Future Directions
23.3 Contributions of Genomic Technologies to Antibacterial Research
23.3.1 Target Identification and Validation
23.3.2 Target Prioritization
23.3.3 Genetic Tools for Drug Screening and Mode-of-Action Determination
23.3.4 Genome-Wide Expression Profiling for Mode-of-Action Characterization
23.3.5 Outlook for Genomic Technologies for Antibiotic Drug Discovery
23.4 Alternative Approaches in Antibacterial Drug Discovery
23.4.1 Targeting the Resistance Mechanism
23.4.2 Extremely Narrow-Spectrum Drugs
23.4.3 Phage Therapies and other Bacteriolytic Approaches
23.4.4 Strategies for Reducing Virulence and/or Influencing Pathogenesis

24 Reverse Vaccinology: Revolutionizing the Approach to Vaccine Design
Laura Serino, Mariagrazia Pizza, and Rino Rappuoli
24.1 Impact of Genomics on Vaccine Design
24.2 MenB Vaccine Approach by Reverse Vaccinology
24.3 Following the MenB Experience: Other Pathogens
24.4 Functional Genomics
24.5 Gene Expression In Vivo: IVET and STM
24.6 Transcriptome Analysis and Comparative Genomics
24.7 Proteomics and Vaccine Design
24.8 Conclusions

Index

Preface

In the year 1995, the first full genome sequence of a free-living organism, the bacterium Haemophilus influenzae strain Rd, was published. This publication, which appeared in the journal Science, represented the starting point for a new field in molecular biology called genomics. Today, 10 years later, complete genome sequences of almost all the major pathogenic microbes have been determined. As a consequence, a new discipline has arisen, which has been named “pathogenomics.” As the name implies, pathogenomics is the analysis at the genomic level of the processes involved in bacterial pathogenesis caused by the interaction of pathogenic microbes and their hosts. The present volume is the first handbook to be entirely devoted to the newly established discipline of pathogenomics.

We are very grateful to our colleagues for their input, especially those associated with the German Pathogenomics competence network, established by the German Federal Ministry of Science and Education to analyze pathogenic microbes at the genomic level. Werner Goebel in particular, the network’s speaker and the author of the Foreword to this book, has influenced the entire discipline with his spirit and his vision. Our thanks are due to the staff at Wiley-VCH, most notably Andrea Pillmann, who encouraged us to put this book together. We are also very grateful to all our authors for their contributions to this important project.

Würzburg, September 2005

Ulrich Dobrindt, Jörg Hacker


List of Contributors

Nina Agabian
University of California San Francisco, Department of Cell and Tissue Biology, 521 Parnassus, Box 0640, San Francisco, CA 94143-0640, USA

Siv Andersson
Uppsala University, Evolutionary Biology Center, Department of Molecular Evolution, Norbyvägen 18C, 752 36 Uppsala University, Sweden

Helene Andrews-Polymenis
Department of Medical Microbiology and Immunology, College of Medicine, Texas A&M University HSC, 407 Reynolds Medical Building, College Station, TX 77843-1114, USA

Ingo B. Autenrieth
Universitätsklinikum Tübingen, Institut für Medizinische Mikrobiologie und Hygiene, Elfriede-Aulhorn-Str. 6, 72076 Tübingen, Germany

Andreas Bäumler
Department of Medical Microbiology and Immunology, School of Medicine, University of California at Davis, One Shields Ave., Davis, CA 95616-8645, USA

Eva Berglund
Uppsala University, Evolutionary Biology Center, Department of Molecular Evolution, Norbyvägen 18C, 752 36 Uppsala University, Sweden

Erwin Bohn
Universitätsklinikum Tübingen, Institut für Medizinische Mikrobiologie und Hygiene, Elfriede-Aulhorn-Str. 6, 72076 Tübingen, Germany

Carlos Briones
Centro de Astrobiología (CSIC-INTA), Carretera de Ajalvir, Km. 4, Torrejón de Ardoz, 28850 Madrid, Spain


Roland Brosch
Institut Pasteur, Unité de Génétique Moléculaire Bactérienne, 28 Rue du Dr. Roux, 75724 Paris Cedex 15, France

Heike Brötz-Oesterhelt
Bayer Healthcare AG, Pharma Research Center, Anti-Infective Research, Gebäude 405, 42096 Wuppertal, Germany

Holger Brüggemann
Institut Pasteur, Unité de Génomique des Microorganismes Pathogènes (GMP), 28, Rue du Dr Roux, 75724 Paris, Cedex 15, France

Alejandro Brun
CISA-INIA, Valdeolmos, 28130 Madrid, Spain

Carmen Buchrieser
Laboratoire de Génomique des Microorganismes Pathogènes, Institut Pasteur, 25 Rue du Dr. Roux, 75724 Paris, Cedex 15, France

Christel Cazalet
Laboratoire de Génomique des Microorganismes Pathogènes, Institut Pasteur, 25 Rue du Dr. Roux, 75724 Paris, Cedex 15, France

Heike Claus
Institut für Hygiene und Mikrobiologie, Universität Würzburg, Josef-Schneider-Str. 2, 97080 Würzburg, Germany

Juan Cristina
Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Universidad Autónoma de Madrid, Cantoblanco, 28049 Madrid, Spain

Thomas Dandekar
University of Würzburg, Chair of Bioinformatics, Biocenter, Am Hubland, 97074 Würzburg, Germany

Ulrich Dobrindt
Institut für Molekulare Infektionsbiologie, Universität Würzburg, Röntgenring 11, 97070 Würzburg, Germany

Esteban Domingo
Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Universidad Autónoma de Madrid, Cantoblanco, 28049 Madrid, Spain

Martin Eckart
Institut für Molekulare Infektionsbiologie, Röntgenring 11, 97070 Würzburg, Germany


Armin Ehrenreich Göttingen Genomics Laboratory Georg-August-Universität Göttingen Grisebachstraße 8 37077 Göttingen Germany

Matthias Frosch Institut für Hygiene und Mikrobiologie Universität Würzburg Josef-Schneider-Str. 2 97080 Würzburg Germany

Susanne Engelmann Ernst-Moritz-Arndt-Universität Institut für Mikrobiologie Friedrich-Ludwig-Jahn-Straße 15 17487 Greifswald Germany

Michael S. Gilmore The Schepens Eye Research Institute and Harvard Medical School Department of Ophthalmology 20 Staniford St. Boston, MA 02114 USA

Cristina Escarmís Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM) Universidad Autónoma de Madrid Cantoblanco 28049 Madrid Spain

Joseph J. Ferretti Department of Microbiology and Immunology University of Oklahoma Health Sciences Center Oklahoma City, OK 73104 USA

Carolin Frank Uppsala University Evolutionary Biology Center Department of Molecular Evolution Norbyvägen 18C 752 36 Uppsala University Sweden

Christoph Freiberg Bayer Healthcare AG Pharma Research Center Anti-Infective Research Gebäude 405 42096 Wuppertal Germany

Gernot Glöckner Leibniz-Institut für Altersforschung Fritz-Lipmann-Institut Beutenbergstr. 11 07745 Jena Germany

Werner Goebel University of Würzburg Theodor-Boveri-Institute for Biological Sciences Chair of Microbiology Am Hubland 97074 Würzburg Germany

Stephen V. Gordon TB Research Group Veterinary Laboratories Agency Weybridge Woodham Lane, New Haw Addlestone, Surrey KT15 3NB United Kingdom


Gerhard Gottschalk Göttingen Genomics Laboratory Georg-August-Universität Göttingen Grisebachstraße 8 37077 Göttingen Germany

Jörg Hacker Institut für Molekulare Infektionsbiologie Universität Würzburg Röntgenring 11 97070 Würzburg Germany

Michael Hecker Ernst-Moritz-Arndt-Universität Institut für Mikrobiologie Friedrich-Ludwig-Jahn-Straße 15 17487 Greifswald Germany

Jürgen Heesemann Max-von-Pettenkofer-Institut für Hygiene und Medizinische Mikrobiologie Universität München Pettenkoferstraße 9a 80336 München Germany

Christian Hüttinger Institut für Molekulare Infektionsbiologie Röntgenring 11 97070 Würzburg Germany

Christine Josenhans Hannover Medical School Institute for Medical Microbiology OE5210 Carl-Neuberg-Straße 1 30625 Hannover Germany

Susanne Kneitz IZKF, Labor für Microarray-Anwendungen Institut für Virologie und Immunbiologie 97078 Würzburg Germany

Gerwald A. Köhler University of California San Francisco Department of Cell and Tissue Biology 521 Parnassus, Box 0640 San Francisco, CA 94143-0640 USA

Michael Kuhn University of Würzburg Competence Network PathoGenoMik Biocenter Am Hubland 97074 Würzburg Germany

Alan Kuo University of California San Francisco Department of Cell and Tissue Biology 521 Parnassus, Box 0640 San Francisco, CA 94143-0640 USA

Harald Labischinski Combinature Biopharm AG Robert-Rössle-Str. 10, Gebäude 80 13125 Berlin Germany

Janet M. Manson The Schepens Eye Research Institute and Harvard Medical School Department of Ophthalmology 20 Staniford St. Boston, MA 02114 USA


W. Michael McShan Department of Pharmaceutical Sciences P.O. Box 26901, CPB 307 University of Oklahoma Health Sciences Center Oklahoma City, OK 73190 USA

Xavier Nassif INSERM U570 Faculté René Descartes Paris 5 Site Necker-Enfants Malades 156 rue de Vaugirard 75730 Paris Cedex 15 France

Mariagrazia Pizza Chiron Vaccines Via Fiorentina, 1 53100 Siena Italy

Gabriele Pradel University of Würzburg Research Center for Infectious Diseases Röntgenring 11 97070 Würzburg Germany

George Newport University of California San Francisco Department of Cell and Tissue Biology 521 Parnassus, Box 0640 San Francisco, CA 94143-0640 USA

Alexander S. Pym Unit for Clinical and Biomedical TB Research South African MRC 491 Ridge Road PO Box 70380 Overport, 4067 Durban South Africa

José Ignacio Núñez CISA-INIA Valdeolmos 28130 Madrid Spain

Rino Rappuoli Chiron Vaccines Via Fiorentina, 1 53100 Siena Italy

Knut Ohlsen Institut für Molekulare Infektionsbiologie Röntgenring 11 97070 Würzburg Germany

Christoph Schoen Institut für Hygiene und Mikrobiologie Universität Würzburg Josef-Schneider-Str. 2 97080 Würzburg Germany

Vladimir Pelicic INSERM U570 Faculté René Descartes Paris 5 Site Necker-Enfants Malades 156 rue de Vaugirard 75730 Paris Cedex 15 France

Sören Schubert Max von Pettenkofer-Institut für Hygiene und Medizinische Mikrobiologie Universität München Pettenkoferstr. 9a 80336 München Germany


Sandra Schwarz Institute for Medical Microbiology OE5210 Hannover Medical School Carl-Neuberg-Straße 1 30625 Hannover Germany

Sebastian Suerbaum Hannover Medical School Institute for Medical Microbiology OE5210 Carl-Neuberg-Straße 1 30625 Hannover Germany

Laura Serino Chiron Vaccines Via Fiorentina, 1 53100 Siena Italy

Thomas J. Templeton Weill Medical College of Cornell University Department of Microbiology and Immunology 1300 York Avenue, Box 62 New York, NY 10021 USA

Ben Sidders The Royal Veterinary College Pathology and Infectious Diseases Royal College Street London, NW1 0TU United Kingdom

Michael Steinert Universität Würzburg Institut für Molekulare Infektionsbiologie Röntgenring 11 97070 Würzburg Germany

Neil Stoker The Royal Veterinary College Pathology and Infectious Diseases Royal College Street London, NW1 0TU United Kingdom

Ulrich Vogel Institut für Hygiene und Mikrobiologie Universität Würzburg Josef-Schneider-Str. 2 97080 Würzburg Germany

Wilma Ziebuhr Institut für Molekulare Infektionsbiologie Röntgenring 11 97070 Würzburg Germany


Color Plates

Fig. 3.2 Protein pattern of heat-shocked B. subtilis cells. During exposure to heat shock (37 to 48 °C), B. subtilis cells were pulse-labeled with 35S-L-methionine. After the separation of crude protein extract by 2-D gels, the proteins were visualized by silver staining and the protein synthesis rate was determined by phosphoimaging. The two images were overlaid with the aid of the Delta-2D software package (dual-channel imaging technique [11]; Decodon, Greifswald, Germany). The red-labeled proteins form the heat stress stimulon. (This figure also appears on page 48.)


Fig. 3.4 The phosphate starvation stimulon of B. subtilis consists of both general stress proteins also induced by phosphate starvation (σB regulon, σB-dependent proteins marked in blue) and starvation-specific proteins (PhoR regulon, marked in red). The stress/starvation induction profile of both groups and the functions of the proteins are indicated. C, control; h, heat stress; e, ethanol stress; s, NaCl stress; a, pH 5.5; o, oxidative stress (H2O2); p, puromycin; g, glucose starvation; O2, oxygen starvation; ph, phosphate starvation; aa, amino acid starvation. (This figure also appears on page 50.)


Fig. 3.5 Multicolor imaging of expression patterns under different growth conditions in B. subtilis. Delta-2D software (Decodon) was used to visualize complex protein expression patterns on the 2-D gel image in the standard pH range 4–7. The color code is presented in the top left corner. Proteins only induced by single stresses are colored red (H, heat), light-blue (O, oxidative stress), and orange (E, ethanol), respectively. Proteins induced by oxidative stress as well as heat stress are colored yellow, proteins induced by ethanol stress and heat stress are colored dark-blue, proteins induced by oxidative and ethanol stress can be recognized as purple spots, and, finally, proteins induced by all three stimuli are displayed in green. (This figure also appears on page 51.)


Fig. 3.6 Dynamics of protein synthesis profiles of growing and glucose-starved cells of B. subtilis. A Individual dual-channel 2-D patterns of protein synthesis and accumulation recorded during the different phases of the growth curve are assembled into a “life movie.” B Growth curve (optical density at 500 nm) and 35S-L-methionine incorporation (million cpm per 60 µg protein). C Patterns of selected examples representing different branches of cellular physiology. Sample points correspond to the following growth phases shown in the growth curve: 1, 2, exponential growth; 3–7, glucose starvation; 8, 9, recovery of growth after readdition of glucose. The bar graphs on the left display normalized relative synthesis rates of the individual proteins at the different time points. (This figure also appears on page 52.)


Fig. 3.7 Proteomic signatures of B. subtilis under different physiological stress/starvation conditions. Comparisons of the protein profiles of exponentially growing and stressed B. subtilis cells reveal signature-like changes that are specific to certain stress stimuli (e.g., induction of the catalase KatA by oxidative stress). The individual sections of the 2-D gels display typical parts of the proteomic signatures of oxidative or heat stress, the stringent response, or limitation of glucose or phosphate. (This figure also appears on page 54.)


Fig. 3.9 Assignment of proteins identified in S. aureus COL to biochemical pathways and other essential cellular components. Proteins that have not been identified in the 2-D gel images thus far are colored green. A Purine and pyrimidine metabolism, B glycolysis, pentose phosphate shunt, and citric acid cycle, C oxidative stress resistance, D ATPase components, E proteolysis, F components of the translational machinery, G amino acid metabolism, H fatty acid synthesis and metabolism of cell wall components, and I biotin metabolism. (This figure also appears on pages 58/59.)

Fig. 3.10 Protein pattern of cells of S. aureus COL grown under aerobic (green) and anaerobic (red) conditions in synthetic medium. Cells were pulse-labeled (5 min) with 35S-L-methionine under aerobic conditions and 30 min after imposition of anaerobic growth conditions. Radioactively labeled proteins were visualized by the phosphoimaging technique. Proteins whose synthesis was increased after shifting to anaerobic growth conditions are shown in red (e.g., enzymes involved in glycolysis and fermentation) and those whose synthesis was decreased are shown in green (e.g., enzymes involved in the TCA cycle). (This figure also appears on page 60.)


Fig. 3.12 The extracellular proteome of S. aureus RN6390 at low (green image) and high cell densities (red image). 250 µg proteins of the supernatant of cells grown in TSB medium at an optical density OD540 = 1 and 5 were separated by 2-D gels and stained with Sypro Ruby. Extracellular proteins present in increased amounts at high cell densities are labeled red and those proteins only present at low cell densities are labeled green. (This figure also appears on page 63.)


Fig. 3.13 A Growth of S. aureus RN6390 in TSB medium. The sampling is indicated by an arrow and a letter in the respective growth curve. B, C Virulence factors of S. aureus RN6390 whose amount depends on the growth phase. The amount of the respective proteins at OD540 = 1 (green) of cells grown in TSB medium was compared with the amount of these proteins at higher optical densities (red). B Virulence factors only present at low cell densities. C Virulence factors only present at high cell densities. In addition, the amount of the respective proteins in the wild type strain was compared to the amount of these proteins in various regulatory mutants (agr, sarA, sigB) known to be impaired in virulence. Proteins were stained with Sypro Ruby. (This figure also appears on page 64.)

Fig. 5.2 Comparison of the genetic organization of GEI II Nissle 1917 and the pheV-associated PAI of E. coli strain CFT073 (A) demonstrating the loss of the α-hemolysin-encoding determinant (hly) and of large parts of the P fimbrial operon (pap) in strain Nissle 1917. The DNA sequence comparison of the two islands is visualized using Artemis and ACT [104]. Identical regions of the two islands are highlighted in red. Functionally related DNA regions are indicated by different colors as shown. (B) Enlarged section of GEI II Nissle 1917 comprising the partially deleted P fimbrial determinant. (This figure also appears on page 98.)


Fig. 8.1 GAS genome alignments. The seven available GAS genomes were compared using the software package Mauve, a method that identifies conserved genomic regions, rearrangements and inversions in conserved regions, and the exact sequence breakpoints of such rearrangements across multiple genomes [11]. Corresponding regions share the same color and are connected by lines. Collinear regions are positioned above the central axis (relative to M1) while inverted regions are drawn below the axis. Light gray regions were too divergent in at least one genome to be meaningfully aligned. (This figure also appears on page 153.)

Fig. 8.2 GAS vs S. uberis alignment. The corresponding regions in the M1 and S. uberis genomes identified by Mauve analysis are shown. Several large regions of collinearity are evident between the two species. Corresponding regions share the same color and are connected by lines. (This figure also appears on page 157.)


Fig. 9.1 Structure of type I, II, III, and IV staphylococcal cassette chromosome mec (SCCmec) elements (adapted from [70]). SCCmec is characterized by two common gene complexes, the ccr (cassette chromosome recombinase) gene complex (gray) and the mec gene complex (blue). Integrated IS431, Tn554, and plasmid pT181 are also indicated. Gene symbols: cad, cadmium resistance; ccr, cassette chromosome recombinase; ermA, erythromycin resistance; hsdR, host specificity determinant; kdp, two-component system regulating potassium transport; mec, penicillin binding protein 2a, methicillin resistance; mer, mercury resistance; tetK, tetracycline resistance. (This figure also appears on page 191.)


Fig. 9.2 Pairwise comparison of the Staphylococcus epidermidis ATCC 12228 and Staphylococcus epidermidis RP62A genomes displayed using the Artemis Comparison Tool (ACT; http://www.sanger.ac.uk/Software/ACT). The red bars represent homologous matches between the genomes and the blue bars indicate a homologous, but inverted chromosomal region. Genomic islands, IS elements, phages, SCC cassettes, and genes involved in adherence and biofilm formation are marked as colored boxes. Gene symbols: aae, autolysin/adhesin; aap, accumulation-associated protein; atlE, autolysin; bhp, Bap homologous protein; ica, intercellular adhesin. (This figure also appears on page 196.)


Fig. 11.1 Multiple alignment of the four neisserial genomes so far fully sequenced using the program Mauve [100]. Locally collinear blocks of DNA are depicted in the same colors and connected via correspondingly colored lines. Equally colored blocks on different sides of black lines corresponding to the respective genome sequence indicate chromosomal inversions. The chromosomal locations of potential prophages [30], putative genetic islands [13], and some putative virulence genes are given for strain MC58. (This figure also appears on page 235.)


Fig. 11.3 Comparison of the cps locus of pathogenic and apathogenic Neisseria species. Regions with identical functions are depicted in the same colors. The following sequence data were used to compare the cps loci: NmB B1940: assembled from the GenBank sequences L09188, L09189, M57677, M95053, and Z13995; NmB MC58: AE002098; NmA Z2491: AL157959; NmC FAM18: NC_003221, homologous genes according to BLASTX [75] searches against the annotated neisserial genomes; Ng FA1090: AE004969; Nl DSM4691 and Ns LMG5290: H. Claus and U. Vogel, unpublished data. The gene assignment of the meningococcal sequences was done following the annotation of the MC58 genome. Ng, N. gonorrhoeae; Nl, N. lactamica; Nm, N. meningitidis; Ns, N. sicca; DSM, German type culture collection (Deutsche Sammlung von Mikroorganismen), Braunschweig, Germany; LMG, Belgian collection of microorganisms (Laboratorium voor Microbiologie, University of Gent), Gent, Belgium. (This figure also appears on page 243.)


Fig. 15.2 Alignment of the complete genome sequences of L. pneumophila strains Paris, Lens, and Philadelphia 1, using the ACT (Artemis comparison tool) software (http://www.sanger.ac.uk/Software/ACT/). Color code: Red blocks represent homologous sequences; blue blocks, homologous sequences, but inverted. 1, Disruption of the synteny by a 260-kbp inversion in strain Lens; 2, cluster encoding several efflux pumps missing in strain Lens; 3, lvh type IV secretion system; 4, 65-kbp pathogenicity island of strain Philadelphia 1. (This figure also appears on page 329.)


Fig. 16.1 Circular genome maps of L. monocytogenes EGD-e and L. innocua CLIP 11262, showing the position and orientation of genes. From the outside: Circles 1 and 2, L. innocua and L. monocytogenes genes on the + and – strands, respectively. Color coding: green, L. innocua genes; red, L. monocytogenes genes; black, genes specific to L. monocytogenes or L. innocua, respectively; orange, rRNA operons; purple, prophages. Circle 3, G+C content of L. monocytogenes (< 32.5% G+C in light yellow, 32.5–43.5% G+C in yellow, and > 43.5% G+C in dark yellow). The scale in megabasepairs is indicated on the outside of the genome circles, with the origin of replication at position 0. Reprinted with permission from Ref. [2]. (This figure also appears on page 347.)


Fig. 16.2 The virulence gene cluster locus in the six species of the genus Listeria. The cluster is flanked by the housekeeping genes (black boxes) prs, vclB (lmo0209), and ldh in all six species. Genes controlled by PrfA are shown as boxes with black arrowheads. vclA (lmo0208) is present in all species except L. grayi. vclP is present in L. welshimeri, L. seeligeri, and L. ivanovii. vclZ (lmo0207) is present in L. monocytogenes and L. innocua. vclY and vclX are inverted in L. seeligeri. Species-specific genes (medium gray) not under PrfA control include vclJ, vclF1, vclG1, vclG2, vclF2 of L. grayi, and vclC of L. seeligeri. Homologous genes are represented by boxes of the same color. Reprinted with permission from Refs. [73, 75]. (This figure also appears on page 350.)


Fig. 19.2 Domain architectures for select surface proteins from the pathogenic protozoa, demonstrating the almost exclusive nonoverlap in the catalogs of extracellular proteins. Red stars indicate proteins composed of domain(s) invented in a lineage- or clade-specific fashion (e.g., a molecule found in Plasmodium but not in the apicomplexan Cryptosporidium or in kinetoplastids). The remaining domains originated either via vertical inheritance (followed by gene loss in widespread lineages) or by lateral transfer from a bacterial or metazoan source. (This figure also appears on page 425.)


I Methods


1 Bioinformatics: Data Mining Among Genome Sequences

Susanne Kneitz and Thomas Dandekar

1.1 Systematic Genome Analysis of Pathogens as a Basis for Pharmacogenomic Strategies

Rather than identifying individual resistance events and fighting them using a classical pharmacological strategy, systematic genome analysis of the pathogen presents a more ambitious but very powerful pharmacogenomic strategy by which to identify new treatments against the pathogen. The basis for the systematic analysis of genomes is solid experimental data, in particular relating to pathogenicity factors. Current methods allow large-scale collection of data on a genomic scale involving the complete genome sequence (e.g., by DNA capillary sequencers), an overview of the transcriptome (e.g., by expressed sequence tag [EST] sequencing or serial analysis of gene expression [SAGE]), and insights into the proteome (e.g., large-scale two-dimensional gel analysis coupled with mass spectrometry). The available genome sequences of various pathogens provide a wide range of novel targets for drug design which can be identified by means of microarray analysis. For example, a recent paper describes the application of functional genomics tools such as microarrays and proteomics for the development of new drugs that are not only active against drug-resistant Mycobacterium tuberculosis but can also shorten the course of M. tuberculosis therapy [1]. On a note of caution, the quality of large-scale data is often not as good as that of single observations. Examples are uncertainties in contig assembly, repetitive sequences, and gene prediction (DNA data), and representational bias and failure to detect low-copy messengers (transcriptome data). Proteome data have particular problems: membrane proteins and highly charged proteins are not well resolved, multiple gel spots may indicate modifications of the same protein, and certain protein modifications (glycosylation, phosphorylation, etc.) are not easily detected.
Bioinformatic tools are nevertheless the key to systematic analysis of these data with the aim of fighting development of resistance in the best way possible and of devising strategies against the pathogen on a rational, pharmacogenomic basis. They exploit a number of approaches such as the analysis of gene expression, pathway modeling, and detailed, database-guided biochemical analysis of the various steps leading to survival of the pathogen. Sequence information about the genes, ESTs, or proteins involved is essential for starting the genome analysis. Iterative sequence alignments [2] offer a much improved way of detecting sequence similarities: all newly identified related candidate sequences are aligned iteratively, and position-specific scoring matrices are used to search for families of related proteins in different genomes. Specific motifs in proteins are recognized by tools such as PROSITE [3]. Detailed analysis of gene expression with software such as R and the Bioconductor packages [4] allows predictions about activated or repressed genes, gene clusters, groups, and pathways in pathogens. There follows a detailed analysis of all pathogen genes involved (and, if data are available, interacting host genes), enzymes, and pathways by the application of methods such as pathway alignment and elementary mode analysis [5].

1.2 Direct Sequence Annotation Tools for Functional Genomics

With the advent of large-scale sequencing techniques, many sequenced pathogenic genomes are now available. This allows comparative genomics to be carried out on a wide scale, but also requires the representation of data on individual genomes in a suitable format. Analyzing novel sequences from a large sequencing effort or a complete genome involves a number of different tasks. The first is the identification of transcripts (including splicing in eukaryotic genomes) and the determination of reading frames using programs such as Genescan, Orpheus, Genepredict and Prophet [5]. The next step is the analysis of mRNA, including identification of regulatory elements. A helpful tool we have developed for this is the RNA analyzer [6], which identifies individual regulatory elements in RNA sequences using a decision tree and individual subprograms that execute sequence and secondary structure searches for various elements. It exploits fast folding routines from the Vienna package [7]. Similarly, the program UTRscan [8] identifies a number of further regulatory elements in the mRNA. Pathogenomics should also profit from insights into new regulatory features such as riboswitches, in which RNA structures forming aptamers mediate metabolite-dependent translational repression [9]. Detection software is available for these structures as well [10].
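The determination of reading frames mentioned above can be illustrated with a very small sketch. It is not the algorithm of any of the programs named in the text; it merely scans all six frames of a DNA string for stretches that run from ATG to the next in-frame stop codon. The function names, the `min_codons` threshold, and the toy sequence are assumptions made for illustration only.

```python
# Minimal open-reading-frame scan: a teaching sketch, not a gene predictor.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end, orf) tuples for ATG...stop stretches.

    start and end are 0-based, end-exclusive offsets on the scanned strand.
    min_codons counts the codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the frame
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ATG in this frame
    return orfs


# Hypothetical toy sequence with one short reading frame on the + strand.
toy = "CCATGAAACCCGGGTAACC"
for orf in find_orfs(toy):
    print(orf)
```

Real gene finders additionally model codon usage, ribosome binding sites, and (in eukaryotes) splice sites, which is why dedicated programs are used in practice.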

1.3 Identification of Protein Function

The next task is to determine the complete repertoire of protein functions. Note that one should first establish all molecular functions of a protein. This means taking stock of all the different domains contained in an individual protein using tools such as SMART (annotation of functional domains) [11] and COG (clusters of orthologous genes, i.e., genes with similar function) [12]. As another example of a tool, the AnDOM (“annotation of domains”) server allows the analysis of structural domains in given protein sequences in order to identify parts of the sequence that are homologous to a known three-dimensional structure [13]. It utilizes position-specific scoring matrices (PSSMs) made from a large alignment of homologous sequences to individual structural domains (the parts of a protein folding as independent folding units with a specific function such as catalysis or cofactor binding) of known experimental structure (using PDB [protein data bank] as a reference database). Comparing a query sequence to the stored PSSMs allows rapid identification of any structural domains homologous to a known structure domain according to SCOP, structural classification of proteins [14] (http://scop.mrc-lmb.cam.ac.uk/scop/).
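To make the idea of a position-specific scoring matrix concrete, the following minimal sketch builds a log-odds PSSM from a small ungapped alignment and slides it along a query sequence. It is not the AnDOM implementation: the uniform background, the pseudocount, the toy alignment and query, and the function names `build_pssm` and `best_window` are all assumptions chosen for illustration.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, pseudocount=1.0):
    """Log-odds PSSM from equal-length, ungapped sequences.

    Assumes a uniform background frequency (1/20) for every residue.
    """
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(len(aligned[0])):
        counts = {aa: pseudocount for aa in ALPHABET}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2(counts[aa] / total / background) for aa in ALPHABET}
        )
    return pssm


def best_window(pssm, query):
    """Slide the PSSM along the query; return (best_score, start_index)."""
    width = len(pssm)
    scored = [
        (sum(pssm[i][query[start + i]] for i in range(width)), start)
        for start in range(len(query) - width + 1)
    ]
    return max(scored)


# Hypothetical toy alignment of a short redox-motif-like segment.
alignment = ["WCGPC", "WCAPC", "WCGPC", "FCGPC"]
query = "MKLVDFWCGPCKAL"  # hypothetical query sequence

score, start = best_window(build_pssm(alignment), query)
print(f"best match at position {start} with score {score:.2f}")
```

Production tools derive the matrices from large alignments with sequence weighting and realistic background frequencies, and they report statistical significance rather than a raw score.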

1.4 Obtaining Protein Information from a Domain Server

As an example of information thus readily available from a protein sequence, consider the output of AnDOM [13]. It shows the regions homologous to known three-dimensional structural domains which are hidden in a given sequence. For instance, plasmoredoxin from Plasmodium falciparum [15] can be annotated using its sequence by employing the AnDOM server. Genbank entry AAF87222 contains the plasmoredoxin [thioredoxin-like redox-active protein (Plasmodium falciparum)] sequence. The server is located at http://andom.bioapps.biozentrum.uni-wuerzburg.de/cgi-bin/start_impala.cgi. A green bar shows that the main central part of the structure is homologous to the domain in crystal structure 1o8x from the SCOP family c.47.1.10, which is the tryparedoxin I from Crithidia fasciculata. Its residues 22–141 structurally match residues 39–157 of the plasmoredoxin sequence, further confirming its predicted and experimentally confirmed role in redox metabolism [15]. The color coding reflects the predicted individual three-dimensional structures of the domains according to homology: green represents those parts of the structure where there is a mixture of an α-helix part plus a β-strand part situated in this domain (SCOP class 3), blue represents folding class α/β (SCOP class 4, helix follows strand follows helix and so on, often in catalytic domains), and violet would correspond to a multidomain protein (SCOP class 5). The positions of the main structural domains are shown graphically at the top; next, the output displays the similarities to all protein structural domains found, including the names of the domains, significance, and references to sequence comparisons. Finally, the detailed domain alignments are given by the server according to their homology to known structures. In this way, the first structural clues to potential drug targets of a pathogen become immediately available. Several other structure predictors are also available [16].
Protein interactions are important because they reveal which proteins partner with each other within the infectious organism, within the host, and between the two. Conservation of gene order [17] and gene fusion events as well as the combined presence or absence of clusters of genes are good predictors in prokaryotes of direct interactions at the protein level of the encoded gene products. Gene context conservation is easily detected with the STRING software [18]. A huge database of more than 100 genomes is used in STRING to determine whether there is conservation of gene neighborhood or a gene fusion event in genomes related to the organism and the protein gene used as the query. For predictions of protein interactions in eukaryotic pathogens such as trypanosomes or plasmodia, gene context is less reliable; however, recent versions of the STRING software include predictions based on text mining, two-hybrid assays, coexpression, and homology to allow helpful predictions about physical association, or at least functional association, of proteins in a protein cluster in eukaryotes as well. Functional association allows predictions about substrate specificity (e.g., a tryptophan pathway metabolite) by looking at the cluster with which the gene of interest is associated (in this example this would be tryptophan-metabolizing enzymes). In this way even proteins of unknown function can be put into a functional context and their substrate predicted, if the exact biochemical function of at least one protein of the cluster predicted by STRING to be associated is known. These predictions are complementary to sequence analysis methods where homology allows prediction of protein function, but homology is less clear in relation to the exact substrate specificity. Recognizing all the hidden features in a pathogenic genome is a continuous process, as new software, input from additional experimental data, and the exponential growth of databases allow new insights; for example, five years after the original annotation of a pathogenic genome as exemplified by Mycoplasma pneumoniae, about a third of its annotation can be substantially improved by these new data and techniques [5, 19].
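The gene-neighborhood argument can be sketched in a few lines. The example below is not the STRING algorithm; it simply counts in how many genomes two ortholog groups lie next to each other and keeps the pairs that recur. The genome names, gene orders, the `window` and `min_genomes` parameters, and the function name are hypothetical, and the sketch ignores strand, operon structure, and chromosome circularity.

```python
from collections import Counter


def conserved_neighbors(genomes, window=1, min_genomes=2):
    """Count genomes in which two ortholog groups lie within `window` genes.

    `genomes` maps a genome name to its gene order, given as a list of
    ortholog-group identifiers. Returns {(group_a, group_b): n_genomes}.
    """
    support = Counter()
    for order in genomes.values():
        seen = set()  # count each pair at most once per genome
        for i, gene in enumerate(order):
            for other in order[i + 1:i + 1 + window]:
                if other != gene:
                    seen.add(frozenset((gene, other)))
        support.update(seen)
    return {
        tuple(sorted(pair)): n for pair, n in support.items() if n >= min_genomes
    }


# Hypothetical gene orders; "orfX" stands for a gene of unknown function.
genomes = {
    "genome_A": ["trpA", "trpB", "orfX", "recA"],
    "genome_B": ["recA", "gyrB", "orfX", "trpB", "trpA"],
    "genome_C": ["trpB", "orfX", "gyrB", "trpA"],
}

print(conserved_neighbors(genomes, min_genomes=3))
```

In this toy data set only the pair of `orfX` and `trpB` is adjacent in all three genomes, which is the kind of evidence that would place an uncharacterized gene in a tryptophan-related functional context.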
Automated data and software platforms such as GenDB [20], Pedant [21], and Magpie [22] for annotation together with integrated databases [23] increase annotation speed and allow systematic comparison of genomes. The combination of these and similar tools allows rapid annotation of genomes and rapid identification of pathogen-specific features such as host interaction factors and toxins.

1.5 Pathway Analysis

At the next level, the pathway perspective, the protein reading frames encoding individual enzyme activities are assembled and combined into specific pathways. A concise and integrated way of comparing pathway results from different genomes is comparative pathway alignment. Species-specific differences in terms of the presence or absence of individual enzymes are identified by insertions or gaps in this comparative table [5]. In general, potential target structures for medical intervention (e.g., for drug design of an inhibitor against a biosynthetic enzyme in the pathogen of interest) should not occur in the human patient (or at least the homologous human protein should have clear structural and affinity differences). The occurrence of the target in several pathogens will make the target structure more interesting from an application point of view (e.g., cell wall synthesis enzymes in gram-positive bacteria, with a potential to lead to alternative drugs to penicillin-like substances).
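A comparative pathway alignment of this kind is, at its simplest, a presence/absence table of enzymes per genome. The sketch below builds such a table and selects enzymes that occur in every pathogen but not in the host. The enzyme labels E1 to E4, the genome names, and both function names are placeholders invented for illustration; they do not refer to real annotations.

```python
def pathway_alignment(pathway, genomes):
    """Return one row per enzyme: (enzyme, {genome: present?})."""
    return [
        (enzyme, {name: enzyme in enzymes for name, enzymes in genomes.items()})
        for enzyme in pathway
    ]


def candidate_targets(pathway, genomes, host, pathogens):
    """Enzymes absent from the host but present in every listed pathogen."""
    return [
        enzyme
        for enzyme in pathway
        if enzyme not in genomes[host]
        and all(enzyme in genomes[p] for p in pathogens)
    ]


# Hypothetical pathway and enzyme content per genome.
pathway = ["E1", "E2", "E3", "E4"]
genomes = {
    "host": {"E1", "E3"},
    "pathogen_1": {"E1", "E2", "E3", "E4"},
    "pathogen_2": {"E1", "E2", "E4"},
}

# Print the table; "-" marks a gap (enzyme not found in that genome).
print("enzyme  " + "  ".join(genomes))
for enzyme, row in pathway_alignment(pathway, genomes):
    marks = "  ".join(("+" if row[g] else "-").center(len(g)) for g in genomes)
    print(f"{enzyme:<8}{marks}")

print(candidate_targets(pathway, genomes, "host", ["pathogen_1", "pathogen_2"]))
```

A simple absence test is only a first filter; as the text notes, a host homolog with clear structural and affinity differences may still leave an enzyme usable as a target.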

1.6 Network Analysis

Finally, the different pathways are parts of cellular networks. Here the challenge is to define individual pathways in such a network in a clear and mathematical way. This is necessary both for a concise description of the network capabilities and for prediction of the effects of inhibition of a particular enzyme: in many instances alternative routes through the network still allow most of the metabolites to be produced. Inhibition of an enzyme that the metabolic network can compensate for is in most cases compatible with life in prokaryotic cells. Genes which encode such nonessential enzymes are themselves nonessential, and their loss leads to a nonlethal phenotype. The nonessential enzymes for a given metabolic network can easily be identified by calculation of the elementary modes. These are nondecomposable (“elementary”) sets of enzymes. Each set can sustain a steady state for all internal metabolites used by this set of enzymes as substrates and products. The external metabolites used by each enzyme set need not fulfil this equilibrium condition. Using this mathematical requirement, the software METATOOL [24] calculates all elementary modes for a given metabolic network. Testing conditions are included in this program so that the stable flux modes calculated are elementary, i.e., nondecomposable into subsets which fulfil the steady state condition for internal metabolites. This helps to answer the above questions: any observed network flux state is always a linear combination of the elementary modes. Inhibition of a given enzyme will thus inhibit exactly those elementary modes in which this enzyme occurs [25]. Recent research from our laboratory shows that this perspective of enzymes directing metabolite flows can also be turned around: metabolites also shape the way in which pathways evolve.
This is strikingly shown by the observation that the structure of metabolic networks and the substrate specificities of their enzymes tend to be driven by the most frequently represented metabolites of the network. This partly explains the observed widespread recruitment of enzymes to new pathways, which allows pathogens to change rapidly and adapt to hostile environments including xenobiotics [26] and antibiotics. In this way connections and regulatory networks may be sketched, in particular the induction of antibiotic resistance and pathways involved in signal transmission and activation of pathogenicity factors. For instance, subtle genome variations within a 150-kbp pathogenicity island alter the virulence of Enterococcus strains and most known auxiliary traits that enhance the virulence of the organism [27]. Genome-based approaches also identify new targets associated with disease which are not apparent in the commensal behavior of harmless enterococci. Methicillin-resistant Staphylococcus aureus (MRSA) strains can be distinguished by typing the spa gene, which encodes S. aureus protein A. Efficient software for this is available, allowing rapid determination of the dynamics of resistant clones in a hospital setting [28].

After networks have been identified, specific proteins can be singled out as the most promising targets for drug design. A number of bioinformatic strategies such as homology modeling, pharmacophore modeling, and virtual ligand screening are at our disposal today. New work targeted against resistance development also exploits new approaches to differentiating between target structures and related structures in the human host, e.g., rapid identification of structural domains on a genome scale (using, for example, the server of ref. [13]) with subsequent targeting of parasite-specific protein domains.
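The elementary-mode reasoning of this section can be illustrated with a minimal sketch. It assumes a hypothetical toy network of four enzymes (E1 to E4) and two internal metabolites (A and B); the stoichiometry and the two modes are invented for illustration and are not METATOOL output. The code checks the steady-state condition for internal metabolites, forms a linear combination of modes, and lists the modes that survive inhibition of one enzyme.

```python
# Hypothetical toy network (illustrative only, not taken from the chapter).
# Stoichiometric coefficients are given for internal metabolites A and B only:
#   E1: uptake -> A    E2: A -> B    E3: A -> B (alternative route)    E4: B -> export
STOICH = {
    "E1": {"A": 1},
    "E2": {"A": -1, "B": 1},
    "E3": {"A": -1, "B": 1},
    "E4": {"B": -1},
}

# The two elementary modes of this toy network, as enzyme -> relative flux.
MODES = [
    {"E1": 1, "E2": 1, "E4": 1},
    {"E1": 1, "E3": 1, "E4": 1},
]


def net_production(flux):
    """Net production rate of each internal metabolite for a flux distribution."""
    net = {}
    for enzyme, rate in flux.items():
        for metabolite, coeff in STOICH[enzyme].items():
            net[metabolite] = net.get(metabolite, 0) + coeff * rate
    return net


def is_steady_state(flux):
    """True if no internal metabolite accumulates or is depleted."""
    return all(value == 0 for value in net_production(flux).values())


def combine(weights):
    """Linear combination of the elementary modes with the given weights."""
    total = {}
    for weight, mode in zip(weights, MODES):
        for enzyme, rate in mode.items():
            total[enzyme] = total.get(enzyme, 0) + weight * rate
    return total


def surviving_modes(inhibited):
    """Modes that do not use the inhibited enzyme and therefore stay available."""
    return [mode for mode in MODES if inhibited not in mode]


if __name__ == "__main__":
    print(is_steady_state(combine([2, 3])))  # a mixed flux state built from both modes
    print(surviving_modes("E2"))             # the route via E3 remains
    print(surviving_modes("E1"))             # no route remains without uptake
```

In this toy case inhibition of E2 is compensated by the route through E3, whereas E1 occurs in every mode and its inhibition leaves no mode available.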

1.7 Adaptation in Time and to Stimuli

Pathogens survive in a harsh environment. There is a constant race between host defense and parasite escape. In addition, their life cycle often involves abrupt changes of habitat (e.g., bloodborne, airborne, waterborne, intracellular, in a macrophage, in the gut). Quick adaptation is critical, and rapid changes can be monitored using gene expression analysis. Microarray analysis has now become the method of choice for rapid capture of genome-wide adaptation (including changes within minutes). The essentials are summarized below. In this technology the probes are either cDNA or other DNA sequences spotted on a glass support, or oligonucleotides synthesized directly on the array using photolithographic masks. RNA samples are fluorescently labeled and hybridized to the arrayed probes. The signal intensities of the hybridized spots are used to estimate relative RNA concentrations. Adaptation on longer time scales is then reflected in the proteome.

1.7.1 Experimental Design for Microarray Analysis

The large-scale screening of thousands of genes at the same time by microarrays would be impossible without bioinformatics. Data analysis is becoming more and more challenging with the increasing complexity of experimental designs and biological systems. Large-scale screening of genes requires careful experimental design, as there are some shortcomings for which bioinformatics cannot compensate. The final goal needs to be taken into account. For example, a diagnostic tool monitoring a severe infection or even sepsis will certainly need much higher confidence levels, and thus higher sample numbers, than a random screen for possibly interesting genes. The exploration of a single pathway on a DNA chip (e.g., the pentose phosphate cycle in the human pathogen Listeria monocytogenes when switching from extracellular to intracellular survival) with a limited number of genes may make global normalization impossible; a sufficient number of independent sequences from housekeeping genes and external controls then need to be spotted on the array for signal intensities to be adjusted. In dual-labeling experiments, if more than two samples are to be compared, it is useful to run each sample against a common internal standard so as to be able to compare relative signal intensities. This internal standard might be some control RNA closely related to the test samples or a commercially available reference RNA. To introduce as little variation as possible, a sufficient amount of reference RNA for all arrays planned in one study should be prepared, pooled, and aliquoted at the beginning of the study. To avoid label-specific false positive signals, it is essential to include dye-switch experiments if two samples are being compared on the same slide. Pooling of several samples may cancel out real differences and at the same time increase the false positive rate due to dominating outliers. Finally, different hybridization protocols will give significantly differing results, and therefore protocols, including possible RNA amplification, must not be changed during the experiment.

1.7.2 Data Analysis

The data analysis and experimental design need to be adapted to each other. Since careful analysis of microarray data yields an enormous amount of information about rapid pathogen adaptations to environmental changes, we will summarize important steps in this analysis. The field is still evolving and methods are continually being improved; furthermore, no single method serves all purposes. Figure 1.1 gives a brief outline of a standard scheme for the analysis of the data. Image acquisition is usually by a camera or a scanner and a frame grabber (ADC, analog-to-digital converter). Although images are regarded as raw data, different acquisition equipment and settings can have a significant effect on the data that are produced. Several methods are available for image quantification. Widely used freely available tools are Spotfinder by TIGR (http://www.tigr.org/software/tm4/spotfinder.html) and ScanAlyze from Stanford (http://www.microarrays.org/software.html). Note that TIGR (The Institute for Genomic Research) also offers a valuable and constantly updated resource for microbial genomes, including almost all publicly available pathogen genomes sequenced to date. When a system-integrated database is used, signal intensities can be automatically assigned to the corresponding gene/EST. If no database is available, this must be done manually, e.g., using the MS Access database, which is available on most computers. Usually spot location can be used as a link between data and description tables. After quantification, for the comparison of arrays, it is necessary to normalize signal intensities in order to reduce systematic variation as far as possible. Most normalization methods are based on the assumption that the data are normally distributed, so it is necessary to log-transform the data prior to normalization.
Fig. 1.1 Overview of microarray data analysis. Typical steps in gene array analysis are shown. Normalization, gene reduction, clustering, and detailed database analysis are necessary before a well-founded biological interpretation of the data is possible. vsn, variance stabilization; lowess, locally weighted regression scatter plot smoothing.

Widely used normalization methods are variance stabilization (vsn) [29] and locally weighted regression scatter plot smoothing (lowess) [30]. Different normalization methods and most of the following data analysis procedures are implemented in a software package called Bioconductor (http://www.bioconductor.org/, [31]) built on the R environment for statistical computing (http://www.R-project.org/, [32]). After normalization, if there are only two samples to compare, abundances may be compared by a simple fold change or by statistical tests such as Student’s t-test. Yet, since there is usually only a small number of replicates, there is a high risk of underestimating the variance and thus obtaining artificially low p-values. Randomly permuting the data and counting the genes that exceed a certain threshold can help to estimate the false positive rate. When there are more than two experimental conditions, the signals resulting from a gene in the different test samples should be compared using relative signal intensities. Cluster analysis is then used to group the genes according to their behavior in the experiments. Some cluster algorithms lose power due to noisy or nonsignificant signals. Since genes which do not change during an experiment will not provide much information for differentiation, genes with a higher variance can be selected or, if the number of classes of genes is known, p-values can be used for gene selection.

Unsupervised learning is used if classes or their labels are not known a priori. Easy-to-use array data analysis programs for class discovery on a Windows platform are provided by, for instance, Stanford and TIGR. Unsupervised algorithms such as hierarchical clustering, self-organizing maps (SOMs) [33], K-means clustering, and principal component analysis (PCA) [34] are implemented in these tools. All of these algorithms are also implemented in Bioconductor. The most commonly used cluster methods in microarray data analysis are hierarchical clustering and K-means clustering. Hierarchical clustering and the graphic representation of the resulting trees give a good overview of distinct groups within a dataset, but the interpretation of larger groups may be complicated.
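The permutation idea mentioned above can be sketched as follows. This is a minimal illustration, assuming log-transformed intensities, two groups of replicates coded 0 and 1, and the absolute difference of group means as the test statistic; the function names and the threshold are hypothetical choices and not part of any of the tools cited in the text.

```python
import random
from statistics import mean


def count_hits(data, labels, threshold):
    """Count genes whose absolute difference of group means exceeds the threshold.

    data   : one list of log-transformed intensities per gene (one value per array)
    labels : group label (0 or 1) for each array; both groups must be non-empty
    """
    hits = 0
    for values in data:
        group_a = [v for v, g in zip(values, labels) if g == 0]
        group_b = [v for v, g in zip(values, labels) if g == 1]
        if abs(mean(group_a) - mean(group_b)) > threshold:
            hits += 1
    return hits


def estimate_false_positives(data, labels, threshold, n_perm=200, seed=0):
    """Return (observed hits, mean number of hits after shuffling the array labels)."""
    rng = random.Random(seed)
    observed = count_hits(data, labels, threshold)
    null_counts = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # breaks any real association between arrays and groups
        null_counts.append(count_hits(data, shuffled, threshold))
    return observed, mean(null_counts)


if __name__ == "__main__":
    toy_data = [
        [0.1, -0.2, 0.0, 3.1, 2.9, 3.3],   # gene that differs between the groups
        [1.0, 1.1, 0.9, 1.0, 1.2, 0.8],    # unchanged gene
    ]
    toy_labels = [0, 0, 0, 1, 1, 1]
    print(estimate_false_positives(toy_data, toy_labels, threshold=2.0))
```

The ratio of the second returned value to the first gives a rough estimate of the proportion of false positives among the selected genes; with very few replicates the number of distinct permutations is small, so the estimate is correspondingly coarse.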
K-means clustering requires the number of clusters (k) to be specified in advance. Each sample is first randomly assigned to one cluster. The distance between the center of each cluster (centroid) and each sample is then calculated, and the samples are iteratively reassigned to the nearest centroid. Each sample is assigned to exactly one centroid. To identify the optimal value of k it is advisable to test several values. K-means clustering is a fast algorithm, suitable for large datasets, but can be affected by outliers.

For a dataset with known class labels, supervised learning methods are superior to unsupervised methods. Based on a training set, supervised methods try to construct a classifier, which can be used to identify the nature of an unknown case. Different algorithms are used for the analysis of microarray data. Decision trees, as described by Breiman et al. [35], construct a classifier based on hierarchically arranged separation rules. They produce models that can be interpreted quite easily by humans. However, to obtain a reasonable result with decision trees, it is imperative to reduce the number of genes first. Neural networks are statistical models used for pattern recognition and classification. The network is composed of a large number of interconnected nodes. Neural networks learn from example data; they are not explicitly programmed to perform a specific task. The network finds out how to solve the problem by itself; its operation can be unpredictable and thus hard to interpret. Support vector machines


(SVM) are classifiers which try to linearly separate data vectors of different classes by so-called hyperplanes in a high-dimensional space [36] (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). Like neural networks they do not generate gene lists; however, in most expression profiling studies the goal is the identification of differentially expressed genes. To identify discriminative genes, subsequent statistical methods or classification using the cluster number as class label may be used. If two genes are found in the same cluster, this reveals similarities of their expression profiles, but not the reasons for these similarities. Statistical significance does not always mean biological relevance. Coregulated genes may be controlled by the same regulatory pathway or share a specific promoter, but apparent coregulation may also be biological noise. Tools for direct sequence annotation and pathway analysis as discussed above can help to interpret the results and thus promote improved understanding of the biological function of the genes.

Finally, it should be borne in mind that expression analysis can only measure RNA abundances, but gene expression and protein abundance are regulated at many steps. These steps include transcriptional control, alternative splicing, transport and location control, and mRNA degradation control. All these control mechanisms can cause discrepancies between gene expression on the mRNA level and the protein level in response to perturbations. In addition, alternative splicing can produce different forms of a protein from the same gene, so the direct transfer of results may be ambiguous or problematic. To cope with all these different technical challenges in the analysis of gene expression arrays, useful tools for array analysis are summarized in Table 1.1.

Tab. 1.1 Tools to analyze pathogen gene expression changes using microarrays.

The R-project for statistical computing: http://www.r-project.org/
Bioconductor home page: http://www.bioconductor.org/
Useful Bioconductor packages
  Normalization: vsn, marray, limma
  MA plot: limma, marray
  Principal component analysis: stats
  K-means clustering: stats
  Hierarchical clustering: cluster
  Self-organizing maps: SOM
  Regression trees: rpart
  Support vector machines: Vienna package e1071
  Partitioning around medoids: pamr
Free Affymetrix annotation tool NetAffx: http://www.affymetrix.com/analysis/index.affx
Stanford tools: http://www.microarrays.org/software.html
TIGR tools: http://www.tigr.org/software/
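The K-means procedure described earlier in this section can be written down in a few lines. The following is a minimal sketch in plain Python rather than a replacement for the R/Bioconductor implementations listed in Table 1.1; the function names are hypothetical, Euclidean distance is assumed, and the handling of an empty cluster (re-seeding it with a randomly chosen sample) is an assumption of this sketch that the text does not specify.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of profiles."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def k_means(samples, k, max_iter=100, seed=0):
    """Cluster samples into k groups; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Each sample is first assigned to a random cluster, as described in the text.
    assignment = [rng.randrange(k) for _ in samples]
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for cluster in range(k):
            members = [s for s, a in zip(samples, assignment) if a == cluster]
            # Assumption of this sketch: an empty cluster is re-seeded with a random sample.
            centroids.append(centroid(members) if members else list(rng.choice(samples)))
        # Reassign every sample to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in samples
        ]
        if new_assignment == assignment:
            break  # assignments are stable
        assignment = new_assignment
    return assignment, centroids


if __name__ == "__main__":
    profiles = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
    for k in (2, 3):  # it is advisable to test several values of k
        print(k, k_means(profiles, k))
```

Because the starting assignment is random, different seeds may give different solutions for less clearly separated data, which is one more reason to repeat the analysis with several values of k and several starts.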

1.8 Pathogen-Specific Challenges

Several software tools are geared to specific pathogen challenges. Thus, the antigenicity of a given protein sequence can be predicted. This yields information about whether a stretch of a pathogen protein will elicit a strong immune response in the host. Several empirical scales and algorithms to calculate this are available; well-known ones are the Kyte and Doolittle and the Hopp and Woods scales [37, 38]. A server for such predictions can be found at http://occawlonline.pearsoned.com/bookbind/pubbooks/bc_mcampbell_genomics_1/medialib/activities/kd/kyte-doolittle.htm. These predictions do not really take account of structural information, so combination with the AnDOM server (or similar tools) is useful to identify the best patches of a peptide for raising a strong immune response in pathogen research efforts. Another modern tool, the BcePred server (http://www.imtech.res.in/raghava/bcepred/) [39], uses physicochemical properties to predict continuous B-cell epitopes. More sophisticated methods of homology modeling and refinement will be needed in addition for vaccine development [40].

Further important information about a pathogen can be deduced from the complete genome sequence in addition to its open reading frames. Thus, pathogenicity islands can be detected by genome comparisons to related nonpathogenic strains: these are specific structures in the genome containing clusters of genes responsible for pathogen interaction with the host, toxin production, resistance, etc. These clusters often arise during the evolutionary change from a nonpathogenic to a pathogenic life style and often involve the acquisition of new genes transferred by a larger recombination event, e.g., from phage insertions into the genome. Such an event can often be detected by monitoring the nucleotide composition over the whole genome: insertion regions from phages or other major insertion events often show up by their AT and GC content, which differs from that of their flanking regions and the rest of the genome.

Rapid identification of pathogen-specific features is possible by differential genome analysis, which compares a range of related genomes to the pathogen genome and uses extensive sequence comparisons to identify orthologous gene functions. The above two methods, identification of pathogenicity islands and differential genome analysis, readily identify groups of virulence factors in the pathogen in question, e.g., for Haemophilus influenzae and Helicobacter pylori [41], or, as a clinically even more important example, in enterohemorrhagic and nonpathogenic Escherichia coli [43].
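The composition-based detection of insertion regions described above can be sketched as a sliding-window scan of GC content. This is a minimal illustration: the window size, step, and deviation cut-off are arbitrary assumptions chosen for the example, the function names are hypothetical, and a real analysis would work on a full genome sequence read from a file.

```python
def gc_content(seq):
    """Fraction of G and C nucleotides in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(genome, window, step):
    """GC content of successive windows as (start position, GC fraction) pairs."""
    return [
        (start, gc_content(genome[start:start + window]))
        for start in range(0, len(genome) - window + 1, step)
    ]


def atypical_windows(genome, window=1000, step=500, max_deviation=0.10):
    """Windows whose GC content deviates from the genome-wide value by more than the cut-off."""
    overall = gc_content(genome)
    return [
        (start, gc)
        for start, gc in sliding_gc(genome, window, step)
        if abs(gc - overall) > max_deviation
    ]


if __name__ == "__main__":
    # Toy sequence: an AT-rich background with a GC-rich insert in the middle.
    toy_genome = "AT" * 50 + "GC" * 50 + "AT" * 50
    print(atypical_windows(toy_genome, window=100, step=100, max_deviation=0.5))
```

Windows flagged in this way are only candidates; as the text notes, confirmation comes from comparison with related nonpathogenic strains and from the gene content of the region.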


A further approach uses clusters of orthologous genes (COGs) [12]. Several pathogenicity factors can be identified directly by sequence analysis, by assigning them to a cluster of orthologous genes which encodes a virulence factor. Other, more advanced methods of sequence analysis add to this repertoire for finding pathogen-specific genes or protein functions (see also above), e.g., predicting pathogen-specific gene expression by looking at common transcription factor binding sites in the target genes using the TESS server (http://www.cbil.upenn.edu/tess/).

1.9 Pathogen Adaptation Potential

Another important question concerns pathogen adaptation potential. In particular, the redundancy in enzymatic pathways allows pathogens to escape antibiotic attack or inhibition of enzyme activities. To calculate metabolic capabilities in this respect, elementary mode analysis (see above) [5] is useful: the elementary modes are the basic metabolic paths in the metabolic network. Scrutinizing the list of calculated modes for a specific pathogen reveals which metabolites require which enzymes for their synthesis. Consideration of subnetworks makes this analysis easier [42]. In some cases the metabolic web of a pathogen allows the compounds required for growth to be produced by alternative routes. However, some key enzymes can be singled out which either are central to many paths or are unique in their capabilities, so that no detour around their blockage is possible. Similarly, the capabilities for metabolic adaptation to a wide range of environmental conditions can be estimated using elementary mode analysis. Another way to look at enzyme flexibility in pathogens is a more comprehensive evolutionary perspective. Here all our data [26, 44, 45] indicate that widespread recruitment of enzymes to new pathways allows pathogens to develop, within surprisingly few generations, new pathways to cope with new man-made antibiotics.
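The search for key enzymes with no detour, as described above, amounts to simple set operations once a list of elementary modes is available. The sketch below assumes a hypothetical list of three modes, each annotated with its enzymes and the product it yields; in practice such a list would come from a program such as METATOOL, whose output format is not reproduced here.

```python
# Hypothetical elementary modes (illustrative only): the enzymes each mode uses
# and the external product it yields.
MODES = [
    {"enzymes": {"E1", "E2", "E5"}, "product": "P"},
    {"enzymes": {"E1", "E3", "E5"}, "product": "P"},
    {"enzymes": {"E1", "E4"}, "product": "Q"},
]


def indispensable_enzymes(modes, product):
    """Enzymes present in every mode that yields the product, i.e. with no detour."""
    enzyme_sets = [m["enzymes"] for m in modes if m["product"] == product]
    return set.intersection(*enzyme_sets) if enzyme_sets else set()


def products_lost(modes, inhibited):
    """Products that can no longer be made when the given enzyme is blocked."""
    all_products = {m["product"] for m in modes}
    still_possible = {m["product"] for m in modes if inhibited not in m["enzymes"]}
    return all_products - still_possible


if __name__ == "__main__":
    print(indispensable_enzymes(MODES, "P"))  # enzymes shared by both routes to P
    print(products_lost(MODES, "E2"))         # the route via E3 compensates
    print(products_lost(MODES, "E1"))         # E1 is central to all paths
```

Enzymes such as the hypothetical E1, which occur in every mode, correspond to the central enzymes mentioned in the text, while E4 is an example of an enzyme that is unique for one product.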

1.10 The Fight Against Resistance

Good new approaches are available to prevent resistance. In particular, the calculated enzyme flexibility is very seldom so high that a simultaneous attack on several pathways would not have a good chance of success [44, 45]. The evolutionary flexibility of pathogens can be estimated by considering the genetic mechanisms available for immune escape and new mutations: Are there specific antigen-shifting mechanisms? What about recombination, resistance plasmids, genome instability, or transposons, to name just some of the more common mechanisms? Interestingly, this list shows that meticulous genome annotation (see above) is in fact quite important in determining and assessing pathogen flexibility in this genetic respect. Another important consideration is the set of typical resistance mechanisms against antibiotics. Fortunately, similar mechanisms are used in many pathogens: resistance is often achieved by (a) inactivation of the attacking antibiotic, (b) modification of the antibiotic target within the bacterial cell, (c) increased transport of the antibiotic out of the cell, or (d) prevention of antibiotic uptake. Combined strategies, which use both a standard antibiotic and resensitization by blocking such typical resistance mechanisms, are thus quite promising, and a bioinformatic prediction of the capabilities with regard to these four mechanisms can be made for the pathogen in question.

1.11 Drug Design and Antibiotics

Drug design follows the basic principle that the target enzyme to be blocked by an antibiotic should not occur in the human patient, but should occur in all pathogens one wishes to control with the antibiotic. Furthermore, it must be essential for the growth or, even better, for the survival of these pathogenic organisms. A further improvement (see above) is the use of drug combinations together with an estimate of the number of mutations required for the development of resistance. In particular for viral pathogens such as HIV, the load in the bloodstream is so high that single-point mutations are selected in a very short time, leading to rapid drug failure if a single mutation of this kind suffices for resistance [46].
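The selection principle stated above (present in all pathogens of interest, absent from the human host, and essential) can be expressed as a small set calculation. The sketch assumes that each genome has already been reduced to a set of protein family identifiers, for example by assignment to orthologous groups; the identifiers F1 to F9, the strain names, and the essentiality list are invented for illustration.

```python
def candidate_targets(pathogen_genomes, host_genome, essential):
    """Families present in every pathogen, absent from the host, and essential.

    pathogen_genomes : dict mapping pathogen name -> set of protein family identifiers
    host_genome      : set of protein family identifiers found in the host
    essential        : set of families known or predicted to be essential
    """
    if not pathogen_genomes:
        return []
    shared = set.intersection(*pathogen_genomes.values())
    return sorted((shared - host_genome) & essential)


if __name__ == "__main__":
    # Hypothetical family identifiers, for illustration only.
    pathogens = {
        "strain_a": {"F1", "F2", "F3", "F4"},
        "strain_b": {"F1", "F2", "F4", "F5"},
    }
    host = {"F2", "F9"}
    essential = {"F1", "F2", "F5"}
    print(candidate_targets(pathogens, host, essential))
```

A list obtained in this way is only a starting point: it says nothing about the number of mutations needed for resistance, which has to be estimated separately.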

1.12 Annotation Platforms Suitable for Pathogenomics

Another area of bioinformatics research aims at easy comparison of different direct and higher-level annotation tools for specific genomes and sequences in pathogenomics. Good, powerful genome annotation platforms are already available, such as the noncommercial GenDB [20] and MAGPIE [22], or the commercial PEDANT [21] and BioScout (Lion Biosciences AG, Heidelberg). This should also be seen in the context of similar ongoing activities in different laboratories exploring the advantages of XML (extensible markup language) in bioinformatics [47]: there are new XML schemas for bioinformatics [48], EMBL data [49], and a protein markup language [50]. An XML broker exists for integration of microarray data [51], as do integrated systems for biological pathways [23]. Strong new integrated data platforms for proteomics [52], XML-based remote procedure calls [53], and an SQL (structured query language)-based server for online integration of life science data [54] have recently become available. A number of useful integrated public and user-friendly tools for general genome and pathway analysis readily available on the world wide web are summarized in Table 1.2 as a first primer for any further intended analyses.

The well-known National Center for Biotechnology Information (NCBI) creates public databases and software tools for the dissemination of biomedical information. The databases are built from sequences submitted by individual laboratories and by data exchange with the other two major international nucleotide sequence databases, the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ). In addition to GenBank, further databases for annotation are supported and made available to the scientific community. A number of software programs, in particular different implementations [2] of BLAST and iterative BLAST (PSI-BLAST), are available. Also very useful are the tutorials and the notes on the different parameters and qualifiers the programs accept. For example, the Entrez browser system offered here (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein&itool=toolbar) retrieves, in addition to journal articles, protein and nucleotide sequences from a specific pathogen of interest.

Tab. 1.2 Useful links and databases for genome and pathway analysis.

Links and general overview of databases
  Computational Molecular Biology at NIH: http://molbio.info.nih.gov/molbio/db.html
  Polish Academy of Sciences: http://www.ibb.waw.pl/biodat/05-02.html
  Sanger Institute: http://www.sanger.ac.uk/Info/Links/databases.shtml
  Washington University, St. Louis: http://library.wustl.edu/subjects/life/genet.html
NCBI databases (http://www.ncbi.nlm.nih.gov/)
  Sequence annotation
    GenBank: sequence annotation for nucleotides, protein sequence, and whole genome sequence
  Gene annotation
    UniGene: gene-oriented clusters of transcript sequences
    OMIM (Online Mendelian Inheritance in Man): catalog of human genes and genetic disorders, with links to literature references, sequence records, maps, and related databases
    Entrez Gene: map, sequence, expression, structure, function, citation, homology information and related web sites for genes
Sanger Institute gene annotation
  http://www.ensembl.org/: transcript/translation information, location, SNPs, orthologue prediction, disease matches, related web sites for genes
Stanford software and tools
  http://genome-www5.stanford.edu/cgi-bin/source/sourceSearch: unification tool which dynamically collects and compiles data from many scientific databases (batch searches possible)
KEGG pathway database: http://www.genome.jp/kegg/
SwissProt protein sequence database: http://us.expasy.org/sprot/

1.13 Conclusions

We show here that many predictions about pathogens are possible on the basis of sequence and other data by applying bioinformatic tools, including analysis of gene expression data and software designed to reveal pathogen-specific features. However, bioinformatics provides only a valuable start as it rapidly identifies potential new targets (including subtle structural differences), new leads, or further cellular processes to consider. Subsequent further development and validation of leads and targets is done in close collaboration with experimental researchers.

References

1 Zhang, Y., and L. M. Amzel. 2002. Tuberculosis drug targets. Curr. Drug Targets. 3(2):131–154.
2 Altschul, S. F., T. L. Madden, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17):3389–3402.
3 Hulo, N., C. J. Sigrist, and A. Bairoch. 2004. Recent improvements to the PROSITE database. Nucleic Acids Res. 32 Database issue:D134–137.
4 Huber, W., and R. Gentleman. 2004. Matchprobes: a Bioconductor package for the sequence-matching of microarray probe elements. Bioinformatics. 20(10):1651–1652.
5 Dandekar, T., and R. Sauerborn. 2002. Comparative genome analysis and pathway reconstruction. Pharmacogenomics. 3(2):245–256.
6 Bengert, B., and T. Dandekar. 2003. A software tool-box for analysis of regulatory RNA elements. Nucleic Acids Res. 31(13):3441–3445.
7 Hofacker, W., P. Fontana, and P. Schuster. 1994. Fast folding and comparison of RNA secondary structures (The Vienna RNA Package). Monatshefte für Chemie (Chemical Monthly). 125:167–188.
8 Pesole, G., and S. Liuni. 1999. Internet resources for the functional analysis of 5′ and 3′ untranslated regions of eukaryotic mRNA. Trends Genet. 15:378.
9 Barrick, J. E., K. A. Corbino, and R. R. Breaker. 2004. New RNA motifs suggest an expanded scope for riboswitches in bacterial genetic control. Proc. Natl. Acad. Sci. USA. 101(17):6421–6426.
10 Bengert, P., and T. Dandekar. 2004. Riboswitch finder – a tool for identification of riboswitch RNAs. Nucleic Acids Res. 31:3441–3445.
11 Letunic, I., R. R. Copley, and P. Bork. 2004. SMART 4.0: towards genomic data integration. Nucleic Acids Res. 32 Database issue:D142–144.
12 Tatusov, R. L., N. D. Fedorova, and D. A. Natale. 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 4(1):41.
13 Schmidt, S., P. Bork, and T. Dandekar. 2002. A versatile structural domain server using profile weight matrices. J. Chem. Inf. Comput. Sci. 42(2):405–407.
14 Lo Conte, L., B. Alley, and C. Chothia. 2000. SCOP: a structural classification of proteins database. Nucleic Acids Res. 28(1):257–259.
15 Becker, K., S. M. Kanzok, and S. Rahlfs. 2003. Plasmoredoxin, a novel redox-active protein unique for malaria parasites. Eur. J. Biochem. 270:1057–1064.
16 Lesk, A. M. 2002. Introduction to bioinformatics. Oxford University Press, Oxford.
17 Dandekar, T., B. Snel, and P. Bork. 1998. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci. 23(9):324–328.
18 Von Mering, C., M. Huynen, and B. Snel. 2003. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 31(1):258–261.
19 Dandekar, T., M. A. Huynen, and P. Bork. 2000. Re-annotating the Mycoplasma pneumoniae genome sequence: adding value, function and reading frames. Nucleic Acids Res. 28(17):3278–3288.
20 Meyer, F., A. Goesmann, and A. Pühler. 2003. GenDB – an open source genome annotation system for prokaryote genomes. Nucleic Acids Res. 31(8):2187–2195.
21 Frishman, D., M. Mokrejs, and H. W. Mewes. 2003. The PEDANT genome database. Nucleic Acids Res. 31(1):207–211.
22 Gaasterland, T., and C. W. Sensen. 1996. MAGPIE: automated genome interpretation. Trends Genet. 12(2):76–78.
23 Krishnamurthy, L., J. Nadeau, and W. Xu. 2003. Pathways database system: an integrated system for biological pathways. Bioinformatics. 19(8):930–937.
24 Pfeiffer, T., I. Sanchez-Valdenebro, and S. Schuster. 1999. METATOOL: for studying metabolic networks. Bioinformatics. 15(3):251–257.
25 Schuster, S., D. Fell, and T. Dandekar. 2000. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nature Biotechnol. 18(3):326–332.
26 Schmidt, S., S. Sunyaev, and T. Dandekar. 2003. Metabolites: a helping hand for pathway evolution? Trends Biochem. Sci. 28(6):336–341.
27 Shankar, N., A. S. Baghdayan, and M. S. Gilmore. 2002. Modulation of virulence within a pathogenicity island in vancomycin-resistant Enterococcus faecalis. Nature. 417(6890):746–750.
28 Harmsen, D., H. Claus, and U. Vogel. 2003. Typing of methicillin-resistant Staphylococcus aureus in a university hospital setting by using novel software for Spa repeat determination and database management. J. Clin. Microbiol. 41(12):5442–5448.
29 Huber, W., A. von Heydebreck, and M. Vingron. 2002. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 18 Suppl 1:S96–S104.
30 Cleveland, W. S. 1979. Robust locally weighted regression and smoothing scatterplots. J. Am. Statist. Assoc. 74:829–836.
31 Gentleman, R. C., V. J. Carey, and J. Zhang. 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5(10):R80. http://genomebiology.com/2004/5/10/R80.
32 Ihaka, R., and R. Gentleman. 1996. R: a language for data analysis and graphics. J. Comput. Graph. Statist. 5(3):299–314.
33 Tamayo, P., D. Slonim, and T. R. Golub. 1999. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA. 96(6):2907–2912.
34 Eisen, M. B., P. T. Spellman, and D. Botstein. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA. 95(25):14863–14868.
35 Breiman, L., J. Friedman, and C. J. Stone. 1984. Classification and regression trees. Wadsworth, Belmont, Calif.
36 Chang, C. C., and C. J. Lin. LIBSVM – a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm/
37 Kyte, J., and R. Doolittle. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105–132.
38 Womble, D. D. 2000. GCG: the Wisconsin Package of sequence analysis programs. Methods Mol. Biol. 132:3–22.
39 Saha, S., and G. P. S. Raghava. 2004. BcePred: prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties. In: International Conference on Artificial Immune Systems 2004. G. Nicosia and J. Timis, eds. Springer ICARIS, LNCS 3239:197–204.
40 Sung, M. H., and R. Simon. 2004. Candidate epitope identification using peptide property models: application to cancer immunotherapy. Methods. 34(4):460–467.
41 Huynen, M., T. Dandekar, and P. Bork. 1998. Differential genome analysis applied to the species-specific features of Helicobacter pylori. FEBS Lett. 426(1):1–5.
42 Schuster, S., T. Pfeiffer, F. Moldenhauer, I. Koch, and T. Dandekar. 2002. Exploring the pathway structure of metabolism: decomposition into subnetworks and application to Mycoplasma pneumoniae. Bioinformatics. 18:351–361.
43 Perna, N. T., G. Plunkett III, and F. R. Blattner. 2001. Genome sequence of enterohemorrhagic Escherichia coli O157:H7. Nature. 409(6819):529–533.
44 Ziebuhr, W., K. Xiao, and T. Dandekar. 2004. Pharmacogenomic strategies against resistance development in microbial infections. Pharmacogenomics. 5:361–379.
45 Dandekar, T., and S. Schmidt. 2004. Metabolites and pathway flexibility. In Silico Biol. 5:1–13.
46 Iwasa, Y., F. Michor, and M. A. Nowak. 2003. Evolutionary dynamics of escape from biomedical intervention. Proc. R. Soc. Lond. B Biol. Sci. 270(1533):2573–2578.
47 Achard, F., G. Vaysseix, and E. Barillot. 2001. XML, bioinformatics and data integration. Bioinformatics. 17(2):115–125.
48 Bruhn, R. E., and P. J. Burton. 2003. Designing XML schemas for bioinformatics. Biotechniques. 34(6):1200–1206.
49 Wang, L., J. J. Riethoven, and A. Robinson. 2002. XEMBL: distributing EMBL data in XML format. Bioinformatics. 18(8):1147–1148.
50 Hanisch, D., R. Zimmer, and T. Lengauer. 2002. ProML – the protein markup language for specification of protein sequences, structures and families. In Silico Biol. 2:313–324.
51 Tjandra, D., S. Wong, and L. Esserman. 2003. An XML message broker framework for exchange and integration of microarray data. Bioinformatics. 19(14):1844–1845.
52 Taylor, F., N. W. Paton, and S. G. Oliver. 2003. A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nature Biotechnol. 21(3):247–254.
53 Riva, A., and I. S. Kohane. 2002. Accessing genomic data through XML-based remote procedure calls. Proceedings of the 2002 AMIA Annual Symposium, pp 662–666.
54 Freier, R., M. Hofestädt, and A. Stephanik. 2002. BioDataServer: an SQL-based service for the online integration of life science data. In Silico Biol. 2:37–57.

19

21

2 Transcriptome Analysis: Towards a Comprehensive Understanding of Global Transcription Activity

Ben Sidders and Neil Stoker

2.1 Introduction

In the past few years the field of transcriptomics has grown rapidly. This chapter explores the development of the biology and technologies that gave rise to this new field, and its importance. We will focus in particular on the microarray and the issues important to its use and correct interpretation. Using published examples we explore the many varied uses to which the microarray has been put. The examples show how the data being generated are increasing our understanding of pathogenic bacteria and the hosts that they infect. We hope that this chapter will act primarily as an introduction to the complex field of transcriptomics. Directions to relevant reviews are provided wherever we feel they will be of benefit to the reader.

2.2 Development of Transcriptomics

2.2.1 From Genomics to Functional Genomics

The data from genome sequencing projects have led to a wealth of information about many organisms. For example, there are now nearly 200 complete microbial genome sequences, with an additional 500 or so microbial projects ongoing. In particular, most of the major bacterial pathogens have had their genomes sequenced. For more details see the websites of TIGR [1], NCBI [2], the GOLD database [3], or other chapters within this volume. The information contained within these genomic sequences can tell us a great deal about an organism’s capabilities so long as we are able to identify and annotate the majority of the genes within the genome [4, 5]. However, the properties (genes) that make pathogens unique are often organism- or species-specific. Therefore, unless functional evidence exists, the genetic sequence alone does not fully unravel the molecular mechanisms by which a microbe causes disease. In order to understand what physiological, metabolic, and pathological mechanisms a particular pathogen employs within its host, we have to know how the genome is used to produce and regulate the proteome of the cell. To fulfil this need, the field of functional genomics emerged and is continuing to evolve. Functional genomics includes technologies that allow us to study the transcriptome and the regulation of transcription (the production of messenger RNA). This is important in a biological context because transcription is the first regulated step in protein production. How transcription is regulated governs microbial physiology and pathogenic mechanisms. Also within the umbrella of functional genomics are technologies that aim to study directly the protein content (proteome) of the cell – proteomics (see Chapter 3 and also Ref. [6]).

2.2.2 From Gene to Whole Genome

Thirty years ago the Southern blot hybridization revolutionized gene analysis, allowing single genes to be studied within complex genomes [7]. Soon techniques such as reverse-transcription polymerase chain reaction (RT-PCR) and versions of the Northern blot followed. These allowed quantification as well as detection of RNA molecules but remained limited to single genes. This limitation was reduced with the development of macroarrays using nylon membranes to support many genes at once. However, the introduction of the microarray made it possible to study the expression of every single gene within a genome at once [8, 9]. The first technological advance that was key to the development of the microarray was the use of glass as a solid substrate for the DNA probes. This allowed many more probes to be accurately held on one surface and in a much smaller area. The second was the development of printing methods that increased the resolution and accuracy of printing onto a glass surface. The genome-wide study of the complete (whole organism) mRNA expression profile (the transcriptome) is known as transcriptomics. Although transcriptomics – the comparison of a control and an experimental sample’s expression profile – is the primary application for microarrays, they can be used for many purposes including the analysis of genome structure and as a strain-screening tool. The conclusions being drawn from pathogen microarray studies are revolutionizing microbiology. Our understanding of microbial pathogenesis is adapting to the view that bacterial activities are the product of a whole organic system rather than the activity of a single gene or regulon. Host responses to microbial infection are also being characterized using host genome microarrays. Studying both the bacterial and host expression profiles during infection will provide the most complete understanding of host–pathogen interactions.

2.3 Introducing the Microarray

2.3.1 What Is a Microarray?

Microarrays are densely packed arrays of DNA probes which are fixed to a solid substrate in a predetermined matrix on a small surface area. The arrays are then amenable to hybridization with RNA or DNA samples isolated from experimental cultures. The samples are labeled before hybridization, and afterwards the fluorescent intensity from each probe spot can be measured. When compared to a control, the fluorescent intensity from each spot can be taken to indicate the relative level of the corresponding sequence present in the sample. Because the arrays contain probes for all of the open reading frames in a genome, this is achieved on a genome-wide scale with each array used. There are two commonly used forms of microarray, the spotted array and the Affymetrix gene chip, which differ in their method of construction and use. The choice of one array type over another depends largely on cost, application, and commercial availability as well as the need to design and construct specialized arrays within the laboratory. The fundamental differences between the synthesis methods and their use and analysis mean they often show varying degrees of agreement with one another and the expression levels that they report are usually not amenable to direct comparison [10].

2.3.2 The Affymetrix Gene Chip

Affymetrix uses a method that directly synthesizes an oligonucleotide sequence on a silicon slide/chip using a technique known as light-directed combinatorial chemical synthesis [11]. Affymetrix gene chips are more commonly used for eukaryotic research as they can achieve higher spot densities, allowing the incorporation of more probes to study the many more genes and splice variants that occur. Microbial researchers do not require as many genes or splice variants on an array so can use the more easily obtainable spotted arrays. The latest human oligonucleotide genechip by Affymetrix can incorporate up to 47 000 probes [12], whereas the Mycobacterium tuberculosis whole-genome spotted array contains just 4410 probes [13]. However, it is worth noting that Affymetrix have just released a new Escherichia coli gene chip that contains 10 000 probes covering four common environmental and laboratory strains on the same array, thereby allowing microbiologists to benefit from the higher density arrays. In this chapter we will focus on spotted microarrays (PCR or long-oligo) as they are the most common platform for the analysis of microbial expression profiles. For more information on Affymetrix Gene Chip technology and their applications visit the company website [12] or see Ref. [11].


2.3.3 The Spotted Microarray

Spotted arrays most commonly use coated (or “activated”) glass as the solid substrate, and are generally the size and shape of, if not an actual, microscope slide. Using glass as the substrate is beneficial for two reasons: (1) it allows the formation of concentrated spots, thereby increasing the array density, and (2) glass provides a flat surface for grid formation and for the confocal microscope to focus on. The coating chemistry can vary [14], but the common feature of the activated glass is that it allows cross-linking between the positively charged surface and the negatively charged nucleotide probes. The spotted array uses either PCR amplicons or presynthesized oligonucleotides of unique fragments from each of the target genes as probes. The PCR product or oligonucleotide probes are then spotted onto the array surface by an automated robot that uses specially designed pins to pick up the sample from a well and deposit it (spot it) onto the array surface. The probes are spotted in a prearranged grid formation; this allows the analysis software to link each spot with the gene that it represents. PCR probes are generated in a 96-well or higher format using primers specifically designed to amplify a unique region of the ORF (open reading frame). Primer design is most commonly undertaken using software packages such as PrimeArray [15, 16], Primer 3 [16], or GenomePRIDE [17] which automatically develop primers for all coding sequences in a genome. The PCR amplicon probes are generally between 200 and 2000 base pairs long. The PCR products must then be purified, and it is often sensible to test a subset to ensure primer and probe specificity. Long-oligonucleotide spotted arrays were developed as an alternative and beneficial method for the generation of probes, removing the need for time-consuming and error-prone PCR. The probes are single-stranded oligonucleotides synthesized by an external company, so their use makes them inherently faster and cheaper. 
Also, because they are single-stranded to begin with, they are more sensitive to hybridization with their complementary ORF cDNA (see below). The probes are melting-temperature-normalized and are normally 50–70 nucleotides long. The oligonucleotides are specifically designed to locations in each ORF that increase specificity and reduce cross-reactivity with other probes. As mentioned, the oligonucleotides used for these arrays are purchased in a ready-to-use state (see http://www.operon.com). The user can choose or design specific probes, or alternatively it is possible to purchase a predesigned set of probes that cover every ORF from one of many genomes. Operon currently sells complete-genome, array-ready sets of oligonucleotide probes for many bacterial pathogens including E. coli, M. tuberculosis, Bacillus anthracis, Neisseria meningitidis, and Listeria monocytogenes. It is also possible to buy a set of approximately 35 000 probes that match the human genome, allowing host studies with this method too. However, this system is relatively new, and it is currently unknown whether there are problems with the storage of single-stranded nucleotides, which are inherently more susceptible to degradation than double-stranded probes generated by PCR.

2.4 Microarray Methods

2.4.1 Experimental Design

Producing useful data from microarrays depends heavily on the manner in which the study was performed so that sources of error are reduced. There are two fundamental sources of error that occur during a microarray experiment: random errors and systematic errors. Random errors are controlled for by obtaining an adequate number of repeats to minimize extraneous influencing factors. Systematic error, or bias, is reduced as far as possible by using an appropriate experimental design (although further reduction is usually necessary using statistical means). An introduction to the key points of microarray experimental design is provided here (for a more detailed appraisal see Ref. [18]).

2.4.1.1 Type of Experiment

Microarrays can be used for many types of experiments. Some experiments use two samples of DNA (DNA–DNA arrays), for example to analyze genome structure and identify strain variation [19] or to resequence genes [20]. Others analyze transcriptional changes using RNA, for which a system of nomenclature has been proposed. This divides microarray RNA experiments into type 1 and type 2 experiments [21]:

• A type 1 experiment uses differentially labeled control and experimental RNA populations and makes a direct comparison between the two on one array (RNA–RNA arrays). Type 1 experiments include those that aim to identify genes up- or downregulated under experimental conditions such as during a host infection compared to in vitro culture [22] (Fig. 2.1).
• A type 2 experiment makes indirect comparisons between two RNA samples. In this type of experiment each RNA sample (experimental and control) is hybridized to a separate array along with a reference gDNA sample (RNA–DNA arrays). Type 2 experiments include more recent array studies that analyze expression profiles, and more complex studies such as those that follow a population over a time course and those that make multiple comparisons to a reference sample [23] (Fig. 2.2).

Note that genomic DNA (gDNA) should be extracted from a cell culture in the stationary phase, as partially replicated chromosomes will affect the gene copy number.


Fig. 2.1 A type 1 (RNA–RNA) experiment makes direct comparisons between two RNA samples (an experimental and a control) on one array. The fold change in gene expression between experimental and control populations is calculated by dividing the fluorescence reading from the experimental population (Cy5, in this case) by the reading from the control (Cy3). This ratio is then transformed to its logarithm to the base 2.

Fig. 2.2 A type 2 (RNA–DNA) experiment makes indirect comparisons between two RNA samples (an experimental and a control). Each RNA sample is hybridized to a separate array along with a genomic DNA (gDNA) standard. The fold changes in gene expression between experimental and control populations are a ratio of ratios, calculated by (1) dividing the fluorescence reading of each RNA sample by that of the gDNA standard, and then (2) dividing the experimental value by the control value. This ratio is then transformed to its logarithm to the base 2.
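The two calculations described in the captions of Figs. 2.1 and 2.2 can be sketched in a few lines of Python. The function names and the intensity values are illustrative assumptions rather than part of any published pipeline; the sketch only shows how the ratio (type 1) and the ratio of ratios (type 2) are formed before the log2 transformation.

```python
import math

def log2_ratio_type1(cy5_experimental, cy3_control):
    """Type 1 (RNA-RNA): experimental over control, both read from one array."""
    return math.log2(cy5_experimental / cy3_control)

def log2_ratio_type2(exp_rna, exp_gdna, ctrl_rna, ctrl_gdna):
    """Type 2 (RNA-DNA): ratio of ratios across two arrays."""
    exp_norm = exp_rna / exp_gdna      # step 1: experimental RNA over its gDNA standard
    ctrl_norm = ctrl_rna / ctrl_gdna   # step 1: control RNA over its gDNA standard
    return math.log2(exp_norm / ctrl_norm)  # step 2: experimental over control, then log2

# Hypothetical fluorescence readings for a single gene.
type1 = log2_ratio_type1(4000.0, 1000.0)
type2 = log2_ratio_type2(3000.0, 1500.0, 1000.0, 2000.0)
print(type1, type2)
```

With these example readings both designs report a log2 ratio of 2, that is, a four-fold induction.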


2.4.1.2 Replicates

As with all biological experiments it is important to have a sufficient number of replicates to ensure statistical validity and to reduce the effect of any external, random influences. There are two types of replicates to consider when using microarrays: biological and technical. Biological replicates are repetitions of the same experiment but performed with different RNA samples. Technical replicates are simply multiple arrays performed with the same RNA sample. Technical replicates can be performed at different levels – for example, labeling of the cDNA, duplicating spots on the array, and rescanning arrays. It has been suggested that microarray experiments should contain at least three biological replicates to increase reliability [24]; however, other groups have suggested that far more replicates are needed to increase the sensitivity of a microarray experiment so that genes with low-fold changes in expression are easily detected [25]. It is also clear that biological replicates are more important than technical replicates, should there be limited resources. There is therefore a hierarchy of replicates (Fig. 2.3), with biological replicates at the top, and the method of analysis must reflect this (see below).

Inherent variations (or systematic errors) occur within any microarray experiment, some of which can be minimized during the experimental procedure. The most common of these is the dye bias, which occurs because the size and shape of the Cy labels influence their rate of incorporation. Cy3 is much smaller and more easily incorporated into cDNA than Cy5. The effect of this is usually adjusted for during data normalization, although some groups prefer to carry out dye swap experiments when performing type 1 studies [26, 27]. In type 2 experiments (RNA–gDNA) the same dyes must be consistently used to label the DNA and RNA populations so that gDNA can act as a constant denominator. This allows raw data from any array produced in this way to be compared almost directly, and is reportedly more reproducible than an RNA–RNA array [28].

2.4.2 RNA Extraction

Rapid RNA extraction is a vital step in a microarray experiment if it aims to assess gene expression levels in a meaningful manner. The RNA sample tested must reflect transcription levels as they were naturally and not as a result of culture manipulation or RNA degradation. The first step is therefore usually immediate immersion of the culture in a “stop solution” such as guanidinium thiocyanate that halts all biological activity including transcription and enzyme-mediated degradation of RNA. The cell must then be lysed and the contents disrupted, the exact method depending on the bacterial species being used. It is quite common now for commercial kits to be used for the RNA extraction, but essentially the RNA is purified by precipitation and resuspended in RNase-free water. Many laboratories have their own favored method of extracting nucleotide samples from bacterial cultures, and the protocols will vary depending on the species under study.

Fig. 2.3 The hierarchy of replicates shows how complex a microarray experiment can become. It also highlights the multiple layers at which replicates are needed to ensure a microarray experiment remains valid and is not influenced by internal or external forces which are out of the scientist’s control.


Because mRNA has a short half-life [29], speed of isolation through to analysis is critical. It is possible to store RNA for short periods at –80 °C without any significant degradation. Before the RNA is used for further analysis it is essential to DNase-treat the sample. This is to ensure the preparation is pure, as RNA extraction protocols often isolate genomic DNA at the same time, which would give a false positive signal in an array experiment. In some cases researchers have removed ribosomal RNA (rRNA), but we find this inconsistent and unnecessary for bacteria. As an added benefit, many arrays contain probes against rRNA so that it can be used as a positive control, or even as a normalization factor.

2.4.3 Labeling/Reverse Transcription

After isolation and purification, total RNA is reverse-transcribed to cDNA, at which point it is labeled in a process known as “direct labeling” [30]. cDNA is a much more convenient and stable molecule to work with than mRNA. This is because RNA has a very short half-life and is very easily degraded by ubiquitous RNases. Approximately 2–10 µg of bacterial RNA are used from each sample in a microarray experiment. During the reverse transcription, labels are incorporated into the cDNA. These are usually dCTP molecules labeled with the fluorescent markers Cy3 or Cy5, which fluoresce at different wavelengths. In a type 1 experiment the control and experimental RNA populations must be labeled differently, whereas in a type 2 experiment both the control and experimental RNA populations are labeled using the same fluorescent dye whilst the gDNA standard is labeled using the other. Because microbial mRNAs are not polyadenylated, this feature cannot be targeted with primers during the reverse transcription. Therefore, the reverse transcription of a bacterial total mRNA population requires another approach. Using random primers was the first method developed, but, as Talaat et al. [31] have shown, these are often insensitive, and their nonspecificity results in some mRNAs not being transcribed at all. This can be avoided if genome-directed primers are used. These are designed by an algorithm that has been written to calculate the minimum number of primers needed to amplify every ORF in a genome. This ensures that the primers are highly sensitive to all mRNAs in a cell. Using this algorithm, it has been shown that a set of just 37 oligonucleotide primers is enough to amplify every ORF (approximately 4000) in the M. tuberculosis genome, and that they were far more specific when it came to selectively amplifying mycobacterial, rather than mammalian, RNA [31].
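Choosing a small primer set that still reaches every ORF is an instance of the set-cover problem. The sketch below is not the algorithm of Ref. [31], whose details are not given here; it is a generic greedy heuristic, shown only to illustrate the idea. It assumes a hypothetical, precomputed table (`primer_hits`) listing which ORFs each candidate primer would prime, and it returns a small covering set that is not guaranteed to be the true minimum.

```python
def greedy_primer_cover(primer_hits, all_orfs):
    """Greedy set cover: repeatedly pick the primer that primes the most
    ORFs not yet covered. Returns (chosen primers, ORFs left uncovered)."""
    uncovered = set(all_orfs)
    chosen = []
    while uncovered:
        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(primer_hits),
                   key=lambda p: len(primer_hits[p] & uncovered))
        gained = primer_hits[best] & uncovered
        if not gained:          # no remaining primer reaches the leftover ORFs
            break
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

# Hypothetical candidate primers and the ORFs each one would prime.
primer_hits = {
    "p1": {"orfA", "orfB", "orfC"},
    "p2": {"orfC", "orfD"},
    "p3": {"orfD"},
    "p4": {"orfB"},
}
chosen, uncovered = greedy_primer_cover(primer_hits, {"orfA", "orfB", "orfC", "orfD"})
print(chosen, uncovered)
```

In this toy case two primers suffice for four ORFs. A real design would also have to weigh melting temperature and cross-hybridization to host sequences, which this sketch ignores.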
If gDNA is being used, as in a type 2 experiment, it is labeled using random or genome-directed primers and the DNA polymerase I Klenow fragment. The reverse transcription process used to convert the mRNA population into cDNA prior to hybridization is not 100% efficient [32]. The 3′ end of a gene is often over-represented, as reverse transcriptase occasionally synthesizes only a short sequence before falling off. A direct-labeling approach where the mRNA population is labeled without the need for a reverse transcription or amplification step has been developed [33]. Although not widely used, this method promises to reduce bias and increase the accuracy of microarray results. There are also methods of RNA labeling that improve the efficiency of the labeling reaction and are designed for small (submicrogram) yields of RNA; this is known as “cold labeling” [34]. This is necessary, for example, when extracting bacterial RNA from in vivo experiments, because bacteria in the host are usually far more diffuse than when grown in vitro. This protocol first reverse transcribes the mRNA to cDNA using unlabeled nucleotides. The cDNA is then randomly primed and labeled using the Klenow fragment of DNA polymerase, which has a much higher rate of label incorporation than does reverse transcriptase [35]. This method therefore allows either smaller amounts of RNA to be used (reportedly as little as 0.35 µg of RNA) or, in principle, lower concentrations of Cy dyes, which would make microarray experiments much more cost-effective.

2.4.4 Hybridization

The two cDNA (or cDNA and gDNA) samples are purified from the labeling reaction and combined so that both are competitively hybridized to the array at the same time. Before hybridization, the cDNA/DNA samples must be denatured to single-stranded molecules with a brief incubation at 95 °C and added to the hybridization mixture (a salt and detergent solution), which provides a highly stringent environment for the binding reactions to occur in. The slide must also be prehybridized before use to prevent nonspecific binding of nucleotides. During the hybridization reaction the samples are hybridized to the array underneath a slightly raised cover slip. The entire array is then incubated within a sealed chamber, which is submerged in a water bath, for 16–20 h. The incubation temperature is set so that it is high enough to induce stringent binding conditions, the exact temperature depending on the bacterial species and the G/C content of the genome. After hybridization the microarray must be stringently washed to remove any unbound or nonspecifically bound cDNA/DNA.

2.4.5 Scanning

Following hybridization, the levels of fluorescence from each spot for both Cy3 and Cy5 are determined using a confocal scanner. A false color image is produced from a prescan of the array; the image represents the level of fluorescent intensity using a color gradient and is solely for easy observational analysis, allowing researchers to check on the internal controls. If the negative controls are negative, the positive controls are positive, and the background is not too imposing, a gain at which the array will be scanned can be set. The gain refers to the laser power used by the confocal microscope to excite the fluorophores without inducing saturation, and is expressed as the ratio of output (intensity) to input (laser power). It is important that the slide be scanned at a gain just below the threshold where any of the spots become saturated, otherwise these will be of no use. Conversely, if the signal intensity for any of the spots is too low, they may be confused with the background and be unreliable. A crude observational analysis of the false color image is enough to ascertain what gain to use. Often, the array is scanned a second time at the maximum gain allowed to clarify the reading from spots with very low intensities. The array is scanned under two wavelengths (or channels), 635 nm for Cy5 and 532 nm for Cy3, and for each dye the computer produces a separate monochromatic image (as a TIFF, or tagged image file format, file) that reflects the fluorescent intensities on a black and white gradient. Once the slide has been scanned, the monochromatic TIFF images for each dye can be exported into a piece of image analysis software that performs the quantification of the fluorescence intensity.

2.5 Data Normalization and Analysis

2.5.1 Image Quantification

TIFF images of both channels (Cy3 and Cy5) from the microarray are imported into, and processed by, a piece of image analysis software such as Imagene (http://www.biodiscovery.com/imagene.asp) or BlueFuse (http://www.cambridgebluegnome.com/products/bluefuseformicroarrays/index.htm). The image analysis programs perform the laborious task of measuring the fluorescence intensities of many thousands of spots at once. They do this by superimposing a malleable grid over the array image and then detecting and quantifying the median intensity of the pixels within each spot as well as within a local background area. The median is used because it is not strongly influenced by outlying pixels, and is therefore a robust estimate of the typical intensity. The software links the spot coordinates to an internal database that contains details of all of the genes represented by the spots and produces a tab-delimited or XML results file. Included in this file are the spot coordinates, gene names and identifiers, and the mode and median of the fluorescence readings as well as information about the background reading, spot size, and many other pieces of supplementary information. Usually a separate file for each of the Cy dyes is produced, meaning that an array experiment will produce many large files that contain huge amounts of data. However, the format of this data file means it can easily be imported into any data analysis software package for processing.
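The reason for preferring the median can be seen in a small sketch. The pixel values and the function name are made up for illustration, and the "net" value (signal minus local background) is a common convention that we assume here rather than a description of what any particular package reports.

```python
from statistics import median

def quantify_spot(spot_pixels, background_pixels):
    """Summarize one spot by the median of its pixels and of its local background."""
    signal = median(spot_pixels)
    background = median(background_pixels)
    # Assumption: report background-subtracted intensity alongside the raw medians.
    return {"signal": signal, "background": background, "net": signal - background}

# Hypothetical pixel intensities; one spot pixel is a bright dust speck.
spot = [1200, 1180, 1250, 1210, 60000]
local_background = [100, 110, 95, 105, 102]

result = quantify_spot(spot, local_background)
print(result)
```

The single bright pixel would pull the mean of the spot above 12 000, whereas the median stays at 1210, close to the intensity of the other pixels.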


2.5.2 Data Processing

As mentioned, systematic errors (or bias) are reduced as far as possible by using an appropriate experimental design; however, further reduction is usually necessary using statistical processes. First, all dubiously low measurements – normally those lower than the background plus two standard deviations – must be either excluded or set at a base level above the background. Secondly, the data must be normalized, which is an extremely important step in ensuring the validity of the array readings. Normalization is a process that transforms microarray fluorescence intensities to account for systematic errors, and it can be divided into “within-array” and “between-array” normalization. Systematic errors usually accounted for include, for example, differences in dye incorporation efficiency (Cy3 is much smaller and more easily incorporated into cDNA than Cy5), the quantity of RNA hybridized to the array, spot size, probe length, and spatial effects. Normalization can be done in several ways, and both statistical methods and strategies based on internal controls have been developed. Those based on internal controls and invariant genes are less commonly used, because many genes previously thought to be invariant have been shown actually to vary quite significantly, and also because data from a type 1 experiment still suffer from a dye bias. Type 2 experiments using gDNA are a promising method of overcoming these limitations. A nonnormalized data set will exhibit a bias towards spots with a strong fluorescent intensity, whereas a normalized data set should center on a log ratio (see below) of zero and be independent of spot intensity. Statistical, globally applied methods are most often used for within-array normalizations like this. One statistical method of transformation commonly used to calibrate microarray data is LOWESS (locally weighted scatterplot smoothing [36]), which smooths scatterplots of log ratios in a weighted, least-squares fashion to remove intensity-dependent bias. Another is MAD (median absolute deviation), which provides a robust measure of spread that can be used in place of the standard deviation. For further information on microarray normalization methods and the theory underlying them, many reviews exist, but Refs. [37, 38] are suggested as excellent starting points for a basic discussion, or Ref. [39] for more in-depth coverage.

Once the data from technical and biological replicates are compiled and the microarray data are ready for analysis, the fold difference in mRNA levels is calculated. This fold change in expression between the two samples hybridized to the array is represented using a ratio of the fluorescence intensities from the two signals. A problem with the ratio generated by microarray analysis is that it is not symmetrical; a gene induced two-fold is given a ratio of 2 whereas a gene repressed two-fold is given a ratio of 0.5. Therefore, to make ratios from both up- and downregulated genes symmetrical, the data are transformed to their logarithm to the base 2 (log2). This centers the scale on zero, meaning that a two-fold increase is represented as a log2 ratio of 1, a two-fold decrease as a log2 ratio of –1, and a gene with an unchanged expression level as a log2 ratio of 0.
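The processing steps above can be illustrated with a minimal sketch. Note the assumptions: the spot records and their field names are hypothetical, one background estimate is used for both channels, and global median centering is used as a deliberately simple stand-in for LOWESS, which additionally corrects intensity-dependent bias. Median centering also assumes that most genes do not change between the two samples.

```python
import math
from statistics import median

def filter_low(spots, n_sd=2.0):
    """Keep spots whose two channels both exceed background mean + n_sd * SD."""
    kept = []
    for s in spots:
        threshold = s["bg_mean"] + n_sd * s["bg_sd"]
        if s["cy5"] > threshold and s["cy3"] > threshold:
            kept.append(s)
    return kept

def median_center(log_ratios):
    """Shift the log2 ratios so that their median is zero."""
    m = median(log_ratios)
    return [x - m for x in log_ratios]

def mad(values):
    """Median absolute deviation: a robust measure of spread."""
    m = median(values)
    return median(abs(v - m) for v in values)

# Hypothetical spot records (Cy5 = experimental, Cy3 = control).
bg = {"bg_mean": 100, "bg_sd": 20}
spots = [
    {"gene": "g1", "cy5": 1000, "cy3": 1000, **bg},
    {"gene": "g2", "cy5": 2000, "cy3": 1000, **bg},
    {"gene": "g3", "cy5": 2000, "cy3": 1000, **bg},
    {"gene": "g4", "cy5": 4000, "cy3": 1000, **bg},
    {"gene": "g5", "cy5": 8000, "cy3": 1000, **bg},
    {"gene": "g6", "cy5": 130, "cy3": 900, **bg},   # Cy5 below 100 + 2 * 20
]

kept = filter_low(spots)
log_ratios = [math.log2(s["cy5"] / s["cy3"]) for s in kept]
centered = median_center(log_ratios)
spread = mad(centered)
print([s["gene"] for s in kept], centered, spread)
```

Here the spot g6 is dropped as indistinguishable from background, and the remaining log2 ratios are shifted by their median of 1 so that the bulk of the genes sit at zero.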


2.5.3 Data Analysis

2.5.3.1 Detection of Differential Expression

On a basic level, many scientific studies report the results of their microarray experiment as a gene list. These lists contain all of the genes significantly up- or downregulated during the experimental conditions under study. A gene whose expression changes more than two-fold, either up or down, was often thought to be significantly differentially expressed. However, this is a completely arbitrary cutoff point, and a measure of statistical significance must be used. Simple statistical analysis using t-tests and analysis of variance (ANOVA) can be used, but they must take account of (a) the experimental structure and (b) multiple testing. The experimental structure may be regarded as hierarchical with biological replicates at the top and technical repeats below, as discussed above. The structure is an important consideration because t-tests, for example, assume that all data points are independent, i.e., that the replicates were performed at the same level. Moderated t-tests are better than standard t-tests as they account for the fact that outliers (very high or low data points) will influence a standard t-test and produce artificially high significance levels, increasing the number of false positives (see below). Even when a set of data has been assessed for its significance, we have to bear in mind that all statistical inferences have a probability of being incorrect. There are two important types of error inherent in statistical analysis: false positives and false negatives. The false negative rate is a reflection of the statistical power of the test being performed and can be reduced simply by increasing sample replicates. The false positive rate is governed by the significance level being tested for (i.e., a P-value less than 0.05); however, it is influenced by the number of comparisons or tests being performed – known as multiple testing. 
Multiple testing refers to the fact that a microarray experiment performs thousands of comparisons (one per gene) simultaneously. Using a standard significance cutoff of P < 0.05 would fail to account for the effect of this many comparisons and would be expected to lead to about 50 genes per 1000 (5%) being classed as “significant” purely by chance. The most stringent method of compensating for this, known as the Bonferroni correction [40], can be applied in two ways. The first is to adjust the P-value obtained from each test by multiplying it by the number of genes being tested and then only accept its significance if P is still below 0.05. The second is to adjust the original false positive rate (say 0.05), dividing it by the number of tests (say, 1000), which provides a new significance cutoff point of 0.00005. This is extremely conservative and will normally result in very few significant genes. The false discovery rate (FDR) is an alternative solution to the multiple testing problem in that it provides an estimate of the number of false discoveries without being as conservative as the Bonferroni correction [41, 42]. The FDR returns an expected percentage of false predictions within the data set, defined as the ratio of the number of false predictions over the total number of predictions.
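The difference between the two corrections is easiest to see on a handful of P-values. The sketch below implements the Bonferroni adjustment as described above and, as one widely used FDR procedure, the Benjamini–Hochberg step-up adjustment; we assume that procedure here for illustration, and the P-values are invented.

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capping at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (FDR control)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonic adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P-values for five genes.
p = [0.001, 0.008, 0.02, 0.03, 0.6]

bonf = bonferroni(p)
fdr = benjamini_hochberg(p)
n_bonf = sum(q < 0.05 for q in bonf)
n_fdr = sum(q < 0.05 for q in fdr)
print(n_bonf, n_fdr)
```

With these values the Bonferroni correction retains two genes at the 0.05 level, whereas the FDR adjustment retains four, illustrating why the former is considered the more conservative.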

The situation can become extremely complex and biologists should collaborate with statisticians in order to analyze microarray data accurately. Ideally each should be aware of the other’s expertise throughout a microarray experiment, from experimental design and execution through to analysis and interpretation. Reference [38] provides an excellent starting point for the finer details of data analysis.

2.5.3.2 Pattern Recognition

If more complex questions are to be asked of a microarray experiment than merely identifying which genes are up- or downregulated, more complex procedures need to be performed with the data. For example, researchers often use cluster analysis to group genes with similar expression profiles. A cluster algorithm iterates over an array data set and acts as a grouping tool, essentially assigning two genes with similar expression profiles to the same cluster. There are many types of cluster algorithm including hierarchical, k-means, principal component analysis (PCA), self-organizing maps (SOMs), etc. In this way clustering can help in the assignment of functionality to unknown genes, divulge groups of genes controlled by the same promoter, and generally provide a more holistic integration of microarray data. However, as with all statistical analysis, clustering can never be perfect, primarily because it requires user input to determine cluster cutoff values. For more detailed descriptions of clustering and its uses, see Ref. [43].
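As a rough illustration of the grouping idea and of the influence of the user-chosen cutoff, the following Python sketch performs a simple average-linkage hierarchical clustering. The gene names, expression values, distance measure (1 minus the Pearson correlation), and cutoff values are all assumptions made for this example; real analyses would normally rely on an established clustering package.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation: near 0 for similar profile shapes, near 2 for opposite ones.

    Assumes neither profile is perfectly flat (zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def cluster_profiles(profiles, cutoff):
    """Merge the closest clusters until the smallest distance exceeds the cutoff."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average linkage: mean distance over all pairs of members.
        dists = [pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        dist, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if dist > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical log2 expression ratios over four time points.
profiles = {
    "geneA": [0.1, 1.0, 2.0, 3.1],
    "geneB": [0.0, 0.9, 2.1, 2.9],
    "geneC": [3.0, 2.0, 1.1, 0.2],
    "geneD": [2.8, 2.1, 0.9, 0.0],
    "geneE": [0.2, 0.1, 0.2, 0.1],
}

strict = cluster_profiles(profiles, cutoff=0.2)
loose = cluster_profiles(profiles, cutoff=1.0)
print(strict)
print(loose)
```

In this toy data set the strict cutoff leaves geneE in a cluster of its own, whereas the looser cutoff merges it with geneC and geneD, so the same data yield different groupings depending on the value the user picks.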

2.5.3.3 Graphical Representations

Although they are not particularly informative in terms of hard data, graphical representations of microarray data are quite useful. Plotting the distribution and spread of the array data is often a convenient way to visualize the changes occurring in the two samples studied. For this purpose there are two commonly used scatterplots. A log R/G plot shows the log-transformed intensity of one channel plotted against the other. This is a very straightforward and intuitive plot, but the high correlation between the two channel intensities can make visualizing smaller outliers harder. A better way to view the distribution of microarray data is to use an MA plot where the average spot intensity (A) is plotted along the x axis and the log2 ratio (M) is plotted on the y axis [44]. This makes it easier to see those genes differentially expressed. The MA plot is also used to visualize the relationship between spot intensity and the level of expression before and after normalization procedures like LOWESS to assess their effect and accuracy. The results of clustering are often displayed as a tree, which leads to a colored representation of the intensity over a time course (for example) so that the relationship between genes’ expression profiles is easy to see. Array data can be presented in numerous other ways, too, most of which help us to wade through the mass of data generated by the experiment.
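The coordinates of an MA plot can be computed directly from the two channel intensities. The sketch below assumes the commonly used definitions M = log2(R/G) and A = ½(log2 R + log2 G), with background-corrected, strictly positive intensities; the function name and the spot values are hypothetical.

```python
from math import log2


def ma_values(red, green):
    """Return (M, A) for one spot from its two channel intensities."""
    m = log2(red / green)                  # log2 ratio, plotted on the y axis
    a = 0.5 * (log2(red) + log2(green))    # average log2 intensity, plotted on the x axis
    return m, a


# Hypothetical background-corrected intensities (red, green) for three spots.
spots = {"spot1": (4096, 1024), "spot2": (1024, 1024), "spot3": (512, 2048)}

for name, (red, green) in spots.items():
    m, a = ma_values(red, green)
    print(f"{name}\tM={m:+.2f}\tA={a:.2f}")
```

A spot with equal intensity in both channels has M = 0 whatever its brightness, so differentially expressed genes appear as points lying away from the horizontal axis at any value of A.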

2 Transcriptome Analysis: Towards a Comprehensive Understanding of Global Transcription Activity

2.5.6 Microarray Analysis Tools

Many software packages have been designed to automate microarray data normalization and analysis. A commonly used program is Genespring (http://www.silicongenetics.com), which provides a fairly automated analysis process and allows the user to view and manipulate many microarray experiments at once. Another attractive benefit is that Genespring can produce many user-friendly graphical representations of the data. However, this software is costly, whereas many open-source alternatives are freely available, such as the TM4 suite [45] from The Institute for Genomic Research (TIGR, www.tigr.org/software). TM4 contains four applications: TIGR Spotfinder, Microarray Data Analysis System (MIDAS), Microarray Data Manager (MADAM), and Multiexperiment Viewer (MeV). Also included is a database compliant with the Minimum Information About a Microarray Experiment standard (MIAME, see Section 2.5.8). These tools cover all aspects of image analysis through to data storage and we recommend them as a good alternative to commercially available software. Alternatively, there are many programs or routines that have been designed by researchers and statisticians that act as pure statistical tools to crunch the raw data from a microarray experiment. These tools, unlike Genespring, put all aspects of analysis into the researcher’s hands, allowing a far more flexible analysis that is tailored to the experiment. Examples of these programs and routines include SMA (Statistics for Microarray Analysis, http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html), Bioconductor (http://www.bioconductor.org/), LIMMA (Linear Models for Microarray Data, http://bioinf.wehi.edu.au/limma/), and YASMA (Yet Another Statistical Microarray Analysis tool, http://people.cryst.bbk.ac.uk/wernisch/yasma.html).
These are add-on packages that work in the “R” statistical environment (http://www.r-project.org/), so, although they are useful for data analysis, knowledge of the R language is required in order to exploit these tools fully.

2.5.7 Microarray Follow-Up

Microarray data often depend on the conditions in which the bacteria were studied, the method of RNA isolation, the statistical analysis, and so on. As a result, there is considerable uncertainty about using microarray data as concrete evidence. Because of this, it is common for researchers to use microarrays as an exploratory tool that generates leads, which can be followed up individually with simpler, better-established and more robust (but lower-throughput) methods such as RTq-PCR or Northern blots. RTq-PCR (quantitative or real-time PCR) is frequently used to validate at least some of the results from microarrays.

2.5.8 Data Storage and Reanalysis

Microarrays provide a vast quantity of data, and interpretation of these data is a complex subject matter in its own right. Often the results of experiments such as those above are open to question. The reinterpretation of microarray data will be key in the future as we learn the best ways to handle it. Reinterpretation is not possible, however, if the original data values for the array are not released upon publication of the results. Currently there are no universal guidelines from journals as to what is required in terms of data when a piece of research is submitted for publication, although it is common (but not compulsory) for the complete data set to be published online for access by other researchers. The format of the data is extremely varied and this is a major hurdle to their reinterpretation. Several groups have been set up which aim to standardize what is required of microarray data when it is published. The foremost of these is MIAME (Minimum Information About a Microarray Experiment, http://www.mged.org/), which aims to facilitate the access to, and usability of, microarray data. The standardization of microarray data, and the format in which they are presented, will facilitate the reanalysis of data and allow other researchers to use the data for meta-analyses etc. Centralized databases that are capable of compiling and storing raw microarray data from many sources, the conclusions drawn from the data, and the experimental conditions under which they were derived are slowly being developed. An increase in the use of these databases, perhaps made mandatory, is required if they are to be of real use to researchers in the future. In the meantime, however, simple guidelines for microarray experiments have been proposed [46]:
• Data should be in a tab-delimited, easily accessible format, such as a text (.txt) file, and not in a pdf (.pdf) file. This would allow them to be imported easily into databases and further analysis software.
• The data should include the definitive accession number for each gene as well as (or instead of) common gene names, which are often inconsistent and regularly changing.
• Data that are not included in the analysis (e.g., values for spots lower than the background) should be identified and not merely left out.
If followed, these guidelines would make the reuse of microarray data much easier.
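A results file that follows the three guidelines above might be produced along the following lines. This is only a sketch: the column names, accession numbers, and gene names are placeholders invented for the example, and an in-memory buffer stands in for a real .txt file.

```python
import csv
import io

# Hypothetical column names and records, for illustration only.
FIELDS = ["accession", "gene_name", "log2_ratio", "used_in_analysis", "reason_excluded"]

SPOTS = [
    {"accession": "ACC0001", "gene_name": "geneA", "log2_ratio": "1.85",
     "used_in_analysis": "yes", "reason_excluded": ""},
    {"accession": "ACC0002", "gene_name": "geneB", "log2_ratio": "-0.12",
     "used_in_analysis": "yes", "reason_excluded": ""},
    # A spot below background stays in the file, flagged rather than dropped.
    {"accession": "ACC0003", "gene_name": "geneC", "log2_ratio": "",
     "used_in_analysis": "no", "reason_excluded": "below background"},
]


def write_spots(spots, handle):
    """Write one tab-delimited row per spot, including excluded spots."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(spots)


buffer = io.StringIO()  # stands in for a file opened for writing
write_spots(SPOTS, buffer)
print(buffer.getvalue())
```

Keeping the accession number in its own column and recording excluded spots explicitly means that a later reader can import the file into a database and see which values were omitted from the analysis and why.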

2.6 Transcriptomics: Where We Are Now and What’s to Come

The study of the transcriptome is rapidly advancing the biological knowledge of pathogenic organisms. Microarray technology allows us to identify, in a holistic manner, all of the components in an organism that work together to mediate its pathogenic activities [47, 48]. In particular, many new virulence factors have been identified without having to revert to the laborious techniques of the past [22]. The true importance of transcriptomics and microarrays is probably best shown by the diverse range of areas in addition to pathogen transcriptomics to which microarrays are contributing significantly, including (among many others) operon identification [49], gene essentiality studies [50], genotyping [19], detection of virulent isolates [51], elucidation of the host response to infection [52], and the effect of virulent organisms on host cells [53]. Transcriptomics is used as an indication of cellular activity at the protein level. In prokaryotes there is thought to be a high (but not an absolute) correlation between transcription and translation. Although there will always be exceptions to this, studies have been performed that assessed the relationship between transcriptional activities and the resulting level of biological activity at the protein level [54]. An obvious drawback of this kind of study is that the current technology available to analyze the proteome of a cell is far behind that used to analyze the transcriptome. For example, Scherl et al. [54] were able to identify proteins corresponding to just 23% of the Staphylococcus aureus predicted ORFs despite using a myriad of technologies to increase their detection rate. To fully unravel the story of a host and its pathogen, much more work is needed in exploring the integration of transcriptomic data with that derived from proteomic studies.

Acknowledgements

We would like to thank both Mike Withers and Yi Zhang for advice and guidance during the preparation of the data analysis sections of this chapter. B. S. is funded by a Royal Veterinary College scholarship.

References

1 http://www.tigr.org. The Institute for Genomic Research (TIGR).
2 http://www.ncbi.nlm.nih.gov/genomes/Complete.html. NCBI’s Genome database.
3 http://www.genomesonline.org. GOLD database.
4 Fickett, J. W. 1996. Finding genes by computer: the state of the art. Trends Genet. 12:316–320.
5 Bocs, S., A. Danchin, and C. Medigue. 2002. Re-annotation of genome microbial coding-sequences: finding new genes and inaccurately annotated genes. BMC Bioinformatics. 3:5.
6 Lim, M., and K. Elenitoba-Johnson. 2004. Proteomics in pathology research. Lab. Invest. 84:1227–1244.
7 Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98:503–517.
8 Schena, M., D. Shalon, R. W. Davis, and P. O. Brown. 1995. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 270:467–470.
9 Shalon, D., S. J. Smith, and P. O. Brown. 1996. A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res. 6:639–645.
10 Tong, X., J. W. Campbell, G. Balazsi, K. A. Kay, B. L. Wanner, S. Y. Gerdes, and Z. N. Oltvai. 2004. Genome-scale identification of conditionally essential genes in E. coli by DNA microarrays. Biochem. Biophys. Res. Commun. 322:347–354.
11 Lipshutz, R. J., S. P. Fodor, T. R. Gingeras, and D. J. Lockhart. 1999. High density synthetic oligonucleotide arrays. Nat. Genet. 21:20–24.
12 http://www.affymetrix.com/index.affx. Affymetrix High Density Oligonucleotide Microarray web site.
13 http://bugs.sghms.ac.uk/. Mycobacterium tuberculosis Microarray project.
14 Beier, M., and J. Hoheisel. 1999. Versatile derivatisation of solid support media for covalent bonding on DNA-microchips. Nucleic Acids Res. 27:1970–1977.
15 Raddatz, G., M. Dehio, T. F. Meyer, and C. Dehio. 2001. PrimeArray: genome-scale primer design for DNA-microarray construction. Bioinformatics. 17:98–99.
16 Rozen, S., and H. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. 132:365–386.
17 Haas, S. A., M. Hild, A. P. H. Wright, T. Hain, D. Talibi, and M. Vingron. 2003. Genome-scale design of PCR primers and long oligomers for DNA microarrays. Nucleic Acids Res. 31:5576–5581.
18 Churchill, G. A. 2002. Fundamentals of experimental design for cDNA microarrays. Nat. Genet. 32 Suppl:490–495.
19 Behr, M. A., M. A. Wilson, W. P. Gill, H. Salamon, G. K. Schoolnik, S. Rane, and P. M. Small. 1999. Comparative genomics of BCG vaccines by whole-genome DNA microarray. Science. 284:1520–1523.
20 Grimm, V., S. Ezaki, M. Susa, C. Knabbe, R. D. Schmid, and T. T. Bachmann. 2004. Use of DNA microarrays for rapid genotyping of TEM beta-lactamases that confer resistance. J. Clin. Microbiol. 42:3766–3774.
21 Yang, Y. H., and T. Speed. 2002. Design issues for cDNA microarray experiments. Nat. Rev. Genet. 3:579–588.
22 Orihuela, C. J., J. N. Radin, J. E. Sublett, G. Gao, D. Kaushal, and E. I. Tuomanen. 2004. Microarray analysis of pneumococcal gene expression during invasive disease. Infect. Immun. 72:5582–5596.
23 Stintzi, A. 2003. Gene expression profile of Campylobacter jejuni in response to growth temperature variation. J. Bacteriol. 185:2009–2016.
24 Lee, M. L., F. C. Kuo, G. A. Whitmore, and J. Sklar. 2000. Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. U. S. A. 97:9834–9839.
25 Zien, A., J. Fluck, R. Zimmer, and T. Lengauer. 2003. Microarrays: how many do you need? J. Comput. Biol. 10:653–667.
26 Rosenzweig, B. A., P. S. Pine, O. E. Domon, S. M. Morris, J. J. Chen, and F. D. Sistare. 2004. Dye bias correction in dual-labeled cDNA microarray gene expression measurements. Environ. Health Perspect. 112:480–487.
27 Dobbin, K., J. H. Shih, and R. Simon. 2003. Statistical design of reverse dye microarrays. Bioinformatics. 19:803–810.
28 Talaat, A. M., S. T. Howard, W. Hale IV, R. Lyons, H. Garner, and S. A. Johnston. 2002. Genomic DNA standards for gene expression profiling in Mycobacterium tuberculosis. Nucleic Acids Res. 30:e104.
29 Bernstein, J. A., A. B. Khodursky, P. H. Lin, S. Lin-Chao, and S. N. Cohen. 2002. Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc. Natl. Acad. Sci. U. S. A. 99:9697–9702.
30 DeRisi, J., L. Penland, P. O. Brown, M. L. Bittner, P. S. Meltzer, M. Ray, Y. Chen, Y. A. Su, and J. M. Trent. 1996. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet. 14:457–460.
31 Talaat, A. M., P. Hunter, and S. A. Johnston. 2000. Genome-directed primers for selective labeling of bacterial transcripts for DNA microarray analysis. Nat. Biotechnol. 18:679–682.
32 Baugh, L. R., A. A. Hill, E. L. Brown, and C. P. Hunter. 2001. Quantitative analysis of mRNA amplification by in vitro transcription. Nucleic Acids Res. 29:e29.
33 Gupta, V., A. Cherkassky, P. Chatis, R. Joseph, A. L. Johnson, J. Broadbent, T. Erickson, and J. DiMeo. 2003. Directly labeled mRNA produces highly precise and unbiased differential gene expression data. Nucleic Acids Res. 31:e13.
34 Eriksson, S., S. Lucchini, A. Thompson, M. Rhen, and J. C. Hinton. 2003. Unravelling the biology of macrophage infection by gene expression profiling of intracellular Salmonella enterica. Mol. Microbiol. 47:103–118.
35 Thompson, L. J., D. S. Merrell, B. A. Neilan, H. Mitchell, A. Lee, and S. Falkow. 2003. Gene expression profiling of Helicobacter pylori reveals a growth-phase-dependent switch in virulence gene expression. Infect. Immun. 71:2643–255.
36 Berger, J., S. Hautaniemi, A.-K. Jarvinen, H. Edgren, S. Mitra, and J. Astola. 2004. Optimized LOWESS normalization parameter selection for DNA microarray data. BMC Bioinformatics. 5:194.
37 Quackenbush, J. 2002. Microarray data normalization and transformation. Nat. Genet. 32 Suppl:496–501.
38 Leung, Y. F., and D. Cavalieri. 2003. Fundamentals of cDNA microarray data analysis. Trends Genet. 19:649–659.
39 Nadon, R., and J. Shoemaker. 2002. Statistical issues with microarrays: processing and analysis. Trends Genet. 18:265–271.
40 Hochberg, Y. 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 75:800–803.
41 Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B (Stat. Methodol.). 57:289–300.
42 Reiner, A., D. Yekutieli, and Y. Benjamini. 2003. Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics. 19:368–375.
43 Quackenbush, J. 2001. Computational analysis of microarray data. Nat. Rev. Genet. 2:418–427.
44 Dudoit, S., Y. H. Yang, M. J. Callow, and T. Speed. 2002. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat. Sin. 12:111–140.
45 Saeed, A. I., V. Sharov, J. White, J. Li, W. Liang, N. Bhagabati, J. Braisted, M. Klapa, T. Currier, M. Thiagarajan, A. Sturn, M. Snuffin, A. Rezantsev, D. Popov, A. Ryltsov, E. Kostukovich, I. Borisovsky, Z. Liu, A. Vinsavich, V. Trush, and J. Quackenbush. 2003. TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 34:374–378.
46 Kendall, S. L., S. C. Rison, F. Movahedzadeh, R. Frita, and N. G. Stoker. 2004. What do microarrays really tell us about M. tuberculosis? Trends Microbiol. 12:537–544.
47 Maurer, L. M., E. Yohannes, S. S. Bondurant, M. Radmacher, and J. L. Slonczewski. 2005. pH regulates genes for flagellar motility, catabolism, and oxidative stress in Escherichia coli K-12. J. Bacteriol. 187:304–319.
48 Stewart, G. R., L. Wernisch, R. Stabler, J. A. Mangan, J. Hinds, K. G. Laing, D. B. Young, and P. D. Butcher. 2002. Dissection of the heat-shock response in Mycobacterium tuberculosis using mutants and microarrays. Microbiology. 148:3129–3138.
49 Tjaden, B., D. R. Haynor, S. Stolyar, C. Rosenow, and E. Kolker. 2002. Identifying operons and untranslated regions of transcripts using Escherichia coli RNA expression analysis. Bioinformatics. 18 Suppl 1:S337–S344.
50 Sassetti, C. M., D. H. Boyd, and E. J. Rubin. 2003. Genes required for mycobacterial growth defined by high density mutagenesis. Mol. Microbiol. 48:77–84.
51 Wu, C. F., J. J. Valdes, W. E. Bentley, and J. W. Sekowski. 2003. DNA microarray for discrimination between pathogenic O157:H7 EDL933 and non-pathogenic Escherichia coli strains. Biosens. Bioelectron. 19:1–8.
52 Nau, G. J., J. F. Richmond, A. Schlesinger, E. G. Jennings, E. S. Lander, and R. A. Young. 2002. Human macrophage activation programs induced by bacterial pathogens. Proc. Natl. Acad. Sci. U. S. A. 99:1503–1508.
53 Yuan, J.-P., T. Li, H.-B. Chen, Z.-H. Li, G.-Z. Yang, B.-Y. Hu, X.-D. Shi, S.-Q. Tong, Y.-X. Li, and X.-K. Guo. 2004. Analysis of gene expression profile in gastric cancer cells stimulated with Helicobacter pylori isogenic strains. J. Med. Microbiol. 53:965–974.
54 Scherl, A., P. Francois, M. Bento, J. M. Deshusses, Y. Charbonnier, V. Converset, A. Huyghe, N. Walter, C. Hoogland, and R. D. Appel. 2005. Correlation of proteomic and transcriptomic profiles of Staphylococcus aureus during the post-exponential phase of growth. J. Microbiol. Methods 60:247–257.

3 Physiological Proteomics of Bacillus subtilis and Staphylococcus aureus: Towards a Comprehensive Understanding of Cell Physiology and Pathogenicity

Michael Hecker and Susanne Engelmann

3.1 Introduction

The sequencing of the first complete genome of a living organism in 1995 was a turning point in biology [1]: the “blueprint of life” of Haemophilus influenzae was decoded, followed by that of many other organisms, including bacteria and lower and higher eukaryotes. This genome sequence, however, does not provide life itself because it only tells us what may happen, and not what really happens, in the cell. Functional genomic approaches such as mRNA profiling and proteomics are required to bring the “virtual life of the genes to the real life of the proteins.” Within the ensemble of functional genomics, covering from genome sequencing to bioinformatics and systems biology, proteomics will retain its crucial and privileged position because it deals – as no other discipline does – directly with the players of life, the proteins. The proteomics of today has profited greatly from genome sequencing, because protein identification by mass spectrometry (MS) techniques needs the genome sequence. The goal of recent proteomic projects is to visualize the entire proteome, which can only be achieved by a combination of gel-based and non-gel-based techniques. For physiological studies the highly sensitive two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) [2] is still the state of the art. The majority of cellular proteins can be visualized within the main proteomic window pI 4–7. However, many proteins are not located within this main window. Additional subproteomic fractions such as alkaline and acidic proteins, cell-wall-bound or extracellular proteins have to be analyzed separately on the way towards the entire proteome. Many proteins, however, escape detection by 2-D gel-based proteomics, among them the intrinsic membrane proteins, the most prominent proteome subgroup that needs the establishment of gel-free procedures.
For the future, simple and feasible quantitative non-gel-based proteomic approaches that are mainly based upon a combination of multidimensional chromatography of peptides and MS/MS procedures are urgently required to make membrane protein and other subproteomic fractions available for high-throughput comparative and quantitative physiological proteomic analysis. At present, however, physiological studies still have to rely mainly on gel-based proteomics as an extremely valuable tool in microbial physiology that can, in combination with various visualization and quantitation software packages, very rapidly provide comparative and quantitative data for multisample comparisons [3].

3.2 Proteomics of Bacillus subtilis: The Gram-positive Model Organism

3.2.1 The Vegetative Proteome

Bacillus subtilis, the model organism for gram-positive bacteria, is one of the most advanced bacteria in the context of gel-based physiological proteomics (see Refs. [3, 4] for reviews). In contrast to the genome, which is relatively stable, the proteome is highly flexible even in bacterial systems. From a physiological point of view, two major proteome classes can be distinguished: proteomes of growing cells, which have mainly housekeeping functions, and proteomes of nongrowing cells, which have adaptive functions against those stress or starvation stimuli that forced the growing cell population into a nongrowing state. According to our DNA array studies, cells growing in a minimal medium that does not contain amino acids, vitamins, or purine/pyrimidine bases express about 2500 genes. Of these 2500 genes, 1550 proteins should be visible in the standard window of pI 4–7. Of the 1550 vegetative proteins, almost 50% have been identified by gel-based proteomics [5] (unpublished results). It is interesting to note that most of the metabolic routes predicted for a B. subtilis cell are covered within this main window. Many enzymes of these metabolic pathways directly visualized on the vegetative proteome have never been seen before (Fig. 3.1, see page 44–45). A list of the 400 most abundant vegetative proteins is available, including 88 proteins of still unknown function [5]. This vegetative proteome, now ready for physiological application, offers the chance to analyze the regulation of entire metabolic pathways. Even for the main catabolic pathways such as glycolysis or the tricarboxylic acid (TCA) cycle, the proteomic view of these pathways provides new information on the regulation mechanisms [5–7].

Fig. 3.1 Reference 2-D map of cytosolic proteins of Bacillus subtilis exponentially growing in minimal medium. Protein spots are labeled with protein names according to the SubtiList database. Protein extracts were separated on 18-cm immobilized pH gradient (IPG) strips and 2-D gels were stained with colloidal Coomassie brilliant blue. Detailed information on each individual protein identified is available at http://microbio2.biologie.uni-greifswald.de:8880/sub2d.htm.

3.2.2 Proteomes of Nongrowing Cells: Proteomic Signatures of Stress/Starvation Stimuli

The transition from a growing to a nongrowing state is accompanied by a dramatic reprogramming of the protein synthesis profile. Vegetative proteins no longer required at high concentrations are no longer synthesized in nongrowing cells, and in some cases are even degraded in a specific manner [8, 9]. Other proteins with specific and nonspecific adaptive functions are strongly induced on a time-dependent scale. On the basis of proteomic studies, three main groups of stress/starvation proteins have been defined [10]. The specific stress proteins, induced by a single stress stimulus only, protect the cell against this stress stimulus by neutralizing the stress factor, by adaptation to its presence, or by repair of the damage caused by stress. Starvation-specific proteins, on the other hand, are produced to overcome the starvation by uptake of the limiting substrate with a very high affinity, by searching for alternative substrates not used in the presence of the preferred one (catabolite repression), or by chemotactic movement towards new substrates. In addition to these stress- or starvation-specific proteins, general stress proteins are also induced, in this case not by a specific stimulus but by a wide range of unrelated stress or starvation stimuli. These general stress/starvation proteins may provide the nongrowing cell with a multiple stress resistance machine irrespective of what kind of stress or starvation stimulus induced the nongrowing situation [4, 10].

The single stimulons can be visualized by the dual channel imaging technique [11], which is an excellent tool by which to identify all proteins induced or repressed by the growth-restricting stimuli. Two digitized images of 2-D gels have to be generated and combined in alternate additive dual-color channels. The first one (densogram), showing accumulated proteins visualized by various staining techniques, is false-colored green. The second image (autoradiograph), showing the proteins newly synthesized and thereby radioactively labeled during a 5-min pulse with 35S-L-methionine, is false-colored red. When the two images are combined, proteins accumulated and synthesized in growing cells are colored yellow. After the imposition of a stress or starvation stimulus, however, proteins not previously accumulated in the cell but newly synthesized are colored red, because these red proteins have already been radioactively labeled, but are not yet stainable. Looking for the red-colored proteins is a simple approach to finding all those proteins induced by a single stimulus, thereby defining the entire set of proteins induced by one stimulus, called a stimulon [4]. Heat stress, for instance, induces more than 100 red-labeled proteins (Fig. 3.2) [11]. These proteins may be identified to gain an overview of how the cell is protected against heat stress. Proteins repressed by the stimulus can also be visualized by this powerful technique. Green-colored proteins no longer being synthesized (no longer red) but still present in the cell are the candidates repressed by the stimulus.

Fig. 3.2 Protein pattern of heat-shocked B. subtilis cells. During exposure to heat shock (37 to 48 °C), B. subtilis cells were pulse-labeled with 35S-L-methionine. After the separation of crude protein extract by 2-D gels, the proteins were visualized by silver staining and the protein synthesis rate was determined by phosphoimaging. The two images were overlaid with the aid of the Delta-2D software package (dual-channel imaging technique [11]; Decodon, Greifswald, Germany). The red-labeled proteins form the heat stress stimulon. (This figure also appears with the color plates.)

The next step in analyzing the adaptation network is to dissect the stimulons into their individual regulation groups, the regulons. A regulon is a group of genes distributed on the bacterial chromosome but regulated by one global regulator (activator, repressor, alternative σ factor, etc.). Comparing the proteome/transcriptome profile of a mutant with that of the wild type under inducing conditions is the state of the art for defining the regulon structure (see Refs. [4, 12] for reviews). Furthermore, the allocation of as yet unknown proteins to stimulons or regulons that have an already known function is a simple, but convincing approach to a first prediction of their function [12]. For instance, unknown proteins induced by oxidative stress and belonging to the PerR regulon will probably be involved in protection against oxidative stress. This dissection of stimulons into individual regulons characterized by a typical induction pattern of the members of the regulon is shown in Fig. 3.3 for the heat stress stimulon and in Fig. 3.4 for the phosphate starvation stimulon as examples (see Ref. [4] for review). All proteomic and transcriptomic data can be assembled into an adaptational network consisting of a great number of stress/starvation stimulons and regulons. This network provides comprehensive information on the functional genomics and physiology of the stress/starvation response of bacteria. An essential feature of this network is the interplay between the individual regulation groups, because the individual stress/starvation regulons do not exist independently of each other, but are tightly connected, forming the adaptational network.

“Color coding” is a new software tool developed by Decodon GmbH that is highly convenient for visualizing complex protein expression patterns [3]. Proteins induced by more than one stimulus can be visualized by a specific color code. The reason for the induction by various stimuli is in many cases that the corresponding gene is controlled by more than one regulator. For instance, the green-colored YvyD (Fig. 3.5) induced by heat, ethanol, and oxidative stress is controlled both by σB and by σH [13]. By applying this technique, already known and probably new overlapping regulons within the adaptational network can be visualized.
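The logic of the dual-channel overlay described above (green for accumulated protein, red for newly synthesized protein, yellow where both are present) can be sketched in a few lines of Python. This is not the Delta-2D implementation; the function names, the 8-bit intensity scale, the threshold of 128, and the spot values are all assumptions made purely for illustration.

```python
def overlay_pixel(stain, label):
    """Combine two 8-bit grey values into one additive RGB pixel.

    stain: intensity in the densogram (accumulated protein), shown in green.
    label: intensity in the autoradiograph (newly synthesized protein), shown in red.
    """
    return (label, stain, 0)


def classify_spot(stain, label, threshold=128):
    """Name the apparent color of a spot; the threshold is an arbitrary assumption."""
    if stain >= threshold and label >= threshold:
        return "yellow"  # accumulated and still being synthesized
    if label >= threshold:
        return "red"     # newly synthesized, not yet accumulated: induced
    if stain >= threshold:
        return "green"   # accumulated, no longer synthesized: candidate for repression
    return "dark"        # neither detected


# Hypothetical (stain, label) intensities for three spots after a stress stimulus.
spots = {"spotA": (200, 180), "spotB": (20, 230), "spotC": (210, 15)}

for name, (stain, label) in spots.items():
    print(name, overlay_pixel(stain, label), classify_spot(stain, label))
```

In this toy example spotB would be scored as a member of the stimulon (red), whereas spotC would be a candidate for repression (green).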

Fig. 3.3 The heat stress stimulon of B. subtilis consists of proteins of the HrcA regulon, the CtsR regulon, and the σB regulon. The induction profiles of representative proteins of the respective regulons under various stress conditions are shown: C, control; H, heat; E, ethanol; S, salt; G, glucose starvation; Pm, puromycin; Ox, oxidative stress (H2O2).

Fig. 3.4 The phosphate starvation stimulon of B. subtilis consists of both general stress proteins also induced by phosphate starvation (σB regulon, σB-dependent proteins marked in blue) and starvation-specific proteins (PhoR regulon, marked in red). The stress/starvation induction profile of both groups and the functions of the proteins are indicated. c, control; h, heat stress; e, ethanol stress; s, NaCl stress; a, pH 5.5; o, oxidative stress (H2O2); p, puromycin; g, glucose starvation; O2, oxygen starvation; ph, phosphate starvation; aa, amino acids starvation. (This figure also appears with the color plates.)

Fig. 3.5 Multicolor imaging of expression patterns under different growth conditions in B. subtilis. Delta-2D software (Decodon) was used to visualize complex protein expression patterns on the 2-D gel image in the standard pH range 4–7. The color code is presented in the top left corner. Proteins only induced by single stresses are colored red (H, heat), light-blue (O, oxidative stress), and orange (E, ethanol), respectively. Proteins induced by oxidative stress as well as heat stress are colored yellow, proteins induced by ethanol stress and heat stress are colored dark-blue, proteins induced by oxidative and ethanol stress can be recognized as purple spots, and, finally, proteins induced by all three stimuli are displayed in green. (This figure also appears with the color plates.)

Fig. 3.6 Dynamics of protein synthesis profiles of growing and glucose-starved cells of B. subtilis. A Individual dual-channel 2-D patterns of protein synthesis and accumulation recorded during the different phases of the growth curve are assembled into a “life movie.” B Growth curve (optical density at 500 nm) and 35S-L-methionine incorporation (million cpm per 60 µg protein). C Patterns of selected examples representing different branches of cellular physiology. Sample points correspond to the following growth phases depicted in the growth curve: 1, 2, exponential growth; 3–7, glucose starvation; 8, 9, recovery of growth after readdition of glucose. The bar graphs on the left display normalized relative synthesis rates of the individual proteins at the different time points. (This figure also appears with the color plates.)

The pictures shown so far have given a snapshot of a single moment in the life of a B. subtilis cell. This life, however, is not static but highly dynamic, and sequential, time-dependent gene expression programs are an essential feature. If one assembles such proteomic pictures through time, growth and developmental processes can be followed at the molecular level as in a “life movie,” as shown in Fig. 3.6 for growing cells that entered the stationary phase because of glucose starvation. Using the same color code as already described, synthesized (red) and accumulated (green) proteins were followed along the growth curve. There is a total reprogramming of the protein synthesis pattern when cells enter the stationary growth phase. Many vegetative proteins that are no longer required in nongrowing cells, and are therefore no longer synthesized, change their color from yellow to green. On the other hand, many proteins that have not yet accumulated to stainable amounts but are strongly induced by glucose starvation are colored red, forming the glucose starvation stimulon. By combining all these proteome data with basic physiological knowledge, one can gain a comprehensive picture of what is happening in the cell and understand cell physiology as an entity.

The proteomic signatures of stress or starvation stimuli [14] can be used as diagnostic tools for predicting the physiological state of cells grown in a bioreactor, in a biofilm, etc. This is valid not only for growing cells (where proteome signatures indicate, for instance, the availability of nutrients), but also for nongrowing cells. On the basis of the proteomic signatures, one can predict whether nongrowing cells have suffered from heat or oxidative stress or from glucose or phosphate starvation (Fig. 3.7). During the stationary phase of B. licheniformis cells there is a strong proteomic signature of oxidative stress (KatA, AhpC, etc.), followed by a signature of protein stress, indicated by the induction of ClpC [15]. Proteomic signatures can also be used to predict the mechanism of action of unknown drugs. Nitrofurantoin, for instance, an antibiotic that has been in use for a long time, induces almost the same signature as diamide does, strongly suggesting that nitrofurantoin induces oxidative stress in the cell [16]. In collaboration with the Bayer company, we have established a comprehensive antibiotics proteomic signature library. By using this library combined with the stress/starvation signature library, it is possible to predict the molecular mechanisms of action of unknown drugs. To give two examples: phenyl-thiazolylurea-sulfonamides showed a proteomic signature of stringent control induced by amino acid starvation [17], while substance Bay 50-2369 showed the same signature as chloramphenicol does, strongly suggesting that the peptidyl transferase is the target of the substance [18–20].
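The idea of comparing an observed induction pattern against a signature library can be illustrated with a minimal sketch. It assumes that each signature is reduced to a set of marker proteins and uses the Jaccard index as the similarity measure; both are simplifying assumptions made here for illustration, not the procedure used to build the libraries described above. KatA, AhpC, and ClpC are the markers named in the text; the entries beginning with `marker_` are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard index of two marker sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_signatures(observed, library):
    """Rank library signatures by similarity to the observed induced proteins."""
    scores = {name: jaccard(observed, markers) for name, markers in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy library: only KatA, AhpC and ClpC come from the text; the rest are placeholders.
library = {
    "oxidative stress": {"KatA", "AhpC", "marker_ox3"},
    "protein stress": {"ClpC", "marker_ps2"},
    "stringent response": {"marker_sr1", "marker_sr2"},
}

# Hypothetical set of proteins induced by an uncharacterized compound.
observed = {"KatA", "AhpC", "ClpC"}

for name, score in rank_signatures(observed, library):
    print(f"{name}: {score:.2f}")
```

A real comparison would work with quantitative induction ratios rather than presence or absence, but the ranking step follows the same principle.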

3.3 Physiological Proteomics of Staphylococcus aureus

3.3.1 The Postgenome Era of S. aureus

Fig. 3.7 Proteomic signatures of B. subtilis under different stress/starvation conditions. Comparisons of the protein profiles of exponentially growing and stressed B. subtilis cells reveal signature-like changes that are specific to certain stress stimuli (e.g., induction of the catalase KatA by oxidative stress). The individual sections of the 2-D gels display typical parts of the proteomic signatures of oxidative or heat stress, the stringent response, or limitation of glucose or phosphate. (This figure also appears with the color plates.)

Building on long experience in Bacillus proteomics, this expertise has been transferred to a closely related pathogenic Gram-positive bacterium, Staphylococcus aureus. S. aureus is a human pathogen of increasing importance, mainly as a result of the spread of antibiotic resistance. The pathogenicity of this species is very complex and involves the tightly regulated production of cell-wall-associated and extracellular proteins forming a changing set of virulence factors. Due to the great variety of these proteins, S. aureus causes a broad spectrum of infectious diseases ranging from superficial abscesses to endocarditis, osteomyelitis, and toxic shock syndrome [21]. Methicillin-resistant S. aureus (MRSA) strains are currently predominant and dangerous nosocomial pathogens, since infections caused by these strains have become difficult to treat. Vancomycin has become the drug of choice for treating MRSA infections. However, the emergence of vancomycin-resistant MRSA strains is leading to urgent demands for alternative anti-MRSA therapies and the development of totally new approaches to antibacterial drug research. This is the reason why specialists are looking forward to the postgenome era, which opens up great opportunities to bring new and urgently required antibacterial drugs to the market (see Refs. [22–24] for review).

The postgenomic era of S. aureus began in 2001 with the publication of the genome sequences of two reference strains [25]. The S. aureus genome codes for about 2600–2700 proteins, about 1000 of them with still unknown functions. The genome sequence allowed the prediction of more than 60 gene regulators with helix–turn–helix motifs (among them five SarA and three Fur homologues), 17 two-component systems, and two alternative σ factors [25]. The complete genome sequences of seven S. aureus strains and partial sequence information for another two are now available in the databases. Preliminary genome data show that the various strains encode different sets of virulence factors or superantigens, mostly located on pathogenicity islands or plasmids, which may be the reason for the broad spectrum of infections caused by the different strains. The KEGG (www.genome.jp/kegg/pathway/map/) database has provided the first, still preliminary information on metabolic pathways encoded in the genome of S. aureus, from the carbon core catabolism including fermentation to the amino acid metabolism and other pathways. The genome sequence information is required for functional genomic approaches; this is valid not only for DNA chip technologies, but also for high-throughput protein identification via mass spectrometry techniques.

Proteomics can now be used to link the genome sequence to the cell physiology of S. aureus, relying on the panorama view of proteomics, which provides an increasingly complete picture of the cell physiology of growing and nongrowing cells, including a comprehensive and new understanding of its pathogenicity. There is a great deal of information in the published literature about the regulation, structure, and function of various virulence factors, but only limited information about basic cellular physiology. It has become increasingly accepted that this basic cell physiology determines not only growth and survival, but pathogenicity as well. For this reason, much more knowledge about cell physiology is required to understand pathogenicity, which is probably a central point for the better and more successful combating of multiresistant strains and, thus, of various kinds of diseases. Functional genomics opens up the opportunity for a new and comprehensive understanding of the cell physiology and infection biology of the parasite.


3.3.2 Proteomes of Growing and Nongrowing Cells

What is valid for B. subtilis is also true for S. aureus: only a part of the genome is active under any given set of conditions, and the proteomes of growing cells will differ from those of nongrowing cells. First of all, a vegetative proteome map has to be established for S. aureus as the basis for physiological studies. A “theoretical proteome map” simply derived from the genome sequence shows two major peaks, a neutral–acidic peak and a more alkaline one. Whereas most of the ribosomal proteins and membrane proteins are expected in the alkaline region on the basis of their pI values, the majority of vegetative proteins such as metabolic enzymes are located in the main proteomic window, pI 4–7. This main window provides a good starting point for physiological studies. On the way towards the entire proteome, however, we must not ignore the fact that not only alkaline proteins, but also cell-surface-bound or even extracellular proteins have to be considered in addition to the main fraction of cytosolic proteins. Analyzing the majority of intrinsic membrane proteins requires the establishment of gel-free proteomic approaches (see chapter 3.1).

Neutral/weakly acidic and alkaline proteins of S. aureus strains COL and 8325 from mid-exponential-phase cells growing in tryptone soy broth at 37 °C were analyzed by Cordwell et al. [26]. From the 347 protein spots, 266 proteins were identified, corresponding to approximately 12% of the proteome. Recently we provided a proteome map of growing cells of S. aureus COL with 460 entries [27] (Kohler et al., in press) (Fig. 3.8). Our data on the gel-based proteomics of S. aureus are being integrated into a comprehensive proteome database, Staph-2D (http://microbio2.biologie.uni-greifswald.de:8880/sub2d.htm). In a recent study a combined proteomic and transcriptomic approach was used to analyze the gene expression pattern in postexponential cells of S. aureus N315. Five hundred ninety-one proteins, corresponding to 23% of the proteome, were identified by gel-based and gel-free proteomics [28].

Despite the fact that the main metabolic pathways can be derived from the genome sequence, only very limited information about their regulation is available. Many metabolic enzymes have been identified on the proteome map of growing cells; most of the metabolic pathways have been covered by the proteomic approach [27] (Kohler et al., in press), offering the chance to analyze the regulation of entire metabolic routes in the postgenome era (see Fig. 3.9). As in B. subtilis, the regulation of core carbon catabolism is a good model for such studies, because almost all glycolytic or TCA cycle enzymes, which belong to the most abundant vegetative proteins, have been visualized by proteomics. Glycolysis is activated by glucose excess or by a shift from aerobic to anaerobic conditions, which also triggers repression of the TCA cycle and strong induction of fermentation/overflow metabolism (Fig. 3.10). The pyruvate formate lyase (Pfl) is one of the most abundant enzymes induced by the anaerobic shift. The arginine deiminase pathway, which provides additional ATP under energy limitation, is also activated under anaerobic conditions, but only in the absence of glucose [27]. In addition to metabolic enzymes, a global regulator, SrrA, known to be involved in anaerobic gene regulation, was also induced by the anaerobic shift. These data represent an example of the opportunities to gain information about the regulation of cell physiology by systematic application of physiological proteomics.

From a physiological point of view, proteins produced in response to growth-restricting stimuli in the environment are of crucial significance for survival in nature, because stress and starvation are the rule and not the exception in natural ecosystems. The proteomes of nongrowing cells are probably more heterogeneous than proteomes of growing cells, because most of the stress/starvation stimuli induce a great number of stress/starvation proteins organized in an adaptational gene expression network of nongrowing cells. Following a similar approach as for B. subtilis, proteins newly induced or repressed by environmental stimuli can be allocated to stress or starvation stimulons, which can be dissected into single regulons by analyzing mutants in global regulators with a combined proteomic and transcriptomic approach. Color coding and related techniques can be used to define overlapping areas within the adaptational network.

Fig. 3.8 Reference 2-D map of cytosolic proteins of S. aureus COL in a pI range of 4–7. Protein extracts of cells grown in TSB medium at the exponential and stationary growth phases and under anaerobic conditions were mixed and separated on 2-D gels. Protein spots are labeled with protein names according to the S. aureus N315 database. Proteins were stained with colloidal Coomassie brilliant blue.


Fig. 3.9 Assignment of proteins identified in S. aureus COL to biochemical pathways and other essential cellular components. Proteins that have not been identified in the 2-D gel images thus far are colored green. A Purine and pyrimidine metabolism, B glycolysis, pentose phosphate shunt, and citric acid cycle, C oxidative stress resistance, D ATPase components, E proteolysis, F components of the translational machinery, G amino acid metabolism, H fatty acid synthesis and metabolism of cell wall components, and I biotin metabolism. (This figure also appears with the color plates.)

Fig. 3.10 Protein pattern of cells of S. aureus COL grown under aerobic (green) and anaerobic (red) conditions in synthetic medium. Cells were pulse-labeled (5 min) with 35S-L-methionine under aerobic conditions and 30 min after imposition of anaerobic growth conditions. Radioactively labeled proteins were visualized by the phosphoimaging technique. Proteins whose synthesis was increased after shifting to anaerobic growth conditions are shown in red (e.g., enzymes involved in glycolysis and fermentation) and those whose synthesis was decreased are shown in green (e.g., enzymes involved in the TCA cycle). (This figure also appears with the color plates.)

Only a few data are available that demonstrate the power of proteomics for analyzing stress or starvation responses. Preliminary proteome data have been published on the heat stress response. A temperature shift from 37 °C to 48 °C induced the production of at least eight proteins, among them GroEL and GroES [29]. Our unpublished data showed mainly the heat induction of the HrcA and CtsR regulons, represented by induction of the GroEL/S and DnaK machineries as well as of the Clp proteins. In contrast to B. subtilis, the groES/L and dnaK operons are regulated by both heat shock regulators, HrcA and CtsR. They act together synergistically to maintain low basal levels of expression of these operons in the absence of stress [30]. Surprisingly, the members of the σB-dependent response, strongly heat-inducible in B. subtilis, are not induced in heat-stressed cells of S. aureus grown in synthetic medium.


Hydrogen peroxide treatment induced only a few proteins in S. aureus. Some protein spots that seemed to be newly induced turned out to result from a shift of existing proteins to a more acidic position, caused by oxidation of cysteine residues to sulfonic acid (Fig. 3.11) [31]. Glyceraldehyde-3-phosphate dehydrogenase (Gap) behaved in this way and was inactivated by this irreversible oxidative damage. In parallel with the inactivation of the enzyme there was a drop in the ATP level, followed by a cessation of growth. After 40 min of adaptation the enzyme “shifted back,” and in parallel the ATP level increased again, followed by resumption of growth. This “enzyme reactivation” was due to newly synthesized Gap protein. Repair of fully oxidized sulfhydryl groups is probably impossible from an energetic point of view.
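As a small worked illustration of why such a modification is detectable by mass spectrometry: oxidation of a cysteine thiol (–SH) to sulfonic acid (–SO3H) adds three oxygen atoms, so the modified peptide is heavier by three times the monoisotopic mass of oxygen. The sketch below only performs this arithmetic; the function name is a hypothetical helper and is not taken from the cited study.

```python
# Monoisotopic mass of oxygen-16 in daltons.
MONOISOTOPIC_O = 15.9949146

# Number of oxygen atoms added to the cysteine thiol in each oxidation state.
OXYGENS_ADDED = {"sulfenic": 1, "sulfinic": 2, "sulfonic": 3}


def oxidation_mass_shift(state):
    """Mass added to a cysteine-containing peptide for the given oxidation state."""
    return OXYGENS_ADDED[state] * MONOISOTOPIC_O


# Cys -> cysteine sulfonic acid: about +47.98 Da on the modified peptide.
print(round(oxidation_mass_shift("sulfonic"), 4))
```

The sulfonic acid group is also strongly acidic, which is consistent with the shift of the oxidized protein towards the acidic side of the 2-D gel.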

Fig. 3.11 Sectors of 2-D gels covering the region where the Gap protein is located. Protein extracts of S. aureus COL before (control) and 5 min after addition of 100 mM H2O2 were separated on 2-D gels. Note the shift of Gap to a more acidic position (Gapox) under oxidative stress conditions. Both spots (Gapred and Gapox) were analyzed by tandem hybrid mass spectrometry. A modified peptide was found in Gapox. The sequencing of this peptide indicated the oxidation of a cysteinyl residue (Cys-151) to sulfonic acid.

Oxacillin, a cell-wall-active antibiotic, induced a proteomic signature that indicated oxidative damage and protein stress [32]. However, the data are still preliminary and do not yet allow a complete picture. Triton X-100, another surface-active substance, induced members of the σB and SarA regulons [26]. It is interesting to note that methicillin-sensitive and methicillin-resistant strains showed slightly different proteomic signatures in response to Triton X-100 [26]. Proteomic starvation signatures are not yet available, probably because of the difficulty of designing growth media that trigger a well-defined stationary phase caused by phosphate, glucose, or nitrogen starvation. Stationary-phase cells showed significant de-repression of the TCA cycle enzymes and repression of the glycolytic enzymes compared to cells grown with glucose excess. Because S. aureus has the ability to invade different tissues, oxygen could be one of the most crucial growth-limiting stimuli forcing adaptational processes against oxygen limitation and starvation. S. aureus can grow under low-oxygen conditions by fermentation or nitrate respiration. A two-component system, SrrAB, which has a strong similarity to the ResDE two-component system of B. subtilis [33], is probably the main global regulator of the aerobic–anaerobic shift of metabolism. Throup et al. [34] provided the first proteomic data indicating that SrrAB (SrhSR) controls the upregulation of fermentation enzymes (lactate dehydrogenase, alcohol dehydrogenase) as well as the downregulation of aerobic TCA cycle enzymes (succinyl-CoA synthetase, aconitase, fumarase). A mutant in srrAB is characterized by a severe growth defect under anaerobic conditions [34].

There are only a few proteomic data available on general stress/starvation responses (see Refs. [10, 35] for review). The σB-dependent response seems to be totally different from the general stress response of B. subtilis, in relation not only to the signal transduction pathway of environmental stimuli to σB, but also to the genes controlled by σB [36, 37] (Pané-Farré et al., submitted). Recent data suggest that the σB regulon does not belong to the heat stress stimulon and that only a minority of genes are involved in the development of a nonspecific multiple stress resistance, the main feature of the general stress response in Bacillus species. Proteomic data in combination with more complete transcriptomic data allowed this surprising conclusion [36, 37] (Pané-Farré et al., submitted). These transcriptomic results support recent data that σB-dependent proteins in S. aureus are directly involved in virulence in a catheter infection model (Lorenz et al., submitted). Proteomic information on other general stress or starvation responses such as the RelA-dependent stringent control or the probably related CodY regulon is still lacking.

3.3.3 Extracellular Proteins and Pathogenicity Networks

Fig. 3.12 The extracellular proteome of S. aureus RN6390 at low (green image) and high cell densities (red image). 250 µg of protein from the supernatant of cells grown in TSB medium at an optical density OD540 = 1 and 5 was separated on 2-D gels and stained with Sypro Ruby. Extracellular proteins present in increased amounts at high cell densities are labeled red and those proteins only present at low cell densities are labeled green. (This figure also appears with the color plates.)

Fig. 3.13 A Growth of S. aureus RN6390 in TSB medium. The sampling is indicated by an arrow and a letter in the respective growth curve. B, C Virulence factors of S. aureus RN6390 whose amount depends on the growth phase. The amount of the respective proteins at OD540 = 1 (green) of cells grown in TSB medium was compared with the amount of these proteins at higher optical densities (red). B Virulence factors only present at low cell densities. C Virulence factors only present at high cell densities. In addition, the amount of the respective proteins in the wild type strain was compared to the amount of these proteins in various regulatory mutants (agr, sarA, sigB) known to be impaired in virulence. Proteins were stained with Sypro Ruby. (This figure also appears with the color plates.)

For the purposes of defining the structure of stimulons and regulons, transcriptomics provides a more complete picture than does proteomics, because, as already mentioned, many proteins still escape detection by gel-based proteomics. In many areas of application, however, proteomics cannot be replaced by transcriptomics, because proteomics can also visualize events that have never been seen by transcriptomics. Protein secretion is one such field of application that is crucial for Staphylococcus biology, because most of the virulence factors have signal sequences and belong to the secretome. Using the dual-channel technique, it is possible to follow the protein secretion kinetics along the growth curve (Fig. 3.12). There are some proteins (e.g., SsaA, IsaA, Aly, Spa, SceD, LytM, and Aur) that are synthesized and secreted during exponential growth, but most are secreted only during the transient or even the stationary phase of growth [38, 39]. SarA, RNAIII, SaeR, ArlR, and others are key regulators involved in the expression of the majority of virulence genes whose products are required for host cell adhesion, for tissue invasion, or for tissue damage. The expression of these virulence proteins is regulated in a coordinated way, ensuring the expression of cell-surface-associated proteins during growth in a first step, followed by the expression of extracellular virulence proteins after settlement in the host in a second step [39–41].

Proteomics is again a highly sophisticated strategy by which to define the entire set of extracellular proteins controlled by these global virulence regulators. The differential proteomics display can also be used to visualize the entire set of extracellular proteins in the wild type compared to mutants. On the basis of a comparison between the secretomes of a wild type and its corresponding mutant strains, many extracellular proteins have been allocated to the individual regulons, showing both already known and also new members of the regulons [38, 39]. Figure 3.13 gives an overview of the structure of a few selected virulence regulons. The definite function of still unknown members of the virulence regulons, however, needs to be verified in specific follow-up studies analyzing the expression and role of the protein in infection model systems. The proteomic approach is also a useful tool for analyzing the extracellular protein pattern of different clinical isolates, which may help to correlate the individual diseases with the gene expression and protein secretion pattern [42].

Recent studies indicate that this pathogenicity network is not confined to the interactions between SarA, agr, SaeR, ArlR, or σB. Many more global regulators seem to be encoded in the genome sequence [41]. A combined application of transcriptomics and proteomics of the wild type and mutants in global regulators (transcriptomics to visualize the gene expression pattern and proteomics to visualize the protein secretion profile) might be the state-of-the-art approach to defining the expression of virulence genes, which is a very complex gene expression network. This network consists of many overlapping regulons expressed in a time-controlled manner to ensure the optimal pattern and level of the individual virulence factors at the different locations in the host. Infection models in combination with DNA array studies must be used to visualize the gene expression pattern in the host cell and in the parasite, aiming at a more comprehensive picture of what is happening on both sides during the infection process.
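The allocation of extracellular proteins to a regulon by comparing the secretome of the wild type with that of a regulatory mutant can be sketched as simple set operations. This is a deliberately reduced illustration that assumes presence/absence data; the protein names are hypothetical placeholders, and the function `allocate_to_regulon` is not taken from any published workflow.

```python
def allocate_to_regulon(wild_type_spots, mutant_spots):
    """Classify proteins by comparing wild-type and regulator-mutant secretomes."""
    wild_type, mutant = set(wild_type_spots), set(mutant_spots)
    return {
        # Present in the wild type but lost in the mutant: candidates for
        # proteins whose expression depends on the regulator.
        "activated_by_regulator": sorted(wild_type - mutant),
        # Present only in the mutant: candidates for proteins that the
        # regulator represses, directly or indirectly.
        "repressed_by_regulator": sorted(mutant - wild_type),
        # Present in both strains: not attributable to this regulator.
        "unaffected": sorted(wild_type & mutant),
    }


# Hypothetical spot lists for a wild type and one regulatory mutant.
wild_type = {"protein_1", "protein_2", "protein_3"}
regulator_mutant = {"protein_2", "protein_3", "protein_4"}

result = allocate_to_regulon(wild_type, regulator_mutant)
for category, proteins in result.items():
    print(category, proteins)
```

In practice spot intensities are compared quantitatively and the result cannot distinguish direct from indirect control, which is why the text calls for follow-up studies of newly assigned regulon members.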

3.4 Outlook: Second Generation Proteomics and New Fields in S. aureus Physiology and Infection Biology

The panorama view of proteomics visualizes cellular events never seen before. Not just a few interesting proteins, but almost all proteins of the cell can be followed by proteomics. The challenge now is not to get lost in the mass of data and not to remain on the surface of the problem, but to move beyond the first, descriptive level, a step that is absolutely required for a comprehensive and mechanistic understanding of the phenomena to be studied. The panorama view of proteomics, however, allows selection of the most interesting phenomena that deserve more detailed study. To understand life processes, more detailed first-level proteomics (protein expression profiling) must be combined with second-level proteomics in combination with biochemistry and molecular genetics. This second-level proteomics allows crucial questions to be addressed, such as analyzing:
• The fate of each individual protein (protein targeting/protein secretion)
• The protein interaction network (interactome)
• Post-translational modifications such as protein phosphorylation at the proteomic scale
• Protein aging or protein damage at the proteomic scale
Finally, it also allows researchers to follow the stability or proteolysis rate of each individual protein separated by 2-D PAGE.

These proteomic studies of the second generation require gel-based and non-gel-based proteomics relying on multidimensional chromatography followed by highly sophisticated MS/MS techniques. This proteomics of the second generation will in the future open new and fascinating fields of S. aureus physiology not yet addressed in a systematic way, an essential part of and a great challenge for S. aureus cell physiology in the postgenome era. The combination of first- and second-level proteomics with comparative genomics, structural genomics, transcriptomics, metabolomics, and bioinformatics will open the way towards a comprehensive understanding of life, ending up in a systems biology approach to S. aureus in the near future.


Acknowledgments

We are very grateful to all coworkers and students for their excellent data on the proteomics of Bacillus and Staphylococcus species. The support of Jörg Bernhardt and Stephan Fuchs in preparing the figures is acknowledged. We also thank Uwe Völker, Jörg Hacker, Wilma Ziebuhr, Knut Ohlsen, Christof von Eiff, and Fritz Götz for longstanding and fruitful collaboration, and Decodon GmbH (Greifswald) for providing Delta-2D software. This work was supported by grants from the Bundesministerium für Bildung und Forschung (031U107A/-207A; 031U213B), the Deutsche Forschungsgemeinschaft (GK212/3-00; HE1887/7-1; HE1887/7-2; HE1887/6-5; HE1887/6-6), the European Union (Bacell factory QLK3-CT-1999-00413; Bacell network GLG2-CT-1999-01455) and the Fonds der Chemischen Industrie to M. H.

References

1 Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512.
2 O’Farrell, P. H. 1975. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 250:4007–4021.
3 Hecker, M. and U. Völker. 2004. Towards a comprehensive understanding of Bacillus subtilis cell physiology by physiological proteomics. Proteomics 4:3727–3750.
4 Hecker, M. 2003. A proteomic view of cell physiology of Bacillus subtilis – bringing the genome sequence to life. Adv. Biochem. Eng. Biotechnol. 83:57–92.
5 Eymann, C., A. Dreisbach, D. Albrecht, J. Bernhardt, D. Becher, S. Gentner, L. T. Tam, K. Büttner, G. Buurman, C. Scharf, S. Venz, U. Völker, and M. Hecker. 2004. A comprehensive proteome map of growing Bacillus subtilis cells. Proteomics 4:2849–2876.
6 Ludwig, H., G. Homuth, M. Schmalisch, F. M. Dyka, M. Hecker, and J. Stülke. 2001. Transcription of glycolytic genes and operons in Bacillus subtilis: evidence for the presence of multiple levels of control of the gapA operon. Mol. Microbiol. 41:409–422.
7 Tobisch, S., D. Zühlke, J. Bernhardt, J. Stülke, and M. Hecker. 1999. Role of CcpA in regulation of the central pathways of carbon catabolism in Bacillus subtilis. J. Bacteriol. 181:6996–7004.
8 Bernhardt, J., J. Weibezahn, C. Scharf, and M. Hecker. 2003. Bacillus subtilis during feast and famine: visualization of the overall regulation of protein synthesis during glucose starvation by proteome analysis. Genome Res. 13:224–237.
9 Kock, H., U. Gerth, and M. Hecker. 2004. MurAA, catalysing the first committed step in peptidoglycan biosynthesis, is a target of Clp-dependent proteolysis in Bacillus subtilis. Mol. Microbiol. 51:1087–1102.
10 Hecker, M. and U. Völker. 2001. General stress response of Bacillus subtilis and other bacteria. Adv. Microb. Physiol. 44:35–91.
11 Bernhardt, J., K. Büttner, C. Scharf, and M. Hecker. 1999. Dual channel imaging of two-dimensional electropherograms in Bacillus subtilis. Electrophoresis 20:2225–2240.
12 Hecker, M. and S. Engelmann. 2000. Proteomics, DNA arrays and the analysis of still unknown regulons and unknown proteins of Bacillus subtilis and pathogenic gram-positive bacteria. Int. J. Med. Microbiol. 290:123–134.
13 Drzewiecki, K., C. Eymann, G. Mittenhuber, and M. Hecker. 1998. The yvyD gene of Bacillus subtilis is under dual control of sigmaB and sigmaH. J. Bacteriol. 180:6674–6680.
14 VanBogelen, R. A. 2003. Probing the molecular physiology of the microbial organism, Escherichia coli using proteomics. Adv. Biochem. Eng. Biotechnol. 83:27–55.
15 Voigt, B., T. Schweder, D. Becher, A. Ehrenreich, G. Gottschalk, J. Feesche, K. H. Maurer, and M. Hecker. 2004. A proteomic view of cell physiology of Bacillus licheniformis. Proteomics 4:1465–1490.
16 Leichert, L. I., C. Scharf, and M. Hecker. 2003. Global characterization of disulfide stress in Bacillus subtilis. J. Bacteriol. 185:1967–1975.
17 Beyer, D., H. P. Kroll, R. Endermann, G. Schiffer, S. Siegel, M. Bauser, J. Pohlmann, M. Brands, K. Ziegelbauer, D. Haebich, C. Eymann, and H. Brötz-Oesterhelt. 2004. New class of bacterial phenylalanyl-tRNA synthetase inhibitors with high potency and broad-spectrum activity. Antimicrob. Agents Chemother. 48:525–532.
18 Bandow, J. E., H. Brötz, L. I. Leichert, H. Labischinski, and M. Hecker. 2003. Proteomic approach to understanding antibiotic action. Antimicrob. Agents Chemother. 47:948–955.
19 Freiberg, C., H. Brötz-Oesterhelt, and H. Labischinski. 2004. The impact of transcriptome and proteome analyses on antibiotic drug discovery. Curr. Opin. Microbiol. 7:451–459.
20 Sender, U., J. Bandow, S. Engelmann, U. Lindequist, and M. Hecker. 2004. Proteomic signatures for daunomycin and adriamycin in Bacillus subtilis. Pharmazie 59:65–70.
21 Crossley, K. B. and Archer L. A. 1997. The staphylococci in human disease. Churchill Livingstone, New York.
22 Chan, P. F., R. Macarron, D. J. Payne, M. Zalacain, and D. J. Holmes. 2002. Novel antibacterials: a genomic approach to drug discovery. Curr. Drug Targets Infect. Disord. 2:291–308.
23 Blondelle, S. E. and R. A. Houghten. 1996. Novel antimicrobial compounds identified using synthetic combinatorial library technology. Trends Biotechnol. 14:60–65.
24 Rachakonda, S. and L. Cartee. 2004. Challenges in antimicrobial drug discovery and the potential of nucleoside antibiotics. Curr. Med. Chem. 11:775–793.
25 Kuroda, M., T. Ohta, I. Uchiyama, T. Baba, H. Yuzawa, I. Kobayashi, L. Cui, A. Oguchi, K. Aoki, Y. Nagai, J. Lian, T. Ito, M. Kanamori, H. Matsumaru, A. Maruyama, H. Murakami, A. Hosoyama, Y. Mizutani-Ui, N. K. Takahashi, T. Sawano, R. Inoue, C. Kaito, K. Sekimizu, H. Hirakawa, S. Kuhara, S. Goto, J. Yabuzaki, M. Kanehisa, A. Yamashita, K. Oshima, K. Furuya, C. Yoshino, T. Shiba, M. Hattori, N. Ogasawara, H. Hayashi, and K. Hiramatsu. 2001. Whole genome sequencing of meticillin-resistant Staphylococcus aureus. Lancet 357:1225–1240.
26 Cordwell, S. J., M. R. Larsen, R. T. Cole, and B. J. Walsh. 2002. Comparative proteomics of Staphylococcus aureus and the response of methicillin-resistant and methicillin-sensitive strains to Triton X-100. Microbiology 148:2765–2781.
27 Kohler, C., C. von Eiff, G. Peters, R. A. Proctor, M. Hecker, and S. Engelmann. 2003. Physiological characterization of a heme-deficient mutant of Staphylococcus aureus by a proteomic approach. J. Bacteriol. 185:6928–6937.
28 Scherl, A., P. Francois, M. Bento, J. M. Deshusses, Y. Charbonnier, V. Converset, A. Huyghe, N. Walter, C. Hoogland, R. D. Appel, J. C. Sanchez, C. G. Zimmermann-Ivol, G. L. Corthals, D. F. Hochstrasser, and J. Schrenzel. 2005. Correlation of proteomic and transcriptomic profiles of Staphylococcus aureus during the post-exponential phase of growth. J. Microbiol. Methods 60:247–257.
29 Ohta, T., K. Honda, M. Kuroda, K. Saito, and H. Hayashi. 1993. Molecular characterization of the gene operon of heat shock proteins HSP60 and HSP10 in methicillin-resistant Staphylococcus aureus. Biochem. Biophys. Res. Commun. 193:730–737.
30 Chastanet, A., J. Fert, and T. Msadek. 2003. Comparative genomics reveal novel heat shock regulatory mechanisms in Staphylococcus aureus and other Gram-positive bacteria. Mol. Microbiol. 47:1061–1073.
31 Weber, H., S. Engelmann, D. Becher, and M. Hecker. 2004. Oxidative stress triggers thiol oxidation in the glyceraldehyde-3-phosphate dehydrogenase of Staphylococcus aureus. Mol. Microbiol. 52:133–140.
32 Singh, V. K., R. K. Jayaswal, and B. J. Wilkinson. 2001. Cell wall-active antibiotic induced proteins of Staphylococcus aureus identified using a proteomic approach. FEMS Microbiol. Lett. 199:79–84.
33 Nakano, M. M., P. Zuber, P. Glaser, A. Danchin, and F. M. Hulett. 1996. Two-component regulatory proteins ResD-ResE are required for transcriptional activation of fnr upon oxygen limitation in Bacillus subtilis. J. Bacteriol. 178:3796–3802.
34 Throup, J. P., F. Zappacosta, R. D. Lunsford, R. S. Annan, S. A. Carr, J. T. Lonsdale, A. P. Bryant, D. McDevitt, M. Rosenberg, and M. K. Burnham. 2001. The srhSR gene pair from Staphylococcus aureus: genomic and proteomic approaches to the identification and characterization of gene function. Biochemistry 40:10392–10401.
35 Hecker, M., S. Engelmann, and S. J. Cordwell. 2003. Proteomics of Staphylococcus aureus – current state and future challenges. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 787:179–195.

36 Bischoff, M., P. Dunman, J. Kormanec,

37

38

39

40

41

42

D. Macapagal, E. Murphy, W. Mounts, B. Berger-Bchi, and S. Projan. 2004. Microarray-based analysis of the Staphylococcus aureus sigmaB regulon. J. Bacteriol. 186:4085–4099. Gertz, S., S. Engelmann, R. Schmid, A. K. Ziebandt, K. Tischer, C. Scharf, J. Hacker, and M. Hecker. 2000. Characterization of the sigma(B) regulon in Staphylococcus aureus. J. Bacteriol. 182:6983–6991. Ziebandt, A. K., H. Weber, J. Rudolph, R. Schmid, D. Hper, S. Engelmann, and M. Hecker. 2001. Extracellular proteins of Staphylococcus aureus and the role of SarA and sigma B. Proteomics 1:480–493. Ziebandt, A. K., D. Becher, K. Ohlsen, J. Hacker, M. Hecker, and S. Engelmann. 2004. The influence of agr and sigma(B) in growth phase dependent regulation of virulence factors in Staphylococcus aureus. Proteomics 4:3034– 3047. Dunman, P. M., E. Murphy, S. Haney, D. Palacios, G. Tucker-Kellogg, S. Wu, E. L. Brown, R. J. Zagursky, D. Shlaes, and S. J. Projan. 2001. Transcription profiling-based identification of Staphylococcus aureus genes regulated by the agr and/or sarA loci. J. Bacteriol. 183:7341–7353. Novick, R. P. 2003. Autoinduction and signal transduction in the regulation of staphylococcal virulence. Mol. Microbiol. 48:1429–1449. Vytvytska, O., E. Nagy, M. Bluggel, H. E. Meyer, R. Kurzbauer, L. A. Huber, and C. S. Klade. 2002. Identification of vaccine candidate antigens of Staphylococcus aureus by serological proteome analysis. Proteomics 2:580–590.

69

4 Impact of Genome Sequences on Mutational Analysis of Fungal and Bacterial Pathogens

Vladimir Pelicic and Xavier Nassif

4.1 The Long Road from Sequence to Function

Less than ten years after the publication of the first complete nucleotide sequence of a free-living organism, that of Haemophilus influenzae Rd [1], more than 220 complete genomes have been sequenced and made publicly available (see the Genomes OnLine Database at http://www.genomesonline.org), which has radically changed the way we think about and do biology. This apparently endless expansion, with several hundred additional genome projects currently underway (http://www.genomesonline.org), is rapidly leading to the identification of the entire gene repertoires of almost all the known pathogenic microbes affecting mankind. This sequencing frenzy is driven by the common belief that genome sequences will, in the hopefully near future, be the key to the rational design of novel antimicrobial therapies, which could have a fundamental impact on health care in this century [2]. This is expected to follow from a better understanding of a wide range of bacterial and fungal biological processes, including the mechanisms of bacterial and fungal pathogenesis, which will result from the identification of the genes that contribute to these processes and the functional characterization of the corresponding proteins. However, this hope relies on our ability to convert rough sequence data into polished biological information, which promises to be a formidable challenge since, more often than not, bioinformatic analysis of the sequence provides incomplete information, if any, about gene function. Indeed, in most of the sequenced genomes, approximately 40% of the open reading frames (ORFs) could not be assigned any potential function, either because they are unique to the microbe studied or because they are similar to other genes whose function is also unknown. 
Moreover, the majority of the functions that could be predicted rely mainly, if not entirely, on circumstantial evidence which, no matter how plausible, needs to be experimentally validated or, for that matter, invalidated. Studying, amongst other things, gene or protein expression, protein localization, or protein–protein interactions may generate information relevant to the mechanisms of bacterial and fungal pathogenesis [3]. However, the resulting data (e.g., a list of genes transcriptionally active during infection) always need to be subsequently tested by mutagenesis, since the role of a gene in a defined virulence trait is established only once it has been demonstrated by mutational analysis. Therefore, the most direct and efficient path to gene function remains to define the phenotypic alterations resulting from gene mutation, which can be done on a gene or a genome scale, as discussed below.

4.2 Classical Genetics Still at the Forefront in the Postgenome Era

Classically, the most popular and successful approaches for identifying microbial virulence determinants have been directed and random mutagenesis [3]. Despite their apparent lack of “hype” compared to the genome-scale functional approaches presented later, these methods have by no means disappeared with the advent of the genomic era. On the contrary, both approaches have been greatly facilitated by the availability of complete genomic sequences and will undoubtedly remain important tools for determining gene function.

4.2.1 Reverse Genetics

In reverse genetics, the functional allele encoding a protein suspected of playing a role in the infectious process is replaced by a disrupted or deleted copy by exploiting the homologous recombination properties of the cell. The virulence of the resulting mutants is then compared to that of the wild-type strain using suitable in vitro or in vivo models of infection [3]. If the mutant is significantly less virulent than the wild-type strain, the mutated gene is deemed to play a role in infection. In the pregenome era, the identification of potential virulence factors by bioinformatic methods was hindered by the scarcity of available sequence data. Instead, labor-intensive but elegant techniques were often required to identify a handful of candidate genes, whatever the pathogen studied. Well-characterized virulence factors from a particular pathogenic microbe were often used to design degenerate primers in order to detect a related sequence by polymerase chain reaction (PCR) in the species under scrutiny, as for example in the identification of the urease genes in Mycobacterium tuberculosis [4]. Alternatively, subtractive hybridization was used to identify the regions of difference between a pathogenic strain and a nonpathogenic relative, such as M. bovis and the vaccine strain M. bovis BCG [5], or between two relatives with different pathogenic properties, such as Neisseria meningitidis and N. gonorrhoeae [6]; these regions often contained genes that play an active role in the pathogenic process. Now that hundreds of genomes are or will shortly be available, most of the above techniques have become obsolete and have been advantageously replaced by a personal computer, which is increasingly becoming one of the researcher’s best friends. Indeed, all of the genetic differences between any two sequenced strains, or the genes homologous to characterized virulence factors, can be identified in silico in a matter of minutes. Moreover, it is possible to identify genes with defined, subtle sequence characteristics which could not be readily identified by classical methods. For example, Hood et al. screened the genome sequence of H. influenzae for tandemly iterated nucleotides associated with putative ORFs [7], because the loss or gain of nucleotide repeats is known to mediate phase variation of surface-exposed factors that are often involved in pathogenicity. One of the identified proteins, a predicted glycosyltransferase whose phase-variable expression was associated with the phenotypic switching of a lipopolysaccharide epitope, was thus found to be necessary for virulence in vivo. In another elegant study, Coutte et al. screened the genome of Bordetella pertussis for ORFs potentially encoding proteases [8], with the idea that one of them might be involved in the maturation of the major B. pertussis adhesin, the filamentous hemagglutinin (FHA). The identified candidates were all mutagenized and analyzed for possibly impaired FHA maturation. This confirmed that SphB1, a subtilisin-like protein, specifically cleaves FHA and is therefore the first example of a specialized maturation protease serving in a secretion pathway of a gram-negative bacterium. In conclusion, the ongoing accumulation of vast amounts of biological information from the bioinformatic analysis of complete genomes and from high-throughput methods that monitor gene and protein expression on a genome scale ensures that directed mutagenesis will remain a favorite in the researcher’s toolbox for assessing gene function on a small scale, for example up to a dozen genes.

4.2.2 Transposon Mutagenesis

Most of our current understanding of the mechanisms of microbial pathogenesis comes from studies where scores of mutants, most often randomly generated by transposon mutagenesis, were monitored in suitable models of infection in search of those presenting reduced virulence [3]. For example, the individual screening of 9516 transposon mutants of Salmonella typhimurium in a macrophage culture assay led to the identification of 115 candidates with a diminished capacity for intracellular survival, of which 83 were also less virulent in vivo [9]. The loss of virulence of one of these mutants, which was mutated in the phoP gene, was subsequently shown to be the result of its reduced resistance to defensins, an important microbicidal mechanism of the phagocyte [10]. The main advantages of this approach over directed mutagenesis are that (a) no a priori assumptions (which are often misleading) have to be made about the identity of the genes important for virulence, and (b) large numbers of mutants are generated and analyzed, which provides plenty of functional information at a much faster pace. This throughput can be improved further by the use of transposons engineered to contain unique DNA tags, a procedure known as signature-tagged mutagenesis [11], which has allowed large numbers of mutants to be screened simultaneously in vivo for pathogenic microbes as diverse as Staphylococcus aureus, Legionella pneumophila, Aspergillus fumigatus, or M. tuberculosis [12]. This was previously unfeasible due to the excessively high numbers of animals that would have been necessary, and therefore represents an important improvement. For example, 100 infant mice were sufficient to screen 9600 mutants of the intestinal pathogen Vibrio cholerae for defects in the colonization of the small intestine [13]. This led to the identification of 251 attenuated mutants that contained mutations in known colonization factors, as well as in genes whose role in colonization was not previously appreciated. Another study performed in M. tuberculosis led to the identification of mutants with impaired replication in mouse lungs that harbored several transposon insertions in a cluster of genes [14]. Interestingly, these genes were subsequently shown to be involved in the synthesis and transport of a complex cell-wall-associated lipid, which is required only for growth in the lungs. Recently, even higher-throughput screening methods were described, such as TraSH (transposon site hybridization), where the mutants within a pool are identified by hybridizing RNA probes, corresponding to the chromosomal sequences immediately adjacent to each transposon insertion site, to a microarray containing DNA fragments specific for every predicted ORF [15]. TraSH was used to monitor simultaneously the behavior of as many as 100 000 transposon mutants of M. tuberculosis in a mouse model of infection [16], which led to the identification of 194 genes important for mycobacterial growth in vivo. In general, this approach has contributed immensely to our understanding of the mechanisms of microbial pathogenesis, and its potential has been further increased by the availability of entire genomes, even though this is less apparent than for reverse genetics. 
For example, it is now much easier to identify the interrupted genes in most of the selected mutants, which was a laborious and impractical task in the pregenome era. Moreover, it is possible to estimate, following a Poisson distribution, how saturating a library of mutants might be, i.e., how many genes are likely to have been mutated and analyzed, which helps to give an idea of the exhaustiveness of the functional analyses that are performed.
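As a rough illustration of such an estimate, the short Python sketch below applies the Poisson approximation under deliberately simplified assumptions that are ours rather than those of any cited study: every gene is an equally likely target, insertions are independent and random, and essential genes and intergenic insertions are ignored. The function names and the example genome of 4000 genes are hypothetical.

```python
import math


def expected_fraction_hit(n_insertions: int, n_genes: int) -> float:
    """Expected fraction of genes carrying at least one insertion.

    Assumes (hypothetically) that insertions fall at random and that all
    genes are equally likely targets, so hits per gene follow a Poisson
    distribution with mean n_insertions / n_genes.
    """
    mean_hits_per_gene = n_insertions / n_genes
    # P(at least one hit) = 1 - P(zero hits) = 1 - exp(-mean)
    return 1.0 - math.exp(-mean_hits_per_gene)


def insertions_needed(n_genes: int, target_fraction: float) -> int:
    """Number of random insertions needed to hit a given fraction of genes."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    return math.ceil(-n_genes * math.log(1.0 - target_fraction))


if __name__ == "__main__":
    genes = 4000  # hypothetical genome size, for illustration only
    for library_size in (4000, 8000, 12000):
        fraction = expected_fraction_hit(library_size, genes)
        print(f"{library_size} insertions: about {fraction:.1%} of genes hit")
    print("insertions for 95% of genes:", insertions_needed(genes, 0.95))
```

Under these assumptions, a library with as many insertions as there are genes would be expected to hit only about 63% of them, and roughly three insertions per gene on average are needed to approach 95%. Real transposons have target preferences and genes differ in size, so actual coverage is usually lower than this idealized figure.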

4.3 Genome-Scale Mutational Analyses

The fundamental limitation of the above mutational approaches is that, no matter how large in scale they are, they can never be truly exhaustive since the collections of mutants are undefined and always incomplete. By identifying all the genes that constitute an organism, genome sequences provided the missing information for the construction of a comprehensive collection of mutants, which would be suitable for assessing gene function on a truly global scale. For this reason, although the creation of a comprehensive arrayed collection of mutants remains a daunting challenge, the potential of such collections for identifying all the genes involved in a defined biological process and, ultimately, the exact contribution of each gene to microbial life [17] has prompted several projects for the construction of these toolboxes, based on approaches initially developed in the model organism Saccharomyces cerevisiae.

4.3.1 Saccharomyces cerevisiae

The first microorganism to be domesticated by humans and one of the first to be sequenced, S. cerevisiae (baker’s yeast) remains a forerunner in the postgenomic era. It was used to demonstrate the feasibility of constructing archived genome-wide collections of defined mutants. The two mutagenesis methods described above, random and directed mutagenesis, were both used to construct genome-wide collections of yeast mutants. In the first approach, Ross-MacDonald et al. used a multipurpose mini-Tn3 derivative to randomly mutagenize, within Escherichia coli, a library of S. cerevisiae genomic DNA [18]. In addition to the selectable markers for yeast and bacteria, the mini-transposon also contained a promoterless lacZ reporter gene and a hemagglutinin epitope tag, thus allowing analysis of gene expression and protein localization. In a first step, E. coli strains containing mutant plasmids, i.e., harboring transposon insertions, were selected and stored individually in 96-well plates. In a second step, 92 544 plasmids were prepared from these strains, digested, and transformed into a diploid yeast strain where the interrupted fragments integrated at their corresponding genomic loci by homologous recombination. To enrich for yeast transformants containing transposon insertions within ORFs, 11 232 strains containing lacZ fusions expressed during vegetative growth were selected for further analysis and arrayed in a 96-well format. The precise location of the mini-transposon in 6358 strains was determined by sequencing the corresponding plasmid-borne insertion alleles, which indicated that insertions affected 1917 different ORFs (31% of yeast’s 6200 ORFs), a large number of which were previously nonannotated. In the second approach, systematic deletion of every S. cerevisiae ORF was started via targeted mutagenesis [19], an effort underpinned by yeast’s highly efficient homologous recombination. 
In brief, short regions of homology (45 bp) immediately upstream and downstream of each of the 6200 ORFs were placed, together with unique DNA barcodes (allowing the simultaneous screening of large numbers of mutants, as in signature-tagged mutagenesis), at both ends of a suitable selectable marker through a two-step PCR methodology. A consortium of yeast laboratories in Europe and North America then used the corresponding PCR products to transform the yeast in a 96-well format and to create start-to-stop codon gene deletion mutants, which were individually arrayed and centralized. In a preliminary report, deletion alleles were constructed for 2026 ORFs [19]. To date, the Saccharomyces Genome Deletion Project consortium has deleted 96% of all annotated ORFs, including nearly all ORFs larger than 100 codons (http://sequence-www.stanford.edu/group/yeast/yeast_deletion_project/). Deletion alleles were used to generate, when possible, four different strains: haploids of both mating types, and both heterozygous and homozygous diploids, which helped demonstrate that 18.7% of yeast’s genes are essential for viability since deletion alleles could not be recovered in haploid strains. The value of these toolboxes for gene function identification, which is now undisputed, was first validated in the above original studies. The simultaneous assay of 558 homozygous deletion strains (a subset of the gene deletion mutant collection) for growth in rich and minimal media demonstrated that reliable quantitative fitness data can be obtained simultaneously under various conditions for large numbers of mutants [19], as was later abundantly documented. Similarly, 7680 transposon insertion haploid strains were scored, using phenotypic macroarrays, for 20 different phenotypes, which confirmed or provided novel functional information for 407 genes [18]. In addition, protein localization was analyzed in 1340 diploid strains carrying in-frame hemagglutinin epitope tag insertions [18], illustrating the extreme versatility of this approach. Globally, these mutant collections, which can be easily obtained by any researcher, have encouraged an unparalleled blossoming of genome-wide studies that have generated a plethora of functional information, a comprehensive review of which is clearly beyond the scope of this article. However, a mere 5 years of use led to significant achievements, since more than 100 experimental conditions were assayed and more than 5000 phenotypic traits assigned to yeast genes [20]. Interestingly, as predicted, some data may actually have important therapeutic implications. For example, a study of the effects of as many as 78 commercially available compounds on pools of 3503 heterozygous deletion strains [21] provided, among other things, a clue to the cholesterol-lowering side-effect of molsidomine, a potent vasodilator used for the past 20 years to treat angina. 
It was found that a metabolite of molsidomine inhibits lanosterol synthase, an enzyme essential for sterol synthesis that may therefore be a safe target for the development of new cholesterol-lowering drugs. The production of a global map of gene function – an utterly fantastic goal – is therefore becoming increasingly feasible in yeast, and has been scheduled by some for the beginning of 2007 [22]. Both genome-wide mutagenesis strategies have advantages and drawbacks [17]. Random mutagenesis and sequencing of the transposon insertion sites can rapidly lead to the construction of a large collection of mutants, which can immediately be used to generate a wealth of functional information. Moreover, it is relatively cost-effective and is therefore suitable for single laboratories’ efforts. However, although near-saturation mutagenesis of the genome can be obtained in extremely large libraries, transposon mutagenesis generates a lot of redundancy, some genes being mutated multiple times due to their size and to characteristics inherent in the transposon itself, and exhaustiveness cannot be attained. On the other hand, systematic targeted mutagenesis of every gene is restricted to organisms with a high rate of recombination; it is laborious and expensive, and is therefore best accomplished by a consortium of laboratories. However, it can result in a collection of mutants that is actually comprehensive.
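The pooled fitness assays described above for barcoded deletion strains can be sketched in a few lines of Python. This is a minimal illustration and not the analysis used in the cited studies: the barcode signals, the pseudocount, and the threshold of -2 (a fourfold relative depletion) are hypothetical values chosen for the example.

```python
import math


def log2_fitness_ratios(input_signal, output_signal, pseudocount=1.0):
    """Return log2(output fraction / input fraction) for each barcode.

    input_signal and output_signal map a barcode name to its measured
    abundance (for example a hybridization signal) before and after
    growth of the pool. A pseudocount avoids division by zero for
    barcodes that vanish from the output pool.
    """
    n_tags = len(input_signal)
    in_total = sum(input_signal.values()) + pseudocount * n_tags
    out_total = (
        sum(output_signal.get(tag, 0.0) for tag in input_signal)
        + pseudocount * n_tags
    )
    ratios = {}
    for tag, in_value in input_signal.items():
        in_fraction = (in_value + pseudocount) / in_total
        out_fraction = (output_signal.get(tag, 0.0) + pseudocount) / out_total
        ratios[tag] = math.log2(out_fraction / in_fraction)
    return ratios


def depleted_barcodes(ratios, threshold=-2.0):
    """Barcodes whose relative abundance dropped below the threshold."""
    return sorted(tag for tag, value in ratios.items() if value <= threshold)


# Hypothetical three-strain pool: mutC is strongly depleted after growth.
input_pool = {"mutA": 100, "mutB": 100, "mutC": 100}
output_pool = {"mutA": 150, "mutB": 148, "mutC": 2}
ratios = log2_fitness_ratios(input_pool, output_pool)
print(depleted_barcodes(ratios))
```

Normalizing each barcode to the pool total before taking the ratio means that a strain is flagged only when it loses ground relative to its neighbors in the pool, which is the quantity a competitive growth assay measures.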


4.3.2 Bacterial Workhorses: E. coli and Bacillus subtilis

Because of the undisputed position of E. coli and Bacillus subtilis as model organisms in the bacterial world, they are the only bacteria in which the construction of genome-wide collections of mutants by systematic targeted mutagenesis of every gene has been attempted [23, 24], with the intention of creating flexible genomic resources freely available to all researchers. In E. coli, two different projects are currently underway in K-12 strains MG1655 and W3110, both relying on high-throughput targeted mutagenesis methods. In the American project (strain MG1655), PCR amplification fragments of every ORF are mutated individually by in vitro transposition of Tn5-based mobile elements [24]. In the Japanese project (strain W3110), deletion alleles are created using a PCR-based strategy similar to the one used in the Saccharomyces Genome Deletion Project described above (http://ecoli.aist-nara.ac.jp/). In both cases, the mutant alleles are integrated into the E. coli genome by homologous recombination, which is enhanced by production of the bacteriophage λ Red recombination system, encoded by the red genes present on a resident plasmid [25]. This plasmid can be subsequently cured by growth at high temperature. In the American project, of which an overview has recently been published [24], 1976 of the 4288 originally annotated E. coli ORFs (46%) have been successfully mutated in the first pass, whereas no progress report is available for the Japanese project (http://ecoli.aist-nara.ac.jp/). In B. subtilis, the global mutagenesis of the genome involved a consortium of laboratories and relied on the cloning of PCR-amplified fragments internal to each gene within a nonreplicating plasmid, which integrated in the corresponding genes via single crossover recombination after transformation [23]. 
In addition, the plasmid harbored a promoterless lacZ gene, allowing the generation of transcriptional fusions with the interrupted genes, and an inducible promoter for the expression of the genes situated downstream of the target gene in multicistronic organizations, thus minimizing polar effects. However, because the main goal of this project was to estimate the minimum gene set required to sustain bacterial life, mutagenesis was unfortunately not comprehensive. It was not attempted on 1144 of the 4101 annotated B. subtilis ORFs (27.8%), either because they had previously been studied for essentiality or because they could be predicted with confidence to be either essential or dispensable. Finally, although the value of these toolboxes for gene function identification seems obvious in light of the S. cerevisiae example, functional studies using them are still to be reported.

4.3.3 Bacterial Pathogens

In all the other microorganisms where genome-wide libraries of mutants have been created, mobile elements have been used, because of their cost-effectiveness and ease of use, to generate large libraries of insertion mutants in which the transposon insertions have been sequenced on a systematic basis. Moreover, since these resources are readily usable, they have often been immediately used to generate substantial functional information.

4.3.3.1 Mycoplasma Species

Unsurprisingly, the first attempt was in the sequenced bacterial pathogens with the smallest genomic contents, Mycoplasma genitalium and M. pneumoniae [26]. The aim of this study was to define the genes that cannot be mutated and are therefore essential for cellular life. In brief, pools of mutants obtained after electroporation of a plasmid containing the composite transposon Tn4001 from S. aureus were propagated in culture under selective pressure and genomic DNA was isolated. A laborious method was used to identify the junctions between the transposons and the chromosome. The transposon insertion sites, amplified by inverse PCR using primers specific for the transposon, were cloned and the recombinant plasmids were subsequently sequenced. The sequencing of 2209 clones defined 685 and 669 distinct insertion sites that occurred in 140 and 179 different genes of M. genitalium and M. pneumoniae, respectively. Unfortunately, in this study, the transposon mutants were analyzed within mixed populations, thus leaving no resource that could be subsequently used for functional studies. In contrast, arrayed libraries of defined transposon mutants have been constructed more recently for several pathogens.

4.3.3.2 Pseudomonas aeruginosa

For Pseudomonas aeruginosa, the leading cause of death in patients with cystic fibrosis, two large-scale libraries of mutants were generated in two different projects using multipurpose Tn5 derivatives, which were introduced into the genome by conjugation. The mini-transposons harbored promoterless phoA and lacZ [27] or Photorhabdus luminescens luxCDABE reporter genes (http://pseudomutant.pseudomonas.com/) capable of generating translational fusions if appropriately inserted in a target gene. In both cases, the identification of the transposon insertion sites, which clearly represents the limiting step in this approach where large numbers of mutants are to be analyzed, relied on less laborious methods than in the previous effort. Briefly, efficient PCR techniques, either inverse PCR (http://pseudomutant.pseudomonas.com/) or a two-stage semidegenerate PCR [27], were used to amplify the transposon insertion sites and the PCR products were directly sequenced. In the first project [27], where 42 240 mutants were arrayed, 36 154 of the transposon insertion sites (80%) were successfully mapped, of which 30 100 were unique. This yielded insertions in 4892 different ORFs, representing 87.8% of the annotated coding sequences in the P. aeruginosa genome. Since it was estimated that 300–400 of the 678 ORFs in which transposon insertions were not recovered are likely to be essential, the mutational coverage was probably around 94%, which is a major achievement. In the second project (http://pseudomutant.pseudomonas.com/), although the coverage was more modest, 2519 insertion sites were mapped, yielding insertions in 1284 different ORFs. Interestingly, in 1253 mutants, the mini-transposon generated active translational fusions with the P. luminescens lux genes. Importantly, the potential of the larger collection of mutants to produce essentially complete lists of candidate genes implicated in various biological processes has been validated by the results of two simple screens aimed at phenotypes with well-understood genetic bases [27], namely twitching motility and prototrophic growth. In both cases, all the genes expected from earlier studies and several previously undescribed candidates were identified.
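The coverage figures quoted above for the first library can be retraced with simple arithmetic. The sketch below uses only the numbers given in the text (4892 ORFs with insertions, 678 without, and 300–400 of the latter presumed essential); the function name is our own, and treating the presumed essential ORFs as targets that cannot be mutated is the assumption behind the adjusted figure.

```python
def mutational_coverage(orfs_hit: int, orfs_missed: int,
                        presumed_essential: int = 0) -> float:
    """Fraction of mutable ORFs that carry at least one insertion.

    presumed_essential is the number of missed ORFs assumed to be
    essential (and hence not recoverable as insertion mutants); they are
    removed from the denominator.
    """
    if presumed_essential > orfs_missed:
        raise ValueError("presumed_essential cannot exceed orfs_missed")
    mutable_orfs = orfs_hit + orfs_missed - presumed_essential
    return orfs_hit / mutable_orfs


ORFS_HIT = 4892      # ORFs with at least one mapped insertion [27]
ORFS_MISSED = 678    # ORFs in which no insertion was recovered [27]

raw = mutational_coverage(ORFS_HIT, ORFS_MISSED)
low = mutational_coverage(ORFS_HIT, ORFS_MISSED, presumed_essential=300)
high = mutational_coverage(ORFS_HIT, ORFS_MISSED, presumed_essential=400)

print(f"raw coverage:      {raw:.1%}")
print(f"adjusted coverage: {low:.1%} to {high:.1%}")
```

With these inputs the raw coverage comes out at 87.8%, and the adjusted coverage lies between roughly 93% and 95%, which is consistent with the figure of around 94% given in the text.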

4.3.3.3 Staphylococcus aureus

In S. aureus, a derivative of the eukaryotic mariner transposon, delivered on a thermosensitive plasmid, was used to generate an archived library of 10 325 clones, of which 9540 were transposon insertion mutants [28]. Transposon insertion sites were amplified by inverse PCR and directly sequenced. Unfortunately, since mutagenesis was performed in the Newman strain, whose genome sequence is not known, the sequencing results could not be fully exploited. Nevertheless, using available S. aureus genome sequences, it was possible to map 8450 transposon insertion sites and to establish that 6917 of these were within coding sequences. Insertions occurred in at least 1812 different ORFs, which probably represent some 67% of the coding sequences in the Newman strain genome. Despite this limitation, the study illustrates perfectly how these resources may lead to a comprehensive understanding of the mechanisms of pathogenesis by enabling the identification of all the determinants of virulence. Indeed, 1736 mutants with insertions in different ORFs were screened in triplicate in the nematode Caenorhabditis elegans as a model host in order to identify S. aureus virulence genes, which identified 71 genes important for pathogenicity. Among the 30 genes of unknown function, some were also important virulence factors in a murine model of infection.

4.3.3.4 Neisseria meningitidis

As in the S. aureus project, a mini-transposon derived from the eukaryotic Himar1 mariner transposon was used to generate an ordered library of 4548 mutants in N. meningitidis [29], a human pathogen that is one of the leading causes of fatal sepsis and meningitis worldwide. In this case, however, transposition was performed in vitro on N. meningitidis chromosomal DNA, which was subsequently reintroduced into the bacterium by natural transformation and integrated into the chromosome via allelic exchange. Transposon insertion sites were amplified by ligation-mediated PCR from the chromosomal DNA of each mutant, prepared and stored in 96-well plates, and directly sequenced. As in the previous case, mutagenesis was performed in a strain (8013) whose genome sequence is not yet known but is currently being finished (our unpublished data), which temporarily hinders the exploitation of the sequencing results. Nevertheless, using the available N. meningitidis genome sequences it was possible to tentatively map 3221 transposon insertions out of the 3881 that could be sequenced, and to determine that insertions occurred in at least 940 different ORFs, or approximately 45% of the 2100 expected N. meningitidis coding sequences (our unpublished data). Since PCR-amplified DNA is a suitable target for in vitro mariner transposition as well [30], the missing ORFs are currently being systematically amplified by PCR and subjected to targeted mutagenesis to generate the missing mutants. This hybrid mutagenesis strategy, unlike the previous examples, is expected to result in the creation of a comprehensive collection of mutants. Another difference from the previous examples is that this toolbox has been optimized for functional studies by using mini-transposons that were engineered to contain unique identifying DNA barcodes, as in signature-tagged mutagenesis or in the Saccharomyces Genome Deletion Project, which allows up to 48 mutants to be screened simultaneously [29]. The 4548 mutants have already been assayed in three experimental conditions [29, 31, 32], which assigned phenotypic traits to dozens of meningococcal genes, many of which were previously undescribed. For example, the simultaneous assay of pools of mutants for fitness in human serum identified 18 genes required for resistance to complement-mediated lysis, including almost all the genes expected from earlier studies [29]. Although this key virulence attribute of N. meningitidis was well characterized, four previously undescribed candidates were identified. However, the identification of the genes required for the formation of type IV pili (Tfp) in N. meningitidis, organelles which play a critical role in its pathogenic lifestyle by facilitating bacterial attachment to human cells, is probably the best illustration of the use of this resource for the identification of all the genes involved in a defined biological process [31].
Mutants affected as to piliation were identified by their impaired ability to form aggregates, another phenotype mediated by these organelles. Fifteen genes, of which only seven were previously characterized in Neisseria species, were found to be essential for Tfp biogenesis, a number similar to the number of genes essential for Tfp biogenesis in other well-studied bacteria. Importantly, this study also pinpointed another advantage of genome-wide collections of mutants, and that is the possibility of adding to the results generated by direct phenotypic screens by complementary reverse genetics analysis of all the other genes of interest that were not detected. The mutants in the corresponding genes, which may have attracted interest on the basis of particular sequence homologies or data available in the literature, can be readily retrieved in the library and analyzed immediately in the same phenotypic screens. For example, a gene that was homologous to a known Tfp biogenesis gene (pilZ) from P. aeruginosa [33] was, surprisingly, found to be dispensable for fiber biogenesis in N. meningitidis. This negative result is interesting since it suggested that there were some subtle differences between otherwise extremely conserved Tfp biogenesis machineries in two different piliated bacterial species.
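The pooled, barcode-based screening of this library can be illustrated with a short sketch. It assumes a simple output/input signal ratio per barcode and an arbitrary cut-off of 0.25; both the scoring rule and the threshold are assumptions made for illustration and are not taken from [29]. Tag names and signal values are invented.

```python
import math


def n_pools(n_mutants, pool_size):
    """Number of pools needed to screen every mutant once."""
    return math.ceil(n_mutants / pool_size)


def attenuated(input_signal, output_signal, threshold=0.25):
    """Return barcodes whose output/input ratio falls below the threshold.

    The threshold is a hypothetical cut-off chosen for this example.
    """
    flagged = []
    for tag, before in input_signal.items():
        after = output_signal.get(tag, 0.0)
        if before > 0 and after / before < threshold:
            flagged.append(tag)
    return sorted(flagged)


# 4548 mutants screened 48 at a time, as stated in the text.
pools = n_pools(4548, 48)

# Invented barcode signals before and after selection (e.g. serum exposure).
input_signal = {"tag01": 1000.0, "tag02": 950.0, "tag03": 1100.0}
output_signal = {"tag01": 980.0, "tag02": 40.0, "tag03": 1050.0}
flagged = attenuated(input_signal, output_signal)
```

Here the whole library fits into 95 pools, and only the mutant carrying `tag02` would be flagged as depleted after selection.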


4.4 Conclusion

The availability of entire genome sequences has further increased the popularity and predominance of mutational analysis for determining gene function and blazed a trail for the creation of comprehensive collections of mutants, which was previously unfeasible. The extremely profound impact of this tool in baker’s yeast, where it helped to narrow the gap between sequence and function and may rapidly lead to the production of a global functional map, has prompted similar projects in many pathogenic microorganisms, despite the numerous inherent difficulties. Although it may be unrealistic to foresee the production of a global map of gene function in any pathogen, the widespread use of these libraries for exquisite in vivo or in vitro phenotypic analysis, which is still in its infancy in pathogenic microorganisms, is expected to expedite dramatically the unraveling of bacterial and fungal pathogenicity in the next few years and, hopefully, the design of novel antimicrobial therapies.

References

1 Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512.
2 Sander, C. 2000. Genomic medicine and the future of health care. Science 287:1977–1978.
3 Hensel, M., and D. W. Holden. 1996. Molecular genetic approaches for the study of virulence in both pathogenic bacteria and fungi. Microbiology 142:1049–1058.
4 Reyrat, J.-M., F. X. Berthet, and B. Gicquel. 1995. The urease locus of Mycobacterium tuberculosis and its utilization for the demonstration of allelic exchange in Mycobacterium bovis bacillus Calmette-Guérin. Proc Natl Acad Sci USA 92:8768–8772.
5 Mahairas, G. G., P. J. Sabo, M. J. Hickey, D. C. Singh, and C. K. Stover. 1996. Molecular analysis of genetic differences between Mycobacterium bovis BCG and virulent M. bovis. J Bacteriol 178:1274–1282.
6 Tinsley, C. R., and X. Nassif. 1996. Analysis of the genetic differences between Neisseria meningitidis and Neisseria gonorrhoeae: two closely related bacteria expressing two different pathogenicities. Proc Natl Acad Sci USA 93:11109–11114.
7 Hood, D. W., M. E. Deadman, M. P. Jennings, M. Bisercic, R. D. Fleischmann, J. C. Venter, and E. R. Moxon. 1996. DNA repeats identify novel virulence genes in Haemophilus influenzae. Proc Natl Acad Sci USA 93:11121–11125.
8 Coutte, L., R. Antoine, H. Drobecq, C. Locht, and F. Jacob-Dubuisson. 2001. Subtilisin-like autotransporter serves as maturation protease in a bacterial secretion pathway. EMBO J 20:5040–5048.
9 Fields, P. I., R. V. Swanson, C. G. Haidaris, and F. Heffron. 1986. Mutants of Salmonella typhimurium that cannot survive within the macrophage are avirulent. Proc Natl Acad Sci USA 83:5189–5193.
10 Fields, P. I., E. A. Groisman, and F. Heffron. 1989. A Salmonella locus that controls resistance to microbicidal proteins from phagocytic cells. Science 243:1059–1062.
11 Hensel, M., J. E. Shea, C. Gleeson, M. D. Jones, E. Dalton, and D. W. Holden. 1995. Simultaneous identification of bacterial virulence genes by negative selection. Science 269:400–403.
12 Mecsas, J. 2002. Use of signature-tagged mutagenesis in pathogenesis studies. Curr Opin Microbiol 5:33–37.
13 Merrell, D. S., D. L. Hava, and A. Camilli. 2002. Identification of novel factors involved in colonization and acid tolerance of Vibrio cholerae. Mol Microbiol 43:1471–1491.
14 Cox, J. S., B. Chen, M. McNeill, and W. R. Jacobs Jr. 1999. Complex lipid determines tissue-specific replication of Mycobacterium tuberculosis in mice. Nature 402:79–83.
15 Sassetti, C. M., D. H. Boyd, and E. J. Rubin. 2001. Comprehensive identification of conditionally essential genes in mycobacteria. Proc Natl Acad Sci USA 98:12712–12717.
16 Sassetti, C. M., and E. J. Rubin. 2003. Genetic requirements for mycobacterial survival during infection. Proc Natl Acad Sci USA 100:12989–12994.
17 Coelho, P. S. R., A. Kumar, and M. Snyder. 2000. Genome-wide mutant collections: toolboxes for functional genomics. Curr Opin Microbiol 3:309–315.
18 Ross-Macdonald, P., P. S. Coelho, T. Roemer, S. Agarwal, A. Kumar, R. Jansen, K. H. Cheung, A. Sheehan, D. Symoniatis, L. Umansky, et al. 1999. Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature 402:413–418.
19 Winzeler, E. A., D. D. Shoemaker, A. Astromoff, H. Liang, K. Anderson, B. Andre, R. Bangham, R. Benito, J. D. Boeke, H. Bussey, et al. 1999. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285:901–906.
20 Scherens, B., and A. Goffeau. 2004. The uses of genome-wide yeast mutant collections. Genome Biol 5:229.
21 Lum, P. Y., C. D. Armour, S. B. Stepaniants, G. Cavet, M. K. Wolf, J. S. Butler, J. C. Hinshaw, P. Garnier, G. D. Prestwich, A. Leonardson, et al. 2004. Discovering modes of action for therapeutic compounds using a genome-wide screen of yeast heterozygotes. Cell 116:121–137.
22 Hughes, T. R., M. D. Robinson, N. Mitsakakis, and M. Johnston. 2004. The promise of functional genomics: completing the encyclopedia of a cell. Curr Opin Microbiol 7:546–554.
23 Kobayashi, K., S. D. Ehrlich, A. Albertini, G. Amati, K. K. Andersen, M. Arnaud, K. Asai, S. Ashikaga, S. Aymerich, P. Bessieres, et al. 2003. Essential Bacillus subtilis genes. Proc Natl Acad Sci USA 100:4678–4683.
24 Kang, Y., T. Durfee, J. D. Glasner, Y. Qiu, D. Frisch, K. M. Winterberg, and F. R. Blattner. 2004. Systematic mutagenesis of the Escherichia coli genome. J Bacteriol 186:4921–4930.
25 Datsenko, K. A., and B. L. Wanner. 2000. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA 97:6640–6645.
26 Hutchison, C. A., S. N. Peterson, S. R. Gill, R. T. Cline, O. White, C. M. Fraser, H. O. Smith, and J. C. Venter. 1999. Global transposon mutagenesis and a minimal Mycoplasma genome. Science 286:2165–2169.
27 Jacobs, M. A., A. Alwood, I. Thaipisuttikul, D. Spencer, E. Haugen, S. Ernst, O. Will, R. Kaul, C. Raymond, R. Levy, et al. 2003. Comprehensive transposon mutant library of Pseudomonas aeruginosa. Proc Natl Acad Sci USA 100:14339–14344.
28 Bae, T., A. K. Banger, A. Wallace, E. M. Glass, F. Aslund, O. Schneewind, and D. M. Missiakas. 2004. Staphylococcus aureus virulence genes identified by bursa aurealis mutagenesis and nematode killing. Proc Natl Acad Sci USA 101:12312–12317.
29 Geoffroy, M. C., S. Floquet, A. Métais, X. Nassif, and V. Pelicic. 2003. Large-scale analysis of the meningococcus genome by gene disruption: resistance to complement-mediated lysis. Genome Res 13:391–398.
30 Pelicic, V., S. Morelle, D. Lampe, and X. Nassif. 2000. Mutagenesis of Neisseria meningitidis by in vitro transposition of Himar1 mariner. J Bacteriol 182:5391–5398.
31 Carbonnelle, E., S. Helaine, L. Prouvensier, X. Nassif, and V. Pelicic. 2005. Type IV pilus biogenesis in Neisseria meningitidis: PilW is involved in a step occurring after pilus assembly, essential for fiber stability and function. Mol Microbiol 55:54–64.
32 Helaine, S., E. Carbonnelle, L. Prouvensier, J. L. Beretti, X. Nassif, and V. Pelicic. 2005. PilX, a pilus-associated protein essential for bacterial aggregation, is a key to pilus-facilitated attachment of Neisseria meningitidis to human cells. Mol Microbiol 55:65–77.
33 Alm, R. A., A. J. Bodero, P. D. Free, and J. S. Mattick. 1996. Identification of a novel gene, pilZ, essential for type 4 fimbrial biogenesis in Pseudomonas aeruginosa. J Bacteriol 178:46–53.

II Genomics of Pathogenic Bacteria

5 Pathogenomics of Escherichia coli and Shigella Species

Ulrich Dobrindt and Jörg Hacker

5.1 Introduction

The enterobacterial “species” Escherichia coli and Shigella can be grouped into a phylogenetic cluster which diverged from other members of the γ-subdivision of gram-negative purple bacteria around the time of the emergence of mammalian organisms [1]. Shigella and E. coli are closely related, and it is becoming increasingly clear that both could be considered members of one species, as their distinction is based only on the pathogenic character of the bacteria. Their chromosomal organization shares more than 90% homology according to DNA–DNA reassociation experiments [2]. E. coli and Shigella populations are clonal. E. coli isolates can be grouped into particular clones that have evolved under competition as distinct genetic types. They can be classified according to their pathotype and their host [3–7]. These clones arose in parallel by both the loss and the ordered gain of genetic information. The clones are maintained during adaptation to their respective niches, and because of horizontal gene transfer, their further evolution is constantly in progress, thus resulting in very dynamic and diverse genome structures. In the case of E. coli K-12 strain MG1655, about 18% of the genome has been estimated to represent horizontally acquired sequences [8]. The stepwise acquisition of “foreign DNA” from distantly related organisms as well as the loss of DNA regions (genome reduction) resulted in different metabolic and pathogenic features that distinguish the different genera, strains, and pathotypes. Arising from a nonpathogenic E. coli ancestor, the loss of the ompT and cadA genes in combination with the acquisition of two pathogenicity islands and one virulence plasmid led to the evolution of pathogenic Shigella [9]. The parallel gain and loss of mobile genetic elements, such as bacteriophages, plasmids, and the LEE (“locus of enterocyte effacement”) pathogenicity island, in different lineages of pathogenic E. coli enabled the evolution of separate clones which belong to different E. coli pathotypes [6]. Accordingly, there is growing evidence that the genome of E. coli and Shigella can be considered as being composed of a conserved core of genes providing the backbone of genetic information required for essential cellular processes.


In addition, a flexible gene pool exists which is not common for all strains and consists of an individual assortment of strain-specific genetic information which may provide additional properties enabling these strains to adapt to special environmental conditions. Therefore, differences in genome size reflect the size variation of the flexible gene pool and are mainly due to the acquisition and loss of genomic DNA. A surprisingly large proportion of the flexible gene pool consists of uncharacterized unknown open reading frames (ORFs) without any obvious function. Another major constituent of the flexible gene pool is the group of accessory genetic elements, e.g., plasmids, transposons, insertion sequence elements, prophages, nonfunctional fragments thereof, and genomic islands. They can either be integrated into the chromosome or replicate independently as extrachromosomal elements. Several types of these elements can be laterally transferred and are present in probably all of the major bacterial phylogenetic groups, thus contributing to the inter- and intraspecies variability in genome content [9–11].
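The distinction between the conserved core and the flexible gene pool can be expressed as simple set operations on per-strain gene lists. The sketch below is only an illustration of the concept: the function name is hypothetical, the strains and gene identifiers are placeholders, and real comparisons require orthology assignment rather than exact name matching.

```python
def partition_gene_pool(genomes):
    """Split gene content into the shared core and the flexible gene pool.

    genomes: dict mapping strain name -> iterable of gene identifiers.
    Returns (core, flexible, strain_specific).
    """
    gene_sets = {name: set(genes) for name, genes in genomes.items()}
    core = set.intersection(*gene_sets.values())  # present in every strain
    pan = set().union(*gene_sets.values())        # present in any strain
    flexible = pan - core
    strain_specific = {}
    for name, genes in gene_sets.items():
        others = set().union(
            *(g for other, g in gene_sets.items() if other != name)
        )
        strain_specific[name] = genes - others
    return core, flexible, strain_specific


# Toy input: strain and gene names are placeholders, not real annotations.
toy_genomes = {
    "strain_A": ["core1", "core2", "core3", "islandX"],
    "strain_B": ["core1", "core2", "core3", "islandX", "prophageY"],
    "strain_C": ["core1", "core2", "core3", "plasmidZ"],
}

core, flexible, strain_specific = partition_gene_pool(toy_genomes)
```

In this toy case the three `core` genes form the backbone, while `islandX`, `prophageY`, and `plasmidZ` make up the flexible pool; only the last two are specific to a single strain.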

5.2 Comparative Genomics of Shigella

In contrast to E. coli, Shigella flexneri is pathogenic only for humans. S. flexneri can invade the colon epithelium and cause intense acute inflammatory response. Its ability to enter the host cell cytosol, where it can replicate and spread into neighboring cells, is unusual among enterobacteria. Mainly responsible for the invasive character is a large virulence plasmid that contains the genes required for invasion. Additionally, chromosomal genes located within pathogenicity islands contribute directly to the pathogenic process and to survival during infection [12, 13]. Consequently, genome comparison of S. flexneri and the phylogenetically indistinguishable E. coli should enable identification of the basis of the specific clinical manifestation of S. flexneri infection. The genomes of two S. flexneri serotype 2a strains, 2457T and 301, have been completely sequenced and some of their characteristics are compiled in Table 5.1. Full genome sequence determination of one S. dysenteriae and one S. boydii strain is in progress (http://www.sanger.ac.uk/Projects/Microbes/) (see also Table 5.2). The S. flexneri chromosome exhibits the conserved backbone and flexible island mosaic structure typical of E. coli/Shigella. The genome sequences are very similar, comprising a common colinear chromosomal backbone of 3.9 million base pairs (Mbp) which is also shared with E. coli K-12 strain MG1655 and the O157:H7 strain EDL933. Nevertheless, more than 1400 single nucleotide exchanges have been observed which are scattered all over the S. flexneri genome. Strain 301 carries an unusual set of three spacer tRNAs in the rrnH operon [14, 15]. The strain 2457T genome lacks 357 E. coli genes and contains a large fraction of accessory and (formerly) mobile genetic elements, i.e., 284 intact or nonfunctional insertion sequence (IS) elements that mediated several genomic rearrangements, as well as cryptic prophages. 
The number of IS-related DNA stretches (n= 314) is even higher in the genome of strain 301 [14, 15], thus also increasing the genome size of strain 301 relative to that of strain 2457T.


Accordingly, S. flexneri genomes carry more IS elements than those of any other enteric bacteria. Colinearity of the chromosomal backbone in both genomes is disturbed by chromosomal rearrangements, apparently mediated by IS elements, relative to each other and to E. coli K-12. There are at least 15 larger rearrangements between the two Shigella genomes, and nearly all inversions are bordered by insertion sequences. Large rearrangements exist, e.g., around the origin of replication. Around the replication terminus a large inversion occurred in both Shigella strains, followed by reinversion of most of the rearranged DNA region, so that two smaller stretches of inverted sequences indicate the beginning and the end points of the initial event. This implies that the major driving force behind these rearrangements within the S. flexneri genome is recombination between related IS elements.

Tab. 5.1 Characteristics of publicly available complete E. coli and S. flexneri genome sequences.

| Strain | Pathotype | Size (bp) | G+C content (%) | No. of predicted ORFs | No. of ISs or IS-like elements | No. of prophages or prophage-like elements | Plasmids | Reference |
|---|---|---|---|---|---|---|---|---|
| S. flexneri 2457T | 2a | 4 599 354 | 50.9 | 4084 | 284 | 12 | 4 (pINV-2457T, pSf2, pSf4, pSf-R27) | [15] |
| S. flexneri 301 | 2a | 4 607 203 | 50.9 | 4434 | 314 | ND | pCP301 | [14] |
| E. coli MG1655 | K-12 | 4 639 221 | 50.8 | 4294 | 42 | 10 | – | [30] |
| E. coli CFT073 | UPEC (O6:K2:H1) | 5 231 428 | 50.5 | 5533 | ND | 5 | – | [34] |
| E. coli EDL933 | EHEC (O157:H7) | 5 528 445 | 50.5 | 5361 | ND | 13 | pO157 | [33] |
| E. coli Sakai | EHEC (O157:H7) | 5 594 477 | 50.5 | 5361 | 137 | 24 | pO157, pOSAK1 | [32] |

ORFs, open reading frames; IS, insertion sequence; ND, not determined.
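To put the IS element counts of Table 5.1 on a common scale, they can be normalized by genome size. The following sketch uses only the sizes and IS counts listed in the table for the four strains where both are given; the function and variable names are illustrative.

```python
# (genome size in bp, number of IS or IS-like elements) from Table 5.1.
table_5_1 = {
    "S. flexneri 2457T": (4_599_354, 284),
    "S. flexneri 301": (4_607_203, 314),
    "E. coli MG1655": (4_639_221, 42),
    "E. coli Sakai": (5_594_477, 137),
}


def is_density_per_mbp(size_bp, n_is):
    """IS elements per million base pairs of genome."""
    return n_is / (size_bp / 1_000_000)


density = {
    strain: round(is_density_per_mbp(size_bp, n_is), 1)
    for strain, (size_bp, n_is) in table_5_1.items()
}

# Fold difference between a Shigella genome and E. coli K-12.
fold_2457t_vs_k12 = density["S. flexneri 2457T"] / density["E. coli MG1655"]
```

On these figures both S. flexneri genomes carry more than 60 IS elements per Mbp, several times the density found in E. coli K-12 MG1655.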


Tab. 5.2 Ongoing complete E. coli and Shigella genome sequencing projects (as at April 2005).

| Organism | Strain | Disease/pathotype | Genome size (Mbp) | Status | Remarks | Institutions involved |
|---|---|---|---|---|---|---|
| E. coli O6:K15:H31 | 536 | ExPEC | 4.9 | Finished | Sequence annotation | University of Würzburg; Göttingen Genomics Laboratory, Germany |
| E. coli O6:K5:H1 | DSM6601 | nonpathogenic | 4.9 | In progress | Gap closure | Gesellschaft für Biotechnologische Forschung (GBF), Braunschweig; Medical Hannover; University of Würzburg; Göttingen Genomics Laboratory, Germany |
| E. coli O18:K1:H7 | RS218 | ExPEC | 5.2 | In progress | ~200 contigs > 1 kbp | Genome Center, University of Wisconsin, USA |
| E. coli O42 | O42 | EAEC | 5.355 | Finished | 4 contigs > 1 kbp | Sanger Centre, UK |
| E. coli O127:H6 | E2348/69 | EPEC | 5.067 | In progress | 8 contigs > 1 kbp | Sanger Centre, UK |
| S. dysenteriae | M131 | Dysentery | 4.8 | Finishing/gap closure | 330 contigs > 1 kbp | Sanger Centre, UK |
| S. sonnei | 53G | Dysentery | 5.255 | Finishing/gap closure | 19 contigs > 1 kbp | Sanger Centre, UK |

Both the acquisition of new traits by horizontal gene transfer (HGT) and a complementary loss of function have been proposed [16, 17] as mechanisms of adaptation to specialized niches and of increased virulence. Acquisition of the virulence plasmid enabled S. flexneri to enter the intracellular environment in human intestinal epithelial cells. In this new niche, genes required in the intestinal lumen may be deleterious or no longer beneficial and may accumulate mutations without a selective force to maintain them. Lysine decarboxylase (CadA) produces cadaverine, which inhibits the escape of S. flexneri from the vacuole into the cell cytosol [18, 19]. As replication and spread of S. flexneri depend on access to the cytosol, biosynthesis of cadaverine attenuates virulence. The cadA and cadC genes are deleted, as is the gene encoding the surface protease OmpT, which determines the intercellular spreading ability of Shigella [9]. Lack of surface structures, e.g., flagella and curli fimbriae, in S. flexneri may provide the advantage of fewer antigens that could be recognized by the host immune system. In strain 2457T, 14 genes of flagellar biosynthesis are nonfunctional due to frameshift and point mutations as well as to IS1-mediated truncation [15]. Pseudogenes result from disruption of coding sequences by point mutations, single nucleotide indels, and insertion of IS elements. IS elements cause disruption and truncation of genes as well as larger deletions and insertions, and are probably the major cause of the S. flexneri genome dynamics. Many other phenotypic characteristics of S. flexneri relative to E. coli (e.g., the loss of utilization of lactose, maltose, and xylose) can be assigned to pseudogenes in Shigella which are still functional in E. coli. With respect to the presence of large numbers of pseudogenes, S. flexneri resembles another enteric pathogen restricted to humans, Salmonella enterica serovar Typhi. The accumulation of pseudogenes is considered to be one of the main reasons why both evolved from the rest of their species to become human-restricted pathogens [20].

S. flexneri harbors a large virulence plasmid, pINV, that contains all of the genes (ipa, mxi-spa, virG/icsA, virF) required to express the invasive phenotype. This plasmid is also present in enteroinvasive E. coli strains. Sequence comparison of the four available virulence plasmid sequences of S. flexneri, pINV-2457T, pWR100, pWR501, and pCP301 [14, 15, 21, 22], showed that they are essentially identical. Size differences are mainly due to the presence of IS elements. Interestingly, about 50% of all ORFs located on the invasion plasmid are related to putative IS elements, indicative of IS-mediated acquisition of gene blocks of various origins. However, the analysis of non-IS-related coding sequences of different pINV variants indicated that positive selection is also a major driving force in the evolution of pINV variants [23]. In contrast to strain 301, strain 2457T also contains two small multicopy plasmids as well as another 165-kbp plasmid with high homology to the S. enterica serovar Typhi R27-like plasmid [15].
The latter was considered to be limited to Salmonella implicated in the acquisition and accumulation of antibiotic resistance. However, sequence similarity between pR27 and the large virulence plasmid of Yersinia pestis, pMT1, has been observed, indicative of a common origin. Comparative genome analysis also provides a powerful means of linking pathogenic processes with specific acquired genes and their encoded products. In addition to the 3.9-Mbp backbone shared with E. coli, several regions unique to Shigella can be found scattered over the chromosome. Sixty-four such Shigella islands (SIs) greater than 1000 base pairs can be detected and many of them exhibit typical features of genomic islands or prophages. Shigella prophages play an important role in serotype conversion, which is an important virulence trait of S. flexneri. Four different temperate bacteriophages involved in O antigen modification have been characterized so far. Compared to E. coli, most prophages found in S. flexneri differ in their DNA sequence, indicating that they were acquired after the separation of the two bacterial lineages [14, 15, 24, 25].


Tab. 5.3 Well-characterized pathogenicity islands of E. coli and Shigella.

| Organism | Designation | Encoded traits | Size (kbp) | Chromosomal insertion site | Reference |
|---|---|---|---|---|---|
| E. coli 536 (UPEC) O6:K15:H31 | PAI I536 | α-Hemolysin, putative adhesins | 76.8 | selC | [43] |
| E. coli 536 (UPEC) O6:K15:H31 | PAI II536 | α-Hemolysin, P fimbriae (Prf), put. adhesin | 102 | leuX | [43] |
| E. coli 536 (UPEC) O6:K15:H31 | PAI III536 | S fimbriae (SfaI), iro siderophore system, hemoglobin protease | 75.8 | thrW | [43] |
| E. coli 536 (UPEC) O6:K15:H31 | PAI V536 | K15 capsule | 79.6 | pheV | [84] |
| E. coli J96 (UPEC) O4:K–:H5 | PAI IJ96 | α-Hemolysin, P fimbriae (Pap) | > 170 | pheV | [53] |
| E. coli J96 (UPEC) O4:K–:H5 | PAI IIJ96 | α-Hemolysin, P fimbriae (Prs), cytotoxic necrotizing factor 1 (CNF1) | 110 | pheU | [53] |
| E. coli CFT073 (UPEC) O6:K2:H1 | PAI ICFT073 | α-Hemolysin, P fimbriae (Pap) | 58 | pheV | [34, 54] |
| E. coli CFT073 (UPEC) O6:K2:H1 | PAI IICFT073 | P fimbriae (Pap), iron acquisition | 71 | pheU | [34, 85] |
| E. coli AL862 (SEPEC) | PAI IAL862 | afa8 adhesin | 61 | pheU | [56] |
| E. coli AL862 (SEPEC) | PAI IIAL862 | afa8 adhesin | 61 | pheV | [56] |
| E. coli EV36 (K12/K1 hybrid E. coli) K1 | kps PAI | K1 capsule | n.d. | pheV | [86] |
| E. coli C5 (UPEC) O18:K1:H7 | PAI IC5 | α-Hemolysin, P fimbriae (Prs), cytotoxic necrotizing factor 1 (CNF1), heat-resistant hemagglutinin | ~ 100 | leuX | [87] |
| E. coli 222 (APEC) O78 | Vat-PAI | Autotransporter vacuolating toxin | 22 | thrW | [88] |
| E. coli E2348/69 (EPEC) O127:H6 | LEE | Type III secretion, invasion | 35 | selC | [89] |
| E. coli RW1374 (STEC) O103:H2 | LEE | Type III secretion, invasion, parts of the she PAI (Sh. flexneri 2a) | > 80 | pheV | [90] |


LEE

LEE (OI 148) LEE LEE

E. coli (EPEC and EHEC) O111:H8 O111:H– O26:H11 O26:H–

E. coli EDL933 (EHEC) O157:H7

E. coli RDEC-1 (REPEC) O15:H–

O15:H–

Locus of proteolysis activity (LPA)

O91:H

O78:H11

E. coli (EHEC)

E. coli 10407(ETEC)

SHI-1 (she) SHI-2 Shigella resistance locus (SRL) Shi-O

Shigella flexneri

S. flexneri

S. flexneri

S. flexneri

Genes involved in serotype conversion

Ferric dicitrate transport, antibiotic resistances

Aerobactin synthesis, colicin V immunity

Enterotoxin (Set), protease (Pic)

Yersiniabactin synthesis, transport

Invasion

Serine protease (EspI), vitamin B12 receptor (BtuB), adhesin

Diffuse adherence adhesin

Autotransporter/enterotoxin

Type III secretion, invasion

Type III secretion, invasion, put. adhesin, enterotoxin

Type III secretion, invasion, put. adhesin

Type III secretion, invasion

Type III secretion, invasion

Encoded traits

11

66

23–30

46.6

31–43

46

33

> 11

thrW

serX

selCv

pheV

asnTv

selC

selC

pheV

ssrA

pheV

~ 85 15.2

pheU

pheU

selC

pheU

[103]

[102]

[27, 44]

[101]

[98–100]

[97]

[96]

[95]

[94]

[93]

[93]

[92]

[33]

[91]

Chromosomal Referinsertion site ence

59.5

?

43

?

Size (kbp)

DR, direct repeat; APEC, avian pathogenic E. coli; UPEC, uropathogenic E. coli; SEPEC, sepsis-causing E. coli; EPEC, enteropathogenic E. coli; EHEC, enterohemorrhagic E. coli; ETEC, enterotoxigenic E. coli; REPEC, rabbit enteropathogenic E. coli

HPI (PAI IV536)

pathogenic E. coli

TPAI-1

EPEC Afa-PAI

–

O55:H

E. coli 135/12 (EPEC)

EspC-PAI

E. coli E2348/69 (EPEC) O127:H7

–

LEE

E. coli 84/110-1 (REPEC) O103:H2

E. coli 83/39 (REPEC)

Designation

Organism

Tab. 5.3 Continued.


The genomic island content of the two S. flexneri strains is similar, but their organization and chromosomal insertion site may differ. Shigella pathogenicity islands which have already been analyzed in detail are shown in Table 5.3. Several islands may be involved in niche-specific processes or virulence: the Pic protease/ mucinase and ShET1 enterotoxin on the she pathogenicity island (PAI) [26], or the aerobactin siderophore gene cluster on the SHI-2 PAI inserted at selC [27] or on SHI-3 inserted at pheU in S. boydii [28]. Eight other smaller islands contain the sit genes coding for another siderophore system, possible specific adhesins similar to long polar fimbriae (Lpf) or the Saf protein of S. enterica serovar Typhimurium. The most notable island type carries genes related to the type III secreted effector protein IpaH that may support escape of Shigella from macrophage vacuoles. They are secreted by the plasmid-encoded type three secretion system. Five ipaH copies are localized on the virulence plasmid of strain 2457T and seven more are present on the chromosome. Three of them are nonfunctional. In strain 301 four complete and three incomplete ipaH copies have been found on the chromosome as well as five copies on the virulence plasmid pCP301. In both S. flexneri strains, incomplete ipaH copies result from disruption by IS elements or by frameshift mutations [14, 15]. The comparative genome analysis of S. flexneri and E. coli [14, 15] confirms the previous finding that the two are closely related and may belong to the same genus [2, 29]. Shigella evolved from multiple E. coli strains in correlation with the appearance of man much later than, e.g., E. coli O157:H7 and K-12, which diverged from a common ancestor about 4.5 million years ago [6]. 
To meet the demand of its unique pathogenic lifestyle, the chromosomally determined phenotypic properties result from convergent evolution during niche adaptation, mostly due to gene acquisition or loss of function, some from negative selection pressure.

5.3 Comparative Genomics of Escherichia coli

5.3.1 Comparison of Complete Genome Sequences

The chromosome of Escherichia coli K-12 is the best studied microbial genome. Accordingly, that of the E. coli K-12 strain MG1655 was the first E. coli genome to be completely sequenced [30]. Earlier results had already revealed an unexpected level of structural and genetic diversity among genomes of different E. coli strains, also mirrored by genome size variation within the species E. coli between 4.6 and 5.5 Mbp [31]. The genomes of four E. coli strains (the nonpathogenic K-12 strain MG1655, two strains of enterohemorrhagic E. coli O157:H7, EDL933 and Sakai, and the uropathogenic E. coli O6:K2:H1 strain CFT073) have been completely sequenced [30, 32–34] (see Table 5.1). The availability of these complete genome sequences allows detailed comparison of the genetic and structural genome variability not only of different E. coli strains, but also of different pathotypes. A genomic comparison of strains CFT073, EDL933, and MG1655 revealed that only 39.2% (2996 genes) of their combined set of proteins are common to all three strains [34], underlining the astonishing diversity among E. coli isolates. The two E. coli O157:H7 genomes are extremely similar. However, comparison of either O157:H7 sequence to E. coli K-12 reveals that an extraordinary amount of gene loss and gain has occurred since these strains last shared a common ancestor about 4.5 million years ago. The 5.5-Mbp O157:H7 genome is nearly 1 Mbp larger than that of E. coli K-12. While roughly 4.1 Mbp of the chromosome is very similar between O157:H7 and K-12, this conserved “backbone” is interrupted by hundreds of “islands” and “islets” of sequences specific to one strain or the other. 0.53 Mbp of E. coli MG1655-specific sequences are absent from the EDL933 genome, which itself contains over 1.4 Mbp of DNA without a counterpart in the K-12 genome. These sequences are clustered in 177 regions ranging from 50 bp to nearly 90 kbp in length and comprise more than 1000 putative ORFs, including some that have been previously associated with O157:H7 virulence, as well as many new candidate pathogenicity factors, such as iron utilization and host cell adherence-associated genes. Surprisingly, two large islands were discovered, each containing nearly identical genes encoding urease, in a strain typically characterized by its lack of urease activity in clinical assays. However, this gene cluster could be expressed in an E. coli K-12 background, indicating that regulation of these determinants, which may contribute to the acid tolerance of enterohemorrhagic E. coli (EHEC), is different in various E. coli backgrounds [35]. A large fraction of the genomic differences can be accounted for by the activity of mobile genetic elements.
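The figures quoted for the three-way comparison and for the O157:H7 versus K-12 comparison can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text and in Table 5.1; the variable names are illustrative, and the derived values are rough consistency checks rather than published results.

```python
# Three-way comparison of CFT073, EDL933 and MG1655.
shared_genes = 2996
shared_fraction = 0.392
combined_protein_set = round(shared_genes / shared_fraction)

# O157:H7 (EDL933) versus K-12 (MG1655), sizes in bp.
backbone_bp = 4.1e6
o157_specific_bp = 1.4e6
k12_specific_bp = 0.53e6
edl933_size_bp = 5_528_445   # Table 5.1
mg1655_size_bp = 4_639_221   # Table 5.1

edl933_estimate = backbone_bp + o157_specific_bp
mg1655_estimate = backbone_bp + k12_specific_bp
size_difference_bp = edl933_size_bp - mg1655_size_bp

# Average length of the 177 O157-specific regions.
mean_island_bp = o157_specific_bp / 177
```

The shared 2996 genes imply a combined set of roughly 7600 proteins, the backbone plus strain-specific DNA reproduces both genome sizes to within about 0.1 Mbp, and the O157-specific regions average about 8 kbp each.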
Nearly 40% of the O157-specific elements are found in one of at least 18 cryptic prophages or in the one intact bacteriophage (933W), which also contains the most characteristic virulence genes of O157:H7, i.e., those coding for the Shiga toxins [33]. Whole-genome structure comparison of several O157 strains not only demonstrated that the O157-specific sequences are highly conserved among the strains, but also showed that an unexpectedly high genomic diversity exists. Prophages especially exhibit extensive structural and positional diversity, suggesting that variation of prophages is one of the most important factors in generating genome diversity among O157 strains [36, 37]. The 5.2-Mbp genome sequence of uropathogenic E. coli (UPEC) strain CFT073 [34] supported these views, and comparison with an O157:H7 and a K-12 genome revealed 2996 genes of the core chromosome common to all three strains. However, the comparison also revealed 1.303 Mbp of DNA present only in the UPEC strain and absent from both of the other strains. E. coli K-12 and O157:H7 are more closely related to each other than either is to this extraintestinal pathogenic E. coli (ExPEC) strain. Although the island contents differ, many of the same chromosomal sites serve as insertion sites for these strain-specific elements. According to the genome sequence, there are a number of previously unknown toxins and adhesins that may contribute to pathogenesis in the human urogenital tract. There are ongoing projects to sequence the full genome of other ExPEC and
intestinal pathogenic E. coli (IPEC) strains (see Table 5.2). Their comparison with existing sequences is expected both to reveal genes that are specific for extraintestinal or intestinal pathogens but not found in commensal strains, and to identify genes responsible for distinct aspects of the diseases they cause. The majority of strain- or pathotype-specific regions are found at a limited number of positions in the individual chromosomes, suggesting that the strain-specific elements accumulated over time by repeated horizontal gene transfer, frequently with successive transfers of different elements into the same locus of the core chromosome. The fate of horizontally transferred sequences depends on their cost or benefit to the bacteria. The fact that so many different horizontally acquired sequences exist in islands differentiating these closely related E. coli strains suggests that many of them are temporary residents of the genome or provide an advantage specific to the individual lifestyle of particular strains.

5.3.2 Comparative Genomics Using DNA Arrays

The genomes of several different pathogenic and nonpathogenic E. coli as well as those of Shigella isolates have been compared in different studies by DNA–DNA hybridization using DNA arrays. Comparative genomic hybridization using a K-12-specific array allows detection of chromosomal regions that have been lost or replaced following the acquisition and chromosomal insertion of horizontally acquired DNA. The K-12-specific genome content of different E. coli isolates has been compared to assess the contribution of gene transfer and gene loss to E. coli genome evolution: a total of 67 events, including 37 additions and 30 deletions, were required to account for the distribution of all genes present in the chromosome of K-12 strain MG1655 [38]. The genome content of 26 different ExPEC and IPEC strains was compared using a K-12-specific array in combination with an array comprising many probes specific for virulence-associated genes of ExPEC and IPEC. The conserved chromosomal E. coli backbone was estimated to comprise about 3100 genes, and the majority of variable ORFs of the K-12 genome were functionally grouped as hypothetical, unclassified, or as prophage and cell envelope genes. Analysis of the genome content using the so-called “E. coli pathoarray” revealed that many virulence-associated ORFs of ExPEC are also widespread among nonpathogenic, commensal isolates [39]. A more recent study of 22 ExPEC, IPEC, and Shigella strains confirmed the order of magnitude of the common E. coli backbone estimated by Dobrindt and coworkers [39], arriving at about 2800 ORFs. The mosaic distribution of absent regions indicated that the genomes of pathogenic strains were highly diversified because of insertions and deletions [40]. An array based on the E. coli K-12 genome has also been used to determine the common E. coli core gene pool in O26 verotoxigenic E. coli strains [41].
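The array-based backbone estimates above rest on two steps: calling each reference gene present or absent in a test strain, and then keeping the genes called present in every strain. The sketch below illustrates this under stated assumptions and does not reproduce the analysis of any cited study: the cutoff `PRESENCE_THRESHOLD`, the function names, the ORF identifiers, and the log2(test/reference) values are all hypothetical, and real studies derive their cutoffs from control hybridizations.

```python
from typing import Dict, List

# Hypothetical cutoff on log2(test signal / reference signal); an assumption
# made for illustration, not a value taken from the cited studies.
PRESENCE_THRESHOLD = -1.0


def call_presence(
    log_ratios: Dict[str, float], threshold: float = PRESENCE_THRESHOLD
) -> Dict[str, bool]:
    """Call a reference ORF present if its log2 ratio exceeds the cutoff."""
    return {orf: ratio > threshold for orf, ratio in log_ratios.items()}


def conserved_backbone(calls_by_strain: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return the reference ORFs called present in every tested strain."""
    strain_calls = list(calls_by_strain.values())
    reference_orfs = strain_calls[0].keys()
    return sorted(
        orf
        for orf in reference_orfs
        if all(calls.get(orf, False) for calls in strain_calls)
    )


# Toy hybridization data (hypothetical ORF names and values).
toy_ratios = {
    "strain_A": {"orf_a": 0.1, "orf_b": -0.2, "orf_c": -3.5},
    "strain_B": {"orf_a": -0.3, "orf_b": -2.8, "orf_c": 0.0},
}
calls = {strain: call_presence(ratios) for strain, ratios in toy_ratios.items()}
backbone = conserved_backbone(calls)
variable = sorted(set(toy_ratios["strain_A"]) - set(backbone))
```

An array can only report on the probes it carries, so a K-12-specific array says nothing about genes that are absent from K-12; this is why the study described above combined it with a second array of virulence-associated probes.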

5.3.3 Mobile Genetic Elements and Evolution of Pathogenic E. coli

Mobilizable plasmids and bacteriophages are key elements that enable bacteria to exchange genetic information by conjugation and transduction. In recent years, genomic research has revealed that pathogenicity islands (PAIs) have also played a major role in the transformation of avirulent into virulent bacteria. Only limited knowledge exists about the origin of PAIs, but it has been speculated that they may derive from integrating plasmids or phages that have lost the genes required for replication and self-transfer in exchange for a more stable association with, and inheritance by, the host chromosome [11]. Evidence supporting this model comes from the finding that PAIs often encode phage-like integration systems, as has already been discussed in detail. Additional analysis of PAI-specific DNA sequences revealed the presence of regions with similarity to phage- and/or plasmid-related genes or sequences [42, 43], which is another indication that PAIs may have evolved from other mobile genetic elements. A relationship between PAIs and plasmids is also supported by the fact that island-encoded properties have been detected on plasmids in related species or in a different isolate of the same species. Examples are aerobactin-encoding genes, which are often part of plasmids that additionally encode colicin ColV. As the aer genes of Shigella island SHI-2 are associated with regions of sequence homology to ColV-specific genes, parts of SHI-2 may have evolved from an integrating plasmid [27, 44]. Alternatively, some PAIs may have been acquired by mechanisms related to conjugation. An increasing number of elements (termed “conjugative transposons,” “constins,” or “conjugative genomic islands”) have been discovered in gram-negative bacteria, including E. coli, that are normally integrated in the chromosome but can excise in a precise manner and subsequently be transferred to recipient cells by conjugation [45].
In gram-negative bacteria, some of these elements exhibit several features reminiscent of PAIs, including a site-specific recombinase and lack of autonomous replication. In contrast to PAIs, they carry genes required for mating-pair formation and conjugative DNA metabolism which are related to plasmid-encoded conjugation systems [46].

5.3.4 Genomic Islands/Pathogenicity Islands

Many virulence-associated genes of pathogenic E. coli are located on (formerly) mobile genetic elements such as bacteriophages, plasmids, and genomic islands. A variety of pathogenic E. coli isolates have been analyzed with respect to the presence and spread of virulence-associated genes. The presence of distinct genomic islands (GEIs) encoding virulence determinants, designated pathogenicity islands (PAIs), has been shown for a broad variety of bacteria including various strains of the different E. coli pathotypes [10, 11, 47]. Studies on their prevalence and phylogenetic relation in, e.g., ExPEC have been undertaken recently [48–52]. These studies showed that ExPEC virulence factors exhibit distinct patterns of
phylogenetic distribution. This provides evidence of both vertical and horizontal transmission of the corresponding virulence-associated genes as well as of host-specific associations and strong associations among different virulence-associated genes. The GEIs that have been identified so far carry a variety of virulence factors that represent the entire spectrum of bacterial virulence-associated genes, including adhesins, toxins, iron uptake systems, secretion systems, and strategies to avoid the host defense mechanisms. E. coli PAIs that have been analyzed in detail are listed in Table 5.3. GEIs are characteristic features of E. coli and Shigella and seem to play an important role in the evolution of different variants or pathotypes. A considerable fraction of the entire genome consists of PAIs with significant structural and functional diversity. There are hardly any pathogenic enterobacterial species with only one PAI per strain, and the number of PAIs detected in individual isolates is constantly increasing [43, 53–56]. This is corroborated by the complete genome sequences: in pathogenic E. coli strains, up to 13 PAI-like genetic entities have been identified in one strain [32–34]. Eight chromosomal regions in S. flexneri 2a exhibit some features of PAIs [14, 15]. The growing sequence information about PAIs of different gram-negative species demonstrates that horizontal gene transfer and homologous recombination play a pivotal role in the evolution of PAIs and enterobacteria. Comparison of PAIs of E. coli reveals that identical or almost identical PAIs can be detected in different enterobacterial species, pathotypes, or strains. Typical examples are the so-called “high pathogenicity island” (HPI), initially described in pathogenic yersiniae, and the “locus of enterocyte effacement” (LEE).
However, many PAIs have a mosaic-like, modular structure, and although many of them superficially resemble each other with respect to the presence and/or genetic linkage of certain virulence determinants, great variability exists with regard to PAI composition, structural organization, and chromosomal localization even among strains of the same patho- or serotype [43, 51, 55, 57]. Recently, even the LEE island in E. coli, which is generally considered to be a clonal unit inside a clonal host and thus expected to evolve as a single unit, has been shown to exhibit a mosaic-like composition [58–60]. Many PAI regions exhibit notable homology to fragments of mobile genetic elements such as bacteriophages and enterobacterial virulence plasmids. Furthermore, multiple copies of accessory DNA elements or fragments thereof that are present on islands in one genome facilitate homologous recombination within one island or between different islands or horizontally acquired DNA elements, thus leading to rearrangements, deletions, and acquisition of “foreign” DNA. Comparison of the overall genome structure and genetic organization of GEIs of nonpathogenic probiotic E. coli strain Nissle 1917 (O6:K5:H1) with PAIs of UPEC O6 strains CFT073 and 536 showed that the islands of strain Nissle 1917 resemble those of UPEC strain CFT073 more closely than those of UPEC strain 536 [61] (see also Fig. 5.1). Although the overall genetic structure of the investigated GEIs is very similar, important differences exist which account for the fact that strain Nissle 1917 is nonpathogenic, whereas strain CFT073 is uropathogenic. The pheV-associated PAI of strain CFT073 carries
the complete hly and pap gene clusters coding for the important virulence factors α-hemolysin and P fimbriae, respectively. These determinants, together with other putative hypothetical ORFs, represent a 30-kbp region which is presumably of crucial importance for this strain’s virulence properties. Interestingly, only a fragmented pap operon is present in a similar DNA context on GEI II of strain Nissle 1917, where it is associated with a transposon-like element. A comparison of the left and right halves of GEI II of strain Nissle 1917 with the pheV-associated island of strain CFT073 shows many inversions and rearrangements in the left half of GEI II, whereas the right half is largely colinear with its counterpart in strain CFT073 (Fig. 5.2). During evolution of GEI II of strain Nissle 1917 from a pheV-associated CFT073-like island, the intact pap gene cluster has probably been disrupted and partially deleted due to insertion of IS1 and IS10 elements and consecutive

Fig. 5.1 Comparison of the overall genome structure of three E. coli O6 strains. The chromosomal localization and the presence of virulence-associated genes on genomic islands/pathogenicity islands (gray bars) as well as of other virulence-associated gene clusters (white bars) of two uropathogenic E. coli O6 strains, CFT073 and 536, and of one nonpathogenic probiotic E. coli O6 strain, Nissle 1917, are indicated. The chromosomal maps are based on the E. coli K-12 strain MG1655 chromosome. fim, type 1 fimbrial determinant; mch/mcm, microcin determinants; foc, F1C fimbrial determinant; iro, salmochelin determinant; HPI, high pathogenicity island; wb*O6, O6 LPS side chain biosynthesis determinant; sat, secreted autotransporter toxin determinant; iuc, aerobactin determinant; iha, IrgA homologue (Iha) adhesin determinant; kps, capsule determinant; chu, E. coli hemin uptake determinant; cva, colicin V export-related gene; wa*, LPS core biosynthesis determinant; pap, P fimbrial determinant; prf, P-related fimbriae determinant; pix, pilus involved in E. coli X2194 adhesion determinant; hly, α-hemolysin determinant; PAI, pathogenicity island.

Fig. 5.2 Comparison of the genetic organization of GEI II of strain Nissle 1917 and the pheV-associated PAI of E. coli strain CFT073. (A) The comparison demonstrates the loss of the α-hemolysin-encoding determinant (hly) and of large parts of the P fimbrial operon (pap) in strain Nissle 1917. The DNA sequence comparison of the two islands is visualized using Artemis and ACT [104]. Identical regions of the two islands are highlighted in red. Functionally related DNA regions are indicated by different colors as shown. (B) Enlarged section of GEI II of strain Nissle 1917 comprising the partially deleted P fimbrial determinant. (This figure also appears with the color plates.)

recombination events. These events were important for the evolution of an ancestor of strain Nissle 1917, as they were responsible for inactivation of the P-fimbrial operon (and probably for the loss of the α-hemolysin-encoding determinant as well), reducing the hypothetical virulence capacity of the corresponding strain. According to the available data on genome content and organization of the E. coli O6 strains Nissle 1917, CFT073, and 536, it is tempting to speculate on the evolution of the nonpathogenic, probiotic character of the first strain and of the uropathogenicity of the latter two isolates. Strain Nissle 1917 is characterized by a specific combination of traits which enable successful survival in and colonization of the human intestine. In addition, this strain does not express important virulence factors of uropathogenic E. coli due to gene loss by deletion and to point mutations [61, 62]. Meanwhile, raw genome data of nonpathogenic probiotic E. coli strain Nissle 1917 support these results and confirm that this strain’s 5.1-Mbp genome is more similar to that of UPEC strain CFT073 than to those of the other completely sequenced E. coli strains [63]. An interesting aspect of PAI evolution may also be the duplication of entire islands. The genome of the EHEC O157:H7 strain EDL933 contains two PAI-like structures (the so-called O-islands #43 and #48) inserted next to the serW and serX tRNA-encoding genes, which comprise 106 genes and include those for tellurite resistance and urease production [33]. The sepsis strain AL862 carries two 61-kbp PAIs which are located at pheV and pheU and which contain the afimbrial adhesin variant 8 (afa8) gene cluster. Whether they are real duplicates will have to be confirmed by sequence determination of both islands [56].

5.3.5 Plasmids and Bacteriophages

Whereas substantial knowledge has been acquired on the genetic structure and diversity of GEIs in E. coli, relatively few data exist on comparative genomics of E. coli plasmids and bacteriophages. The complete genome sequences of four E. coli strains indicate that these strains differ considerably in the number of prophage-related sequences [30, 32–34]. It is well known that genes coding for different adhesins, resistance genes, and the genes for the heat-labile and heat-stable enterotoxins of ETEC strains are localized on plasmids [64–67]. F17 fimbrial adhesin determinants have been discovered together with genes coding for the cytotoxic necrotizing factor 2 on virulence plasmids of necrotoxigenic E. coli strains [68]. Many
EPEC strains contain large EPEC adherence factor (EAF) plasmids which carry the bfp locus coding for the bundle-forming pilus and the per genes, whose products are important for the regulation of LEE-encoded gene expression [69]. The EAST1 toxin is frequently encoded on EAF plasmids [70]. The presence of the so-called pO157 plasmid is characteristic of the majority of EHEC strains. This plasmid carries the ehx genes coding for an enterohemolysin as well as genes for a catalase–peroxidase (katP) and potential adherence factors [105]. Several other plasmids ranging in size from 2 kbp to 87 kbp have also been described in O157:H7 strains. Their composition and importance for the virulence of these strains are unclear. Most of the EAEC strains carry large plasmids which share a high degree of homology [71]. The aggregative adherence fimbriae 1-encoding determinant has been detected on a 60-MDa plasmid which may also harbor the EAST1 toxin gene [70]. EIEC strains share a large pInv plasmid (140 MDa) with S. flexneri. This plasmid carries the invasion-related genes which encode a type III secretion apparatus (mxi, spa) as well as secreted proteins (ipa) involved in the invasion phenotype and a toxin designated Shigella enterotoxin 2 [72]. Other genes providing intestinal E. coli strains with advantageous properties, e.g., resistance to antibiotics or expression of colicins and siderophores, are plasmid-encoded as well. Colicin-encoding plasmids can also harbor siderophore- or other virulence-associated determinants [73–76]. The fact that gene blocks located on GEIs can also be present on plasmids indicates that a similar heterogeneity and diversity may exist among closely related plasmids as among functionally related islands in E. coli. The different types of Shiga toxins (Stx), the major virulence factors of enterohemorrhagic E. coli (EHEC) strains, are usually encoded on temperate bacteriophages [77].
Another virulence-associated gene cluster frequently encoded on bacteriophages in E. coli encodes the cytolethal distending toxin. Whereas the cdtIII determinant of ExPEC strain 1404 is plasmid-encoded [78], similar genes in EHEC strain 493/89 may have been acquired by bacteriophage transduction [79].

5.3.6 Genetic Diversity Among Extraintestinal Pathogenic E. coli

In recent years, knowledge about virulence traits and evolution of ExPEC has accumulated. ExPEC strains frequently cause infections in humans and animals. The corresponding isolates are distinct from normal commensal and intestinal pathogenic E. coli isolates in that they typically derive from different phylogenetic groups and lack distinctive virulence factors characteristic of extraintestinal types of infection. Instead, they exhibit considerable genome diversity and possess a broad range of virulence factors including toxins, siderophores, adhesins, lipopolysaccharides, polysaccharide capsules, proteases, and invasins which are frequently encoded on islands and other mobile DNA elements [80]. Suppression subtractive hybridization analysis of avian ExPEC isolates further underlined the marked genome plasticity among human and avian ExPEC isolates. Furthermore, it turned out that even among strains causing the same disease, different alternative virulence factors can be involved in a “mix and match” combinatorial fashion
at each step of the infection [81, 82]. It has been shown that a considerable fraction of the genetic information of ExPEC which has so far been considered virulence-associated is also present in many commensal and probiotic E. coli isolates [61, 83]. Thus, many of these features (e.g., iron uptake systems, bacteriocins, proteases, fimbriae, and other adhesins) are better regarded as fitness factors that generally increase adaptability, competitiveness, and the ability to colonize the human body efficiently than as typical virulence factors directly involved in infection. Whether a commensal E. coli develops into a pathogen depends not only on the acquisition of fitness-conferring genetic information enabling successful colonization of the host, but also on the presence of functional genes directly contributing to pathogenesis. This highlights the thin line between “virulence” and “fitness” or “colonization” factors and questions the definition of several “ExPEC virulence factors.”

5.4 Conclusions

Comparative genomic analysis of E. coli and S. flexneri revealed that horizontal gene transfer, gene loss, and IS element-mediated chromosomal rearrangements play important roles in the evolution of these pathogens. Strain differences are not restricted to large islands, but also occur in a considerable number of smaller gene clusters coding for integrative functions such as nutrient utilization or surface modification. There is no doubt that the large PAIs are of major importance in conferring a virulent phenotype; however, the smaller clusters are of additional benefit in the course of an infection. Comparison of the available complete genome sequences revealed the genetic diversity underlying the phenotypic diversity of this species. Gene acquisition and loss are extensive, providing different lineages with distinct metabolic, pathogenic, and other capabilities. No clear distinction can be drawn between pathogenic and commensal E. coli strains, particularly in regard to extraintestinal disease. As colonizing sites outside the gut is unlikely to provide any selective advantage in terms of transmissibility, it seems clear that any so-called “extraintestinal virulence factors” are likely to have evolved to enhance survival in the gut and/or transmission between hosts, and therefore will be shared with at least some commensal strains. In addition, genomic islands are seldom fixed, as was first thought, and bear the potential for ongoing rearrangements, deletions, and insertions. Accessory DNA elements especially, such as IS elements and transposons, play a pivotal role in genome plasticity and evolution in Shigella and E. coli. It is becoming more and more evident that genome evolution in these bacteria cannot be described by a simple “backbone and flexible gene pool” model alone, but also requires a “palimpsest” view in which parts of the genome have seen repeated insertions and deletions. There is still considerable interest in sequencing additional E. coli and other Shigella genomes. The existing data raise important questions about the levels and mechanisms generating and maintaining genome variability within and between
E. coli and Shigella populations that will require substantial genome-scale sequence information to resolve. Furthermore, knowledge of all proteins expressed in a given bacterial pathogen will provide the entire repertoire of surface proteins that are potential vaccine candidates. Genome comparison between closely related pathogenic and nonpathogenic variants will reveal factors involved in pathogenicity, and it is anticipated that pathogenomics could also identify factors that direct the host specificity of, e.g., human and animal ExPEC isolates.

Acknowledgments

Our work relating to this topic was supported by the Deutsche Forschungsgemeinschaft (Sonderforschungsbereich 479), by the “Bayerische Forschungsstiftung”, the Bundesministerium für Bildung und Forschung (BMBF) competence network “Pathogenomics”, and by the European Union (COLIRISK). We thank G. Gottschalk (Göttingen) and C. Buchrieser (Paris) for fruitful collaboration.

References

1 Woese, C. R. 1987. Bacterial evolution. Microbiol Rev. 51:221–271.
2 Pupo, G. M., R. Lan, and P. R. Reeves. 2000. Multiple independent origins of Shigella clones of Escherichia coli and convergent evolution of many of their characteristics. Proc Natl Acad Sci U S A. 97:10567–10572.
3 Achtman, M., and G. Pluschke. 1986. Clonal analysis of descent and virulence among selected Escherichia coli. Annu Rev Microbiol. 40:185–210.
4 Maslow, J. N., T. S. Whittam, C. F. Gilks, R. A. Wilson, M. E. Mulligan, K. S. Adams, and R. D. Arbeit. 1995. Clonal relationships among bloodstream isolates of Escherichia coli. Infect Immun. 63:2409–2417.
5 Boyd, E. F., and D. L. Hartl. 1998. Chromosomal regions specific to pathogenic isolates of Escherichia coli have a phylogenetically clustered distribution. J Bacteriol. 180:1159–1165.
6 Reid, S. D., C. J. Herbelin, A. C. Bumbaugh, R. K. Selander, and T. S. Whittam. 2000. Parallel evolution of virulence in pathogenic Escherichia coli. Nature. 406:64–67.
7 Wang, F. S., T. S. Whittam, and R. K. Selander. 1997. Evolutionary genetics of the isocitrate dehydrogenase gene (icd) in Escherichia coli and Salmonella enterica. J Bacteriol. 179:6551–6559.
8 Lawrence, J. G., and H. Ochman. 1998. Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci U S A. 95:9413–9417.
9 Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature. 405:299–304.
10 Hacker, J., and J. B. Kaper (eds.). 2002. Pathogenicity islands and the evolution of pathogenic microbes. Springer, Berlin Heidelberg New York.
11 Dobrindt, U., B. Hochhut, U. Hentschel, and J. Hacker. 2004. Genomic islands in pathogenic and environmental microorganisms. Nat Rev Microbiol. 2:414–424.
12 Jennison, A. V., and N. K. Verma. 2004. Shigella flexneri infection: pathogenesis and vaccine development. FEMS Microbiol Rev. 28:43–58.

13 Cossart, P., and P. J. Sansonetti. 2004. Bacterial invasion: the paradigms of enteroinvasive pathogens. Science. 304:242–248.
14 Jin, Q., Z. Yuan, J. Xu, Y. Wang, Y. Shen, W. Lu, J. Wang, H. Liu, J. Yang, F. Yang, X. Zhang, J. Zhang, G. Yang, H. Wu, D. Qu, J. Dong, L. Sun, Y. Xue, A. Zhao, Y. Gao, J. Zhu, B. Kan, K. Ding, S. Chen, H. Cheng, Z. Yao, B. He, R. Chen, D. Ma, B. Qiang, Y. Wen, Y. Hou, and J. Yu. 2002. Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res. 30:4432–4441.
15 Wei, J., M. B. Goldberg, V. Burland, M. M. Venkatesan, W. Deng, G. Fournier, G. F. Mayhew, G. Plunkett, 3rd, D. J. Rose, A. Darling, B. Mau, N. T. Perna, S. M. Payne, L. J. Runyen-Janecky, S. Zhou, D. C. Schwartz, and F. R. Blattner. 2003. Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T. Infect Immun. 71:2775–2786.
16 Ochman, H., and N. A. Moran. 2001. Genes lost and genes found: evolution of bacterial pathogenesis and symbiosis. Science. 292:1096–1099.
17 Sokurenko, E. V., D. L. Hasty, and D. E. Dykhuizen. 1999. Pathoadaptive mutations: gene loss and variation in bacterial pathogens. Trends Microbiol. 7:191–195.
18 Maurelli, A. T., R. E. Fernandez, C. A. Bloch, C. K. Rode, and A. Fasano. 1998. “Black holes” and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli. Proc Natl Acad Sci U S A. 95:3943–3948.
19 Fernandez, I. M., M. Silva, R. Schuch, W. A. Walker, A. M. Siber, A. T. Maurelli, and B. A. McCormick. 2001. Cadaverine prevents the escape of Shigella flexneri from the phagolysosome: a connection between bacterial dissemination and neutrophil transepithelial signaling. J Infect Dis. 184:743–753.
20 Parkhill, J., G. Dougan, K. D. James, N. R. Thomson, D. Pickard, J. Wain, C. Churcher, K. L. Mungall, S. D. Bentley, M. T. Holden, M. Sebaihia, S. Baker, D. Basham, K. Brooks, T. Chillingworth, P. Connerton, A. Cronin, P. Davis, R. M. Davies, L. Dowd, N. White, J. Farrar, T. Feltwell, N. Hamlin, A. Haque, T. T. Hien, S. Holroyd, K. Jagels, A. Krogh, T. S. Larsen, S. Leather, S. Moule, P. O’Gaora, C. Parry, M. Quail, K. Rutherford, M. Simmonds, J. Skelton, K. Stevens, S. Whitehead, and B. G. Barrell. 2001. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature. 413:848–852.
21 Venkatesan, M. M., M. B. Goldberg, D. J. Rose, E. J. Grotbeck, V. Burland, and F. R. Blattner. 2001. Complete DNA sequence and analysis of the large virulence plasmid of Shigella flexneri. Infect Immun. 69:3271–3285.
22 Buchrieser, C., P. Glaser, C. Rusniok, H. Nedjari, H. D’Hauteville, F. Kunst, P. Sansonetti, and C. Parsot. 2000. The virulence plasmid pWR100 and the repertoire of proteins secreted by the type III secretion apparatus of Shigella flexneri. Mol Microbiol. 38:760–771.
23 Lan, R., G. Stevenson, and P. R. Reeves. 2003. Comparison of two major forms of the Shigella virulence plasmid pINV: positive selection is a major force driving the divergence. Infect Immun. 71:6298–6306.
24 Brüssow, H., C. Canchaya, and W. D. Hardt. 2004. Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol Mol Biol Rev. 68:560–602.
25 Allison, G. E., and N. K. Verma. 2000. Serotype-converting bacteriophages and O-antigen modification in Shigella flexneri. Trends Microbiol. 8:17–23.
26 Al-Hasani, K., I. R. Henderson, H. Sakellaris, K. Rajakumar, T. Grant, J. P. Nataro, R. Robins-Browne, and B. Adler. 2000. The sigA gene which is borne on the she pathogenicity island of Shigella flexneri 2a encodes an exported cytopathic protease involved in intestinal fluid accumulation. Infect Immun. 68:2457–2463.
27 Vokes, S. A., S. A. Reeves, A. G. Torres, and S. M. Payne. 1999. The aerobactin iron transport system genes in Shigella flexneri are present within a pathogenicity island. Mol Microbiol. 33:63–73.
28 Purdy, G. E., and S. M. Payne. 2001. The SHI-3 iron transport island of Shigella boydii 0-1392 carries the genes for aerobactin synthesis and transport. J Bacteriol. 183:4176–4182.
29 Lan, R., and P. R. Reeves. 2002. Escherichia coli in disguise: molecular origins of Shigella. Microbes Infect. 4:1125–1132.
30 Blattner, F. R., G. Plunkett, 3rd, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science. 277:1453–1474.
31 Bergthorsson, U., and H. Ochman. 1998. Distribution of chromosome length variation in natural isolates of Escherichia coli. Mol Biol Evol. 15:6–16.
32 Hayashi, T., K. Makino, M. Ohnishi, K. Kurokawa, K. Ishii, K. Yokoyama, C. G. Han, E. Ohtsubo, K. Nakayama, T. Murata, M. Tanaka, T. Tobe, T. Iida, H. Takami, T. Honda, C. Sasakawa, N. Ogasawara, T. Yasunaga, S. Kuhara, T. Shiba, M. Hattori, and H. Shinagawa. 2001. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 8:11–22.
33 Perna, N. T., G. Plunkett, 3rd, V. Burland, B. Mau, J. D. Glasner, D. J. Rose, G. F. Mayhew, P. S. Evans, J. Gregor, H. A. Kirkpatrick, G. Posfai, J. Hackett, S. Klink, A. Boutin, Y. Shao, L. Miller, E. J. Grotbeck, N. W. Davis, A. Lim, E. T. Dimalanta, K. D. Potamousis, J. Apodaca, T. S. Anantharaman, J. Lin, G. Yen, D. C. Schwartz, R. A. Welch, and F. R. Blattner. 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature. 409:529–533.
34 Welch, R. A., V. Burland, G. Plunkett, 3rd, P. Redford, P. Roesch, D. Rasko, E. L. Buckles, S. R. Liou, A. Boutin, J. Hackett, D. Stroud, G. F. Mayhew, D. J. Rose, S. Zhou, D. C. Schwartz, N. T. Perna, H. L. Mobley, M. S. Donnenberg, and F. R. Blattner. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A. 99:17020–17024.
35 Heimer, S. R., R. A. Welch, N. T. Perna, G. Posfai, P. S. Evans, J. B. Kaper, F. R. Blattner, and H. L. Mobley. 2002. Urease of enterohemorrhagic Escherichia coli: evidence for regulation by fur and a trans-acting factor. Infect Immun. 70:1027–1031.
36 Ohnishi, M., K. Kurokawa, and T. Hayashi. 2001. Diversification of Escherichia coli genomes: are bacteriophages the major contributors? Trends Microbiol. 9:481–485.
37 Ohnishi, M., J. Terajima, K. Kurokawa, K. Nakayama, T. Murata, K. Tamura, Y. Ogura, H. Watanabe, and T. Hayashi. 2002. Genomic diversity of enterohemorrhagic Escherichia coli O157 revealed by whole genome PCR scanning. Proc Natl Acad Sci U S A. 99:17043–17048.
38 Ochman, H., and I. B. Jones. 2000. Evolutionary dynamics of full genome content in Escherichia coli. EMBO J. 19:6637–6643.
39 Dobrindt, U., F. Agerer, K. Michaelis, A. Janka, C. Buchrieser, M. Samuelson, C. Svanborg, G. Gottschalk, H. Karch, and J. Hacker. 2003. Analysis of genome plasticity in pathogenic and commensal Escherichia coli isolates by use of DNA arrays. J Bacteriol. 185:1831–1840.
40 Fukiya, S., H. Mizoguchi, T. Tobe, and H. Mori. 2004. Extensive genomic diversity in pathogenic Escherichia coli and Shigella strains revealed by comparative genomic hybridization microarray. J Bacteriol. 186:3911–3921.
41 Anjum, M. F., S. Lucchini, A. Thompson, J. C. Hinton, and M. J. Woodward. 2003. Comparative genomic indexing reveals the phylogenomics of Escherichia coli pathogens. Infect Immun. 71:4674–4683.
42 Cheetham, B. F., and M. E. Katz. 1995. A role for bacteriophages in the evolution and transfer of bacterial virulence determinants. Mol Microbiol. 18:201–208.
43 Dobrindt, U., G. Blum-Oehler, G. Nagy, G. Schneider, A. Johann, G. Gottschalk,

References

44

45

46

47

48

49

50

51

and J. Hacker. 2002. Genetic structure and distribution of four pathogenicity islands (PAI I(536) to PAI IV(536)) of uropathogenic Escherichia coli strain 536. Infect Immun. 70:6365–6372. Moss, J. E., T. J. Cardozo, A. Zychlinsky, and E. A. Groisman. 1999. The selCassociated SHI-2 pathogenicity island of Shigella flexneri. Mol Microbiol. 33:74– 83. Burrus, V., G. Pavlovic, B. Decaris, and G. Guedon. 2002. Conjugative transposons: the tip of the iceberg. Mol Microbiol. 46:601–610. Schubert, S., S. Dufke, J. Sorsa, and J. Heesemann. 2004. A novel integrative and conjugative element (ICE) of Escherichia coli: the putative progenitor of the Yersinia high-pathogenicity island. Mol Microbiol. 51:837–848. Hacker, J., and E. Carniel. 2001. Ecological fitness, genomic islands and bacterial pathogenicity. A Darwinian view of the evolution of microbes. EMBO Rep. 2:376–381. Bingen-Bidois, M., O. Clermont, S. Bonacorsi, M. Terki, N. Brahimi, C. Loukil, D. Barraud, and E. Bingen. 2002. Phylogenetic analysis and prevalence of urosepsis strains of Escherichia coli bearing pathogenicity island-like domains. Infect Immun. 70:3216–3226. Bonacorsi, S., O. Clermont, V. Houdouin, C. Cordevant, N. Brahimi, A. Marecat, C. Tinsley, X. Nassif, M. Lange, and E. Bingen. 2003. Molecular analysis and experimental virulence of French and North American Escherichia coli neonatal meningitis isolates: identification of a new virulent clone. J Infect Dis. 187:1895–1906. Johnson, J. R., P. Delavari, M. Kuskowski, and A. L. Stell. 2001. Phylogenetic distribution of extraintestinal virulence–associated traits in Escherichia coli. J Infect Dis. 183:78–88. Johnson, J. R., T. T. O’Bryan, M. Kuskowski, and J. N. Maslow. 2001. Ongoing horizontal and vertical transmission of virulence genes and papA alleles among Escherichia coli blood isolates from patients with diverse-source bacteremia. Infect Immun. 69:5363– 5374.

52 Johnson, J. R., A. L. Stell, P. Delavari,

53

54

55

56

57

58

59

A. C. Murray, M. Kuskowski, and W. Gaastra. 2001. Phylogenetic and pathotypic similarities between Escherichia coli isolates from urinary tract infections in dogs and extraintestinal infections in humans. J Infect Dis. 183:897–906. Swenson, D. L., N. O. Bukanov, D. E. Berg, and R. A. Welch. 1996. Two pathogenicity islands in uropathogenic Escherichia coli J96: cosmid cloning and sample sequencing. Infect Immun. 64:3736–3743. Kao, J. S., D. M. Stucker, J. W. Warren, and H. L. Mobley. 1997. Pathogenicity island sequences of pyelonephritogenic Escherichia coli CFT073 are associated with virulent uropathogenic strains. Infect Immun. 65:2812–2820. Guyer, D. M., J. S. Kao, and H. L. Mobley. 1998. Genomic analysis of a pathogenicity island in uropathogenic Escherichia coli CFT073: distribution of homologous sequences among isolates from patients with pyelonephritis, cystitis, and catheter-associated bacteriuria and from fecal samples. Infect Immun. 66:4411–4417. Lalioui, L., and C. Le Bouguenec. 2001. afa–8 Gene cluster is carried by a pathogenicity island inserted into the tRNA(Phe) of human and bovine pathogenic Escherichia coli isolates. Infect Immun. 69:937–948. Redford, P., and R. A. Welch. 2002. Extraintestinal Escherichia coli as a model system for the study of pathogenicity islands. Curr Top Microbiol Immunol. 264:15–30. Castillo, A., L. E. Eguiarte, and V. Souza. 2005. A genomic population genetics analysis of the pathogenic enterocyte effacement island in Escherichia coli: the search for the unit of selection. Proc Natl Acad Sci U S A. 102:1542–1547. Rumer, L., J. Jores, P. Kirsch, Y. Cavignac, K. Zehmke, and L. H. Wieler. 2003. Dissemination of pheU- and pheV-located genomic islands among enteropathogenic (EPEC) and enterohemorrhagic (EHEC) E. coli and their possible role in the horizontal transfer of the locus of enterocyte effacement

105

106

5 Pathogenomics of Escherichia coli and Shigella Species

60

61

62

63

64

65

66

67

(LEE). Int J Med Microbiol. 292:463– 475. Jores, J., L. Rumer, and L. H. Wieler. 2004. Impact of the locus of enterocyte effacement pathogenicity island on the evolution of pathogenic Escherichia coli. Int J Med Microbiol. 294:103–113. Grozdanov, L., C. Raasch, J. Schulze, U. Sonnenborn, G. Gottschalk, J. Hacker, and U. Dobrindt. 2004. Analysis of the genome structure of the nonpathogenic probiotic Escherichia coli strain Nissle 1917. J Bacteriol. 186:5432–5441. Grozdanov, L., U. Zhringer, G. BlumOehler, L. Brade, A. Henne, Y. A. Knirel, U. Schombel, J. Schulze, U. Sonnenborn, G. Gottschalk, J. Hacker, E. T. Rietschel, and U. Dobrindt. 2002. A single nucleotide exchange in the wzy gene is responsible for the semirough O6 lipopolysaccharide phenotype and serum sensitivity of Escherichia coli strain Nissle 1917. J Bacteriol. 184:5912–5925. Sun, J., F. Gunzer, A. M. Westendorf, J. Buer, M. Scharfe, M. Jarek, F. Gssling, H. Blcker, and A. P. Zeng. 2005. Genomic peculiarity of coding sequences and metabolic potential of probiotic Escherichia coli strain Nissle 1917 inferred from raw genome data. J Biotechnol. 117:147–161. Bertin, A. 1998. Phenotypic expression of K88 adhesion alone or simultaneously with K99 and/or F41 adhesins in the bovine enterotoxigenic Escherichia coli strain B41. Vet Microbiol. 59:283–294. Gomez-Duarte, O. G., A. Ruiz-Tagle, D. C. Gomez, G. I. Viboud, K. G. Jarvis, J. B. Kaper, and J. A. Giron. 1999. Identification of lngA, the structural gene of longus type IV pilus of enterotoxigenic Escherichia coli. Microbiology. 145[Pt 7]:1809–1816. Mazaitis, A. J., R. Maas, and W. K. Maas. 1981. Structure of a naturally occurring plasmid with genes for enterotoxin production and drug resistance. J Bacteriol. 145:97–105. Mainil, J. G., G. Daube, E. Jacquemin, P. Pohl, and A. Kaeckenbeeck. 1998. Virulence plasmids of enterotoxigenic

68

69

70

71

72

73

74

75

76

Escherichia coli isolates from piglets. Vet Microbiol. 62:291–301. Mainil, J. G., J. Gerardin, and E. Jacquemin. 2000. Identification of the F17 fimbrial subunit- and adhesin-encoding (f17A and f17G) gene variants in necrotoxigenic Escherichia coli from cattle, pigs and humans. Vet Microbiol. 73:327–335. Donnenberg, M. S., J. A. Giron, J. P. Nataro, and J. B. Kaper. 1992. A plasmid-encoded type IV fimbrial gene of enteropathogenic Escherichia coli associated with localized adherence. Mol Microbiol. 6:3427–3437. Nataro, J. P., and J. B. Kaper. 1998. Diarrheagenic Escherichia coli. Clin Microbiol Rev. 11:142–201. Vial, P. A., R. Robins-Browne, H. Lior, V. Prado, J. B. Kaper, J. P. Nataro, D. Maneval, A. Elsayed, and M. M. Levine. 1988. Characterization of enteroadherent-aggregative Escherichia coli, a putative agent of diarrheal disease. J Infect Dis. 158:70–79. Nataro, J. P., J. Seriwatana, A. Fasano, D. R. Maneval, L. D. Guers, F. Noriega, F. Dubovsky, M. M. Levine, and J. G. Morris, Jr. 1995. Identification and cloning of a novel plasmid-encoded enterotoxin of enteroinvasive Escherichia coli and Shigella strains. Infect Immun. 63:4721–4728. Martinez, J. L., M. Herrero, and V. de Lorenzo. 1994. The organization of intercistronic regions of the aerobactin operon of pColV-K30 may account for the differential expression of the iucABCD iutA genes. J Mol Biol. 238:288–293. Otto, B. R., S. J. van Dooren, J. H. Nuijens, J. Luirink, and B. Oudega. 1998. Characterization of a hemoglobin protease secreted by the pathogenic Escherichia coli strain EB1. J Exp Med. 188:1091–1103. Gophna, U., A. Parket, J. Hacker, and E. Z. Ron. 2003. A novel ColV plasmid encoding type IV pili. Microbiology. 149:177–184. Gomez–Lus, R. 1998. Evolution of bacterial resistance to antibiotics during the last three decades. Int Microbiol. 1:279– 284.

References 77 Herold, S., H. Karch, and H. Schmidt.

78

79

80

81

82

83

84

85

2004. Shiga toxin-encoding bacteriophages – genomes in motion. Int J Med Microbiol. 294:115–121. Peres, S. Y., O. Marches, F. Daigle, J. P. Nougayrede, F. Herault, C. Tasca, J. De Rycke, and E. Oswald. 1997. A new cytolethal distending toxin (CDT) from Escherichia coli producing CNF2 blocks HeLa cell division in G2/M phase. Mol Microbiol. 24:1095–1107. Janka, A., M. Bielaszewska, U. Dobrindt, L. Greune, M. A. Schmidt, and H. Karch. 2003. Cytolethal distending toxin gene cluster in enterohemorrhagic Escherichia coli O157:H- and O157:H7: characterization and evolutionary considerations. Infect Immun. 71:3634– 3638. Johnson, J. R., and T. A. Russo. 2002. Extraintestinal pathogenic Escherichia coli: “the other bad E coli.” J Lab Clin Med. 139:155–162. Mokady, D., U. Gophna, and E. Z. Ron. 2005. Extensive gene diversity in septicemic Escherichia coli strains. J Clin Microbiol. 43:66–73. Schouler, C., F. Koffmann, C. Amory, S. Leroy-Setrin, and M. Moulin-Schouleur. 2004. Genomic subtraction for the identification of putative new virulence factors of an avian pathogenic Escherichia coli strain of O2 serogroup. Microbiology. 150:2973–2984. Hejnova, J., U. Dobrindt, R. Nemcova, C. Rusniok, A. Bomba, L. Frangeul, J. Hacker, P. Glaser, P. Sebo, and C. Buchrieser. 2005. Characterization of the flexible genome complement of the commensal Escherichia coli strain A0 34/86 (O83:K24:H31). Microbiology. 151:385–398. Schneider, G., U. Dobrindt, H. Brggemann, G. Nagy, B. Janke, G. Blum-Oehler, C. Buchrieser, G. Gottschalk, L. Emdy, and J. Hacker. 2004. The pathogenicity island-associated K15 capsule determinant exhibits a novel genetic structure and correlates with virulence in uropathogenic Escherichia coli strain 536. Infect Immun. 72:5993– 6001. Rasko, D. A., J. A. Phillips, X. Li, and H. L. Mobley. 2001. Identification of

DNA sequences from a second pathogenicity island of uropathogenic Escherichia coli CFT073: probes specific for uropathogenic populations. J Infect Dis. 184:1041–1049. 86 Cieslewicz, M., and E. Vimr. 1997. Reduced polysialic acid capsule expression in Escherichia coli K1 mutants with chromosomal defects in kpsF. Mol Microbiol. 26:237–249. 87 Houdouin, V., S. Bonacorsi, N. Brahimi, O. Clermont, X. Nassif, and E. Bingen. 2002. A uropathogenicity island contributes to the pathogenicity of Escherichia coli strains that cause neonatal meningitis. Infect Immun. 70:5865–5869. 88 Parreira, V. R., and C. L. Gyles. 2003. A novel pathogenicity island integrated adjacent to the thrW tRNA gene of avian pathogenic Escherichia coli encodes a vacuolating autotransporter toxin. Infect Immun. 71:5087–5096. 89 Elliott, S. J., L. A. Wainwright, T. K. McDaniel, K. G. Jarvis, Y. K. Deng, L. C. Lai, B. P. McNamara, M. S. Donnenberg, and J. B. Kaper. 1998. The complete sequence of the locus of enterocyte effacement (LEE) from enteropathogenic Escherichia coli E2348/69. Mol Microbiol. 28:1–4. 90 Jores, J., L. Rumer, S. Kiessling, J. B. Kaper, and L. H. Wieler. 2001. A novel locus of enterocyte effacement (LEE) pathogenicity island inserted at pheV in bovine Shiga toxin-producing Escherichia coli strain O103:H2. FEMS Microbiol Lett. 204:75–79. 91 Sperandio, V., J. B. Kaper, M. R. Bortolini, B. C. Neves, R. Keller, and L. R. Trabulsi. 1998. Characterization of the locus of enterocyte effacement (LEE) in different enteropathogenic Escherichia coli (EPEC) and Shiga-toxin producing Escherichia coli (STEC) serotypes. FEMS Microbiol Lett. 164:133–139. 92 Zhu, C., T. S. Agin, S. J. Elliott, L. A. Johnson, T. E. Thate, J. B. Kaper, and E. C. Boedeker. 2001. Complete nucleotide sequence and analysis of the locus of enterocyte Effacement from rabbit diarrheagenic Escherichia coli RDEC–1. Infect Immun. 69:2107–2115. 93 Tauschek, M., R. A. Strugnell, and R. M. Robins-Browne. 2002. Characterization

107

108

5 Pathogenomics of Escherichia coli and Shigella Species

94

95

96

97

98

99

and evidence of mobilization of the LEE pathogenicity island of rabbit-specific strains of enteropathogenic Escherichia coli. Mol Microbiol. 44:1533–1550. Mellies, J. L., F. Navarro-Garcia, I. Okeke, J. Frederickson, J. P. Nataro, and J. B. Kaper. 2001. espC Pathogenicity island of enteropathogenic Escherichia coli encodes an enterotoxin. Infect Immun. 69:315–324. Keller, R., J. G. Ordonez, R. R. de Oliveira, L. R. Trabulsi, T. J. Baldwin, and S. Knutton. 2002. Afa, a diffuse adherence fibrillar adhesin associated with enteropathogenic Escherichia coli. Infect Immun. 70:2681–2689. Schmidt, H., W. L. Zhang, U. Hemmrich, S. Jelacic, W. Brunder, P. I. Tarr, U. Dobrindt, J. Hacker, and H. Karch. 2001. Identification and characterization of a novel genomic island integrated at selC in locus of enterocyte effacement-negative, Shiga toxin-producing Escherichia coli. Infect Immun. 69:6863– 6873. Fleckenstein, J. M., D. J. Kopecko, R. L. Warren, and E. A. Elsinghorst. 1996. Molecular characterization of the tia invasion locus from enterotoxigenic Escherichia coli. Infect Immun. 64:2256– 2265. Bach, S., C. Buchrieser, M. Prentice, A. Guiyoule, T. Msadek, and E. Carniel. 1999. The high-pathogenicity island of Yersinia enterocolitica Ye8081 undergoes low-frequency deletion but not precise excision, suggesting recent stabilization in the genome. Infect Immun. 67:5091– 5099. Karch, H., S. Schubert, D. Zhang, W. Zhang, H. Schmidt, T. Olschlager, and J. Hacker. 1999. A genomic island,

termed high-pathogenicity island, is present in certain non-O157 Shiga toxin-producing Escherichia coli clonal lineages. Infect Immun. 67:5994–6001. 100 Schubert, S., A. Rakin, H. Karch, E. Carniel, and J. Heesemann. 1998. Prevalence of the “high-pathogenicity island” of Yersinia species among Escherichia coli strains that are pathogenic to humans. Infect Immun. 66:480–485. 101 Rajakumar, K., C. Sasakawa, and B. Adler. 1997. Use of a novel approach, termed island probing, identifies the Shigella flexneri she pathogenicity island which encodes a homolog of the immunoglobulin A protease-like family of proteins. Infect Immun. 65:4606–4614. 102 Luck, S. N., S. A. Turner, K. Rajakumar, H. Sakellaris, and B. Adler. 2001. Ferric dicitrate transport system (Fec) of Shigella flexneri 2a YSH6000 is encoded on a novel pathogenicity island carrying multiple antibiotic resistance genes. Infect Immun. 69:6012–6021. 103 Adhikari, P., G. Allison, B. Whittle, and N. K. Verma. 1999. Serotype 1a O-antigen modification: molecular characterization of the genes involved and their novel organization in the Shigella flexneri chromosome. J Bacteriol. 181:4711– 4718. 104 Rutherford, K., J. Parkhill, J. Crook, T. Horsnell, P. Rice, M. A. Rajandream, and B. Barrell. 2000. Artemis: sequence visualization and annotation. Bioinformatics. 16:944–945. 105 Brunder, W., Schmidt, H., Karch, H. (1996) KatP, a novel catalyse-peroxidase encoded by the Large plasmid of enterohaemorrhagic Escherichia coli O157:H7. Microbiology 142:3305–3315.

109

6 Pathogenomics of Salmonella Species

Helene Andrews-Polymenis and Andreas J. Bäumler

6.1 Introduction

The genus Salmonella contains a group of closely related pathogens that can be differentiated by serology into 2463 serotypes [1]. The genus contains two species, Salmonella bongori (containing 20 different serotypes) and Salmonella enterica (containing 2443 different serotypes). S. enterica is further subdivided into six subspecies: enterica (subspecies I), salamae (subspecies II), arizonae (subspecies IIIa), diarizonae (subspecies IIIb), houtenae (subspecies IV), and indica (subspecies VI) [1]. Unlike members of the closely related genus Escherichia, which contains both pathogenic and commensal organisms, all members of the genus Salmonella are considered pathogenic for humans. However, Salmonella serotypes vary with regard to their natural host range and the disease manifestations associated with infection of different host species. Genomic comparison of Salmonella serotypes with closely related commensal organisms has revealed genetic differences that are important for their pathogenic lifestyle. Identification of genomic differences between Salmonella serotypes is beginning to provide new insights into the genetic basis of host adaptation and the pathogenesis of certain diseases. This chapter will review recent progress in understanding Salmonella pathogenesis and evolution as a result of whole genome sequencing.

6.2 Salmonella Signature Genes

What genetic features set Salmonella serotypes apart from closely related organisms such as E. coli? Comparison of bacterial genomes shows that Salmonella-specific DNA regions encode many of the biological properties that distinguish Salmonella serotypes from other members of the family Enterobacteriaceae. An alignment of the chromosomes of S. enterica serotype Typhimurium and E. coli K-12 provided the first evidence that their collinear genetic maps are interrupted by large DNA segments that are present in only one of these organisms [2, 3]. High-resolution global analysis of these genetic differences was initiated with the completion of the S. enterica serotype Typhimurium LT2 genome sequence [4]. McClelland and coworkers used genomic subtraction to identify 935 genes present in S. enterica serotype Typhimurium (LT2), but absent from E. coli (K-12 and O157:H7), and from sample sequences of Klebsiella pneumoniae (MGH 78578) and Yersinia pestis (CO92) [5, 6]. Of these 935 genes, 224 were never scored as absent when genomic DNA of 22 Salmonella serotypes was hybridized to an LT2 microarray containing 4483 of the 4596 annotated S. Typhimurium ORFs [5]. Furthermore, 56 of these 224 genes were reliably identified as being present in all 22 Salmonella serotypes by microarray and were designated Salmonella “signature” genes. In a similar study, Falkow and coworkers hybridized genomic DNA from 18 different Salmonella serotypes to a S. enterica serotype Typhimurium strain LT2 microarray containing 4169 ORFs. This work revealed a list of 2244 genes shared by all members of the genus [7]. Some 1931 of these genes are also present in the genome of E. coli K-12, thus leaving a list of some 313 Salmonella signature genes.

For Salmonella signature genes whose functions are known it seems clear that their presence explains many of the biological features that distinguish Salmonella serotypes from closely related enterobacteria such as E. coli. One important biochemical property for differentiating Salmonella serotypes from E. coli or Shigella serotypes is their ability to produce hydrogen sulfide. This biochemical reaction is part of a complex pathway (involving 88 genes) that allows Salmonella serotypes to utilize 1,2-propanediol and ethanolamine as carbon sources under anaerobic conditions [8].
Genes required for degradation of ethanolamine are encoded by the eutSPQTDMNEJGHABCLK operon, whose expression is controlled by the adjacent eutR regulatory gene [9, 10]. Genes required for degradation of 1,2-propanediol are encoded by a gene cluster consisting of pocR, pduF, and the pduABCDEGHJKLMNOPQSTUVWX operon [11]. The degradation of 1,2-propanediol and ethanolamine furthermore requires the cofactor cobalamin (vitamin B12), whose biosynthesis involves genes encoded by the cobCD operon, cysG, and the cobI operon (cbiABCDETFGHJKLMNQOPUST) [12]. Cobalamin biosynthesis occurs only anaerobically in Salmonella serotypes [13, 14]. The only known electron acceptor supporting anaerobic growth of Salmonella serotypes with 1,2-propanediol or ethanolamine as carbon source is tetrathionate [8]. The genes involved in using tetrathionate as a terminal respiratory electron acceptor are encoded by the divergently transcribed operons ttrSR and ttrBCA [15]. During anaerobic growth on 1,2-propanediol or ethanolamine, Salmonella serotypes reduce tetrathionate to thiosulfate, which is then further reduced to hydrogen sulfide by enzymes encoded by the asrABC operon [16] and phsABC operon [17]. Although the cbiABCDETFGHJKLMNQOPUST operon, the cobCD operon, the pocR pduF pduABCDEGHJKLMNOPQSTUVWX gene cluster, the ttrSR ttrBCA gene cluster, the asrABC operon, and the phsABC operon are located at different map positions on the S. Typhimurium genome, each operon contains Salmonella signature genes identified by microarray analysis (Fig. 6.1) [7], while only the eut operon and cysG are also present in E. coli. Despite the fact that these operons act in concert during 1,2-propanediol and ethanolamine respiration, their map positions and phylogenetic distribution suggest that each was either independently acquired by horizontal gene transfer in the genus Salmonella or lost by deletion from E. coli during divergence of their lineages. 1,2-Propanediol is produced by the fermentation of the common plant sugar rhamnose [18], suggesting that its utilization may contribute to intestinal colonization by Salmonella serotypes.
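The operon names in this section use a compact shorthand in which a three-letter stem is followed by one capital letter per gene (asrABC stands for asrA, asrB, and asrC). The following Python sketch expands such names into individual gene names. It assumes that every operon follows this regular pattern; the function name expand_operon is introduced here for illustration only, and irregular names must be handled by hand.

```python
import re

def expand_operon(shorthand: str) -> list[str]:
    """Expand an operon shorthand such as 'asrABC' into individual gene names.

    Assumes the regular pattern: a three-letter lowercase stem followed by
    one capital letter per gene. Anything else raises ValueError.
    """
    match = re.fullmatch(r"([a-z]{3})([A-Z]+)", shorthand)
    if match is None:
        raise ValueError(f"not a regular operon shorthand: {shorthand!r}")
    stem, letters = match.groups()
    return [stem + letter for letter in letters]

# Operon names taken from the text above.
operons = [
    "eutSPQTDMNEJGHABCLK",
    "pduABCDEGHJKLMNOPQSTUVWX",
    "ttrSR",
    "ttrBCA",
    "asrABC",
    "phsABC",
]

for name in operons:
    genes = expand_operon(name)
    print(f"{name}: {len(genes)} genes, from {genes[0]} to {genes[-1]}")
```

Expanding the names this way makes it easier to compare an operon against a list of individual genes, for example a list of microarray probes.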

Fig. 6.1 Location on the S. enterica serotype Typhimurium LT2 chromosome of Salmonella signature genes involved in 1,2-propanediol respiration and epithelial invasion. The global transcriptional regulator CsrA controls expression of genes required for both phenotypic pathways.

A second characteristic that differentiates Salmonella serotypes from E. coli is the ability to cause invasive enteric infections resulting in an inflammatory diarrhea [19]. The genes important for invasion of the intestinal epithelium and the elicitation of an inflammatory response in the intestine are among the Salmonella signature genes identified by microarray analysis [5, 7]. Invasion of intestinal epithelial cells is mediated by a type III secretion system (T3SS-1) encoded by 31 genes located on Salmonella pathogenicity island 1 (SPI1) (Fig. 6.1) [20]. With the exception of the avrA gene [21], T3SS-1 genes located on SPI1 are present in all Salmonella serotypes but absent from the genomes of related enterobacteria [22]. The T3SS-1 translocates effector proteins into the host cell cytosol. Seven T3SS-1 translocated effector proteins are encoded by genes located outside SPI1, including sopA [23], sopB (sigD) (located on SPI5) [24], sopD [25], sopE [26], sopE2 [27], slrP [28], and sspH1 [29]. The genes sipA, sopB, sopD, and sopE2 are present in all phylogenetic lineages of the genus Salmonella [21] and are required for bacterial invasion [30]. The different map positions of SPI1, SPI5, sopA, sopD, and sopE2 imply that individual components of the invasion machinery were acquired independently by horizontal gene transfer during the divergence of the genus Salmonella from the E. coli lineage.

The above considerations suggest that some 97 Salmonella signature genes determine two important characteristics that distinguish the genus from closely related enterobacteria, namely the ability to invade and cause inflammation in the intestinal mucosa and the ability to use 1,2-propanediol anaerobically as a carbon source (Fig. 6.1). Interestingly, transcriptional profiling using a S. Typhimurium LT2 microarray shows that both groups of genes are coordinately regulated by the CsrA protein [31]. That is, a csrA mutant expresses the pdu operon, the eut operon, the cob operon, the phs operon, the sopE2 gene, the sopA gene, the SPI5 invasion genes, and the SPI1 invasion genes at markedly reduced levels compared to the S. enterica serotype Typhimurium wild type. CsrA may thus be a global regulator of a large subset of Salmonella signature genes involved in the intestinal phase of infection.

Several additional Salmonella signature genes may be critical for interaction of Salmonella serotypes with their vertebrate hosts. Some of these Salmonella signature genes are organized in operons involved in the biosynthesis of fimbrial adhesins, including bcfABCDEFG and sthABCDE [7, 32]. Other Salmonella signature genes involved in host–pathogen interaction encode functions allowing bacteria to evade killing by macrophages, a property important for bacterial survival in intestinal tissue upon T3SS-1-mediated penetration of the epithelium. These genes include the magnesium transporter genes mgtBC located on SPI3 [33] and genes for a second type III secretion system (T3SS-2) located on SPI2 [34]. However, while microarray analysis shows that some SPI2 genes (i.e., sseB, sscA, sscB, ssaNQ, and yscR) seem to be conserved in S. bongori [7], large parts of SPI2 appear to be absent or highly divergent in S. bongori by hybridization analysis [35, 36]. SPI2 is, however, highly conserved among serotypes of S. enterica. In summary, as many as 121 Salmonella signature genes may encode functions that allow Salmonella serotypes to adhere, utilize carbon sources, invade, and subsequently survive in tissue during intestinal colonization of their vertebrate hosts.
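The signature-gene definition used in this section amounts to a set operation on gene lists: take the genes detected in every Salmonella genome tested, then remove those that also occur in an outgroup such as E. coli K-12. The Python sketch below illustrates this under stated assumptions. The presence calls are invented for illustration (they are consistent with statements in the text but are not data from the cited studies), and the function name signature_genes is hypothetical.

```python
# Hypothetical presence calls: gene -> set of genomes in which it was detected.
# These calls are illustrative only and are not taken from the cited studies.
presence = {
    "sopB": {"Typhimurium", "Typhi", "S. bongori"},
    "ttrA": {"Typhimurium", "Typhi", "S. bongori"},
    "eutR": {"Typhimurium", "Typhi", "S. bongori", "E. coli K-12"},
    "shdA": {"Typhimurium", "Typhi"},
}

salmonella = {"Typhimurium", "Typhi", "S. bongori"}
outgroup = {"E. coli K-12"}

def signature_genes(presence, ingroup, outgroup):
    """Return genes detected in every ingroup genome and in no outgroup genome."""
    core = {gene for gene, hits in presence.items() if ingroup <= hits}
    shared_with_outgroup = {gene for gene in core if presence[gene] & outgroup}
    return core - shared_with_outgroup

sig = signature_genes(presence, salmonella, outgroup)
print(sorted(sig))

# The same subtraction applied to the counts reported in the text [7]:
genus_core = 2244         # genes shared by all Salmonella serotypes tested
also_in_ecoli_k12 = 1931  # of these, genes also present in E. coli K-12
print(genus_core - also_in_ecoli_k12)
```

In this toy data set, eutR is excluded because it also occurs in the outgroup, and shdA is excluded because it is not detected in every ingroup genome.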

6.3 Subspecies I Signature Genes

While the genomic features that differentiate Salmonella serotypes from their closest relative E. coli nicely fit the differences in metabolism and pathogenicity that have been historically described in the literature, the relationship between genetic and phenotypic differences within the species S. enterica is less clearly defined. The species S. enterica contains six subspecies, and all of these are capable of causing disease in mammals [1]. However, the overwhelming majority of cases of disease in mammals and birds are the result of infection with S. enterica subspecies I; the remaining subspecies are commonly isolated from cold-blooded animals [1, 37]. These findings suggest that genetic factors exclusive to S. enterica subspecies I may influence the epidemiological success of this subspecies among mammals and birds [38–40]. Understanding the relationship between the genetics and pathogenesis of S. enterica subspecies I is complicated by the fact that it contains 1454 serotypes [1], and there is considerable genetic diversity and diversity in disease syndromes and host range within this group [41]. Nevertheless, the completion of four genomes of S. enterica subspecies I serotypes [4, 42–44], with 12 more underway [45], and microarray analysis [5, 7, 46] have provided a list of genes exclusive to subspecies I, and there is preliminary information about the function of several of these genes.

The emergence of S. enterica subspecies I as the dominant subspecies infecting mammals and birds has been the subject of recent genomic analysis using microarrays [5]. This study revealed that 216 genes were gained by S. enterica subspecies I after its divergence from other subspecies. Of these genes, 128, or 59%, are currently unnamed, which illustrates that our understanding of factors that contribute to the epidemiological success of Salmonella serotypes is still in its infancy. Microarray analysis further revealed that 74 of these genes were present or possibly present across all S. enterica subspecies I strains tested and can thus be considered subspecies I signature genes. Subspecies I signature genes with known function or sequence homology can be grouped into three categories, as follows.

One group of subspecies I signature genes encodes products that are located in the bacterial outer membrane. These include envF, sinI, yfeN, STM0280, STM2816, STM2423, and STM3026, whose products are predicted to be located in the outer membrane but whose functions are poorly characterized [5, 47]. Furthermore, S. enterica subspecies I serotypes have gained three fimbrial gene clusters: stfACDEFG, safABCD, and stcABCD [32, 48]. The saf operon encodes Salmonella atypical fimbria and is located on SPI6 [48], a DNA region that also contains the putative regulatory gene sinR [49]. The saf operon is present in the vast majority of S. enterica subspecies I isolates (195 of 198 tested), and deletion of safA or the sinR region does not result in virulence defects in genetically susceptible mice (BALB/c) [48, 49]. The effect of deleting the saf operon on long-term intestinal persistence has not been studied. In addition to fimbrial gene sequences, S. enterica subspecies I serotypes gained a nonfimbrial adhesin, termed ShdA [50].
The shdA gene encodes an outer membrane autotransporter protein that binds fibronectin and type I collagen on the bacterial surface [51, 52] and contributes to long-term intestinal persistence of S. enterica serotype Typhimurium in genetically resistant mice (CBA) [53]. The shdA gene is located on the CS54 genetic island, adjacent to another subspecies I signature gene termed ratB [53]. Although its mode of action is not known, the product of the ratB gene has recently also been shown to be critical for intestinal persistence in a murine model of infection [53]. Based on mathematical models that combine epidemiology with population biology, persistent intestinal carriage is predicted to be a factor that will enhance the ability of a pathogen to circulate in populations of mammals and birds [50]. Acquisition of intestinal colonization factors such as shdA and ratB by S. enterica subspecies I may have contributed to the epidemiological success of this group of pathogens within populations of warm-blooded vertebrates.

A second group of subspecies I signature genes encode products that affect the properties of the bacterial cell surface. Microarray analysis suggests that the O-antigen biosynthesis (rfb) gene cluster of S. enterica subspecies I contains the subspecies I signature genes rfbP (wbaP), rfbK, rfbU, rfbI, rfbC, and rfbM (manC) [5]. The O-antigen comprises the portion of the bacterial lipopolysaccharide (LPS) that is exposed on the surface of the bacterium and is composed of repeating oligosaccharide units (O-repeat). The rfbP gene encodes an enzyme with two functions: a galactosyltransferase function necessary for the first step of O-antigen synthesis, and a flippase function necessary for flipping the O-antigen subunit on undecaprenyl pyrophosphate from the cytoplasmic to the periplasmic face of the cytoplasmic membrane [54]. The product of the rfbK gene is important for completion of outer core synthesis and subsequent attachment of the O-antigen [55]. The rfbU gene encodes a mannosyl transferase that is necessary for the incorporation of GDP-mannose into the oligosaccharide backbone of the O-antigen repeat unit [56]. The rfbC gene product is involved in production of an activated precursor (dTDP-L-rhamnose) necessary for the incorporation of rhamnose moieties into the oligosaccharide backbone of the O-antigen [57]. The rfbI gene encodes an enzyme involved in adding a dideoxyhexose branch to the oligosaccharide backbone of the O-antigen [57]. Finally, the product of the rfbM gene is required to form GDP-mannose, the activated precursor necessary for the incorporation of mannose residues into the oligosaccharide backbone of the O-antigen [58]. Comparative sequence analysis of rfbM from Salmonella serotypes belonging to subspecies I, II, and VI shows that although this gene contains considerable nucleotide polymorphisms, it is not a subspecies I signature gene [59]. Thus, the list of 74 subspecies I signature genes identified by microarray analysis [5] may be further reduced as more information becomes available for individual genes. Mutational analysis shows that subspecies I signature genes involved in O-antigen biosynthesis are required for host–pathogen interaction, because a S. enterica serotype Typhimurium rfbK mutant is rough and shows a reduced ability to colonize the intestine of 3-week-old chicks [55].
However, comparing a bacterial strain that expresses intact LPS with its isogenic derivative that is defective for O-antigen biosynthesis provides little insight into the selective advantage conferred by subspecies I signature genes located in the rfb gene cluster. The difference between S. enterica subspecies I serotypes and serotypes belonging to other S. enterica subspecies is not the presence or absence of O-antigen, but the presence of O-antigen biosynthesis gene clusters that differ genetically. The biological significance of these genetic differences is not obvious, because most of the O-antigens used to distinguish Salmonella serotypes serologically occur in two or more subspecies. Since most O-antigens expressed by S. enterica subspecies I serotypes can also be detected in serotypes that do not belong to this subspecies, it is not clear which selective forces are responsible for the presence of subspecies I signature genes in the rfb gene cluster.

A third group of subspecies I signature genes encodes proteins involved in transport and utilization of nutrients. These include the genes STM3251–STM3256, which encode components of a putative sugar phosphotransferase transport system. Additional subspecies I signature genes of this group are located in the phn operon, which confers the ability to utilize phosphonate as a sole phosphorus source by mediating phosphonate transport and breakdown [60]. The phn operon is absent from E. coli but present in K. pneumoniae, and mutations in the S. enterica serotype Typhimurium phn operon do not seem to affect virulence in mice [60]. The selective advantage conferred upon S. enterica subspecies I serotypes by the ability to utilize these nutrients remains to be identified.
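The way a candidate gene drops off the signature list, as rfbM does above, can be sketched with a small presence/absence comparison. This is only an illustration: it assumes that a signature gene is one detected in every subspecies I strain examined and in no strain of another subspecies, and the strain labels, the gene geneX, and all calls are hypothetical values chosen to mirror the text.

```python
# Toy presence/absence calls (True = gene detected in that strain).
# Strain labels, "geneX" and every call are hypothetical illustration data.
CALLS = {
    "shdA":  {"I_strain1": True, "I_strain2": True,  "II_strain1": False, "VI_strain1": False},
    "rfbM":  {"I_strain1": True, "I_strain2": True,  "II_strain1": True,  "VI_strain1": True},
    "geneX": {"I_strain1": True, "I_strain2": False, "II_strain1": False, "VI_strain1": False},
}
SUBSPECIES_I = {"I_strain1", "I_strain2"}


def is_signature(gene_calls, ingroup):
    """True if the gene is present in every ingroup strain and in no other strain."""
    in_all_ingroup = all(gene_calls[strain] for strain in ingroup)
    in_any_outgroup = any(
        present for strain, present in gene_calls.items() if strain not in ingroup
    )
    return in_all_ingroup and not in_any_outgroup


signature_genes = sorted(
    gene for gene, gene_calls in CALLS.items() if is_signature(gene_calls, SUBSPECIES_I)
)
print(signature_genes)  # ['shdA']
```

In this toy data set rfbM fails the test once sequence-based calls show it in subspecies II and VI, which is the situation described for the real gene [59].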

6.4 Host Restriction

The 1454 different serotypes that can currently be distinguished within S. enterica subspecies I [1] include a few specialists that appear to be better adapted to their respective host species than other members of the genus Salmonella. These “specialists” commonly cause a systemic infection in their preferred host species but are isolated infrequently from other host species. These features are perhaps best exemplified by S. enterica serotype Typhi, the cause of typhoid fever, and S. enterica serotype Paratyphi A, a cause of paratyphoid fever in humans. Both serotypes persist in the human population by person-to-person transmission through contaminated food or water but are rarely isolated from nonhuman vertebrates. Furthermore, S. enterica serotypes Typhi and Paratyphi A do not cause disease in nonprimate vertebrates or even in lower primate species. Although there has been substantial interest in identifying the molecular basis for this host restriction, the bacterial genes responsible have long eluded identification.

Recent genome sequence analysis shows that the genomes of the generalist S. enterica serotype Typhimurium strain LT2 and the two sequenced S. enterica serotype Typhi isolates (CT18 and Ty2) have similar sizes (4857 kbp, 4809 kbp, and 4792 kbp, respectively), while the S. enterica serotype Paratyphi A genome is somewhat smaller (4585 kbp) [4, 42–44]. Genome sequence comparison and microarray analysis have allowed direct comparison of more generalist and more specialist genomes and have provided first insights into the genetic changes accompanying the evolution of host-restricted serotypes [4, 5, 7, 42–44, 46, 61].
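To put "similar" and "somewhat smaller" into numbers, the quoted genome sizes can be compared against Typhimurium LT2. The short sketch below uses only the sizes given in the text; the variable names are illustrative.

```python
# Genome sizes in kbp as quoted in the text [4, 42-44].
GENOME_KBP = {
    "Typhimurium LT2": 4857,
    "Typhi CT18": 4809,
    "Typhi Ty2": 4792,
    "Paratyphi A": 4585,
}

reference = GENOME_KBP["Typhimurium LT2"]

# For each genome: (difference in kbp, difference in percent of the LT2 size).
size_difference = {
    name: (size - reference, round(100 * (size - reference) / reference, 1))
    for name, size in GENOME_KBP.items()
}

for name, (delta_kbp, delta_pct) in size_difference.items():
    print(f"{name}: {delta_kbp:+d} kbp ({delta_pct:+.1f}%)")
```

On these figures the two Typhi genomes are within about 1.3% of LT2, whereas the Paratyphi A genome is 272 kbp (about 5.6%) smaller.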
The genetic differences between the host generalist and the host specialist can be classified into four categories: (1) genes present in the generalist genome that are absent from the specialist genome, (2) pseudogenes, (3) genes present in the specialist genome that are absent from the generalist genome, and (4) rearrangements of gene order in the genome itself.

Genomic comparison of 18 different Salmonella serotypes using an S. enterica serotype Typhimurium strain LT2-based microarray revealed a cluster of genes belonging to the first category: those absent from strictly human-adapted serotypes (Typhi and Paratyphi A) but present in other serotypes of S. enterica subspecies I [7]. While 99.5% of genes present on the S. enterica serotype Typhimurium microarray had homologues in the genomes of other S. enterica serotype Typhimurium isolates (SARA1, SARA6), only between 89% and 94% had homologues in the genomes of S. enterica serotype Typhi and Paratyphi A isolates. The majority of the genes absent from S. enterica serotype Typhi and Paratyphi A isolates were also absent from the genomes of other generalist serotypes. However, cluster analysis identified a group of 53 genes whose absence was a shared genetic feature of the genomes of strictly human-adapted S. enterica subspecies I serotypes. The majority of these genes are either of unknown function or encode putative intestinal colonization factors, including ratB and the fimbrial genes lpfCDE, stbD, stfF, and stiB [7]. Although ratB and stbD were scored as absent in the microarray analysis, sequence analysis shows that these genes are present in the genomes of S. enterica serotypes Typhi and Paratyphi A (where ratB was identified as a pseudogene) [42–44]. Furthermore, sequence analysis shows that stfF is present in the genome of S. enterica serotype Paratyphi A [44].

Table 6.1 Genome degradation affecting fimbrial and nonfimbrial adherence determinants in typhoidal Salmonella serotypes compared to S. enterica serotype Typhimurium. Columns show the determinant (pseudogene) present in the genome of the indicated S. enterica serotype.

| Genetic determinant | Typhimurium strain LT2 | Paratyphi A strain SARB42 | Typhi strain CT18 | Typhi strain Ty2 |
|---|---|---|---|---|
| shdA | + | + (shdA) | + (shdA) | + (shdA) |
| misL | + | + | + (misL) | + (misL) |
| sivH | + | + (sivH) | + (sivH) | + (sivH) |
| ratB | + | + (ratB) | + (ratB) | + (ratB) |
| csg (agf) | + | + (csgF) | + | + |
| fim | + | + | + (fimI) | + |
| tcf | – | + | + | + |
| sef | – | + | + (sefAD) | + (sefAD) |
| saf | + | + (safD) | + | + |
| bcf | + | + (bcfF) | + (bcfC) | + (bcfC) |
| sta | – | – | + | + |
| stb | + | + | + | + (stbC) |
| stc | + | + | + | + (stcC) |
| std | + | + | + | + |
| ste | – | + | + (steA) | + (steA) |
| stg | – | – | + (stgC) | + (stgC) |
| sth | + (sthC) | + | + (sthCE) | + (sthCE) |
| stk | – | + | – | – |
| lpf | + | – | – | – |
| pef | + | – | – | – |
| stf | + | + (stfCF) | – | – |
| sti | + | – | – | – |
| stj | + | – | – | – |
| pil | – | – | + | + |

+ Determinant is present in a genome; – determinant is absent from a genome; genes in parentheses are pseudogenes.

Pseudogene formation comprises a second category of genetic differences between specialists and generalists, one that results in loss of function. Complete genome sequencing reveals that the generalist S. enterica serotype Typhimurium has far fewer pseudogenes than the host-restricted S. enterica serotypes Typhi and Paratyphi A. There are approximately 210 pseudogenes in the genome of S. enterica serotype Typhi (strains CT18 and Ty2) [42, 43] and 173 pseudogenes in the genome of S. enterica serotype Paratyphi A strain SARB42 [44], compared to only 39 pseudogenes in the genome of S. enterica serotype Typhimurium strain LT2 [4]. Of the 173 pseudogenes present in S. enterica serotype Paratyphi A, 166 have orthologues, but only 28 are also pseudogenes in S. enterica serotype Typhi [44]. One group of pseudogenes present in strictly human-adapted serotypes is formed by putative intestinal colonization factors, including autotransporter genes (misL, shdA, and sivH), fimbrial biosynthesis genes (fimI, csgF, sefAD, safD, bcfCF, stbC, stcC, steA, stfCF, stgC, and sthCE), and the intestinal colonization factor ratB (Table 6.1) [4, 32, 42–44]. These data show that genome degradation through deletion of genes or through pseudogene formation resulted in loss of putative adherence determinants in the genomes of strictly human-adapted serotypes. As a result, the genomes of S.
enterica serotypes Typhi and Paratyphi A contain between 7 and 10 intact determinants, compared to 16 present in the genome of S. enterica serotype Typhimurium (Table 6.1). The only adherence determinants that are both present and intact in all three genomes of human-adapted serotypes are the fimbrial operons stdABCDE and tcfABCD. The extent of genome degradation in the host specialist serotypes may correlate with increased specialization of these pathogens to one particular niche (i.e., host species) or, put another way, with the loss of the ability of these serotypes to use multiple niches. Genome degradation may thus in part explain the host restriction to humans exhibited by S. enterica serotypes Typhi and Paratyphi A.

The majority of S. enterica subspecies I serotypes (i.e., nontyphoidal Salmonella serotypes) cause infections in humans that remain localized to the intestine and mesenteric lymph nodes and result in diarrhea. In contrast, S. enterica serotypes Typhi and Paratyphi A (i.e., typhoidal Salmonella serotypes) cause systemic infections in humans, with diarrhea being an insignificant symptom [19, 62]. It is generally assumed that acquiring the ability to cause enteric fever (i.e., typhoid fever or paratyphoid fever) involved gain of function (i.e., acquisition of genes) by typhoidal Salmonella serotypes. Genomic analysis has thus focused on identifying a third category of genes, namely those that are present in the specialist genome but absent from the more generalist isolates. Phylogenetic analysis by multilocus enzyme electrophoresis shows that the ability to cause enteric fever evolved independently in four lineages within S. enterica subspecies I, represented by S. enterica serotypes Typhi, Paratyphi A, Paratyphi B, and Paratyphi C [63]. Recent analysis of S. enterica subspecies I serotypes using an S. enterica serotype Typhimurium LT2 microarray supplemented with open reading frames from S. enterica serotype Typhi CT18 did not identify any genes that are conserved among typhoidal serotypes but absent from all nontyphoidal serotypes [46]. The absence of signature genes for typhoidal Salmonella serotypes may reflect the fact that the ability to cause enteric fever developed in four lineages independently, perhaps by acquisition of different genetic material in each lineage, which would suggest that serotypes of each lineage cause enteric fever by somewhat different mechanisms.

It may thus be more revealing to identify gene acquisition events that accompanied the formation of an individual typhoidal Salmonella serotype by comparing its genome with that of nontyphoidal serotypes. The genome of S. enterica serotype Typhi CT18 contains 601 genes in 82 blocks that are unique compared to S. enterica serotype Typhimurium LT2 [42]. Insertions unique to S. enterica serotype Typhi include four prophages (phages 10, 15, 18, 46), three pathogenicity islands (SPI7, SPI8, SPI10), four chaperone-usher fimbrial systems (staABCDEFG, tcfABCD, steABCDEF, stgABCDEF), a homologue of an E. coli hemolysin gene (STY1498), a homologue of the Campylobacter toxin gene cdtB (STY1886), homologues of the Bordetella pertussis toxin genes ptxA and ptxB (STY1890 and STY1891), and a putative polysaccharide acetyltransferase gene (STY2629) [32, 42]. Unique genetic material in the S. enterica serotype Paratyphi A SARB42 genome includes three prophages (SPA-1, SPA2554-2600, SPA-3-P2) and 12 other insertions containing two or more genes, including the stkABCDEFG fimbrial operon [44]. However, most unique insertions identified in the genomes of typhoidal serotypes by comparison with S.
enterica serotype Typhimurium LT2 are also present in the genomes of at least some nontyphoidal Salmonella serotypes [46]. These data suggest that the ability to cause enteric fever did not result from a single horizontal transfer event but may have required acquisition of a unique combination of new insertions in each lineage of typhoidal serotypes, including the incorporation of different genetic material into the genomes of S. enterica serotypes Typhi and Paratyphi A.

The final category of genomic differences between host generalist and host specialist serotypes of S. enterica subspecies I comprises large-scale genomic rearrangements. In S. enterica serotype Paratyphi A (SARB42), for example, the order of orthologous genes is generally conserved with respect to S. enterica serotype Typhimurium LT2. However, a recombination event between the rrnH and rrnG operons has inverted a large portion of the S. enterica serotype Paratyphi A genome with respect to the S. enterica serotype Typhimurium LT2 genome [44, 64]. A similar situation is apparent for S. enterica serotype Typhi Ty2 and CT18, where the overall gene order is similar to that of S. enterica serotype Typhimurium, but genomic rearrangements have occurred around the rrn genes [42, 65]. Genome rearrangements due to homologous recombination between the rrn operons, resulting in translocations and inversions, are also observed in the genome of S. enterica serotype Gallinarum, a host-restricted pathogen consisting of two biotypes, Gallinarum and Pullorum, that are strictly fowl-adapted [66, 67]. In contrast, the gene order in the genomes of S. enterica subspecies I host generalists is conserved with respect to that of S. enterica serotype Typhimurium LT2 [66]. It is currently unknown why large genomic rearrangements are found exclusively in the genomes of host-restricted serotypes of S. enterica subspecies I.

The above examples illustrate that the formation of host-restricted serotypes within S. enterica subspecies I was a complex process involving numerous genetic changes whose significance has not been established, making it difficult to identify the key events driving this evolutionary process. One strategy to gain further insight into the evolution of host-restricted pathogens has been the analysis of strains in which adaptation to one niche is incomplete, providing the opportunity to obtain a snapshot of the genetic changes that can be observed when a host-restricted lineage is beginning to branch from that of its generalist relatives. A possible example of a pathogen with incomplete host restriction is the pigeon-associated variant of S. enterica serotype Typhimurium. While the majority of S. enterica serotype Typhimurium isolates (including the sequenced LT2 strain) are host generalists, one clone represented by the phage types DT2 and DT99 causes a systemic infection in pigeons but is rarely isolated from other host species [68]. Host restriction in the pigeon-associated variant of S. enterica serotype Typhimurium is incomplete, since it is still capable of causing disease in other host species, although virulence seems to be somewhat reduced compared to that of generalist isolates [69–71]. While genomic rearrangements are absent from the genomes of S. enterica serotype Typhimurium generalists and isolates of the phage type DT99, the genomes of some S. enterica serotype Typhimurium DT2 isolates carry genomic inversions between rrn genes [72].
Thus, the development of genomic rearrangements is still incomplete within the pigeon-associated variant of S. enterica serotype Typhimurium, suggesting that genomic rearrangements are a consequence rather than a cause of host restriction. Analysis of DT2 and DT99 isolates using an LT2 microarray identified no genes that were present in the genomes of S. enterica serotype Typhimurium generalists but absent from the specialist genome of the pigeon-associated variant, suggesting that genome degradation by loss of discrete genes is not a major driving force for the evolution of host restriction [70]. Whether pseudogene formation or acquisition of genetic material by horizontal transfer has contributed to the formation of the pigeon-associated variant of S. enterica serotype Typhimurium remains to be investigated.
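The tally of intact adherence determinants drawn from Table 6.1 earlier in this section (16 in S. enterica serotype Typhimurium LT2 versus 7 to 10 in the typhoidal genomes) can be checked directly from the table. The sketch below transcribes Table 6.1 into a compact encoding; the encoding itself ("+" intact, "p" present but carrying a pseudogene, "-" absent) and all variable names are assumptions made for illustration.

```python
STRAINS = ("Typhimurium LT2", "Paratyphi A SARB42", "Typhi CT18", "Typhi Ty2")

# Table 6.1, one status character per strain in the order of STRAINS:
# "+" intact, "p" present but carrying a pseudogene, "-" absent.
TABLE_6_1 = {
    "shdA": "+ppp", "misL": "++pp", "sivH": "+ppp", "ratB": "+ppp",
    "csg":  "+p++", "fim":  "++p+", "tcf":  "-+++", "sef":  "-+pp",
    "saf":  "+p++", "bcf":  "+ppp", "sta":  "--++", "stb":  "+++p",
    "stc":  "+++p", "std":  "++++", "ste":  "-+pp", "stg":  "--pp",
    "sth":  "p+pp", "stk":  "-+--", "lpf":  "+---", "pef":  "+---",
    "stf":  "+p--", "sti":  "+---", "stj":  "+---", "pil":  "--++",
}


def intact_determinants(strain):
    """Return the set of determinants that are present and intact in one strain."""
    column = STRAINS.index(strain)
    return {name for name, status in TABLE_6_1.items() if status[column] == "+"}


# Number of intact determinants per genome.
intact_counts = {strain: len(intact_determinants(strain)) for strain in STRAINS}

# Determinants intact in all three human-adapted (typhoidal) genomes.
shared_typhoidal = sorted(
    set.intersection(*(intact_determinants(strain) for strain in STRAINS[1:]))
)

print(intact_counts)
print(shared_typhoidal)  # ['std', 'tcf']
```

With this encoding the counts are 16, 10, 8, and 7 for LT2, SARB42, CT18, and Ty2, respectively, and std and tcf are the only determinants intact in all three typhoidal genomes, in agreement with the text.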

Acknowledgments

We would like to thank M. Raffatellu for reading the manuscript. Work in A.B.’s laboratory is supported by USDA/NRICGP grant #2002-35204-12247 and Public Health Service grants #AI40124 and #AI44170. H.A. is supported by Public Health Service grant #AI52250.

References

1 Brenner, F.W., R.G. Villar, F.J. Angulo, R. Tauxe, and B. Swaminathan. 2000. Salmonella nomenclature. J. Clin. Microbiol. 38: 2465–2467.
2 Riley, M. and A. Anilionis. 1976. Evolution of the bacterial genome. Annu. Rev. Microbiol. 32: 519–560.
3 Riley, M. and S. Krawiec. 1987. Genome organization. In: Escherichia coli and Salmonella typhimurium: cellular and molecular biology, (ed) F.C. Neidhardt. American Society for Microbiology, Washington, DC, pp 967–981.
4 McClelland, M., K.E. Sanderson, J. Spieth, S.W. Clifton, P. Latreille, L. Courtney, S. Porwollik, J. Ali, M. Dante, F. Du, S. Hou, D. Layman, S. Leonard, C. Nguyen, K. Scott, A. Holmes, N. Grewal, E. Mulvaney, E. Ryan, H. Sun, L. Florea, W. Miller, T. Stoneking, M. Nhan, R. Waterston, and R.K. Wilson. 2001. Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413: 852–856.
5 Porwollik, S., R.M. Wong, and M. McClelland. 2002. Evolutionary genomics of Salmonella: gene acquisitions revealed by microarray analysis. Proc. Natl. Acad. Sci. U. S. A. 99: 8956–8961.
6 McClelland, M., L. Florea, K. Sanderson, S.W. Clifton, J. Parkhill, C. Churcher, G. Dougan, R.K. Wilson, and W. Miller. 2000. Comparison of the Escherichia coli K-12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi. Nucleic Acids Res. 28: 4974–4986.
7 Chan, K., S. Baker, C.C. Kim, C.S. Detweiler, G. Dougan, and S. Falkow. 2003. Genomic comparison of Salmonella enterica serovars and Salmonella bongori by use of an S. enterica serovar typhimurium DNA microarray. J. Bacteriol. 185: 553–563.
8 Price-Carter, M., J. Tingey, T.A. Bobik, and J.R. Roth. 2001. The alternative electron acceptor tetrathionate supports B12-dependent anaerobic growth of Salmonella enterica serovar typhimurium on ethanolamine or 1,2-propanediol. J. Bacteriol. 183: 2463–2475.
9 Stojiljkovic, I., A.J. Bäumler, and F. Heffron. 1995. Ethanolamine utilization in Salmonella typhimurium: nucleotide sequence, protein expression, and mutational analysis of the cchA cchB eutE eutJ eutG eutH gene cluster. J. Bacteriol. 177: 1357–1366.
10 Kofoid, E., C. Rappleye, I. Stojiljkovic, and J. Roth. 1999. The 17-gene ethanolamine (eut) operon of Salmonella typhimurium encodes five homologues of carboxysome shell proteins. J. Bacteriol. 181: 5317–5329.
11 Bobik, T.A., G.D. Havemann, R.J. Busch, D.S. Williams, and H.C. Aldrich. 1999. The propanediol utilization (pdu) operon of Salmonella enterica serovar Typhimurium LT2 includes genes necessary for formation of polyhedral organelles involved in coenzyme B(12)-dependent 1,2-propanediol degradation. J. Bacteriol. 181: 5967–5975.
12 Roth, J.R., J.G. Lawrence, M. Rubenfield, S. Kieffer-Higgins, and G.M. Church. 1993. Characterization of the cobalamin (vitamin B12) biosynthetic genes of Salmonella typhimurium. J. Bacteriol. 175: 3303–3316.
13 Jeter, R.M. 1990. Cobalamin-dependent 1,2-propanediol utilization by Salmonella typhimurium. J. Gen. Microbiol. 136(Pt 5): 887–896.
14 Roof, D.M. and J.R. Roth. 1988. Ethanolamine utilization in Salmonella typhimurium. J. Bacteriol. 170: 3855–3863.
15 Hensel, M., A.P. Hinsley, T. Nikolaus, G. Sawers, and B.C. Berks. 1999. The genetic basis of tetrathionate respiration in Salmonella typhimurium. Mol. Microbiol. 32: 275–287.
16 Huang, C.J. and E.L. Barrett. 1991. Sequence analysis and expression of the Salmonella typhimurium asr operon encoding production of hydrogen sulfide from sulfite. J. Bacteriol. 173: 1544–1553.
17 Heinzinger, N.K., S.Y. Fujimoto, M.A. Clark, M.S. Moreno, and E.L. Barrett. 1995. Sequence analysis of the phs operon in Salmonella typhimurium and the contribution of thiosulfate reduction to anaerobic energy metabolism. J. Bacteriol. 177: 2813–2820.
18 Obradors, N., J. Badia, L. Baldoma, and J. Aguilar. 1988. Anaerobic metabolism of the L-rhamnose fermentation product 1,2-propanediol in Salmonella typhimurium. J. Bacteriol. 170: 2159–2162.
19 Zhang, S., R.A. Kingsley, R.L. Santos, H. Andrews-Polymenis, M. Raffatellu, J. Figueiredo, J. Nunes, R.M. Tsolis, L.G. Adams, and A.J. Bäumler. 2003. Molecular pathogenesis of Salmonella enterica serotype typhimurium-induced diarrhea. Infect. Immun. 71: 1–12.
20 Mills, D.M., V. Bajaj, and C.A. Lee. 1995. A 40 kb chromosomal fragment encoding Salmonella typhimurium invasion genes is absent from the corresponding region of the Escherichia coli K-12 chromosome. Mol. Microbiol. 15: 749–759.
21 Mirold, S., K. Ehrbar, A. Weissmuller, R. Prager, H. Tschape, H. Russmann, and W.D. Hardt. 2001. Salmonella host cell invasion emerged by acquisition of a mosaic of separate genetic elements, including Salmonella pathogenicity island 1 (SPI1), SPI5, and sopE2. J. Bacteriol. 183: 2348–2358.
22 Li, J., H. Ochman, E.A. Groisman, E.F. Boyd, F. Solomon, K. Nelson, and R.K. Selander. 1995. Relationship between evolutionary rate and cellular location among the Inv/Spa invasion proteins of Salmonella enterica. Proc. Natl. Acad. Sci. U. S. A. 92: 7252–7256.
23 Wood, M.W., M.A. Jones, P.R. Watson, A.M. Siber, B.A. McCormick, S. Hedges, R. Rosqvist, T.S. Wallis, and E.E. Galyov. 2000. The secreted effector protein of Salmonella dublin, SopA, is translocated into eukaryotic cells and influences the induction of enteritis. Cell. Microbiol. 2: 293–303.
24 Galyov, E.E., M.W. Wood, R. Rosqvist, P.B. Mullan, P.R. Watson, S. Hedges, and T.S. Wallis. 1997. A secreted effector protein of Salmonella dublin is translocated into eukaryotic cells and mediates inflammation and fluid secretion in infected ileal mucosa. Mol. Microbiol. 25: 903–912.
25 Jones, M.A., M.W. Wood, P.B. Mullan, P.R. Watson, T.S. Wallis, and E.E. Galyov. 1998. Secreted effector proteins of Salmonella dublin act in concert to induce enteritis. Infect. Immun. 66: 5799–5804.
26 Wood, M.W., R. Rosqvist, P.B. Mullan, M.H. Edwards, and E.E. Galyov. 1996. SopE, a secreted protein of Salmonella dublin, is translocated into the target eukaryotic cell via a sip-dependent mechanism and promotes bacterial entry. Mol. Microbiol. 22: 327–338.
27 Stender, S., A. Friebel, S. Linder, M. Rohde, S. Mirold, and W.D. Hardt. 2000. Identification of SopE2 from Salmonella typhimurium, a conserved guanine nucleotide exchange factor for Cdc42 of the host cell. Mol. Microbiol. 36: 1206–1221.
28 Tsolis, R.M., S.M. Townsend, E.A. Miao, S.I. Miller, T.A. Ficht, L.G. Adams, and A.J. Bäumler. 1999. Identification of a putative Salmonella enterica serotype typhimurium host range factor with homology to IpaH and YopM by signature-tagged mutagenesis. Infect. Immun. 67: 6385–6393.
29 Miao, E.A., C.A. Scherer, R.M. Tsolis, R.A. Kingsley, L.G. Adams, A.J. Bäumler, and S.I. Miller. 1999. Salmonella typhimurium leucine-rich repeat proteins are targeted to the SPI1 and SPI2 type III secretion systems. Mol. Microbiol. 34: 850–864.
30 Raffatellu, M., R.P. Wilson, D. Chessa, H. Andrews-Polymenis, Q.T. Tran, S. Lawhon, S. Khare, L.G. Adams, and A.J. Bäumler. 2005. SipA, SopA, SopB, SopD and SopE2 contribute to Salmonella enterica serotype Typhimurium invasion of epithelial cells. Infect. Immun. 73: 146–154.
31 Lawhon, S.D., J.G. Frye, M. Suyemoto, S. Porwollik, M. McClelland, and C. Altier. 2003. Global regulation by CsrA in Salmonella typhimurium. Mol. Microbiol. 48: 1633–1645.
32 Townsend, S.M., N.E. Kramer, R. Edwards, S. Baker, N. Hamlin, M. Simmonds, K. Stevens, S. Maloy, J. Parkhill, G. Dougan, and A.J. Bäumler. 2001. Salmonella enterica serovar Typhi possesses a unique repertoire of fimbrial gene sequences. Infect. Immun. 69: 2894–2901.
33 Blanc-Potard, A.B., F. Solomon, J. Kayser, and E.A. Groisman. 1999. The SPI-3 pathogenicity island of Salmonella enterica. J. Bacteriol. 181: 998–1004.
34 Ochman, H., F.C. Soncini, F. Solomon, and E.A. Groisman. 1996. Identification of a pathogenicity island for Salmonella survival in host cells. Proc. Natl. Acad. Sci. U. S. A. 93: 7800–7804.
35 Ochman, H. and E.A. Groisman. 1996. Distribution of pathogenicity islands in Salmonella spp. Infect. Immun. 64: 5410–5412.
36 Hensel, M., J.E. Shea, A.J. Bäumler, C. Gleeson, F. Blattner, and D.W. Holden. 1997. Analysis of the boundaries of Salmonella pathogenicity island 2 and the corresponding chromosomal region of Escherichia coli K-12. J. Bacteriol. 179: 1105–1111.
37 Aleksic, S., F. Heinzerling, and J. Bockemühl. 1996. Human infection caused by salmonellae of subspecies II to VI in Germany, 1977–1992. Zentralbl. Bakteriol. 283: 391–398.
38 Bäumler, A.J. 1997. The record of horizontal gene transfer in Salmonella. Trends Microbiol. 5: 318–322.
39 Bäumler, A.J., R.M. Tsolis, T.A. Ficht, and L.G. Adams. 1998. Evolution of host adaptation in Salmonella enterica. Infect. Immun. 66: 4579–4587.
40 Kingsley, R.A. and A.J. Bäumler. 2002. Pathogenicity islands and host adaptation of Salmonella serovars. Curr. Top. Microbiol. Immunol. 264: 67–87.
41 Boyd, E.F., F.-S. Wang, P. Beltran, S.A. Plock, K. Nelson, and R.K. Selander. 1993. Salmonella reference collection B (SARB): strains of 37 serovars of subspecies I. J. Gen. Microbiol. 139: 1125–1132.
42 Parkhill, J., G. Dougan, K.D. James, N.R. Thomson, D. Pickard, J. Wain, C. Churcher, K.L. Mungall, S.D. Bentley, M.T. Holden, M. Sebaihia, S. Baker, D. Basham, K. Brooks, T. Chillingworth, P. Connerton, A. Cronin, P. Davis, R.M. Davies, L. Dowd, N. White, J. Farrar, T. Feltwell, N. Hamlin, A. Haque, T.T. Hien, S. Holroyd, K. Jagels, A. Krogh, T.S. Larsen, S. Leather, S. Moule, P. O’Gaora, C. Parry, M. Quail, K. Rutherford, M. Simmonds, J. Skelton, K. Stevens, S. Whitehead, and B.G. Barrell. 2001. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413: 848–852.
43 Deng, W., S.R. Liou, G. Plunkett, 3rd, G.F. Mayhew, D.J. Rose, V. Burland, V. Kodoyianni, D.C. Schwartz, and F.R. Blattner. 2003. Comparative genomics of Salmonella enterica serovar Typhi strains Ty2 and CT18. J. Bacteriol. 185: 2330–2337.
44 McClelland, M., K.E. Sanderson, S.W. Clifton, P. Latreille, S. Porwollik, A. Sabo, R. Meyer, T. Bieri, P. Ozersky, M. McLellan, C.R. Harkins, C. Wang, C. Nguyen, A. Berghoff, G. Elliott, S. Kohlberg, C. Strong, F. Du, J. Carter, C. Kremizki, D. Layman, S. Leonard, H. Sun, L. Fulton, W. Nash, T. Miner, P. Minx, K. Delehaunty, C. Fronick, V. Magrini, M. Nhan, W. Warren, L. Florea, J. Spieth, and R.K. Wilson. 2004. Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat. Genet. 36: 1268–1274.
45 Porwollik, S. and M. McClelland. 2003. Lateral gene transfer in Salmonella. Microbes Infect. 5: 977–989.
46 Porwollik, S., E.F. Boyd, C. Choy, P. Cheng, L. Florea, E. Proctor, and M. McClelland. 2004. Characterization of Salmonella enterica subspecies I genovars by use of microarrays. J. Bacteriol. 186: 5883–5898.
47 Gunn, J.S., C.M. Alpuche-Aranda, W.P. Loomis, W.J. Belden, and S.I. Miller. 1995. Characterization of the Salmonella typhimurium pagC/pagD chromosomal region. J. Bacteriol. 177: 5040–5047.
48 Folkesson, A., A. Advani, S. Sukupolvi, J.D. Pfeifer, S. Normark, and S. Lofdahl. 1999. Multiple insertions of fimbrial operons correlate with the evolution of Salmonella serovars responsible for human disease. Mol. Microbiol. 33: 612–622.
49 Groisman, E.A., M.H. Saier, and H. Ochman. 1992. Horizontal transfer of a phosphatase gene as evidence for mosaic structure of the Salmonella genome. EMBO J. 11: 1309–1316.
50 Kingsley, R.A., K. van Amsterdam, N. Kramer, and A.J. Bäumler. 2000. The shdA gene is restricted to serotypes of Salmonella enterica subspecies I and contributes to efficient and prolonged fecal shedding. Infect. Immun. 68: 2720–2727.
51 Kingsley, R.A., R.L. Santos, A.M. Keestra, L.G. Adams, and A.J. Bäumler. 2002. Salmonella enterica serotype Typhimurium ShdA is an outer membrane fibronectin-binding protein that is expressed in the intestine. Mol. Microbiol. 43: 895–905.
52 Kingsley, R.A., A.M. Keestra, M.R. de Zoete, and A.J. Bäumler. 2004. The ShdA adhesin binds to the cationic cradle of the fibronectin 13FnIII repeat module: evidence for molecular mimicry of heparin binding. Mol. Microbiol. 53: 345–355.
53 Kingsley, R.A., A.D. Humphries, E.H. Weening, M.R. De Zoete, S. Winter, A. Papaconstantinopoulou, G. Dougan, and A.J. Bäumler. 2003. Molecular and phenotypic analysis of the CS54 island of Salmonella enterica serotype typhimurium: identification of intestinal colonization and persistence determinants. Infect. Immun. 71: 629–640.
54 Wang, L., D. Liu, and P.R. Reeves. 1996. C-terminal half of Salmonella enterica WbaP (RfbP) is the galactosyl-1-phosphate transferase domain catalyzing the first step of O-antigen synthesis. J. Bacteriol. 178: 2598–2604.
55 Turner, A.K., M.A. Lovell, S.D. Hulme, L. Zhang-Barber, and P.A. Barrow. 1998. Identification of Salmonella typhimurium genes required for colonization of the chicken alimentary tract and for virulence in newly hatched chicks. Infect. Immun. 66: 2099–2106.
56 Xiang, S.H., M. Hobbs, and P.R. Reeves. 1994. Molecular analysis of the rfb gene cluster of a group D2 Salmonella enterica strain: evidence for its origin from an insertion sequence-mediated recombination event between group E and D1 strains. J. Bacteriol. 176: 4357–4365.
57 Xiang, S.H., A.M. Haase, and P.R. Reeves. 1993. Variation of the rfb gene clusters in Salmonella enterica. J. Bacteriol. 175: 4877–4884.
58 Thomsen, L.E., M.S. Chadfield, J. Bispham, T.S. Wallis, J.E. Olsen, and H. Ingmer. 2003. Reduced amounts of LPS affect both stress tolerance and virulence of Salmonella enterica serovar Dublin. FEMS Microbiol. Lett. 228: 225–231.
59 Jensen, S.O. and P.R. Reeves. 2001. Molecular evolution of the GDP-mannose pathway genes (manB and manC) in Salmonella enterica. Microbiology 147: 599–610.
60 Jiang, W., W.W. Metcalf, K.S. Lee, and B.L. Wanner. 1995. Molecular cloning, mapping, and regulation of Pho regulon genes for phosphonate breakdown by the phosphonatase pathway of Salmonella typhimurium LT2. J. Bacteriol. 177: 6411–6421.
61 Boyd, E.F., S. Porwollik, F. Blackmer, and M. McClelland. 2003. Differences in gene content among Salmonella enterica serovar typhi isolates. J. Clin. Microbiol. 41: 3823–3828.
62 Santos, R.L., S. Zhang, R.M. Tsolis, R.A. Kingsley, L.G. Adams, and A.J. Bäumler. 2001. Animal models of Salmonella infections: enteritis vs. typhoid fever. Microbes Infect. 3: 237–247.
63 Selander, R.K., P. Beltran, N.H. Smith, R. Helmuth, F.A. Rubin, D.J. Kopecko, K. Ferris, B.D. Tall, A. Cravioto, and J.M. Musser. 1990. Evolutionary genetic relationships of clones of Salmonella serovars that cause human typhoid and other enteric fevers. Infect. Immun. 58: 2262–2275.
64 Liu, S.L. and K.E. Sanderson. 1995. The chromosome of Salmonella paratyphi A is inverted by recombination between rrnH and rrnG. J. Bacteriol. 177: 6585–6592.
65 Liu, S.L. and K.E. Sanderson. 1995. Rearrangements in the genome of the bacterium Salmonella typhi. Proc. Natl. Acad. Sci. U. S. A. 92: 1018–1022.
66 Sanderson, K.E. and S.L. Liu. 1998. Chromosomal rearrangements in enteric bacteria. Electrophoresis 19: 569–572.
67 Liu, G.R., A. Rahn, W.Q. Liu, K.E. Sanderson, R.N. Johnston, and S.L. Liu. 2002. The evolving genome of Salmonella enterica serovar Pullorum. J. Bacteriol. 184: 2626–2633.
68 Rabsch, W., H.L. Andrews, R.A. Kingsley, R. Prager, H. Tschape, L.G. Adams, and A.J. Bäumler. 2002. Salmonella enterica serotype Typhimurium and its host-adapted variants. Infect. Immun. 70: 2249–2255.
69 Pasmans, F., F. Van Immerseel, K. Hermans, M. Heyndrickx, J.M. Collard, R. Ducatelle, and F. Haesebrouck. 2004. Assessment of virulence of pigeon isolates of Salmonella enterica subsp. enterica serovar typhimurium variant copenhagen for humans. J. Clin. Microbiol. 42: 2000–2002.
70 Andrews-Polymenis, H.L., W. Rabsch, S. Porwollik, M. McClelland, C. Rosetti, L.G. Adams, and A.J. Bäumler. 2004. Host restriction of Salmonella enterica serotype Typhimurium pigeon isolates does not correlate with loss of discrete genes. J. Bacteriol. 186: 2619–2628.
71 Pasmans, F., F. Van Immerseel, M. Heyndrickx, A. Martel, C. Godard, C. Wildemauwe, R. Ducatelle, and F. Haesebrouck. 2003. Host adaptation of pigeon isolates of Salmonella enterica subsp. enterica serovar Typhimurium variant Copenhagen phage type 99 is associated with enhanced macrophage cytotoxicity. Infect. Immun. 71: 6068–6074.
72 Helm, R.A., S. Porwollik, A.E. Stanley, S. Maloy, M. McClelland, W. Rabsch, and A. Eisenstark. 2004. Pigeon-associated strains of Salmonella enterica serovar Typhimurium phage type DT2 have genomic rearrangements at rRNA operons. Infect. Immun. 72: 7338–7341.


7 Pathogenomics of Enterococcus faecalis

Janet M. Manson and Michael S. Gilmore

7.1 Introduction

In the last two decades, enterococci have emerged as an important cause of nosocomial infection, and with the appearance of vancomycin resistance, some strains are now resistant to most currently approved antimicrobial agents. The robust nature of this bacterium, coupled with the rapid spread of antimicrobial resistance, has led to increased interest in the pathogenesis of enterococcal infection. In this chapter we review how the genome of E. faecalis relates to the pathogenesis of enterococcal infection. We discuss the role of mobile DNA in the evolution and acquisition of virulence traits, and examine the genetic properties of E. faecalis that contribute to its ability to adapt and survive in different environments. Finally, we look at genes that could contribute to virulence in this organism and at the role of the E. faecalis pathogenicity island in the bacterium–host relationship.

7.2 Enterococcal Pathogenesis

Enterococcus faecalis is a gram-positive coccus with a low G+C content (37.5%) and is a facultative anaerobe that grows optimally at 35 °C (with a growth range of 10–45 °C). E. faecalis forms part of the normal microflora of both the human and animal intestinal tracts and can also be isolated from insects such as cockroaches, flies, and beetles [1]. In humans, typical concentrations of enterococci in stool are up to 10⁸ CFU per gram [2], and because of this they are also abundant in wastewater, sewage, on vegetation, and in other environments where contamination has occurred. Although E. faecalis is a normal inhabitant of the human gastrointestinal tract, problems occur when this organism gains access to sites it does not normally colonize. Since 1986, enterococci have consistently ranked among the most common pathogens isolated from nosocomial infections [3]. Enterococci are associated with a variety of clinical infections, the most common being urinary tract infections.


Urinary tract infections caused by enterococci are generally acquired in a long-term health care facility and, because the bacteria are more exposed to antimicrobial agents in such environments, the infecting strains are also more likely to be resistant to some antimicrobial agents. In the 2000 SENTRY surveillance program, enterococci were the second-ranked cause of nosocomial urinary tract infections, accounting for 12.6% of all infections [4]. The second most frequent infections caused by enterococci are intra-abdominal abscesses and postsurgical wound infections, and these are followed in frequency by bloodstream infections (bacteremia) and bacterial endocarditis. According to National Nosocomial Infections Surveillance (NNIS) data from January 1992 to July 1998, enterococci ranked first as the causative agent in surgical site infection and third in bloodstream infections among patients in intensive care units [5]. At a much lower frequency, enterococci can be involved in central nervous system and neonatal infections, and very rarely in osteomyelitis, respiratory tract infections, and cellulitis.

E. faecalis is the most prevalent enterococcal species associated with human infections. Data from the SENTRY surveillance program (1997–1999) showed that during this period E. faecalis accounted for 57.2–76.8% of enterococcal nosocomial infections, while Enterococcus faecium made up 4.6–20% [6].

To cause disease, bacterial pathogens need to be able to adapt to the physiological conditions found within the host. A potential reason for the emergence of E. faecalis as a causative agent of nosocomial infection is the robust nature of this organism. E. faecalis has an intrinsic ability to grow in hypotonic, hypertonic, acidic, or alkaline conditions and to withstand detergents, oxidative stress, and desiccation. To discover more about the mechanisms this bacterium uses both to survive and to cause infection in the clinical setting, the genome of the first reported clinical vancomycin-resistant E. faecalis isolate in the United States, E. faecalis V583 [7], was sequenced [8].

7.3 Genome Sequence of E. faecalis

The genome of E. faecalis V583 is 3.2 Mbp in size and includes three plasmids, pTEF1, pTEF2, and pTEF3, of 66.3 kbp, 57.6 kbp, and 17.9 kbp, respectively. The total genome encodes 3337 predicted open reading frames (ORFs), of which one-quarter comprise mobile elements and acquired DNA, including a 150-kbp pathogenicity island [9]. As E. faecalis V583 is the only E. faecalis isolate sequenced to date, it is uncertain whether these characteristics are also true of other isolates of this species. The genome sizes of other E. faecalis strains have previously been mapped and were estimated to range from 2825 kbp to 3250 kbp [10, 11]. Strain-specific variability in gene content and genomic organization may account for the differences in pathogenicity between some bacterial isolates. Comparison of E. coli K-12 with E. coli O157:H7 found 1500 ORFs specific to O157:H7 [12], while a comparison of three Listeria monocytogenes strains found between 51 and 97 strain-specific genes [13].


Encoded on the pathogenicity island are genes for cytolysin, enterococcal surface protein (Esp), and other putative virulence or adaptation genes, and this region of DNA has an atypical G+C content (32.2%). In E. faecalis V583 (EF0479–EF0628), the previously described pathogenicity island contains a 17-kbp deletion (ORFs EF0047–EF0057), an IS256-like insertion in cylB, and an insertion of an IS905-like element. In addition, a 2.8-kbp region possessing the features of a group II intron (EF0012) is absent. 1)

As previously noted, 85% of the E. faecalis V583 ORFs share their greatest homology with other low G+C gram-positive bacteria; however, there is no large-scale synteny with any other genome in this group, probably because of the large amount of mobile DNA present in the E. faecalis genome [8]. A feature E. faecalis does share with this group is a strong transcriptional skew, with 90% of the ORFs in E. faecalis aligned with the direction of replication.

7.3.1 Mobile Elements, Acquired DNA, and Antimicrobial Resistance

A total of 47 insertion sequences are present in E. faecalis V583; only 38 of these are considered functional, and of these 26 are present in the chromosome, 10 of them belonging to the IS256 family. There are two main clusters of IS elements, one of which is associated with the pathogenicity island (G+C content 32.2%), while the other is associated with a separate region of atypical nucleotide composition (G+C content 31%). The disparate G+C content of these regions, combined with the presence of mobile DNA elements, suggests that they originated by horizontal gene transfer.

From EF1855 to EF1874, six full or partial IS elements are present. Also present in this region are panB, panC, and panD (EF1860–EF1858), which encode three of the four steps of the pantothenate biosynthesis pathway. Homology comparisons show that these enzymes are more closely related to their counterparts in clostridia than to those of other lactic acid bacteria. Three homologues of the remaining enzyme in this pathway are present elsewhere in the genome, one of which is encoded on the pathogenicity island. Also in this region of atypical nucleotide composition are homologues of the Streptococcus pneumoniae VncRS two-component signal transduction system (EF1863–EF1864) and the Vex secretion ABC transporter (EF1867–EF1869). Previously, mutations in the VncRS locus of S. pneumoniae were linked to vancomycin tolerance, and it was hypothesized that VncR functioned as a repressor of autolytic functions [14]; however, a second study suggested that VncRS is involved in the regulation of a proximal ABC transporter, Vex [15]. More recent work has cast further doubt on the role of the VncRS two-component system, and its function is at present undetermined [16]. Homologues of the Vex transport system are also present in both E. faecium and Streptococcus agalactiae, and in both of these organisms the genes have greater than 88% identity at the nucleotide level to those of E. faecalis. A homologue of the VncS sensor kinase is present in both E. faecalis and S. agalactiae, again sharing greater than 88% identity at the nucleotide level, but is absent from E. faecium. This degree of similarity suggests a common source for these genes in these three organisms (Table 7.1).

1) All genes referred to in this chapter as being present on the pathogenicity island are annotated according to the gene locus described in E. faecalis MMH594, which contains the full-length island [9] (GenBank accession number AF454824).
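The inference drawn in this section, that a stretch of DNA whose G+C content departs from the rest of the genome (about 31–32% here, against 37.5% overall) may have been acquired horizontally, rests on a simple calculation. The following Python sketch shows one way such a scan could be carried out. The function names, window size, tolerance, and toy sequence are assumptions made purely for illustration and are not taken from the genome analysis cited in this chapter.

```python
def gc_fraction(seq: str) -> float:
    """Return the G+C fraction of a DNA sequence (0.0 for an empty string).

    Every character counts towards the length, so ambiguous bases such
    as N lower the reported fraction.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(seq, window, step, genome_gc, tolerance):
    """Yield (start, gc) for each window whose G+C fraction differs from
    genome_gc by more than tolerance."""
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if abs(gc - genome_gc) > tolerance:
            yield start, gc


# Toy sequence, invented for illustration: 40 bp at 50% G+C followed by
# a 20-bp AT-only stretch standing in for a region of acquired DNA.
toy = "ACGT" * 10 + "AT" * 10
flagged = list(atypical_windows(toy, window=20, step=20,
                                genome_gc=0.5, tolerance=0.1))
print(flagged)
```

On real sequence data the window size and tolerance would need tuning, and an atypical composition on its own is only suggestive of horizontal transfer; in the chapter it is the combination with nearby mobile elements that supports the inference.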

Table 7.1 Mobile and/or exogenously acquired DNA in the E. faecalis V583 genome.

Locus            Description
EF0131–EF0166    Possible integrated plasmid; contains genes of plasmid, phage, and conjugative transposon origin
EF0479–EF0628    E. faecalis pathogenicity island
EF1855–EF1874    Area of atypical nucleotide composition containing 6 full/partial IS elements, panB, panC, panD, and VncRS/Vex homologues
EF2512–EF2545    Possible integrated plasmid
EF2284–EF2334    Area of atypical nucleotide composition with some similarities to Tn1549, containing the vanB operon
EF0302–EF0355    Possible integrated prophage (PHAGE01)
EF1275–EF1293    Possible integrated prophage (PHAGE02)
EF1416–EF1489    Possible integrated prophage (PHAGE03)
EF1987–EF2043    Possible integrated prophage (PHAGE04); contains a putative ferrochelatase, a cold shock protein, and homologues of PblB and PblA
EF2084–EF2145    Possible integrated prophage (PHAGE05)
EF2798–EF2856    Possible integrated prophage (PHAGE06); contains a homologue of PblB
EF2936–EF2955    Possible integrated prophage (PHAGE07)
pTEF1            66.3-kbp plasmid
pTEF2            57.6-kbp plasmid
pTEF3            17.9-kbp plasmid
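Because the chromosomal regions in Table 7.1 are given as locus-tag ranges, it is straightforward to check whether a gene mentioned in the text lies within one of them. The Python sketch below is a minimal illustration under two stated assumptions: that the numeric part of an EF locus tag can be used as a proxy for chromosomal order, and that only plain four-digit chromosomal tags are handled (plasmid tags such as EFA0007 or suffixed tags such as EFB0005.1 are rejected). The names REGIONS, locus_number, and region_for are hypothetical. Genes cited in this chapter as lying on the pathogenicity island use the MMH594 numbering described in footnote 1, so their tags should not be looked up this way.

```python
import re

# Chromosomal regions copied from Table 7.1: (first locus, last locus, label).
REGIONS = [
    ("EF0131", "EF0166", "possible integrated plasmid"),
    ("EF0302", "EF0355", "possible prophage (PHAGE01)"),
    ("EF0479", "EF0628", "pathogenicity island"),
    ("EF1275", "EF1293", "possible prophage (PHAGE02)"),
    ("EF1416", "EF1489", "possible prophage (PHAGE03)"),
    ("EF1855", "EF1874", "atypical composition (IS elements, panBCD, VncRS/Vex)"),
    ("EF1987", "EF2043", "possible prophage (PHAGE04)"),
    ("EF2084", "EF2145", "possible prophage (PHAGE05)"),
    ("EF2284", "EF2334", "atypical composition (vanB operon)"),
    ("EF2512", "EF2545", "possible integrated plasmid"),
    ("EF2798", "EF2856", "possible prophage (PHAGE06)"),
    ("EF2936", "EF2955", "possible prophage (PHAGE07)"),
]


def locus_number(tag: str) -> int:
    """Return the numeric part of a chromosomal tag such as 'EF2293'."""
    match = re.fullmatch(r"EF(\d{4})", tag)
    if match is None:
        raise ValueError(f"not a plain chromosomal EF locus tag: {tag!r}")
    return int(match.group(1))


def region_for(tag: str):
    """Return the Table 7.1 label for the region containing tag, or None."""
    number = locus_number(tag)
    for first, last, label in REGIONS:
        if locus_number(first) <= number <= locus_number(last):
            return label
    return None


print(region_for("EF2293"))  # first gene of the vanB operon
print(region_for("EF2001"))  # PblB homologue
print(region_for("EF1818"))  # gelE, outside every listed region
```

A lookup of this kind only restates the table; whether a region is truly of foreign origin still depends on the sequence evidence discussed in the text.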

As well as the large number of insertion elements, the E. faecalis V583 genome contains seven regions that are probably derived from phage integration. Homology searches of all seven phage regions show their best matches to phages from other lactic acid bacteria. Interestingly, the fourth phage region contains the hemH gene (EF1989, a putative ferrochelatase), a cold shock protein, CspC (EF1991), and homologues of both PblB (EF2001) and PblA (EF2003). A second protein with sequence identity to PblB is also present in the sixth phage region (EF2811). In Streptococcus mitis, PblA and PblB resemble phage capsid and tail fiber proteins and are present on an inducible phage, SM1. In S. mitis these proteins have been linked to platelet adherence, as disruption of either of these genes resulted in reduced platelet binding [17]; however, their role in E. faecalis virulence is unknown. Phage integration events in Streptococcus pyogenes are responsible for the major gaps in the alignments of different M serotypes [18], and the presence of seven integrated prophages in the E. faecalis V583 genome suggests an important role for phage integration in genome plasticity.

The E. faecalis V583 chromosome also contains three regions that are possibly integrated plasmids. Present in two of these regions are genes encoding aggregation substance (EF0485, EF0149). One of the integrated plasmid regions is located within the pathogenicity island (EF0485–EF0506) and is very similar to pTEF1, but the aggregation substance is more closely related to that from pTEF2. The second region (EF0131–EF0166) has genes of plasmid, phage, and conjugative transposon origin, and is flanked by a phage integrase (EF0166). The third plasmid region (EF2512–EF2545) is very similar to the second; it has no aggregation substance gene but does contain a cell-wall surface anchor protein (EF2525) and a putative sortase (EF2524). The presence of three integrated plasmids points to a potentially important role for plasmids in the genome evolution of E. faecalis.

In Staphylococcus epidermidis, it has been noted that IS256 occurs preferentially in nosocomial isolates [19], and this IS element has also been linked to phenotypic variation in biofilm formation [20, 21]. It would be interesting to assess the frequency of IS256 in E. faecalis commensal strains relative to clinical isolates, to see whether this trend is also seen in enterococci. It has been speculated that the presence of multiple copies of IS256 could play a role in genomic flexibility and the development of virulence, as inactivation of genes by IS elements allows the development of novel phenotypes [19]. Insertion element movement and integration allow rapid evolution of a bacterial genome. A further example comes from genome comparisons of Yersinia pestis and Yersinia pseudotuberculosis, which showed that differences in virulence are linked to insertion sequence integration and homologous recombination [22]. Inactivation of certain genes and pathways in Y. pestis appears to have played a substantial role in the development of virulence in this organism [22].

Conjugative plasmids in E. faecalis have long been studied as one of the ways in which E. faecalis rapidly disseminates antimicrobial resistance genes and other genetic information. E. faecalis V583 contains three plasmids, pTEF1, pTEF2 (similar to the pheromone response plasmids pAD1 [23] and pAM373 [24]), and pTEF3, which belongs to the pAMβ1 family of broad host range plasmids. pTEF2 shares regions of similarity with pCF10 [25], with identical prgA, prgB, and prgC genes (EFB0010–EFB0012), but lacks prgQ and has a novel pheromone inhibitor (EFB0005.1). Only one insertion element, IS256 (EFB0052), is present in this plasmid. The smallest plasmid, pTEF3, contains a prgZ-like pheromone receptor, similar to pTEF2. This prgZ homologue (EFC0001) neighbors several insertion sequences (EFC0002, EFC0004, EFC0007).


pTEF1 shares extensive sequence homology with the well-characterized enterococcal pheromone response plasmid pAD1 [23]. pTEF1 has both an identical pheromone inhibitor (iAD1) and an identical aggregation substance (Asa1); however, a 31-kbp inversion lies between the sex pheromone inhibitor and traE1. This plasmid contains a total of eight insertion sequences, five of which are IS1216. The insertion sequences are clustered in two regions at either end of the 31-kbp inversion, one of which (EFA0006–EFA0016) contains an erythromycin resistance gene (EFA0007), a putative multidrug resistance protein (EFA0010), and a putative drug resistance transporter (EFA0014). The second area (EFA0056–EFA0063) contains a 6-aminoglycoside N-acetyltransferase (EFA0060) conferring high-level gentamicin resistance.

In addition to these plasmid-borne antimicrobial resistance genes, the V583 chromosome encodes four putative drug resistance transporters of the EmrB/QacA family (EF0420, EF0785, EF1370, EF1814), three major facilitator type drug resistance transporters (EF1078, EF1943, EF2068), three putative streptomycin resistance genes (EF1076, EF2300, EF2861), and genes predicted to encode daunorubicin resistance (EF1032), tunicamycin resistance (EF1055), and tellurite resistance (EF2698). Vancomycin resistance in E. faecalis V583 is due to the presence of the vanB operon [26] (EF2293–EF2299); however, this is not present on the conjugative element Tn1549 [27]. Instead, the operon is located on a previously unknown mobile element that shares some similarities with Tn1549 but also contains multiple insertions, deletions, and rearrangements. This mobile element has an atypical nucleotide content, suggesting acquisition via horizontal gene transfer.

From the E. faecalis genome sequence, homologues for most of the competence proteins of S. pneumoniae can be found, but several are absent. E. faecalis V583 contains four proteins that have identity to the Bacillus subtilis comG operon (EF2044–EF2046, EF1986), but is missing comG4, comG5, and comG7. Interestingly, a prophage is present between comG3 and comG6, suggesting that these genes may in fact be present in other strains of E. faecalis that do not contain the integrated phage. The genome sequences of other lactic acid bacteria such as L. monocytogenes [28], Lactococcus lactis [29], and S. pyogenes [30] also reveal the presence of homologues for most of the genes involved in competence. Following the discovery of a homologue of the B. subtilis competence regulatory protein MecA in L. monocytogenes, the ability of L. monocytogenes to take up DNA was tested, but competence was not detected in the various strains screened [31]. A protein with identity to MecA is also present in E. faecalis (EF2677). In L. monocytogenes, it was hypothesized that these competence-related proteins may be involved in other functions, and that perhaps these bacteria have developed similar complex regulatory mechanisms in response to certain signals in their environmental niches [31]. It is interesting to speculate on the possibility of natural transformation in E. faecalis, especially given the large amount of mobile DNA present in its genome, and to suggest that under certain environmental conditions (e.g., the acidic pH of the gut, or the alkaline pH of the bile duct) this organism could be capable of taking up foreign DNA.


7.3.2 Environmental Adaptation and Stress Response

Environmental adaptation traits and stress resistance mechanisms have been linked to virulence, as bacterial survival in the host often relies on these factors. Disruption of genes encoding such traits often leads to attenuation of the organism in an animal model [32–35]. As mentioned previously, E. faecalis is a very robust organism able to survive in many harsh environments. E. faecalis V583 contains a V-type and an F1F0-type ATPase responsible for regulating the intracellular pH and proton motive force. Cation transport ATPases can also contribute to pH homeostasis and, interestingly, a homologue of a K⁺-ATPase is present on the pathogenicity island. A further mechanism for pH homeostasis is the arginine deiminase pathway, which can raise the pH of the environment through the release of ammonia and its reaction with H⁺ [36]. The E. faecalis V583 genome encodes all the putative members of this pathway; it has two putative ornithine carbamoyltransferases (EF0105, EF0732), ornithine cyclodeaminases (EF0118, EF0616), and four putative carbamate kinases (EF0106, EF0386, EF0735, EF2575). The genome also contains an MscL-like protein (EF3152), thought to act as an electromechanical switch involved in sensing the state of lipid bilayers [37]. In addition, the genome contains five cold shock proteins and four universal stress proteins. The presence of 14 predicted metal ion P-type ATPases may account for the resistance of this organism to metals and ensure cation homeostasis [38].

Present in the E. faecalis V583 genome are a number of proteins involved in the heat shock response, including homologues of the heat shock proteins DnaK (EF1308) and GroEL (EF2633), gsp66 and gsp67, respectively [39]. The genome also contains a homologue of CtsR (EF3283) from B. subtilis, which has been shown to control molecular chaperone gene expression [40]. Unlike in Staphylococcus aureus, S. pneumoniae, S. pyogenes, and L. lactis, the groE operon does not show the presence of CtsR binding sites, which suggests that in E. faecalis the groE and dnaK operons are primarily controlled by HrcA [39]. CtsR recognition sites have been noted upstream from the clpB (EF2355), clpP (EF0771), and clpE (EF0706) genes [40]. The Clp ATP-dependent proteases have been shown in B. subtilis to play essential roles in stress survival [41]. Recent data from L. monocytogenes suggest that in that organism clpB is required for virulence and has a role in thermotolerance, but is not involved in other stress responses [42]. The role of these proteins has not been investigated in E. faecalis.

Also present is an HtrA (DegP) homologue (EF3027), part of the widely distributed family of serine proteases. In Streptococcus mutans, htrA mutants have a reduced ability to withstand high temperature, and are also more sensitive to low pH and H2O2 [43]. In S. pyogenes, a degP knockout showed reduced virulence in a mouse model and was sensitive to both temperature and oxidative stress [32]. It was hypothesized that this enzyme is responsible for degrading misfolded or aggregated proteins [32]. More recent work has suggested that DegP influences the expression of at least two virulence factors in S. pyogenes [44]. In S. pneumoniae, HtrA is involved in growth at high temperatures, resistance to oxidative stress, and the ability to undergo genetic transformation [45].

Gls24 is a general stress protein found in E. faecalis (EF0079), and has an adjacent homologue, GlsB (EF0080), which is 71% identical at the amino acid level [46]. A further two homologues of this protein are located on the E. faecalis pathogenicity island (EF0055 and EF0117), both having 55% identity to Gls24 and 54% identity to each other. Gls24 has been shown to be induced at the onset of starvation and also on exposure to bile salt and CdCl2 stress [46]. It has been observed that expression of the gls24 gene is increased when E. faecalis is grown in serum (15-fold increase) and urine (nine-fold increase) [47]. Recently, disruption of gls24 has been shown to affect both virulence and stress response, whereas disruption of glsB increased bile salt sensitivity but had no effect on virulence [48]. The actual function of Gls24 and GlsB is unknown at present. Other genes that may have a role in bile salt resistance include SagA (EF0394) [49], a putative ABC transporter (EF0675–EF0674) with homology to the BilE bile exclusion system of L. monocytogenes [50], and a putative bile salt hydrolase that is present on the pathogenicity island.

E. faecalis has a respiratory pathway that, in the absence of hematin or fumarate, produces substantial amounts of superoxide [51]. Superoxide anion has a destructive effect on a wide variety of cells and tissues, as well as on biological compounds such as lipids, proteins, and nucleic acids, and because of this E. faecalis has a number of strategies to combat oxidative stress. Firstly, absent from the E. faecalis genome sequence are superoxide-sensitive enzymes associated with the tricarboxylic acid (TCA) cycle. The E. faecalis V583 genome sequence also shows the presence of several enzymes and gene products that contribute to oxidative stress resistance, such as NADH peroxidase (npr) (EF1211), NADH:peroxiredoxin oxidoreductase (EF2738), alkyl hydroperoxidase resistance protein (ahpCF), NADH oxidase (nox) (EF1586), and manganese superoxide dismutase (EF0463), which catalyzes the dismutation of superoxide to oxygen and hydrogen peroxide. In addition to these genes, V583 contains a cydABCD operon (EF2061–EF2058), ohr (organic hydroperoxidase resistance protein) (EF0453) [52], and two Dps family proteins (EF3233, EF0606), which have been implicated in the protection of DNA from oxidative stress. A second ohr/OsmC protein (EF3201) may also aid in resistance to organic hydroperoxides. The genome contains an apoenzyme of catalase (katA) (EF1597), which can produce catalase upon the addition of heme [53]. Extracellular O2⁻ generated by E. faecalis has been shown to damage colonic epithelial cell DNA both in vitro and in vivo [54], and electron spin resonance data suggest that nutrient conditions in the rat intestine favor the production of O2⁻ [55]. A link has previously been suggested between the invasiveness of E. faecalis isolates and their rates of superoxide production [56].

Despite its ability to adapt to many different environmental stresses, E. faecalis possesses only a moderate number of regulatory genes, including only three alternative sigma factors, 17 two-component systems, and one orphan response regulator. Three response regulator systems are present on mobile elements: the previously mentioned VncRS homologue (EF1863–EF1864), VanRSB (involved in regulation of the van operon) (EF2298–EF2299), and KdpDE, which is found on the pathogenicity island. Two-component signal transduction systems allow bacteria to monitor their environment and respond to a wide array of stresses [57], and recent work has investigated the role of these two-component systems in the stress response of E. faecalis V583 [58]. Insertional mutations could be generated in 17 of the 18 response regulators; the remaining one, vicR (EF1193), appears to be essential for cell viability [58]. Mutation in the CroRS two-component system has previously been linked to the intrinsic resistance of E. faecalis to cephalosporins [59]; insertional mutation of this locus increased susceptibility to bacitracin, cefotaxime, cefuroxime, and vancomycin, but not ampicillin [58], and has also been associated with defects in growth and cell morphology [60]. These data suggest a role for CroRS in the regulation of cell wall integrity in E. faecalis. The role of two-component systems in E. faecalis JH2-2 has also been examined, and in this genetic background four systems were induced by environmental stresses [60]. In E. faecalis OG1RF, an EtaRS (EF1050–EF1051) mutant was found to be more acid-sensitive and was attenuated in a mouse peritonitis model [61].

No σS- or σB-like sigma factors have been found in the genome of E. faecalis, which raises the question of how this bacterium regulates gene expression during stress responses. A regulatory protein, HypR (EF2958), has recently been characterized that is involved in the oxidative stress response, and its disruption reduced survival of E. faecalis in an in vivo–in vitro macrophage infection model [62].

Quorum sensing is an important mechanism used by many prokaryotes to adapt to different environments encountered during infection. E. faecalis has a homologue of LuxS, which is required for the biosynthesis of the type 2 autoinducer (AI-2).
AI-2 has been shown to be detected by a sensory protein in a cell-density-dependent manner, and is thought to be involved in the regulation of a wide range of bacterial physiological functions [63]. In S. pyogenes, LuxS activity has been shown to play a role in the expression of virulence factors associated with epithelial cell internalization [64]. The role of LuxS in E. faecalis is at present unknown.

Adding to its ability to survive in diverse environments, E. faecalis contains 35 probable PTS-type sugar transporters and encodes pathways for the utilization of 15 different sugars. This is comparable to L. monocytogenes, in which 4% of the genome encodes PTS and ABC transporters [28]. As mentioned previously, the TCA cycle is absent in E. faecalis, and energy is derived via glycolysis or the pentose phosphate pathway. E. faecalis has a catabolite repression pathway to actively regulate energy metabolism while growing on easily fermentable sugars such as glucose [51]. The ability of E. faecalis to respond to rapidly changing environments and stresses probably contributes to its survival through all stages of infection.

7.3.3 Survival In Vivo

Originally thought to be an adhesin [65], the E. faecalis endocarditis antigen EfaA has since been found to be present in all E. faecalis isolates [66], and is encoded by the third gene of the efaCBA operon (EF2074–EF2076). A homologue, ScaCBA from Streptococcus gordonii, has been shown to be a Mn²⁺ ABC transporter [67] and is regulated by a DtxR family metalloregulator, ScaR [68]. EfaA expression has recently been shown to be Mn²⁺-dependent and regulated by a DtxR family protein, EfaR (EF1005), suggesting that in E. faecalis this protein is also involved in Mn²⁺ transport [69]. Analysis of the V583 genome shows an additional ten putative EfaR-binding sites, suggesting that manganese availability is potentially an important factor in infection and gene regulation in vivo [69]. Consistent with this, the abundance of efaA mRNA has been shown to increase 2195-fold in stationary phase when grown in urine [47], and disruption of efaA has been shown to result in reduced virulence in a mouse peritonitis model [66]. A homologue of the putative EfaCBA Mn²⁺ transporter is also present on the pathogenicity island.

A further protein potentially involved in in vivo survival is EF0467, a member of the MgtC family, which has been implicated in survival and growth in low Mg²⁺ concentrations. EF0467 has 37% identity to MgtC from Salmonella typhimurium, which has been shown to be required for survival inside macrophages and growth in low-Mg²⁺ medium.

7.3.4 Potential Virulence Factors

7.3.4.1 Hemolysins, Proteases, and other Enzymes

The genome sequence of E. faecalis V583 shows the presence of three putative hemolysins (EF0700, EF0982, EF1685). EF0700 has its closest identity to L64811 (57.2%) from L. lactis, while EF0982 is most similar to Lm01366 from L. monocytogenes. The third putative hemolysin, EF1685, shares 46% identity with hemolysin III from Bacillus cereus [70]. E. faecalis V583 is nonhemolytic on blood agar; however, it is possible that expression of these genes could be induced under in vivo conditions. It is unknown at present whether these genes do indeed encode functional hemolysins. A putative exfoliative toxin A (EF0645) is likewise present in the chromosome. This protein shares 32% identity with Staphylococcus hyicus exfoliative toxin A, and again its function in E. faecalis is unknown.

Also encoded on the chromosome are two putative xanthan lyases, EF0818 and EF3023, which share 36% identity at the protein level. EF0818 and EF3023 have 28–31% identity to the xanthan lyase from Bacillus sp. G11 [71], and 27–29% identity to hyaluronate lyase from S. pneumoniae [72]. Both these enzymes are involved in breaking glycosidic bonds in polysaccharides, although they have different substrate specificities. Hyaluronidases are an important pathogenic bacterial spreading factor, and cleave hyaluronan, which is a constituent of the extracellular matrix of connective tissues [73]. The substrate specificity of these two enzymes in E. faecalis is uncertain.

gelE (EF1818) encodes a secreted zinc metalloprotease, GelE, which has been shown to cleave a number of substrates including casein, hemoglobin, collagen, fibrin, and gelatin, as well as degrading pheromone and inhibitor peptides [74, 75]. GelE has been hypothesized to clear the bacterial cell wall of misfolded proteins, and disruption of gelE has been shown to increase the bacterial chain length [76]. The presence of GelE is essential for the degradation of polymerized fibrin, which has important implications in pathogenesis, and the linkage of GelE to decreased chain length may suggest a role in increasing dissemination of bacteria in high-density environments [76]. A gelE knockout has shown reduced virulence in models of mouse peritonitis [77], rabbit endophthalmitis [78], and in a Caenorhabditis elegans virulence model [79].

Located downstream from gelE is an ORF called sprE (EF1817), coding for a serine protease, and the presence of functional GelE has been implicated in SprE maturation [80]. sprE encodes a 26-kDa protein that shares homology with the S. aureus V8 protease, and has also been shown to contribute to pathogenesis in several infection models, including C. elegans, mouse peritonitis, and rabbit endophthalmitis [78, 79, 81]. Deletion of the serine protease, but not of gelatinase, has been shown to attenuate bacterial infection in the Arabidopsis thaliana root pathogenicity model, with the SprE mutant forming poor bacterial communities [82].

The expression of both gelE and sprE is dependent on the fsr genes, which encode a two-component signal transduction system [81, 83]. The fsr locus (EF1822–EF1820) encodes three proteins, FsrA, FsrB, and FsrC, which share sequence similarity with AgrA, AgrB, and AgrC, respectively, of S. aureus. This locus has been identified as a two-component signal transduction system that is dependent upon a secreted peptide lactone, meaning that regulation is controlled in a cell-density-dependent manner [83–85]. Accumulation of the peptide, encoded at the C terminus of the FsrB protein, is thought to be sensed by the FsrC histidine kinase, and leads to activation of the FsrA response regulator [85]. Hancock and Perego recently showed a role for fsr and GelE in biofilm formation [86].
Interestingly, although gelatinase and serine protease were found to contribute to virulence in a rabbit endophthalmitis model, an fsrB mutant showed much greater attenuation, suggesting that this gene affects the expression of additional traits that contribute to virulence [78]. E. faecalis V583 also contains a second membrane-associated zinc metalloprotease (EF2380), Eep. This protein has been shown to play a role in the production of the sex pheromone cAD1, and is thought to be involved in processing the pheromone precursor structure, or in regulating its expression or secretion [87].

The final secreted virulence factor in E. faecalis is cytolysin, which is a structurally novel toxin. As previously mentioned, this gene locus is present on the pathogenicity island; however, most of this gene locus is absent in E. faecalis V583 due to a 17-kbp deletion. In V583, the genes cylR1, cylR2, cylLL, cylLS, and cylM are present, but the genes cylB, cylA, and cylI, which are essential for mature cytolysin production and self-immunity, are either absent or nonfunctional. Cytolysin is composed of two nonidentical peptide subunits, both of which are required for lytic activity [88], and is capable of lysing erythrocytes, polymorphonuclear leukocytes, and macrophages [89], as well as a broad range of gram-positive bacteria. Recently cytolysin expression has been shown to be induced in response to target cells present in the environment through a quorum-sensing autoinduction mechanism [90]. In virtually all infection models tested, cytolysin has been shown to contribute to pathogenesis [91–94].
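
Several of the comparisons in this section are quoted as percent identity at the protein level (for example, 46% between EF1685 and hemolysin III of B. cereus). The chapter does not state which alignment program or which denominator was used, so the following Python sketch only illustrates one common convention: the number of identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name and the two short aligned sequences are invented for the example and are not taken from the V583 genome.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length and use '-' for gaps. This is an illustrative convention,
    not necessarily the one behind the values quoted in the text.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns with the same residue
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 6 ungapped columns, 5 of them identical.
toy_a = "MKV-LTAG"
toy_b = "MRVALT-G"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which can shift the reported value by several percentage points. Identity values from different sources are therefore directly comparable only when the convention is known.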

7.3.4.2 Cell-Wall-Associated Virulence Factors

In addition to hemolysins and proteases, E. faecalis contains a number of cell-wall-associated proteins that could potentially be involved in pathogenesis. The ability to adhere to host tissues is a critical step in the onset of most microbial infections. Recent examination of the E. faecalis genome has identified the presence of 41 putative cell-wall-anchored proteins, and 17 of these were predicted to contain structural characteristics of the MSCRAMM (microbial surface component recognizing adhesive matrix molecules) type [95]. In addition to this, the E. faecalis genome contains 134 predicted surface-exposed proteins, of which 65 contain repeat regions and may have a role in antigenic variation via a slippage mechanism [8]. The only characterized adhesin from E. faecalis is Ace (EF1099), a 674-amino-acid protein that has similarities with the central region of the A domain of the collagen-binding protein Cna of S. aureus [96]. Ace is thought to be involved in E. faecalis collagen binding, as mutation of Ace reduced the 46 °C growth-elicited binding to immobilized collagen type I, collagen type IV, and mouse laminin [97, 98], and Ace-specific antibody has been shown to inhibit this binding in wild-type E. faecalis [96, 97]. Present in the V583 genome is a putative internalin family protein (EF2686), which has 29% identity to InlA from L. monocytogenes and contains a premature stop codon. The internalin family of proteins is characterized by the presence of leucine-rich repeats. In L. monocytogenes the receptor for InlA is E-cadherin, and this protein is required for cell invasion [99, 100]. The interaction of internalin A with E-cadherin on enterocytes is an early critical step for the onset of listeriosis in vivo [101]. In some strains of Listeria, this protein is truncated, but clinical strains express a full-length protein more frequently [102].
Disruption of a homologue of the internalin family of proteins in group A Streptococcus has been shown to result in reduced virulence and increased susceptibility to phagocytosis [103]. Interestingly, EF2686 in E. faecalis was found to be induced 162-fold in stationary phase when cultured in urine compared to culture in 2YT broth, showing an induction of gene expression in an infection-type environment [47].

Aggregation substance (EF0149, EF0485, EFA0047, EFB0011) is expressed on the cell surface of E. faecalis, allowing close contact for conjugation. This protein has been implicated as a virulence trait, because cells expressing it show increased adherence to and internalization into phagocytes [104, 105], renal cells [106], and epithelia [107–110]. Expression of aggregation substance has been shown to be inducible by human serum in the absence of plasmid-free Enterococcus [106]. Analysis of Asc10, the aggregation substance encoded on pCF10, showed that a domain of this protein was required for aggregation, HT-29 internalization, and maximum levels of lipoteichoic acid binding [107]. Aggregation substance has been shown to be involved in the survival of enterococci within polymorphonuclear leukocytes [104], but does not enhance the urinary tract colonization ability of E. faecalis [111].

The E. faecalis genome encodes a protein (EF1249) which shares 44% identity with FbpA from L. monocytogenes. In L. monocytogenes, this protein has been shown to bind to human fibronectin, and an fbpA mutant was found to have a reduced ability to colonize and survive in a listeriosis mouse model [112]. Inactivation of fbpA also resulted in reduction of protein levels of two other Listeria virulence factors, LLO and InlB, suggesting that FbpA also acts to modulate expression of these two proteins [112]. The function of EF1249 in E. faecalis is at present unknown.

Enterococcal surface protein, Esp [113], is found on the pathogenicity island in MMH594, but is absent from V583 due to the previously mentioned 17-kbp deletion in the pathogenicity island of this isolate. The gene, esp, encodes a large cell-wall-associated protein and contains a variable number of highly conserved 82-amino-acid repeats [113]. Within these repeats is a 13-amino-acid stretch that is identical to a sequence found in the repeats of the Rib and Cα proteins from S. agalactiae [113]. Studies using a mouse model of urinary tract infections have shown that this protein enhances colonization and persistence in urinary bladders [114]. The Esp protein has recently been linked to enhanced biofilm formation, and its contribution was most pronounced in the presence of glucose [115]; however, biofilm production has been shown to occur in the absence of this protein, showing that it is not essential for this process [86, 116]. GelE has also been implicated in biofilm formation [86]. Disruption of GelE or the response regulator required for GelE production in V583 affected biofilm formation, while complementation of GelE restored the wild-type biofilm phenotype [86]. The same study showed that of the 17 two-component regulatory systems present in E.
faecalis V583, only the fsr locus was involved in biofilm production. Biofilm production in E. faecalis has also been linked to the protein BopD (EF0954). This protein has homology to a sugar-binding transcriptional regulator, and has 50% similarity to CcpA (EF1741). The actual role of this regulatory gene is unknown, but the association of enhanced biofilm formation in the presence of glucose and the possible involvement of a sugar-binding transcriptional regulator suggest a linkage to increased biofilm production in E. faecalis in the presence of specific carbohydrates. Contributing to immune cell evasion is the E. faecalis variable capsular polysaccharide (EF2495–EF2485). Expression of capsular polysaccharide from the cpsC-K genes has been shown to contribute to host immune evasion in E. faecalis, with mutants being more susceptible to phagocytic killing [117]. Recent work suggests a limited number of E. faecalis capsule serotypes, 60% of the isolates tested being placed into four serotypes [118]. Also present in E. faecalis V583 is a common cell wall polysaccharide, Epa (enterococcal polysaccharide antigen). An epa knockout has been shown to be unable to translocate across a monolayer of polarized human enterocyte-like T84 cells, and complementation was shown to restore its translocation ability [119]. In the same model, 9 out of 15 E. faecalis isolates were able to translocate, showing strain-specific differences in this phenotype [119].


Other genes that may have a role in immune evasion include homologues of the dlt operon (EF2749–EF2746), responsible for D-alanine incorporation into teichoic acid polymers in the cell wall. Formation of D-alanyl-lipoteichoic acid has been shown to be required for host cell adhesion in L. monocytogenes [120]. D-Alanine incorporation into lipoteichoic acid reduces the negative charge, and therefore electrostatic interactions with cationic peptides are altered. The E. faecalis V583 genome also contains a protein, EF0031, which shares identity with MprF. This protein was first described in S. aureus [121], and confers resistance to human defensins. MprF has been shown to modify phosphatidylglycerol with L-lysine, thereby reducing the negative charge of the membrane [121]. Disruption of the mprF gene in S. aureus led to a 13-fold higher sensitivity to defensin from human neutrophils; however, the action of this protein in E. faecalis has not yet been investigated [121].

7.3.5 Pathogenicity Island of E. faecalis

The E. faecalis pathogenicity island is 153 kbp and encodes 129 predicted ORFs. Present on this element are genes encoding virulence factors, transposases, transcriptional regulators, and other proteins potentially involved in adaptation to diverse environments. Spread across the pathogenicity island are 11 transposases and insertion elements, with homology to ORFs from other low G+C gram-positive organisms such as E. faecium, S. aureus, Listeria innocua, L. monocytogenes, S. pneumoniae, and S. epidermidis. In addition, the 5′ region of the pathogenicity island has extensive nucleotide sequence identity to the pheromone-responsive plasmids pAM373, pCF10, and pAD1 (EF0005–EF0026), suggesting integration of a pheromone-responsive plasmid. The only transfer-related genes present, however, comprise a TraG-like protein (EF0011) and a region with 87% identity at the nucleotide level to the second transfer origin (oriT) identified in pAD1. Within the pathogenicity island are the virulence traits Esp, cytolysin, and aggregation substance. In addition, there are a number of proteins that may have roles in virulence and/or environmental adaptation in E. faecalis. One of these traits is a putative bile salt hydrolase (cbh) (EF0040), which could open a new ecological niche to this E. faecalis strain, perhaps allowing growth in bile-rich environments such as the bile ducts, and therefore potentially giving this strain a competitive advantage. EF0040 has 67% similarity to the Bsh from L. monocytogenes, which is a bile salt hydrolase [122]. In L. monocytogenes, Bsh is a virulence factor and is regulated by the virulence regulator PrfA [122]. In vivo, deletion of this gene results in decreased fecal carriage in guinea pigs, and reduced liver colonization and virulence were noted in a mouse intravenous model [122]. The contribution of cbh to the pathogenesis of enterococcal infections has not yet been investigated.
In terms of adaptation to different environments, in addition to the bile salt hydrolase, the pathogenicity island contains a phosphotransferase system (PTS) (EF0078–EF0081) and adjacent xylose metabolism genes (EF0082–EF0083), potentially allowing growth on xylose. Other metabolism genes include a ketopantoate reductase (EF0037), an ornithine cyclodeaminase (EF0124), a polysaccharide deacetylase (EF0108), and a glycosyl hydrolase (EF0077). These pathways may aid survival in nutrient-limiting conditions, or in diverse environments. Additional mechanisms to cope with nutrient-limiting conditions include a second putative manganese transporter with homology to EfaCBA, and a putative iron transporter (EF0095, EF0096), both of which may aid in survival in vivo.

As previously mentioned, two Gls24-like proteins are present on the pathogenicity island (EF0117 and EF0055). The Gls24 protein has been shown to be induced under certain stress conditions [46], and disruption of gls24 affects virulence in a mouse model [48]. The exact function of Gls24 or its homologues is unknown. Also potentially involved in stress response is a Dps-family surface protein (EF0119), with closest homology to Lactobacillus rhamnosus. Some members of this protein family have been shown to bind nonspecifically to DNA under starved conditions, protecting it from cleavage by reactive oxygen species [123]. In addition, a DNA-J homologue (EF0028) and a putative DNA-damage-inducible protein (EF0032) are present. Potentially these three proteins may contribute to DNA repair and protection, and may have an important role in bacterial survival in harsh environmental conditions.

Encoded on the pathogenicity island are a putative potassium ABC transporter and a neighboring two-component regulatory system (EF0087–EF0091) with homology to the kdp operon from E. coli [124], but with closer identity to putative homologues in L. innocua. This operon may be involved in stress response to osmotic shock, allowing transport of potassium ions. Other potential genes involved in virulence or environmental adaptation include a putative N-acetylmannosamine-6-phosphate epimerase (EF0066), which is part of the N-acetylmannosamine utilization pathway, often found in oropharyngeal pathogens [125].
In addition, a large cell-wall-associated protein is present (EF0109), with a predicted size of 207 kDa. This large protein may have a role in colonization or virulence through adhesion or immune evasion, but its role is at present unknown. Two genes (EF0120 and EF0122) present on the pathogenicity island have homology to IbrB and IbrA, respectively, from E. coli [126]. In E. coli these proteins are responsible for activation of the otherwise silent phage-derived eib genes, whose products bind immunoglobulin [126]. Interestingly, also on the pathogenicity island is a protein, EF0052, with 49% similarity to the nisin-resistance protein from L. lactis subsp. lactis biovar diacetylactis DRC3, which has been demonstrated to be involved in resistance to nisin in this strain [127, 128]. In a study of S. agalactiae virulence using signature-tagged mutagenesis, two attenuated mutants were identified with insertions in a NisR homologue, although the function of this gene has not been characterized [129]. As nisin is a cationic peptide, it is possible that this protein may confer cross-resistance to other cationic peptides such as defensins, or bacteriocins produced by other gut bacteria.


7.4 Conclusions and Future Perspectives

E. faecalis is a leading cause of nosocomial infection and this is probably associated with its ability both to survive in the hospital environment and also to overcome the harsh conditions associated with colonization and infection. In addition, over a quarter of the genome is mobile and acquired DNA, suggesting that this organism has the ability to rapidly acquire genetic information and evolve to adapt to its environment. The expeditious acquisition and dissemination of antimicrobial resistance in this organism has been previously demonstrated. Although several traits that contribute to virulence have been characterized in E. faecalis, little is known about the function of many of the potential virulence genes encoded both on the chromosome and in the pathogenicity island. The bacterial/host relationship is very complex, and the mechanisms that enterococci use to change from commensal organism to the etiologic agent of disease are not yet fully understood. However, with information from the genome sequence it is now possible to look closer at gene loci that may have important roles in this interaction.

References

1 Martin, J. D. and J. O. Mundt. 1972. Enterococci in insects. Appl. Microbiol. 24:575–580.
2 Rice, E. W., J. W. Messer, C. H. Johnson and D. J. Reasoner. 1995. Occurrence of high-level aminoglycoside resistance in environmental isolates of enterococci. Appl. Environ. Microbiol. 61:374–376.
3 Schaberg, D. R., D. H. Culver and R. P. Gaynes. 1991. Major trends in the microbial etiology of nosocomial infection. Am. J. Med. 91:72S–75S.
4 Gordon, K. A. and R. N. Jones. 2003. Susceptibility patterns of orally administered antimicrobials among urinary tract infection pathogens from hospitalized patients in North America: comparison report to Europe and Latin America. Results from the SENTRY Antimicrobial Surveillance Program (2000). Diagn. Microbiol. Infect. Dis. 45:295–301.
5 National Nosocomial Infections Surveillance System. 1999. National Nosocomial Infections Surveillance (NNIS) System report, data summary from January 1990–May 1999, issued June 1999. Am. J. Infect. Control 27:520–532.
6 Low, D. E., N. Keller, A. Barth and R. N. Jones. 2001. Clinical prevalence, antimicrobial susceptibility, and geographic resistance patterns of enterococci: results from the SENTRY Antimicrobial Surveillance Program, 1997–1999. Clin. Infect. Dis. 32:133–145.
7 Sahm, D. F., J. Kissinger, M. S. Gilmore, P. R. Murray, R. Mulder, J. Solliday and B. Clarke. 1989. In vitro susceptibility studies of vancomycin-resistant Enterococcus faecalis. Antimicrob. Agents Chemother. 33:1588–1591.
8 Paulsen, I. T., L. Banerjei, G. S. Myers, K. E. Nelson, R. Seshadri, T. D. Read, D. E. Fouts, J. A. Eisen, S. R. Gill, J. F. Heidelberg et al. 2003. Role of mobile DNA in the evolution of vancomycin-resistant Enterococcus faecalis. Science 299:2071–2074.
9 Shankar, N., A. S. Baghdayan and M. S. Gilmore. 2002. Modulation of virulence within a pathogenicity island in vancomycin-resistant Enterococcus faecalis. Nature 417:746–750.
10 Singh, K. V. and B. E. Murray. 1994. Revised estimates of enterococcal chromosomal sizes. DNA Cell. Biol. 13:1145–1146.
11 Oana, K., Y. Okimura, Y. Kawakami, N. Hayashida, M. Shimosaka, M. Okazaki, T. Hayashi and M. Ohnishi. 2002. Physical and genetic map of Enterococcus faecium ATCC19434 and demonstration of intra- and interspecific genomic diversity in enterococci. FEMS Microbiol. Lett. 207:133–139.
12 Hayashi, T., K. Makino, M. Ohnishi, K. Kurokawa, K. Ishii, K. Yokoyama, C. G. Han, E. Ohtsubo, K. Nakayama, T. Murata et al. 2001. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 8:11–22.
13 Nelson, K. E., D. E. Fouts, E. F. Mongodin, J. Ravel, R. T. DeBoy, J. F. Kolonay, D. A. Rasko, S. V. Angiuoli, S. R. Gill, I. T. Paulsen et al. 2004. Whole genome comparisons of serotype 4b and 1/2a strains of the food-borne pathogen Listeria monocytogenes reveal new insights into the core genome components of this species. Nucleic Acids Res. 32:2386–2395.
14 Novak, R., E. Charpentier, J. S. Braun and E. Tuomanen. 2000. Signal transduction by a death signal peptide: uncovering the mechanism of bacterial killing by penicillin. Mol. Cell. 5:49–57.
15 Robertson, G. T., J. Zhao, B. V. Desai, W. H. Coleman, T. I. Nicas, R. Gilmour, L. Grinius, D. A. Morrison and M. E. Winkler. 2002. Vancomycin tolerance induced by erythromycin but not by loss of vncRS, vex3, or pep27 function in Streptococcus pneumoniae. J. Bacteriol. 184:6987–7000.
16 Haas, W., J. Sublett, D. Kaushal and E. I. Tuomanen. 2004. Revising the role of the pneumococcal vex-vncRS locus in vancomycin tolerance. J. Bacteriol. 186:8463–8471.
17 Bensing, B. A., I. R. Siboo and P. M. Sullam. 2001. Proteins PblA and PblB of Streptococcus mitis, which promote binding to human platelets, are encoded within a lysogenic bacteriophage. Infect. Immun. 69:6186–6192.
18 Smoot, J. C., K. D. Barbian, J. J. Van Gompel, L. M. Smoot, M. S. Chaussee, G. L. Sylva, D. E. Sturdevant, S. M. Ricklefs, S. F. Porcella, L. D. Parkins et al. 2002. Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proc. Natl. Acad. Sci. U. S. A. 99:4668–4673.
19 Kozitskaya, S., S. H. Cho, K. Dietrich, R. Marre, K. Naber and W. Ziebuhr. 2004. The bacterial insertion sequence element IS256 occurs preferentially in nosocomial Staphylococcus epidermidis isolates: association with biofilm formation and resistance to aminoglycosides. Infect. Immun. 72:1210–1215.
20 Kiem, S., W. S. Oh, K. R. Peck, N. Y. Lee, J. Y. Lee, J. H. Song, E. S. Hwang, E. C. Kim, C. Y. Cha and K. W. Choe. 2004. Phase variation of biofilm formation in Staphylococcus aureus by IS256 insertion and its impact on the capacity adhering to polyurethane surface. J. Korean Med. Sci. 19:779–782.
21 Conlon, K. M., H. Humphreys and J. P. O’Gara. 2004. Inactivations of rsbU and sarA by IS256 represent novel mechanisms of biofilm phenotypic variation in Staphylococcus epidermidis. J. Bacteriol. 186:6208–6219.
22 Chain, P. S., E. Carniel, F. W. Larimer, J. Lamerdin, P. O. Stoutland, W. M. Regala, A. M. Georgescu, L. M. Vergez, M. L. Land, V. L. Motin et al. 2004. Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc. Natl. Acad. Sci. U. S. A. 101:13826–13831.
23 Francia, M. V., W. Haas, R. Wirth, E. Samberger, A. Muscholl-Silberhorn, M. S. Gilmore, Y. Ike, K. E. Weaver, F. Y. An and D. B. Clewell. 2001. Completion of the nucleotide sequence of the Enterococcus faecalis conjugative virulence plasmid pAD1 and identification of a second transfer origin. Plasmid 46:117–127.
24 De Boever, E. H., D. B. Clewell and C. M. Fraser. 2000. Enterococcus faecalis conjugative plasmid pAM373: complete nucleotide sequence and genetic analyses of sex pheromone response. Mol. Microbiol. 37:1327–1341.


25 Dunny, G. M. and B. A. Leonard. 1997. Cell-cell communication in gram-positive bacteria. Annu. Rev. Microbiol. 51:527–564.
26 Evers, S., D. F. Sahm and P. Courvalin. 1993. The vanB gene of vancomycin-resistant Enterococcus faecalis V583 is structurally related to genes encoding D-Ala:D-Ala ligases and glycopeptide-resistance proteins VanA and VanC. Gene 124:143–144.
27 Garnier, F., S. Taourit, P. Glaser, P. Courvalin and M. Galimand. 2000. Characterization of transposon Tn1549, conferring VanB-type resistance in Enterococcus spp. Microbiology 146:1481–1489.
28 Glaser, P., L. Frangeul, C. Buchrieser, C. Rusniok, A. Amend, F. Baquero, P. Berche, H. Bloecker, P. Brandt, T. Chakraborty et al. 2001. Comparative genomics of Listeria species. Science 294:849–852.
29 Bolotin, A., P. Wincker, S. Mauger, O. Jaillon, K. Malarme, J. Weissenbach, S. D. Ehrlich and A. Sorokin. 2001. The complete genome sequence of the lactic acid bacterium Lactococcus lactis ssp. lactis IL1403. Genome Res. 11:731–753.
30 Ferretti, J. J., W. M. McShan, D. Ajdic, D. J. Savic, G. Savic, K. Lyon, C. Primeaux, S. Sezate, A. N. Suvorov, S. Kenton et al. 2001. Complete genome sequence of an M1 strain of Streptococcus pyogenes. Proc. Natl. Acad. Sci. U. S. A. 98:4658–4663.
31 Borezee, E., T. Msadek, L. Durant and P. Berche. 2000. Identification in Listeria monocytogenes of MecA, a homologue of the Bacillus subtilis competence regulatory protein. J. Bacteriol. 182:5931–5934.
32 Jones, C. H., T. C. Bolken, K. F. Jones, G. O. Zeller and D. E. Hruby. 2001. Conserved DegP protease in gram-positive bacteria is essential for thermal and oxidative tolerance and full virulence in Streptococcus pyogenes. Infect. Immun. 69:5538–5545.
33 Brenot, A., K. Y. King, B. Janowiak, O. Griffith and M. G. Caparon. 2004. Contribution of glutathione peroxidase to the virulence of Streptococcus pyogenes. Infect. Immun. 72:408–413.
34 Halsey, T. A., A. Vazquez-Torres, D. J. Gravdahl, F. C. Fang and S. J. Libby. 2004. The ferritin-like Dps protein is required for Salmonella enterica serovar Typhimurium oxidative stress resistance and virulence. Infect. Immun. 72:1155–1158.
35 Brown, J. S., S. M. Gilliland, S. Basavanna and D. W. Holden. 2004. phgABC, a three-gene operon required for growth of Streptococcus pneumoniae in hyperosmotic medium and in vivo. Infect. Immun. 72:4579–4588.
36 Casiano-Colon, A. and R. E. Marquis. 1988. Role of the arginine deiminase system in protecting oral bacteria and an enzymatic basis for acid tolerance. Appl. Environ. Microbiol. 54:1318–1324.
37 Sukharev, S. I., P. Blount, B. Martinac and C. Kung. 1997. Mechanosensitive channels of Escherichia coli: the MscL gene, protein, and activities. Annu. Rev. Physiol. 59:633–657.
38 Agranoff, D. D. and S. Krishna. 1998. Metal ion homeostasis and intracellular parasitism. Mol. Microbiol. 28:403–412.
39 Laport, M. S., J. A. Lemos, C. Bastos Md Mdo, R. A. Burne and M. Giambiagi-De Marval. 2004. Transcriptional analysis of the groE and dnaK heat-shock operons of Enterococcus faecalis. Res. Microbiol. 155:252–258.
40 Derre, I., G. Rapoport and T. Msadek. 1999. CtsR, a novel regulator of stress and heat shock response, controls clp and molecular chaperone gene expression in gram-positive bacteria. Mol. Microbiol. 31:117–131.
41 Krüger, E., E. Witt, S. Ohlmeier, R. Hanschke and M. Hecker. 2000. The clp proteases of Bacillus subtilis are directly involved in degradation of misfolded proteins. J. Bacteriol. 182:3259–3265.
42 Chastanet, A., I. Derre, S. Nair and T. Msadek. 2004. clpB, a novel member of the Listeria monocytogenes CtsR regulon, is involved in virulence but not in general stress tolerance. J. Bacteriol. 186:1165–1174.
43 Diaz-Torres, M. L. and R. R. Russell. 2001. HtrA protease and processing of extracellular proteins of Streptococcus mutans. FEMS Microbiol. Lett. 204:23–28.
44 Lyon, W. R. and M. G. Caparon. 2004. Role for serine protease HtrA (DegP) of Streptococcus pyogenes in the biogenesis of virulence factors SpeB and the hemolysin streptolysin S. Infect. Immun. 72:1618–1625.
45 Ibrahim, Y. M., A. R. Kerr, J. McCluskey and T. J. Mitchell. 2004. Role of HtrA in the virulence and competence of Streptococcus pneumoniae. Infect. Immun. 72:3584–3591.
46 Giard, J. C., A. Rincé, H. Capiaux, Y. Auffray and A. Hartke. 2000. Inactivation of the stress- and starvation-inducible gls24 operon has a pleiotrophic effect on cell morphology, stress sensitivity, and gene expression in Enterococcus faecalis. J. Bacteriol. 182:4512–4520.
47 Shepard, B. D. and M. S. Gilmore. 2002. Differential expression of virulence-related genes in Enterococcus faecalis in response to biological cues in serum and urine. Infect. Immun. 70:4344–4352.
48 Teng, F., E. C. Nannini and B. E. Murray. 2005. Importance of gls24 in virulence and stress response of Enterococcus faecalis and use of the Gls24 protein as a possible immunotherapy target. J. Infect. Dis. 191:472–480.
49 Breton, Y. L., A. Maze, A. Hartke, S. Lemarinier, Y. Auffray and A. Rincé. 2002. Isolation and characterization of bile salts-sensitive mutants of Enterococcus faecalis. Curr. Microbiol. 45:434–439.
50 Sleator, R. D., H. H. Wemekamp-Kamhuis, C. G. M. Gahan, T. Abee and C. Hill. 2005. A PrfA-regulated bile extrusion system (BilE) is a novel virulence factor in Listeria monocytogenes. Mol. Microbiol. 55:1183–1195.
51 Huycke, M. M. 2002. Physiology of enterococci. In: The Enterococci: Pathogenesis, Molecular Biology, and Antibiotic Resistance. M. S. Gilmore, D. B. Clewell, P. Courvalin, G. M. Dunny, B. E. Murray, and L. B. Rice, editors. ASM Press, Washington, DC, 133–175.
52 Rincé, A., J. C. Giard, V. Pichereau, S. Flahaut and Y. Auffray. 2001. Identification and characterization of gsp65, an organic hydroperoxide resistance (ohr) gene encoding a general stress protein in Enterococcus faecalis. J. Bacteriol. 183:1482–1488.
53 Frankenberg, L., M. Brugna and L. Hederstedt. 2002. Enterococcus faecalis heme-dependent catalase. J. Bacteriol. 184:6351–6356.
54 Huycke, M. M., V. Abrams and D. R. Moore. 2002. Enterococcus faecalis produces extracellular superoxide and hydrogen peroxide that damages colonic epithelial cell DNA. Carcinogenesis 23:529–536.
55 Huycke, M. M., D. Moore, W. Joyce, P. Wise, L. Shepard, Y. Kotake and M. S. Gilmore. 2001. Extracellular superoxide production by Enterococcus faecalis requires demethylmenaquinone and is attenuated by functional terminal quinol oxidases. Mol. Microbiol. 42:729–740.
56 Huycke, M. M., W. Joyce and M. F. Wack. 1996. Augmented production of extracellular superoxide by blood isolates of Enterococcus faecalis. J. Infect. Dis. 173:743–746.
57 Stock, A. M., V. L. Robinson and P. N. Goudreau. 2000. Two-component signal transduction. Annu. Rev. Biochem. 69:183–215.
58 Hancock, L. E. and M. Perego. 2004. Systematic inactivation and phenotypic characterization of two-component signal transduction systems of Enterococcus faecalis V583. J. Bacteriol. 186:7951–7958.
59 Comenge, Y., R. Quintiliani, Jr., L. Li, L. Dubost, J. P. Brouard, J. E. Hugonnet and M. Arthur. 2003. The CroRS two-component regulatory system is required for intrinsic beta-lactam resistance in Enterococcus faecalis. J. Bacteriol. 185:7184–7192.
60 Le Breton, Y., G. Boel, A. Benachour, H. Prevost, Y. Auffray and A. Rincé. 2003. Molecular characterization of Enterococcus faecalis two-component signal transduction pathways related to environmental stresses. Environ. Microbiol. 5:329–337.
61 Teng, F., L. Wang, K. V. Singh, B. E. Murray and G. M. Weinstock. 2002. Involvement of PhoP-PhoS homologs in Enterococcus faecalis virulence. Infect. Immun. 70:1991–1996.
62 Verneuil, N., M. Sanguinetti, Y. Le Breton, B. Posteraro, G. Fadda, Y. Auffray, A. Hartke and J. C. Giard. 2004. Effects of the Enterococcus faecalis hypR gene encoding a new transcriptional regulator on oxidative stress response and intracellular survival within macrophages. Infect. Immun. 72:4424–4431.
63 Schauder, S. and B. L. Bassler. 2001. The language of bacteria. Genes Dev. 15:1468–1480.
64 Marouni, M. J. and S. Sela. 2003. The luxS gene of Streptococcus pyogenes regulates expression of genes that affect internalization by epithelial cells. Infect. Immun. 71:5633–5639.
65 Lowe, A. M., P. A. Lambert and A. W. Smith. 1995. Cloning of an Enterococcus faecalis endocarditis antigen: homology with adhesins from some oral streptococci. Infect. Immun. 63:703–706.
66 Singh, K. V., T. M. Coque, G. M. Weinstock and B. E. Murray. 1998. In vivo testing of an Enterococcus faecalis efaA mutant and use of efaA homologs for species identification. FEMS Immunol. Med. Microbiol. 21:323–331.
67 Kolenbrander, P. E., R. N. Andersen, R. A. Baker and H. F. Jenkinson. 1998. The adhesion-associated sca operon in Streptococcus gordonii encodes an inducible high-affinity ABC transporter for Mn2+ uptake. J. Bacteriol. 180:290–295.
68 Jakubovics, N. S., A. W. Smith and H. F. Jenkinson. 2000. Expression of the virulence-related Sca (Mn2+) permease in Streptococcus gordonii is regulated by a diphtheria toxin metallorepressor-like protein ScaR. Mol. Microbiol. 38:140–153.
69 Low, Y. L., N. S. Jakubovics, J. C. Flatman, H. F. Jenkinson and A. W. Smith. 2003. Manganese-dependent regulation of the endocarditis-associated virulence factor EfaA of Enterococcus faecalis. J. Med. Microbiol. 52:113–119.
70 Baida, G. E. and N. P. Kuzmin. 1995. Cloning and primary structure of a new hemolysin gene from Bacillus cereus. Biochim. Biophys. Acta. 1264:151–154.
71 Hashimoto, W., H. Nankai, B. Mikami and K. Murata. 2003. Crystal structure of Bacillus sp. GL1 xanthan lyase, which acts on the side chains of xanthan. J. Biol. Chem. 278:7663–7673.
72 Ponnuraj, K. and M. J. Jedrzejas. 2000. Mechanism of hyaluronan binding and degradation: structure of Streptococcus pneumoniae hyaluronate lyase in complex with hyaluronic acid disaccharide at 1.7 Å resolution. J. Mol. Biol. 299:885–895.
73 Kreil, G. 1995. Hyaluronidases – a group of neglected enzymes. Protein Sci. 4:1666–1669.
74 Makinen, P. L., D. B. Clewell, F. An and K. K. Makinen. 1989. Purification and substrate specificity of a strongly hydrophobic extracellular metalloendopeptidase (“gelatinase”) from Streptococcus faecalis (strain 0G1–10). J. Biol. Chem. 264:3325–3334.
75 Bleiweis, A. S. and L. N. Zimmerman. 1964. Properties of proteinase from Streptococcus faecalis Var. liquefaciens. J. Bacteriol. 88:653–659.
76 Waters, C. M., M. H. Antiporta, B. E. Murray and G. M. Dunny. 2003. Role of the Enterococcus faecalis GelE protease in determination of cellular chain length, supernatant pheromone levels, and degradation of fibrin and misfolded surface proteins. J. Bacteriol. 185:3613–3623.
77 Singh, K. V., X. Qin, G. M. Weinstock and B. E. Murray. 1998. Generation and testing of mutants of Enterococcus faecalis in a mouse peritonitis model. J. Infect. Dis. 178:1416–1420.
78 Engelbert, M., E. Mylonakis, F. M. Ausubel, S. B. Calderwood and M. S. Gilmore. 2004. Contribution of gelatinase, serine protease, and fsr to the pathogenesis of Enterococcus faecalis endophthalmitis. Infect. Immun. 72:3628–3633.
79 Sifri, C. D., E. Mylonakis, K. V. Singh, X. Qin, D. A. Garsin, B. E. Murray, F. M. Ausubel and S. B. Calderwood. 2002. Virulence effect of Enterococcus faecalis protease genes and the quorum-sensing locus fsr in Caenorhabditis elegans and mice. Infect. Immun. 70:5647–5650.
80 Kawalec, M., J. Potempa, J. L. Moon, J. Travis and B. E. Murray. 2005. Molecular diversity of a putative virulence factor: purification and characterization of isoforms of an extracellular serine glutamyl endopeptidase of Enterococcus faecalis with different enzymatic activities. J. Bacteriol. 187:266–275.
81 Qin, X., K. V. Singh, G. M. Weinstock and B. E. Murray. 2000. Effects of Enterococcus faecalis fsr genes on production of gelatinase and a serine protease and virulence. Infect. Immun. 68:2579–2586.
82 Jha, A. K., H. P. Bais and J. M. Vivanco. 2005. Enterococcus faecalis mammalian virulence-related factors exhibit potent pathogenicity in the Arabidopsis thaliana plant model. Infect. Immun. 73:464–475.
83 Qin, X., K. V. Singh, G. M. Weinstock and B. E. Murray. 2001. Characterization of fsr, a regulator controlling expression of gelatinase and serine protease in Enterococcus faecalis OG1RF. J. Bacteriol. 183:3372–3382.
84 Nakayama, J., Y. Cao, T. Horii, S. Sakuda, A. D. Akkermans, W. M. de Vos and H. Nagasawa. 2001. Gelatinase biosynthesis-activating pheromone: a peptide lactone that mediates a quorum sensing in Enterococcus faecalis. Mol. Microbiol. 41:145–154.
85 Gilmore, M. S., P. S. Coburn, S. R. Nallapareddy and B. E. Murray. 2002. Enterococcal virulence. In: The Enterococci: Pathogenesis, Molecular Biology, and Antibiotic Resistance. M. S. Gilmore, D. B. Clewell, P. Courvalin, G. M. Dunny, B. E. Murray, and L. B. Rice, editors. ASM Press, Washington, DC, 301–354.
86 Hancock, L. E. and M. Perego. 2004. The Enterococcus faecalis fsr two-component system controls biofilm development through production of gelatinase. J. Bacteriol. 186:5629–5639.
87 An, F. Y., M. C. Sulavik and D. B. Clewell. 1999. Identification and characterization of a determinant (eep) on the Enterococcus faecalis chromosome that is involved in production of the peptide sex pheromone cAD1. J. Bacteriol. 181:5915–5921.
88 Gilmore, M. S., R. A. Segarra, M. C. Booth, C. P. Bogie, L. R. Hall and D. B. Clewell. 1994. Genetic structure of the Enterococcus faecalis plasmid pAD1-encoded cytolytic toxin system and its relationship to lantibiotic determinants. J. Bacteriol. 176:7335–7344.
89 Miyazaki, S., A. Ohno, I. Kobayashi, T. Uji, K. Yamaguchi and S. Goto. 1993. Cytotoxic effect of hemolytic culture supernatant from Enterococcus faecalis on mouse polymorphonuclear neutrophils and macrophages. Microbiol. Immunol. 37:265–270.
90 Coburn, P. S., C. M. Pillar, B. D. Jett, W. Haas and M. S. Gilmore. 2004. Enterococcus faecalis senses target cells and in response expresses cytolysin. Science 306:2270–2272.
91 Ike, Y., H. Hashimoto and D. B. Clewell. 1984. Hemolysin of Streptococcus faecalis subspecies zymogenes contributes to virulence in mice. Infect. Immun. 45:528–530.
92 Chow, J. W., L. A. Thal, M. B. Perri, J. A. Vazquez, S. M. Donabedian, D. B. Clewell and M. J. Zervos. 1993. Plasmid-associated hemolysin and aggregation substance production contribute to virulence in experimental enterococcal endocarditis. Antimicrob. Agents Chemother. 37:2474–2477.
93 Jett, B. D., H. G. Jensen, R. E. Nordquist and M. S. Gilmore. 1992. Contribution of the pAD1-encoded cytolysin to the severity of experimental Enterococcus faecalis endophthalmitis. Infect. Immun. 60:2445–2452.
94 Garsin, D. A., C. D. Sifri, E. Mylonakis, X. Qin, K. V. Singh, B. E. Murray, S. B. Calderwood and F. M. Ausubel. 2001. A simple model host for identifying Gram-positive virulence factors. Proc. Natl. Acad. Sci. U. S. A. 98:10892–10897.
95 Sillanpaa, J., Y. Xu, S. R. Nallapareddy, B. E. Murray and M. Hook. 2004. A family of putative MSCRAMMs from Enterococcus faecalis. Microbiology 150:2069–2078.
96 Rich, R. L., B. Kreikemeyer, R. T. Owens, S. LaBrenz, S. V. Narayana, G. M. Weinstock, B. E. Murray and M. Hook. 1999. Ace is a collagen-binding MSCRAMM from Enterococcus faecalis. J. Biol. Chem. 274:26939–26945.

145

146

7 Pathogenomics of Enterococcus faecalis 97 Nallapareddy, S. R., X. Qin, G. M. Wein-

98

99

100

101

102

103

104

stock, M. Hook and B. E. Murray. 2000. Enterococcus faecalis adhesin, ace, mediates attachment to extracellular matrix proteins collagen type IV and laminin as well as collagen type I. Infect. Immun. 68:5218–5224. Nallapareddy, S. R., K. V. Singh, R. W. Duh, G. M. Weinstock and B. E. Murray. 2000. Diversity of ace, a gene encoding a microbial surface component recognizing adhesive matrix molecules, from different strains of Enterococcus faecalis and evidence for production of ace during human infections. Infect. Immun. 68:5210–5217. Gaillard, J. L., P. Berche, C. Frehel, E. Gouin and P. Cossart. 1991. Entry of Listeria monocytogenes into cells is mediated by internalin, a repeat protein reminiscent of surface antigens from gram-positive cocci. Cell 65:1127–1141. Mengaud, J., H. Ohayon, P. Gounon, R. M. Mege and P. Cossart. 1996. E-cadherin is the receptor for internalin, a surface protein required for entry of L. monocytogenes into epithelial cells. Cell 84:923–932. Lecuit, M., S. Vandormael-Pournin, J. Lefort, M. Huerre, P. Gounon, C. Dupuy, C. Babinet and P. Cossart. 2001. A transgenic model for listeriosis: role of internalin in crossing the intestinal barrier. Science 292:1722–1725. Jacquet, C., M. Doumith, J. I. Gordon, P. M. Martin, P. Cossart and M. Lecuit. 2004. A molecular marker for evaluating the pathogenic potential of foodborne Listeria monocytogenes. J. Infect. Dis. 189:2094–2100. Reid, S. D., A. G. Montgomery, J. M. Voyich, F. R. DeLeo, B. Lei, R. M. Ireland, N. M. Green, M. Liu, S. Lukomski and J. M. Musser. 2003. Characterization of an extracellular virulence factor made by group A Streptococcus with homology to the Listeria monocytogenes internalin family of proteins. Infect. Immun. 71:7043–7052. Rakita, R. M., N. N. Vanek, K. JacquesPalaz, M. Mee, M. M. Mariscalco, G. M. Dunny, M. Snuggs, W. B. Van Winkle and S. I. Simon. 1999. Enterococcus faecalis bearing aggregation substance is

105

106

107

108

109

110

111

resistant to killing by human neutrophils despite phagocytosis and neutrophil activation. Infect. Immun. 67:6067–6075. Vanek, N. N., S. I. Simon, K. JacquesPalaz, M. M. Mariscalco, G. M. Dunny and R. M. Rakita. 1999. Enterococcus faecalis aggregation substance promotes opsonin-independent binding to human neutrophils via a complement receptor type 3-mediated mechanism. FEMS Immunol. Med. Microbiol. 26:49–60. Kreft, B., R. Marre, U. Schramm and R. Wirth. 1992. Aggregation substance of Enterococcus faecalis mediates adhesion to cultured renal tubular cells. Infect. Immun. 60:25–30. Waters, C. M., H. Hirt, J. K. McCormick, P. M. Schlievert, C. L. Wells and G. M. Dunny. 2004. An amino-terminal domain of Enterococcus faecalis aggregation substance is required for aggregation, bacterial internalization by epithelial cells and binding to lipoteichoic acid. Mol. Microbiol. 52:1159–1171. Wells, C. L., E. A. Moore, J. A. Hoag, H. Hirt, G. M. Dunny and S. L. Erlandsen. 2000. Inducible expression of Enterococcus faecalis aggregation substance surface protein facilitates bacterial internalization by cultured enterocytes. Infect. Immun. 68:7190–7194. Isenmann, R., M. Schwarz, E. Rozdzinski, R. Marre and H. G. Beger. 2000. Aggregation substance promotes colonic mucosal invasion of Enterococcus faecalis in an ex vivo model. J. Surg. Res. 89:132–138. Olmsted, S. B., G. M. Dunny, S. L. Erlandsen and C. L. Wells. 1994. A plasmid-encoded surface protein on Enterococcus faecalis augments its internalization by cultured intestinal epithelial cells. J. Infect. Dis. 170:1549–1556. Johnson, J. R., C. Clabots, H. Hirt, C. Waters and G. Dunny. 2004. Enterococcal aggregation substance and binding substance are not major contributors to urinary tract colonization by Enterococcus faecalis in a mouse model of ascending unobstructed urinary tract infection. Infect. Immun. 72:2445– 2448.

References 112 Dramsi, S., F. Bourdichon, D. Cabanes,

121 Peschel, A., R. W. Jack, M. Otto,

M. Lecuit, H. Fsihi and P. Cossart. 2004. FbpA, a novel multifunctional Listeria monocytogenes virulence factor. Mol. Microbiol. 53:639–649. 113 Shankar, V., A. S. Baghdayan, M. M. Huycke, G. Lindahl and M. S. Gilmore. 1999. Infection-derived Enterococcus faecalis strains are enriched in esp, a gene encoding a novel surface protein. Infect. Immun. 67:193–200. 114 Shankar, N., C. V. Lockatell, A. S. Baghdayan, C. Drachenberg, M. S. Gilmore and D. E. Johnson. 2001. Role of Enterococcus faecalis surface protein Esp in the pathogenesis of ascending urinary tract infection. Infect. Immun. 69:4366– 4372. 115 Tendolkar, P. M., A. S. Baghdayan, M. S. Gilmore and N. Shankar. 2004. Enterococcal surface protein, Esp, enhances biofilm formation by Enterococcus faecalis. Infect. Immun. 72:6032–6039. 116 Kristich, C. J., Y. H. Li, D. G. Cvitkovitch and G. M. Dunny. 2004. Esp-independent biofilm formation by Enterococcus faecalis. J. Bacteriol. 186:154–163. 117 Hancock, L. E. and M. S. Gilmore. 2002. The capsular polysaccharide of Enterococcus faecalis and its relationship to other polysaccharides in the cell wall. Proc. Natl. Acad. Sci. U. S. A. 99:1574– 1579. 118 Hufnagel, M., L. E. Hancock, S. Koch, C. Theilacker, M. S. Gilmore and J. Huebner. 2004. Serological and genetic diversity of capsular polysaccharides in Enterococcus faecalis. J. Clin. Microbiol. 42:2548–2557. 119 Zeng, J., F. Teng, G. M. Weinstock and B. E. Murray. 2004. Translocation of Enterococcus faecalis strains across a monolayer of polarized human enterocyte-like T84 cells. J. Clin. Microbiol. 42:1149–1154. 120 Abachin, E., C. Poyart, E. Pellegrini, E. Milohanic, F. Fiedler, P. Berche and P. Trieu-Cuot. 2002. Formation of D-alanyl-lipoteichoic acid is required for adhesion and virulence of Listeria monocytogenes. Mol. Microbiol. 43:1–14.

L. V. Collins, P. Staubitz, G. Nicholson, H. Kalbacher, W. F. Nieuwenhuizen, G. Jung, A. Tarkowski et al. 2001. Staphylococcus aureus resistance to human defensins and evasion of neutrophil killing via the novel virulence factor MprF is based on modification of membrane lipids with 1-lysine. J. Exp. Med. 193:1067–1076. 122 Dussurget, O., D. Cabanes, P. Dehoux, M. Lecuit, C. Buchrieser, P. Glaser and P. Cossart. 2002. Listeria monocytogenes bile salt hydrolase is a PrfA-regulated virulence factor involved in the intestinal and hepatic phases of listeriosis. Mol. Microbiol. 45:1095–1106. 123 Martinez, A. and R. Kolter. 1997. Protection of DNA during oxidative stress by the nonspecific DNA-binding protein Dps. J. Bacteriol. 179:5188–5194. 124 Altendorf, K., and W. Epstein. 1996. The Kdp-ATPase of Escherichia coli. In: Biomembranes. A. G. Lee, editor, vol. 5. JAI Press, Greenwich, Conn., 403–420. 125 Vimr, E. and C. Lichtensteiger. 2002. To sialylate, or not to sialylate: that is the question. Trends Microbiol. 10:254–257. 126 Sandt, C. H., J. E. Hopper and C. W. Hill. 2002. Activation of prophage eib genes for immunoglobulin-binding proteins by genes from the IbrAB genetic island of Escherichia coli ECOR-9. J. Bacteriol. 184:3640–3648. 127 Froseth, B. R., R. E. Herman and L. L. McKay. 1988. Cloning of nisin resistance determinant and replication origin on 7.6-kilobase EcoRI fragment of pNP40 from Streptococcus lactis subsp. diacetylactis DRC3. Appl. Environ. Microbiol. 54:2136–2139. 128 Froseth, B. R. and L. L. McKay. 1991. Molecular characterization of the nisin resistance region of Lactococcus lactis subsp. lactis biovar diacetylactis DRC3. Appl. Environ. Microbiol. 57:804–811. 129 Jones, A. L., K. M. Knoll and C. E. Rubens. 2000. Identification of Streptococcus agalactiae virulence genes in the neonatal rat sepsis model using signature-tagged mutagenesis. Mol. Microbiol. 37:1444–1455.

147

149

8 Genomics of Streptococci

Joseph J. Ferretti and W. Michael McShan

Research directed against a specific medical problem has resulted in contributions to fundamental biological knowledge. The most dramatic example of this is the discovery that deoxyribonucleic acid (DNA) is the substance that transmits genetic information. This initial discovery of the phenomenon known as “the transformation of pneumococcal types” led to the identification of the transforming substance as DNA.

Maclyn McCarty, 1911–2005
The Transforming Principle: Discovering that Genes are made of DNA

8.1 Introduction

The genus Streptococcus is made up of a group of low-percentage G+C gram-positive bacteria that includes nearly forty species. The individual species are pathogens or commensals of either humans or animals, such as cows, horses, and pigs. Although a species of streptococci may preferentially colonize a specific host, a number of species can infect alternate hosts when the opportunity arises. These organisms are found mainly on the mucosal surfaces of the upper respiratory tract or oral cavity, and may also be found in the intestinal tract or on the skin. Because of the great medical, veterinary, and industrial interest in these organisms, they were early targets for genome sequencing. In the last several years, the genome sequences of 14 different strains of streptococci have been completed, with another 14 in various stages of completion (Tables 8.1, 8.2).

Tab. 8.1 Complete sequences [a]

Organism | Size (Mbp) | GenBank | RefSeq | Center
Streptococcus agalactiae 2603V/R | 2.16 | AE009948 | NC_004116 | TIGR
Streptococcus agalactiae NEM316 | 2.21 | AL732656 | NC_004368 | Institut Pasteur
Streptococcus mutans UA159 | 2.03 | AE014133 | NC_004350 | University of Oklahoma
Streptococcus pneumoniae R6 | 2.04 | AE007317 | NC_003098 | Eli Lilly Company
Streptococcus pneumoniae TIGR4 | 2.16 | AE005672 | NC_003028 | TIGR
Streptococcus pyogenes SF370 | 1.85 | AE004092 | NC_002737 | University of Oklahoma
Streptococcus pyogenes MGAS10394 | 1.90 | CP000003 | NC_006086 | NIH-RML
Streptococcus pyogenes MGAS315 | 1.90 | AE014074 | NC_004070 | NIH-RML
Streptococcus pyogenes MGAS8232 | 1.90 | AE009949 | NC_003485 | NIH-RML
Streptococcus pyogenes SSI-1 | 1.89 | BA000034 | NC_004606 | Osaka University
Streptococcus thermophilus CNRZ1066 | 1.80 | CP000024 | NC_006449 | INRA-FR
Streptococcus thermophilus LMG 18311 | 1.80 | CP000023 | NC_006448 | Catholic University of Louvain

NIH-RML, National Institutes of Health – Rocky Mountain Laboratories; INRA-FR, Institut National de la Recherche Agronomique, France.
[a] http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

Tab. 8.2 Incomplete sequences

Organism | Web site
Streptococcus agalactiae A909 | http://www.tigr.org/tdb/mdb/mdbinprogress.html
Streptococcus equi | http://www.sanger.ac.uk/Projects/S_equi/
Streptococcus gordonii str. Challis | http://www.tigr.org/tdb/mdb/mdbinprogress.html
Streptococcus mitis NCTC 12261 | http://www.tigr.org/tdb/mdb/mdbinprogress.html
Streptococcus pneumoniae 23F | http://www.sanger.ac.uk/Projects/S_pneumoniae/
Streptococcus pneumoniae 670-6B P | http://www.tigr.org/tdb/mdb/mdbinprogress.html
Streptococcus pneumoniae 23F | http://www.sanger.ac.uk/Projects/S_pneumoniae/
Streptococcus pyogenes Manfredo | http://www.sanger.ac.uk/Projects/S_pyogenes/
Streptococcus pyogenes NZ131 (M49) | http://www.genome.ou.edu/
Streptococcus sanguis strain SK36 | http://www.sanguis.mic.vcu.edu/
Streptococcus sobrinus 6715 | http://www.tigr.org/tdb/mdb/mdbinprogress.html
Streptococcus suis 89/1591 | http://genome.jgi-psf.org/microbial/index.html
Streptococcus suis P1/7 | http://www.sanger.ac.uk/Projects/S_suis/
Streptococcus thermophilus LMD-9 | http://genome.jgi-psf.org/microbial/index.html
Streptococcus uberis 0140J | http://www.sanger.ac.uk/Projects/S_uberis/
Streptococcus zooepidemicus | http://www.sanger.ac.uk/Projects/S_zooepidemicus/

Comparative microbial genomics began with the evaluation of differences and similarities in distantly related species, including an analysis of the minimal gene set essential to sustaining a functional microbe in accord with its environmental niche [1]. With the completion of many new genomes, however, inter- and intraspecies comparisons of genomes became possible. Such is the case with the streptococci, whose genome sequences now cover 11 different species, with multiple strains of several species. This chapter will provide an overview of the present state of genome biology of each of the species, followed by a section on bacteriophages unique to the streptococci. These organisms were serologically differentiated from each other on the basis of the major carbohydrate found in the cell wall by Rebecca Lancefield [2], and hence the organisms were assigned to serological groups. More recently, Kawamura et al. [3] divided the genus Streptococcus into six major groups based on their 16S rRNA sequences: the pyogenic, mitis, bovis, salivarius, anginosus, and mutans groups. Excellent recent reviews of the biology, genetics, and host response to all the streptococcal species can be found in Fischetti et al.’s Gram-Positive Pathogens [4].


8.2 Bacterial Genomes

8.2.1 Pyogenic Group

8.2.1.1 Streptococcus pyogenes

Streptococcus pyogenes, the group A streptococcus (GAS), is a strict human pathogen responsible for a wide variety of diseases, ranging from acute infections such as pharyngitis, scarlet fever, and erysipelas to severe invasive infections such as necrotizing fasciitis (flesh-eating disease) and streptococcal toxic shock syndrome, as well as the sequelae of rheumatic fever and acute glomerulonephritis. This organism has been divided into over 150 distinct M types based on serological differences found in a surface protein known as the M protein (specified by the emm gene). The genome sequences of seven different strains of GAS have been completed, including M1 [5], M3 (two strains) [6, 7], M5 (www.sanger.ac.uk/Projects/S_pyogenes/), M6 [8], M18 [9], and M49 (McShan, personal communication) strains (Table 8.1). The average G+C content ranged from 38.4% to 38.7% and the genome size from 1.84 Mbp to 1.90 Mbp. A prominent feature of the genomes is the presence of mobile genetic elements, i.e., bacteriophage genomes, insertion sequences (IS), and other transposase-containing elements. Bacteriophages are ubiquitous in GAS genomes [10], which contain between two and six different integrated bacteriophage genomes, accounting for up to 12% of the GAS genome (M6). The presence of these phage genomes accounts for the major differences in gene content and size observed between GAS strains, with most core and functional categories of genes being highly conserved. The genomes of the six different M-type strains for which complete sequences are available all contained six rRNA operons. A characteristic of these genomes is that most genes (approx. 75%) are located on the leading rather than the lagging strand of replication. Another feature is that gene order and chromosome organization of the various GAS genomes are generally conserved. Nevertheless, variation here also occurs, as evidenced by large chromosome inversions found in M3 and M5 strains.
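As a minimal illustration of how a G+C percentage such as the 38.4% to 38.7% quoted above is obtained, the following Python sketch counts G and C bases in a nucleotide string. The function name `gc_percent` and the toy sequence are assumptions made for illustration only; they are not taken from any of the cited genome projects, and a real analysis would read the complete chromosome from a sequence file.

```python
def gc_percent(sequence: str) -> float:
    """Return the G+C content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted, so N or other
    ambiguity codes do not distort the result.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Hypothetical toy fragment, not real GAS sequence.
toy = "ATGCATTTAGCNNATAT"
print(f"{gc_percent(toy):.1f}% G+C")
```

Ignoring ambiguous bases is one possible design choice; other conventions (for example, dividing by the full sequence length) give slightly different values for draft sequences that contain many N characters.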
Nakagawa and colleagues [7] showed that large chromosome rearrangements were found in 65% of M3 strains analyzed in Japan after 1990, but only 25% of strains analyzed before 1985. These recombinations occurred across the replication axis and between homologous bacterial chromosome and prophage sequences, resulting in gross changes in gene order (Fig. 8.1). The alignment of six different GAS genomes is also shown in Fig. 8.1, where it can be seen for the most part that gene order and orientation are highly preserved. Note that significant inversions are present in both the M3 (SSI-1) and M5 genomes. Genetic diversity has been described in GAS strains as ascertained by single nucleotide polymorphism analysis, whole genome polymerase chain reaction (PCR) scanning, and DNA–DNA microarray analysis [12]. Allelic variation and polymorphisms are particularly common among the GAS virulence genes [13], as noted for emm, sic [14], scl [15], ska [16], speA [17], and speB [18]. This is especially the case with surface antigen genes such as emm, which has undergone significant gene duplication, recombination, and point mutations to form 158 different emm genes and the M family of genes [19]. These kinds of changes result in great genetic diversity of the GAS and represent a common survival strategy of microbial pathogens to avoid detection or to escape a host’s immune system.

Fig. 8.1 GAS Genome alignments. The seven available GAS genomes were compared using the software package Mauve, a method that identifies conserved genomic regions, rearrangements and inversions in conserved regions, and the exact sequence breakpoints of such rearrangements across multiple genomes [11]. Corresponding regions share the same color and are connected by lines. Collinear regions are positioned above the central axis (relative to M1) while inverted regions are drawn below the axis. Light gray regions were too divergent in at least one genome to be meaningfully aligned. (This figure also appears with the color plates.)
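The distinction between collinear and inverted regions drawn in Fig. 8.1 can be illustrated with a toy sketch. It assumes that an aligner has already reduced a query genome to an ordered list of signed block identifiers, positive when a conserved block lies in the same orientation as in the reference (here M1) and negative when it is reversed. This representation, the function name `inverted_runs`, and the example data are assumptions for illustration; the sketch does not reproduce Mauve's algorithm or its output format.

```python
from typing import List, Tuple


def inverted_runs(blocks: List[int]) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) pairs for each maximal run of
    consecutive blocks that are inverted relative to the reference.

    `blocks` is the query genome written as signed block IDs; a negative
    ID means the block is reverse-complemented relative to the reference.
    """
    runs = []
    start = None
    for i, block in enumerate(blocks):
        if block < 0 and start is None:
            start = i                      # a new inverted run begins
        elif block >= 0 and start is not None:
            runs.append((start, i - 1))    # the run ended at the previous block
            start = None
    if start is not None:                  # run extends to the end of the list
        runs.append((start, len(blocks) - 1))
    return runs


# Hypothetical example: blocks 2-4 of the reference order 1..6 are inverted
# as a unit, so they appear reversed in order and negative in sign.
query = [1, -4, -3, -2, 5, 6]
print(inverted_runs(query))
```

In this toy representation, blocks outside the reported runs correspond to the regions drawn above the axis in the figure, and blocks inside the runs to those drawn below it.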

8.2.1.1.1 Virulence Factors

A comprehensive review of GAS virulence factors, including cell-associated and extracellular proteins as well as genes that regulate their expression, has been compiled by Hynes [20]. The genes for these virulence factors are either chromosomally located or associated with a bacteriophage. Well-known and well-characterized virulence factor genes found on the bacterial chromosome include those for M protein, hyaluronic acid capsule, C5a peptidase, M-like proteins, fibronectin binding proteins, cysteine proteinase, streptolysin O, streptolysin S, streptokinase, CAMP factor, and hyaluronidases. Phage-associated GAS virulence factors (discussed below) include superantigens (pyrogenic exotoxins), DNases (streptodornases), and phospholipases.


More than 40 putative virulence factors were originally identified in the M1 genome, and since that time additional newly described virulence factors have been identified in the recently sequenced genomes: e.g., phospholipase A2 (Sla) [21], EndoS (endoglycosidase activity on IgG) [22], SpyA (ADP-ribosyltransferase) [23], Lbp (laminin binding protein) [24], IdeS (immunoglobulin G degrading enzyme) [25], and R6 surface protein [8]. Since approximately one-third of the genes in each genome remain to be characterized in terms of function, additional virulence factors will most certainly be identified in the future. Approximately 50% of GAS strains obtained in population-based surveillance studies in the United States have the ability to opacify sera. These strains are said to be OF+ and contain the gene for serum opacity factor (sof) [26]. McShan and colleagues (personal communication) have recently sequenced an M49 strain, the first complete sequence of a GAS strain known to contain sof. The M49 genome contains only two complete bacteriophage genomes and, as in other strains, each phage has genes located near the insertion site which specify superantigens SpeH and MF3.

8.2.1.1.2 Horizontal Gene Transfer

Horizontal gene transfer mediated by bacteriophages is perhaps the most important phenomenon occurring in the GAS, where phages have a major impact on pathogenicity as well as bacterial genome diversity and evolution. These bacteriophages have been shown to contain genes termed “morons,” because they add more DNA to the phage genome [27]. Virtually all of these bacteriophage-associated genes specify proteins with signal peptides essential for secretion into the extracellular environment, where they may enhance the survival of the bacterial cell in that particular environment. Phage-associated GAS virulence factors, many of which were unknown prior to genome sequencing, include SpeA, SpeC, SpeG, SpeH, SpeI, SpeJ, SpeK, SpeL, SpeM, MF2, MF3, MF4, Sla, Ssa, Sda, Sdn, as well as HylP. Most of these factors are superantigens, also known as streptococcal pyrogenic exotoxins and erythrogenic toxins, which are potent activators of T cells that release cytokines and cause toxic shock (see Ref. [28]). All of the streptodornases identified to date, with the exception of MF (SpeF or DNaseB), are phage encoded [29]. Proteomic analysis showed that these proteins shared a mosaic pattern, suggestive of multiple recombination events between phage virulence genes occurring during phage excision and integration. As in many of the chromosomal genes, considerable allelic variation exists in the associated virulence genes [13]. Taken together, these facts emphasize the important contribution phages make via horizontal gene transfer to the generation of new strains with increased virulence. The primary mechanism of horizontal gene transfer among the GAS has been by bacteriophages, though it is clear that conjugative transfer of plasmids containing antibiotic resistance genes occurs [30].
Natural transformation of GAS has never been observed; however, Hanski and colleagues [31] discovered a streptococcal invasion locus (sil) that appears to be involved with DNA transfer in vitro. The sil locus is contained on one of two notable complex transposable elements that have been recently identified in GAS. The IS1562 transposon was found associated with many strains causing necrotizing fasciitis, and inactivation of this transposable element caused decreased lethality in a mouse model. Identified in an M14 strain, the transposon contains at least five genes (silA-E) that are highly homologous to the quorum-sensing competence regulons of Streptococcus pneumoniae. silA and silB encode a putative two-component system whereas silD and silE encode two putative ABC transporters. silC is a small open reading frame (ORF) of unknown function preceded by a combox promoter [31]. The authors speculate that the sil locus may have been obtained by horizontal gene transfer since it has a lower G+C content (32%) than the 38.5% found in the M1 genome. Among sequenced GAS genomes, the sil locus is absent from the M1, M3, and M5 genomes, but is found in the M18 and M49 genomes. IS1562 has been suggested to be responsible for facilitating recombination events in GAS [32] and is also found in genome strains MGAS8232 and NZ131 (unpublished results). The genome of the M6 strain MGAS10394 contains a genetic element (MGAS10394.4) that confers macrolide resistance (mefA) [8]. This element is very closely related to Tn1207.3 [33], having a portion of another IS element incorporated into the distal end. MGAS10394 also has a gene encoding a novel surface-exposed protein that elicits an antibody response detectable in convalescent sera [8]. Thus, horizontal gene transfer, as well as an active recombination system, contributes to overall genome plasticity and GAS genetic diversity. Genome decay by the loss of genes and gene function is thought to counterbalance the acquisition of new sequences by horizontal gene transfer [34, 35], and both of these processes contribute to GAS evolution.
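The reasoning applied to the sil locus, a G+C content (32%) well below the genome average (38.5%), can be sketched as a sliding-window scan. This is a minimal illustration under stated assumptions: the window size, the step, and the 5-percentage-point threshold are arbitrary choices, the function name `low_gc_windows` is hypothetical, and atypical base composition on its own is only suggestive of horizontal transfer.

```python
from typing import List, Tuple


def low_gc_windows(
    sequence: str,
    genome_gc: float,
    window: int = 1000,
    step: int = 500,
    min_drop: float = 5.0,
) -> List[Tuple[int, float]]:
    """Return (start, gc_percent) for windows whose G+C content lies at
    least `min_drop` percentage points below `genome_gc`.

    All default values are illustrative assumptions, not published settings.
    Ambiguous bases are simply counted as non-G+C in this sketch.
    """
    seq = sequence.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = 100.0 * (chunk.count("G") + chunk.count("C")) / window
        if genome_gc - gc >= min_drop:
            hits.append((start, gc))
    return hits


# Synthetic toy sequence: a 50% G+C background with an A+T-only insert.
background = "ATGC" * 50          # 200 bases, 50% G+C
insert = "AT" * 50                # 100 bases, 0% G+C
toy = background + insert + background
for start, gc in low_gc_windows(toy, genome_gc=50.0, window=100, step=100):
    print(start, f"{gc:.1f}")
```

With the figures quoted in the text, a region at 32% G+C in a 38.5% genome differs by 6.5 percentage points and would therefore be reported under the assumed 5-point threshold.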

8.2.1.2 Streptococcus agalactiae

Streptococcus agalactiae, a group B streptococcus (GBS), is a human pathogen responsible for neonatal sepsis and meningitis, as well as for severe invasive disease in adults. GBS are also animal pathogens responsible for bovine mastitis. Nine serotypes of GBS have been identified based on the differences in polysaccharide capsule [36]. The sequences of GBS serotypes III and V genomes [37, 38] have been completed, and the genome of a serotype Ia strain is in progress (Table 8.2). The two complete genomes were approximately 2.2 Mbp in size with a G+C content of 35.6%. Although no complete bacteriophages were found in any of the sequenced genomes, bacteriophages have long been used in various GBS typing systems [39] and epidemiological studies [40–42]. The most prominent observation in the two genomes was a clustering of virulence factors in regions or islands encoding at least five contiguous genes and of sizes up to 81 kbp. The majority of putative virulence factors were found in these islands, 14 islands in the type III strain and 15 in the type V strain, as well as mobile genetic element sequences such as phage and insertion sequences, transposons, and plasmids. Although the role of these sequences in gene acquisition is not clear because of extensive rearrangements among the islands, many of the virulence genes had G+C contents differing from that of the entire genome [38]. As in the GAS, there is a strong association of genetic transfer elements and virulence factors, contributing to the common theme that pathogenicity islands play an important role in increased virulence and genetic diversity. Both of the sequenced GBS strains contained a larger number of two-component regulatory systems than other streptococcal species, e.g., 17–21 in GBS vs 14 in S. pneumoniae, and 13 in both GAS and S. mutans. A number of other differences were also observed, including differences in metabolic pathways and transport systems, as well as a larger number of transcriptional regulators. These observations suggest that the GBS have a wider range of life style opportunities than other gram-positive streptococci, which allows them to survive in response to variations in the external environment. A central question relates to the differences which exist in GBS strains that are preferentially associated with humans or animals. Spellerberg and colleagues [43] identified a putative composite transposon that contains genes encoding the surface protein Lmb (which mediates binding of GBS to human laminin) and ScpB (C5a peptidase), both of which have been identified as important GBS virulence factors. All GBS strains of human origin, as well as group C and G strains isolated from humans, contained this putative composite transposon. However, the majority of GBS strains of animal origin contained little or none of the scpB or lmb sequences.

8.2.1.3 Group C (GCS) and Group G Streptococci

The streptococci of groups C and G are found in humans and animals, where they may cause serious diseases similar to those described for GAS disease (see Ref. [44] for a comprehensive review). These organisms include Streptococcus equi subsp. equi, Streptococcus dysgalactiae subsp. equisimilis, and Streptococcus equi subsp. zooepidemicus. These three species are “large” colony forming streptococci and resemble the GAS in many biochemical and clinical characteristics. Other, typically small colony forming streptococcal species can also carry the group C antigen, including Streptococcus dysgalactiae subsp. dysgalactiae and Streptococcus milleri [45]. Streptococcus canis is a member of the group G streptococci. The genome sequences of S. equi and S. zooepidemicus are in progress (Table 8.2). GCS, in contrast to the other pathogenic streptococci, are remarkable in that they are commonly associated with both human and animal disease. S. equi is the causative agent of strangles, an acute upper respiratory infection of horses. S. equisimilis is the most common member of the GCS associated with human disease, causing pharyngitis, bacteremia, skin infections, and puerperal sepsis. Like GAS, GCS often carry genes encoding superantigens [46–48]. For example, evidence from the in-progress S. equi genome reveals that two streptococcal superantigen genes (speL and speM) are present; remarkably, these superantigens are also found in a subpopulation of GAS isolates (15% and 5%, respectively) [48]. The carriage of these genes by separate streptococcal species suggests that these genes are available for horizontal transfer, perhaps via a mobile element that enables gene transfer across phylogenetic borders.

Fig. 8.2 GAS vs S. uberis alignment. The corresponding regions in the M1 and S. uberis genomes identified by Mauve analysis are shown. Several large regions of collinearity are evident between the two species. Corresponding regions share the same color and are connected by lines. (This figure also appears with the color plates.)

8.2.1.4 Streptococcus uberis

The species Streptococcus uberis is heterogeneous with respect to Lancefield typing and is another important animal pathogen, which causes bovine mastitis. The genome sequence of this organism has just been completed (J. Leigh and P. Ward, personal communication) (Table 8.2). S. uberis 0140J has a genome size of 1.85 Mbp and a G+C content of 36.6%. Several proposed virulence-associated genes have been characterized in S. uberis, including the capsule, neutrophil toxin, M-like protein, R-like protein, plasminogen activator PauA, and the CAMP factor [49–51]. Two substantial tracts of bacteriophage-associated DNA are present, as are paralogues of several superantigens found in GAS. The alignment of the S. uberis and GAS M1 genomes is shown in Fig. 8.2, where several rearrangements are apparent, but with substantial homology in the overall genome order between the two genomes, suggesting a close evolutionary relatedness between the two organisms. As with several of the other streptococci that infect animals of agricultural importance, this genetic relatedness with GAS may provide a pool of genes that could be horizontally transferred from the animal to the human pathogen in situations where humans and animals are in close contact, such as on farms or in food processing facilities.

8.2.2 Bovis Group

8.2.2.1 Streptococcus bovis and Streptococcus suis

Both Streptococcus bovis and Streptococcus suis are classified as group D streptococci. S. suis is a pathogen of both pigs and humans, causing meningitis where pigs and humans are in close contact. Sequencing of the S. suis genome is in progress (Table 8.2). A number of virulence factors have been identified in virulent isolates of S. suis, including suilysin (a cholesterol-binding cytolysin), MRP (muramidase-released protein), and EF (extracellular factor) [52]. Bacteriophages have been isolated from both S. bovis and S. suis [53–56]; however, any role that these elements play in the dissemination of virulence genes or horizontal transfer is unknown.

8.2.3 Mitis Group

8.2.3.1 Streptococcus pneumoniae

The most notable member of the mitis group is Streptococcus pneumoniae, which is found primarily in the nasopharynx. It is responsible for a great deal of human disease, including acute bacterial pneumonia and invasive disease. It contains no Lancefield group-specific antigens and is subdivided into more than 90 groups based on polysaccharide capsule type. Two genomes of S. pneumoniae have been completely sequenced, and a third is in progress (Tables 8.1, 8.2). Additionally, a sequence diversity study of 14 different strains with important phenotypic and pathogenetic differences is in progress (http://genome.microbio.uab.edu/strep/info/). The two sequenced S. pneumoniae strains have genome sizes and G+C contents of 2.03 Mbp and 40%, respectively, for the type 2 capsule R6 strain [57] and 2.16 Mbp and 39.7% for the type 4 capsule TIGR4 strain [58]. With an almost 10% difference in gene content between the two strains, each genome contains multiple strain-specific gene clusters. Several of these clusters contain genes encoding cell surface proteins that are known to interact with host cells and that appear to have sequence variability. S. pneumoniae has been appropriately termed a “paradigm for recombination-mediated genetic plasticity” [59], because of its natural competence and high transformability. The S. pneumoniae genome contains numerous IS, BOX, and RUP repetitive elements, which account for up to 5% of the genome and represent hotspots for genetic recombination. More than 250 duplicated genes are present in the genome, as well as nearly 400 genes with iterative motifs that can result in phase variation. Additionally, codon preference profiles of almost 500 genes in S. pneumoniae showed a lower G+C content (33.4%) than in most of the genome (39.7%), suggesting that they were obtained by horizontal gene transfer [60]. Some of these genes may have come from other organisms, since Tettelin et al. identified at least 40 genes with significant homology to genes in gram-negative bacteria [58]. Evidence of intragenic recombination creating mosaic structure is apparent in several genes, including those for the penicillin binding proteins, which contained segments identical to genes of other streptococcal species [61]. Genome heterogeneity, mosaic genes, and plasticity are features not only of S. pneumoniae but also of other members of the mitis group of streptococci [62].
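The comparison of gene-level and genome-level G+C content mentioned above can be illustrated with a minimal Python sketch. The gene names, the toy sequences, and the 5-percentage-point cutoff are hypothetical and chosen only for illustration; the study cited as [60] relied on codon preference profiles, not on this simple screen.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence, counting only A, C, G, T."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def flag_atypical(genes: dict, genome_gc: float, cutoff: float = 5.0) -> list:
    """Return the names of genes whose G+C% is more than `cutoff`
    percentage points below the genome-wide value."""
    return sorted(
        name for name, seq in genes.items()
        if genome_gc - gc_content(seq) > cutoff
    )


# Hypothetical toy sequences, not real S. pneumoniae genes.
genes = {
    "geneA": "ATGGCGCTAGCTAGGCTA",  # G+C-rich
    "geneB": "ATGAAATTTATAAATTTA",  # A+T-rich
}

# 39.7 is the genome-wide G+C% quoted in the text for TIGR4.
candidates = flag_atypical(genes, genome_gc=39.7)
print(candidates)
```

A low G+C value alone is only suggestive; a screen of this kind would normally be combined with other evidence, such as homology to genes in distant organisms.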


8.2.3.2 Streptococcus mitis, Streptococcus sanguis, and Streptococcus gordonii

Further members of the mitis group, which also possess the Lancefield group H antigen, are Streptococcus mitis, Streptococcus sanguis, and Streptococcus gordonii. These species predominate in the oral cavity, and the sequencing of each of the genomes is in progress (Table 8.2). While generally considered human commensals, these organisms may play important roles in dental or medical disease. One series of studies found that S. mitis released superantigen-like molecules that might promote the oral inflammatory response [63]. Subsequently, at least one outbreak of streptococcal toxic shock syndrome was reported with S. mitis as the causative agent, consistent with the earlier observations [64]. Pneumolysin, a major virulence factor in S. pneumoniae, is also present in S. mitis [65], suggesting that horizontal transfer occurs between these species. S. mitis is an occasional cause of infective endocarditis, and bacterial binding of platelets is an important mechanism in the pathogenesis of this disease. Platelet binding by S. mitis is mediated by the surface proteins PblA and PblB, encoded by a lysogenic bacteriophage [66, 67]. Thus, it is possible that S. mitis shares a number of genetic characteristics with the better known pathogenic streptococci, and the analysis of the genome of this organism may reveal even further insights into potential virulence mechanisms. No similar association between a temperate phage and the transmission of virulence traits has been seen yet in S. sanguis or S. gordonii.

8.2.4 Anginosus and Salivarius Group

8.2.4.1 Streptococcus salivarius

A group K Streptococcus, Streptococcus salivarius is a primary colonizer of the oral mucosa and is a permanent member of the oral flora [68]. The anginosus (milleri) group comprises commensal streptococcal species found in the oral cavity and pharynx that are frequent agents of human disease. Members of the group can differ in Lancefield classification while being related at the genetic level. Streptococcus intermedius (group F, C, or not typeable) has been found associated with abscesses of the brain and liver, while Streptococcus anginosus (group G or C) and Streptococcus constellatus subsp. constellatus (group F mostly) are isolated from infections in a wider range of anatomical sites [69]. In 1999, a new member of the anginosus group, S. constellatus subsp. pharyngis, was isolated from cases of pharyngitis and was serologically a member of Lancefield group C [70]. The toxin intermedilysin is produced by S. intermedius, distinguishing it from the other members of this group, which all lack this gene. Further, the level of intermedilysin production in isolates from brain and liver abscesses is higher than in strains from normal habitats, such as dental plaque, or from peripheral infection sites, suggesting that this protein is a key factor in inducing or maintaining deep-seated infections [71]. The potential genetic relationships of this group of species to other pathogenic streptococci are poorly characterized and should be clarified by future genome analysis.


8.2.4.2 Streptococcus thermophilus

Streptococcus thermophilus is a thermophilic lactic acid bacterium used for the manufacture of dairy products. On the basis of phylogenetic sequence data of the RNase P RNA gene, rnpB, it is part of the salivarius group [72]. The complete sequence of two different strains of S. thermophilus has been reported (Table 8.1) [73]. Both strains have a genome size of 1.8 Mbp with 42 regions of sequence differences and a remarkably high level (10%) of gene decay. Horizontal gene transfer is evident in the genomes with the presence of numerous phages, transposons, and integrons. One region identified as a hotspot of IS sequences contains a mosaic of fragments with high identity to gene regions from Lactobacillus bulgaricus and Lactococcus lactis, organisms commonly found in milk. Interestingly, many of the genes involved in carbohydrate metabolism are decayed or missing. However, a specific symporter for lactose (the main carbohydrate of milk) is present in S. thermophilus but absent in other streptococci. The significant differences in gene content from other streptococci most likely have occurred by horizontal gene transfer and gene decay as the organism evolved into its unique ecological niche.

8.2.5 Mutans Group

8.2.5.1 Streptococcus mutans and Streptococcus sobrinus

The two species Streptococcus mutans and Streptococcus sobrinus are found in the oral cavity, colonize exclusively the tooth surface of humans and some animals, and are responsible for dental caries. The genome of a serotype c strain of S. mutans has been sequenced and has a genome size of 2 Mbp and a G+C content of 36.8% [74]. Although considered an oral pathogen, it lacks virtually all virulence functions of other pathogenic streptococci. Its primary virulence mechanisms include glucosyltransferase enzymes capable of synthesizing adherent glucans, which bind to the smooth surfaces of teeth, glucan binding proteins, and its acidogenic potential as a lactic acid producer. Additionally, it has the ability to metabolize a wide variety of carbohydrates and has pathways for the synthesis of all of its required amino acids. Approximately 15% of the S. mutans genome is devoted to genes for various transport systems, most of which are devoted to the carbohydrates it uses. It also possesses genes for proteases, peptidases, and other exoenzymes for action on various food substrates present in the oral cavity. Horizontal gene transfer occurs via transformation in S. mutans as it has a fully active competence system. However, no bacteriophages have been identified in its genome. The alignment of the genomes of S. mutans and S. mitis indicates a lesser degree of homology than with organisms in the same species.


8.2.6 Other Organisms: Enterococcus faecalis

Originally known as Streptococcus faecalis, Enterococcus faecalis is the organism most commonly associated with the group D streptococci; however, DNA analyses showed that enterococci are biologically, serologically, and genetically different from the streptococci [75]. This organism is reviewed in Chapter 7 of the present volume.

8.2.7 Comparative Genomics

As a group of organisms, the streptococci have both common themes and differences which emphasize their individuality and genetic diversity. The availability of several genome sequences within a given species illustrates that significant differences occur even in the same serotype. Horizontal gene transfer and recombination, common to all the streptococci, have contributed greatly to the generation of strategies for the survival of an organism in its ecological niche. These strategies include altering or deleting surface structure genes to evade the host immune system, gaining new genes such as surface proteins with particular binding affinities, gaining or deleting metabolic pathways for suitable nutrient acquisition, and adding new virulence genes such as exoenzymes and toxins. Prime examples are the bacteriophage transfer of superantigen genes to new GAS hosts and the highly efficient transformation system of S. pneumoniae, generating mosaic structures, genomic rearrangements, and genetic diversity. In both cases new clones with increased virulence properties are created which have the ability to eventually disseminate a new wave of disease.

The ecological niche and the environment of an organism are reflected in the genetic makeup of pathways and other fitness traits essential for its existence. Organisms such as the GAS in the pyogenic group live in a nutrient-rich environment [76] and cannot survive without substrates available from host tissues. Their metabolic pathways mirror this need: virtually no pathways are present for the synthesis of amino acids, but the organisms possess several ABC transporters for amino acid uptake, as well as additional transporter systems for the uptake of dipeptides and oligopeptides. In contrast, organisms such as the mitis group of streptococci contain additional pathways essential for an environment not so rich in nutrients where different kinds of substrates are present. These include additional and different uptake and transport systems as well as different regulatory controls. One member of the mitis group, S. pneumoniae, has a high proportion of sugar transporter genes, a feature that allows it to compete for sugars with other respiratory tract bacteria. Gene expression profiling under various environmental conditions will be an important avenue for future investigation.

Evolutionary changes in genomes can be illustrated by aligning orthologous regions of sequences. Large-scale changes such as rearrangements may result in regions of one genome being inverted or reordered relative to another. Figure 8.1 illustrates how closely related organisms of the same species are generally closely aligned, with a high conservation of gene order and orientation. Inversions and rearrangements in these genomes are equally obvious, indicating more recent changes. Such a change led Nakagawa et al. to suggest that clonal expansion of the rearranged M3 strains may be related to recent increases in rheumatic fever and severe invasive infections in Japan [7]. The lack of alignment of genomes from different streptococcal species illustrates that larger changes have occurred with little concordance of gene block sequence and order, and suggests that these genomes are constantly undergoing rearrangements (not shown).

The genomes of the streptococci are proving to be powerful tools in understanding how these organisms gain new virulence genes, as well as in discovering new approaches to combating their pathogenic potential. Postgenomic approaches such as microarrays, proteomics, and structure–function studies will fill in many of the gaps remaining in our understanding of these organisms, particularly in identifying ORFs with unknown function, new putative virulence genes, and regulatory networks controlling gene expression. Clearly, important work remains to be completed for rational drug development aimed at newly identified target genes as well as the identification of new antigens for vaccines.
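The idea of conserved gene order and orientation discussed in this section can be made concrete with a small sketch. It assumes that each genome has already been reduced to an ordered list of shared orthologous genes, written as signed integers where the sign gives the orientation; this representation, the gene numbering, and the function names are hypothetical. It is a simple breakpoint count, not the algorithm used by Mauve.

```python
def adjacencies(order):
    """Return the set of oriented neighbor pairs in a signed gene order.

    The pair (a, b) read on one strand equals (-b, -a) read on the other,
    so both forms are stored.
    """
    pairs = set()
    for a, b in zip(order, order[1:]):
        pairs.add((a, b))
        pairs.add((-b, -a))
    return pairs


def breakpoints(reference, other):
    """List neighbor pairs of `other` that are not neighbors, in the same
    relative orientation, in `reference`."""
    ref_pairs = adjacencies(reference)
    return [(a, b) for a, b in zip(other, other[1:]) if (a, b) not in ref_pairs]


# Hypothetical gene orders: genes 3 to 5 are inverted in the second genome.
reference = [1, 2, 3, 4, 5, 6]
rearranged = [1, 2, -5, -4, -3, 6]

print(breakpoints(reference, rearranged))
```

A single inversion produces two breakpoints, one at each end of the inverted block, while two collinear genomes produce none. Genomes from different species, with little concordance of gene order, would be expected to yield many breakpoints under this measure.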

8.3 Streptococcal Genomic Bacteriophages

8.3.1 Prophages and Streptococcal Genomes

The single most striking feature of all of the sequenced GAS genomes is the number of endogenous phage genomes found in each strain, ranging from two in M49 strain NZ131 to seven or eight in M6 strain MGAS10394, depending upon whether element MGAS10394.4 is considered a real prophage genome or a transposon. A comprehensive review of the bacteriophages of group A streptococci can be found in the book Gram Positive Pathogens [77]. While temperate phages exist that target S. pneumoniae and S. agalactiae [78–80], it is remarkable that none were found in the genome strains, providing a dramatic contrast to S. pyogenes. For example, the genome of GBS NEM316 harbors 14 pathogenicity islands containing elements (especially integrases or transposases) related to bacteriophages or IS elements, but no complete or partial phage genomes [37]. Likewise, while a number of IS elements were found in the S. pneumoniae genomes, no prophage genome or significant remnant was present in either. Thus, it appears that the high frequency of phage carriage in the GAS genomes is not stochastic, but reflects a strong selective pressure for possessing the virulence genes carried by these GAS phages, and demonstrates how intimately these phages are linked to the biology of GAS.


8.3.2 GAS Genome Prophages

8.3.2.1 Prophages and Virulence Factors

The number and type of phages found in each GAS genome vary, as do the associated virulence genes. The insertion sites for the various phage genomes in the common GAS genome backbone are shown in Fig. 8.3; several phages share the same insertion site. Although phage genomes are found throughout the bacterial chromosome, there is a prevalence of insertion near the terminus region on the lagging strand. Virulence genes are found associated with multiple attachment sites, and it is evident that frequent genetic rearrangements have occurred that have shuffled these virulence genes to different phage backgrounds. The phage-associated virulence genes, while dispensable for phage replication and maturation, confer novel phenotypes upon the host streptococcus that increase its fitness. A considerable range of virulence genes is carried by the genomes of GAS prophages, including superantigens and streptodornases. A characteristic of these genes is an atypically low G+C content relative to the rest of the genome, a feature shared with other genes of microbial pathogens acquired by horizontal gene transfer. For example, most of the pyrogenic exotoxins have a G+C content of about 30% whereas that of the genome is about 38.5%. While these atypical percentages are suggestive of origination from an unknown ancestor, the actual genetic source of these ORFs is unknown, and they may instead have originated within the species by gene duplication and descent. However, cryptic and sometimes expressed coding regions often are found in regions flanking the attachment sites in the genomes of temperate bacteriophages, even those infecting nonpathogenic bacteria [81–83]. It seems possible that some virulence factors may have originated by a process of selection for function from preexisting phage genes. The relationships seen between some prophages of S. pyogenes and S. thermophilus (Fig. 8.4) raise the possibility that prototoxin genes may have arisen in a separate genus and reached GAS by horizontal transfer [84, 85].

Many of the genome prophages contain mutations or deletions that prevent the phage from entering the lytic cycle, thus fixing the prophage DNA in the streptococcal genome. The inactivation of a prophage while maintaining the associated virulence gene may be a frequent and evolutionarily favored event, preserving the benefit to the streptococcus conferred by the toxin while eliminating the danger of prophage induction and subsequent lysis. Frequent examples can be found in the GAS genome prophages where either a genetic defect (inactivation of the portal protein in phage SF370.2) or a large genome deletion (MGAS10394.7) has fixed the associated toxin gene as a permanent element of the host. Homologous recombination with another phage genome could rescue the disabled prophage or generate a novel, functional phage. If the forces that lead to prophage decay are constant over time, then inactivated prophages may be long-term residents of their host, and conversely, the fully inducible phages may have been recently acquired. The number of inducible prophage genomes in the M3 MGAS315 strain may reflect this strain’s status as a recently emerged clone [6]. The emergence of this highly virulent strain has been proposed to have resulted from the sequential addition of prophages (MGAS315.5, acquired in the 1920s; MGAS315.2, acquired around 1940; and MGAS315.4, acquired around 1980) to generate the current M3 clone [91]. The inducible status of these phages suggests that the timeframe for maintenance of intact prophages in GAS may be measured in decades. The M6 MGAS10394 strain has, by contrast, a large number of phage remnants, suggesting that the toxins associated with these prophages have been long-established host features.

Fig. 8.3 Bacteriophage locations on the GAS backbone. The locations of the genome prophages on the S. pyogenes genome are shown on a generalized GAS DNA backbone based upon the M1 genome; prophages that share the same attachment site are boxed together. Toxin genes that are associated with a particular prophage are indicated with the phage name. The rRNA operons are indicated as white blocks, while the cluster of virulence genes flanking emm are hatched. The origin of replication is indicated (oriC).

Fig. 8.4 Phylogram of the S. pyogenes-identified genome prophages and the related S. thermophilus temperate phages. The phylogenetic tree analysis of the genome phages shows probable groups with related evolutionary histories. The prophages of GAS are indicated in black and the S. thermophilus phages in red. Multiple sequence alignment of the prophage genomes was done using ClustalX [86]. The genomes of S. thermophilus have been previously reported [85, 87–90].

8.3.2.2 Prophage Attachment Sites and Host Biology

Prophage integration occurs by a Campbell-type homologous exchange at a common sequence shared between the phage and host chromosomes (attP and attB, respectively). These duplications are of varying lengths, ranging from a few bases to nearly 100 bp, and often represent coding regions of the host genome [92, 93]. In the S. pyogenes genome prophages, the identifiable duplications range from 12 bp in phage MGAS10394.1 to around 96 bp in T12 and the phages that share its attachment site [94]. In most bacterial species, the majority of duplications occur at the 3′ end of some host target gene, and integration either leaves the target gene intact (via the duplication) or, in at least one case, provides an alternative carboxy terminus for the specified protein [92, 95]. In the S. pyogenes prophages, integration and duplication occur frequently at either the 5′ or the 3′ end of genes. Integration into the 5′ end of a gene or its predicted promoter is an unusual strategy, potentially altering host gene function. Genes that are targeted for 5′ integration in GAS include methyl-directed mismatch repair protein mutL, a dipeptidase gene, recombination protein recX, proB, a HAD-like hydrolase, and a conserved hypothetical protein. Other prophages appear to target the promoter region immediately upstream of the actual ORF, including the promoter regions of yesN, γ-glutamyl kinase, and an iron-dependent repressor. Similarly, the transposon MGAS10394.4, which has some prophage features, separates the comE operon proteins 2 and 3, probably creating a polar mutation that silences protein 3. In all of these cases, the insertion of a prophage at the 5′ end of a gene has the potential for altering host gene expression by introducing a polar mutation or an alternative promoter, and such transcription-altering prophages may be an important class of genetic regulatory elements in S. pyogenes in modifying the biology of their hosts.
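The short sequence shared by attP and attB can, in principle, be located by searching for the longest exact match between the phage and host regions. The following is a minimal sketch using a standard dynamic-programming search for the longest common substring. The function name and both sequences are hypothetical; real attachment sites may contain mismatches that an exact-match search would not tolerate.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest exact substring shared by a and b.

    prev[j] holds the length of the common run ending at a[i-2] and b[j-1];
    only the previous row of the table is kept to save memory.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical sequences: a 12-bp core flanked by unrelated phage or host DNA.
att_p_region = "TTGACCT" + "GGATCCGTTAAC" + "CATGTAG"
att_b_region = "ACGTCAA" + "GGATCCGTTAAC" + "TGCAGTC"

core = longest_common_substring(att_p_region, att_b_region)
print(core, len(core))
```

The search runs in time proportional to the product of the two lengths, which is adequate for short windows around a suspected integration site but not for whole genomes.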

8.3.2.3 Prophage Diversity

Pregenomic studies predicted that considerable diversity existed in GAS phages, with phages having a variety of toxin genes integrating into the bacterial chromosome at multiple sites [94, 96–98]. The GAS genome sequences have confirmed these predictions, and further, have revealed that virulence genes have often been reassorted with functional phage modules for integration and regulation or maturation and packaging to increase diversity. As shown in Fig. 8.3, the results of phage diversification have placed some virulence genes (such as speA) at multiple integration sites on the GAS chromosome. Thus, while natural selection has favored the preservation of overall phage gene order, the modules for integration, regulation, replication, DNA packaging, structural proteins, host lysis, and virulence have often been exchanged with alleles from other phages to generate the diversity that may well account for the sudden appearance of new strains of GAS.

8.3.3 Prophages Associated with other Streptococcal Species

Temperate bacteriophages are associated with a number of the other streptococcal species and may well form part of a common pool of phage genetic material that is available for horizontal transfer of virulence and other genes. For example, S. mitis is an occasional cause of infective endocarditis, and bacterial binding of platelets is an important mechanism in the pathogenesis of this disease. Platelet binding by S. mitis is mediated by the surface proteins PblA and PblB, encoded by a lysogenic bacteriophage [66, 67]. Thus, it may well be that S. mitis shares a number of genetic characteristics with the better known pathogenic streptococci, and the analysis of the genome of this organism may reveal even further insights into potential virulence mechanisms. No similar association between a temperate phage and the transmission of virulence traits has been seen yet in S. sanguis or S. gordonii. Bacteriophages have been isolated from both S. bovis and S. suis [53–56]; however, any role that these elements play in the dissemination of virulence genes or horizontal transfer is unknown.


Note added in proof

Since the preparation of this chapter, two additional genomes of Group A Streptococci have been published: Sumby P, Porcella SF, Madrigal AG, Barbian KD, Virtaneva K, Ricklefs SM, Sturdevant DE, Graham MR, Vuopio-Varkila J, Hoe NP, Musser JM. Evolutionary Origin and Emergence of a Highly Successful Clone of Serotype M1 Group A Streptococcus Involved Multiple Horizontal Gene Transfer Events, J. Infect. Dis. (2005) 192:771–782; and Nicole M. Green, Stephen B. Beres, Edward A. Graviss, James E. Allison, Allison J. McGeer, Jaana Vuopio-Varkila, Rance B. LeFebvre, and James M. Musser, Genetic Diversity among Type emm28 Group A Streptococcus Strains Causing Invasive Infections and Pharyngitis, J. Clin. Microbiol. (2005) Vol. 43:4083–4091. In addition, six Group B Streptococcus genomes were reported in one publication: Hervé Tettelin, Vega Masignani, Michael J. Cieslewicz, Claudio Donati, Duccio Medini, Naomi L. Ward, Samuel V. Angiuoli, Jonathan Crabtree, Amanda L. Jones, A. Scott Durkin, Robert T. DeBoy, Tanja M. Davidsen, Marirosa Mora, Maria Scarselli, Immaculada Margarit y Ros, Jeremy D. Peterson, Christopher R. Hauser, Jaideep P. Sundaram, William C. Nelson, Ramana Madupu, Lauren M. Brinkac, Robert J. Dodson, Mary J. Rosovitz, Steven A. Sullivan, Sean C. Daugherty, Daniel H. Haft, Jeremy Selengut, Michelle L. Gwinn, Liwei Zhou, Nikhat Zafar, Hoda Khouri, Diana Radune, George Dimitrov, Kisha Watkins, Kevin J. B. O’Connor, Shannon Smith, Teresa R. Utterback, Owen White, Craig E. Rubens, Guido Grandi, Lawrence C. Madoff, Dennis L. Kasper, John L. Telford, Michael R. Wessels, Rino Rappuoli, and Claire M. Fraser. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pangenome”. PNAS (2005) 102: 13950–13955.

References

1 Koonin, E. V. 2003. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat Rev Microbiol 1: 127–136.
2 Lancefield, R. C. 1928. The antigenic complex of Streptococcus haemolyticus. III. Chemical and immunological properties of the species-specific substance. J Exp Med 47: 481–491.
3 Kawamura, Y., X. G. Hou, F. Sultana, H. Miura, and T. Ezaki. 1995. Determination of 16S rRNA sequences of Streptococcus mitis and Streptococcus gordonii and phylogenetic relationships among members of the genus Streptococcus. Int J Syst Bacteriol 45: 406–408.
4 Fischetti, V. A., R. P. Novick, J. J. Ferretti, D. A. Portnoy, and J. I. Rood, editors. 2005. Gram-Positive Pathogens. ASM Press, Washington, DC.
5 Ferretti, J. J., W. M. McShan, D. Ajdic, D. J. Savic, G. Savic, K. Lyon, C. Primeaux, S. Sezate, A. N. Suvorov, S. Kenton, H. Lai, S. Lin, Y. Qian, H. G. Jia, F. Z. Najar, Q. Ren, H. Zhu, L. Song, J. White, X. Yuan, S. W. Clifton, B. A. Roe, and R. McLaughlin. 2001. Complete genome sequence of an M1 strain of Streptococcus pyogenes. Proc Natl Acad Sci USA 98: 4658–4663.
6 Beres, S. B., G. L. Sylva, K. D. Barbian, B. Lei, J. S. Hoff, N. D. Mammarella, M. Y. Liu, J. C. Smoot, S. F. Porcella, L. D. Parkins, D. S. Campbell, T. M. Smith, J. K. McCormick, D. Y. Leung, P. M. Schlievert, and J. M. Musser. 2002. Genome sequence of a serotype M3 strain of group A Streptococcus: phage-encoded toxins, the high-virulence phenotype, and clone emergence. Proc Natl Acad Sci USA 99: 10078–10083.
7 Nakagawa, I., K. Kurokawa, A. Yamashita, M. Nakata, Y. Tomiyasu, N. Okahashi, S. Kawabata, K. Yamazaki, T. Shiba, T. Yasunaga, H. Hayashi, M. Hattori, and S. Hamada. 2003. Genome sequence of an M3 strain of Streptococcus pyogenes reveals a large-scale genomic rearrangement in invasive strains and new insights into phage evolution. Genome Res 13: 1042–1055.
8 Banks, D. J., S. F. Porcella, K. D. Barbian, S. B. Beres, L. E. Philips, J. M. Voyich, F. R. DeLeo, J. M. Martin, G. A. Somerville, and J. M. Musser. 2004. Progress toward characterization of the group A Streptococcus metagenome: complete genome sequence of a macrolide-resistant serotype M6 strain. J Infect Dis 190: 727–738.
9 Smoot, J. C., K. D. Barbian, J. J. Van Gompel, L. M. Smoot, M. S. Chaussee, G. L. Sylva, D. E. Sturdevant, S. M. Ricklefs, S. F. Porcella, L. D. Parkins, S. B. Beres, D. S. Campbell, T. M. Smith, Q. Zhang, V. Kapur, J. A. Daly, L. G. Veasy, and J. M. Musser. 2002. Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proc Natl Acad Sci USA 99: 4668–4673.
10 Hynes, W. L., L. Hancock, and J. J. Ferretti. 1995. Analysis of a second bacteriophage hyaluronidase gene from Streptococcus pyogenes: evidence for a third hyaluronidase involved in extracellular enzymatic activity. Infect Immun 63: 3015–3020.
11 Darling, A. C., B. Mau, F. R. Blattner, and N. T. Perna. 2004. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14: 1394–1403.
12 Beres, S. B., G. L. Sylva, D. E. Sturdevant, C. N. Granville, M. Liu, S. M. Ricklefs, A. R. Whitney, L. D. Parkins, N. P. Hoe, G. J. Adams, D. E. Low, F. R. DeLeo, A. McGeer, and J. M. Musser. 2004. Genome-wide molecular dissection of serotype M3 group A Streptococcus strains causing two epidemics of invasive infections. Proc Natl Acad Sci USA 101: 11833–11838.
13 Reid, S. D., N. P. Hoe, L. M. Smoot, and J. M. Musser. 2001. Group A Streptococcus: allelic variation, population genetics, and host–pathogen interactions. J Clin Invest 107: 393–399.
14 Stockbauer, K. E., D. Grigsby, X. Pan, Y. X. Fu, L. M. Mejia, A. Cravioto, and J. M. Musser. 1998. Hypervariability generated by natural selection in an extracellular complement-inhibiting protein of serotype M1 strains of group A Streptococcus. Proc Natl Acad Sci USA 95: 3128–3133.
15 Lukomski, S., K. Nakashima, I. Abdi, V. J. Cipriano, B. J. Shelvin, E. A. Graviss, and J. M. Musser. 2001. Identification and characterization of a second extracellular collagen-like protein made by group A Streptococcus: control of production at the level of translation. Infect Immun 69: 1729–1738.
16 Huang, T. T., H. Malke, and J. J. Ferretti. 1989. Heterogeneity of the streptokinase gene in group A streptococci. Infect Immun 57: 502–506.
17 Musser, J. M., A. R. Hauser, M. H. Kim, P. M. Schlievert, K. Nelson, and R. K. Selander. 1991. Streptococcus pyogenes causing toxic-shock-like syndrome and other invasive diseases: clonal diversity and pyrogenic exotoxin expression. Proc Natl Acad Sci USA 88: 2668–2672.
18 Stockbauer, K. E., L. Magoun, M. Liu, E. H. Burns, Jr., S. Gubba, S. Renish, X. Pan, S. C. Bodary, E. Baker, J. Coburn, J. M. Leong, and J. M. Musser. 1999. A natural variant of the cysteine protease virulence factor of group A Streptococcus with an arginine–glycine–aspartic acid (RGD) motif preferentially binds human integrins αvβ3 and αIIbβ3. Proc Natl Acad Sci USA 96: 242–247.
19 McGregor, K. F., B. G. Spratt, A. Kalia, A. Bennett, N. Bilek, B. Beall, and D. E. Bessen. 2004. Multilocus sequence typing of Streptococcus pyogenes representing most known emm types and distinctions among subpopulation genetic structures. J Bacteriol 186: 4285–4294.
20 Hynes, W. 2004. Virulence factors of the group A streptococci and genes that regulate their expression. Front Biosci 9: 3399–3433.
21 Nagiec, M. J., B. Lei, S. K. Parker, M. L. Vasil, M. Matsumoto, R. M. Ireland, S. B. Beres, N. P. Hoe, and J. M. Musser. 2004. Analysis of a novel prophage-encoded group A Streptococcus extracellular phospholipase A(2). J Biol Chem 279: 45909–45918.
22 Collin, M., and A. Olsen. 2001. EndoS, a novel secreted protein from Streptococcus pyogenes with endoglycosidase activity on human IgG. EMBO J 20: 3046–3055.
23 Coye, L. H., and C. M. Collins. 2004. Identification of SpyA, a novel ADP-ribosyltransferase of Streptococcus pyogenes. Mol Microbiol 54: 89–98.
24 Terao, Y., S. Kawabata, E. Kunitomo, I. Nakagawa, and S. Hamada. 2002. Novel laminin-binding protein of Streptococcus pyogenes, Lbp, is involved in adhesion to epithelial cells. Infect Immun 70: 993–997.
25 von Pawel-Rammingen, U., B. P. Johansson, and L. Bjorck. 2002. IdeS, a novel streptococcal cysteine proteinase with unique specificity for immunoglobulin G. EMBO J 21: 1607–1615.
26 Beall, B., G. Gherardi, M. Lovgren, R. R. Facklam, B. A. Forwick, and G. J. Tyrrell. 2000. emm and sof gene sequence variation in relation to serological typing of opacity-factor-positive group A streptococci. Microbiology 146 (Pt 5): 1195–1209.
27 Hendrix, R. W., J. G. Lawrence, G. F. Hatfull, and S. Casjens. 2000. The origins and ongoing evolution of viruses. Trends Microbiol 8: 504–508.
28 McCormick, J. K., M. L. Peterson, and P. M. Schlievert. 2005. Toxins and Superantigens of Group A Streptococci. In: Gram-Positive Pathogens. Fischetti, V. A., R. P. Novick, J. J. Ferretti, D. A. Portnoy, and J. I. Rood, editors. ASM Press, Washington, DC.
29 Aziz, R. K., S. A. Ismail, H. W. Park, and M. Kotb. 2004. Post-proteomic identification of a novel phage-encoded streptodornase, Sda1, in invasive M1T1 Streptococcus pyogenes. Mol Microbiol 54: 184–197.
30 Malke, H. 1979. Conjugal transfer of plasmids determining resistance to macrolides, lincosamides and streptogramin-B type antibiotics among group A, B, D and H streptococci. FEMS Microbiol Lett 5: 335–338.
31 Hidalgo-Grass, C., M. Ravins, M. Dan-Goor, J. Jaffe, A. E. Moses, and E. Hanski. 2002. A locus of group A Streptococcus involved in invasive disease and DNA transfer. Mol Microbiol 46: 87–99.
32 Berge, A., M. Rasmussen, and L. Bjorck. 1998. Identification of an insertion sequence located in a region encoding virulence factors of Streptococcus pyogenes. Infect Immun 66: 3449–3453.
33 Santagati, M., F. Iannelli, C. Cascone, F. Campanile, M. R. Oggioni, S. Stefani, and G. Pozzi. 2003. The novel conjugative transposon Tn1207.3 carries the macrolide efflux gene mef(A) in Streptococcus pyogenes. Microb Drug Resist 9: 243–247.
34 Hacker, J., U. Hentschel, and U. Dobrindt. 2003. Prokaryotic chromosomes and disease. Science 301: 790–793.
35 Wren, B. W. 2000. Microbial genome analysis: insights into virulence, host adaptation and evolution. Nat Rev Genet 1: 30–39.
36 Chaffin, D. O., S. B. Beres, H. H. Yim, and C. E. Rubens. 2000. The serotype of type Ia and III group B streptococci is determined by the polymerase gene within the polycistronic capsule operon. J Bacteriol 182: 4466–4477.
37 Glaser, P., C. Rusniok, C. Buchrieser, F. Chevalier, L. Frangeul, T. Msadek, M. Zouine, E. Couve, L. Lalioui, C. Poyart, P. Trieu-Cuot, and F. Kunst. 2002. Genome sequence of Streptococcus agalactiae, a pathogen causing invasive neonatal disease. Mol Microbiol 45: 1499–1513.
38 Tettelin, H., V. Masignani, M. J. Cieslewicz, J. A. Eisen, S. Peterson, M. R. Wessels, I. T. Paulsen, K. E. Nelson, I. Margarit, T. D. Read, L. C. Madoff, A. M. Wolf, M. J. Beanan, L. M. Brinkac, S. C. Daugherty, R. T. DeBoy, A. S. Durkin, J. F. Kolonay, R. Madupu, M. R. Lewis, D. Radune, N. B. Fedorova, D. Scanlan, H. Khouri, S. Mulligan, H. A. Carty, R. T. Cline, S. E. Van Aken, J. Gill, M. Scarselli, M. Mora, E. T. Iacobini, C. Brettoni, G. Galli, M. Mariani, F. Vegni, D. Maione, D. Rinaudo, R. Rappuoli, J. L. Telford, D. L. Kasper, G. Grandi, and C. M. Fraser. 2002. Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae. Proc Natl Acad Sci USA 99: 12391–12396.
39 Haug, R. H., R. Gudding, and G. Bakken. 1981. Serotyping and bacteriophage typing of human and bovine group-B streptococci. J Med Microbiol 14: 479–482.
40 Russell, H., N. L. Norcross, and D. E. Kahn. 1969. Isolation and characterization of Streptococcus agalactiae bacteriophage. J Gen Virol 5: 315–317.
41 Stringer, J. 1980. The development of a phage-typing system for group-B streptococci. J Med Microbiol 13: 133–143.
42 Boyer, K., L. Vogel, S. Gotoff, C. Gadzala, J. Stringer, and W. Maxted. 1980. Nosocomial transmission of bacteriophage type 7/11/12 group B streptococci in a special care nursery. Am J Dis Child 134: 964–966.
43 Franken, C., G. Haase, C. Brandt, J. Weber-Heynemann, S. Martin, C. Lammler, A. Podbielski, R. Lutticken, and B. Spellerberg. 2001. Horizontal gene transfer and host specificity of beta-haemolytic streptococci: the role of a putative composite transposon containing scpB and lmb. Mol Microbiol 41: 925–935.
44 Malke, H. 2005. Genetics and Pathogenicity Factors of Group C and G Streptococci. In: Gram-Positive Pathogens. Fischetti, V. A., R. P. Novick, J. J. Ferretti, D. A. Portnoy, and J. I. Rood, editors. ASM Press, Washington, DC.
45 Carmeli, Y., and K. L. Ruoff. 1995. Report of cases of and taxonomic considerations for large-colony-forming

46

47

48

49

50

51

52

53

Lancefield group C streptococcal bacteremia. J Clin Microbiol 33: 2114–2117. Sachse, S., P. Seidel, D. Gerlach, E. Gunther, J. Rodel, E. Straube, and K. H. Schmidt. 2002. Superantigen-like gene(s) in human pathogenic Streptococcus dysgalactiae, subsp equisimilis: genomic localisation of the gene encoding streptococcal pyrogenic exotoxin G (speG(dys)). FEMS Immunol Med Microbiol 34: 159–167. Igwe, E. I., P. L. Shewmaker, R. R. Facklam, M. M. Farley, C. van Beneden, and B. Beall. 2003. Identification of superantigen genes speM, ssa, and smeZ in invasive strains of beta-hemolytic group C and G streptococci recovered from humans. FEMS Microbiol Lett 229: 259–264. Proft, T., P. D. Webb, V. Handley, and J. D. Fraser. 2003. Two novel superantigens found in both group A and group C Streptococcus. Infect Immun 71: 1361– 1369. Oliver, S. P., R. A. Almeida, and L. F. Calvinho. 1998. Virulence factors of Streptococcus uberis isolated from cows with mastitis. Zentralbl Veterinarmed B 45: 461–471. Gase, K., J. J. Ferretti, C. Primeaux, and W. M. McShan. 1999. Identification, cloning, and expression of the CAMP factor gene (cfa) of group A streptococci. Infect Immun 67: 4725–4731. Rosey, E. L., R. A. Lincoln, P. N. Ward, R. J. Yancey, Jr., and J. A. Leigh. 1999. PauA: a novel plasminogen activator from Streptococcus uberis. FEMS Microbiol Lett 178: 27–33. Staats, J. J., B. L. Plattner, G. C. Stewart, and M. M. Changappa. 1999. Presence of the Streptococcus suis suilysin gene and expression of MRP and EF correlates with high virulence in Streptococcus suis type 2 isolates. Vet Microbiol 70: 201–211. Harel, J., G. Martinez, A. Nassar, H. Dezfulian, S. J. Labrie, R. Brousseau, S. Moineau, and M. Gottschalk. 2003. Identification of an inducible bacteriophage in a virulent strain of Streptococcus suis serotype 2. Infect Immun 71: 6104–6108.

References 54 Tarakanov, B. V. 1996. [Biology of lyso-

genic strains of Streptococcus bovis and virulent mutants of their temperate phages]. Mikrobiologiia 65: 656–662. In Russian. 55 Klieve, A. V., G. L. Heck, M. A. Prance, and Q. Shu. 1999. Genetic homogeneity and phage susceptibility of ruminal strains of Streptococcus bovis isolated in Australia. Lett Appl Microbiol 29: 108– 112. 56 Styriak, I., P. Pristas, and P. Javorsky. 1998. Lack of surface receptors not restriction-modification system determines F4 phage resistance in Streptococcus bovis II/1. Folia Microbiol (Praha) 43: 35–38. 57 Hoskins, J., W. E. Alborn, Jr., J. Arnold, L. C. Blaszczak, S. Burgett, B. S. DeHoff, S. T. Estrem, L. Fritz, D. J. Fu, W. Fuller, C. Geringer, R. Gilmour, J. S. Glass, H. Khoja, A. R. Kraft, R. E. Lagace, D. J. LeBlanc, L. N. Lee, E. J. Lefkowitz, J. Lu, P. Matsushima, S. M. McAhren, M. McHenney, K. McLeaster, C. W. Mundy, T. I. Nicas, F. H. Norris, M. O’Gara, R. B. Peery, G. T. Robertson, P. Rockey, P. M. Sun, M. E. Winkler, Y. Yang, M. Young–Bellido, G. Zhao, C. A. Zook, R. H. Baltz, S. R. Jaskunas, P. R. Rosteck, Jr., P. L. Skatrud, and J. I. Glass. 2001. Genome of the bacterium Streptococcus pneumoniae strain R6. J Bacteriol 183: 5709–5717. 58 Tettelin, H., K. E. Nelson, I. T. Paulsen, J. A. Eisen, T. D. Read, S. Peterson, J. Heidelberg, R. T. DeBoy, D. H. Haft, R. J. Dodson, A. S. Durkin, M. Gwinn, J. F. Kolonay, W. C. Nelson, J. D. Peterson, L. A. Umayam, O. White, S. L. Salzberg, M. R. Lewis, D. Radune, E. Holtzapple, H. Khouri, A. M. Wolf, T. R. Utterback, C. L. Hansen, L. A. McDonald, T. V. Feldblyum, S. Angiuoli, T. Dickinson, E. K. Hickey, I. E. Holt, B. J. Loftus, F. Yang, H. O. Smith, J. C. Venter, B. A. Dougherty, D. A. Morrison, S. K. Hollingshead, and C. M. Fraser. 2001. Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293: 498–506. 59 Claverys, J. P., M. Prudhomme, I. Mortier-Barriere, and B. Martin. 2000. Adaptation to the environment: Streptococcus

pneumoniae, a paradigm for recombination-mediated genetic plasticity? Mol Microbiol 35: 251–259. 60 Martin–Galiano, A. J., J. M. Wells, and A. G. de la Campa. 2004. Relationship between codon biased genes, microarray expression values and physiological characteristics of Streptococcus pneumoniae. Microbiology 150: 2313–2325. 61 Dowson, C. G., T. J. Coffey, and B. G. Spratt. 1994. Origin and molecular epidemiology of penicillin-binding-proteinmediated resistance to beta-lactam antibiotics. Trends Microbiol 2: 361–366. 62 Hakenbeck, R., N. Balmelle, B. Weber, C. Gardes, W. Keck, and A. de Saizieu. 2001. Mosaic genes and mosaic chromosomes: intra- and interspecies genomic variation of Streptococcus pneumoniae. Infect Immun 69: 2477–2486. 63 Matsushita, K., W. Fujimaki, H. Kato, T. Uchiyama, H. Igarashi, H. Ohkuni, S. Nagaoka, M. Kawagoe, S. Kotani, and H. Takada. 1995. Immunopathological activities of extracellular products of Streptococcus mitis, particularly a superantigenic fraction. Infect Immun 63: 785–793. 64 Lu, H. Z., X. H. Weng, B. Zhu, H. Li, Y. K. Yin, Y. X. Zhang, D. W. Haas, and Y. W. Tang. 2003. Major outbreak of toxic shock-like syndrome caused by Streptococcus mitis. J Clin Microbiol 41: 3051–3055. 65 Neeleman, C., C. H. Klaassen, D. M. Klomberg, H. A. de Valk, and J. W. Mouton. 2004. Pneumolysin is a key factor in misidentification of macrolideresistant Streptococcus pneumoniae and is a putative virulence factor of S. mitis and other streptococci. J Clin Microbiol 42: 4355–4357. 66 Siboo, I. R., B. A. Bensing, and P. M. Sullam. 2003. Genomic organization and molecular characterization of SM1, a temperate bacteriophage of Streptococcus mitis. J Bacteriol 185: 6968–6975. 67 Bensing, B. A., I. R. Siboo, and P. M. Sullam. 2001. Proteins PblA and PblB of Streptococcus mitis, which promote binding to human platelets, are encoded within a lysogenic bacteriophage. Infect Immun 69: 6186–6192.

171

172

8 Genomics of Streptococci 68 Whiley, R. A., and D. Beighton. 1998.

Current classification of the oral streptococci. Oral Microbiol Immunol 13: 195– 216. 69 Whiley, R. A., D. Beighton, T. G. Winstanley, H. Y. Fraser, and J. M. Hardie. 1992. Streptococcus intermedius, Streptococcus constellatus, and Streptococcus anginosus (the Streptococcus milleri group): association with different body sites and clinical infections. J Clin Microbiol 30: 243–244. 70 Whiley, R. A., L. M. Hall, J. M. Hardie, and D. Beighton. 1999. A study of small-colony, beta-haemolytic, Lancefield group C streptococci within the anginosus group: description of Streptococcus constellatus subsp. pharyngis subsp. nov., associated with the human throat and pharyngitis. Int J Syst Bacteriol 49 Pt 4: 1443–1449. 71 Nagamune, H., R. A. Whiley, T. Goto, Y. Inai, T. Maeda, J. M. Hardie, and H. Kourai. 2000. Distribution of the intermedilysin gene among the anginosus group streptococci and correlation between intermedilysin production and deep-seated infection with Streptococcus intermedius. J Clin Microbiol 38: 220– 226. 72 Tapp, J., M. Thollesson, and B. Herrmann. 2003. Phylogenetic relationships and genotyping of the genus Streptococcus by sequence determination of the RNase P RNA gene, rnpB. Int J Syst Evol Microbiol 53: 1861–1871. 73 Bolotin, A., B. Quinquis, P. Renault, A. Sorokin, S. D. Ehrlich, S. Kulakauskas, A. Lapidus, E. Goltsman, M. Mazur, G. D. Pusch, M. Fonstein, R. Overbeek, N. Kyprides, B. Purnelle, D. Prozzi, K. Ngui, D. Masuy, F. Hancy, S. Burteau, M. Boutry, J. Delcour, A. Goffeau, and P. Hols. 2004. Complete sequence and comparative genome analysis of the dairy bacterium Streptococcus thermophilus. Nat Biotechnol 22: 1554–1558. 74 Ajdic, D., W. McShan, R. McLaughlin, G. Savic, J. Chang, M. Carson, C. Primeaux, R. Tian, S. Kenton, H. Jia, S. Lin, Y. Qian, S. Li, H. Zhu, F. Najar, H. Lai, J. White, B. Roe, and J. Ferretti. 2002. Genome sequence of Streptococcus mutans UA159, a cariogenic dental

75

76

77

78

79

80

81

82

83

pathogen. Proc Natl Acad Sci U S A 99: 14434–14439. Schleifer, K. H., and R. Kilpper-Balz. 1984. Transfer of Streptococcus faecalis and Streptococcus faecium to the genus Enterococcus nom. rev. as Enterococcus faecalis comb. nov. and Enterococcus faecium comb. nov. Int J Syst Bacteriol 34: 31–34. Steiner, K., and H. Malke. 2000. Life in protein-rich environments: the relA-independent response of Streptococcus pyogenes to amino acid starvation. Mol Microbiol 38: 1004–1016. McShan, W. M. 2005. The Bacteriophages of Group A Streptococci. In: Gram-Positive Pathogens. Fischetti, V. A., R. P. Novick, J. J. Ferretti, D. A. Portnoy, and J. I. Rood (eds.). ASM Press, Washington, DC Pritchard, D. G., S. Dong, J. R. Baker, and J. A. Engler. 2004. The bifunctional peptidoglycan lysin of Streptococcus agalactiae bacteriophage B30. Microbiology 150: 2079–2087. Obregon, V., P. Garcia, R. Lopez, and J. L. Garcia. 2003. VO1, a temperate bacteriophage of the type 19A multiresistant epidemic 8249 strain of Streptococcus pneumoniae: analysis of variability of lytic and putative C5 methyltransferase genes. Microb Drug Resist 9: 7–15. Obregon, V., J. L. Garcia, E. Garcia, R. Lopez, and P. Garcia. 2003. Genome organization and molecular analysis of the temperate bacteriophage MM1 of Streptococcus pneumoniae. J Bacteriol 185: 2362–2368. Boyce, J. D., B. E. Davidson, and A. J. Hillier. 1995. Identification of prophage genes expressed in lysogens of the Lactococcus lactis bacteriophage BK5-T. Appl Environ Microbiol 61: 4099–4104. Ventura, M., C. Canchaya, M. Kleerebezem, W. M. de Vos, R. J. Siezen, and H. Brussow. 2003. The prophage sequences of Lactobacillus plantarum strain WCFS1. Virology 316: 245–255. Ventura, M., S. Foley, A. Bruttin, S. C. Chennoufi, C. Canchaya, and H. Brussow. 2002. Transcription mapping as a tool in phage genomics: the case of the temperate Streptococcus thermophilus phage Sfi21. Virology 296: 62–76.

References 84 Canchaya, C., F. Desiere, W. McShan,

J. Ferretti, J. Parkhill, and H. Brussow. 2002. Genome analysis of an inducible prophage and prophage remnants integrated in the Streptococcus pyogenes strain SF370. Virology 302: 245–258. 85 Desiere, F., W. M. McShan, D. van Sinderen, J. J. Ferretti, and H. Brussow. 2001. Comparative genomics reveals close genetic relationships between phages from dairy bacteria and pathogenic Streptococci: evolutionary implications for prophage–host interactions. Virology 288: 325–341. 86 Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680. 87 Proux, C., D. van Sinderen, J. Suarez, P. Garcia, V. Ladero, G. F. Fitzgerald, F. Desiere, and H. Brussow. 2002. The dilemma of phage taxonomy illustrated by comparative genomics of Sfi21-like Siphoviridae in lactic acid bacteria. J Bacteriol 184: 6026–6036. 88 Lucchini, S., F. Desiere, and H. Brussow. 1999. Comparative genomics of Streptococcus thermophilus phage species supports a modular evolution theory. J Virol 73: 8647–8656. 89 Stanley, E., G. F. Fitzgerald, C. Le Marrec, B. Fayard, and D. van Sinderen. 1997. Sequence analysis and characterization of phi O1205, a temperate bacteriophage infecting Streptococcus thermophilus CNRZ1205. Microbiology 143(Pt 11): 3417–3429. 90 Tremblay, D. M., and S. Moineau. 1999. Complete genomic sequence of the lytic

bacteriophage DT1 of Streptococcus thermophilus. Virology 255: 63–76. 91 Banks, D. J., S. B. Beres, and J. M. Musser. 2002. The fundamental contribution of phages to GAS evolution, genome diversification and strain emergence. Trends Microbiol 10: 515–521. 92 Campbell, A. M. 1992. Chromosomal insertion sites for phages and plasmids. J Bacteriol 174: 7495–7499. 93 Groth, A. C., and M. P. Calos. 2004. Phage integrases: biology and applications. J Mol Biol 335: 667–678. 94 McShan, W. M., Y.-F. Tang, and J. J. Ferretti. 1997. Bacteriophage T12 of Streptococcus pyogenes integrates into the gene for a serine tRNA. Mol Microbiol 23: 719–728. 95 Campbell, A., S. J. Schneider, and B. Song. 1992. Lambdoid phages as elements of bacterial genomes. Genetica 86: 259–267. 96 McShan, W. M., and J. J. Ferretti. 1997. Genetic diversity in temperate bacteriophages of Streptococcus pyogenes: identification of a second attachment site for phages carrying the erythrogenic toxin A gene. J Bacteriol 179: 6509–6511. 97 Yu, C.–E., and J. J. Ferretti. 1991. Molecular characterization of new group A streptococcal bacteriophages containing the gene for streptococcal erythrogenic toxin A (speA). Mol Gen Genet 231: 161–168. 98 Yu, C.–E., and J. J. Ferretti. 1989. Molecular epidemiologic analysis of the type A streptococcal exotoxin (erythrogenic toxin) gene (speA) in clinical Streptococcus pyogenes strains. Infect Immun 57: 3715–3719.

173

175

9 Pathogenic Staphylococci: Lessons from Comparative Genomics

Knut Ohlsen, Martin Eckart, Christian Hüttinger, and Wilma Ziebuhr

9.1 Introduction

Staphylococci are gram-positive bacteria that usually live as commensals on the skin of mammals and birds. From the human point of view, the most important species are Staphylococcus aureus and S. epidermidis. In recent years, both species have become a serious health problem, causing more than 50% of all nosocomial infections. At the same time, the treatment options for infections caused by these pathogens have declined dramatically. In particular, the emergence and spread of multiresistant strains of methicillin-resistant S. aureus (MRSA) and S. epidermidis (MRSE) is of substantial concern to physicians.

Staphylococci are spherical bacteria with a diameter of approx. 1 µm. They are nonmotile, do not form spores, and are unencapsulated or form a special type of capsule (approx. 75% of clinical S. aureus isolates form a uronic acid-containing microcapsule). Most species are facultative anaerobes and are positive in catalase and benzidine tests. Under the microscope, staphylococci appear as grape-shaped clusters, for which reason the Scottish surgeon Ogston named the bacteria in 1881 after the Greek words for grape (staphyle) and berry (kokkos).

In a first attempt to classify staphylococci 3 years later, Rosenbach distinguished between the yellow-orange pigmented S. aureus and the white S. albus. The latter was later renamed S. epidermidis. Rosenbach recognized that S. aureus is the pathogenic form responsible for wound infections and furunculosis, and that S. epidermidis is a normal colonizer of the skin [1]. For a long time, this simple classification served as a useful division of staphylococci into pathogenic and nonpathogenic forms. Later still, staphylococci were classified on the basis of their ability to clot plasma and divided into coagulase-positive and coagulase-negative species. Until the early 1970s, the genus Staphylococcus consisted of three species: the coagulase-positive species S. aureus and the coagulase-negative species S. epidermidis and S. saprophyticus. Molecular typing methods and biochemical properties of staphylococci have since led to the identification of many new staphylococcal species. Currently, 36 species and several subspecies belong to the genus Staphylococcus [2].


Most species are nonpathogenic, but the coagulase-positive species S. aureus, S. intermedius, S. delphini, and S. schleiferi subsp. coagulans and the coagulase-variable species S. hyicus have been recognized as potentially serious pathogens. Until recently, the coagulase-negative staphylococci (CoNS) were regarded as nonpathogenic constituents of the normal human microflora, but the proportion of nosocomial infections caused by CoNS has increased steadily over the last 20 years. S. epidermidis in particular is now a leading cause of bacteremia in hospitals [3]. The growing number of S. epidermidis infections has been paralleled by the increased use of prosthetic and indwelling devices and by the growing number of immunocompromised patients.

The enormous clinical relevance of staphylococci has driven several genome projects aimed at answering questions about virulence, resistance, epidemiology, genetic flexibility, and physiology. There is substantial hope that the nature of staphylococcal infections can be understood by comparing the genome sequence of the principal pathogenic species, S. aureus, with the genomes of less virulent forms such as S. epidermidis and of the nonpathogenic species S. carnosus. In this review, we summarize the findings of comparative genomics based on six complete S. aureus genomes and two S. epidermidis genomes, focusing on virulence and resistance traits, and discuss the implications of these studies for the evolution of staphylococcal pathogenicity.

Not surprisingly, the whole-genome studies revealed that staphylococci, like other pathogens, have a relatively stable core genome encoding factors essential for growth in a specific environment, and a flexible gene pool encoding virulence-associated factors, resistance determinants, and elements that confer gene mobility, such as transposons, integrases, and insertion sequences. The flexible part of the genome presumably determines the virulence of a particular strain. This became especially evident through the discovery of several pathogenicity islands, which carry genes for superantigen toxins including toxic shock syndrome toxin (tst), enterotoxins (seb, sec, sem, sen, seo), and several exotoxins (set cluster), and of bacteriophages. However, correlating a particular disease type with a specific genotype remains a challenge, and much effort is still needed to understand the complex nature of staphylococcal infections.
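The split between a core genome and a flexible gene pool can be illustrated with a simple presence/absence comparison. The following Python sketch is not the method used in the studies cited here; the strain names and gene assignments are invented for illustration, and the function name `split_core_flexible` is an assumption.

```python
from typing import Dict, Set, Tuple


def split_core_flexible(gene_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split the genes of several strains into core and flexible sets.

    core     = genes present in every strain
    flexible = genes present in at least one strain but not in all
    """
    if not gene_sets:
        return set(), set()
    all_sets = list(gene_sets.values())
    core = set.intersection(*all_sets)
    pan = set.union(*all_sets)
    return core, pan - core


# Hypothetical gene content per strain (toy data, not from any real genome)
strains = {
    "strain_A": {"spa", "hla", "coa", "mecA", "tst"},
    "strain_B": {"spa", "hla", "coa", "seb"},
    "strain_C": {"spa", "hla", "coa", "mecA"},
}

core, flexible = split_core_flexible(strains)
core_fraction = len(core) / len(core | flexible)

print("core:", sorted(core))
print("flexible:", sorted(flexible))
print(f"core fraction: {core_fraction:.0%}")
```

In this toy data set, three of six genes are shared by all strains, so the core fraction is 50%. Real analyses work on thousands of ORFs and must first decide when two ORFs in different strains count as the same gene, which this sketch does not address.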

9.2 Comparative Genomics of S. aureus

S. aureus is the most pathogenic species of the genus Staphylococcus and one of the most successful pathogens in hospitals. The infections it causes range from mild superficial skin infections such as furuncles, carbuncles, and wound infections to life-threatening infections including sepsis, endocarditis, and pneumonia [4]. Moreover, the organism is one of the leading causes of foodborne disease [5]. S. aureus produces an extraordinary number of virulence factors such as superantigen toxins, hemolysins, adhesins, and degradative enzymes, most of which play a significant role in virulence [6]. Importantly, S. aureus easily acquires antibiotic resistance determinants, which favor survival in the highly competitive environment of hospitals. Since the 1980s, MRSA has emerged as a major nosocomial pathogen. In several countries, including the United States, Japan, and the United Kingdom, MRSA is the causative agent of more than 50% of nosocomial infections in intensive care units [7, 8]. For these cases, the glycopeptide antibiotic vancomycin remains the drug of last resort. In 1997, however, the first case of vancomycin-intermediate S. aureus (VISA) appeared, and in 2002 the first high-level vancomycin-resistant S. aureus strain, expressing the vanA gene cluster of enterococci, was isolated [9–12]. The vanA-carrying transposon Tn1546, which confers high-level vancomycin resistance, had been transferred horizontally from E. faecalis to S. aureus in the hospital setting [13].

9.2.1 Overall Genome Structure

The size of the sequenced S. aureus genomes varies from 2.800 Mbp to 2.903 Mbp, encoding 2565–2721 proteins (Table 9.1) [14–17]. It has been calculated that approx. 75% of the genome is conserved among all S. aureus isolates and represents the core genome [18]. This part of the genome encodes housekeeping functions such as factors involved in central metabolism. In addition, some virulence-associated factors are expressed by almost all isolates: protein A (spa), α-toxin (hla), clumping factor (clfAB), aureolysin (aur), lipase (lip), fibronectin-binding protein A (fnbA), coagulase (coa), superoxide dismutase (sodM), and the intercellular adhesin (ica) [19].

Besides the comparison of whole genome sequences, DNA microarray technology has been used to define the genetic diversity of S. aureus isolates of different origin [18, 20, 21]. Fitzgerald et al. examined the extent and types of genetic diversity in S. aureus using a collection of strains representing the most abundant lineages identified by multilocus enzyme electrophoresis (MLEE) [18]. In that study, the core of the genome found in all strains encompassed 2198 open reading frames (ORFs), representing 78% of all ORFs. Most of the dispensable genes were clustered in 18 regions of difference (RDs) that contained elements related to pathogenicity islands, phages, plasmids, transposons, and insertion sequences. Ten of these regions encode virulence factors or antibiotic resistance determinants. These results underscore the great degree of diversity in S. aureus. Interestingly, several RDs vary extensively in gene content and size, suggesting that multiple deletion, integration, and recombination events have occurred during the evolution of the strains. In a similar approach, the genetic diversity of strains of different origins, including sepsis isolates, wound isolates, isolates from mucoviscidosis (cystic fibrosis) patients, and carrier isolates from the nose, was compared [22].
In that comparison, 14 RDs were found scattered throughout the genome. These regions encompass the mecA genomic island, the pathogenicity and genomic islands SaPIn1, SaPIn2, and SaPIn3, a phage region (Φ315), the capsule gene cluster, and clusters of genes of unknown function. Interestingly, the differences between the strains were almost completely restricted to these regions; the sequences outside them were highly conserved. For some of the regions, putative mobility factors such as integrases, transposases, and IS elements could be identified, but for others the reasons for the great extent of variation remain unknown. The clustering of genes/ORFs that are absent in different isolates suggests a common mechanism for genome stability and flexibility. Apart from integration and excision of staphylococcal phages (see below), the processes leading to small or large genome variations are poorly understood. For some pathogenicity islands (SaPIn2 and SaPInbov), genetic transfer has been demonstrated experimentally, revealing a phage-like mechanism [23–25]. In addition, IS elements such as IS256 mediate recombination events leading to gene deletion, inversion, and integration [26].

Several multilocus sequence typing (MLST) studies have been performed to analyze the clonal structure of S. aureus. Interestingly, clinical isolates have a highly clonal population structure. For example, 87% of strains in hospitals and the community belong to 11 clonal complexes (CCs) [27]. Furthermore, MLST has been employed to analyze the clonal relationships of MRSA. Here, it became evident that the SCCmec genomic island was introduced into only five clonal complexes [28]. Moreover, MLST analysis of the seven sequenced S. aureus strains has shown that six strains are closely related (COL, 8325, N315, Mu50, MW2, MSSA476). Only strain MRSA252 belongs to an unrelated sequence type (ST) [29]. Importantly, this strain is a representative of the EMRSA-16 clonal group, the cause of 50% of the MRSA infections in the UK and one of the predominant clones spreading in the USA [30, 31].

9.2.2 Core Genome

9.2.2.1 Metabolism

Enzymes of metabolic pathways are usually encoded within the core genome. S. aureus can grow using a range of sugars as energy sources. Accordingly, the complete sets of genes for the glycolytic pathway, the pentose phosphate pathway, and the tricarboxylic acid cycle are found in the genome. Interestingly, S. aureus metabolizes lactose and galactose via the D-tagatose-6-phosphate pathway. All genes encoding enzymes of this pathway are also present in the S. aureus genome, including a phosphotransferase system for lactose transport [14]. On the other hand, no transport systems for arabinose and mannose were found. The ability of S. aureus to metabolize lactose and galactose may be important for its growth in milk; S. aureus is one of the leading causes of mastitis in humans and cattle. Importantly, the major difference between S. aureus and S. epidermidis with respect to carbohydrate transport is the absence of three phosphotransferase system (PTS) transporters for mannitol, sorbitol, and pentitol in S. epidermidis [17].

Staphylococci are well adapted to life on mucosal surfaces and the skin, where they must cope with osmotic stress. To counterbalance environmental stress due to high salt conditions, S. aureus possesses seven sodium ion/proton exchangers and S. epidermidis eight. In addition, both organisms encode six transport systems for the osmoprotectants proline and glycine betaine, the two mechanosensitive ion channels MscL and MscS, and two Trk potassium channels [17].

Table 9.1 Summarized features of eight staphylococcal genomes.

Species          Strain [Ref.]     Background                                Length (bp)   Open reading frames
S. aureus        N315 [14]         HA-MRSA[a], Japan 1982                    2 813 641     2 595
S. aureus        Mu50 [14]         HA-VISA[b], Japan 1997                    2 878 084     2 697
S. aureus        MW2 [16]          CA-MRSA[c], USA 1998, PVL+                2 820 462     2 632
S. aureus        COL [17]          HA-MRSA, UK 1961                          2 809 422     2 721
S. aureus        MRSA252 [15]      HA-MRSA, UK 1997 (EMRSA-16)               2 902 619     2 671
S. aureus        MSSA476 [15]      CA-MSSA, UK 1998                          2 799 802     2 565
S. epidermidis   RP62A [17]        HA-MRSE, USA 1979, biofilm-positive       2 616 530     2 533
S. epidermidis   ATCC12228 [102]   Reference strain, USA, biofilm-negative   2 499 279     2 381

a Hospital-acquired MRSA.
b Hospital-acquired vancomycin-intermediate resistant S. aureus.
c Community-acquired MRSA.

Insertion sequences and genomic islands: the two counts per strain are 7 and 17 (COL), 7 and 30 (MRSA252), 6 and 5 (MSSA476), 4 and 57 (RP62A), and 4 and 57 (ATCC12228); the values 23, 20, 6, 8, 6, and 8 belong to N315, Mu50, and MW2. The assignment of these counts to the two rows is not recoverable from the source layout.

Adapted to a parasitic lifestyle, S. aureus requires a variety of inorganic ions and an organic nitrogen source for growth. The acquisition of iron in particular demands considerable effort, which is reflected in the expression of six complete or partial ferric iron ABC uptake systems, four additional orphan iron-binding proteins, and two ferrous iron FeoB uptake systems [17]. The organism is auxotrophic for a variable number of amino acids, chiefly arginine, valine, leucine, cysteine, and proline. Interestingly, strain N315 requires alanine, glycine, isoleucine, arginine, valine, and proline for growth, although complete biosynthetic pathways for these amino acids were found in the genome. Regulatory effects that are not yet understood may be responsible for this phenotypic auxotrophy [17].

S. aureus uses some remarkable metabolic pathways. In contrast to other gram-positive bacteria, S. aureus synthesizes isoprenoids exclusively by the mevalonate pathway of isopentenyl diphosphate synthesis. However, S. aureus also encodes orthologues of two enzymes of the non-mevalonate pathway, the function of which has not yet been defined. Two genes in the genome are homologous to ygbP/yacM, which are involved in isoprenoid biosynthesis in Escherichia coli and Bacillus subtilis, respectively. Interestingly, one of these proteins seems to be essential in S. aureus [32].
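The genome size range quoted in Section 9.2.1 can be checked against the lengths and ORF counts listed in Table 9.1. The following Python sketch is illustrative only: the dictionary and function names are assumptions, and the numbers are transcribed from the table.

```python
# (length in bp, number of ORFs) per strain, transcribed from Table 9.1
S_AUREUS = {
    "N315": (2_813_641, 2_595),
    "Mu50": (2_878_084, 2_697),
    "MW2": (2_820_462, 2_632),
    "COL": (2_809_422, 2_721),
    "MRSA252": (2_902_619, 2_671),
    "MSSA476": (2_799_802, 2_565),
}
S_EPIDERMIDIS = {
    "RP62A": (2_616_530, 2_533),
    "ATCC12228": (2_499_279, 2_381),
}


def summarize(genomes):
    """Return the range of genome sizes (Mbp) and ORF counts."""
    lengths = [length for length, _ in genomes.values()]
    orfs = [n for _, n in genomes.values()]
    return {
        "min_mbp": round(min(lengths) / 1e6, 3),
        "max_mbp": round(max(lengths) / 1e6, 3),
        "min_orfs": min(orfs),
        "max_orfs": max(orfs),
    }


def orfs_per_kb(genomes):
    """Return a crude gene density (ORFs per kilobase) for each strain."""
    return {name: n / (length / 1000) for name, (length, n) in genomes.items()}


summary = summarize(S_AUREUS)
density = orfs_per_kb({**S_AUREUS, **S_EPIDERMIDIS})

print(summary)
for name, value in density.items():
    print(f"{name}: {value:.2f} ORFs/kb")
```

For the six S. aureus genomes, the summary reproduces the range of 2.800 to 2.903 Mbp and 2565 to 2721 ORFs given in the text. The density figure is only a rough indicator, since ORF counts depend on the annotation method used for each genome.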

9.2.2.2 Information Pathways

S. aureus encodes three sigma factors of RNA polymerase: sigA, sigB, and sigH, corresponding to the general sigma factor σ70, the general stress sigma factor σB of B. subtilis, and the alternative sigma factor σH of B. subtilis, respectively. The role of the general stress sigma factor σB in S. aureus is not completely understood. Several studies including proteome analysis, DNA microarray approaches, and promoter searching identified more than 100 genes which are under the control of SigB [33–36]. SigB seems to be involved in the oxidative, alkaline, and growth-phase-dependent stress response, since genes such as the alkaline shock protein gene asp23 and several Na+/H+ antiporter genes are under the control of SigB. Moreover, a role in virulence has been postulated on the basis of in vitro data, but in several in vivo studies no significant impact of sigB on virulence could be demonstrated [37–40]. Recently, the importance of sigB for virulence has been shown in an arthritis model [41]. Likewise, we found that a sigB mutant was attenuated in its ability to cause metastatic disease in a catheter-related infection model in rats (unpublished). SigB-dependent gene expression may thus be critical for virulence under specific conditions that are as yet unknown.

Furthermore, S. aureus, and also S. epidermidis, encode 16 two-component regulatory systems involved in virulence, peptidoglycan biosynthesis, potassium transport, and environmental sensing (Table 9.2). Importantly, only one two-component system, the yycF/yycG system (alternatively vicR/vicK), seems to be essential in S. aureus [42]. A conserved YycF-specific DNA recognition sequence which consists of two hexamers [TGT(A/T)A(A/T/C)-5N-TGT(A/T)A(A/T/C)] was first identified in B. subtilis and later also in S. aureus [42, 43]. Genome-wide analysis identified the recognition motif of the response regulator YycF upstream of 31 putative ORFs. Specific binding of the response regulator to the upstream recognition sequence, however, has been shown in vitro for three genes: lytM, ssa, and isaA [42, 44]. Since none of these genes is essential in S. aureus, the actual essential gene(s) regulated by YycF/YycG remain to be defined.

In addition to two-component regulatory systems, more than 100 transcriptional regulators including regulators with helix–turn–helix motifs, antiterminators, Fur homologues, SarA homologous regulators, Arg repressors, and an HrcA repressor are present in the genome of S. aureus. In particular, regulation of virulence factor expression is intensively studied in this pathogen. Many virulence-associated genes are regulated in a quorum-sensing-dependent manner by the agr locus encoding the accessory gene regulator. In principle, surface components such as adhesins are mainly expressed in the logarithmic growth phase, and extracellular proteins such as toxins and degradative enzymes in the postlogarithmic phase. The opposing expression of adhesins and toxins depends on the expression level of the regulatory RNAIII, which activates toxin expression and downregulates surface adhesin production [45]. In addition, a growing number of regulators such as sarA, the sar homologues sarR, sarV, sarX, sarZ, sarY, sarU, sarT, sarS, rot, mgrA, and codY, two-component systems including arlRS, srrAB, and saeSR, and the alternative sigma factor sigB affect the expression of certain virulence-associated genes directly or via other regulatory elements (for extensive reviews see Refs. [45, 46]). The complex nature of virulence gene regulation is further complicated by the fact that strain-specific regulatory phenomena are common in S. aureus [47–49].
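The genome-wide search for the two-hexamer recognition motif quoted above, TGT(A/T)A(A/T/C)-5N-TGT(A/T)A(A/T/C), can be illustrated with a short pattern search. The following is only a minimal sketch and not the procedure used in the cited studies: the function names are hypothetical, the sequence is assumed to be a plain string of A, C, G, and T, and no restriction to upstream regions of ORFs is attempted.

```python
import re

# Consensus as quoted in the text: TGT(A/T)A(A/T/C)-5N-TGT(A/T)A(A/T/C).
HEXAMER = "TGT[AT]A[ATC]"
# A lookahead is used so that overlapping occurrences are also reported.
MOTIF = re.compile(f"(?=({HEXAMER}[ACGT]{{5}}{HEXAMER}))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str):
    """Return sorted (start, strand, match) tuples for both strands.

    `start` is the 0-based position of the leftmost base of the hit
    on the forward strand (hypothetical convention for this sketch).
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in MOTIF.finditer(seq)]
    rc = reverse_complement(seq)
    for m in MOTIF.finditer(rc):
        # Convert the reverse-strand coordinate to a forward-strand start.
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    example = "CCC" + "TGTAAT" + "GGGGG" + "TGTTAC" + "CC"
    for hit in find_motif(example):
        print(hit)
```

In practice the hits would still have to be filtered for their position relative to annotated ORFs and, as the text stresses, confirmed by binding assays.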

Table 9.2 Two-component systems of S. aureus. Locus numbers of six sequenced S. aureus strains, putative function, and known regulated genes are indicated.

Histidine kinase | Response regulator | Function | Regulated genes | N315 | Mu50 | MW2 | COL | MRSA252 | MSSA476
AgrC | AgrA | Virulence | e.g., hla, hlb, spa, seb, tst, lukSF, spl, aur, cap | SA1843, SA1844 | SAV2038, SAV2039 | MW1962, MW1963 | SA2025, SA2026 | SAR2125, SAR2126 | SAS1943, SAS1944
SA1667 | SA1666 | n.d. (LuxR family) |  | SA1667, SA1666 | SAV1849, SAV1848 | MW1790, MW1789 | SA1906, SA1905 | SAR1940, SAR1939 | SAS1770, SAS1769
SA2180 | SA2179 | Competence |  | SA2180, SA2179 | SAV2392, SAV2391 | MW2314, MW2313 | SA2390, SA2389 | SAR2481, SAR2480 | SAS2283, SAS2282
VraS | VraR | Cell wall biosynthesis | pbp2, sgtB, murZ | SA1701, SA1700 | SAV1885, SAV1884 | MW1825, MW1824 | SA1943, SA1942 | SAR1975, SAR1974 | SAS1807, SAS1806
ArlS | ArlR | Autolysis, virulence | spa, agr, hla | SA1246, SA1247 | SAV1414, SAV1415 | MW1304, MW1305 | SA1450, SA1451 | SAR1426, SAR1427 | SAS1357, SAS1358
KdpD | KdpE | K+ transport | kdpABC | SA1882, SA1883 | SAV2078, SAV2079 | MW2002, MW2003 | SA2070, SA2071 | SAR2166, SAR2167 | SAS1983, SAS1984
SA2152 | SA2151 | n.d. |  | SA2152, SA2151 | SAV2362, SAV2361 | MW2283, MW2282 | SA2359, SA2358 | SAR2448, SAR2447 | SAS2253, SAS2252
SrrB | SrrA | Oxygen, virulence | tst, spa, agr | SA1322, SA1323 | SAV1491, SAV1492 | MW1445, MW1446 | SA1534, SA1535 | SAR1567, SAR1568 | SAS1431, SAS1432
PhoR | PhoP | Alkaline phosphatase synthesis |  | SA1515, SA1516 | SAV1692, SAV1693 | MW1636, MW1637 | SA1739, SA1740 | SAR1771, SAR1772 | SAS1620, SAS1621
SA2419 | SA2418 | n.d. (CheY-like?) |  | SA2419, SA2418 | SAV2626, SAV2625 | MW2546, MW2545 | SA2647, SA2646 | SAR2704, SAR2703 | SAS2512, SAS2511
SA0615 | SA0614 | n.d. |  | SA0615, SA0614 | SAV0660, SAV0659 | MW0622, MW0621 | SA0717, SA0716 | SAR670, SAR669 | SAS625, SAS624
VicK (YycG) | VicR (YycF) | Peptidoglycan biosynthesis | ssa, isaA, lytM | SA0018, SA0017 | SAV0019, SAV0018 | MW0019, MW0018 | SA0020, SA0019 | SAR0018, SAR0019 | SAS0018, SAS0019
KdpD (SCCmec) | KdpE (SCCmec) | K+ transport | kdpABC | SA0067, SA0066 | SAV0071, SAV0070 | n.p. | n.p. | SAR0069, SAR0068 | n.p.
SaeS | SaeR | Virulence | fnbA, cap, hla, hlb | SA0660, SA0661 | SAV0705, SAV0706 | MW0667, MW0668 | SA0765, SA0766 | SAR0758, SAR0759 | SAS0670, SAS0671
SA0216 | SA0215 | n.d. (AraC family) |  | SA0216, SA0215 | SAV0224, SAV0223 | MW0199, MW0198 | SA0202, SA0201 | SAR215, SAR214 | SAS199, SAS198
LytS | LytR | Murein hydrolysis | lrgA, lrgB | SA0250, SA0251 | SAV0260, SAV0261 | MW0236, MW0237 | SA0245, SA0246 | SAR259, SAR258 | SAS239, SAS238
SA1158 | SA1159 | n.d. (LuxR family) |  | SA1158, SA1159 | SAV1321, SAV1322 | MW1208, MW1209 | SA1354, SA1355 | SAR1331, SAR1332 | SAS1261, SAS1262

Gene symbols: agr, accessory gene regulator; aur, aureolysin; cap, capsular polysaccharide synthesis; fnbA, fibronectin binding protein A; hla, α-toxin; hlb, β-toxin; isaA, immunodominant staphylococcal antigen; kdpABC, potassium-transporting ATPase A, B, C chain homologue; lrgA, holin-like protein LrgA; lrgB, holin-like protein LrgB; lukSF, Panton–Valentine leukocidin components S and F; lytM, peptidoglycan hydrolase; murZ, UDP-N-acetylglucosamine 1-carboxyvinyl transferase 2; pbp2, penicillin binding protein 2; seb, staphylococcal enterotoxin B; sgtB, hypothetical protein, similar to penicillin-binding protein 1A/1B; spa, staphylococcal protein A; spl, serine protease; ssa, staphylococcal secretory antigen; tst, toxic shock syndrome toxin-1. n.d., not defined; n.p., not present.

9.2.2.3 Virulence Factors

S. aureus expresses a plethora of putative virulence-associated factors, many of which may play a key role in virulence. Comparative genome analysis indicated that 11% of all ORFs encode virulence determinants [17]. In principle, virulence factors can be divided into cell-surface-associated components and secreted proteins. Members of the first group are involved in adhesion to the extracellular matrix of the host, biofilm formation, immune evasion, and inflammatory response. Members of the second group trigger cytolysis of immune and somatic cells, digestion of macromolecules, and inflammation. The question is: which virulence factors are necessary to cause an infection? In addition to genome sequencing, comparative genomics using DNA microarrays, proteome analysis, and virulence-factor screening by polymerase chain reaction (PCR) have been performed to unravel the nature of S. aureus virulence [18–20, 50]. All these studies suggest that it is not possible to predict the pathogenic potential of a strain based on the presence or absence of a particular set of virulence genes. This is probably due to the redundant nature of S. aureus virulence factors. Peacock et al. investigated the pattern of virulence factors of 334 strains and found that fnbA, sdrE, sej, eta, hlg, and cna were significantly more common in invasive isolates than in carriage isolates. Apparently, prevalent clinical isolates have a high capacity to bind host factors and produce low amounts of extracellular toxins [51]. Furthermore, it has been suggested that the presence of the collagen binding protein (cna) may be associated with more virulent strains, as many predominant lineages express cna [51, 52]. Likewise, many MSCRAMMs (microbial surface components recognizing adhesive matrix molecules), including clumping factors (ClfA and ClfB), fibronectin binding protein A (FnbA), and protein A, which binds the Fc part of antibodies, were found in most S. aureus strains [19, 20]. However, strains in which one or more of these adhesins are lacking are also able to cause disease. Interestingly, for some adhesin genes such as clfA, clfB, fnbB, sdrC, and sdrD, allelic variants due to a variable number of repeats have been described [20, 53]. This observation might be important in the light of antigenic variation of these adhesins, by which the pathogen escapes immune defense mechanisms of the host. Most of the surface proteins of S. aureus carry a conserved C-terminal LPXTG motif by which the adhesins are attached to the cell wall by an enzyme named sortase. Both the cell wall attachment motif and the sortase enzyme seem to be conserved in all gram-positive pathogens [54, 55].

Importantly, all S. aureus strains investigated so far encode the intercellular adhesin (ica) locus, which is important for biofilm formation [56]. In contrast to S. epidermidis, where ica is predominantly associated with nosocomial isolates, in S. aureus the ica gene cluster is also found in carrier strains [19, 57]. Although most S. aureus isolates carry the sequence of the ica locus, only a few strains express the icaABCD genes and form a biofilm in vitro. Apparently, environmental stimuli as well as unknown regulatory elements control ica expression in a different way from S. epidermidis. Exoenzymes of S. aureus including lipase (geh), coagulase (coa), serine proteases (splC, splD), V8 protease (V8), aureolysin (aur), and hyaluronidase (hysA) are found in most strains and can therefore be regarded as part of the core genome [17, 19].

In contrast, many toxins, especially superantigenic toxins, are located on mobile genetic elements such as pathogenicity islands or bacteriophages, and are consequently part of the accessory genome (for more details see below). Only some hemolysins such as α-toxin and γ-hemolysin are present in most isolates, indicating a significant function for survival of S. aureus in its ecological niche.

9.2.3 Accessory Genome

The genome of S. aureus contains many accessory elements scattered throughout the chromosome, accounting for approx. 25% of the whole genome sequence. Within the genomes a wide range of putative mobile DNA elements have been identified, including pathogenicity and genomic islands, up to 30 insertion sequences, five transposons, and several bacteriophages. Methicillin-resistant isolates typically carry the methicillin resistance-encoding staphylococcal cassette chromosome mec (SCCmec). Most of these elements have been acquired by horizontal gene transfer and some of them are mobile. Accessory elements often carry virulence genes which have substantial implications for disease types. For example, various toxin genes are associated with certain strains or lineages, including toxic shock syndrome toxin tst, Panton-Valentine leukocidin, serine protease splB, and staphylococcal superantigens sea, seg, and sei. Interestingly, many toxin genes are located on genomic and pathogenicity islands.

9.2.3.1 Pathogenicity Islands

Pathogenicity islands are large chromosomal structures encoding virulence traits which have been acquired by horizontal gene transfer; they were first described in the gram-negative bacterium E. coli [58, 59]. A typical pathogenicity island is characterized by a GC content differing from that of the core genome, the presence of one or more virulence-associated genes, the presence of mobility genes such as transposases or recombinases, direct repeats at the flanking borders, and putative mobility [60]. In gram-negative bacteria, pathogenicity islands often integrate into tRNA genes and are lost with varying efficiency. Pathogenicity islands can be regarded as a subgroup of so-called genomic islands (GEIs), which have the same characteristics as pathogenicity islands but do not necessarily carry putative virulence genes. SaPIs (S. aureus pathogenicity islands) have probably derived from bacteriophages, as they carry some phage-related genes and interact with phages. Importantly, most SaPIs carry superantigenic toxins; the presence or absence of a particular SaPI therefore has great implications for the pathogenic potential of a strain. Different schemes of classification of genomic islands in S. aureus have been suggested [16, 17, 29]. SaPIs, or the alternatively designated islands of S. aureus, can be classified into specific groups on the basis of integrase homology and insertion site.

The first pathogenicity island of S. aureus was discovered by Lindsay et al. and was the first pathogenicity island identified in gram-positive pathogens [25]. This 15.2-kb pathogenicity island, named SaPI1, encodes the toxic shock syndrome toxin-1 (TSST-1), which is a potent superantigen associated with most cases of menstrual toxic shock syndrome. The island is flanked by 17-bp-long direct repeats and, in addition to tst, it carries other genes whose products are presumably involved in virulence.
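One of the island criteria listed above, a GC content differing from that of the core genome, lends itself to a simple sliding-window scan. The sketch below is illustrative only: the window size, step, and threshold are arbitrary assumptions, the function names are hypothetical, and a deviating window is at best a candidate region, since the other criteria (virulence genes, mobility genes, flanking direct repeats) are not checked.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_outlier_windows(genome: str, window: int = 1000, step: int = 500,
                       threshold: float = 0.05):
    """Return (baseline, hits) for a sliding-window GC scan.

    `baseline` is the genome-wide GC fraction; `hits` lists
    (start, end, gc) for every window whose GC fraction differs from
    the baseline by more than `threshold` (all defaults are assumptions).
    """
    baseline = gc_fraction(genome)
    hits = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = genome[start:start + window]
        gc = gc_fraction(chunk)
        if abs(gc - baseline) > threshold:
            hits.append((start, start + len(chunk), gc))
    return baseline, hits


if __name__ == "__main__":
    # Toy sequence: an AT-rich background with a GC-rich insert.
    toy = "AT" * 1000 + "GC" * 250 + "AT" * 1000
    baseline, hits = gc_outlier_windows(toy, window=500, step=500, threshold=0.3)
    print(round(baseline, 3), hits)
```

Adjacent deviating windows would normally be merged before the region is inspected for integrase genes and flanking repeats.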
Like other pathogenicity islands, SaPI1 carries a locus which is homologous to members of the integrase family of recombinases of bacteriophages. Interestingly, this type of staphylococcal island has the capacity to integrate into target sites which are identical to the directly repeated sequences at the ends of SaPIs in different tst-negative strains. SaPI1 can be excised and circularized by the helper bacteriophages Φ13 and 80α; following excision, the island is transduced to other strains with high frequency. The integration of SaPI1 into the recipient genome by a Campbell mechanism near the tyrB locus is recA-independent and requires the SaPI1-encoded integrase [25]. Apparently, SaPI1 uses excision, replication, and encapsidation functions of the helper phage. It has been shown that SaPI1 interferes with phage growth after its excision and is encapsidated into phage heads. Upon transduction SaPI1 integrates into the attC site, for which the SaPI1-encoded integrase is necessary. SaPI1 was first identified in the clinical isolate RN4282; however, a similar type of island is also present in strain COL but not in the other five sequenced S. aureus genomes [17]. The island in strain COL was originally designated SaPI3, but it is virtually identical to SaPI1, as both islands share many island-specific genes, are integrated into the same target site, and carry identical 17-nt flanking direct repeats [17, 61]. An important difference, however, is that SaPI1 of strain COL carries the staphylococcal enterotoxin B gene (seb) instead of the tst gene. The fact that tst and seb are located on different islands of a similar type may explain why TSST-1 and SEB are presumably never coproduced [62].

SaPI2 also carries the tst gene but is located at another site of the S. aureus genome, within the tryptophan locus. This island is found in many menstrual toxic shock syndrome (TSS) strains [25, 63]. In addition to tst, two other superantigen genes are usually located on SaPI2: staphylococcal enterotoxin L (sel) and enterotoxin C3 (sec3). This island is alternatively designated νSa4 [16], SaPIn1 (strain N315), or SaPIm1 (strain Mu50) [14]. As has been shown for SaPI1, SaPI2 can be induced to excise and be transduced by different phages [63].

A third type of staphylococcal island has been identified in the genomes of strain MW2, where it is designated νSa3, and Mu50, where it is designated SaGIm [16]. Although this island does not carry tst, it is closely related to SaPI1 and SaPI2, as many ORFs of unknown function are common to all of these islands. However, their integrases differ significantly in sequence, and νSa3 is integrated at a completely different locus in the S. aureus genome [16]. Characteristic genes on νSa3 include a homologue of the siderophore transporter gene fhuD in strain Mu50, and ear (putative β-lactamase-type protein), enterotoxin L2 (sel2), and enterotoxin C4 (sec4) in strain MW2. νSa3 spontaneously excises from the chromosome and forms extrachromosomal circles. Transduction by specific helper phages has not been observed so far.

A fourth class of island was identified in the genome of the epidemic MRSA252 strain.
This island was designated SaPI4 and contains the same gene order as, and homologues of, the pathogenicity island proteins of SaPI1 (including “SaPI3” of strain COL, later classified as SaPI1) and SaPIbov. SaPI4 contains no protein with homology to characterized virulence factors. The island integrates downstream of the ribosomal protein gene rpsR [15] and comprises 25.1 kbp.

In addition to these four islands, two genomic islands which also carry pathogenicity factors and two islands in bovine isolates have been described. The two genomic islands νSaα and νSaβ are present in all sequenced isolates, and it is very likely that they are also present in many other clinical strains, suggesting that these elements are very stable [15–17]. νSaα was originally designated SaPIn2 in strain N315 and SaPIm2 in strain Mu50, and νSaβ was termed SaPIn3 in strain N315 and SaPIm3 in strain Mu50. νSaα and νSaβ possess some specific features in comparison to other staphylococcal islands, especially the tst island family. Mobility has not yet been described for these islands, which is consistent with the fact that an inactive transposase is located on νSaα and νSaβ. Moreover, both islands encode the restriction modification system genes hsdS and hsdM (host specificity determinants), which might be important for stabilization of the islands in the S. aureus genome [16]. Most impressively, νSaα carries an exotoxin gene cluster (set) and a lipoprotein gene cluster (lpl), and νSaβ carries an enterotoxin (seg, sen, sei, sem, seo) and a serine protease (splA-F) gene cluster [15, 16, 29, 64].

νSaα and νSaβ exist in different allelic combinations in the S. aureus genomes. For example, the superantigen gene cluster carried by νSaβ of N315, Mu50, and MRSA252 is missing in MW2 and MSSA476, which instead have a bacteriocin gene cluster (bsa) [15]. The presence of this bacteriocin might be advantageous for strains in the community, as these strains compete with many other species on the skin and mucosal surfaces. Three families of νSaα and νSaβ genomic islands have been classified on the basis of allelic variations of the hsdS gene, which determines target-specific methylation and occurs in three allelic forms. The HsdS proteins show less than 66% amino acid identity. Amino acid variations are mainly found in the nucleotide sequence recognition regions, suggesting an evolutionary advantage of different methylation patterns. The νSaα and νSaβ genomic islands may serve as an important reservoir of genome flexibility and variation in S. aureus, giving rise to new pathotypes. Extensive gene duplications along with recombination and gene loss have resulted in different sets of superantigens and exotoxins, and the islands may serve as a hot spot for evolutionary events in the future, leading to new S. aureus lineages.

Recently, a new type of genomic island, named νSaγ, has been identified in strain COL. This island is found in all S. aureus genomes and is also present in the S. epidermidis RP62A and ATCC12228 genomes, where it is designated νSeγ. The S. aureus νSaγ island contains a cluster of two genes of the phenol-soluble modulin (PSM) family and a small cluster of exotoxin genes similar to those in νSaα [17].

Two pathogenicity islands have been found in bovine isolates. SaPIbov is similar in size (15.9 kbp) and gene order to SaPI1. Like SaPI1 and SaPI2, it encodes TSST-1; however, in contrast to these islands from human isolates, SaPIbov is flanked by a 74-nt direct repeat, one copy of which has also been shown to be present in a TSS-negative bovine strain [65]. Parts of the bovine island have also been found in human isolates, but the entire sequence has not been identified so far in human strains.

SaPIbov2, the second bovine island, was recently identified and characterized in detail [23]. The 28-kb island is integrated at the 3′ end of the GMP synthetase gene and is flanked by 18-bp direct repeats. Comparison of the sequences of the island revealed extensive similarities to other SaPIs. Importantly, the toxin genes present in other SaPIs are replaced by the gene for a biofilm-associated adhesin called Bap, carried on a transposon-like structure. Bap is regarded as an important virulence determinant of bovine S. aureus strains causing mastitis. Transfer of SaPIbov2 was demonstrated without a helper phage using a construct of the island-specific Sip integrase gene flanked by the left and right attachment sites. This module was excised, circularized, and integrated in a RecA-independent manner into the attachment site of S. aureus, and also on an E. coli plasmid. SaPIbov2 is present in bovine isolates but not in human strains, indicating the presence of an ancestor S. aureus strain in animals. Interestingly, DNA flanking the bap gene in other animal pathogens including S. xylosus, S. chromogenes, and S. simulans is very similar, which suggests that the bap gene was horizontally transferred between different animal pathogens, probably via a composite transposon-like structure such as that found on SaPIbov2 [23].


The exact role of a particular pathogenicity island in the outcome of an infection remains a task for future research. For example, it is not presently known whether the extensive gene duplications found on the νSaα and νSaβ genomic islands affect the pathogenicity of S. aureus isolates. Moreover, most of the ORFs on SaPI1 have homology to proteins of unknown function. Since 22 of the 24 ORFs of SaPI1 of strain COL are expressed in vitro, they might be important for maintenance of the island or regulation of enterotoxin expression [61]. Staphylococcal pathogenicity islands could have evolved by specialized transduction events and recombination of pathogenicity island modules along with deletion and addition of genetic material. The origin of the pathogenicity islands is probably a common ancestral genetic element derived from a phage. This hypothesis is supported by the fact that genes with high homology to phage genes are present in all staphylococcal pathogenicity islands.

The classification of staphylococcal islands is currently a matter of debate. The term “genomic island” may best denote the characteristic of an island structure, which is defined as a putative foreign genetic element that was acquired by horizontal gene transfer. In addition, most authors also include staphylococcal phages in the group of genomic islands, reflecting the fact that many of the staphylococcal genomic islands are very similar in structure and gene content to bacteriophages and some of them can be mobilized by phages. The genomic islands identified in the sequenced S. aureus genomes are shown in Table 9.3.

Table 9.3 Summary of genomic islands in sequenced S. aureus strains.

Strain | Type of island [17] | Characteristic genes on the island | Alternative designation
COL | νSa1 | seb, tst, ear | SaPI1 [25], SaPI3 [61]
COL | νSa4 type II | Four unknown ORFs | SaPI2 [25]
COL | νSaα type III | set1-5, lpl2,7,8,11,13 |
COL | νSaβ type I | splA-F, lukDE, ear, epidermin |
COL | νSaγ | set, eta, psmβ |
COL | SCCmec type I | mecA, ccrA, ccrB |
COL | ΦCOL | Integrated in geh |
N315 | νSa4 type I | sel, sec3, tst | SaPIn1 [14], SaPI2 [25]
N315 | νSaα type I | set6-15, lpl1-9 | SaPIn2 [14]
N315 | νSaβ type I | splA-F, lukDE, seg, sen, sei, sem, seo | SaPIn3 [14]
N315 | νSaγ | set, eta, psmβ |
N315 | SCCmec type II | mecA, ccrA, ccrB |
N315 | ΦSa3 | sak, sea, sep, seg2, sek2 |
Mu50 | νSa3 type I | fhuD | SaGIm [14], SaPI3 [25]
Mu50 | νSa4 type I | sel, sec3, tst | SaPIm1 [14], SaPI2 [25]
Mu50 | νSaα type I | set6-15, lpl1-9 | SaPIm2 [14]
Mu50 | νSaβ type I | splA-F, lukDE, seg, sen, sei, sem, seo | SaPIm3 [14]
Mu50 | νSaγ | set, eta, psmβ |
Mu50 | SCCmec type II | mecA, ccrA, ccrB |
Mu50 | ΦSa1 | 72 ORFs |
Mu50 | ΦSa3 | sak, sea, sep, seg2, sek2 |
MW2 | νSa3 type II | sel2, sec4, ear | SaPI3 [25]
MW2 | νSa4 type II | Four unknown ORFs | SaPI2 [25]
MW2 | νSaα type II | set16-26, lpl10-14 |
MW2 | νSaβ type II | splA-F, lukDE, bsa |
MW2 | νSaγ | set, eta, psmβ |
MW2 | SCCmec type IV | mecA, ccrA, ccrB |
MW2 | ΦSa2 | lukS-PV, lukF-PV |
MW2 | ΦSa3 | sak, sea, sep, seg2, sek2 |
MRSA252 | SaPI4 [15] | No virulence genes |
MRSA252 | νSaα type I | set6-15, lpl(6) |
MRSA252 | νSaβ type I | splA-F, seg, sen, sei, sem, seo |
MRSA252 | νSaγ | set, eta, psmβ |
MRSA252 | SCCmec type II | mecA, ccrA, ccrB |
MRSA252 | ΦSa2 | No virulence genes |
MRSA252 | ΦSa3 | sak, sea |
MSSA476 | νSaα | set(11), lpl(5) |
MSSA476 | νSaβ | spl(4), lukDE, epidermin |
MSSA476 | νSaγ | set, eta, psmβ |
MSSA476 | SCC476 | far |
MSSA476 | ΦSa3 | sea, sak, seg2, sek2 |
MSSA476 | ΦSa4 | No virulence genes |

Gene symbols: bsa, bacteriocin gene cluster; ccrA,B, cassette chromosome recombinase A,B; far, fusidic acid resistance; geh, lipase; lpl, lipoproteins; lukDE, components of leukocidin DE; lukS-PV, lukF-PV, Panton-Valentine leukocidin components S and F; mecA, penicillin binding protein 2a; sak, staphylokinase; set, staphylococcal exotoxins; splA-F, serine proteases; sea, seb, sec3, sec4, seg, seg2, sek2, sel, sem, sen, seo, sep, staphylococcal enterotoxins; tst, toxic shock syndrome toxin 1.

9.2.3.2 Staphylococcal Cassette Chromosome

A unique type of mobile genomic island in staphylococci is represented by the staphylococcal cassette chromosome (SCC) element, which is associated with methicillin resistance. Although this genomic island does not contain any phage-related genes, mobility has been demonstrated due to the activity of two recombinases of the invertase/resolvase family, designated cassette chromosome recombinases A and B (ccrA and ccrB). Integration occurs site-specifically into the chromosome near the origin of replication in a conserved ORF of unknown function named orfX that contains a 15-bp attachment site (attBscc). After integration of SCC into the genome, the attBscc sequence is found at both chromosome–SCC junctions. SCC elements were originally described as being associated with methicillin resistance, but in recent studies SCC elements were also described in methicillin-susceptible isolates [15, 66].

Methicillin resistance in staphylococci is related to the presence of the mecA gene encoding an additional penicillin-binding protein, PBP2a, with reduced affinity to β-lactam antibiotics. PBP2a can substitute for the essential functions of β-lactam-susceptible PBPs in crosslinking the peptidoglycan of the bacterial cell wall [67]. The mecA determinant is located within an SCC element, which has therefore been termed staphylococcal cassette chromosome mec (SCCmec) [68, 69]. SCCmec has been shown to be transferable among staphylococcal species. Five major SCCmec types, ranging in size from 21 kbp to 67 kbp, have been identified [70, 71]. While types I–III are found in hospital-acquired MRSA (HA-MRSA or H-MRSA), types IV and V are restricted to community-acquired MRSA isolates (CA-MRSA, C-MRSA, or Co-MRSA). The SCCmec type is defined by the combination of the type of cassette chromosome recombinase (ccr) gene complex, which confers mobility, and the class of the mec gene complex [70]. There is no equivalent region in methicillin-susceptible staphylococcal strains, suggesting that the mec fragment was acquired by horizontal gene transfer. Moreover, the mosaic structure of the different SCCmec types indicates that several recombination events between the ccr and mec gene complexes, along with integration and deletion of DNA sequences, have driven the evolution of methicillin resistance in staphylococci.

The core sequence of SCCmec consists of a class A, B, C, or D mec gene complex and one of five allotypes of the ccr gene complex. A recently identified new SCCmec type lacks the ccrAB genes and instead contains a single copy of a recombinase gene designated ccrC [71]. Class A and B mec have been found in clinical MRSA, class C mec is mainly distributed in S. haemolyticus, and class D mec mainly in S. hominis. Class A mec consists of mecA, a copy of the insertion sequence element IS431mec (an IS257-like element), and the regulatory genes mecR1 and mecI. In class B mec, mecI and the 3′ region of mecR1 are deleted and a truncated copy of IS1272 is inserted (Fig. 9.1). The ccr types are defined by the combination of different ccrA and ccrB homologues. Moreover, transposon Tn554 is detectable in SCCmec of types II and III. Finally, resistance genes or whole plasmids (pUB110 or pT181) are found in the 3′ region of mecA (Fig. 9.1).

Fig. 9.1 Structure of type I, II, III, and IV staphylococcal cassette chromosome mec (SCCmec) elements (adapted from [70]). SCCmec is characterized by two common gene complexes, the ccr (cassette chromosome recombinase) gene complex (gray) and the mec gene complex (blue). Integrated IS431, Tn554, and plasmid pT181 are also indicated. Gene symbols: cad, cadmium resistance; ccr, cassette chromosome recombinase; ermA, erythromycin resistance; hsdR, host specificity determinant; kdp, two-component system regulating potassium transport; mec, penicillin binding protein 2a, methicillin resistance; mer, mercury resistance; tetK, tetracycline resistance. (This figure also appears with the color plates.)

The different types of SCCmec reflect the evolution of methicillin resistance in staphylococci. The first MRSA clones, such as the pre-MRSA isolate N315, carry intact regulatory genes mecR1 and mecI, which do not respond well to β-lactam antibiotics including methicillin. Now, epidemic clones are homogeneously resistant to high methicillin concentrations and, in addition, to many other antibiotics including tobramycin, bleomycin, and tetracycline. Importantly, such resistance determinants were inserted into SCCmec by recombination across IS431 insertion elements.

Specific types of SCCmec elements have been described in CA-MRSA. In contrast to HA-MRSA, which was first described in 1961, CA-MRSA was detected quite recently among healthy individuals with no recognizable risk factors [72, 73]. The Panton-Valentine leukocidin (PVL) produced by CA-MRSA is a major virulence factor of these strains, as PVL is implicated in life-threatening necrotizing pneumonia [74]. Interestingly, PVL is carried by different bacteriophages [75]. CA-MRSA strains appear to be epidemiologically and clonally unrelated to hospital-acquired isolates. Most CA-MRSA isolates worldwide carry the type IV SCCmec, which was presumably acquired by community clones showing a high fitness. The type IV SCCmec element (21–24 kbp) is shorter than SCCmec elements found in HA-MRSA and carries no further resistance determinants [16]. Recently, a fifth SCCmec type was discovered in a CA-MRSA isolate from Australia [71]. The 28-kb type V SCCmec carries a C2 mec gene complex, a new ccr recombinase designated ccrC, and, in addition, a complete type I restriction modification system composed of hsdR, hsdS, and hsdM. Restriction modification systems seem to be important for the stability of a given region of the genome. Interestingly, hsdS and hsdM are also present on the two genomic islands νSaα and νSaβ found in all S. aureus genomes sequenced so far (see above). As several CA-MRSA lineages with substantial genetic diversity are found in different parts of the world, new SCCmec types will in all probability be described in the future.

The origin of SCCmec is still obscure and a number of hypotheses have been proposed. The evolutionary origin of the mecA precursor is probably a coagulase-negative staphylococcal species [67]. MLST studies revealed that methicillin resistance in nosocomial strains due to the presence of SCCmec is restricted to approximately five clonal complexes (CC5, CC8, CC22, CC30, CC45) [28]. It has therefore been supposed that, despite the mobility of SCCmec, a host restriction barrier ensures stable integration and maintenance of mecA resistance [76, 77]. The reason why some clones are able to pick up SCCmec and others are not is still unknown. Apparently, mecA expression is connected with a loss of fitness, which has to be counterbalanced by an adaptive process including regulation by mec/bla regulatory elements [78].

Recent work reveals that SCC elements are also present in methicillin-sensitive strains. For example, strain MSSA476 carries a unique 22.6-kb genetic element with a high similarity to SCCmec. The SCC476 element has the same left and right boundaries (attL and attR), similar inverted repeats, and is integrated at the same site on the chromosome as SCCmec elements, but carries a putative fusidic acid resistance determinant instead of mecA [15]. In addition, SCC-specific sequences were identified in several clinical MSSA isolates by DNA microarray analysis, indicating extensive integration and deletion events within this region [22].

The emergence of MRSA strains has dramatically changed the diversity of clones causing disease in humans. A small number of clonal types are now responsible for the vast majority of nosocomial S. aureus infections worldwide. Moreover, CA-MRSA strains seem to be widely distributed in the community, and there is substantial fear that PVL-positive strains with increased fitness may enter the hospital environment, where they would become multiresistant. In consequence, new dangerous pathotypes might emerge which are more aggressive than those presently found and would be extremely difficult to treat.

9.2.3.3 Bacteriophages

Most of the naturally occurring S. aureus strains are polylysogenic. Based on virus morphology, staphylococcal bacteriophages are members of two groups of tailed phages: Myoviridae and Siphoviridae. Bacteriophages of S. aureus belong to seven serological groups, designated A, B, C, D, F, G, L. Alternatively, staphylococcal prophages can be classified into five families based on integrase gene homology or, depending on genome sizes, into three classes: class I (< 20 kbp), class II (approx. 40 kbp), and class III (> 125 kbp) [29, 79]. More than 40 staphylococcal phage genomes have been determined including phages carrying important virulence determinants such as enterotoxin A, exfoliative toxin A, and Panton-Valentine leukocidin [80–82]. The sequence data revealed some interesting structural features of staphylococcal phages. Although the genomes of the phages show a G+C content which is similar to that of the host, most phages contain a large set of genes of unknown function and no homology to bacterial sequences [79]. The coding regions are tightly packed with more than 90% coding capacity. The majority of genes are transcribed from one strand. Some large phages contain a second DNA replication mode which is associated with lytic functions encoded by amidase and holin genes. These functions may be important for a broad host range of this class of phages as they infect both coagulase-positive and coagulase-negative staphylococci. Interestingly, phage genomes possess a mosaic structure, suggesting that recombination events between different phages are common. All these characteristics highlight the importance of phages for structural flexibility of staphylococcal genomes. The best characterized staphylococcal phage is φ11 with a 43.6-kb genome [83]. φ11 is a member of the serogroup B and possesses int and xis genes, as does φL54a. Phage φ11 is a generalized transducing phage capable of packaging up to 45 kbp of DNA. The phage is able to package plasmids and chromosomal genes of the host at low frequency, which in turn can be injected into other S. aureus cells. Since these plasmids can replicate and chromosomal genes might be integrated into the chromosome of the new host, transducing phages are important vehicles involved in horizontal gene transfer. For example, tst-carrying staphylococcal pathogenicity islands (see above) can be transferred by generalized transduction between different strains of S. aureus.
It is tempting to speculate that other gene clusters, too, such as biofilm converting genes or other toxin loci, or even resistance determinants such as mecA, may become subject to transduction by bacteriophages. For many phages, including φ12, φ13, φPVL, φETA, φSLT, and φMu50A, no intrinsic xis function has been identified [83]. Instead these phages carry an ORF of unknown function named OrfC which might perform the xis function, but this has yet to be shown. Phages have a great impact on the evolution of new pathotypes of staphylococci, mediating both positive and negative lysogenic conversion. For example, the gene for the superantigenic staphylococcal enterotoxin A (sea) that causes food poisoning and toxic shock syndrome is carried by a polymorphic family of related temperate bacteriophages found, e.g., in the genomes of strains Mu50, MW2, MSSA476, and MRSA252 [14–16, 84]. Interestingly, the phage integrates into the genome using an attachment site located within the determinant encoding the β-toxin (hlb) [85]. By a similar mechanism the structural gene of the lipase (geh) is inactivated by bacteriophage L54a or an L54-like phage (φCOL), whose att site is composed of an 18-bp core sequence within the reading frame for lipase [17, 86]. Moreover, double- or triple-converting bacteriophages have been described. φ13 negatively affects the expression of β-toxin but simultaneously confers the ability to produce staphylokinase, and a serotype F bacteriophage of S. aureus has been found mediating the simultaneous triple-lysogenic conversion of enterotoxin A, staphylokinase, and β-toxin [87, 88].
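The positive and negative lysogenic conversions described above can be summarized as a small lookup table. The following Python sketch is only an illustration: the dictionary keys, the `Conversion` class, and the `lysogenize` function are hypothetical names introduced here, and the loss of β-toxin for the sea-carrying phage is assumed from its integration into hlb as stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversion:
    """Traits gained (positive) and lost (negative) upon prophage integration."""
    gained: frozenset = frozenset()
    lost: frozenset = frozenset()


# Entries paraphrase the examples in the text; keys are illustrative labels only.
CONVERSIONS = {
    # sea phage integrates within hlb (assumed here to abolish beta-toxin)
    "sea_phage": Conversion(frozenset({"enterotoxin A"}), frozenset({"beta-toxin"})),
    # L54a integrates within geh and inactivates lipase
    "L54a": Conversion(frozenset(), frozenset({"lipase"})),
    # double-converting phage: gains staphylokinase, loses beta-toxin
    "phi13": Conversion(frozenset({"staphylokinase"}), frozenset({"beta-toxin"})),
    # triple-converting serotype F phage
    "serotype_F": Conversion(
        frozenset({"enterotoxin A", "staphylokinase"}), frozenset({"beta-toxin"})
    ),
}


def lysogenize(phenotype, phage):
    """Return the set of expressed traits after integration of the given phage."""
    conv = CONVERSIONS[phage]
    return (set(phenotype) - conv.lost) | conv.gained


if __name__ == "__main__":
    start = {"beta-toxin", "lipase"}
    print(sorted(lysogenize(start, "phi13")))
```

Modeling each phage as a pair of sets keeps double and triple conversion as ordinary cases of the same operation rather than special cases.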

Currently, the phage carrying the PVL gene associated with CA-MRSA strains has serious clinical implications. CA-MRSA strains are being increasingly recognized in the community where they colonize persons without any specific risk factors, frequently causing skin and soft tissue infections [73]. Most alarming, however, these strains carry a bacteriophage encoding the PVL toxin, which has been shown to be the major virulence determinant in necrotizing pneumonia [74]. Several recent cases of necrotizing pneumonia in children and young adults were associated with a very high mortality, around 40% [74]. Fortunately, CA-MRSA pneumonia has been rarely reported so far; however, the recent spread of PVL-expressing CA-MRSA strains increases the probability that new, even more aggressive lineages will evolve, and that such strains will reach the hospitals, where they may become resistant to many antibiotics. The short-term evolution of PVL-carrying CA-MRSA strains impressively illustrates the importance of bacteriophages for particular virulent forms of S. aureus and the genome flexibility of this important human pathogen.

9.2.3.4 Plasmids

Most naturally occurring staphylococcal strains contain plasmids. These plasmids can be grouped into three major classes. Class I consists of small (1–5 kbp) multicopy plasmids that are either cryptic or carry a single resistance determinant [89]. Plasmids of this class mainly contribute to the development of resistance to oxytetracycline, chloramphenicol, and macrolides–lincosamides–streptogramins (MLS) [89, 90]. Plasmids of class II are 15–30 kbp in size, have low copy numbers (4–6/cell), and mostly carry several resistance determinants: for β-lactams, often associated with transposon Tn552, for mercury resistance, for resistance to cadmium and lead, and also for MLS resistance of the ermB type due to insertion of Tn551 [89, 91, 92]. Class III plasmids are conjugative multiresistance plasmids and are therefore of special epidemiological interest. They are comparatively large (30–60 kbp), and besides determinants for conjugative transfer, they carry a number of different resistance determinants: for aminoglycoside resistance on Tn4001, for trimethoprim resistance on Tn4003, for resistance to quaternary ammonium compounds, and in some cases also for β-lactams on a transposon related to Tn552 [93–95]. The association of resistance determinants in S. aureus with transposable elements such as Tn4001 has led to a clustering of resistance determinants on plasmids [93, 96, 97]. There is evidence that certain multiresistance plasmids have evolved by sequential acquisition of resistance determinants based on cointegration of target molecules. This process is mediated by IS257, which can already exist on the captured DNA sequence. Most concerningly, a plasmid carrying the full vanA vancomycin resistance transposon Tn1546 has recently emerged in S. aureus [13]. The vanA determinant was acquired from an Enterococcus faecalis strain coinfecting a patient in a hospital. The conjugative enterococcal plasmid was transferred to S. aureus, in which it cannot replicate, but the vanA-containing

transposon Tn1546 jumped into a naturally S. aureus-conjugative plasmid. This event impressively shows how resistance determinants can cross the species border, accelerating the development of new resistance types and jeopardizing antistaphylococcal therapy. Since vancomycin is used as an antibiotic of last resort, the spread of vancomycin-resistant staphylococci, either vertically or by transfer of vanA plasmids horizontally, would have dramatic consequences for the management of infections. Whereas plasmids carry many resistance determinants, they carry only a few virulence genes. For example, the superantigenic enterotoxins SED and SEJ are located on a 27.6-kb penicillinase plasmid designated pIB485 [98, 99]. The exfoliative toxin B (ETB) gene is carried by a large plasmid of 38.2 kbp which shows similarity to certain conjugative resistance plasmids of the pSK family [100]. Interestingly, pETB contains another virulence-related gene encoding the ADP-ribosyltransferase EDIN-C which modifies Rho GTPases. In addition, a cadmium resistance operon and a lantibiotic-producing region are encoded by this plasmid. Three copies of IS257 divide the pETB genome into three functional regions, suggesting the integration of particular parts of the plasmid by homologous recombination between insertion elements [100]. Some staphylococcal plasmids carry bacteriocin-like genes. The approximately 40-kb plasmid pACK1 of S. simulans biovar staphylolyticus ATCC1362 carries the lysostaphin gene (lss), which codes for an extracellular glycylglycine endopeptidase, and the lysostaphin immunity factor gene (lif), which leads to an increase of the serine/glycine ratio of the interpeptide bridges of peptidoglycan [101]. Interestingly, lss and lif are flanked by insertion sequences such as IS257 and IS1293, indicating that S. simulans biovar staphylolyticus received lif and lss by horizontal gene transfer.
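The size ranges given for the three plasmid classes in this section can be expressed as a simple lookup. The sketch below is a rough illustration only: the function name `plasmid_size_class` is hypothetical, a plasmid of exactly 30 kbp is arbitrarily assigned to class III, and size alone does not establish class membership, since class III is defined by conjugative transfer functions rather than by length.

```python
from typing import Optional


def plasmid_size_class(size_kbp: float) -> Optional[str]:
    """Map a plasmid size (kbp) to the size range of class I, II, or III.

    Ranges follow the text: class I 1-5 kbp, class II 15-30 kbp,
    class III 30-60 kbp. Returns None for sizes outside these ranges.
    Assumption: the shared boundary at 30 kbp is assigned to class III.
    """
    if 1 <= size_kbp <= 5:
        return "I"
    if 15 <= size_kbp < 30:
        return "II"
    if 30 <= size_kbp <= 60:
        return "III"
    return None


if __name__ == "__main__":
    # Sizes taken from the text: pIB485 (27.6 kb) and pETB (38.2 kbp).
    for name, size in [("pIB485", 27.6), ("pETB", 38.2)]:
        print(name, plasmid_size_class(size))
```

Under these assumptions the penicillinase plasmid pIB485 falls into the class II size range and pETB into the class III range, which is in line with the similarity of pETB to conjugative plasmids of the pSK family noted in the text.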

9.3 Staphylococcus epidermidis

S. epidermidis is primarily a normal inhabitant of the healthy human skin and mucosal microflora. In recent decades, however, the bacterium has emerged as a common cause of nosocomial infections. These infections usually occur in association with the use of medical devices, and they preferentially affect immunocompromised and critically ill patients, causing acute bacteremia and septicemia. From the molecular point of view it is still unclear why S. epidermidis was so successful in becoming established as a nosocomial pathogen. The most interesting question in this respect is whether pathogenic strains obtained from device-associated infections or the hospital environment differ genetically and/or physiologically from commensal isolates outside of hospitals. Currently, the complete annotated genome sequences of two S. epidermidis strains are publicly available: one from S. epidermidis ATCC 12228, a laboratory reference strain used for antibiotic resistance testings [102], and the other from S. epidermidis RP62A (ATCC 35984), a clinical strain isolated from an intravascular catheter-associated sepsis [17].

Fig. 9.2 Pairwise comparison of the Staphylococcus epidermidis ATCC 12228 and Staphylococcus epidermidis RP62A genomes displayed using the Artemis Comparison Tool (ACT; http://www.sanger.ac.uk/Software/ACT). The red bars represent homologous matches between the genomes and the blue bars indicate a homologous, but inverted chromosomal region. Genomic islands, IS elements, phages, SCC cassettes, and genes involved in adherence and biofilm formation are marked as colored boxes. Gene symbols: aae, autolysin/adhesin; aap, accumulation-associated protein; atlE, autolysin; bhp, Bap homologous protein; ica, intercellular adhesin. (This figure also appears with the color plates.)

Whole-genome analysis of both strains revealed a genome size of approximately 2.499 Mbp for ATCC 12228 and 2.616 Mbp for RP62A, coding for 2381 and 2553 ORFs, respectively. Direct comparison of the S. epidermidis RP62A and ATCC 12228 sequences indicates a largely uniform overall genome organization (Fig. 9.2). Variation in genome size and gene content is mainly due to the insertion of a prophage in S. epidermidis RP62A and differences in terms of other mobile elements such as genomic islands, transposons, and insertion sequences. Figure 9.2 shows on the right a large chromosomal region which is inverted (colored blue). Interestingly, this part of the chromosome contains the attachment site (orfX) for the SCCmec cassettes and a range of genes involved in adherence and biofilm formation (see below). It is tempting to speculate that this part of the S. epidermidis genome is subject to frequent recombination events and acquisition of variable resistance and virulence traits.
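The size and ORF figures quoted above allow a quick back-of-the-envelope comparison of the two genomes. The following Python sketch uses only the numbers given in the text; the dictionary layout and function names are illustrative choices, not part of any published analysis.

```python
# Genome sizes (Mbp) and ORF counts as quoted in the text.
GENOMES = {
    "ATCC 12228": {"size_mbp": 2.499, "orfs": 2381},
    "RP62A": {"size_mbp": 2.616, "orfs": 2553},
}


def size_difference_kbp(a: str, b: str) -> int:
    """Difference in genome size (b minus a), rounded to whole kbp."""
    return round((GENOMES[b]["size_mbp"] - GENOMES[a]["size_mbp"]) * 1000)


def orf_difference(a: str, b: str) -> int:
    """Difference in annotated ORF count (b minus a)."""
    return GENOMES[b]["orfs"] - GENOMES[a]["orfs"]


def orfs_per_kbp(strain: str) -> float:
    """Average number of ORFs per kbp of genome sequence."""
    g = GENOMES[strain]
    return g["orfs"] / (g["size_mbp"] * 1000)


if __name__ == "__main__":
    print(size_difference_kbp("ATCC 12228", "RP62A"), "kbp larger in RP62A")
    print(orf_difference("ATCC 12228", "RP62A"), "additional ORFs in RP62A")
    for strain in GENOMES:
        print(strain, round(orfs_per_kbp(strain), 3), "ORFs per kbp")
```

With these inputs RP62A is about 117 kbp larger and carries 172 more ORFs, a difference of the order expected from a prophage plus a few islands and transposons.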

9.3.1 Genomic Islands

Three novel genomic islands have been identified in S. epidermidis [17] (Fig. 9.2). The νSe1 and νSe2 islands originate from integrated plasmids. νSe1 was detected in RP62A and comprises 17 153 nucleotides. A striking feature of νSe1 is an encoded cadmium resistance protein (CadD) and its corresponding transcription regulator (CadC). S. epidermidis ATCC 12228 lacks νSe1, but carries instead a single 4455-bp ORF of unknown function at the same integration site (SE2204 in Fig. 9.2). νSe2 was identified by Gill and coworkers in ATCC 12228 [17]. It is a relatively large structure, 38.4 kbp in size, and encodes a range of interesting traits which are unique to this strain and have not been found in other staphylococci so far. The island carries an additional sortase gene (srtC) and genes coding for two LPXTG-anchored cell wall proteins, implying specific interactions of this strain with host structures. The third island, νSeγ, is present in both S. epidermidis genomes at the same site. The small 2.66-kb islet encodes a putative phosphoesterase, a hydrolase, and four β1 phenol-soluble modulins which have been shown recently to be involved in activation of the human innate immune system [103].

9.3.2 Phage SPβ and other Bacillus Genes

S. epidermidis RP62A harbors a Bacillus subtilis SPβ-like phage which is unique to this strain [104]. The phage genome has obviously undergone multiple recombination events in its staphylococcal host, resulting in a mosaic structure carrying IS elements and an LPXTG-anchored cell wall protein gene [17]. Both S. epidermidis RP62A and ATCC 12228 harbor the cap operon, which originates from B. anthracis and encodes a polyglutamate capsule. The capsule is a major virulence factor in B. anthracis and presence of the cap genes and the SPβ-like phage indicates that S. epidermidis is capable of exchanging genetic material with bacilli, which is a significant and surprising result of the genome sequencing projects.

9.3.3 Virulence Factors

S. aureus produces a range of species-specific factors for the invasion of host tissues and evasion of the immune defense. These factors comprise, for instance, staphylocoagulase, protein A, clumping factors A and B, staphylokinase, and hyaluronidase. Moreover, S. aureus strains encode a great variety of exotoxins and enterotoxins which are mostly located on pathogenicity islands. Apart from the δ- and β-hemolysins, S. epidermidis ATCC 12228 and RP62A harbor no staphylococcal exotoxin genes, and the genomic islands identified in these strains also do not carry superantigen genes and should therefore not be regarded as pathogenicity islands. Nevertheless, homologous genes for other secreted exoenzymes such as lipases, esterases, serine protease, cysteine proteases, and other proteases as well as thermonuclease and exonuclease are present in both S. epidermidis genomes. These factors reflect the interference with host structures and the recruitment of nutrient sources by S. epidermidis. The lack of toxins and superantigens in S. epidermidis explains the more subacute and chronic course of S. epidermidis infections, whereas S. aureus has the capacity to cause acute and often life-threatening infections.

9.3.4 Staphylococcal Cassette Chromosome

Nosocomial S. epidermidis isolates are known for their pronounced resistance to many commonly used antibiotics. As in S. aureus, methicillin resistance in S. epidermidis is conferred by SCCmec. However, in contrast to MRSA, little attention is paid to methicillin-resistant S. epidermidis (MRSE) in hospital settings, and it is not combated with the same intensity by hygiene measures as is MRSA. As a result, a high prevalence of resistant isolates is recorded worldwide. Approximately 80% of S. epidermidis isolates from device-associated infections are methicillin- and multiresistant, whereas commensal strains obtained from the community are mostly antibiotic-susceptible [105]. A recent study of SCCmec distribution provides evidence that S. epidermidis can harbor all types of SCCmec [106]. Genome sequencing of the methicillin-resistant S. epidermidis RP62A revealed the presence of a type II SCCmec inserted in orfX in the large inverted chromosomal region at the right-hand end of the S. epidermidis chromosome (Fig. 9.2). Ninety-eight percent identity of the element on the nucleotide level with S. aureus type II SCCmec is an indication of the capacity for horizontal gene transfer between the two species [17]. This hypothesis is supported by a range of studies showing the existence of primordial SCCmec-like elements in coagulase-negative staphylococci which can be mobilized and transferred to other strains or species [66, 77, 107, 108]. SCCmecs are now regarded as site-specific mobile elements in which extensive recombination and gene shuffling takes place. Obviously, they not only serve as shuttles for the transfer of methicillin resistance, but can also carry other staphylococcal genes. Thus, S.
epidermidis ATCC 12228 contains an SCC element (named SCCpbp4) which instead of the mec complex harbors the penicillin-binding protein gene pbp4, the teichoic acid biosynthesis gene tagF, a restriction modification system (hsdS, hsdM), and cadmium and mercury resistance genes along with recombinase genes and IS elements [17, 66]. These findings strongly suggest that S. epidermidis and other coagulase-negative staphylococci represent the gene pool in which an ongoing generation of novel SCC types takes place and from which methicillin resistance in S. aureus might originate. In this respect, the genome data underscore the necessity of taking multiresistant S. epidermidis and coagulase-negative staphylococci seriously as reservoirs for the spread of resistance genes within microbial communities. As a consequence they should be controlled by appropriate hygiene measures in a similar manner as MRSA.

9.3.5 Adherence and Biofilm Formation

Genetic differences between commensal and disease-causing S. epidermidis are an issue which has already been addressed and intensively investigated in the pregenomic era. In addition to antibiotic resistance, major differences were found in the capacity to colonize inert surfaces. Thus, the most intriguing feature of clinical S. epidermidis isolates is their ability to form thick multilayered biofilms on polymer and metal surfaces [56]. These biofilms consist of staphylococcal cells embedded in a slimy polysaccharide intercellular adhesin (PIA) matrix connecting the bacteria both to the surface and to each other. However, cell-to-cell and cell-to-surface contacts can also be mediated by polysaccharide-independent protein interactions [109–111]. S. epidermidis RP62A is a biofilm-positive strain originally isolated from a patient suffering from intravenous-catheter-associated septicemia. Because of the extraordinarily large amount of biofilm produced by it, RP62A serves as a prototype and reference strain in Staphylococcus biofilm studies. In contrast, S. epidermidis ATCC 12228 is known as a biofilm-negative laboratory strain. Biofilm formation is regarded as a two-step mechanism which involves initial adherence of the bacterium to a surface followed by an accumulative stage. The initial attachment of S. epidermidis is mainly mediated by cell-wall-anchored adhesins. S. epidermidis possesses some structures which are involved in the interaction with matrix proteins and in adherence to inert surfaces and eukaryotic cells. Thus, genes encoding cell-wall-associated adhesins like fibronectin-, elastin- and fibrinogen-binding proteins were identified both in S. epidermidis RP62A and in ATCC 12228. Moreover, it was shown that in S. epidermidis autolysins which are actually involved in peptidoglycan synthesis and cell division also exhibit adhesive properties. So far in S.
epidermidis AtlE and Aae, two members of this autolysin/ adhesin family of surface-associated proteins, have been described. AtlE is responsible for adhesion to unmodified polymer surfaces and for vitronectin binding, whereas Aae interacts additionally with fibrinogen and fibronectin [112, 113]. The atlE and aae genes are present in both S. epidermidis RP62A and S. epidermidis ATCC 12228. The atlE genes are located at the same chromosomal site in the core part of the chromosome, whereas the aae genes are detectable in the inverted high-recombination region of the S. epidermidis genome (Fig. 9.2). Bhp, a homologous protein to the biofilm-associated protein Bap from S. aureus, is another protein that mediates initial adherence, but is also involved in biofilm accumulation of S. epidermidis [111]. The Bhp-encoding gene is only present in S. epidermidis RP62A and resides also in the inverted chromosomal region (Fig. 9.2). In S. aureus the bap gene is part of a novel transposon-like element integrated into a pathogenicity island which is specific for bovine isolates, suggesting that this gene is mobile and transferable [23]. However, for S. epidermidis it is unknown whether the bhp gene is also part of such a transposon-like structure. Also, larger studies on the distribution of the bhp gene in the S. epidermidis population and its significance as a virulence factor are still pending.

In the majority of biofilm-positive S. epidermidis strains, the second, accumulative stage of biofilm production is mainly characterized by synthesis of the PIA. The enzymes involved in PIA production are encoded by the icaADBC operon [114]. Numerous studies have shown that the icaADBC genes are more prevalent in S. epidermidis strains from device-associated infections than in commensal isolates [57, 115–117]. This genetic information has therefore been regarded as a discriminating factor between pathogenic and nonpathogenic S. epidermidis strains. In agreement with these findings, the ica operon is indeed only detectable in the clinical isolate RP62A, but absent in the biofilm-negative commensal strain ATCC 12228 [102]. Interestingly, the icaADBC genes are also part of the recombination region of the S. epidermidis chromosome and form the border of the inverted genome segment (Fig. 9.2). The origin and evolution of ica-positive S. epidermidis strains is still uncertain. However, detection of ica genes in different coagulase-negative staphylococcal species [118] and in E. coli [119] is a clue that this genetic information might spread by horizontal gene transfer. A recent study employing multilocus sequence typing (MLST) indicates that the ica operon occurs in different S. epidermidis backgrounds (Ziebuhr et al., unpublished data). Among these clones one was identified which represents the majority of ica-positive isolates collected from Northern and Central Europe as well as from the US during the last 10 years. The data suggest that integration of the ica genes into S. epidermidis resulted in the emergence of highly successful clones which were eventually able to spread worldwide. However, it is also conceivable that the ica genes belonged originally to the Staphylococcus core genome and were lost in the course of genome rearrangements in the high-recombination region. In this respect it is important to note that all S.
aureus strains analyzed so far carry the ica operon, and genome comparison revealed insertion of the ica operon at the same site in the S. aureus chromosome as in S. epidermidis (data not shown). However, this might also reflect an insertional hot spot for the integration of the ica genes at this locus. Thus, on the basis of these few facts it is currently not yet possible to decide whether the ica genes are mobile or an integral part of staphylococcal genomes. Another factor which mediates the accumulative stage of S. epidermidis biofilm formation is the accumulation-associated protein Aap [109, 110]. It is a large polypeptide of 1505 amino acids with an N-terminal signal peptide, an LPXTG cell wall anchor, and an extended repeat region. The full-length Aap protein requires proteolytic processing by staphylococcal or host proteases to exhibit its adhesive properties [109]. The aap gene, which is also located in the inverted high-recombination chromosomal region, is detectable in both sequenced S. epidermidis genomes (Fig. 9.2). However, the recent study by Rohde and coworkers also describes an aap-negative S. epidermidis strain, indicating that this genetic information is not necessarily common to all S. epidermidis isolates [109].
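The distribution of the adherence and biofilm genes discussed in this section across the two sequenced S. epidermidis strains can be captured as a presence/absence table. The Python sketch below encodes only what the text states for RP62A and ATCC 12228; the variable and function names are hypothetical, and the table says nothing about other isolates (the text itself mentions an aap-negative strain).

```python
STRAINS = ("RP62A", "ATCC 12228")

# Gene -> strains in which the text reports the gene as present.
PRESENCE = {
    "atlE": {"RP62A", "ATCC 12228"},
    "aae": {"RP62A", "ATCC 12228"},
    "aap": {"RP62A", "ATCC 12228"},
    "bhp": {"RP62A"},
    "icaADBC": {"RP62A"},
}


def shared_genes():
    """Genes reported in both sequenced strains."""
    return sorted(g for g, s in PRESENCE.items() if s == set(STRAINS))


def strain_specific(strain):
    """Genes reported only in the given strain."""
    return sorted(g for g, s in PRESENCE.items() if s == {strain})


if __name__ == "__main__":
    print("shared:", shared_genes())
    for strain in STRAINS:
        print(strain, "only:", strain_specific(strain))
```

In this small table the biofilm-positive clinical isolate RP62A is distinguished by bhp and icaADBC, whereas the initial-adherence genes atlE and aae and the accumulation gene aap are shared.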

9.3.6 Insertion Sequences

Insertion sequences are autonomous genetic elements encoding only functions for their own mobility within genomes. They are detectable in all living organisms and play a significant role in genome organization. Comparison of S. aureus and S. epidermidis genomes revealed remarkable differences in the number and kind of IS elements. Thus, both S. epidermidis genomes contain 57 IS (some of them truncated or functionally inactive) distributed randomly throughout the chromosome, whereas S. aureus genomes contain on average only 10–17 copies of different IS. Since IS elements are supposed to play a role in genome flexibility and adaptation, this finding suggests that the two species might differ in this respect. It is tempting to speculate that S. epidermidis as a low-pathogenic, commensal bacterium depends much more on genome rearrangements to generate genetic diversity and to adapt to novel requirements than does S. aureus. Due to the large number of different virulence factors, S. aureus might have a priori much broader and better opportunities to cope with its host or a changing environment. The importance of virulence factors and genome flexibility for staphylococcal pathogenesis is highlighted by an ongoing genome sequencing project of the food-grade bacterium S. carnosus. This species contains neither virulence factors nor any insertion sequence or other repetitive DNA element, rendering the bacterium completely nonpathogenic (R. Rosenstein and F. Götz, personal communication). Many of the known staphylococcal IS elements (e.g., IS1272, IS431, IS256, IS200 etc.) occur in S. aureus as well as in S. epidermidis, which is another clue about the possibility of lateral gene transfer and exchange between the two species. In the two S. epidermidis genomes a novel insertion sequence named ISSep1 was found which is not detectable in any of the sequenced S. aureus genomes [17].
ISSep1 is scattered throughout in multiple copies and at different insertion sites in the RP62A and ATCC 12228 genomes (Fig. 9.2). With 13 copies in RP62A (two of them degenerated) and 15 copies in the ATCC 12228 genome, ISsep1 represents the most abundant IS in S. epidermidis. However, currently no data are available on a possible occurrence of the element in other coagulase-negative staphylococcal species or the function of ISSep1 in the S. epidermidis chromosome. Although S. epidermidis RP62A and ATCC 12228 do not vary in the number of IS, a difference with respect to the presence of IS256 can be observed. IS256 forms the ends of the composite transposon Tn4001 conferring aminoglycoside resistance on staphylococci and enterococci, but also occurs independently of the transposon in multiple copies on chromosomes or plasmids [120–122]. It was shown that, in contrast to commensal strains, the genomes of clinical S. epidermidis carry copies of IS256, whereas other typical staphylococcal insertion sequences such as IS257 and IS1272 are distributed equally among saprophytic and clinical isolates [105]. In accordance with this study, the RP62A genome was found to contain five IS256 elements: two transposon-associated and three independent copies. By contrast, no IS256 element was found in ATCC 12228. The IS256 element is involved in phase variation of biofilm expression by active transposition into the

ica genes and participates in chromosomal rearrangements of the staphylococcal chromosome by homologous recombination [123, 124]. Obviously, the element is highly active and also interferes with antibiotic resistance expression. Thus, IS256 was found to influence teicoplanin resistance in S. aureus by insertional inactivation of the tcaR regulatory gene [125]. Interestingly, IS256 also has a capacity to govern neighboring gene expression by internal promoter structures. Activation of resistance genes in the vicinity of IS256 has been shown for the aminoglycoside resistance genes in Tn4001 and methicillin resistance in S. aureus as well as in S. sciuri [121, 122, 126, 127]. The combined data imply that insertion sequences might play a significant role in the organization of the genome of pathogenic bacteria. On the one hand, multiple copies of IS in the genome promote homologous recombination events and therefore contribute in turn to the flexibility and rearrangement of the genetic material. On the other hand, active transposition of IS contributes to heterogeneous gene expression by mutation or activation of genes. The two processes together give rise to novel genotypic and phenotypic variants which are an advantage to the bacterium in adapting to changing external conditions or moving into new ecological niches. The large number of IS in S. epidermidis may be a reflection of the extraordinary capacity of this species to deal with different environments and to assert itself as a commensal as well as a pathogenic bacterium in hospital settings.
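The IS copy numbers quoted in this section can be put into proportion with a few lines of arithmetic. The sketch below uses only the counts given in the text (57 IS per S. epidermidis genome, 13 and 15 ISSep1 copies, five and zero IS256 copies, and 10–17 IS copies in S. aureus); the names are illustrative, and the ratios are a rough orientation rather than a statistical comparison.

```python
# Counts as quoted in the text.
IS_TOTAL = {"RP62A": 57, "ATCC 12228": 57}
ISSEP1 = {"RP62A": 13, "ATCC 12228": 15}
IS256 = {"RP62A": 5, "ATCC 12228": 0}
S_AUREUS_IS_RANGE = (10, 17)  # average copies per S. aureus genome


def share(copies: int, total: int) -> float:
    """Fraction of all IS copies in a genome contributed by one IS type."""
    return copies / total


def fold_range(total: int, reference_range):
    """Lowest and highest fold excess of `total` over a reference range."""
    low, high = reference_range
    return total / high, total / low


if __name__ == "__main__":
    for strain in IS_TOTAL:
        frac = share(ISSEP1[strain], IS_TOTAL[strain])
        print(f"{strain}: ISSep1 {frac:.1%}, IS256 copies {IS256[strain]}")
    lo, hi = fold_range(IS_TOTAL["RP62A"], S_AUREUS_IS_RANGE)
    print(f"S. epidermidis carries {lo:.1f}- to {hi:.1f}-fold more IS than S. aureus")
```

On these figures ISSep1 accounts for roughly a quarter of all IS copies in either strain, and the S. epidermidis genomes carry several times as many IS elements as an average S. aureus genome.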

9.4 Concluding Remarks

Recent genome analysis of bacterial pathogens revealed intriguing insights into the composition and organization of bacterial chromosomes. Currently, six S. aureus and two S. epidermidis genomes have been sequenced and comparative genome analysis has been performed. It has become evident that both organisms have a well-conserved core genome which is shared by all isolates and a flexible gene pool specific for each strain. The strain-specific genetic elements include a range of genomic islands and mobile regions, most of which encode virulence- and resistance-associated determinants. These genetic differences give rise to different expression patterns of virulence-associated factors, including several toxins, and in consequence are responsible for the clinical signs of staphylococcal infections. The enormous genome flexibility of staphylococci is a prerequisite for survival in specific environments and is the main reason why staphylococci have become the most successful pathogens in hospitals. Through horizontal gene transfer, new genetic traits have been captured which are important for the selection of new pathotypes. For example, the most powerful selective force in hospitals is the pressure exerted by antibiotics. Only isolates which can resist the action of these substances are able to survive. Methicillin-resistant S. aureus and S. epidermidis are nowadays the most abundant pathogenic bacteria in hospitals in Europe and the United States. However, comparative genome analysis has also revealed that the genome content is only the working plan. Critical to an understanding of the nature of pathogenic bacteria will be the knowledge of how specific genes are regulated in response to the host environment and how specific gene products interact with host factors. The full understanding of these processes is a great challenge for the future and will lead to the development of new vaccines and antibiotics to combat re-emerging infectious diseases.
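The distinction drawn above between a conserved core genome and a strain-specific flexible gene pool can be illustrated with plain set operations. The following is only a minimal sketch: the strain names and gene inventories are hypothetical placeholders, and a real comparative analysis would first have to cluster orthologous genes across the sequenced genomes rather than compare gene labels directly.

```python
# Hypothetical gene inventories; strain names and gene labels are
# illustrative placeholders, not data from the sequenced genomes.
strains = {
    "strain_A": {"gyrA", "rpoB", "dnaA", "mecA", "tst"},
    "strain_B": {"gyrA", "rpoB", "dnaA", "sea"},
    "strain_C": {"gyrA", "rpoB", "dnaA", "mecA", "lukS-PV"},
}


def core_genome(inventories):
    """Return the genes present in every strain."""
    return set.intersection(*inventories.values())


def flexible_pool(inventories):
    """Return the genes found in at least one strain but not in all."""
    return set.union(*inventories.values()) - core_genome(inventories)


def strain_specific(inventories, name):
    """Return the genes found only in the named strain."""
    others = set().union(
        *(genes for strain, genes in inventories.items() if strain != name)
    )
    return inventories[name] - others


core = core_genome(strains)
flexible = flexible_pool(strains)
only_b = strain_specific(strains, "strain_B")
```

In this toy data set, a gene such as mecA that is carried by some but not all strains falls into the flexible pool, whereas the genes shared by all three strains form the core.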

Acknowledgement

The work of K.O. and W.Z. was supported by grants from the Deutsche Forschungsgemeinschaft SFB 479 and SFB 630, and the Bundesministerium für Bildung und Forschung (BMBF) Pathogenomics Network of Competence and Proteomics Network.

References

1 Rosenbach, F. J. 1885. Mikroorganismen bei den Wundinfektionskrankheiten des Menschen. Wiesbaden, Germany.
2 Götz, F., T. Bannerman, and K. H. Schleifer. 2004. The genera Staphylococcus and Macrococcus. In: The procaryotes. http://141.150.157.117:8080/prokWIP/chaphtm/356/COMPLETE.htm
3 Huebner, J., and D. A. Goldmann. 1999. Coagulase-negative staphylococci: role as pathogens. Annu. Rev. Med. 50:223–236.
4 Lowy, F. D. 1998. Staphylococcus aureus infections. N. Engl. J. Med. 339:520–532.
5 Le Loir, Y., F. Baron, and M. Gautier. 2003. Staphylococcus aureus and food poisoning. Genet. Mol. Res. 2:63–76.
6 Götz, F. 2004. Staphylococci in colonization and disease: prospective targets for drugs and vaccines. Curr. Opin. Microbiol. 7:477–487.
7 Diekema, D. J., M. A. Pfaller, F. J. Schmitz, J. Smayevsky, J. Bell, R. N. Jones, and M. Beach. 2001. Survey of infections due to Staphylococcus species: frequency of occurrence and antimicrobial susceptibility of isolates collected in the United States, Canada, Latin America, Europe, and the Western Pacific region for the SENTRY Antimicrobial Surveillance Program, 1997–1999. Clin. Infect. Dis. 32 Suppl 2:S114–132.
8 Stefani, S., and P. E. Varaldo. 2003. Epidemiology of methicillin-resistant staphylococci in Europe. Clin. Microbiol. Infect. 9:1179–1186.
9 Fridkin, S. K. 2001. Vancomycin-intermediate and -resistant Staphylococcus aureus: what the infectious disease specialist needs to know. Clin. Infect. Dis. 32:108–115.
10 2002. From the Centers for Disease Control. Staphylococcus aureus resistant to vancomycin – United States, 2002. JAMA 288:824–825.
11 2002. From the Centers for Disease Control and Prevention. Vancomycin resistant Staphylococcus aureus – Pennsylvania, 2002. JAMA 288:2116.
12 Hiramatsu, K. 1998. The emergence of Staphylococcus aureus with reduced susceptibility to vancomycin in Japan. Am. J. Med. 104:7S–10S.
13 Weigel, L. M., D. B. Clewell, S. R. Gill, N. C. Clark, L. K. McDougal, S. E. Flannagan, J. F. Kolonay, J. Shetty, G. E. Killgore, and F. C. Tenover. 2003. Genetic analysis of a high-level vancomycin-resistant isolate of Staphylococcus aureus. Science 302:1569–1571.

14 Kuroda, M., T. Ohta, I. Uchiyama, T. Baba, H. Yuzawa, I. Kobayashi, L. Cui, A. Oguchi, K. Aoki, Y. Nagai, J. Lian, T. Ito, M. Kanamori, H. Matsumaru, A. Maruyama, H. Murakami, A. Hosoyama, Y. Mizutani-Ui, N. K. Takahashi, T. Sawano, R. Inoue, C. Kaito, K. Sekimizu, H. Hirakawa, S. Kuhara, S. Goto, J. Yabuzaki, M. Kanehisa, A. Yamashita, K. Oshima, K. Furuya, C. Yoshino, T. Shiba, M. Hattori, N. Ogasawara, H. Hayashi, and K. Hiramatsu. 2001. Whole genome sequencing of methicillin-resistant Staphylococcus aureus. Lancet 357:1225–1240.
15 Holden, M. T., E. J. Feil, J. A. Lindsay, S. J. Peacock, N. P. Day, M. C. Enright, T. J. Foster, C. E. Moore, L. Hurst, R. Atkin, A. Barron, N. Bason, S. D. Bentley, C. Chillingworth, T. Chillingworth, C. Churcher, L. Clark, C. Corton, A. Cronin, J. Doggett, L. Dowd, T. Feltwell, Z. Hance, B. Harris, H. Hauser, S. Holroyd, K. Jagels, K. D. James, N. Lennard, A. Line, R. Mayes, S. Moule, K. Mungall, D. Ormond, M. A. Quail, E. Rabbinowitsch, K. Rutherford, M. Sanders, S. Sharp, M. Simmonds, K. Stevens, S. Whitehead, B. G. Barrell, B. G. Spratt, and J. Parkhill. 2004. Complete genomes of two clinical Staphylococcus aureus strains: evidence for the rapid evolution of virulence and drug resistance. Proc. Natl. Acad. Sci. U. S. A. 101:9786–9791.
16 Baba, T., F. Takeuchi, M. Kuroda, H. Yuzawa, K. Aoki, A. Oguchi, Y. Nagai, N. Iwama, K. Asano, T. Naimi, H. Kuroda, L. Cui, K. Yamamoto, and K. Hiramatsu. 2002. Genome and virulence determinants of high virulence community-acquired MRSA. Lancet 359:1819–1827.
17 Gill, S. R., D. E. Fouts, G. L. Archer, E. F. Mongodin, R. T. Deboy, J. Ravel, I. T. Paulsen, J. F. Kolonay, L. Brinkac, M. Beanan, R. J. Dodson, S. C. Daugherty, R. Madupu, S. V. Angiuoli, A. S. Durkin, D. H. Haft, J. Vamathevan, H. Khouri, T. Utterback, C. Lee, G. Dimitrov, L. Jiang, H. Qin, J. Weidman, K. Tran, K. Kang, I. R. Hance, K. E. Nelson, and C. M. Fraser. 2005. Insights on evolution of virulence and resistance from the complete genome analysis of an early methicillin-resistant Staphylococcus aureus strain and a biofilm-producing methicillin-resistant Staphylococcus epidermidis strain. J. Bacteriol. 187:2426–2438.
18 Fitzgerald, J. R., D. E. Sturdevant, S. M. Mackie, S. R. Gill, and J. M. Musser. 2001. Evolutionary genomics of Staphylococcus aureus: insights into the origin of methicillin-resistant strains and the toxic shock syndrome epidemic. Proc. Natl. Acad. Sci. U. S. A. 98:8821–8826.
19 Peacock, S. J., C. E. Moore, A. Justice, M. Kantzanou, L. Story, K. Mackie, G. O’Neill, and N. P. Day. 2002. Virulent combinations of adhesin and toxin genes in natural populations of Staphylococcus aureus. Infect. Immun. 70:4987–4996.
20 Cassat, J. E., P. M. Dunman, F. McAleese, E. Murphy, S. J. Projan, and M. S. Smeltzer. 2005. Comparative genomics of Staphylococcus aureus musculoskeletal isolates. J. Bacteriol. 187:576–592.
21 Dunman, P. M., W. Mounts, F. McAleese, F. Immermann, D. Macapagal, E. Marsilio, L. McDougal, F. C. Tenover, P. A. Bradford, P. J. Petersen, S. J. Projan, and E. Murphy. 2004. Uses of Staphylococcus aureus GeneChips in genotyping and genetic composition analysis. J. Clin. Microbiol. 42:4275–4283.
22 Rademacher, A. 2004. Molekulargenetische Untersuchungen von Staphylococcus aureus Isolaten von Mukoviszidose-Patienten mittels DNA-Microarrays. Diploma thesis. University of Würzburg.
23 Ubeda, C., M. A. Tormo, C. Cucarella, P. Trotonda, T. J. Foster, I. Lasa, and J. R. Penades. 2003. Sip, an integrase protein with excision, circularization and integration activities, defines a new family of mobile Staphylococcus aureus pathogenicity islands. Mol. Microbiol. 49:193–210.
24 Ruzin, A., J. Lindsay, and R. P. Novick. 2001. Molecular genetics of SaPI1 – a mobile pathogenicity island in Staphylococcus aureus. Mol. Microbiol. 41:365–377.

25 Lindsay, J. A., A. Ruzin, H. F. Ross, N. Kurepina, and R. P. Novick. 1998. The gene for toxic shock toxin is carried by a family of mobile pathogenicity islands in Staphylococcus aureus. Mol. Microbiol. 29:527–543.
26 Nagy, Z., and M. Chandler. 2004. Regulation of transposition in bacteria. Res. Microbiol. 155:387–398.
27 Feil, E. J., J. E. Cooper, H. Grundmann, D. A. Robinson, M. C. Enright, T. Berendt, S. J. Peacock, J. M. Smith, M. Murphy, B. G. Spratt, C. E. Moore, and N. P. Day. 2003. How clonal is Staphylococcus aureus? J. Bacteriol. 185:3307–3316.
28 Robinson, D. A., and M. C. Enright. 2004. Multilocus sequence typing and the evolution of methicillin-resistant Staphylococcus aureus. Clin. Microbiol. Infect. 10:92–97.
29 Lindsay, J. A., and M. T. Holden. 2004. Staphylococcus aureus: superbug, super genome? Trends Microbiol. 12:378–385.
30 McDougal, L. K., C. D. Steward, G. E. Killgore, J. M. Chaitram, S. K. McAllister, and F. C. Tenover. 2003. Pulsed-field gel electrophoresis typing of oxacillin-resistant Staphylococcus aureus isolates from the United States: establishing a national database. J. Clin. Microbiol. 41:5113–5120.
31 Johnson, A. P., H. M. Aucken, S. Cavendish, M. Ganner, M. C. Wale, M. Warner, D. M. Livermore, and B. D. Cookson. 2001. Dominance of EMRSA-15 and -16 among MRSA causing nosocomial bacteraemia in the UK: analysis of isolates from the European Antimicrobial Resistance Surveillance System (EARSS). J. Antimicrob. Chemother. 48:143–144.
32 Zalacain, M., S. Biswas, K. A. Ingraham, J. Ambrad, A. Bryant, A. F. Chalker, S. Iordanescu, J. Fan, F. Fan, R. D. Lunsford, K. O’Dwyer, L. M. Palmer, C. So, D. Sylvester, C. Volker, P. Warren, D. McDevitt, J. R. Brown, D. J. Holmes, and M. K. Burnham. 2003. A global approach to identify novel broad-spectrum antibacterial targets among proteins of unknown function. J. Mol. Microbiol. Biotechnol. 6:109–126.

33 Bischoff, M., P. Dunman, J. Kormanec, D. Macapagal, E. Murphy, W. Mounts, B. Berger-Bachi, and S. Projan. 2004. Microarray-based analysis of the Staphylococcus aureus sigmaB regulon. J. Bacteriol. 186:4085–4099.
34 Gertz, S., S. Engelmann, R. Schmid, A. K. Ziebandt, K. Tischer, C. Scharf, J. Hacker, and M. Hecker. 2000. Characterization of the sigma(B) regulon in Staphylococcus aureus. J. Bacteriol. 182:6983–6991.
35 Homerova, D., M. Bischoff, A. Dumolin, and J. Kormanec. 2004. Optimization of a two-plasmid system for the identification of promoters recognized by RNA polymerase containing Staphylococcus aureus alternative sigma factor sigmaB. FEMS Microbiol. Lett. 232:173–179.
36 Ziebandt, A. K., H. Weber, J. Rudolph, R. Schmid, D. Hoper, S. Engelmann, and M. Hecker. 2001. Extracellular proteins of Staphylococcus aureus and the role of SarA and sigma B. Proteomics 1:480–493.
37 Chan, P. F., S. J. Foster, E. Ingham, and M. O. Clements. 1998. The Staphylococcus aureus alternative sigma factor sigmaB controls the environmental stress response but not starvation survival or pathogenicity in a mouse abscess model. J. Bacteriol. 180:6082–6089.
38 Horsburgh, M. J., J. L. Aish, I. J. White, L. Shaw, J. K. Lithgow, and S. J. Foster. 2002. sigmaB modulates virulence determinant expression and stress resistance: characterization of a functional rsbU strain derived from Staphylococcus aureus 8325-4. J. Bacteriol. 184:5457–5467.
39 Kupferwasser, L. I., M. R. Yeaman, C. C. Nast, D. Kupferwasser, Y. Q. Xiong, M. Palma, A. L. Cheung, and A. S. Bayer. 2003. Salicylic acid attenuates virulence in endovascular infections by targeting global regulatory pathways in Staphylococcus aureus. J. Clin. Invest. 112:222–233.
40 Entenza, J. M., P. Moreillon, M. M. Senn, J. Kormanec, P. M. Dunman, B. Berger-Bachi, S. Projan, and M. Bischoff. 2005. Role of sigmaB in the expression of Staphylococcus aureus cell wall adhesins ClfA and FnbA and contribution to infectivity in a rat model of experimental endocarditis. Infect. Immun. 73:990–998.
41 Jonsson, I. M., S. Arvidson, S. Foster, and A. Tarkowski. 2004. Sigma factor B and RsbU are required for virulence in Staphylococcus aureus-induced arthritis and sepsis. Infect. Immun. 72:6106–6111.
42 Dubrac, S., and T. Msadek. 2004. Identification of genes controlled by the essential YycG/YycF two-component system of Staphylococcus aureus. J. Bacteriol. 186:1175–1181.
43 Howell, A., S. Dubrac, K. K. Andersen, D. Noone, J. Fert, T. Msadek, and K. Devine. 2003. Genes controlled by the essential YycG/YycF two-component system of Bacillus subtilis revealed through a novel hybrid regulator approach. Mol. Microbiol. 49:1639–1655.
44 Dubrac, S., and T. Msadek. 2004. Identification of genes controlled by the essential YycG/YycF two-component system of Staphylococcus aureus. J. Bacteriol. 186:1175–1181.
45 Novick, R. P. 2003. Autoinduction and signal transduction in the regulation of staphylococcal virulence. Mol. Microbiol. 48:1429–1449.
46 Cheung, A. L., A. S. Bayer, G. Zhang, H. Gresham, and Y. Q. Xiong. 2004. Regulation of virulence determinants in vitro and in vivo in Staphylococcus aureus. FEMS Immunol. Med. Microbiol. 40:1–9.
47 Fournier, B., A. Klier, and G. Rapoport. 2001. The two-component system ArlS-ArlR is a regulator of virulence gene expression in Staphylococcus aureus. Mol. Microbiol. 41:247–261.
48 Rachid, S., K. Ohlsen, U. Wallner, J. Hacker, M. Hecker, and W. Ziebuhr. 2000. Alternative transcription factor sigma(B) is involved in regulation of biofilm expression in a Staphylococcus aureus mucosal isolate. J. Bacteriol. 182:6824–6826.
49 Ziebandt, A. K., D. Becher, K. Ohlsen, J. Hacker, M. Hecker, and S. Engelmann. 2004. The influence of agr and sigmaB in growth phase dependent regulation of virulence factors in Staphylococcus aureus. Proteomics 4:3034–3047.

50 Hecker, M., S. Engelmann, and S. J. Cordwell. 2003. Proteomics of Staphylococcus aureus – current state and future challenges. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 787:179–195.
51 Papakyriacou, H., D. Vaz, A. Simor, M. Louie, and M. J. McGavin. 2000. Molecular analysis of the accessory gene regulator (agr) locus and balance of virulence factor expression in epidemic methicillin-resistant Staphylococcus aureus. J. Infect. Dis. 181:990–1000.
52 Booth, M. C., L. M. Pence, P. Mahasreshti, M. C. Callegan, and M. S. Gilmore. 2001. Clonal associations among Staphylococcus aureus isolates from various sites of infection. Infect. Immun. 69:345–352.
53 Joh, D., P. Speziale, S. Gurusiddappa, J. Manor, and M. Hook. 1998. Multiple specificities of the staphylococcal and streptococcal fibronectin-binding microbial surface components recognizing adhesive matrix molecules. Eur. J. Biochem. 258:897–905.
54 Ton-That, H., L. A. Marraffini, and O. Schneewind. 2004. Protein sorting to the cell wall envelope of Gram-positive bacteria. Biochim. Biophys. Acta 1694:269–278.
55 Mazmanian, S. K., H. Ton-That, and O. Schneewind. 2001. Sortase-catalysed anchoring of surface proteins to the cell wall of Staphylococcus aureus. Mol. Microbiol. 40:1049–1057.
56 Götz, F. 2002. Staphylococcus and biofilms. Mol. Microbiol. 43:1367–1378.
57 Cho, S. H., K. Naber, J. Hacker, and W. Ziebuhr. 2002. Detection of the icaADBC gene cluster and biofilm formation in Staphylococcus epidermidis isolates from catheter-related urinary tract infections. Int. J. Antimicrob. Agents 19:570–575.
58 Dobrindt, U., B. Hochhut, U. Hentschel, and J. Hacker. 2004. Genomic islands in pathogenic and environmental microorganisms. Nat. Rev. Microbiol. 2:414–424.
59 Blum, G., M. Ott, A. Lischewski, A. Ritter, H. Imrich, H. Tschape, and J. Hacker. 1994. Excision of large DNA regions termed pathogenicity islands from tRNA-specific loci in the chromosome of an Escherichia coli wild-type pathogen. Infect. Immun. 62:606–614.
60 Hacker, J., G. Blum-Oehler, I. Muhldorfer, and H. Tschape. 1997. Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol. Microbiol. 23:1089–1097.
61 Yarwood, J. M., J. K. McCormick, M. L. Paustian, P. M. Orwin, V. Kapur, and P. M. Schlievert. 2002. Characterization and expression analysis of Staphylococcus aureus pathogenicity island 3. Implications for the evolution of staphylococcal pathogenicity islands. J. Biol. Chem. 277:13138–13147.
62 De Boer, M. L., and A. W. Chow. 1994. Toxic shock syndrome toxin 1-producing Staphylococcus aureus isolates contain the staphylococcal enterotoxin B genetic element but do not express staphylococcal enterotoxin B. J. Infect. Dis. 170:818–827.
63 Novick, R. P. 2003. Mobile genetic elements and bacterial toxinoses: the superantigen-encoding pathogenicity islands of Staphylococcus aureus. Plasmid 49:93–105.
64 Fitzgerald, J. R., S. D. Reid, E. Ruotsalainen, T. J. Tripp, M. Liu, R. Cole, P. Kuusela, P. M. Schlievert, A. Jarvinen, and J. M. Musser. 2003. Genome diversification in Staphylococcus aureus: molecular evolution of a highly variable chromosomal region encoding the staphylococcal exotoxin-like family of proteins. Infect. Immun. 71:2827–2838.
65 Fitzgerald, J. R., S. R. Monday, T. J. Foster, G. A. Bohach, P. J. Hartigan, W. J. Meaney, and C. J. Smyth. 2001. Characterization of a putative pathogenicity island from bovine Staphylococcus aureus encoding multiple superantigens. J. Bacteriol. 183:63–70.
66 Mongkolrattanothai, K., S. Boyle, T. V. Murphy, and R. S. Daum. 2004. Novel non-mecA-containing staphylococcal chromosomal cassette composite island containing pbp4 and tagF genes in a commensal staphylococcal species: a possible reservoir for antibiotic resistance islands in Staphylococcus aureus. Antimicrob. Agents Chemother. 48:1823–1836.
67 Hiramatsu, K., L. Cui, M. Kuroda, and T. Ito. 2001. The emergence and evolution of methicillin-resistant Staphylococcus aureus. Trends Microbiol. 9:486–493.
68 Ito, T., Y. Katayama, and K. Hiramatsu. 1999. Cloning and nucleotide sequence determination of the entire mec DNA of pre-methicillin-resistant Staphylococcus aureus N315. Antimicrob. Agents Chemother. 43:1449–1458.
69 Ito, T., Y. Katayama, K. Asada, N. Mori, K. Tsutsumimoto, C. Tiensasitorn, and K. Hiramatsu. 2001. Structural comparison of three types of staphylococcal cassette chromosome mec integrated in the chromosome in methicillin-resistant Staphylococcus aureus. Antimicrob. Agents Chemother. 45:1323–1336.
70 Ito, T., K. Okuma, X. X. Ma, H. Yuzawa, and K. Hiramatsu. 2003. Insights on antibiotic resistance of Staphylococcus aureus from its whole genome: genomic island SCC. Drug Resist. Updat. 6:41–52.
71 Ito, T., X. X. Ma, F. Takeuchi, K. Okuma, H. Yuzawa, and K. Hiramatsu. 2004. Novel type V staphylococcal cassette chromosome mec driven by a novel cassette chromosome recombinase, ccrC. Antimicrob. Agents Chemother. 48:2637–2651.
72 Herold, B. C., L. C. Immergluck, M. C. Maranan, D. S. Lauderdale, R. E. Gaskin, S. Boyle-Vavra, C. D. Leitch, and R. S. Daum. 1998. Community-acquired methicillin-resistant Staphylococcus aureus in children with no identified predisposing risk. JAMA 279:593–598.
73 Chambers, H. F. 2005. Community-associated MRSA – resistance and virulence converge. N. Engl. J. Med. 352:1485–1487.
74 Gillet, Y., B. Issartel, P. Vanhems, J. C. Fournet, G. Lina, M. Bes, F. Vandenesch, Y. Piemont, N. Brousse, D. Floret, and J. Etienne. 2002. Association between Staphylococcus aureus strains carrying gene for Panton–Valentine leukocidin and highly lethal necrotising pneumonia in young immunocompetent patients. Lancet 359:753–759.
75 Narita, S., J. Kaneko, J. Chiba, Y. Piemont, S. Jarraud, J. Etienne, and Y. Kamio. 2001. Phage conversion of Panton–Valentine leukocidin in Staphylococcus aureus: molecular analysis of a PVL-converting phage, phiSLT. Gene 268:195–206.
76 Enright, M. C., D. A. Robinson, G. Randle, E. J. Feil, H. Grundmann, and B. G. Spratt. 2002. The evolutionary history of methicillin-resistant Staphylococcus aureus (MRSA). Proc. Natl. Acad. Sci. U. S. A. 99:7687–7692.
77 Katayama, Y., D. A. Robinson, M. C. Enright, and H. F. Chambers. 2005. Genetic background affects stability of mecA in Staphylococcus aureus. J. Clin. Microbiol. 43:2380–2383.
78 Ender, M., N. McCallum, R. Adhikari, and B. Berger-Bachi. 2004. Fitness cost of SCCmec and methicillin resistance levels in Staphylococcus aureus. Antimicrob. Agents Chemother. 48:2295–2297.
79 Kwan, T., J. Liu, M. DuBow, P. Gros, and J. Pelletier. 2005. The complete genomes and proteomes of 27 Staphylococcus aureus bacteriophages. Proc. Natl. Acad. Sci. U. S. A. 102:5174–5179.
80 Betley, M. J., and J. J. Mekalanos. 1988. Nucleotide sequence of the type A staphylococcal enterotoxin gene. J. Bacteriol. 170:34–41.
81 Kaneko, J., T. Kimura, S. Narita, T. Tomita, and Y. Kamio. 1998. Complete nucleotide sequence and molecular characterization of the temperate staphylococcal bacteriophage phiPVL carrying Panton–Valentine leukocidin genes. Gene 215:57–67.
82 Yamaguchi, T., T. Hayashi, H. Takami, K. Nakasone, M. Ohnishi, K. Nakayama, S. Yamada, H. Komatsuzawa, and M. Sugai. 2000. Phage conversion of exfoliative toxin A production in Staphylococcus aureus. Mol. Microbiol. 38:694–705.
83 Iandolo, J. J., V. Worrell, K. H. Groicher, Y. Qian, R. Tian, S. Kenton, A. Dorman, H. Ji, S. Lin, P. Loh, S. Qi, H. Zhu, and B. A. Roe. 2002. Comparative analysis of the genomes of the temperate bacteriophages phi 11, phi 12 and phi 13 of Staphylococcus aureus 8325. Gene 289:109–118.
84 Betley, M. J., and J. J. Mekalanos. 1985. Staphylococcal enterotoxin A is encoded by phage. Science 229:185–187.

85 Coleman, D. C., J. P. Arbuthnott, H. M. Pomeroy, and T. H. Birkbeck. 1986. Cloning and expression in Escherichia coli and Staphylococcus aureus of the beta-lysin determinant from Staphylococcus aureus: evidence that bacteriophage conversion of beta-lysin activity is caused by insertional inactivation of the beta-lysin determinant. Microb. Pathog. 1:549–564.
86 Lee, C. Y., and J. J. Iandolo. 1986. Lysogenic conversion of staphylococcal lipase is caused by insertion of the bacteriophage L54a genome into the lipase structural gene. J. Bacteriol. 166:385–391.
87 Smeltzer, M. S., M. E. Hart, and J. J. Iandolo. 1994. The effect of lysogeny on the genomic organization of Staphylococcus aureus. Gene 138:51–57.
88 Coleman, D. C., D. J. Sullivan, R. J. Russell, J. P. Arbuthnott, B. F. Carey, and H. M. Pomeroy. 1989. Staphylococcus aureus bacteriophages mediating the simultaneous lysogenic conversion of beta-lysin, staphylokinase and enterotoxin A: molecular mechanism of triple conversion. J. Gen. Microbiol. 135:1679–1697.
89 Novick, R. P. 1989. Staphylococcal plasmids and their replication. Annu. Rev. Microbiol. 43:537–565.
90 Westh, H., D. M. Hougaard, J. Vuust, and V. T. Rosdahl. 1995. Prevalence of erm gene classes in erythromycin-resistant Staphylococcus aureus strains isolated between 1959 and 1988. Antimicrob. Agents Chemother. 39:369–373.
91 Shalita, Z., E. Murphy, and R. P. Novick. 1980. Penicillinase plasmids of Staphylococcus aureus: structural and evolutionary relationships. Plasmid 3:291–311.
92 Lyon, B. R., and R. Skurray. 1987. Antimicrobial resistance of Staphylococcus aureus: genetic basis. Microbiol. Rev. 51:88–134.
93 Firth, N., S. Apisiridej, T. Berg, B. A. O’Rourke, S. Curnock, K. G. Dyke, and R. A. Skurray. 2000. Replication of staphylococcal multiresistance plasmids. J. Bacteriol. 182:2170–2178.
94 Townsend, D. E., S. Bolton, N. Ashdown, and W. B. Grubb. 1985. Transfer of plasmid-borne aminoglycoside-resistance determinants in staphylococci. J. Med. Microbiol. 20:169–185.
95 Archer, G. L., J. P. Coughter, and J. L. Johnston. 1986. Plasmid-encoded trimethoprim resistance in staphylococci. Antimicrob. Agents Chemother. 29:733–740.
96 Berg, T., N. Firth, S. Apisiridej, A. Hettiaratchi, A. Leelaporn, and R. A. Skurray. 1998. Complete nucleotide sequence of pSK41: evolution of staphylococcal conjugative multiresistance plasmids. J. Bacteriol. 180:4350–4359.
97 Lyon, B. R., J. W. May, and R. A. Skurray. 1984. Tn4001: a gentamicin and kanamycin resistance transposon in Staphylococcus aureus. Mol. Gen. Genet. 193:554–556.
98 Zhang, S., J. J. Iandolo, and G. C. Stewart. 1998. The enterotoxin D plasmid of Staphylococcus aureus encodes a second enterotoxin determinant (sej). FEMS Microbiol. Lett. 168:227–233.
99 Bayles, K. W., and J. J. Iandolo. 1989. Genetic and molecular analyses of the gene encoding staphylococcal enterotoxin D. J. Bacteriol. 171:4799–4806.
100 Yamaguchi, T., T. Hayashi, H. Takami, M. Ohnishi, T. Murata, K. Nakayama, K. Asakawa, M. Ohara, H. Komatsuzawa, and M. Sugai. 2001. Complete nucleotide sequence of a Staphylococcus aureus exfoliative toxin B plasmid and identification of a novel ADP-ribosyltransferase, EDIN-C. Infect. Immun. 69:7760–7771.
101 Thumm, G., and F. Gotz. 1997. Studies on prolysostaphin processing and characterization of the lysostaphin immunity factor (Lif) of Staphylococcus simulans biovar staphylolyticus. Mol. Microbiol. 23:1251–1265.
102 Zhang, Y. Q., S. X. Ren, H. L. Li, Y. X. Wang, G. Fu, J. Yang, Z. Q. Qin, Y. G. Miao, W. Y. Wang, R. S. Chen, Y. Shen, Z. Chen, Z. H. Yuan, G. P. Zhao, D. Qu, A. Danchin, and Y. M. Wen. 2003. Genome-based analysis of virulence genes in a non-biofilm-forming Staphylococcus epidermidis strain (ATCC 12228). Mol. Microbiol. 49:1577–1593.
103 Vuong, C., M. Durr, A. B. Carmody, A. Peschel, S. J. Klebanoff, and M. Otto. 2004. Regulated expression of pathogen-associated molecular pattern molecules in Staphylococcus epidermidis: quorum-sensing determines pro-inflammatory capacity and production of phenol-soluble modulins. Cell Microbiol. 6:753–759.
104 Lazarevic, V., B. Soldo, A. Dusterhoft, H. Hilbert, C. Mauel, and D. Karamata. 1998. Introns and intein coding sequence in the ribonucleotide reductase genes of Bacillus subtilis temperate bacteriophage SPbeta. Proc. Natl. Acad. Sci. U. S. A. 95:1692–1697.
105 Kozitskaya, S., S. H. Cho, K. Dietrich, R. Marre, K. Naber, and W. Ziebuhr. 2004. The bacterial insertion sequence element IS256 occurs preferentially in nosocomial Staphylococcus epidermidis isolates: association with biofilm formation and resistance to aminoglycosides. Infect. Immun. 72:1210–1215.
106 Wisplinghoff, H., A. E. Rosato, M. C. Enright, M. Noto, W. Craig, and G. L. Archer. 2003. Related clones containing SCCmec type IV predominate among clinically significant Staphylococcus epidermidis isolates. Antimicrob. Agents Chemother. 47:3574–3579.
107 Wielders, C. L., M. R. Vriens, S. Brisse, L. A. de Graaf-Miltenburg, A. Troelstra, A. Fleer, F. J. Schmitz, J. Verhoef, and A. C. Fluit. 2001. In-vivo transfer of mecA DNA to Staphylococcus aureus [corrected]. Lancet 357:1674–1675.
108 Hanssen, A. M., G. Kjeldsen, and J. U. Sollid. 2004. Local variants of staphylococcal cassette chromosome mec in sporadic methicillin-resistant Staphylococcus aureus and methicillin-resistant coagulase-negative staphylococci: evidence of horizontal gene transfer? Antimicrob. Agents Chemother. 48:285–296.
109 Rohde, H., C. Burdelski, K. Bartscht, M. Hussain, F. Buck, M. A. Horstkotte, J. K. Knobloch, C. Heilmann, M. Herrmann, and D. Mack. 2005. Induction of Staphylococcus epidermidis biofilm formation via proteolytic processing of the accumulation-associated protein by staphylococcal and host proteases. Mol. Microbiol. 55:1883–1895.
110 Hussain, M., M. Herrmann, C. von Eiff, F. Perdreau-Remington, and G. Peters. 1997. A 140-kilodalton extracellular protein is essential for the accumulation of Staphylococcus epidermidis strains on surfaces. Infect. Immun. 65:519–524.
111 Cucarella, C., C. Solano, J. Valle, B. Amorena, I. Lasa, and J. R. Penades. 2001. Bap, a Staphylococcus aureus surface protein involved in biofilm formation. J. Bacteriol. 183:2888–2896.
112 Heilmann, C., G. Thumm, G. S. Chhatwal, J. Hartleib, A. Uekotter, and G. Peters. 2003. Identification and characterization of a novel autolysin (Aae) with adhesive properties from Staphylococcus epidermidis. Microbiology 149:2769–2778.
113 Heilmann, C., M. Hussain, G. Peters, and F. Gotz. 1997. Evidence for autolysin-mediated primary attachment of Staphylococcus epidermidis to a polystyrene surface. Mol. Microbiol. 24:1013–1024.
114 Heilmann, C., O. Schweitzer, C. Gerke, N. Vanittanakom, D. Mack, and F. Gotz. 1996. Molecular basis of intercellular adhesion in the biofilm-forming Staphylococcus epidermidis. Mol. Microbiol. 20:1083–1091.
115 Galdbart, J. O., J. Allignet, H. S. Tung, C. Ryden, and N. El Solh. 2000. Screening for Staphylococcus epidermidis markers discriminating between skin-flora strains and those responsible for infections of joint prostheses. J. Infect. Dis. 182:351–355.
116 Ziebuhr, W., C. Heilmann, F. Gotz, P. Meyer, K. Wilms, E. Straube, and J. Hacker. 1997. Detection of the intercellular adhesion gene cluster (ica) and phase variation in Staphylococcus epidermidis blood culture strains and mucosal isolates. Infect. Immun. 65:890–896.
117 Frebourg, N. B., S. Lefebvre, S. Baert, and J. F. Lemeland. 2000. PCR-based assay for discrimination between invasive and contaminating Staphylococcus epidermidis strains. J. Clin. Microbiol. 38:877–880.
118 Moretro, T., L. Hermansen, A. L. Holck, M. S. Sidhu, K. Rudi, and S. Langsrud. 2003. Biofilm formation and the presence of the intercellular adhesion locus ica among staphylococci from food and food processing environments. Appl. Environ. Microbiol. 69:5648–5655.

119 Wang, X., J. F. Preston, 3rd, and T. Romeo. 2004. The pgaABCD locus of Escherichia coli promotes the synthesis of a polysaccharide adhesin required for biofilm formation. J. Bacteriol. 186:2724–2734.
120 Dyke, K. G., S. Aubert, and N. el Solh. 1992. Multiple copies of IS256 in staphylococci. Plasmid 28:235–246.
121 Byrne, M. E., D. A. Rouch, and R. A. Skurray. 1989. Nucleotide sequence analysis of IS256 from the Staphylococcus aureus gentamicin-tobramycin-kanamycin-resistance transposon Tn4001. Gene 81:361–367.
122 Rouch, D. A., M. E. Byrne, Y. C. Kong, and R. A. Skurray. 1987. The aacA-aphD gentamicin and kanamycin resistance determinant of Tn4001 from Staphylococcus aureus: expression and nucleotide sequence analysis. J. Gen. Microbiol. 133:3039–3052.
123 Ziebuhr, W., V. Krimmer, S. Rachid, I. Lossner, F. Götz, and J. Hacker. 1999. A novel mechanism of phase variation of virulence in Staphylococcus epidermidis: evidence for control of the polysaccharide intercellular adhesin synthesis by alternating insertion and excision of the insertion sequence element IS256. Mol. Microbiol. 32:345–356.
124 Ziebuhr, W., K. Dietrich, M. Trautmann, and M. Wilhelm. 2000. Chromosomal rearrangements affecting biofilm production and antibiotic resistance in a Staphylococcus epidermidis strain causing shunt-associated ventriculitis. Int. J. Med. Microbiol. 290:115–120.
125 Maki, H., N. McCallum, M. Bischoff, A. Wada, and B. Berger-Bachi. 2004. tcaA inactivation increases glycopeptide resistance in Staphylococcus aureus. Antimicrob. Agents Chemother. 48:1953–1959.
126 Couto, I., I. S. Sanches, R. Sa-Leao, and H. de Lencastre. 2000. Molecular characterization of Staphylococcus sciuri strains isolated from humans. J. Clin. Microbiol. 38:1136–1143.
127 Maki, H., and K. Murakami. 1997. Formation of potent hybrid promoters of the mutant llm gene by IS256 transposition in methicillin-resistant Staphylococcus aureus. J. Bacteriol. 179:6944–6948.


10 Pathogenomics: Insights into Tuberculosis and Related Mycobacterial Diseases

Alexander S. Pym, Stephen V. Gordon, and Roland Brosch

10.1 Introduction

Tuberculosis remains a serious threat to human health in spite of the existence of effective drug regimens and vaccine strategies. Mycobacterium tuberculosis, the etiological agent of tuberculosis, infects approximately one-third of the world’s population and kills one person every 15 s (Dye et al. 1999). The situation is deteriorating in many countries for various reasons including the emergence of M. tuberculosis strains that have become multidrug-resistant. There is therefore an urgent need for new vaccines and new therapeutic agents which are effective against multidrug-resistant strains. It was against this background that the genome sequencing project of the paradigm strain for tuberculosis research, M. tuberculosis H37Rv, was initiated (Cole et al. 1998). Sequence analysis of this strain revealed the existence of approx. 4000 genes in a 4.4-Mbp genome. Recent advances in mycobacterial genetics (Pelicic et al. 1997; Smith et al. 2001; Braunstein et al. 2002) mean that these genes may now be systematically tested for their contribution to virulence, drug resistance, immune modulation, and other fundamental aspects of pathogenicity. In addition, the genome sequences of other strains of the M. tuberculosis complex have recently been completed (Fleischmann et al. 2002; Garnier et al. 2003) or are in the finishing phase. Using these sequences as the basis for comparative genomics, a number of variable genomic regions have been identified in the subspecies of the M. tuberculosis complex, some of which are implicated in virulence. The insights from these studies are important for understanding the genetic basis of phenotypic differences that exist between various M. tuberculosis strains and/or across other members of the M. tuberculosis complex. However, tubercle bacilli are not the only major mycobacterial pathogens that have been explored by such analyses. 
The genome sequences of Mycobacterium leprae and Mycobacterium ulcerans, the etiological agents of leprosy and Buruli ulcer, respectively, have been obtained (Cole et al. 2001) or are nearing completion (http://genopole.pasteur.fr/Mulc/BuruList.html). Genome sequencing of environmental saprophytes and opportunistic human pathogens, such as Mycobacterium


smegmatis, Mycobacterium marinum, and members of the Mycobacterium avium complex, have also been undertaken (http://www.genomesonline.org). Extensive exploration of all mycobacterial genome sequences is beyond the scope of this chapter, so instead we will focus on the salient insights we have gained from genomic analysis on the agents of tuberculosis, leprosy, and Buruli ulcer.

10.2 Molecular Basis of Pathogenicity

M. tuberculosis uses the respiratory tract as the principal portal of entry, where it travels deep into the alveoli and is engulfed by alveolar macrophages. While these immune cells have the capability to destroy most potential invaders, M. tuberculosis can persist and even replicate in this extremely hostile intracellular environment (Russell, 2001). The outcome of infection ultimately depends on the induced T-cell response (Kaufmann, 2001). This immune response is usually only capable of containing the infection and rarely sterilizes the lungs, allowing the bacteria to establish what is known as a “latent” infection. The physiological and anatomical nature of latent or dormant infection is currently unknown. Most latent infections are thought to be perpetual, though in at least 5% of cases (much greater in the presence of HIV coinfection) the apparently dormant bacteria eventually exploit lapses in host immunity to multiply and establish overt disease. The resulting proliferation of bacteria in the lungs produces aerosol transmission and infection of new cases. Thus, although M. tuberculosis is highly pathogenic with the capacity to cause the severe pulmonary disease required for transmission, it can also exist in a quasicommensal state with its host, permitting the establishment of a large reservoir of dormantly infected individuals. These varied clinical states suggest that the molecular basis of pathogenicity of M. tuberculosis involves a complex interplay between bacterial and human factors. A key interface between M. tuberculosis and its host is the macrophage. One of the major features of M. tuberculosis in this respect is its capacity to block the acidification of the phagosome and consequently to disable or retard phagosome–lysosome fusion in phagocytic cells (Goren et al. 1976; Anes et al. 2003). Many aspects of this phenomenon remain unclear, but studies have started to shed some light on the molecular mechanisms involved. 
Complex lipids from the mycobacterial cell wall are thought to be important for blocking the normal biogenesis of the phagolysosome. Recently the SapM lipid phosphatase of M. tuberculosis, via hydrolysis of phosphatidylinositol 3-phosphate, has been proposed as a key protein in this process (Vergne et al. 2005). A eukaryotic-like serine–threonine protein kinase (PknG) also appears essential for the manipulation of membrane trafficking and intracellular survival (Walburger et al. 2004). Several approaches for gene inactivation have been developed to enable the identification of the genetic basis of other aspects of pathogenicity. In one of them, a vector with a thermosensitive origin of replication derived from plasmid pAL5000 of Mycobacterium fortuitum has been used together with the gene sacB to


select the rare genetic event of double cross-over recombination in M. tuberculosis (Pelicic et al. 1997). For example, this approach was used to inactivate phoP, which encodes part of a two-component system of M. tuberculosis; the mutation did not affect intracellular survival but impaired growth in mouse lungs (Perez et al. 2001). The same vector (ts/sacB) together with marked transposons also allowed signature-tagged mutagenesis (STM) to be applied to M. tuberculosis, thereby identifying transposon mutants of M. tuberculosis that could not survive an in vivo mouse passage. In one study four transposon insertions were identified in a 50-kbp region, which groups 13 genes required for the synthesis or transport of phthiocerol dimycocerosates (Camacho et al. 1999). The absence of these complex lipids affects the permeability and structure of the mycobacterial cell wall as well as the virulence of the bacteria (Camacho et al. 1999). A similar approach based on mycobacteriophages (Braunstein et al. 2002) confirmed the importance of phthiocerol dimycocerosate synthesis and transport for the virulence of M. tuberculosis (Cox et al. 1999). The use of these genetic techniques in conjunction with animal models of tuberculosis (usually the mouse) has produced a lengthening list of diverse genes implicated in pathogenesis. However, in many cases the precise mechanism by which these genes act has yet to be defined. Hingley-Wilson and colleagues have tried to categorize various M. tuberculosis mutants generated by different research groups by their in vivo growth characteristics, which are shown in Table 10.1 (Hingley-Wilson et al. 2003). The strains listed there include (a) severe growth in vivo (sgiv) mutants, which show a very severe reduction in colony-forming units with time; (b) growth in vivo (giv) mutants, which do not grow as robustly as wild-type M.
tuberculosis in the lungs of immunocompetent mice, yet still grow better than sgiv mutants; (c) persistence (per) mutants, which fail to grow or persist after the onset of acquired immunity; and (d) mutants that have the same growth characteristics as per mutants, but show altered pathology (pat) compared with that of wild-type M. tuberculosis. Whilst this may be a useful framework for understanding the mouse model of tuberculosis, it is unclear how these phenotypes relate to human tuberculosis. A different method for identifying genes implicated in the infection process was developed by Sassetti et al., who adapted the mariner transposon for use in M. tuberculosis and named the method “transposon site hybridization” or TraSH (Sassetti et al. 2001). The use of high-density mutagenesis allowed them to mutate virtually every gene of M. tuberculosis and determine the effect of gene disruption on the growth rate during infection. Microarray hybridization served as the readout for the presence or absence of individual transposon mutants in the pool of bacteria after mouse passage. Comparing the results of mutant pools passaged in mice with pools grown in artificial media, they identified 194 genes that seemed to be specifically required for mycobacterial growth in vivo (Sassetti and Rubin, 2003). According to their data, approximately 5% of the genes present in the genome of M. tuberculosis code for proteins that are required for virulence in the mouse. Interestingly, a high proportion of these proteins are only found in closely related species, suggesting mycobacteria have acquired a specific repertoire of virulence determinants distinct from those identified in other bacterial pathogens.
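The comparison underlying TraSH can be sketched in a few lines. The example below is a minimal illustration, not the published analysis pipeline: the gene names, the signal values, and the 0.5 ratio cut-off are hypothetical, and the real method relies on replicate microarray hybridizations and statistical filtering. It flags genes whose transposon mutants are under-represented after mouse passage relative to the pool grown in artificial medium, and then checks the proportion quoted above (194 of approximately 4000 genes).

```python
def required_in_vivo(in_vitro, in_vivo, max_ratio=0.5):
    """Return genes whose mutants are depleted after mouse passage.

    in_vitro / in_vivo map gene name -> hybridization signal for the
    transposon mutants of that gene in each pool (hypothetical units).
    max_ratio is an illustrative cut-off, not a published value.
    """
    flagged = []
    for gene, reference_signal in in_vitro.items():
        if reference_signal <= 0:
            # No mutants recovered in vitro: the gene may be needed for
            # growth in general, so no in vivo call can be made.
            continue
        ratio = in_vivo.get(gene, 0.0) / reference_signal
        if ratio < max_ratio:
            flagged.append(gene)
    return sorted(flagged)


# Hypothetical signals for four placeholder genes.
in_vitro_pool = {"geneA": 100.0, "geneB": 80.0, "geneC": 0.0, "geneD": 120.0}
in_vivo_pool = {"geneA": 95.0, "geneB": 10.0, "geneC": 0.0, "geneD": 30.0}

candidates = required_in_vivo(in_vitro_pool, in_vivo_pool)
print(candidates)  # ['geneB', 'geneD']

# Proportion quoted in the text: 194 genes out of approx. 4000.
fraction = 194 / 4000
print(f"{fraction:.2%}")  # 4.85%
```

Genes with no in vitro signal are skipped rather than flagged, which mirrors the distinction drawn in the text between genes required specifically in vivo and genes required for growth in general.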


Table 10.1 Classification of the described M. tuberculosis mutants (after Hingley-Wilson et al. 2003).

M. tuberculosis mutant   Reference                     Gene function
leuD                     Hondalus et al. 2000          Leucine synthesis
lysA                     Pavelka et al. 2003           Lysine synthesis
mgt                      Buchmeier et al. 2000         Magnesium transport
proC                     Smith et al. 2001             Proline synthesis
trpD                     Smith et al. 2001             Tryptophan synthesis
purC                     Jackson et al. 1999           Purine synthesis
erp                      Berthet et al. 1998a          Exported repetitive protein
phoP                     Perez et al. 2001             Two-component regulatory protein
secA2                    Braunstein et al. 2003        Accessory secretion factor
fadD28, mmpL7            Camacho et al. 1999           PDIM synthesis and transport
fadD28, mmpL7            Cox et al. 1999               PDIM synthesis and transport
glnA1                    Tullius et al. 2003           Glutamine synthetase
panCD                    Sambandamurthy et al. 2002    Pantothenate synthesis
pcaA                     Glickman et al. 2000          Cyclopropanation of α-mycolates
icl                      McKinney et al. 2000          Isocitrate lyase
plcABCD                  Raynaud et al. 2002           Phospholipase C
relMTB                   Dahl et al. 2003              (p)ppGpp synthesis and hydrolysis
dnaE2                    Boshoff et al. 2003           DNA polymerase
hsp70                    Stewart et al. 2001           Hsp70
sigH                     Kaushal et al. 2002           Sigma factor H
rpoV(sigA)/whiB3         Steyn et al. 2002             Sigma factor A and transcription factor
hbhA                     Pethe et al. 2001             Heparin-binding hemagglutinin

Class (in table order): sgiv, giv, per, pat, dis.

PDIM, phthiocerol dimycocerosate.


Together with results from previous studies identifying virulence genes in M. tuberculosis, this genome-wide approach allows an overview of the genetic “virulence potential” of the tubercle bacilli to be established. These mutants can now be verified by gene knockout strategies followed by in vivo survival tests, and experiments to determine their physiological function. With this approach the same authors identified approximately 700 genes that are essential for growth of M. tuberculosis in vitro (Sassetti et al. 2003). Interestingly, the great majority of these genes were conserved in the degenerate genome of the leprosy bacillus, M. leprae, indicating that nonessential functions have been selectively lost since M. leprae diverged from other mycobacteria (see Section 10.5.1). Another strategy to identify genes that are involved in virulence is comparative genomics of closely related organisms that show different phenotypes. Such analyses revealed the existence of several variable genomic regions among the members of the M. tuberculosis complex. This complex, which is defined as a single species by DNA–DNA hybridization studies (Imaeda 1985), with exceptionally little sequence variation (Sreevatsan et al. 1997), comprises M. tuberculosis together with Mycobacterium canettii, Mycobacterium africanum, Mycobacterium bovis, and Mycobacterium microti. The subspecies can only be distinguished by a limited number of phenotypic or, more recently, genotypic characteristics, but differ remarkably with respect to their host range and pathogenicity. M. microti, for example, is almost exclusively a rodent pathogen, whereas M. bovis infects a wide variety of mammalian species, including humans. BCG (bacille Calmette–Guérin), a laboratory-attenuated variant of M. bovis, has been used extensively since the 1920s as a vaccine against human tuberculosis. Hybridization of M. bovis BCG genomic DNA with the genome of M.
tuberculosis H37Rv, represented on either a spotted microarray (Behr et al. 1999; Salamon et al. 2000) or on bacterial artificial chromosome (BAC) arrays (Gordon et al. 1999), was able to identify up to 18 deletions in the M. bovis BCG genome relative to M. tuberculosis, ranging in size from 0.3 to 12.7 kbp, extending previous subtractive hybridization studies (Mahairas et al. 1996). However, specific polymerase chain reaction (PCR) and sequence analyses of these regions of difference (RD) among members of the M. tuberculosis complex showed that most of the RDs absent from BCG were also missing from other strains of M. bovis, indicating that some of these variable regions reflect the evolutionary divergence of M. tuberculosis and M. bovis rather than genomic modifications that were introduced during the attenuation process of BCG (Brosch et al. 2002). This observation gave rise to the construction of an evolutionary scheme which is discussed further below. One region (RD1) of 10.7 kbp in size was originally identified by subtractive hybridization as being absent from BCG, but present in wild-type M. bovis strains (Mahairas et al. 1996). As preliminary complementation experiments of BCG with the RD1 region from M. bovis did not result in increased virulence of the recombinant BCG strain in mice (Mahairas et al. 1996), the hypothesis that RD1 was involved in the attenuation of BCG remained unconfirmed for several years. The first experimental evidence that the loss of the RD1 locus did contribute to the attenuation of BCG was obtained only recently. A recombinant BCG::RD1, that


contained the RD1 region and also large portions of its flanking regions, was more virulent in severe combined immune deficient (SCID) mice than the BCG vector control strain. The same recombinant BCG strain was found to persist to a greater degree in the organs of immunocompetent mice, but induced considerably less pathology than M. tuberculosis H37Rv (Pym et al. 2002). Since the complementation of RD1 only partially restored virulence, these data suggested that the RD1 deletion in BCG may represent only one step of a multifactorial attenuation cascade in BCG, either during the isolation of BCG (1908–1919) or during subsequent passage of BCG Pasteur (1921–1961). Notable for a live attenuated vaccine is the fact that the RD1 region contains the esxAB genes encoding ESAT-6 and CFP-10, two strongly immunogenic proteins which are found in early culture filtrates of M. tuberculosis but not in BCG cultures (Sorensen et al. 1995; Berthet et al. 1998b). Similarly, the vole bacillus M. microti, which was used as an anti-tuberculosis vaccine in the 1960s in Great Britain and Czechoslovakia (Sula and Radkovsky 1976; Hart and Sutherland 1977), also fails to make the ESAT-6 and CFP-10 proteins, owing to a 14-kbp deletion in the RD1 region (Brodin et al. 2002). This means that all vaccine strains that have been employed on a large scale for the prevention of human tuberculosis were missing the important T-cell antigens ESAT-6 and CFP-10. This raised the question as to whether reintegration of these antigens into modified BCG vaccines could enhance their protective efficacy. The first experimental evidence to address this question has been recently obtained (Pym et al. 2003). Recombinant BCG vaccines expressing ESAT-6 and CFP-10 were found to outperform wild-type BCG in the mouse and guinea pig models of tuberculosis, although the protective advantage was restricted to the animals’ spleens. Similar results were obtained for RD1-complemented M. microti strains (Brodin et al. 2004).
Importantly, the improved protection was dependent on appropriate immune recognition of ESAT-6 and CFP-10, which was only seen when the two antigens were exported via a dedicated secretion apparatus, encoded for by the genes adjacent to esxAB in the RD1 locus (Pym et al. 2003). The RD1 region therefore represents one of the most interesting genomic regions of the tubercle bacilli, as it seems to be simultaneously involved in enhanced virulence in the immunocompromised host, but also in improved protection in the immunocompetent host. Complementary approaches by others have confirmed the involvement of the RD1 region in the virulence of M. tuberculosis (Hsu et al. 2003; Lewis et al. 2003; Stanley et al. 2003; Sassetti and Rubin, 2003; Guinn et al. 2004).

10.3 Evolution of the M. tuberculosis Complex

As mentioned above, the M. tuberculosis complex represents a genetically homogeneous group of bacteria that cause tuberculosis in various mammalian species. The members of the complex share greater than 99.9% identity at the DNA level. However, some particular phenotypic characteristics, including different host preferences, have led researchers to retain the traditional species names of these bacteria. In a recent study, based on the presence or absence of RD regions, the evolutionary pathway of the M. tuberculosis complex has been redefined (Brosch et al. 2002). For example, region RD9 was found to be present in all M. tuberculosis and M. canettii strains, whereas this 2-kbp genomic segment was absent from M. africanum, M. microti, M. bovis, and M. bovis BCG. The segment is predicted to encode two complete genes, Rv2073c and Rv2074, as well as the 5′ end of cobL. The interruption of cobL indicates that the RD9 polymorphism is due to the deletion of a 2-kbp fragment from the common ancestor of M. africanum, M. microti, M. bovis, and M. bovis BCG rather than the insertion of these genes into M. tuberculosis. This was confirmed by sequence analysis of the interruption sites of cobL in these species, which showed that the junction sequences of the RD9 region were identical. These findings were of particular importance for the interpretation of the evolution of the members of the M. tuberculosis complex, as they allowed the definition of a direction to the evolutionary processes that have shaped the various lineages (Brosch et al. 2001). As the deletions have interrupted genes that are still intact in M. tuberculosis and M. canettii, it was possible to propose a new evolutionary scheme for the members of the M. tuberculosis complex. This scheme, which was based on the presence or absence of 20 variable regions in a representative set of 100 strains from the M. tuberculosis complex (Brosch et al. 2002), was confirmed in a separate study that also employed RD markers on a different set of strains (Mostowy et al. 2002), and has been complemented by data on selected single nucleotide polymorphisms and microdeletions (Marmiesse et al. 2004). Moreover, a study employing multilocus sequence typing found similar phylogenetic relationships among strains of the M.
tuberculosis complex to those proposed by deletion-based analysis (Gutacker et al. 2002). Altogether these studies suggest that, contrary to previous ideas (Stead et al. 1995), the agent of bovine tuberculosis, M. bovis, is not the ancestor of the human tuberculosis agent M. tuberculosis. From these genomic studies it appears that the common ancestor of the M. tuberculosis complex resembled M. tuberculosis or M. canettii more than M. bovis. Final confirmation that the genome of M. bovis has indeed undergone several deletion processes relative to M. tuberculosis came from the complete genome sequence of M. bovis (Garnier et al. 2003), showing that the genome of M. bovis AF2122/97 is 66 kbp smaller than the genome of M. tuberculosis H37Rv and, with one exception (M. tuberculosis specific deleted region 1, TbD1), does not contain any “extra” DNA relative to that present in M. tuberculosis strains. This genetic approach for differentiation can now be used to replace the often confusing traditional division of the members of the M. tuberculosis complex into rigidly defined subspecies and for rapid identification of clinical specimens (Fig. 10.1). A molecular study on a collection of tuberculosis clinical isolates from East Africa that showed a smooth, M. canettii-like colony morphology (van Soolingen et al. 1997) revealed that this geographically restricted population had much greater genetic variability than did other extant M. tuberculosis complex strains, suggesting that they represent ancestral lineages of the complex. These findings are consistent with an early emergence of tubercle bacilli in East Africa, perhaps contemporaneously with early hominids three to six million years ago. The present global M. tuberculosis population may therefore represent the clonal expansion of a small subset of smooth tubercle bacilli along the waves of human migration out of Africa (Gutierrez et al. 2005).

Fig. 10.1 Use of the evolutionary scheme (after Brosch et al. 2002) for the rapid differentiation of the members of the M. tuberculosis complex. (A) Identification of “modern” M. tuberculosis strains, which represent the large majority of clinical isolates; (B) identification of members of the M. africanum–M. bovis lineage; (C) identification of classical M. bovis strains, which represent the large majority of cattle isolates; (D) identification of attenuated strains.
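The deletion-based scheme lends itself to a simple decision procedure. The sketch below is illustrative only and uses just two markers: RD9, whose distribution is described above, and TbD1. That TbD1 is absent from “modern” M. tuberculosis strains but retained by the other members of the complex is treated here as an assumption taken from the scheme of Brosch et al. (2002); the function name and the returned labels are hypothetical, and a practical workflow such as that of Fig. 10.1 would test additional regions.

```python
def classify_isolate(rd9_present: bool, tbd1_present: bool) -> str:
    """Coarse placement of an isolate from two presence/absence markers.

    rd9_present:  RD9 region detected (e.g. by PCR across the region).
    tbd1_present: TbD1 region detected. Its distribution is an assumption
                  taken from the deletion-based scheme, not from this code.
    """
    if rd9_present:
        if tbd1_present:
            # Neither deletion has occurred: closest to the ancestral state.
            return "M. canettii or ancestral M. tuberculosis"
        return "modern M. tuberculosis"
    if tbd1_present:
        # RD9 is deleted in M. africanum, M. microti, M. bovis and BCG.
        return "M. africanum-M. bovis lineage"
    # Both regions missing is not predicted by the scheme.
    return "unexpected pattern: retest"


for rd9, tbd1 in [(True, False), (True, True), (False, True), (False, False)]:
    print(rd9, tbd1, "->", classify_isolate(rd9, tbd1))
```

Because the deletions are treated as irreversible events, each marker only needs to be scored as present or absent, which is what makes the scheme suitable for rapid typing.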

10.4 Some Metabolic Insight from the Genome Sequences

The mycobacterial cell wall has been the target of in-depth studies for many decades, revealing a complex repertoire of unusual lipids that give the structure its unique architecture. It was therefore to be expected that the mycobacterial genome would encode a sophisticated machinery for lipid metabolism; however, it was still unexpected that over 9% of the genome coding capacity of H37Rv would be dedicated to lipid metabolism. A striking level of redundancy in lipid metabolic genes was apparent in the M. tuberculosis H37Rv genome, with 36 fadD alleles encoding acyl-CoA synthase, 36 fadE genes encoding acyl-CoA dehydrogenase, and 21 echA genes for enoyl-CoA hydratase/isomerase (Cole et al. 1998). The apparent redundancy in lipid enzymes may, however, need to be readdressed in the light of recent work on the fadD genes. Gokhale and colleagues have shown that some of the fadD alleles do not encode fatty acyl-CoA ligases, but instead code for a new class of fatty acyl-AMP ligases that are linked to a proximal pks gene encoding a unique polyketide synthase (Trivedi et al. 2004). It is therefore possible that the apparent redundancy in lipid enzymes hides novel enzyme activities. The role of lipids in the pathogenesis of M. tuberculosis is illustrated by some recent studies by Guilhot and others. They showed that several well-characterized M. tuberculosis strains (H37Rv, Erdman, CDC1551, and MT103) lack a particular phenolic glycolipid (PGL) that is produced by M. tuberculosis 210 (Beijing type), and they linked this observation to a deletion of 7 bp that introduces a frameshift in the pks15/1 gene (Constant et al. 2002). When a large number of strains were investigated (Marmiesse et al. 2004), it was noted that the 7-bp deletion was characteristic of the majority of M. tuberculosis clinical isolates, i.e., strains belonging to principal genetic group 2 or 3, as defined by Sreevatsan and colleagues (Sreevatsan et al. 1997). 
The importance of this particular PGL in the host–pathogen interaction was proposed by Barry and colleagues, who found a correlation between synthesis of this PGL molecule and a decrease in the production of proinflammatory cytokines by host immune cells (Reed et al. 2004). This finding could explain in part the hypervirulent phenotype observed for strains from the Beijing family (Manca et al. 2001; Lopez et al. 2003). Interestingly, in M. bovis AF2122/97 and M. bovis BCG, a 6-bp deletion was observed at the same locus of the pks15/1 gene. As M. bovis and M. bovis BCG both produce PGL, it seems likely that this 6-bp deletion, which does not cause a frameshift in the pks15/1 gene, does not influence the enzymatic activity of the resulting gene product for the synthesis of the particular PGL (Constant et al. 2002). However, due to mutations in a neighboring gene, encoding a glycosyl transferase, phenolic glycolipids produced by M. bovis and BCG are characterized by a shorter sugar moiety (Perez et al. 2004). Research on mycobacterial lipids has been greatly assisted by the availability of the M. tuberculosis genome sequence, which has also opened up new avenues to identify the underlying genetic basis of their metabolism (Cole et al. 1998; Trivedi et al. 2005). Some of the very specialized enzymes involved in this process, encoded by genes that are unique and essential for M. tuberculosis, also represent important potential drug targets, and great efforts are being undertaken to determine the crystal structures of such proteins. This work is part of structural genomics initiatives, which have become very important research domains with the aim of identifying new antituberculous drugs (Smith and Sacchettini, 2003) (http://www.pasteur.fr/recherche/X-TB/). Proof that such approaches are feasible is the identification of a new diarylquinoline (R207910) that potently inhibits both drug-sensitive and drug-resistant M. tuberculosis in vitro and in mice. Results from resistant mutants suggest that the diarylquinoline molecule targets the proton pump of adenosine triphosphate (ATP) synthase in M. tuberculosis (Andries et al. 2005).
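The reading-frame argument made above for the two pks15/1 deletions follows from codon arithmetic: a deletion preserves the downstream reading frame only if its length is a multiple of three. The short sketch below illustrates this. The function name is hypothetical, and the check ignores every other consequence of a deletion, such as the loss of catalytically important residues.

```python
def causes_frameshift(deletion_length_bp: int) -> bool:
    """Return True if a deletion of this length shifts the reading frame.

    Codons are three nucleotides long, so only deletions whose length is
    not divisible by three change the frame of the downstream sequence.
    """
    if deletion_length_bp <= 0:
        raise ValueError("deletion length must be a positive number of bp")
    return deletion_length_bp % 3 != 0


# The two deletion lengths reported at the pks15/1 locus.
for length in (7, 6):
    outcome = "frameshift" if causes_frameshift(length) else "reading frame preserved"
    print(f"{length}-bp deletion: {outcome}")
# 7-bp deletion: frameshift
# 6-bp deletion: reading frame preserved
```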
Although the tubercle bacilli are classified as aerobic organisms, the genomic data revealed the potential for microaerophilic and anaerobic respiration. An operon, narGHJI, encodes a nitrate reductase that allows utilization of nitrate as a terminal electron acceptor. Investigating the role of nitrate reductase, Bange and colleagues generated a narG mutant of M. bovis BCG (Weber et al. 2000). Immunodeficient mice infected with the narG mutant developed smaller granulomas than those infected with the wild type. Furthermore, mice infected with the mutant presented no clinical signs of disease after more than 200 days. This suggests that the ability to respire anaerobically contributes to virulence. Understanding how M. tuberculosis is able to cope with low oxygen pressure is one of the key research subjects in mycobacteriology, as this capacity is thought to enable the “dormant” metabolic state proposed for latent human infection (Wayne and Hayes, 1996). Research is also ongoing to identify an animal model of tuberculosis that reproduces the low oxygen pressures associated with granulomas seen in human lungs. As granuloma formation upon infection with M. tuberculosis in mice is not very pronounced, the rabbit model may be more appropriate. It is also noteworthy that some of the classical microbiological methods used to differentiate M. bovis from M. tuberculosis rely on phenotypic differences that are due to point mutations in the M. bovis genome. For example, M. tuberculosis produces luxuriant, “eugonic” growth on media containing glycerol, while M. bovis produces small “dysgonic” colonies on the same media since it cannot use glycerol as a sole carbon source. Keating and colleagues (2005) have shown that this colony morphology difference is due to a single point mutation in the pyruvate kinase gene of M. bovis, a defect which also explains the pyruvate requirement of M. bovis. Similarly, M. tuberculosis reduces nitrate to nitrite while M. bovis performs this reduction very poorly. Bange and colleagues have also shown that this defect in nitrate reductase activity is due to a point mutation in the promoter of the M. bovis narGHJI cluster (Stermann et al. 2004). Further studies will be necessary to experimentally confirm the many predictions made on the basis of the genome sequence (Cole et al. 1998).

10.5 Other Major Mycobacterial Human Pathogens

10.5.1 Mycobacterium leprae

M. leprae, the infectious agent of leprosy, represents another major human pathogen among the slow-growing mycobacteria. Although leprosy cases have been decreasing due to multidrug treatment campaigns led by the World Health Organization, the estimated incidence remains at 700 000 leprosy cases worldwide. Basic research into leprosy dwindled over the years, a trend that has fortunately now been reversed with the completion of a genome sequencing project for M. leprae. At 3 268 182 bp, the M. leprae genome is almost 1.2 Mbp smaller than that of M. tuberculosis, and its G+C content of approximately 58% also differs significantly. At one time, this exceptionally low G+C value led to doubts as to whether the leprosy bacillus was a true member of the genus Mycobacterium, but these doubts have now been completely dispelled by the genome sequence (Cole et al. 2001). The difference in genome size results from loss of genetic information, with a coding capacity of less than 50% in M. leprae compared to 90.8% in M. tuberculosis. This reduction in coding capacity is due to both the deletion of chromosomal regions and the accumulation of mutations in genes, resulting in multiple pseudogenes. These decayed gene remnants are abundant and may represent the removal of functions no longer required for the highly adapted in vivo growth, where many substrates may be provided by the host cell. One of the most striking differences in gene families between the leprosy and tubercle bacilli is the case of the PE and PPE proteins. Whereas these two families account for 167 genes in M. tuberculosis (Cole et al. 1998; Brennan and Delogu 2002), only 9 intact pe or ppe genes could be identified in M. leprae, with complete loss of the PE-PGRS subfamily (Cole et al. 2001). This most likely reflects both downsizing in M. leprae and expansion in M. tuberculosis. Interestingly, some of the intact pe and ppe genes in M.
leprae, such as ML1828, ML1182, or ML0411, are in gene-poor regions or surrounded by pseudogenes. Hence, while neighboring loci were deleted or accumulated mutations, these pe and ppe genes were maintained, underlining the importance of retaining a minimal set of functional PE and PPE proteins. In both M. leprae and M. tuberculosis, the single rrn operon is situated approximately 1.3 Mbp from oriC, the chromosomal origin of replication. A copy of rrn is located adjacent to oriC in most bacterial genomes, and this proximity boosts rRNA production through increased gene dosage. It has been suggested that the atypical arrangement seen in mycobacterial pathogens may be related to their slow growth. However, the extremely long doubling time of 14 days of M. leprae, and the failure to cultivate this bacterium in axenic culture, is most probably caused by the loss of many functions related to its energy metabolism. M. leprae also lacks a functional catalase–peroxidase, as katG is a pseudogene. In M. tuberculosis KatG is known to play two roles, namely the detoxification of oxygen radicals and the activation of the antimycobacterial drug isoniazid. Catalase–peroxidase has been clearly identified as a virulence factor in M. tuberculosis complex organisms, most likely functioning to protect the bacillus from the reactive oxygen species generated by the macrophage respiratory burst. Absence of this defense mechanism against oxidative stress indicates that the leprosy bacillus either has some other means of withstanding reactive oxygen or fails to trigger a respiratory burst. One of the most important consequences of the absence of a functional KatG is that M. leprae is resistant to isoniazid, and this drug is therefore inappropriate for the treatment of leprosy. More recently, comparative genomic analyses across a large number of biopsies from leprosy patients from all over the world have shown that extant M. leprae strains show very little genetic variability and represent a single clone which has spread around the world.
The identification of a few informative single nucleotide polymorphisms (SNPs) permitted the isolates to be grouped according to their geographical origin, allowing an evolutionary scheme for the leprosy bacillus to be proposed (Monot et al. 2005).

10.5.2 Mycobacterium ulcerans

M. ulcerans is an emerging pathogen that causes Buruli ulcer, a debilitating cutaneous infection that is reaching epidemic proportions in parts of West Africa (Johnson et al. 2005). Buruli ulcer is considered to be the third most common mycobacterial disease of nonimmunocompromised persons, after tuberculosis and leprosy. M. ulcerans, unlike other mycobacterial pathogens, produces a macrolide toxin, mycolactone (George et al. 1999). The disease is usually treated by surgical excision of infected and surrounding tissue, as the organism in situ is unresponsive to drug therapy. For a long time the mode of transmission of M. ulcerans was unknown. However, recent investigations have proposed a transmission chain involving biofilms on water plants, snails, and aquatic insects (Marsollier et al. 2002, 2004). It has been shown that M. ulcerans can multiply to very high numbers in the salivary glands of water insects and can then be transmitted by bites to the mammalian host. The recent detection of M. ulcerans-specific DNA sequences in water from swamps in south-east Australia and in aquatic insects in Benin has confirmed that it is primarily an environmental organism, with a temperature optimum of 32 °C.

In contrast to that of M. leprae, the genome of M. ulcerans is nearly 1.5 Mbp larger than that of M. tuberculosis, with a predicted size of approximately 6 Mbp (http://genopole.pasteur.fr/Mulc/BuruList.html). This is close to that of the related fish pathogen M. marinum, which has an estimated genome size of approximately 6.7 Mbp (http://www.sanger.ac.uk/Projects/M_marinum/). While the annotation of the complete M. ulcerans genome had not been finished at the time of writing, during the genome sequencing project a 170-kbp plasmid was identified that carries the pks genes encoding the large polyketide synthases needed for the synthesis of the mycolactone toxin. This contrasts with the M. tuberculosis complex: in M. ulcerans the main virulence factor is a plasmid-encoded toxin, most probably acquired by horizontal transfer. Otherwise the organism resembles M. marinum, which, for the human host, is an opportunistic pathogen and a much lesser concern for public health than M. ulcerans. In a recent study, investigation of several isolates of M. ulcerans showed that parts of the toxin-encoding plasmid may be lost, probably due to recombination between the highly repetitive sequences of the pks genes (Stinear et al. 2004). As this loss of function is associated with a loss of virulence, it probably occurred during in vitro culture. Nevertheless, such partial loss of the plasmid by certain M. ulcerans strains may contribute to the finding that the incidence and location of Buruli ulcer cases in a given geographical area may vary extensively; i.e., certain strains may disappear as a source of human infection due to the loss of parts of the plasmid (Stinear et al. 2005).
In addition, future comparative genome analyses of the now available complete sequences will reveal whether M. ulcerans also has chromosomally encoded factors that are necessary for the production of mycolactone, providing greater insight into the evolutionary mechanisms which gave rise to this fascinating mycobacterium.
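The plasmid instability discussed in this section is attributed to recombination between repeated pks sequences. As a rough illustration of how repeated segments can be located in a sequence, the sketch below indexes every substring of a fixed length k and reports those that occur more than once. It assumes exact repeats and a toy input string; real analyses would tolerate mismatches and rely on alignment tools, and `repeated_kmers` is a hypothetical helper name, not part of any cited pipeline.

```python
from collections import defaultdict


def repeated_kmers(seq, k):
    """Map each length-k substring occurring more than once to its start positions."""
    if k <= 0:
        raise ValueError("k must be positive")
    positions = defaultdict(list)
    # Record the start index of every window of length k.
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only the substrings seen at two or more positions.
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


if __name__ == "__main__":
    toy = "ACGTTTACGTGG"  # invented toy sequence
    print(repeated_kmers(toy, 4))
```

Pairs of positions sharing a long repeat are candidate sites for the kind of recombination that could delete the intervening DNA.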

10.6 Concluding Remarks

The success of M. tuberculosis and other mycobacterial pathogens is undoubtedly a consequence of a finely tuned dialogue that has evolved over time between the pathogen and host. The availability of genome sequences from several mycobacterial species has considerably enlarged our knowledge of these pathogens, and now makes it possible to explore their pathogenic lifecycle and evolution. This information will play a key role in the development of new therapeutic and preventive strategies to address the immense global burden of mycobacterial disease.

Acknowledgments

We are grateful to P. Brodin, C. Demangel, T. Garnier, T. Stinear, G. Hewinson, and S. T. Cole for advice and encouragement. This work was supported by the Association Française Raoul Follereau, the European Union (QLK2-CT-200102018), the Ministère de la Recherche et Nouvelles Technologies (ACI Microbiologie), and the Institut Pasteur (PTR 35, PTR 110, and GPH 5).

References

Andries, K., P. Verhasselt, J. Guillemont, H. W. Gohlmann, J. M. Neefs, H. Winkler, J. Van Gestel, P. Timmerman, M. Zhu, E. Lee, P. Williams, D. de Chaffoy, E. Huitric, S. Hoffner, E. Cambau, C. Truffot-Pernot, N. Lounis, and V. Jarlier. 2005. A diarylquinoline drug active on the ATP synthase of Mycobacterium tuberculosis. Science 307:223–227.
Anes, E., M. P. Kuhnel, E. Bos, J. Moniz-Pereira, A. Habermann, and G. Griffiths. 2003. Selected lipids activate phagosome actin assembly and maturation resulting in killing of pathogenic mycobacteria. Nat Cell Biol 5:793–802.
Behr, M. A., M. A. Wilson, W. P. Gill, H. Salamon, G. K. Schoolnik, S. Rane, and P. M. Small. 1999. Comparative genomics of BCG vaccines by whole-genome DNA microarray. Science 284:1520–1523.
Berthet, F. X., M. Lagranderie, P. Gounon, C. Laurent-Winter, D. Ensergueix, P. Chavarot, F. Thouron, E. Maranghi, V. Pelicic, D. Portnoi, G. Marchal, and B. Gicquel. 1998a. Attenuation of virulence by disruption of the Mycobacterium tuberculosis erp gene. Science 282:759–762.
Berthet, F. X., P. B. Rasmussen, I. Rosenkrands, P. Andersen, and B. Gicquel. 1998b. A Mycobacterium tuberculosis operon encoding ESAT-6 and a novel low-molecular-mass culture filtrate protein (CFP-10). Microbiology 144:3195–3203.
Boshoff, H. I., M. B. Reed, C. E. Barry, 3rd, and V. Mizrahi. 2003. DnaE2 polymerase contributes to in vivo survival and the emergence of drug resistance in Mycobacterium tuberculosis. Cell 113:183–193.
Braunstein, M., S. S. Bardarov, and W. R. Jacobs, Jr. 2002. Genetic methods for deciphering virulence determinants of Mycobacterium tuberculosis. Methods Enzymol 358:67–99.
Braunstein, M., B. J. Espinosa, J. Chan, J. T. Belisle, and W. R. Jacobs, Jr. 2003. SecA2 functions in the secretion of superoxide dismutase A and in the virulence of Mycobacterium tuberculosis. Mol Microbiol 48:453–464.
Brennan, M. J., and G. Delogu. 2002. The PE multigene family: a ‘molecular mantra’ for mycobacteria. Trends Microbiol 10:246–249.
Brodin, P., K. Eiglmeier, M. Marmiesse, A. Billault, T. Garnier, S. Niemann, S. T. Cole, and R. Brosch. 2002. Bacterial artificial chromosome-based comparative genomic analysis identifies Mycobacterium microti as a natural ESAT-6 deletion mutant. Infect Immun 70:5568–5578.
Brodin, P., L. Majlessi, R. Brosch, D. Smith, G. Bancroft, S. Clark, A. Williams, C. Leclerc, and S. T. Cole. 2004. Enhanced protection against tuberculosis by vaccination with recombinant Mycobacterium microti vaccine that induces T cell immunity against region of difference 1 antigens. J Infect Dis 190:115–122.
Brosch, R., S. V. Gordon, M. Marmiesse, P. Brodin, C. Buchrieser, K. Eiglmeier, T. Garnier, C. Gutierrez, G. Hewinson, K. Kremer, L. M. Parsons, A. S. Pym, S. Samper, D. van Soolingen, and S. T. Cole. 2002. A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl Acad Sci U S A 99:3684–3689.
Brosch, R., A. S. Pym, S. V. Gordon, and S. T. Cole. 2001. The evolution of mycobacterial pathogenicity: clues from comparative genomics. Trends Microbiol 9:452–458.
Buchmeier, N., A. Blanc-Potard, S. Ehrt, D. Piddington, L. Riley, and E. A. Groisman. 2000. A parallel intraphagosomal survival strategy shared by Mycobacterium tuberculosis and Salmonella enterica. Mol Microbiol 35:1375–1382.
Camacho, L. R., D. Ensergueix, E. Perez, B. Gicquel, and C. Guilhot. 1999. Identification of a virulence gene cluster of Mycobacterium tuberculosis by signature-tagged transposon mutagenesis. Mol Microbiol 34:257–267.
Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry, 3rd, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, B. G. Barrell, et al. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537–544.
Cole, S. T., K. Eiglmeier, J. Parkhill, K. D. James, N. R. Thomson, P. R. Wheeler, N. Honore, T. Garnier, C. Churcher, D. Harris, K. Mungall, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. M. Davies, K. Devlin, S. Duthoy, T. Feltwell, A. Fraser, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, C. Lacroix, J. Maclean, S. Moule, L. Murphy, K. Oliver, M. A. Quail, M. A. Rajandream, K. M. Rutherford, S. Rutter, K. Seeger, S. Simon, M. Simmonds, J. Skelton, R. Squares, S. Squares, K. Stevens, K. Taylor, S. Whitehead, J. R. Woodward, and B. G. Barrell. 2001. Massive gene decay in the leprosy bacillus. Nature 409:1007–1011.
Constant, P., E. Perez, W. Malaga, M. A. Laneelle, O. Saurel, M. Daffe, and C. Guilhot. 2002. Role of the pks15/1 gene in the biosynthesis of phenolglycolipids in the Mycobacterium tuberculosis complex. Evidence that all strains synthesize glycosylated p-hydroxybenzoic methyl esters and that strains devoid of phenolglycolipids harbor a frameshift mutation in the pks15/1 gene. J Biol Chem 277:38148–38158.
Cox, J. S., B. Chen, M. McNeil, and W. R. Jacobs, Jr. 1999. Complex lipid determines tissue-specific replication of Mycobacterium tuberculosis in mice. Nature 402:79–83.
Dahl, J. L., C. N. Kraus, H. I. Boshoff, B. Doan, K. Foley, D. Avarbock, G. Kaplan, V. Mizrahi, H. Rubin, and C. E. Barry, 3rd. 2003. The role of RelMtb-mediated adaptation to stationary phase in long-term persistence of Mycobacterium tuberculosis in mice. Proc Natl Acad Sci U S A 100:10026–10031.
Dye, C., S. Scheele, P. Dolin, V. Pathania, and M. C. Raviglione. 1999. Consensus statement. Global burden of tuberculosis: estimated incidence, prevalence, and mortality by country. WHO Global Surveillance and Monitoring Project. JAMA 282:677–686.
Fleischmann, R. D., D. Alland, J. A. Eisen, L. Carpenter, O. White, J. Peterson, R. DeBoy, R. Dodson, M. Gwinn, D. Haft, E. Hickey, J. F. Kolonay, W. C. Nelson, L. A. Umayam, M. Ermolaeva, S. L. Salzberg, A. Delcher, T. Utterback, J. Weidman, H. Khouri, J. Gill, A. Mikula, W. Bishai, W. R. Jacobs, Jr., J. C. Venter, and C. M. Fraser. 2002. Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol 184:5479–5490.
Garnier, T., K. Eiglmeier, J. C. Camus, N. Medina, H. Mansoor, M. Pryor, S. Duthoy, S. Grondin, C. Lacroix, C. Monsempe, S. Simon, B. Harris, R. Atkin, J. Doggett, R. Mayes, L. Keating, P. R. Wheeler, J. Parkhill, B. G. Barrell, S. T. Cole, S. V. Gordon, and R. G. Hewinson. 2003. The complete genome sequence of Mycobacterium bovis. Proc Natl Acad Sci U S A 100:7877–7882.
George, K. M., D. Chatterjee, G. Gunawardana, D. Welty, T. Lee, and P. L. Small. 1999. Mycolactone: a polyketide toxin from Mycobacterium ulcerans required for virulence. Science 283:854–857.

Glickman, M. S., J. S. Cox, and W. R. Jacobs, Jr. 2000. A novel mycolic acid cyclopropane synthetase is required for cording, persistence, and virulence of Mycobacterium tuberculosis. Mol Cell 5:717–727.
Gordon, S. V., R. Brosch, A. Billault, T. Garnier, K. Eiglmeier, and S. T. Cole. 1999. Identification of variable regions in the genomes of tubercle bacilli using bacterial artificial chromosome arrays. Mol Microbiol 32:643–655.
Goren, M. B., P. D’Arcy Hart, M. R. Young, and J. A. Armstrong. 1976. Prevention of phagosome–lysosome fusion in cultured macrophages by sulfatides of Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 73:2510–2514.
Gutacker, M. M., J. C. Smoot, C. A. Migliaccio, S. M. Ricklefs, S. Hua, D. V. Cousins, E. A. Graviss, E. Shashkina, B. N. Kreiswirth, and J. M. Musser. 2002. Genome-wide analysis of synonymous single nucleotide polymorphisms in Mycobacterium tuberculosis complex organisms: resolution of genetic relationships among closely related microbial strains. Genetics 162:1533–1543.
Gutierrez, M. C., S. Brisse, R. Brosch, M. Fabre, B. Omais, M. Marmiesse, P. Supply, and V. Vincent. 2005. Ancient origin and gene mosaicism of the progenitor of Mycobacterium tuberculosis. PLoS Pathog 1:e5.
Guinn, K. I., M. J. Hickey, S. K. Mathur, K. L. Zakel, J. E. Grotzke, D. M. Lewinsohn, S. Smith, and D. R. Sherman. 2004. Individual RD1-region genes are required for export of ESAT-6/CFP-10 and for virulence of Mycobacterium tuberculosis. Mol Microbiol 51:359–370.
Harboe, M., T. Oettinger, H. G. Wiker, I. Rosenkrands, and P. Andersen. 1996. Evidence for occurrence of the ESAT-6 protein in Mycobacterium tuberculosis and virulent Mycobacterium bovis and for its absence in Mycobacterium bovis BCG. Infect Immun 64:16–22.
Hart, P. D., and I. Sutherland. 1977. BCG and vole bacillus vaccines in the prevention of tuberculosis in adolescence and early adult life. Br Med J 2:293–295.
Hingley-Wilson, S. M., V. K. Sambandamurthy, and W. R. Jacobs, Jr. 2003. Survival perspectives from the world’s most successful pathogen, Mycobacterium tuberculosis. Nat Immunol 4:949–955.
Hondalus, M. K., S. Bardarov, R. Russell, J. Chan, W. R. Jacobs, Jr., and B. R. Bloom. 2000. Attenuation of and protection induced by a leucine auxotroph of Mycobacterium tuberculosis. Infect Immun 68:2888–2898.
Hsu, T., S. M. Hingley-Wilson, B. Chen, M. Chen, A. Z. Dai, P. M. Morin, C. B. Marks, J. Padiyar, C. Goulding, M. Gingery, D. Eisenberg, R. G. Russell, S. C. Derrick, F. M. Collins, S. L. Morris, C. H. King, and W. R. Jacobs, Jr. 2003. The primary mechanism of attenuation of bacillus Calmette-Guerin is a loss of secreted lytic function required for invasion of lung interstitial tissue. Proc Natl Acad Sci U S A 100:12420–12425.
Imaeda, T. 1985. Deoxyribonucleic acid relatedness among selected strains of Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium bovis BCG, Mycobacterium microti and Mycobacterium africanum. Int J Syst Bacteriol 35:147–150.
Jackson, M., S. W. Phalen, M. Lagranderie, D. Ensergueix, P. Chavarot, G. Marchal, D. N. McMurray, B. Gicquel, and C. Guilhot. 1999. Persistence and protective efficacy of a Mycobacterium tuberculosis auxotroph vaccine. Infect Immun 67:2867–2873.
Johnson, P. D., T. Stinear, P. L. Small, G. Pluschke, R. W. Merritt, F. Portaels, K. Huygen, J. A. Hayman, and K. Asiedu. 2005. Buruli ulcer (M. ulcerans infection): new insights, new hope for disease control. PLoS Med 2:e108.
Kaufmann, S. H. 2001. How can immunology contribute to the control of tuberculosis? Nat Rev Immunol 1:20–30.
Kaushal, D., B. G. Schroeder, S. Tyagi, T. Yoshimatsu, C. Scott, C. Ko, L. Carpenter, J. Mehrotra, Y. C. Manabe, R. D. Fleischmann, and W. R. Bishai. 2002. Reduced immunopathology and mortality despite tissue persistence in a Mycobacterium tuberculosis mutant lacking alternative sigma factor, SigH. Proc Natl Acad Sci U S A 99:8330–8335.
Keating, L. A., P. R. Wheeler, H. Mansoor, J. K. Inwald, J. Dale, R. G. Hewinson, and S. V. Gordon. 2005. The pyruvate requirement of some members of the Mycobacterium tuberculosis complex is due to an inactive pyruvate kinase: implications for in vivo growth. Mol Microbiol 56:163–174.
Lewis, K. N., R. Liao, K. M. Guinn, M. J. Hickey, S. Smith, M. A. Behr, and D. R. Sherman. 2003. Deletion of RD1 from Mycobacterium tuberculosis mimics bacille Calmette-Guerin attenuation. J Infect Dis 187:117–123.
Lopez, B., D. Aguilar, H. Orozco, M. Burger, C. Espitia, V. Ritacco, L. Barrera, K. Kremer, R. Hernandez-Pando, K. Huygen, and D. van Soolingen. 2003. A marked difference in pathogenesis and immune response induced by different Mycobacterium tuberculosis genotypes. Clin Exp Immunol 133:30–37.
Mahairas, G. G., P. J. Sabo, M. J. Hickey, D. C. Singh, and C. K. Stover. 1996. Molecular analysis of genetic differences between Mycobacterium bovis BCG and virulent M. bovis. J Bacteriol 178:1274–1282.
Manca, C., L. Tsenova, A. Bergtold, S. Freeman, M. Tovey, J. M. Musser, C. E. Barry, 3rd, V. H. Freedman, and G. Kaplan. 2001. Virulence of a Mycobacterium tuberculosis clinical isolate in mice is determined by failure to induce Th1 type immunity and is associated with induction of IFN-alpha/beta. Proc Natl Acad Sci U S A 98:5752–5757.
Marmiesse, M., P. Brodin, C. Buchrieser, C. Gutierrez, N. Simoes, V. Vincent, P. Glaser, S. T. Cole, and R. Brosch. 2004. Macro-array and bioinformatic analyses reveal mycobacterial ‘core’ genes, variation in the ESAT-6 gene family and new phylogenetic markers for the Mycobacterium tuberculosis complex. Microbiology 150:483–496.
Marsollier, L., R. Robert, J. Aubry, J. S. Andre, H. Kouakou, P. Legras, A. L. Manceau, C. Mahaza, and B. Carbonnelle. 2002. Aquatic insects as a vector for Mycobacterium ulcerans. Appl Environ Microbiol 68:4623–4628.
Marsollier, L., T. Severin, J. Aubry, R. W. Merritt, J. P. Saint-Andre, P. Legras, A. L. Manceau, A. Chauty, B. Carbonnelle, and S. T. Cole. 2004. Aquatic snails, passive hosts of Mycobacterium ulcerans. Appl Environ Microbiol 70:6296–6298.
McKinney, J. D., K. Honer zu Bentrup, E. J. Munoz-Elias, A. Miczak, B. Chen, W. T. Chan, D. Swenson, J. C. Sacchettini, W. R. Jacobs, Jr., and D. G. Russell. 2000. Persistence of Mycobacterium tuberculosis in macrophages and mice requires the glyoxylate shunt enzyme isocitrate lyase. Nature 406:735–738.
Monot, M., N. Honore, T. Garnier, R. Araoz, J. Y. Coppee, C. Lacroix, S. Sow, J. S. Spencer, R. W. Truman, D. L. Williams, R. Gelber, M. Virmond, B. Flageul, S. N. Cho, B. Ji, A. Paniz-Mondolfi, J. Convit, S. Young, P. E. Fine, V. Rasolofo, P. J. Brennan, and S. T. Cole. 2005. On the origin of leprosy. Science 308:1040–1042.
Mostowy, S., D. Cousins, J. Brinkman, A. Aranaz, and M. A. Behr. 2002. Genomic deletions suggest a phylogeny for the Mycobacterium tuberculosis complex. J Infect Dis 186:74–80.
Pavelka, M. S., Jr., B. Chen, C. L. Kelley, F. M. Collins, and W. R. Jacobs, Jr. 2003. Vaccine efficacy of a lysine auxotroph of Mycobacterium tuberculosis. Infect Immun 71:4190–4192.
Pelicic, V., M. Jackson, J. M. Reyrat, W. R. Jacobs, Jr., B. Gicquel, and C. Guilhot. 1997. Efficient allelic exchange and transposon mutagenesis in Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 94:10955–10960.
Perez, E., P. Constant, A. Lemassu, F. Laval, M. Daffe, and C. Guilhot. 2004. Characterization of three glycosyltransferases involved in the biosynthesis of the phenolic glycolipid antigens from the Mycobacterium tuberculosis complex. J Biol Chem 279:42574–42583.
Perez, E., S. Samper, Y. Bordas, C. Guilhot, B. Gicquel, and C. Martin. 2001. An essential role for phoP in Mycobacterium tuberculosis virulence. Mol Microbiol 41:179–187.
Pethe, K., S. Alonso, F. Biet, G. Delogu, M. J. Brennan, C. Locht, and F. D. Menozzi. 2001. The heparin-binding haemagglutinin of M. tuberculosis is required for extrapulmonary dissemination. Nature 412:190–194.

Pym, A. S., P. Brodin, R. Brosch, M. Huerre, and S. T. Cole. 2002. Loss of RD1 contributed to the attenuation of the live tuberculosis vaccines Mycobacterium bovis BCG and Mycobacterium microti. Mol Microbiol 46:709–717.
Pym, A. S., P. Brodin, L. Majlessi, R. Brosch, C. Demangel, A. Williams, K. E. Griffiths, G. Marchal, C. Leclerc, and S. T. Cole. 2003. Recombinant BCG exporting ESAT-6 confers enhanced protection against tuberculosis. Nat Med 9:533–539.
Raynaud, C., C. Guilhot, J. Rauzier, Y. Bordat, V. Pelicic, R. Manganelli, I. Smith, B. Gicquel, and M. Jackson. 2002. Phospholipases C are involved in the virulence of Mycobacterium tuberculosis. Mol Microbiol 45:203–217.
Reed, M. B., P. Domenech, C. Manca, H. Su, A. K. Barczak, B. N. Kreiswirth, G. Kaplan, and C. E. Barry, 3rd. 2004. A glycolipid of hypervirulent tuberculosis strains that inhibits the innate immune response. Nature 431:84–87.
Russell, D. G. 2001. Mycobacterium tuberculosis: here today, and here tomorrow. Nat Rev Mol Cell Biol 2:569–577.
Salamon, H., M. Kato-Maeda, P. M. Small, J. Drenkow, and T. R. Gingeras. 2000. Detection of deleted genomic DNA using a semiautomated computational analysis of GeneChip data. Genome Res 10:2044–2054.
Sambandamurthy, V. K., X. Wang, B. Chen, R. G. Russell, S. Derrick, F. M. Collins, S. L. Morris, and W. R. Jacobs, Jr. 2002. A pantothenate auxotroph of Mycobacterium tuberculosis is highly attenuated and protects mice against tuberculosis. Nat Med 8:1171–1174.
Sassetti, C. M., D. H. Boyd, and E. J. Rubin. 2001. Comprehensive identification of conditionally essential genes in mycobacteria. Proc Natl Acad Sci U S A 98:12712–12717.
Sassetti, C. M., D. H. Boyd, and E. J. Rubin. 2003. Genes required for mycobacterial growth defined by high density mutagenesis. Mol Microbiol 48:77–84.
Sassetti, C. M., and E. J. Rubin. 2003. Genetic requirements for mycobacterial survival during infection. Proc Natl Acad Sci U S A 100:12989–12994.

Smith, C. V., and J. C. Sacchettini. 2003. Mycobacterium tuberculosis: a model system for structural genomics. Curr Opin Struct Biol 13:658–664.
Smith, D. A., T. Parish, N. G. Stoker, and G. J. Bancroft. 2001. Characterization of auxotrophic mutants of Mycobacterium tuberculosis and their potential as vaccine candidates. Infect Immun 69:1142–1150.
Sorensen, A. L., S. Nagai, G. Houen, P. Andersen, and A. B. Andersen. 1995. Purification and characterization of a low-molecular-mass T-cell antigen secreted by Mycobacterium tuberculosis. Infect Immun 63:1710–1717.
Sreevatsan, S., X. Pan, K. E. Stockbauer, N. D. Connell, B. N. Kreiswirth, T. S. Whittam, and J. M. Musser. 1997. Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A 94:9869–9874.
Stanley, S. A., S. Raghavan, W. W. Hwang, and J. S. Cox. 2003. Acute infection and macrophage subversion by Mycobacterium tuberculosis require a specialized secretion system. Proc Natl Acad Sci U S A 100:13001–13006.
Stead, W. W., K. D. Eisenach, M. D. Cave, M. L. Beggs, G. L. Templeton, C. O. Thoen, and J. H. Bates. 1995. When did Mycobacterium tuberculosis infection first occur in the New World? An important question with public health implications. Am J Respir Crit Care Med 151:1267–1268.
Stermann, M., L. Sedlacek, S. Maass, and F. C. Bange. 2004. A promoter mutation causes differential nitrate reductase activity of Mycobacterium tuberculosis and Mycobacterium bovis. J Bacteriol 186:2856–2861.
Stewart, G. R., V. A. Snewin, G. Walzl, T. Hussell, P. Tormay, P. O’Gaora, M. Goyal, J. Betts, I. N. Brown, and D. B. Young. 2001. Overexpression of heat-shock proteins reduces survival of Mycobacterium tuberculosis in the chronic phase of infection. Nat Med 7:732–737.
Steyn, A. J., D. M. Collins, M. K. Hondalus, W. R. Jacobs, Jr., R. P. Kawakami, and B. R. Bloom. 2002. Mycobacterium tuberculosis WhiB3 interacts with RpoV to affect host survival but is dispensable for in vivo growth. Proc Natl Acad Sci U S A 99:3147–3152.
Stinear, T. P., A. Mve-Obiang, P. L. C. Small, W. Frigui, M. J. Pryor, R. Brosch, G. A. Jenkin, P. D. Johnson, J. K. Davies, R. E. Lee, S. Adusumilli, T. Garnier, S. F. Haydock, P. F. Leadlay, and S. T. Cole. 2004. Giant plasmid-encoded polyketide synthases produce the macrolide toxin of Mycobacterium ulcerans. Proc Natl Acad Sci U S A 101:1345–1349.
Stinear, T. P., H. Hong, W. Frigui, M. J. Pryor, R. Brosch, T. Garnier, P. F. Leadlay, and S. T. Cole. 2005. Common evolutionary origin for the unstable virulence plasmid pMUM found in geographically diverse strains of Mycobacterium ulcerans. J Bacteriol 187:1668–1676.
Sula, L., and I. Radkovsky. 1976. Protective effects of M. microti vaccine against tuberculosis. J Hyg Epidemiol Microbiol Immunol 20:1–6.
Trivedi, O. A., P. Arora, V. Sridharan, R. Tickoo, D. Mohanty, and R. S. Gokhale. 2004. Enzymic activation and transfer of fatty acids as acyl-adenylates in mycobacteria. Nature 428:441–445.
Trivedi, O. A., P. Arora, A. Vats, M. Z. Ansari, R. Tickoo, V. Sridharan, D. Mohanty, and R. S. Gokhale. 2005. Dissecting the mechanism and assembly of a complex virulence mycobacterial lipid. Mol Cell 17:631–643.
Tullius, M. V., G. Harth, and M. A. Horwitz. 2003. Glutamine synthetase GlnA1 is essential for growth of Mycobacterium tuberculosis in human THP-1 macrophages and guinea pigs. Infect Immun 71:3927–3936.
van Soolingen, D., T. Hoogenboezem, P. E. de Haas, P. W. Hermans, M. A. Koedam, K. S. Teppema, P. J. Brennan, G. S. Besra, F. Portaels, J. Top, L. M. Schouls, and J. D. van Embden. 1997. A novel pathogenic taxon of the Mycobacterium tuberculosis complex, Canetti: characterization of an exceptional isolate from Africa. Int J Syst Bacteriol 47:1236–1245.
Vergne, I., J. Chua, H. H. Lee, M. Lucas, J. Belisle, and V. Deretic. 2005. Mechanism of phagolysosome biogenesis block by viable Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 102:4033–4038.
Walburger, A., A. Koul, G. Ferrari, L. Nguyen, C. Prescianotto-Baschong, K. Huygen, B. Klebl, C. Thompson, G. Bacher, and J. Pieters. 2004. Protein kinase G from pathogenic mycobacteria promotes survival within macrophages. Science 304:1800–1804.
Wayne, L. G., and L. G. Hayes. 1996. An in vitro model for sequential study of shiftdown of Mycobacterium tuberculosis through two stages of nonreplicating persistence. Infect Immun 64:2062–2069.
Weber, I., C. Fritz, S. Ruttkowski, A. Kreft, and F. C. Bange. 2000. Anaerobic nitrate reductase (narGHJI) activity of Mycobacterium bovis BCG in vitro and its contribution to virulence in immunodeficient mice. Mol Microbiol 35:1017–1025.

11 Genomes of Pathogenic Neisseria Species

Christoph Schoen, Heike Claus, Ulrich Vogel, and Matthias Frosch

11.1 Introduction

The Neisseriae are gram-negative cocci that usually occur in pairs. Neisseria gonorrhoeae (the gonococcus) and N. meningitidis (the meningococcus) are pathogenic to humans, whereas other Neisseria species such as N. lactamica are commensal inhabitants of the human respiratory tract and rarely if ever cause invasive disease. Their only habitat is humans, with no other known reservoirs. Previous DNA hybridization studies suggested that N. meningitidis and N. gonorrhoeae share up to 90% DNA homology [1], and it has been estimated that meningococci and gonococci may have separated into different species quite recently, probably after the large-scale urbanization of their human hosts within the last 10 000 years [2].

The gonococcus is an obligate human pathogen and transmission generally occurs through direct sexual contact. Infections typically occur on the mucosal epithelia of the male urethra or the female uterine cervix but can also affect the rectum, the throat, and the conjunctiva of the eye [3]. Very rarely, gonococci disseminate and cause a dermatitis–arthritis syndrome, endocarditis, or meningitis. Gonococci may also ascend from the endocervical canal and cause pelvic inflammatory disease, resulting in an increased risk of infertility and ectopic pregnancy [3]. Newborns exposed to infected secretions in the birth canal may develop ocular infections which can have serious consequences, such as corneal scarring or perforation. Gonorrhea occurs worldwide, with its highest incidence in developing countries [4]. The development of an effective vaccine has been hampered by the lack of a suitable animal model [5] and the fact that an effective immune response has never been demonstrated [6].

Meningococci are facultative commensals and colonize the nasopharynx of about 10% of healthy individuals [7]. In contrast to gonococci, meningococci spread directly from person to person by airborne transmission. Meningococci can traverse the mucosal barrier and enter the bloodstream, causing septicemia, and can also cross the blood–brain barrier, resulting in fulminant meningitis [8]. In developing countries, epidemics of meningococcal infection are major causes of morbidity and mortality. In many European and American countries infection with N. meningitidis leads to more death and disability among infants than any other microbial infection [9].

Meningococci are divided into 12 serogroups based on the chemical composition and the immunological characteristics of their capsular polysaccharide. The most important serogroups associated with disease in humans are A, B, C, W-135, and Y. While serogroups B and C account for sporadic cases and numerous localized outbreaks worldwide and cause the majority of cases in industrialized countries, serogroup A meningococci are the main pathogens involved in major epidemics in China, the Middle East, South America, and, especially, sub-Saharan Africa [10]. Vaccines against N. meningitidis serogroups A, C, W-135, and Y have been developed based on capsular polysaccharides, but no effective vaccine is yet available against the group B meningococci, which are responsible for most meningococcal disease in the USA and Europe (see Chapter 24).

In this chapter we will focus on the results of both in silico and experimental analyses of the neisserial genomes that have been fully sequenced so far and are publicly available. Attention will also be paid to newly identified virulence gene candidates of N. meningitidis and to genomic comparison of the pathogenic species.

11.2 Genomes of Pathogenic Neisseria Species

So far, the genome sequences of one N. gonorrhoeae strain [11] and three N. meningitidis strains [12–14] have been fully determined by the random shotgun sequencing strategy and made publicly available. However, only the annotations of the genome sequences of N. meningitidis serogroup A strain Z2491 [12] and N. meningitidis serogroup B strain MC58 [13] have been published. That of N. gonorrhoeae strain FA1090 is currently accessible via the homepage of the Los Alamos National Laboratory Bioscience Division (http://www.stdgen.lanl.gov/stdgen/bacteria/ngon/) [11]. Likewise, only a preliminary annotation of N. meningitidis serogroup C strain FAM18 is currently available from the Wellcome Trust Sanger Institute homepage (accessible via http://www.sanger.ac.uk/Projects/N_meningitidis/seroC/seroC.shtml) [14]. Among the nonpathogenic Neisseria species, shotgun sequencing of a N. lactamica genome (an ST-640 strain) at the Wellcome Trust Sanger Institute is complete, and finishing and gap closure of 38 contigs with a total size of 2.219 Mbp are in progress at the time of writing.

A summary of the overall features of the four completely sequenced and publicly available neisserial genomes is given in Table 11.1 and a multiple whole-genome alignment in Fig. 11.1. The three meningococcal genomes are all of quite similar size, around 2.22 Mbp, which is slightly larger than the 2.15 Mbp of N. gonorrhoeae strain FA1090 and about half the size of, e.g., the E. coli K12 genome with its 4.64 Mbp [15]. The average meningococcal open reading frame (ORF) length is 879 bp, compared to 827 bp in N. gonorrhoeae, and the coding density is between 83% and 84%. On the basis of the primary annotations [12, 13, 16], biological roles could be assigned to about 60% of all ORFs. Regarding those categories that are potentially important for the pathogenesis of neisserial diseases, 5–6% of all ORFs code for cell envelope proteins, 5–6% for proteins involved in transport and binding functions, and another 3% belong to the category of mobile and extrachromosomal element functions. Finally, each genome contains four copies of a 16S-23S-5S ribosomal RNA operon and between 55 and 59 tRNA genes (Table 11.1).

11.2.1 The Flexible Genome Pool

In contrast to meningococci, most clinical isolates of N. gonorrhoeae harbor a small (4.2 kbp) plasmid of unknown function [17] (Table 11.1). In some gonococcal strains, regions of this cryptic plasmid have also been found integrated into the gonococcal genome [18]. Another two plasmids frequently encountered in gonococci contain genes that code for β-lactamase [19] and can be cotransferred to N. meningitidis via conjugation in the presence of a large, 37-kbp conjugative plasmid [20, 21], but this has not yet been observed under in vivo conditions [22]. Although less frequently, N. meningitidis has also been found to harbor plasmids; most are cryptic, but some encode resistance to β-lactam antibiotics [23], sulfonamides [24], and tetracycline [25], which has raised some concern regarding effective antibiotic treatment of meningococcal meningitis [22]. Plasmids differentially distributed among clonal lineages of meningococci have also been demonstrated [26, 27]. However, in contrast to many gram-negative pathogens, no virulence plasmids have been detected in Neisseria species so far. In addition to plasmids, a computational analysis based on the published genome sequences [12, 13] revealed three partially defective mosaic relatives of the Mu-like group of prophages in the genome of N. meningitidis Z2491, called Pnm1, Pnm2, and Pnm3, respectively (Table 11.1 and Fig. 11.1). The MC58 genome contains similar prophages at the Pnm2 and Pnm3 integration sites, called NeisMu1 and NeisMu2, respectively, but has no prophage at the Pnm1 integration site [28, 29] (Fig. 11.1). In addition, a number of φCTX-like phage genes were found in both genomes, bringing the totals to at least three (MC58) and five (Z2491) probably defective prophages [30]. The finding that at least prophage NeisMu1 codes for membrane-associated antigenic proteins suggests that these proteins contribute to the variability in envelope structure and may thus influence virulence and pathogenicity [29].
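Acquired elements such as prophages often differ in base composition from the rest of the chromosome. As a rough illustration (not the procedure used in the analyses cited here), a sliding-window scan can flag stretches whose G+C fraction falls well below the sequence-wide mean. The window size, step, and margin below are arbitrary assumptions, and `gc_content` and `low_gc_windows` are hypothetical helper names.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide string."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def low_gc_windows(seq, window=1000, step=500, margin=0.05):
    """Return (start, end, gc) for windows whose G+C fraction lies more
    than `margin` below the mean G+C fraction of the whole sequence."""
    mean_gc = gc_content(seq)
    hits = []
    # Slide a fixed-size window along the sequence in increments of `step`.
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_content(seq[start:start + window])
        if gc < mean_gc - margin:
            hits.append((start, start + window, gc))
    return hits


if __name__ == "__main__":
    # Invented toy sequence: a GC-rich background with an AT-rich insert.
    toy = "GC" * 50 + "AT" * 20 + "GC" * 50
    for start, end, gc in low_gc_windows(toy, window=40, step=20):
        print(start, end, round(gc, 2))
```

Overlapping hits would normally be merged into a single candidate region before inspecting the genes it contains.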
In addition to prophages, larger entities of laterally transferred DNA are frequently found in the genomes of pathogenic bacteria. Amongst other criteria, these regions are generally characterized by atypical DNA composition [31]. Accordingly, base composition analysis of the Z2491 genome resulted in the identification of at least 60 coding regions (nearly 5% in total) with a significantly lower G+C content when compared to the mean G+C content, ranging in size from 224 bp to 11.3 kbp and averaging 1.8 kbp [12] (Table 11.1 and Fig. 11.1). One-third of these regions encode proteins that are likely to be located on the surface of the cell or are responsible for the production of surface structures like the serogroup A capsule cassette [12]. In the MC58 genome three such regions were identified and designated as islands of horizontally transferred DNA (IHTs) (Table 11.1 and Fig. 11.1) [13] coding, with the exception of IHT-A1, mostly for hypothetical proteins [13]. IHT-A1 contains the genes of the capsule gene cluster, which is an important virulence determinant in both serogroup A and serogroup B meningococci. Similarly, six putative genomic islands of about 2 kbp in size were identified in the N. gonorrhoeae genome [16] (Table 11.1 and Fig. 11.1). Four of these islands were found to encode putative adhesins, but most genes of these islands again code for proteins of unknown function. However, although these regions differ in their G+C content and codon usage, almost all of them are neither associated with tRNA loci, nor are they flanked by direct repeats, nor do they contain genes or pseudogenes coding for genetic mobility. Therefore, in striking contrast to what has been observed in Enterobacteriaceae, none of these putative islands seems to have the typical hallmarks of canonical pathogenicity islands [32].

Table 11.1 Basic features of Neisseria spp. genomes as given in the primary annotations [11–14] unless stated otherwise.

| Feature | N. gonorrhoeae | N. meningitidis serogroup A | N. meningitidis serogroup B | N. meningitidis serogroup C |
|---|---|---|---|---|
| Strain | FA1090 | Z2491 | MC58 | FAM18 |
| GenBank | AE004969 | AL157959 | AE002098 | NC_003221 |
| Genome size (bp) | 2 153 944 | 2 184 406 | 2 272 351 | 2 194 961 |
| G+C content (%) | 54.0 | 51.8 | 51.5 | 51.6 |
| Open reading frames | | | | |
| Predicted number | 2 185 | 2 121 | 2 158 | 2 058 |
| Average length (bp) | 827 | 877 | 874 | 885 |
| Coding area (%) | 83.9 | 82.9 | 83.0 | 83.0 |
| Number of tRNAs | 55 | 58 | 59 | 59 |
| Number of rRNA operons | 4 | 4 | 4 | 4 |
| Pseudogenes | n. a. | ≥ 56 | n. a. | n. a. |
| ORFs with assigned function (%) | 61.2 | 62.7 | 56.4 | n. a. |
| Putative phase-variable genes[a] | 83 | 68 | 82 | n. a. |
| Genes containing coding tandem repeats[b] | 27 | 25 | 26 | n. a. |
| Flexible genome pool | | | | |
| Plasmids (as identified in other strains) | Frequent, mostly cryptic | Less frequent, mostly cryptic | Less frequent, mostly cryptic | Less frequent, mostly cryptic |
| Resistance plasmids | Frequent β-lactamase encoding small plasmids; tetracycline resistance on large conjugative plasmid | Only few strains harbor plasmids encoding β-lactamase or sulfonamide resistance | Only few strains harbor plasmids encoding β-lactamase or sulfonamide resistance | Only few strains harbor plasmids encoding β-lactamase or sulfonamide resistance |
| Phages and phage-like elements[c] | n. a. | 5 (Pnm1, Pnm2, and Pnm3) | 3 (NeisMu1, NeisMu2) | ≥ 1 |
| Putative genomic islands | ID1–ID6 | > 60 regions | IHT-A, B, and C | n. a. |
| DNA uptake sequences | 1 965 | 1 892 | 1 910 | 1 888 |
| IS elements | IS1016, IS1016-related, IS150 | IS1016, IS1106, IS1655 | IS1016, IS1106, IS1655, IS4351-related | n. a. |
| Total number | n. a. | 43 | 51 | |
| Repeat elements | | | | |
| Correia and Correia-related repeats[d] | 102 | 270 | 261 | n. a. |
| Others | n. a. | ATR, dRS3, REP2-REP5, RS | n. a. | n. a. |

n. a., not available. [a] According to [101]. [b] According to [60]. [c] According to [30]. [d] According to [63].

Fig. 11.1 Multiple alignment of the four neisserial genomes so far fully sequenced using the program Mauve [100]. Locally collinear blocks of DNA are depicted in the same colors and connected via correspondingly colored lines. Equally colored blocks on different sides of black lines corresponding to the respective genome sequence indicate chromosomal inversions. The chromosomal locations of potential prophages [30], putative genetic islands [13], and some putative virulence genes are given for strain MC58. (This figure also appears with the color plates.)

11.2.2 Repetitive DNA Sequence Elements Govern Neisserial Biology

One of the most striking and unique characteristics of the annotated neisserial genomes [12, 13] is the abundance and diversity of repetitive DNA. The genomes are replete with an unprecedented number of genetic elements that contribute to both genome fluidity and physical variability, attesting to the adaptability of the Neisseriae and their potential to evade the human immune system.

11.2.2.1 DNA Uptake Sequences, Horizontal Gene Transfer, and Antigenic Diversity

An example of the abundance of repetitive DNA sequence elements is the neisserial DNA uptake sequence which is involved in the recognition and uptake of DNA from the environment [33]. There are nearly 2000 copies of the 10-bp uptake sequence in each genome, which either occurs alone or in inverted repeats as part of a transcriptional terminator [34]. During colonization of the nasopharynx the bacteria are exposed to DNA bearing the appropriate uptake sequence from lysed strains of the same [35, 36] or related species [37], and natural competence of the Neisseriae facilitates uptake of this cognate DNA. Thus, the high competence for natural transformation of Neisseria species together with DNA recombination results in mosaicism of many chromosomal genes [38, 39] as well as in an epidemic population structure [40]. In fact, genetic changes in N. meningitidis occur even more frequently through DNA recombination than by mutation [41]. Recombination is also an important genetic mechanism in the generation of new meningococcal clones and alleles [42] and particularly affects several bacterial structures involved in virulence and/or transmission. For example, it has been established that capsule switching from a serogroup B to a serogroup C capsule phenotype was the result of the substitution of the serogroup B polysialyltransferase with the serogroup C polysialyltransferase after transformation and horizontal transfer of a serogroup C capsule biosynthetic operon [43, 44]. However, the natural frequency of this process is still a matter of debate.
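The copy numbers quoted for the uptake sequence can be reproduced, in principle, by a simple motif count over both strands of a genome sequence. The following is a minimal sketch of such a count. The motif GCCGTCTGAA is the commonly cited 10-bp neisserial uptake sequence; it is not spelled out in the text above and is therefore an assumption here, as are all function names and the toy sequence.

```python
# Assumed 10-bp neisserial DNA uptake sequence (not given in the text above).
DUS = "GCCGTCTGAA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif in seq."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def count_dus(seq: str, motif: str = DUS) -> int:
    """Count the uptake sequence on both strands of seq."""
    seq = seq.upper()
    rc = reverse_complement(motif)
    total = count_motif(seq, motif)
    if rc != motif:  # avoid double counting a palindromic motif
        total += count_motif(seq, rc)
    return total


def dus_per_kbp(seq: str, motif: str = DUS) -> float:
    """Density of uptake sequences per 1000 bp of seq."""
    return 1000.0 * count_dus(seq, motif) / len(seq)


# Hypothetical toy sequence: one forward copy and one inverted copy,
# mimicking the inverted-repeat arrangement found in terminators.
toy = "ATAT" + DUS + "TTTT" + reverse_complement(DUS) + "ATAT"
print(count_dus(toy), round(dus_per_kbp(toy), 1))
```

Applied to the figures in Table 11.1, for example 1 965 copies in the 2 153 944-bp FA1090 genome, the same arithmetic gives a density of roughly 0.9 copies per kbp.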


Of note, there was a significantly higher density of DNA uptake sequences within genes involved in DNA repair, recombination, restriction modification, and replication than in any other annotated gene group in these organisms [45]. This finding might reflect facilitated recovery from DNA damage caused by an adverse environment such as oxygen or modification of the pH as occurs inside phagocytic vacuoles. Additionally, it was shown that induction of a gene implicated in DNA repair (xseB) during the initial step of adhesion increased the ability of the bacteria to repair their chromosome, suggesting that some of the DNA repair mechanisms could be controlled by the interaction of meningococci with host cells [46].

11.2.2.2 Simple Sequence Repeats and Phase Variation

The sequenced genomes of the pathogenic Neisseriae are also littered with a vast number of polymorphic simple sequence repeats, ranging from homopolymeric tracts to pentanucleotide repeats [47]. The length of the homopolymeric or heteropolymeric tracts, i.e., the number of tandemly repeated motifs, can be modified during replication due to slipped-strand mispairing, and can consequently influence translation or transcription [48–50]. In Neisseria species, phase variation is consistently associated with reversible changes within such simple DNA repeats. The repertoire of putative phase-variable genes was originally determined by the identification of simple DNA repeats in the MC58 genome sequence using criteria based upon the length of the repeat, its position within the gene, and knowledge of tracts in established phase-variable genes in Neisseria. This analysis identified 65 potentially phase-variable genes [51]. Currently, based on the MC58 genome sequence there is experimental evidence of phase variation for 14 genes encoding opacity proteins [52], lipopolysaccharide biosynthesis proteins [53], proteins involved in the biosynthesis and modification of pili [54, 55], hemoglobin receptors [56, 57], PorA outer membrane protein [58], Opc outer membrane protein [48], capsular polysaccharide biosynthesis proteins [49], and a putative adhesin (nadA) [59]. Further computational analysis of the remaining 51 putative phase-variable genes identified in the MC58 genome sequence predicted that 33 genes could be considered phase-variable and 18 nonvariable [59]. In a complementary approach, the results of independent analyses of the complete genome sequences of N. meningitidis strain Z2491 and N. gonorrhoeae strain FA1090 were combined with the previous analysis of N. meningitidis strain MC58 [51] and a comprehensive genomic comparison was carried out to determine the repertoires of potentially phase-variable genes [47].
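A repeat-based screen of this kind starts from a scan for simple sequence repeats. The sketch below illustrates one way such a scan might look for repeat units of 1 to 5 bp. The length thresholds (at least 7 bases for homopolymeric tracts, at least 4 copies for longer units) and all names are hypothetical choices for illustration and are not the criteria used in [51] or [47], which also took the position of the repeat within the gene into account.

```python
import re
from typing import List, NamedTuple


class Repeat(NamedTuple):
    start: int   # 0-based start position in the sequence
    unit: str    # repeated motif
    copies: int  # number of tandem copies


def find_simple_repeats(seq: str, max_unit: int = 5,
                        min_homopolymer: int = 7,
                        min_copies: int = 4) -> List[Repeat]:
    """Find tandem repeats with unit lengths from 1 to max_unit.

    Thresholds are illustrative assumptions, not published criteria.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        minimum = min_homopolymer if unit_len == 1 else min_copies
        # One captured unit followed by at least (minimum - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, minimum - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are repeats of a shorter motif, e.g. "GG".
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append(Repeat(match.start(), unit,
                               len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical toy sequence with a poly-G tract and a tetranucleotide repeat.
toy_seq = "ATC" + "G" * 8 + "TTA" + "CAAG" * 5 + "TGA"
for repeat in find_simple_repeats(toy_seq):
    print(repeat)
```

A real screen would additionally restrict hits to positions within or immediately upstream of annotated ORFs, which is what turns a list of repeats into a list of candidate phase-variable genes.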
This comparative whole-genome approach identified 68 phase-variable gene candidates in N. meningitidis strain Z2491, 83 candidates in N. gonorrhoeae strain FA1090, and 82 candidates in N. meningitidis strain MC58 [47, 51]. The number of potentially phase-variable genes in N. meningitidis is substantially greater than that for any other species studied to date [51]. In addition to the sheer number of genes identified, another noteworthy feature is the diversity of functions observed in these genes. While a significant number of genes (31 of 61 with known functions or homologies) fall within the functional categories traditionally associated with phase variation and virulence – e.g., surface proteins, lipopolysaccharide and sugar metabolism genes, toxins and restriction modification genes – a large number of the remainder appear to have other, as yet unknown functions.

An additional mechanism by which Neisseria species adapt to their niche is the variation in the number of the coding tandem repeats within protein coding genes [60]. Coding tandem repeats are those which do not alter the reading frame with copy number, and the changes in copy number of these repeats may then potentially alter the function or antigenicity of the protein encoded. A total of 28 genes were identified in the three complete neisserial genomes containing coding tandem repeats that had structural consequences for the encoded protein [60]. Fourteen genes encode predicted and known surface proteins with coding repeat copy number variation that might result in antigenic variation, including pilQ, several putative lipoproteins, the Lip/H.8 antigen, AniA, and a putative adhesin (NMB0586), amongst others. Remarkably, some of the genes identified encode proteins with cytoplasmic functions, including sugar metabolism, DNA repair, and protein production, in which repeat length variation may have other functions.

This large repertoire of genes potentially subject to phase variation results in a stochastically dynamic population of bacteria that maximizes the likelihood of successful establishment in new hosts. Individual neisserial colonies are therefore genetically and phenotypically heterogeneous as a result of the presence of these phase-variable genes.
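The distinction between repeats that switch a gene on or off and coding tandem repeats comes down to arithmetic on the repeat unit length: a copy-number change inside an ORF preserves the reading frame only if the number of inserted or deleted bases is a multiple of three. The following minimal sketch makes this explicit; the function names and labels are hypothetical.

```python
def frame_shift(unit_len: int, delta_copies: int) -> int:
    """Reading-frame offset (0, 1 or 2) after gaining or losing repeat copies.

    unit_len is the repeat unit length in bp; delta_copies is positive for
    a gain and negative for a loss of copies.
    """
    return (unit_len * delta_copies) % 3


def classify_repeat(unit_len: int) -> str:
    """Label a repeat inside an ORF by the effect of a one-copy change."""
    if frame_shift(unit_len, 1) == 0:
        return "coding tandem repeat (frame preserved)"
    return "potential on/off switch (frame shifted)"


for unit_len in (1, 2, 3, 4, 5, 6):
    print(unit_len, classify_repeat(unit_len))
```

Under this simplification a homopolymeric tract or a tetranucleotide repeat shifts the frame with every single-copy change and returns to the original frame after a net change of three copies, whereas a repeat whose unit length is a multiple of three only lengthens or shortens the encoded protein. Repeats located in promoter regions act on transcription instead and are not covered by this arithmetic.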

11.2.2.3 Insertion Sequences and the Regulation of Gene Expression

The genomes are also peppered by several insertion sequences (IS). Most of the insertion sequences vary in size from 0.5 kbp to 1.3 kbp and are members of at least four specific DNA families: IS1016, IS1106, IS1655, and IS4351 [12, 13]. For example, the genome of strain MC58 contains 22 intact and 29 remnant IS [13]. Other repetitive sequence elements in the N. meningitidis genome are concentrated within intergenic repeat arrays of 0.2–2.7 kbp. These repeat arrays are composed of several different repeat types including larger units such as so-called “Correia elements” (CEs). They have been described in both N. meningitidis and N. gonorrhoeae [61] and represent about 2% of the N. meningitidis genomes [62]. While the number of CEs is comparable in the N. meningitidis genomes, it is considerably reduced in the N. lactamica and N. gonorrhoeae genomes [63, 64], although estimates of the absolute numbers of CEs vary depending on the methodologies applied [12, 13, 63, 64]. However, the abundance of CEs and the percentage of nucleotides contained in these repetitive elements in the neisserial genomes are higher than those described for comparable intergenic repeats of other prokaryotes [63].

CEs are sequence indels comparable to small insertion sequences 100–155 bp in length, but in contrast to conventional IS elements do not encode a transposase [63–65]. They carry transcription initiation signals [66, 67] as well as functional integration host factor (IHF) binding sites [65, 68], and hence may play a role in modulating gene expression. For example, in gonococci IHF has been shown to bind proximal to the three pilE promoters and function as a transcriptional coactivator [69], whereas it had a negative impact on mtrC transcription in meningococci [68]. In addition, there is growing evidence that CEs may influence gene expression at the posttranscriptional level [64, 70]. The abundance of CEs in the different Neisseria genomes suggests that they may have played a major role in genome organization, function, and evolution. Their differential distribution in different Neisseria strains may also contribute to the distinct behaviors of each Neisseria species.

Another class of repetitive extragenic palindromic (REP) sequence called REP2 was present in 26 copies in the Z2491 genome [12]. Like CE, it was found to contain promoter as well as ribosome binding sites and to influence the expression of a set of genes such as pilC1 and crgA which are necessary for the efficient interaction of N. meningitidis with host cells [71].

Taken together, three mechanisms of repeat-mediated antigenic variation operate within the N. meningitidis genome: (1) on/off switching and transcriptional modulation of gene expression by slipped-strand mispairing of short tandem repeats or by reversible insertion of IS elements; (2) intragenomic recombination of localized repeats leading to the use of different carboxy termini for surface-exposed proteins; and (3) intergenomic gene conversion of specific surface-associated genes associated with large arrays of global repeats, mediated by the internalization of related DNA through the highly repetitive DNA uptake sequence. Phase and antigenic variation not only facilitate microbial evasion of immune responses, but the resulting polymorphisms influence all aspects of gonococcal and meningococcal biology.

11.2.3 Genome-Wide Mutational Analyses

On the basis of signature-tagged mutagenesis (STM) of a clinical isolate of N. meningitidis serogroup B, strain C311+ (ET5), 73 genes were identified that were essential for bacteremia in an infant rat model [72]. In addition to eight genes encoding already known virulence factors, 65 genes were identified that were not previously known to be involved in meningococcal pathogenesis. Amongst others, two regulatory genes (ntrY and hfq) affecting the transcription at many loci were identified, as were two putative surface-associated ATP-binding cassette transporters. Seven mutants had insertions in genes necessary for the integrity of the cell envelope. In addition to five genes encoding enzymes of the shikimate pathway, an additional six genes involved in amino acid biosynthesis were identified. Although the positions of attenuating mutations were widely distributed throughout the serogroup B meningococcus genome, two clusters could be identified. The first contained insertions in three genes responsible for iron transport proteins (NMB1728–1730) and the second cluster included insertions in four genes of hitherto unknown function (NMB1954–2006).


With a similar STM approach, four previously uncharacterized genes (NMB0065, NMB0352, NMB0638, and NMB2076) involved in polysialic acid capsule production were found which, when mutated, rendered a N. meningitidis serogroup C (8013) ST177 IST-18 complex strain serum-sensitive [73]. Due to the screening method used, 14 of the 18 genes identified were important for the synthesis of the polysialic acid capsule or the lipooligosaccharide, suggesting that these genes are likely to be the only meningococcal attributes necessary for serum resistance. However, since the authors only present the sequencing results for the first half of the library, corresponding to 2281 mutants, it may still be too early to draw definitive conclusions. Unfortunately, the fact that the two isolates used in the STM studies have not been used in other studies of meningococcal biology and are not well-characterized, representative variants of hyperinvasive lineages restricts the value of these results to some degree [74].

11.2.4 Comparative Genomics

Comparative genome analyses of close relatives are a promising way to yield insights into the sources of microbial genome variability with respect to gene content, gene order, and the existence and distribution of potential virulence genes. These differences have also been identified in Neisseria species on the basis of computational and experimental techniques such as subtractive hybridization and DNA array technology. Based on reciprocal best-hit BLASTP analyses [75] and the primary annotations as given in Refs. [12, 13, 16], 1406 ORFs are similar in all Neisseria species and may thus be an approximation to the genus-specific core genome (Fig. 11.2A). In particular, 313 of the gonococcal proteins belonging to the core proteome are more similar to the homologous N. meningitidis B MC58 proteins than to their homologues in N. meningitidis A Z2491, and 299 of these are in turn more similar to their counterparts in N. meningitidis A Z2491, according to the expect (E) values in pairwise BLASTP comparisons (Fig. 11.2B). The remaining 794 proteins have identical E values. Synteny plots (Fig. 11.2C, D) show that the gene order is well conserved over extended regions of the neisserial genomes, although there are also numerous large-scale rearrangements between the chromosomes such as translocations and inversions (Fig. 11.1). For example, relative to the MC58 chromosome there is a large inversion of almost 1 Mbp around the replication origin in the Z2491 genome (Fig. 11.1, Fig. 11.2D) [13], which is in agreement with results from experimental work on physical chromosomal maps of N. meningitidis [76, 77] and N. gonorrhoeae [78, 79]. Although some large regions of the Z2491 chromosomal sequence (5–40 kbp) were found to be absent from the gonococcus FA1090 and could therefore represent meningococcus-specific islands, a significant proportion of this DNA is also absent from N. meningitidis MC58.
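The reciprocal best-hit criterion used to approximate the core genome can be stated compactly: two proteins are paired only if each is the other's best-scoring hit in the opposite search direction. The sketch below shows this logic on precomputed hit lists. The data structure, the function names, and the toy locus identifiers are hypothetical; parsing of actual BLASTP output is deliberately left out.

```python
from typing import Dict, List, Tuple

# query ID -> list of (subject ID, E value) pairs from one search direction
Hits = Dict[str, List[Tuple[str, float]]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to the subject with the lowest E value.

    Ties are resolved in favor of the first subject listed.
    """
    return {query: min(subjects, key=lambda s: s[1])[0]
            for query, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> Dict[str, str]:
    """Return pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


# Hypothetical toy data: genome A proteins searched against genome B and back.
a_vs_b = {
    "a1": [("b1", 1e-120), ("b2", 1e-30)],
    "a2": [("b2", 1e-50)],
    "a3": [("b2", 1e-80)],
    "a4": [],
}
b_vs_a = {
    "b1": [("a1", 1e-118)],
    "b2": [("a3", 1e-80), ("a2", 1e-50)],
}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data, a2 finds b2 as its best hit, but b2 prefers a3, so only a1/b1 and a3/b2 are reported. For three genomes, a protein would be counted as part of the core set only if it has reciprocal best hits in both of the other genomes.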
Fig. 11.2 Proteome comparison of three neisserial genomes based on the primary annotation and bidirectional best hit BLASTP analysis [75]. (A) Venn diagram of the encoded proteomes. (B) Comparison of the core proteome. Each dot represents a single N. gonorrhoeae FA1090 protein, plotted by its BLASTP E value to the most similar protein from each of the two N. meningitidis strains depicted on the x and y axes. Symmetrical hits are positioned along a diagonal line. Panels (C) and (D) display synteny plots to compare the order of homologous genes between two annotated genomes as given.

In addition to in silico comparisons, experimental techniques such as DNA arrays [80] or representational difference analysis [27, 81–84] have also been used to analyze the genetic differences between various strains of the pathogenic Neisseria species and apathogenic N. lactamica in more detail. Most of the genes that were shared by all virulent strains of N. meningitidis (78% of the chromosome; 1.7 Mbp) were also present in all isolates of the three species N. meningitidis, N. gonorrhoeae, and N. lactamica and thus may correspond to the neisserial core genome [80]. Of the rest, 46 kbp were found to be strictly meningococcus-specific, that is, present in all strains of invasive meningococci and absent from all the N. gonorrhoeae and N. lactamica strains. Seventy-three kilobasepairs of the Z2491 genome are pathogen-specific, i.e., shared with the gonococcus and absent from N. lactamica. Twelve kilobasepairs are shared with N. lactamica but absent from the gonococcus. The analysis of the genes corresponding to these sequences reveals that they correspond in general to single genes or small groups of genes scattered around the genome, since only one genetic island more than 10 kbp in length could thus be detected (NMA0687–NMA0698) [80].
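The categories used in this comparison amount to a partition of sequences conserved in all meningococci according to their presence in the other two species. The following minimal sketch shows that classification and a tally of sequence length per category. The category labels follow the wording of the text; the function names and the toy region list are hypothetical and do not reproduce the array data of [80].

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_region(in_gonococcus: bool, in_lactamica: bool) -> str:
    """Classify a sequence present in all meningococci by its presence
    in N. gonorrhoeae and N. lactamica."""
    if in_gonococcus and in_lactamica:
        return "core"
    if in_gonococcus:
        return "pathogen-specific"
    if in_lactamica:
        return "shared with N. lactamica only"
    return "meningococcus-specific"


def tally_kbp(regions: Iterable[Tuple[float, bool, bool]]) -> Dict[str, float]:
    """Sum region lengths (kbp) per category."""
    totals: Dict[str, float] = defaultdict(float)
    for length_kbp, in_ng, in_nl in regions:
        totals[classify_region(in_ng, in_nl)] += length_kbp
    return dict(totals)


# Hypothetical regions: (length in kbp, in N. gonorrhoeae, in N. lactamica)
toy_regions = [
    (10.0, True, True),
    (2.5, False, False),
    (4.0, True, False),
    (1.5, False, True),
    (3.0, False, False),
]
print(tally_kbp(toy_regions))
```

With the figures quoted above, the three non-core categories sum to 46 + 73 + 12 = 131 kbp of sequence conserved among the meningococci tested.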


The majority of the experimentally identified genetic differences between N. meningitidis A Z2491 and N. gonorrhoeae FA1090 might therefore encode functions specific to bacteremia and invasion of the meninges. They are clustered in nine distinct regions on the Z2491 genome ranging in size from 1.8 kbp to 40 kbp [82, 83]. These regions together comprise roughly 5% of the chromosome of N. meningitidis and contain 84 ORFs, of which 43 are homologous to ORFs already described in other species [83]. Based on the distribution of DNA uptake sequences, gene content, and sequence homology, it has been speculated that N. meningitidis-specific regions may have arisen by import from other species via recombination in the homologous flanking DNA [83], although there were no repeat structures similar to those flanking pathogenicity islands in Enterobacteriaceae [32]. In addition, not all regions were present in a set of 13 different N. meningitidis strains additionally tested, and some were also present in nonpathogenic N. lactamica. In particular, region 1 contains the capsular polysaccharide biosynthesis genes; region 2, which is highly conserved among the strains compared, contains two genes with homology to genes in Bordetella pertussis encoding the filamentous hemagglutinin precursor FhaB and the accessory protein FhaC involved in the secretion of FhaA. Region 3 contains the gpxA gene encoding glutathione peroxidase and the 39.3-kbp prophage Pnm1. Region 4, which is absent from a large proportion of other N. meningitidis strains, contains two genes of unknown function, and region 5, which is also present in N. lactamica, encodes a restriction/modification system. Region 6 encodes a pseudogene and is deleted in most of the strains. 
With the exception of strain Z2491, it was found that region 7 contained functional copies of genes with homologies to genes of the type I secretion apparatus in other bacteria (hlyD and tolC), suggesting that this region might be involved in virulence in some of the N. meningitidis strains [83]. Region 8 contains the two genes fhuA and dsbA that were present in all strains, although fhuA is a pseudogene in some strains. fhuA is homologous to an Escherichia coli ferrichrome-iron receptor gene that is involved in the uptake of siderophore-bound iron, and dsbA is similar to genes in E. coli that encode a disulfide oxidoreductase involved in the correct folding of secreted proteins. Finally, region 9 contains five ORFs of unknown function. However, neither resistance to complement killing nor adherence to or invasion of human umbilical vein endothelial cells (HUVEC) were affected when region 2, 3, 7, 8, or 9 was deleted from the test strains [83], and only deletion mutants lacking region 8 showed reduced survival in the bloodstream in an infant rat model. Thus, this region, together with region 1 encoding well-known virulence factors, may be necessary for dissemination in the bloodstream and thus for causing meningitis [83]. On a population scale, both restriction modification systems [27, 84] and plasmids [26, 85] were found to be differentially distributed among clonal groupings of meningococci, possibly contributing to the genetic isolation of hypervirulent lineages. Remarkably, none of the pathogen-specific regions experimentally identified have the characteristics typical of pathogenicity islands, bacteriophages, or compound transposons [80, 83], structures which are associated with the introduction into bacterial chromosomes of foreign DNA coding for virulence factors. Therefore, dramatic differences in pathogenic potential may result from only small genetic changes.

A detailed comparative analysis of the genomic locus comprising the genes responsible for capsule synthesis, modification, and transport, named cps, was done with four meningococcal strains, one gonococcal strain, and two commensal Neisseria strains, i.e., N. lactamica and N. sicca (Fig. 11.3). All analyzed Neisseria species harbored the tex gene, whereas the region D containing the galE gene and the rfb genes was only found in meningococci, gonococci, and the closest relatives of meningococci among the apathogenic Neisseria species, i.e., N. lactamica. Thus, the tex gene may be the ancient core of the cps locus. The import of the lipA and lipB genes may have coincided with a duplication and rearrangement of region D, resulting in a truncated duplicate, termed D′, and truncation of the gonococcal methyltransferase genes in meningococci. The evolution of the cps locus seems to be a result of successive imports, deletions, and rearrangements [86]. Interestingly, in strain B1940 the DNA fragment comprising regions D′ through D exhibits an inverse orientation compared to the other meningococcal cps sequences. This inversion was also observed in strains belonging to the same sequence type, indicating a high frequency of inversion in vivo (H. Claus et al. 2001, unpublished).

Fig. 11.3 Comparison of the cps locus of pathogenic and apathogenic Neisseria species. Regions with identical functions are depicted in the same colors. The following sequence data were used to compare the cps loci: NmB B1940: assembled from the GenBank sequences L09188, L09189, M57677, M95053, and Z13995; NmB MC58: AE002098; NmA Z2491: AL157959; NmC FAM18: NC_003221, homologous genes according to BLASTX [75] searches against the annotated neisserial genomes; Ng FA1090: AE004969; Nl DSM4691 and Ns LMG5290: H. Claus and U. Vogel, unpublished data. The gene assignment of the meningococcal sequences was done following the annotation of the MC58 genome. Ng, N. gonorrhoeae; Nl, N. lactamica; Nm, N. meningitidis; Ns, N. sicca; DSM, German type culture collection (Deutsche Sammlung von Mikroorganismen), Braunschweig, Germany; LMG, Belgian collection of microorganisms (Laboratorium voor Microbiologie, University of Gent), Gent, Belgium. (This figure also appears with the color plates.)

11.2.5 Novel Virulence Factors of Meningococci Identified by Genomic Approaches

Besides the capsular polysaccharide of meningococci, many virulence factors were identified in gonococci and meningococci in the pregenomic era. Table 11.2 shows a selection of well-studied virulence factors in the pathogenic Neisseria species.

Table 11.2 Presence of virulence genes in the four sequenced genomes of pathogenic Neisseria spp. as identified by TBLASTN [75] searches[a].

| Virulence gene | N. gonorrhoeae strain FA1090 | N. meningitidis serogroup A strain Z2491 | N. meningitidis serogroup B strain MC58 | N. meningitidis serogroup C strain FAM18 |
|---|---|---|---|---|
| LOS synthesis | | | | |
| galE | + | + | + | + |
| pgm | + | + | + | + |
| lgtA | + | + | + | – |
| lgtB | + | + | + | + |
| lgtC[b] | + | – | – | – |
| lgtD[b] | + | – | – | – |
| lgtE | + | – | + | – |
| lgtH[c] | – | + | – | + |
| rfaF | + | + | + | + |
| rfbC | + | + | + | + |
| LOS sialylation | | | | |
| lst | + | + | + | + |
| Capsule expression | | | | |
| ctrA | – | + | + | + |
| ctrB | – | + | + | + |
| ctrC | – | + | + | + |
| ctrD | – | + | + | + |
| lipA | – | + | + | + |
| lipB | – | + | + | + |
| mynA[c] | – | + | – | – |
| mynB[c] | – | + | – | – |
| mynC[c] | – | + | – | – |
| mynD[c] | – | + | – | – |
| siaA | – | – | + | + |
| siaB | – | – | + | + |
| siaC | – | – | + | + |
| siaD | – | – | + | + |
| Type IV pilus proteins | | | | |
| pilC1/pilC2 | + | + | + | + |
| pilE | + | + | + | + |
| pilM | + | + | + | + |
| pilN | + | + | + | + |
| pilO | + | + | + | + |
| pilP | + | + | + | + |
| pilQ | + | + | + | + |
| Adhesins | | | | |
| opa | + | + | + | + |
| opcA | – | + | + | |
| porA | – | + | + | + |
| porB | + | + | + | + |
| IgA protease | | | | |
| iga | + | + | + | + |
| Iron uptake systems | | | | |
| exbB | + | + | + | + |
| exbD | + | + | + | + |
| fbpA | + | + | + | + |
| fbpB | + | + | + | + |
| fbpC | + | + | + | + |
| fhuA | – | + | + | + |
| hmbR | + | + | + | + |
| hpuA[c] | + | + | – | + |
| hpuB[c] | + | + | – | + |
| lbpA | – | + | + | + |
| lbpB | – | + | + | + |
| tbpA | + | + | + | + |
| tbpB | + | + | + | + |
| tonB | + | + | + | + |
| fetA | + | + | + | + |
| Toxins | | | | |
| frpA | – | – | + | + |
| frpC | – | – | + | + |
| Superoxide dismutase | | | | |
| sodB | + | + | + | + |
| sodC | – | + | + | + |

[a] Unless stated otherwise, the corresponding N. meningitidis B MC58 encoded proteins were used as query and the genome sequences of N. gonorrhoeae FA1090 [11], N. meningitidis A Z2491 [12], and N. meningitidis C FAM18 [14] as databases. Sequences were considered to be homologous if they shared a minimum sequence identity of 60% over at least 60% of the query length with expect values E < e–70. [b] TBLASTN search with the corresponding FA1090 encoded protein. [c] TBLASTN search with the corresponding Z2491 encoded protein.
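The homology criteria in footnote a translate directly into a three-part filter on each TBLASTN hit. The sketch below is one possible reading of those criteria, under two assumptions that the footnote does not settle: the threshold "e–70" is taken to mean 1e-70 in the usual BLAST notation, and coverage is computed as aligned query residues divided by query length. The `Hit` record and its field names are hypothetical and do not correspond to any particular BLAST output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    identity_pct: float  # percent identical positions in the alignment
    align_len: int       # number of aligned query residues
    query_len: int       # full length of the query protein
    evalue: float        # expect value reported for the hit


def is_homologous(hit: Hit,
                  min_identity: float = 60.0,
                  min_coverage: float = 0.60,
                  max_evalue: float = 1e-70) -> bool:
    """Apply the identity, coverage and E value thresholds of footnote a.

    max_evalue assumes that "e–70" denotes 1e-70; coverage is an assumed
    definition (aligned query residues / query length).
    """
    coverage = hit.align_len / hit.query_len
    return (hit.identity_pct >= min_identity
            and coverage >= min_coverage
            and hit.evalue < max_evalue)


# Hypothetical hits for a 400-residue query protein.
print(is_homologous(Hit(85.0, 300, 400, 1e-150)))  # passes all three criteria
print(is_homologous(Hit(85.0, 200, 400, 1e-90)))   # fails on coverage (50%)
```

A gene would then be scored "+" for a genome if at least one hit against that genome passes the filter, and "–" otherwise.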

Whole-genome sequences in combination with bioinformatic tools provided a way of discovering previously unknown proteins. In a first approach of so-called “reverse vaccinology” the whole genome of meningococcal serogroup B strain MC58 was screened for the identification of new vaccine candidates [87]. Several of the identified proteins shared homologies with known virulence factors of other bacteria. To verify whether the previously unknown proteins had a role in the pathogenesis of meningococci, some of these proteins were further characterized biochemically and functionally. The genome-derived Neisseria antigen (GNA) 33 is a membrane-bound lipoprotein with murein hydrolase activity that is present in all Neisseria species and well conserved in different meningococcal isolates [88]. Functional analyses of the gene revealed that it plays an important role in peptidoglycan metabolism, cell separation, and membrane architecture. The inability of a knockout mutant to cause bacteremia in the infant rat model also indicated a role in virulence. In GNA992, a putative surface-exposed protein homologue to two adhesins (Hsf and Hia) of Haemophilus influenzae was identified. Secondary structure anal-

247

248

11 Genomes of Pathogenic Neisseria Species

ysis of GNA992 predicted a specific folding which has been observed in eukaryotic molecules involved in cell–cell adhesion and also in bacterial invasins, suggesting that this structure could be functionally relevant for the interaction with the host cell [89]. App (adhesion and penetration protein) (NMB1985) is a member of the autotransporter family of proteins and a homologue of the Hap (Haemophilus adhesion and penetration) protein of H. influenzae that plays a role in the interaction with human epithelial cells. This function could also be assigned to App. However, in comparison to the wildtype strain, a knockout mutant showed only a reduction of adhesion to Chang epithelial cells, whereas no difference was observed with Hep2 epithelial cells and HUVEC endothelial cells, respectively [90, 91]. NadA (Neisseria adhesion A) (NMB1994) is a surface-exposed protein which is able to elicit bactericidal antibodies. Fifty percent of the strains isolated from patients harbor the nadA gene, versus only 16.2% of the strains isolated from healthy carriers. NadA is present in strains of the ST-8, ST-11, and ST-32 complex, whereas it is absent in strains of the ST-41/44 complex as well as in gonococci and in the commensal species N. lactamica and N. cinerea [92, 93]. Meningococcal NadA knockout mutants showed a reduced ability to adhere to and invade Chang epithelial cells in vitro. Furthermore, NadA is able to promote adhesion of E. coli to human epithelial cells but not to endothelial cells [94]. NarE (Neisseria ADP-ribosylating enzyme) (NMB1343) was identified as a structural homologue of cholera toxin in meningococci by an in silico approach. In biochemical studies it was verified that NarE possesses ADP-ribosylation and NAD-glycohydrolase activity. The protein is exported into the periplasm of meningococci. The narA gene is always present in ST-32 and St-41/44 strains but absent in ST-8 and ST-11 strains. 
In gonococci, NarE is not expressed because of a frameshift in the narE gene [95, 96].

Microarray analysis of meningococcal gene expression in response to growth with iron identified a Fur-dependent, up-regulated gene belonging to a putative operon of three genes that encode proteins of hitherto unknown function [97]. Further characterization showed that this operon is required to protect meningococci against hydrogen peroxide-mediated killing. The deletion mutant exhibited increased sensitivity to reactive oxygen species-producing cells [98].

Screening of the above-mentioned genome-wide collection of defined mutants [73] identified a mutant with reduced adherence to both human endothelial and human epithelial cells. The adherence of this so-called pilX mutant was as low as that of a nonpiliated mutant. The pilin-like protein PilX copurified with the pilus fiber and was found to be essential for bacterial aggregation, i.e., for the formation of bacterial microcolonies. The reduced number of adherent bacteria of the mutant therefore resulted from the absence of interbacterial interactions [99].
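As a rough illustration of how strongly nadA is associated with disease isolates, the carriage frequencies quoted above (50% of patient isolates versus 16.2% of carrier isolates) can be converted into an odds ratio. The following Python sketch is not part of the cited studies [92, 93]; the function and constant names are illustrative assumptions, and because the text gives no sample sizes, no confidence interval can be attached to the result.

```python
def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds ratio for gene carriage in cases versus controls, from two proportions."""
    if not (0.0 < p_cases < 1.0 and 0.0 < p_controls < 1.0):
        raise ValueError("proportions must lie strictly between 0 and 1")
    odds_cases = p_cases / (1.0 - p_cases)
    odds_controls = p_controls / (1.0 - p_controls)
    return odds_cases / odds_controls


# Frequencies of nadA-positive strains as quoted in the text
NADA_IN_PATIENT_ISOLATES = 0.50
NADA_IN_CARRIER_ISOLATES = 0.162

nada_or = odds_ratio(NADA_IN_PATIENT_ISOLATES, NADA_IN_CARRIER_ISOLATES)
print(f"nadA odds ratio (patient vs. carrier isolates): {nada_or:.1f}")
```

Under these figures the odds of carrying nadA are roughly five times higher among patient isolates than among carrier isolates. This is a descriptive summary only and says nothing about causality.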


11.3 Future Perspectives

To further elucidate the genetic differences that might be responsible for the differences in the pathogenic potential of different N. meningitidis isolates, the genomes of two nonpathogenic isolates of N. meningitidis (ST-53 and ST-136 strains) [7] have been sequenced and their annotation is still in progress (H. Claus et al., 2005). A comparison of these genomes with the already sequenced genomes of the virulent N. meningitidis isolates will not only reveal these subtle differences, and hence further our understanding of the pathogenesis of neisserial diseases, but may also provide new targets for therapeutic or prophylactic intervention.
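The kind of gene-content comparison anticipated here can be sketched with data already given in this chapter, namely the distribution of nadA and narE among clonal complexes. The Python example below is a minimal sketch, not the authors' method: the dictionary layout and the function name are assumptions made for illustration. The ST-53 and ST-136 genomes are not included, because no gene content is reported for them in the text.

```python
# Presence (True) / absence (False) of two genes in four clonal complexes,
# transcribed from the distribution reported earlier in this chapter.
GENE_CONTENT = {
    "nadA": {"ST-8": True, "ST-11": True, "ST-32": True, "ST-41/44": False},
    "narE": {"ST-8": False, "ST-11": False, "ST-32": True, "ST-41/44": True},
}


def distinguishing_genes(content, group_a, group_b):
    """Genes present in every lineage of group_a and absent from every lineage of group_b."""
    hits = []
    for gene, lineages in content.items():
        in_all_a = all(lineages[name] for name in group_a)
        in_no_b = not any(lineages[name] for name in group_b)
        if in_all_a and in_no_b:
            hits.append(gene)
    return sorted(hits)


print(distinguishing_genes(GENE_CONTENT, ["ST-8", "ST-11"], ["ST-41/44"]))  # ['nadA']
print(distinguishing_genes(GENE_CONTENT, ["ST-41/44"], ["ST-8", "ST-11"]))  # ['narE']
```

With whole-genome annotations the same set logic would run over thousands of genes rather than two, and orthologue assignment would become the difficult step.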

Acknowledgments

Sequence data were produced by the Neisseria meningitidis serogroup C strain FAM18 Sequencing Group at the Sanger Institute and can be obtained from ftp://ftp.sanger.ac.uk/pub/pathogens/nm/. We also acknowledge the Gonococcal Genome Sequencing Project, supported by USPHS/NIH grant #AI38399, and B. A. Roe, L. Song, S. P. Lin, X. Yuan, S. Clifton, Tom Ducey, Lisa Lewis, and D. W. Dyer at the University of Oklahoma.

References

1 Guibourdenche, M., M. Y. Popoff and J. Y. Riou. 1986. Deoxyribonucleic acid relatedness among Neisseria gonorrhoeae, N. meningitidis, N. lactamica, N. cinerea and “Neisseria polysaccharea”. Ann. Inst. Pasteur Microbiol. 137B: 177–185.
2 Saunders, N. J., D. W. Hood and E. R. Moxon. 1999. Bacterial evolution: bacteria play pass the gene. Curr. Biol. 9: R180–R183.
3 Brooks, G. F., J. S. Butel and S. A. Morse. 2004. Jawetz, Melnick & Adelberg’s Medical Microbiology. Lange Medical Books/McGraw-Hill, New York, pp. 295–304.
4 Brunham, R. C., and J. E. Embree. 1992. Sexually transmitted diseases: current and future dimensions of the problem in the third world. In: Reproductive tract infections: global impact and priorities for women’s reproductive health. A. Germain, K. K. Holmes and P. Piot, editors. Plenum Press, New York, pp. 35–58.
5 Cohen, M. S., and J. G. Cannon. 1999. Human experimentation with Neisseria gonorrhoeae: progress and goals. J. Infect. Dis. 179 Suppl. 2: S375–S379.
6 Moxon, R., and R. Rappuoli. 2002. Bacterial pathogen genomics and vaccines. Br. Med. Bull. 62: 45–58.
7 Claus, H., M. C. Maiden, R. Maag, M. Frosch and U. Vogel. 2002. Many carried meningococci lack the genes required for capsule synthesis and transport. Microbiology 148: 1813–1819.
8 Rosenstein, N. E., B. A. Perkins, D. S. Stephens, T. Popovic and J. M. Hughes. 2001. Meningococcal disease. N. Engl. J. Med. 344: 1378–1388.
9 Hart, C. A., and L. E. Cuevas. 1997. Meningococcal disease in Africa. Ann. Trop. Med. Parasitol. 91: 777–785.

10 Connolly, M., and N. Noah. 1999. Is group C meningococcal disease increasing in Europe? A report of surveillance of meningococcal infection in Europe 1993–6. European Meningitis Surveillance Group. Epidemiol. Infect. 122: 41–49.
11 http://www.ncbi.nlm.nih.gov/genomes/framik.cgi?db=genome&gi=635, http://www.genome.ou.edu/gono.html
12 Parkhill, J., M. Achtman, K. D. James, S. D. Bentley, C. Churcher, S. R. Klee, G. Morelli, D. Basham, D. Brown, T. Chillingworth, R. M. Davies, P. Davis, K. Devlin, T. Feltwell, N. Hamlin, S. Holroyd, K. Jagels, S. Leather, S. Moule, K. Mungall, M. A. Quail, M. A. Rajandream, K. M. Rutherford, M. Simmonds, J. Skelton, S. Whitehead, B. G. Spratt and B. G. Barrell. 2000. Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature 404: 502–506.
13 Tettelin, H., N. J. Saunders, J. Heidelberg, A. C. Jeffries, K. E. Nelson, J. A. Eisen, K. A. Ketchum, D. W. Hood, J. F. Peden, R. J. Dodson, W. C. Nelson, M. L. Gwinn, R. DeBoy, J. D. Peterson, E. K. Hickey, D. H. Haft, S. L. Salzberg, O. White, R. D. Fleischmann, B. A. Dougherty, T. Mason, A. Ciecko, D. S. Parksey, E. Blair, H. Cittone, E. B. Clark, M. D. Cotton, T. R. Utterback, H. Khouri, H. Qin, J. Vamathevan, J. Gill, V. Scarlato, V. Masignani, M. Pizza, G. Grandi, L. Sun, H. O. Smith, C. M. Fraser, E. R. Moxon, R. Rappuoli and J. C. Venter. 2000. Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science 287: 1809–1815.
14 Bentley, S., J. Parkhill and T. P. S. Unit. 2002. Yet another Neisseria meningitidis genome sequence – serogroup C FAM18. In: 13th International Pathogenic Neisseria Conference. D. A. Caugant and E. Wedege, editors. Norwegian Institute of Public Health, Oslo, Norway, p. 75.
15 Blattner, F. R., G. Plunkett, 3rd, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277: 1453–1474.
16 Anonymous. 1997. Neisseria gonorrhoeae database: Los Alamos National Laboratory Bioscience Division. http://www.stdgen.lanl.gov/stdgen/bacteria/ngon/.
17 Roberts, M., P. Piot and S. Falkow. 1979. The ecology of gonococcal plasmids. J. Gen. Microbiol. 114: 491–494.
18 Hagblom, P., C. Korch, A. B. Jonsson and S. Normark. 1986. Intragenic variation by site-specific recombination in the cryptic plasmid of Neisseria gonorrhoeae. J. Bacteriol. 167: 231–237.
19 Roberts, M., L. P. Elwell and S. Falkow. 1977. Molecular characterization of two beta-lactamase-specifying plasmids isolated from Neisseria gonorrhoeae. J. Bacteriol. 131: 557–563.
20 Ikeda, F., A. Tsuji, Y. Kaneko, M. Nishida and S. Goto. 1986. Conjugal transfer of beta-lactamase-producing plasmids of Neisseria gonorrhoeae to Neisseria meningitidis. Microbiol. Immunol. 30: 737–742.
21 Sox, T. E., W. Mohammed, E. Blackman, G. Biswas and P. F. Sparling. 1978. Conjugative plasmids in Neisseria gonorrhoeae. J. Bacteriol. 134: 278–286.
22 Dillon, J. A., and K. H. Yeung. 1989. Beta-lactamase plasmids and chromosomally mediated antibiotic resistance in pathogenic Neisseria species. Clin. Microbiol. Rev. 2 Suppl: S125–S133.
23 Backman, A., P. Orvelid, J. A. Vazquez, O. Skold and P. Olcen. 2000. Complete sequence of a beta-lactamase-encoding plasmid in Neisseria meningitidis. Antimicrob. Agents Chemother. 44: 210–212.
24 Facinelli, B., and P. E. Varaldo. 1987. Plasmid-mediated sulfonamide resistance in Neisseria meningitidis. Antimicrob. Agents Chemother. 31: 1642–1643.
25 Knapp, J. S., J. M. Zenilman, J. W. Biddle, G. H. Perkins, W. E. DeWitt, M. L. Thomas, S. R. Johnson and S. A. Morse. 1987. Frequency and distribution in the United States of strains of Neisseria gonorrhoeae with plasmid-mediated, high-level resistance to tetracycline. J. Infect. Dis. 155: 819–822.
26 Hilse, R., J. Stoevesandt, D. A. Caugant, H. Claus, M. Frosch and U. Vogel. 2000. Distribution of the meningococcal insertion sequence IS1301 in clonal lineages of Neisseria meningitidis. Epidemiol. Infect. 124: 337–340.
27 Claus, H., A. Friedrich, M. Frosch and U. Vogel. 2000. Differential distribution of novel restriction-modification systems in clonal lineages of Neisseria meningitidis. J. Bacteriol. 182: 1296–1303.
28 Morgan, G. J., G. F. Hatfull, S. Casjens and R. W. Hendrix. 2002. Bacteriophage Mu genome sequence: analysis and comparison with Mu-like prophages in Haemophilus, Neisseria and Deinococcus. J. Mol. Biol. 317: 337–359.
29 Masignani, V., M. M. Giuliani, H. Tettelin, M. Comanducci, R. Rappuoli and V. Scarlato. 2001. Mu-like prophage in serogroup B Neisseria meningitidis coding for surface-exposed antigens. Infect. Immun. 69: 2580–2588.
30 Casjens, S. 2003. Prophages and bacterial genomics: what have we learned so far? Mol. Microbiol. 49: 277–300.
31 Karlin, S., A. M. Campbell and J. Mrazek. 1998. Comparative DNA analysis across diverse genomes. Annu. Rev. Genet. 32: 185–225.
32 Hacker, J., and J. B. Kaper. 2000. Pathogenicity islands and the evolution of microbes. Annu. Rev. Microbiol. 54: 641–679.
33 Goodman, S. D., and J. J. Scocca. 1988. Identification and arrangement of the DNA sequence recognized in specific transformation of Neisseria gonorrhoeae. Proc. Natl. Acad. Sci. U. S. A. 85: 6982–6986.
34 Smith, H. O., M. L. Gwinn and S. L. Salzberg. 1999. DNA uptake signal sequences in naturally transformable bacteria. Res. Microbiol. 150: 603–616.
35 Frosch, M., and T. F. Meyer. 1992. Transformation-mediated exchange of virulence determinants by co-cultivation of pathogenic Neisseriae. FEMS Microbiol. Lett. 79: 345–349.
36 Linz, B., M. Schenker, P. Zhu and M. Achtman. 2000. Frequent interspecific genetic exchange between commensal Neisseriae and Neisseria meningitidis. Mol. Microbiol. 36: 1049–1058.
37 Kroll, J. S., K. E. Wilks, J. L. Farrant and P. R. Langford. 1998. Natural genetic exchange between Haemophilus and Neisseria: intergeneric transfer of chromosomal genes between major human pathogens. Proc. Natl. Acad. Sci. U. S. A. 95: 12381–12385.
38 Fudyk, T. C., I. W. Maclean, J. N. Simonsen, E. N. Njagi, J. Kimani, R. C. Brunham and F. A. Plummer. 1999. Genetic diversity and mosaicism at the por locus of Neisseria gonorrhoeae. J. Bacteriol. 181: 5591–5599.
39 Maiden, M. C., B. Malorny and M. Achtman. 1996. A global gene pool in the neisseriae. Mol. Microbiol. 21: 1297–1298.
40 Smith, J. M., N. H. Smith, M. O’Rourke and B. G. Spratt. 1993. How clonal are bacteria? Proc. Natl. Acad. Sci. U. S. A. 90: 4384–4388.
41 Feil, E. J., M. C. Maiden, M. Achtman and B. G. Spratt. 1999. The relative contributions of recombination and mutation to the divergence of clones of Neisseria meningitidis. Mol. Biol. Evol. 16: 1496–1502.
42 Holmes, E. C., R. Urwin and M. C. Maiden. 1999. The influence of recombination on the population structure and evolution of the human pathogen Neisseria meningitidis. Mol. Biol. Evol. 16: 741–749.
43 Vogel, U., H. Claus and M. Frosch. 2000. Rapid serogroup switching in Neisseria meningitidis. N. Engl. J. Med. 342: 219–220.
44 Swartley, J. S., A. A. Marfin, S. Edupuganti, L. J. Liu, P. Cieslak, B. Perkins, J. D. Wenger and D. S. Stephens. 1997. Capsule switching of Neisseria meningitidis. Proc. Natl. Acad. Sci. U. S. A. 94: 271–276.
45 Davidsen, T., E. A. Rodland, K. Lagesen, E. Seeberg, T. Rognes and T. Tonjum. 2004. Biased distribution of DNA uptake sequences towards genome maintenance genes. Nucleic Acids Res. 32: 1050–1058.
46 Morelle, S., E. Carbonnelle, I. Matic and X. Nassif. 2005. Contact with host cells induces a DNA repair system in pathogenic Neisseriae. Mol. Microbiol. 55: 853–861.
47 Snyder, L. A. S., S. A. Butcher and N. J. Saunders. 2001. Comparative whole-genome analyses reveal over 100 putative phase-variable genes in the pathogenic Neisseria spp. Microbiology 147: 2321–2332.
48 Sarkari, J., N. Pandit, E. R. Moxon and M. Achtman. 1994. Variable expression of the Opc outer membrane protein in Neisseria meningitidis is caused by size variation of a promoter containing polycytidine. Mol. Microbiol. 13: 207–217.
49 Hammerschmidt, S., A. Muller, H. Sillmann, M. Muhlenhoff, R. Borrow, A. Fox, J. van Putten, W. D. Zollinger, R. Gerardy-Schahn and M. Frosch. 1996. Capsule phase variation in Neisseria meningitidis serogroup B by slipped-strand mispairing in the polysialyltransferase gene (siaD): correlation with bacterial invasion and the outbreak of meningococcal disease. Mol. Microbiol. 20: 1211–1220.
50 van der Ende, A., C. T. Hopman, S. Zaat, B. B. Essink, B. Berkhout and J. Dankert. 1995. Variable expression of class 1 outer membrane protein in Neisseria meningitidis is caused by variation in the spacing between the -10 and -35 regions of the promoter. J. Bacteriol. 177: 2475–2480.
51 Saunders, N. J., A. C. Jeffries, J. F. Peden, D. W. Hood, H. Tettelin, R. Rappuoli and E. R. Moxon. 2000. Repeat-associated phase variable genes in the complete genome sequence of Neisseria meningitidis strain MC58. Mol. Microbiol. 37: 207–215.
52 de Vries, F. P., A. van Der Ende, J. P. van Putten and J. Dankert. 1996. Invasion of primary nasopharyngeal epithelial cells by Neisseria meningitidis is controlled by phase variation of multiple surface antigens. Infect. Immun. 64: 2998–3006.
53 Jennings, M. P., Y. N. Srikhanta, E. R. Moxon, M. Kramer, J. T. Poolman, B. Kuipers and P. van der Ley. 1999. The genetic basis of the phase variation repertoire of lipopolysaccharide immunotypes in Neisseria meningitidis. Microbiology 145: 3013–3021.
54 Jonsson, A. B., G. Nyberg and S. Normark. 1991. Phase variation of gonococcal pili by frameshift mutation in pilC, a novel gene for pilus assembly. EMBO J. 10: 477–488.
55 Jennings, M. P., M. Virji, D. Evans, V. Foster, Y. N. Srikhanta, L. Steeghs, P. van der Ley and E. R. Moxon. 1998. Identification of a novel gene involved in pilin glycosylation in Neisseria meningitidis. Mol. Microbiol. 29: 975–984.
56 Richardson, A. R., and I. Stojiljkovic. 1999. HmbR, a hemoglobin-binding outer membrane protein of Neisseria meningitidis, undergoes phase variation. J. Bacteriol. 181: 2067–2074.
57 Lewis, L. A., M. Gipson, K. Hartman, T. Ownbey, J. Vaughn and D. W. Dyer. 1999. Phase variation of HpuAB and HmbR, two distinct haemoglobin receptors of Neisseria meningitidis DNM2. Mol. Microbiol. 32: 977–989.
58 van der Ende, A., C. T. Hopman and J. Dankert. 2000. Multiple mechanisms of phase variation of PorA in Neisseria meningitidis. Infect. Immun. 68: 6685–6690.
59 Martin, P., T. van de Ven, N. Mouchel, A. C. Jeffries, D. W. Hood and E. R. Moxon. 2003. Experimentally revised repertoire of putative contingency loci in Neisseria meningitidis strain MC58: evidence for a novel mechanism of phase variation. Mol. Microbiol. 50: 245–257.
60 Jordan, P., L. A. Snyder and N. J. Saunders. 2003. Diversity in coding tandem repeats in related Neisseria spp. BMC Microbiol. 3: 23.
61 Correia, F. F., S. Inouye and M. Inouye. 1986. A 26-base-pair repetitive sequence specific for Neisseria gonorrhoeae and Neisseria meningitidis genomic DNA. J. Bacteriol. 167: 1009–1015.
62 De Gregorio, E., C. Abrescia, M. S. Carlomagno and P. P. Di Nocera. 2003. Asymmetrical distribution of Neisseria miniature insertion sequence DNA repeats among pathogenic and nonpathogenic Neisseria strains. Infect. Immun. 71: 4217–4221.

63 Liu, S. V., N. J. Saunders, A. Jeffries and R. F. Rest. 2002. Genome analysis and strain comparison of correia repeats and correia repeat-enclosed elements in pathogenic Neisseria. J. Bacteriol. 184: 6163–6173.
64 Mazzone, M., E. De Gregorio, A. Lavitola, C. Pagliarulo, P. Alifano and P. P. Di Nocera. 2001. Whole-genome organization and functional properties of miniature DNA insertion sequences conserved in pathogenic Neisseriae. Gene 278: 211–222.
65 Buisine, N., C. M. Tang and R. Chalmers. 2002. Transposon-like Correia elements: structure, distribution and genetic exchange between pathogenic Neisseria sp. FEBS Lett. 522: 52–58.
66 Snyder, L. A., W. M. Shafer and N. J. Saunders. 2003. Divergence and transcriptional analysis of the division cell wall (dcw) gene cluster in Neisseria spp. Mol. Microbiol. 47: 431–442.
67 Black, C., J. Fyfe and J. Davies. 1995. A promoter associated with the neisserial repeat can be used to transcribe the uvrB gene from Neisseria gonorrhoeae. J. Bacteriol. 177: 1952–1958.
68 Rouquette-Loughlin, C. E., J. T. Balthazar, S. A. Hill and W. M. Shafer. 2004. Modulation of the mtrCDE-encoded efflux pump gene complex of Neisseria meningitidis due to a Correia element insertion sequence. Mol. Microbiol. 54: 731–741.
69 Hill, S. A., D. S. Samuels, J. H. Carlson, J. Wilson, D. Hogan, L. Lubke and R. J. Belland. 1997. Integration host factor is a transcriptional cofactor of pilE in Neisseria gonorrhoeae. Mol. Microbiol. 23: 649–656.
70 De Gregorio, E., C. Abrescia, M. S. Carlomagno and P. P. Di Nocera. 2002. The abundant class of nemis repeats provides RNA substrates for ribonuclease III in Neisseriae. Biochim. Biophys. Acta 1576: 39–44.
71 Morelle, S., E. Carbonnelle and X. Nassif. 2003. The REP2 repeats of the genome of Neisseria meningitidis are associated with genes coordinately regulated during bacterial cell interaction. J. Bacteriol. 185: 2618–2627.

72 Sun, Y. H., S. Bakshi, R. Chalmers and C. M. Tang. 2000. Functional genomics of Neisseria meningitidis pathogenesis. Nat. Med. 6: 1269–1273.
73 Geoffroy, M. C., S. Floquet, A. Metais, X. Nassif and V. Pelicic. 2003. Large-scale analysis of the meningococcus genome by gene disruption: resistance to complement-mediated lysis. Genome Res. 13: 391–398.
74 Maiden, M. C., and I. M. Feavers. 2000. Meningococcal genomics: two steps forward, one step back. Nat. Med. 6: 1215–1216.
75 Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
76 Dempsey, J. A., A. B. Wallace and J. G. Cannon. 1995. The physical map of the chromosome of a serogroup A strain of Neisseria meningitidis shows complex rearrangements relative to the chromosomes of the two mapped strains of the closely related species N. gonorrhoeae. J. Bacteriol. 177: 6390–6400.
77 Gaher, M., K. Einsiedler, T. Crass and W. Bautsch. 1996. A physical and genetic map of Neisseria meningitidis B1940. Mol. Microbiol. 19: 249–259.
78 Dempsey, J. A., W. Litaker, A. Madhure, T. L. Snodgrass and J. G. Cannon. 1991. Physical map of the chromosome of Neisseria gonorrhoeae FA1090 with locations of genetic markers, including opa and pil genes. J. Bacteriol. 173: 5476–5486.
79 Dempsey, J. A., and J. G. Cannon. 1994. Locations of genetic markers on the physical map of the chromosome of Neisseria gonorrhoeae FA1090. J. Bacteriol. 176: 2055–2060.
80 Perrin, A., S. Bonacorsi, E. Carbonnelle, D. Talibi, P. Dessen, X. Nassif and C. Tinsley. 2002. Comparative genomics identifies the genetic islands that distinguish Neisseria meningitidis, the agent of cerebrospinal meningitis, from other Neisseria species. Infect. Immun. 70: 7063–7072.
81 Perrin, A., X. Nassif and C. Tinsley. 1999. Identification of regions of the chromosome of Neisseria meningitidis and Neisseria gonorrhoeae which are specific to the pathogenic Neisseria species. Infect. Immun. 67: 6119–6129.
82 Tinsley, C. R., and X. Nassif. 1996. Analysis of the genetic differences between Neisseria meningitidis and Neisseria gonorrhoeae: two closely related bacteria expressing two different pathogenicities. Proc. Natl. Acad. Sci. U. S. A. 93: 11109–11114.
83 Klee, S. R., X. Nassif, B. Kusecek, P. Merker, J. L. Beretti, M. Achtman and C. R. Tinsley. 2000. Molecular and biological analysis of eight genetic islands that distinguish Neisseria meningitidis from the closely related pathogen Neisseria gonorrhoeae. Infect. Immun. 68: 2082–2095.
84 Bart, A., J. Dankert and A. van der Ende. 2000. Representational difference analysis of Neisseria meningitidis identifies sequences that are specific for the hyper-virulent lineage III clone. FEMS Microbiol. Lett. 188: 111–114.
85 Claus, H., J. Stoevesandt, M. Frosch and U. Vogel. 2001. Genetic isolation of meningococci of the electrophoretic type 37 complex. J. Bacteriol. 183: 2570–2575.
86 Vogel, U., and H. Claus. 2000. The evolution of human pathogens: examples and clinical implications. Int. J. Med. Microbiol. 290: 511–518.
87 Pizza, M., V. Scarlato, V. Masignani, M. M. Giuliani, B. Arico, M. Comanducci, G. T. Jennings, L. Baldi, E. Bartolini, B. Capecchi, C. L. Galeotti, E. Luzzi, R. Manetti, E. Marchetti, M. Mora, S. Nuti, G. Ratti, L. Santini, S. Savino, M. Scarselli, E. Storni, P. Zuo, M. Broeker, E. Hundt, B. Knapp, E. Blair, T. Mason, H. Tettelin, D. W. Hood, A. C. Jeffries, N. J. Saunders, D. M. Granoff, J. C. Venter, E. R. Moxon, G. Grandi and R. Rappuoli. 2000. Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing. Science 287: 1816–1820.
88 Adu-Bobie, J., P. Lupetti, B. Brunelli, D. Granoff, N. Norais, G. Ferrari, G. Grandi, R. Rappuoli and M. Pizza. 2004. GNA33 of Neisseria meningitidis is a lipoprotein required for cell separation, membrane architecture, and virulence. Infect. Immun. 72: 1914–1919.
89 Scarselli, M., R. Rappuoli and V. Scarlato. 2001. A common conserved amino acid motif module shared by bacterial and intercellular adhesins: bacterial adherence mimicking cell–cell recognition? Microbiology 147: 250–252.
90 Hadi, H. A., K. G. Wooldridge, K. Robinson and D. A. Ala’Aldeen. 2001. Identification and characterization of App: an immunogenic autotransporter protein of Neisseria meningitidis. Mol. Microbiol. 41: 611–623.
91 Serruto, D., J. Adu-Bobie, M. Scarselli, D. Veggi, M. Pizza, R. Rappuoli and B. Arico. 2003. Neisseria meningitidis App, a new adhesin with autocatalytic serine protease activity. Mol. Microbiol. 48: 323–334.
92 Comanducci, M., S. Bambini, B. Brunelli, J. Adu-Bobie, B. Arico, B. Capecchi, M. M. Giuliani, V. Masignani, L. Santini, S. Savino, D. M. Granoff, D. A. Caugant, M. Pizza, R. Rappuoli and M. Mora. 2002. NadA, a novel vaccine candidate of Neisseria meningitidis. J. Exp. Med. 195: 1445–1454.
93 Comanducci, M., S. Bambini, D. A. Caugant, M. Mora, B. Brunelli, B. Capecchi, L. Ciucchi, R. Rappuoli and M. Pizza. 2004. NadA diversity and carriage in Neisseria meningitidis. Infect. Immun. 72: 4217–4223.
94 Capecchi, B., J. Adu-Bobie, F. Di Marcello, L. Ciucchi, V. Masignani, A. Taddei, R. Rappuoli, M. Pizza and B. Arico. 2005. Neisseria meningitidis NadA is a new invasin which promotes bacterial adhesion to and penetration into human epithelial cells. Mol. Microbiol. 55: 687–698.
95 Masignani, V., E. Balducci, D. Serruto, D. Veggi, B. Arico, M. Comanducci, M. Pizza and R. Rappuoli. 2004. In silico identification of novel bacterial ADP-ribosyltransferases. Int. J. Med. Microbiol. 293: 471–478.
96 Masignani, V., E. Balducci, F. Di Marcello, S. Savino, D. Serruto, D. Veggi, S. Bambini, M. Scarselli, B. Arico, M. Comanducci, J. Adu-Bobie, M. M. Giuliani, R. Rappuoli and M. Pizza. 2003. NarE: a novel ADP-ribosyltransferase from Neisseria meningitidis. Mol. Microbiol. 50: 1055–1067.
97 Grifantini, R., S. Sebastian, E. Frigimelica, M. Draghi, E. Bartolini, A. Muzzi, R. Rappuoli, G. Grandi and C. A. Genco. 2003. Identification of iron-activated and -repressed Fur-dependent genes by transcriptome analysis of Neisseria meningitidis group B. Proc. Natl. Acad. Sci. U. S. A. 100: 9542–9547.
98 Grifantini, R., E. Frigimelica, I. Delany, E. Bartolini, S. Giovinazzi, S. Balloni, S. Agarwal, G. Galli, C. Genco and G. Grandi. 2004. Characterization of a novel Neisseria meningitidis Fur and iron-regulated operon required for protection from oxidative stress: utility of DNA microarray in the assignment of the biological role of hypothetical genes. Mol. Microbiol. 54: 962–979.
99 Helaine, S., E. Carbonnelle, L. Prouvensier, J. L. Beretti, X. Nassif and V. Pelicic. 2005. PilX, a pilus-associated protein essential for bacterial aggregation, is a key to pilus-facilitated attachment of Neisseria meningitidis to human cells. Mol. Microbiol. 55: 65–77.
100 Darling, A. C., B. Mau, F. R. Blattner and N. T. Perna. 2004. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14: 1394–1403.
101 Snyder, L. A., J. K. Davies, N. J. Saunders, R. P. Viscidi, J. C. Demma, J. A. Dempsey, W. Litaker, A. Madhure, T. L. Snodgrass and J. G. Cannon. 2004. Microarray genomotyping of key experimental strains of Neisseria gonorrhoeae reveals gene complement diversity and five new neisserial genes associated with minimal mobile elements. BMC Genomics 5: 23.


12 Genomics of Pathogenic Clostridia and Bacilli

Armin Ehrenreich, Gerhard Gottschalk, and Holger Brüggemann

The taxonomy of endospore-forming bacteria has become complex in recent years. In addition to the classical genera Clostridium, Bacillus, and Desulfotomaculum, several new genera have emerged, e.g., Sporomusa, Sporoanaerobacter, Sporolactobacillus, Sporosarcina, Thermoanaerobacter, and Moorella. So far as is known, these genera do not comprise pathogenic species. Such species, however, are found amongst the clostridia and bacilli, the genomics of which will be discussed here.

12.1 Genomics of Pathogenic Clostridia spp.

12.1.1 Introduction

The most prominent group of anaerobic pathogens are spore-forming bacteria of the genus Clostridium. Among the 120 species of this heterogeneous genus, about 35 are able to express a pathogenic phenotype [1]. Several well-known diseases are caused by toxin-producing clostridia, such as gas gangrene and necrotic enteritis caused by C. perfringens, diarrhea and pseudomembranous colitis caused by C. difficile, tetanus caused by C. tetani, and foodborne botulism caused by C. botulinum. The latter two organisms produce the most powerful neurotoxins known, the tetanus and botulinum toxins, respectively.

What is known about the genetic and genomic background of pathogenic clostridia? Prior to the genome sequencing era, information, if available, was restricted to the genetic loci encoding the major toxins; their genetic vicinity remained unknown. In some cases it could be shown that toxin genes are located on extrachromosomal elements [2]. Since the year 2000, however, the genomes of five clostridial species have been completely sequenced, starting with the genome of the solvent-producing nonpathogenic C. acetobutylicum [3], followed by the genomes of the pathogens C. perfringens [4], C. tetani [5], C. botulinum (Sanger Institute), and C. difficile (Sanger Institute). The first three of these genomes have been published so far; the last two will be published in the near future. The availability of these sequence data now makes it possible to gain a deeper insight into the life of clostridial pathogens, e.g., by identifying additional virulence factors and regulatory networks controlling virulence, and by deciphering underlying traits such as metabolic preferences that enable the organism to replicate inside and harm the human body. The genome data might also give us new clues to the evolution of the clostridial species, and to how they acquired and distributed their pathogenic potential.

12.1.2 C. perfringens

C. perfringens accounts for a variety of diseases in humans, ranging from foodborne diarrhea to necrotic enteritis and gas gangrene, as well as for several enterotoxemic diseases of domestic animals. The organism is known to produce various toxins that are responsible for the characteristic lesions and symptoms of necrosis [6]. Strains of C. perfringens are classified into five toxinotypes (A–E) based on the production of four major toxins, α, β, ε, and ι [7, 8]. The functions of the most important toxins and enzymes involved in pathogenesis have been studied in detail. For instance, the α-toxin is a phosphohydrolase with a preference for phosphatidylcholine and sphingomyelin, two major components of the outer leaflet of eukaryotic cell membranes [9]. θ-Toxin (or perfringolysin O) exhibits cholesterol-dependent pore-forming activity on eukaryotic membranes [10]; the enteric toxins β, β2, ε, and CPE also act by forming pores or channels in membranes [11]. CPE is responsible for the symptoms of a common human food poisoning, whereas β-toxin and ε-toxin cause veterinary enterotoxemias when absorbed from the intestines. The ι-toxin is a binary protein consisting of a binding component and an ADP-ribosyltransferase, which ADP-ribosylates monomeric actin and thereby prevents the formation of cytoskeletal filaments [12]. Other enzymes seem to have supporting functions in tissue destruction, such as κ-toxin (a collagenase), μ-toxin (a hyaluronidase), and sialidases. The genetic location of the toxin genes depends on the respective strain or isolate. In most strains, the genes of toxins α, θ, and κ are located on the chromosome, whereas those of others such as β, ε, and ι are encoded on plasmids [8].

Genome of the Type A Strain C. perfringens 13

A C. perfringens strain designated strain 13, a natural isolate from the soil, has been completely sequenced [4]. The strain is classified as a type A strain, meaning that it produces α-toxin but not toxins β, ε, and ι. Strain 13 can establish experimental gas gangrene (myonecrosis) in a murine model. The genome consists of a 3.031-Mbp chromosome and a 54.3-kbp plasmid (pCP13). Some genome features are summarized in Table 12.1. Predictions are for 2660 CDS (coding sequences) on the chromosome and 63 CDS on the plasmid pCP13. Genome annotation and metabolic pathway reconstruction revealed a bacterium that is strictly restricted to anaerobic fermentation, resulting in the production of gases such as CO2 and H2 and acids such as butyrate, acetate, and lactate. As in C. acetobutylicum, but unlike C. tetani, an extended set of enzymes for the degradation of mono- and polysaccharides can be found in C. perfringens.

With regard to pathogenesis, additional putative virulence factors have been identified from the complete genome sequence, such as five hyaluronidases, five hemolysin-like proteins, an exo-α-sialidase, and proteases such as a homologue of α-clostripain of C. histolyticum. In addition, three genes similar to an enterotoxin of B. cereus were found. The 54-kbp plasmid encodes the β2-toxin, a proposed pore-forming enzyme involved in the formation of necrotic lesions. Interestingly, the virulence genes are not clustered, do not form pathogenicity islands, and are not associated with mobile genetic elements. A possible exception may be the cluster (CPE0030–36) containing the genes of α-toxin (plc) and a putative hemolysin (hlyA), which is flanked by a tRNA gene.

Table 12.1 Summary of Clostridium spp. genomes discussed in the text.

Organism                         Replicon    Size (bp)  Number of genes  % G+C  Prophages  rRNA operons  tRNA
C. perfringens strain 13         Chromosome  3 031 430  2660             28.6   1          10            96
                                 pCP13          54 310  63               25.5
C. tetani E88 Massachusetts      Chromosome  2 799 251  2618             28.6   3          6             54
                                 pE88           74 082  61               24.5
C. botulinum Hall A (ATCC 3502)  Chromosome  3 886 916  ~3648            28.2   n.d.       9             n.d.
                                 Plasmid        16 344  ~17              26.8
C. difficile strain 630          Chromosome  4 290 252  ~3679            29.1   n.d.       11            n.d.
                                 Plasmid         7 881  ~8               27.9
C. perfringens ATCC 13124        Chromosome  3 256 682  n.d.             28.4   n.d.       n.d.          n.d.
C. acetobutylicum ATCC 824       Chromosome  3 940 080  3740             30.9   2          11            73
                                 pSOL1         192 000  178              30.9

n.d., not determined.
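The figures in Table 12.1 lend themselves to simple derived quantities such as gene density and the share of the genome carried on plasmids. The Python sketch below computes both for the replicons with exact gene counts (entries marked "~" or "n.d." are left out). The class and function names are illustrative assumptions, not part of any published analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replicon:
    organism: str
    name: str      # "chromosome" or the plasmid name
    size_bp: int
    genes: int

    def genes_per_kbp(self) -> float:
        """Gene density: annotated genes per 1000 bp of sequence."""
        return self.genes / (self.size_bp / 1000)


# Values transcribed from Table 12.1 (exact gene counts only)
REPLICONS = [
    Replicon("C. perfringens strain 13", "chromosome", 3_031_430, 2660),
    Replicon("C. perfringens strain 13", "pCP13", 54_310, 63),
    Replicon("C. tetani E88", "chromosome", 2_799_251, 2618),
    Replicon("C. tetani E88", "pE88", 74_082, 61),
    Replicon("C. acetobutylicum ATCC 824", "chromosome", 3_940_080, 3740),
    Replicon("C. acetobutylicum ATCC 824", "pSOL1", 192_000, 178),
]


def plasmid_share(replicons, organism):
    """Fraction of an organism's listed DNA that lies on non-chromosomal replicons."""
    own = [r for r in replicons if r.organism == organism]
    total = sum(r.size_bp for r in own)
    plasmid = sum(r.size_bp for r in own if r.name != "chromosome")
    return plasmid / total


for r in REPLICONS:
    print(f"{r.organism:28s} {r.name:10s} {r.genes_per_kbp():.2f} genes/kbp")
for org in sorted({r.organism for r in REPLICONS}):
    print(f"{org:28s} plasmid share {plasmid_share(REPLICONS, org):.1%}")
```

All six replicons come out at roughly one gene per kbp, and in each organism the plasmid accounts for only a few percent of the total sequence.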


Another type A strain, C. perfringens ATCC 13124, has also been completely sequenced (The Institute for Genomic Research, TIGR), but the sequence has not yet been published. This 3.256-Mbp genome shows strong genome-wide synteny with the genome of strain 13, with the exception of an additional 250-kb fragment in strain ATCC 13124 that is completely absent from the genome of strain 13.

12.1.3 C. tetani

Spores of C. tetani are found ubiquitously in soil all over the world. If these spores come into contact with open wounds, they can germinate, and a life-threatening spastic paralysis known as tetanus can occur in unvaccinated persons. The cause of this disease is the production of a potent neurotoxin, the tetanus toxin, by vegetative C. tetani cells. The toxin blocks the release of neurotransmitters (γ-aminobutyrate, glycine) from presynaptic membranes of inhibitory neurons of the spinal cord and the brainstem of mammals [13]. It catalyzes proteolytic cleavage of the synaptic vesicle protein synaptobrevin, which is part of the SNARE complex involved in the fusion of neurotransmitter-containing vesicles with the presynaptic membrane [14]. This leads to continuous muscle contractions, which are primarily observed in jaw and neck muscles (lockjaw). Since the introduction of a potent vaccine, formaldehyde-treated tetanus toxin, at the beginning of the twentieth century, cases of tetanus occur only sporadically in the industrialized countries. However, the disease, in particular neonatal tetanus, is still an important cause of death because of insufficient immunization, primarily in developing countries. According to the World Health Organization, neonatal tetanus is the second leading cause of death from vaccine-preventable diseases among children worldwide [15].

Prior to genome sequencing, only one virulence factor other than the tetanus toxin had been identified in C. tetani: tetanolysin O, a cholesterol-dependent pore-forming cytolysin, which is similar to perfringolysin O (θ-toxin) of C. perfringens [16]. The gene for the tetanus toxin is encoded exclusively on a large plasmid, whereas the tetanolysin O gene is located on the chromosome. Nonneurotoxigenic isolates of C. tetani have been identified which lack the tetanus toxin gene. It has been shown that both these isolates and the closely related species C. tetanomorphum do not possess the large plasmid or have different plasmids [2, 17].

Genome of C. tetani E88

The genome of a derivative of the Massachusetts strain (designated E88) was completely sequenced and published in 2003 [5]. Strain E88 is in use as a vaccine production strain; it is a toxigenic but nonsporulating isolate. The genome comprises a 2.799-Mbp chromosome and a 74-kbp plasmid (pE88). Some features are summarized in Table 12.1. The updated genome annotation revealed 2618 putative genes on the chromosome and 61 putative genes on the plasmid pE88.

The plasmid pE88 is of special interest because it encodes the tetanus toxin and, as was previously unknown, an additional virulence factor, a collagenase similar to the κ-toxin of C. perfringens. Other genes on pE88 are involved in regulation. TetR, the transcriptional activator of the tetanus toxin gene, is a proposed alternative sigma factor similar to other clostridial regulators of toxin gene expression, such as BotR, which controls botulinum toxin production in C. botulinum, and TxeR, the regulator of toxin A and B production in C. difficile [18]. Additional regulatory proteins can be found on pE88: two sigma-factor-like proteins (CTP10, CTP11), two proteins (CTP04, CTP05) with similarities to the regulatory system UviAB, a system involved in regulating bacteriocin production in C. perfringens, and a two-component system with unknown specificity (CTP21, CTP22). Interestingly, one-third of all genes of pE88 code for transport functions: five multisubunit ABC (ATP-binding cassette) transporters can be found, some of which show highest homology to peptide transporters. The origin of the plasmid remains obscure. The replication protein of pE88 is duplicated and shows highest homology to the replication protein encoded on plasmid pIP404 of C. perfringens. Gene fragments characteristic of mobile genetic elements are present on the plasmid: two frameshifted transposase genes of pE88 show strongest similarity to transposases of B. cereus.

The chromosome apparently contains only a few putative virulence factors. Besides the gene for tetanolysin O, two hemolysin-like genes are present. However, cell surface determinants are abundant, although their involvement in pathogenesis remains to be elucidated. These 32 proteins contain a typical leader peptide for their export across the cytoplasmic membrane and two or three cell-wall-binding domains (PF04122), preferentially located near the N-termini. The C-termini of these surface-bound proteins of C. tetani differ widely, pointing to diverse functions in the interaction with the cell exterior. Whereas the function of the majority of surface proteins remains obscure, the C-termini of two proteins (CTC1364, CTC2092) contain a β-lactamase and an amidase domain, respectively. Similar surface-bound proteins are present in C. difficile (28 proteins), such as the surface layer protein SlpA and the characterized adhesin Cwp66 [19, 20], and to a much lesser extent in C. botulinum (8 proteins). In contrast, no homologues of these surface-bound proteins could be detected in C. perfringens.

An open question is the strategy used for wound colonization. Recently, a heme oxygenase of C. tetani E88 was characterized which might act as an oxygen scavenger in order to establish an anoxic microenvironment in the open wound [21]. This also fits with the findings that C. tetani and C. perfringens are rather aerotolerant and possess a set of genes for an oxidative stress response. In contrast, C. botulinum is strictly anaerobic; it lacks a heme oxygenase.

Metabolic reconstruction revealed some peculiarities of C. tetani. First, poly- and monosaccharides other than glucose cannot be utilized for fermentation, in sharp contrast to the sugar-degrading potential of C. acetobutylicum and C. perfringens. Instead, C. tetani exhibits an extended capability to degrade (oligo)peptides and most amino acids, which makes it a perfect example of a proteolytic/peptolytic organism. This capability might be linked to another metabolic peculiarity of C. tetani: extensive sodium ion-dependent bioenergetics [5, 19]. Several membrane-bound sodium-dependent amino acid symporters can be found, delivering the substrates for fermentation. Sodium ion pumps like the V-type ATPase and possibly the Rnf/Nqr system are involved in a sodium ion cycle. It is proposed that the latter membrane-bound electron-transport system couples sodium ion extrusion with the reduction of NAD+. The NADH formed is consumed by dehydrogenases of the major fermentation route, the butyrate pathway. C. acetobutylicum clearly lacks sodium ion-dependent systems like Rnf/Nqr and V-type ATPase. C. perfringens, C. botulinum, and C. difficile possess a V-type as well as an F0F1-type ATPase and an Rnf/Nqr system, indicating that sodium ion bioenergetics may be a trait of pathogenic clostridia.

12.1.4 C. botulinum

C. botulinum is the causative agent of foodborne botulism, a severe neuroparalytic disease affecting humans and animals. The food poisoning is caused by the production of the botulinum toxin (BoNT), which has a mode of action similar to that of the tetanus toxin but a different site of action. BoNT acts on synapses of motoneurons, blocking the release of the neurotransmitter acetylcholine into the neuromuscular junctions. It proteolytically cleaves a protein of the SNARE complex [22, 23]. Consequently, muscle contraction is impaired, resulting in flaccid muscle paralysis. Seven different antigenic toxinotypes are known (BoNT A–G), which are produced by six genetically distinct groups of clostridia: C. botulinum I–IV and unique strains of C. baratii and C. butyricum; the latter two produce BoNT F and E, respectively [24, 25]. C. botulinum group I consists of proteolytic strains that produce BoNT of types A, B, and F; group II strains are nonproteolytic and produce BoNT B, E, or F; group III strains produce BoNT C or D; and group IV strains, now recognized as Clostridium argentinense, produce BoNT G. Some strains can produce a mixture of two BoNTs (e.g., AF, AB) and many type A strains contain cryptic/silent BoNT type B genes [25]. The genes of BoNTs are located on the chromosome, on plasmids, or on bacteriophages. Nonneurotoxigenic C. botulinum strains have been isolated which lack the botulinum toxin gene(s).

Genome of C. botulinum Hall Strain A (ATCC 3502)

The genome of strain ATCC 3502 has been completely sequenced, but has not been published so far (Sanger Institute). The strain produces botulinum toxin of type A. The size of the chromosome is 3.887 Mbp, much larger than that of C. tetani, its closest completely sequenced relative. The botulinum toxin gene is chromosomally located within a cluster comprising several toxin-associated proteins known as the hemagglutinins (HAs) and the nontoxic nonhemagglutinin protein (NTNH) as well as the positive regulatory gene botR [25]. The upstream boundary of the cluster contains a gene with an N-terminal consensus domain of bacterial flagellin. The downstream boundary is flanked by two genes that contain internal stop codons and a putative frameshift mutation. They show highest sequence similarity to the insertion sequence (IS) element IS1069 of Lactococcus lactis. Although this IS element-like boundary of the BoNT cluster is obviously a rudimentary fragment, it seems possible that the element was functional at some point in the evolutionary history of C. botulinum.

Currently, very little is known about additional virulence-associated factors of C. botulinum and other traits of this organism. Recently, a heme protein sensor for nitric oxide (SONO) was characterized which might be used by C. botulinum to recognize the presence of NO and mediate a phobic response as a protective mechanism [26]. A similar protein is present in C. tetani.

12.1.5 C. difficile

C. difficile is a major nosocomial pathogen. It causes diarrhea, mild colitis, and even life-threatening pseudomembranous colitis. The bacterium is the most frequent cause of outbreaks of diarrhea in hospitalized patients. The disease can occur when the normal intestinal flora is altered, e.g., after antibiotic treatment, allowing C. difficile to flourish in the intestinal tract and to produce toxins. The major toxins are toxins A and B. Toxin A (TcdA), an enterotoxin, causes tissue damage in the intestinal mucosa, resulting in inflammation, and induces hemorrhagic fluid secretion in the intestine. Toxin B (TcdB) is a potent cytotoxin. Both toxins are mono-glucosyltransferases which incorporate a glucose residue into the effector region of RhoA, a small GTP-binding protein of the Rho family [27, 28]. Members of the Rho family are involved in regulating actin assembly. After glucosylation, Rho loses its ability to induce the polymerization of actin filaments, resulting in the disaggregation of the microfilament cytoskeleton, thus provoking cell retraction and rounding. Some C. difficile strains produce an additional toxin, a binary toxin designated CDT [12, 29]. It is related to C. perfringens ι-toxin, and is composed of two separate components. The enzymatic component exhibits ADP-ribosyltransferase activity, which can covalently modify cellular actin. ADP-ribosylated actin is incapable of polymerizing, which results in complete destruction of the actin cytoskeleton.

The genes of toxins A and B are organized into a five-gene cluster (tcdA–E) on the chromosome, the so-called pathogenicity locus (PaLoc) with a size of 19.6 kbp [29]. TcdD (also known as TxeR) is an alternative RNA polymerase sigma factor required for toxin gene expression [30]. Over 20 different toxinotypes of C. difficile have been described, which exhibit high variations in the PaLoc sequence [31, 32]. Nontoxigenic strains of C. difficile have been isolated which lack the entire PaLoc.

Genome of C. difficile 630

Strain 630 is a multi-drug-resistant isolate from a patient with severe pseudomembranous colitis. The 4.290-Mbp chromosome is the largest of all clostridia sequenced so far. As with C. botulinum, the sequence has not been published yet (Sanger Institute).

The 19.6-kbp PaLoc is completely present; mobile genetic elements are absent from its boundaries. It has been reported that the PaLoc inserts at a single site of the C. difficile genome: a stretch of 115 bp found in nontoxinogenic strains is replaced by the 19.6-kbp locus in toxinogenic strains [29]. The CDT toxin gene is disrupted in strain 630. Other virulence-related traits will be uncovered once the genome sequence is available. Recently, surface layer proteins of C. difficile have been identified, as have other proteins that are attached to the cell wall, such as the adhesin Cwp66 [20]. The genome encodes at least 28 cell-wall-binding proteins [21].

12.1.6 Conclusions and Perspectives

The genomes of the major clostridial pathogens reveal which toxin genes are used to express a pathogenic phenotype, and how they are used. The existence of nontoxigenic strains of C. tetani, C. botulinum, and C. difficile, which have lost their major toxin genes, leads to the assumption that the pathogenic phenotype is rather labile. In C. tetani loss of the tetanus toxin is often linked to the loss of an entire plasmid. In C. botulinum the neurotoxin is encoded on plasmids, on bacteriophages, or on the chromosome, depending on the toxinotype [25]. Some nontoxigenic strains of C. botulinum have lost their extrachromosomal elements. The existence of silent BoNT gene clusters on the chromosome as well as the IS element-like boundaries of a chromosomally encoded neurotoxin suggest that the BoNT genes have been genetically mobilized at some point in their evolutionary history. In C. difficile the PaLoc cluster seems to be a hot spot for genomic rearrangement, although no mobile genetic elements are associated with it. Nontoxigenic isolates lack part or all of the PaLoc cluster [32].

The (soon to be) available genome sequences will reveal many more aspects of the lifestyle of clostridial pathogens. Current research focuses on mechanisms of the interaction with eukaryotic cells and with the environment, but also on questions about the anaerobic energy metabolism and novel reactions in the metabolism of amino acids and other substances. Another area of interest is the regulatory network governing pathogenesis. In C. perfringens the VirR/VirS two-component system has been identified, which regulates the expression of several different toxin genes [33]. In C. tetani, C. botulinum, and C. difficile, toxin regulators that act as alternative sigma factors have been identified as well [18]. A still open question relates to the nature of the (environmental) stimuli that trigger the regulatory cascades leading to a pathogenic lifestyle.

12.2 Genomics of Pathogenic Bacilli

12.2.1 Introduction

Most aerobic endospore-forming bacteria are currently assembled in the large genus Bacillus. Members of this genus are normally soil organisms with their primary habitat on decaying plant material. They can be isolated nearly ubiquitously because their highly resistant endospores are easily disseminated with dust. Table 12.2 summarizes the features of the Bacillus genomes mentioned in this review. Bacilli have little involvement in human or mammalian pathogenic processes. The only important exceptions to this are strains placed in the B. cereus group, which exhibit highly divergent pathogenic properties. B. anthracis, the causative agent of anthrax, is the most disastrous representative. Anthrax is a disease primarily affecting ungulate herbivores, occasionally carnivores, and less frequently humans.

Table 12.2 Summary of Bacillus spp. genomes discussed in the text.

Organism              Replicon    Size (bp)        Number of genes  % G+C  Prophages  rRNA operons  tRNA
B. anthracis Ames     Chromosome  5 227 293        5508             35.4   4          11            95
                      pXO1        181 677          217              32.5   0          0             0
                      pXO2        94 829           113              33.0   0          0             0
B. cereus ATCC 14579  Chromosome  5 426 909        5366             35.3   6          13            108
                      pBClin      15 100 (linear)  21               38.0   Yes        0             0
B. cereus ATCC 10987  Chromosome  5 224 283        5642             35.6   3          12            98
                      pBC10987    208 369          242              33.5   0          0             0
B. cereus G9241       Chromosome  5 286 464        n.d.             n.d.   n.d.       n.d.          n.d.
                      pBCX01      191 110          n.d.             n.d.   n.d.       n.d.          n.d.
                      pBC218      218 094          n.d.             n.d.   n.d.       n.d.          n.d.
B. licheniformis      Chromosome  4 222 748        4286             46.2   4          7             72
B. subtilis 168       Chromosome  4 214 810        4112             43.2   10         10            86

n.d., not determined.

12.2.2 Pathogenic Properties of Bacilli not Belonging to the B. cereus Group

Bacilli other than members of the B. cereus group are only rarely associated with food spoilage, such as ropy bread, and with incidents of foodborne gastroenteritis. Toxin-producing strains of B. licheniformis, a normally harmless contaminant in dairy products, have been related to a few food poisoning incidents. These strains have been isolated from raw milk and industrially produced baby food [34]. The toxigenic B. licheniformis strains are biochemically indistinguishable from the type strain DSM13T, but produce a toxin that is similar in many physicochemical properties to cereulide, the emetic toxin of B. cereus. Cereulide is a dodecadepsipeptide structurally resembling valinomycin. This peptide toxin is nonribosomally synthesized, and B. licheniformis as well as many other Bacillus spp. are known to produce several peptides such as the antibiotics bacitracin and amoebicin. B. licheniformis is used to produce industrial enzymes on a large scale, and has recognized-as-safe status with the United States Food and Drug Administration. The recently finished genome project for B. licheniformis DSM13T confirmed the absence of described virulence factors [35, 36], apart from the known ability to form a poly-γ-D-glutamate capsule, which has been described as an important virulence factor in B. anthracis [37]. Most notably, the cereulide biosynthetic genes are absent in this strain.

12.2.3 Pathogenicity of B. cereus

The B. cereus sensu lato group comprises five valid species: B. cereus, B. mycoides, B. thuringiensis, B. anthracis, and the newly described psychrotolerant species B. weihenstephanensis, which is composed of B. cereus isolates capable of growing below 7 °C [38–41]. There are several taxonomic problems with the species classification, as B. cereus has a very high genomic variability [42, 43]. An overview of the phylogenetic relationships among the strains mentioned in this review is given in Fig. 12.1. The chromosome size ranges from 2.4 Mbp up to 5.3 Mbp. It has been shown that, despite this considerable range in genome size, the smallest, 2.4-Mbp genome corresponds to a conserved region of the larger genomes [44]. In addition, the genomes of several B. cereus isolates contain large, presumably linear plasmids with sizes ranging from 290 kbp to 730 kbp [45]. On the basis of these observations it has been proposed that the genome of B. cereus consists of a constant part and a less stable one, which is more easily exchanged for other genetic elements.

Fig. 12.1 Phylogenetic tree of members of the B. cereus sensu lato group. The unrooted tree was derived from multiple-locus sequence typing and calculated using the neighbor-joining method. The scale bar represents the average number of nucleotide differences per site.
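
The scale bar of Fig. 12.1 is expressed in nucleotide differences per site, which is the kind of pairwise distance a neighbor-joining tree is built from. As a hedged illustration, the Python sketch below computes the uncorrected proportion of differing sites (the p-distance) between aligned sequences. Whether the published tree used uncorrected or model-corrected distances is not stated in the text, so the uncorrected measure is an assumption here, the function names are hypothetical, and the neighbor-joining clustering step itself is not shown.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap or an ambiguous base in either sequence are
    skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping strain name to aligned sequence."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Toy alignment with made-up sequences, for illustration only.
toy = {"strain1": "ACGTACGT", "strain2": "ACGTACGA", "strain3": "ACGAACGA"}
for pair, dist in distance_matrix(toy).items():
    print(pair, dist)
```

For multiple-locus sequence typing data, the concatenated allele sequences of each strain would take the place of the toy strings.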

B. cereus strains are a rare cause of food poisoning, which is particularly associated with the consumption of rice-based dishes. The organism produces two forms of human food poisoning, characterized by either diarrhea and abdominal distress or an emetic syndrome with nausea and vomiting, induced by an enterotoxin or the emetic toxin, respectively. Other toxins are produced during growth, including phospholipases, proteases, and hemolysins, one of which, cereolysin, is a thiol-activated hemolysin [46–49]. These toxins may also contribute to the opportunistic pathogenicity of B. cereus in nongastrointestinal diseases that may range from serious infections in immunosuppressed patients, neonates, and postsurgical patients, to septicemia, meningitis, endocarditis, and osteomyelitis. Surgical and traumatic wound infections are other reported severe diseases [50–52]. In contrast to B. anthracis, virulent strains of B. cereus are often β-hemolytic, and contain β-lactamases rendering them resistant to β-lactam antibiotics. As there is no clear genetic distinction between the members of the B. cereus group, it has long been debated whether the members of this group are varieties of the same species [53, 54] or separate species [38]. Insect-pathogenic members of the B. cereus group are currently classified as B. thuringiensis [39]. These bacteria produce toxins, encoded by plasmid-borne genes, in the form of parasporal crystal proteins that have been widely exploited in agriculture as insecticides. Occasionally, B. thuringiensis strains are responsible for human infections similar to those caused by strains of B. cereus [55, 56].

12.2.4 Pathogenicity of B. anthracis

12.2.4.1 Course of Anthrax

Anthrax is mainly an epizootic disease of wild and domesticated herbivores, but all mammals including humans are susceptible [37, 57]. The disease affects primarily livestock, but it can occasionally be transmitted to humans who come in contact with infected animals or animal products. For this reason it was once called “woolsorter’s disease,” as workers commonly contracted it by handling wool or hides from infected animals. Anthrax has been known since ancient times, and is speculated to have been the cause of two of the Egyptian plagues described in the Old Testament, the death of cattle and the appearance of boils. It also played an important role in the history of bacteriology: in 1876 Koch identified B. anthracis as the cause of anthrax in animals, and later formulated his famous postulates. Anthrax was also the subject of important work by Pasteur in 1881, when he used heat treatment to isolate an attenuated strain that could be used for vaccinating sheep and cattle against the disease [58].

Anthrax appears when endospores of B. anthracis penetrate the tissue of a host. In animals this usually happens by ingestion, whereas in humans anthrax comes in three major forms: cutaneous, gastrointestinal, and inhalational or pulmonary. Cutaneous or skin anthrax results from spores entering gaps in the skin barrier. Within a few days the characteristic skin lesions develop, with a black eschar of coal-like appearance that gave the disease its name (anthrax is the Greek word for coal). With antibiotic treatment the lesions usually heal within weeks, but there is the threat of systemic spread as a fatal complication. Gastrointestinal anthrax, the second form, is analogous to the main route of infection in animals, but occurs rarely in humans in developed countries. It results from eating tainted meat, and has the clinical picture of severe gastroenteritis with abdominal pain, fever, vomiting, bloody diarrhea, and shock. Mortality is high, and autopsy reveals hemorrhagic inflammation of the small intestine and bowel perforation.

The most feared variety is inhalational or pulmonary anthrax, which results in immediate septicemia when spores gain access to their hosts by inhalation. Early diagnosis is difficult, as the mild symptoms resemble those of flu, with fever, cough, and malaise. After a few days inhalational anthrax takes an abrupt turn with increasing respiratory distress, shock, and coma. Patients develop a characteristic chest X-ray with massive lymph nodes as well as pleural effusions resulting in bloody fluid around the lungs. Inhalational anthrax is virtually always fatal if untreated; intensive supportive care is only able to reduce mortality to slightly below 50% [59]. During the course of infection a point is reached where antibiotics can clear the bacteremia, but are unable to prevent death, most likely due to the circulating toxin. In animals the infection cycle closes once the host is dead and its carcass decays, when the bacteria come into contact with oxygen and sporulate. Anthrax spores are highly resistant and survive for many years in the soil, from where they can infect their next host.

12.2.4.2 Virulence Factors of B. anthracis

Regardless of the route of entry, spores are phagocytosed by macrophages, where they germinate [60]. Spore germination may be triggered within the macrophage by host-specific signals. The vegetative bacteria circumvent or escape the antimicrobial environment of the phagolysosome by an unknown mechanism, and are released into the host tissue, where they appear as large, encapsulated, nonmotile rods, often forming long chains of characteristic “bamboo-like” appearance. The B. anthracis capsule is currently considered one of its major virulence factors, as it inhibits phagocytosis, and its monotonous linear polymer of γ-D-glutamic acid is only weakly immunogenic [37]. In the further course of infection the bacteria replicate as an extracellular pathogen to high titers of up to 10^9 cells per milliliter of blood.

The toxins play a key role in the pathogenesis of anthrax, and are composed of three polypeptides: the protective antigen (PA), named for its ability to elicit a protective immune response, the lethal factor (LF), and the edema factor (EF). LF and EF were so named because intravenous injection of PA and LF results in the death of the animal [61], whereas intradermal injection of PA and EF produces edema in the skin [62]. Anthrax toxin belongs to the family of bacterial “AB” toxins. Generally, these toxins work by the combined action of two subunits: the cell-binding B subunit is responsible for binding and for translocation of the enzymatically active A subunit into the target cell. Anthrax toxin is unique in that it has one B subunit, the PA, which combines with two different enzymatically active subunits to form the lethal toxin (LeTx), a combination of PA and LF, and the edema toxin (EdTx), a combination of PA and EF. The structure and function of LeTx and EdTx have been reviewed in detail recently [12, 58].

In short, PA binds to its cellular receptors, which are located on the surface of many cell types. Two possible receptors were identified in recent work as the tumor endothelial marker-8 (TEM-8) [63] and the capillary morphogenesis protein 2 (CMG2) [64], which are both widely expressed. Once bound to a receptor, PA can be cleaved by furin or a furin-like membrane endoprotease [65], and the remaining 63-kDa fragment (PA63) oligomerizes instantly into a heptamer [66] and associates with EF or LF to yield the assembled toxic complex. The oligomerization triggers the receptor-mediated endocytosis of PA63. Following the drop in pH caused by endosomal acidification, the prepore converts to a pore, and EF and LF are translocated, with at least partial unfolding, to reach their cytosolic targets [67].

EF has been shown to be an adenylate cyclase that is highly homologous to Bordetella pertussis adenylate cyclase. The specific activity of EF is 1000-fold higher than that of the mammalian calmodulin-activated adenylate cyclase. In contrast, the enzymatic activity of LF was a longstanding mystery until it was realized that LF cleaves all known mitogen-activated protein kinase kinases (Mek1 to Mek7) with the exception of Mek5. As a consequence of the cleavage, the Meks are unable to dock with their substrate, mitogen-activated protein kinase (MAPK) [68]. How these two toxins bring about the observed symptoms is not understood. The edema and death caused by these toxins have been known for decades, but the mechanisms by which these physiological effects arise are still subject to debate.

12.2.5 Genome of B. anthracis

The laboratory strain B. anthracis Ames was derived from an isolate taken from a dead cow in Texas in 1981. Its sequenced genome comprises a 5.22-Mbp chromosome and two circular plasmids, pXO1 and pXO2, which are 181.6 kbp and 94.8 kbp in size, respectively [69]. All fully virulent B. anthracis strains carry these plasmids, because they code for the primary virulence factors: toxin production and capsule formation. A toxinogenic but noncapsulated strain cured of plasmid pXO2 by heat treatment, isolated in 1937 and designated the Sterne strain, was shown to be avirulent and a good live vaccine that is still used for veterinary purposes [70].

12.2.5.1 Chromosomal Genes

The chromosome of B. anthracis contains several homologues of genes that are known to contribute to the virulence of pathogenic B. cereus or B. thuringiensis isolates. These include a complex of three nonhemolytic enterotoxins as well as two channel-forming hemolysins. Furthermore, several genes were identified by homology to genes known to be involved in the virulence of the intracellular gram-positive pathogen Listeria monocytogenes [71]. These include two phospholipases C, one of them phosphatidylinositol-specific and the other phosphatidylcholine-preferring, two internalin-like genes, listeriolysin O, and the extracellular protease p60. These homologies are noteworthy because in the initial phase of the infection the endospores of B. anthracis germinate and survive inside a macrophage by an unknown mechanism. Another interesting gene that has been detected on the B. anthracis chromosome is a homologue of the metalloprotease enhancin. This is a protease first studied in baculoviruses that infect gypsy moths. It enhances viral infectivity by degrading the mucin component of the peritrophic membrane that lines the midgut of insects [72]. Other genes that are believed to be implicated in insect rather than mammalian pathogenesis are two paralogues of the immune inhibitor A metalloprotease. This protease was characterized in B. thuringiensis, where it cleaves bactericidal lectins and thereby boosts virulence in insects [73].

Compared to B. subtilis, B. anthracis possesses an expanded array of iron-acquisition genes that might be of importance for scavenging iron during growth in an animal host. These include 15 ABC transporters for iron siderophore chelates together with two gene clusters for siderophore biosynthesis. In particular, two genes involved in the synthesis of a siderophore similar to aerobactin have not been detected in other bacilli so far. The genome also indicates that B. anthracis is more adapted towards amino acid and peptide utilization than B. subtilis. For example, B. anthracis contains an expanded number of secreted proteases and peptidases. Furthermore, there are several genes involved in amino acid utilization, such as homogentisate dioxygenase, which is part of tyrosine degradation. Another clue to the importance of peptides and amino acids in the diet of B. anthracis is the fact that the organism contains six Lys/Rht amino acid efflux systems, whereas B. subtilis has only two. These systems prevent accumulation of single amino acids to bacteriostatic concentrations during growth on peptides [74]. It therefore seems that B. anthracis is well adapted to life in a protein-rich environment such as a decaying animal carcass.

12.2.5.2 Genes Located on Plasmids pXO1 and pXO2

The large plasmid pXO1 harbors the structural genes for the anthrax toxins, which have been designated cya (EF), lef (LF), and pagA (PA). Other genes of known function located on pXO1 are the two trans-acting regulatory genes atxA and pagR, a gene encoding a type I topoisomerase (topA), and an operon containing genes (gerXC, gerXA, gerXB) involved in spore germination. Clustering of germination genes with other virulence factors points to their involvement in pathogenesis. In fact, deletion of the gerX operon affects germination of B. anthracis endospores inside macrophages in vivo [60]. The most notable feature of pXO1 is that the described toxin genes, the germination genes, and the regulators are located in a large 44.8-kbp region whose ends are flanked by two inverted and nearly identical copies of the insertion element IS1627. For this reason, this region has been described as a pathogenicity island [75]. Also located in this pathogenicity island is a noncoding sequence that has similarity to the hemolysin II gene from B. cereus. Outside the island, immediately adjacent to its right boundary, is a gene cluster predicted to encode proteins with sequence similarity to enzymes that participate in the biosynthesis of serotype-specific capsular polysaccharides in streptococci. The significance of these genes in B. anthracis is not clear, as there is no evidence that the organism produces a polysaccharide capsule.

The smaller plasmid pXO2 carries the three genes (capB, capC, and capA) required for the synthesis of the antiphagocytic poly-γ-D-glutamic acid capsule [76]. A fourth gene (dep) is associated with depolymerization of the capsular polymer, and might control the size of the capsule [77]. However, most genes located on pXO1 and pXO2 encode hypothetical proteins without any database matches or assigned functions.


12.2.5.3 Regulation of Virulence Genes

In B. cereus strains, the regulator PlcR controls a regulon that includes most of the genes involved in virulence. Surprisingly, the PlcR orthologue is truncated in B. anthracis. Despite this, 56 putative PlcR-binding motifs have been predicted in silico on the chromosome of B. anthracis and 2 on pXO2. These binding sites are positioned to control genes including those for phospholipases, enterotoxins, and hemolysins. It has been shown that the mutation of plcR accounts for a dramatic reduction in the expression of lecithinase, protease, and hemolysins [78]. As low-level expression of some genes from the PlcR regulon has been reported in B. anthracis, it is nevertheless possible that some cross-regulation occurs through a second PlcR paralogue that has been identified on the B. anthracis chromosome. Therefore, some genes of the PlcR regulon may still contribute to virulence.

The AtxA regulator, which is encoded on the pathogenicity island located on pXO1, controls toxin expression and also, via the second regulator AcpA encoded on pXO2, capsule biosynthesis. Together these regulators coordinate the expression of B. anthracis virulence factors at elevated CO2 partial pressure and 37 °C. AtxA was shown to be incompatible with the PlcR regulator from B. cereus. This incompatibility might have selected for the observed mutation in the plcR gene [78]. It seems that the insertion of the pathogenicity island containing the toxin loci and their regulator in pXO1, together with expression of the capsule genes and modified expression of genes in the PlcR regulon, confers an advantage on B. anthracis that led to near-clonal spread of the organism [79]. This is only one of the intriguing stories that can be learned from the genome sequences, with more to come.
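The in silico prediction of regulator-binding motifs mentioned above can be illustrated with a minimal Python sketch. It assumes the palindromic PlcR box consensus TATGNAN4TNCATA reported elsewhere in the B. cereus group literature; this chapter does not state the consensus, so it should be treated as an assumption. The function names and the toy sequence are hypothetical, and real predictions would also weigh the position of each site relative to the downstream gene.

```python
import re

# Assumed consensus for the PlcR box (not given in this chapter):
# TATGNAN4TNCATA, written out with explicit N wildcards.
PLCR_BOX = "TATGNANNNNTNCATA"


def consensus_to_regex(consensus):
    """Translate a consensus string with N wildcards into a regex."""
    return "".join("[ACGT]" if base == "N" else base for base in consensus)


def find_motif_sites(sequence, consensus=PLCR_BOX):
    """Return 0-based start positions of all matches, including overlapping ones.

    The assumed consensus is its own reverse complement, so scanning one
    strand is sufficient for this particular motif.
    """
    # A lookahead lets finditer report overlapping occurrences.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Toy sequence invented for illustration: one box placed after four bases.
toy_sequence = "GGCC" + "TATGCATTTTTGCATA" + "AACC"
sites = find_motif_sites(toy_sequence)
print(sites)
```

Counting the sites returned for each replicon would give figures comparable in kind to the motif counts quoted in the text, although the published numbers depend on the exact consensus and filtering used by the original authors.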

12.2.5.4 Molecular Diversity in B. anthracis Genomes

In contrast to the genomic diversity described above for the B. cereus group, B. anthracis appears to be genetically clonal [43, 80]. Genes from different isolates typically have greater than 99% nucleotide sequence identity [81]. Worldwide, only seven lineages of B. anthracis are currently known. Differentiation of these lineages is possible only with highly discriminating markers, such as variable-number tandem repeats (VNTRs). VNTR analysis determines the number of copies of short tandemly repeated nucleotide sequences at several loci in the B. anthracis genome. Using this technique it has been observed that the highest genetic diversity, and therefore the presumed origin of B. anthracis, is found in Southern Africa, where the wide expanses of savannah and large populations of wild ungulate herbivores are ideally suited for anthrax endemicity [82].

These somewhat academic reflections found an application when the anthrax mail attacks occurred in the United States in 2001. In order to identify the origin of these letter-based attacks and to create a database for strain identification in future cases, including cases of biological warfare, an approach characterized as “microbial forensics” was started. The aim was to develop the technology and statistical methods for the detection of new genomic markers, such as single nucleotide polymorphisms, in whole-genome sequences [83]. With the accuracy achieved during this effort it was possible to conclude that the strain used in the 2001 Florida attacks differs in only 11 single nucleotide polymorphisms from the Ames strain [83]. Because of its potential for abuse in terrorism or biological warfare, the exact typing of B. anthracis isolates is highly relevant to national security. For this reason, at least eight more B. anthracis strains selected to represent each known genetic lineage of this species will be sequenced to the same degree of accuracy.
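The principle behind VNTR typing can be sketched in a few lines of Python: for each locus, count the copies of the repeat unit, then compare the resulting copy-number profiles between isolates. This is only an illustration under simplifying assumptions. The locus names, repeat units, and sequences below are invented, and laboratory VNTR typing infers copy numbers from amplicon sizes rather than from exact string matching.

```python
import re


def count_tandem_repeats(locus_sequence, repeat_unit):
    """Return the largest number of consecutive copies of repeat_unit."""
    pattern = re.compile("(?:" + re.escape(repeat_unit) + ")+")
    runs = [
        len(match.group(0)) // len(repeat_unit)
        for match in pattern.finditer(locus_sequence)
    ]
    return max(runs, default=0)


def vntr_profile(loci, repeat_units):
    """Map each locus name to its repeat copy number."""
    return {
        name: count_tandem_repeats(sequence, repeat_units[name])
        for name, sequence in loci.items()
    }


def profile_distance(profile_a, profile_b):
    """Count the loci at which two copy-number profiles differ."""
    return sum(1 for locus in profile_a if profile_a[locus] != profile_b[locus])


# Hypothetical loci and repeat units, invented for illustration only.
repeat_units = {"locus1": "CAT", "locus2": "GATTC"}
isolate_a = {"locus1": "GG" + "CAT" * 4 + "TT", "locus2": "A" + "GATTC" * 2 + "C"}
isolate_b = {"locus1": "GG" + "CAT" * 6 + "TT", "locus2": "A" + "GATTC" * 2 + "C"}

profile_a = vntr_profile(isolate_a, repeat_units)
profile_b = vntr_profile(isolate_b, repeat_units)
print(profile_a, profile_b, profile_distance(profile_a, profile_b))
```

Because the isolates are otherwise nearly identical in sequence, such copy-number profiles at several loci provide the resolution that ordinary gene comparisons lack.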

12.2.5.5 Genome of a Highly Virulent B. cereus Strain Resembling B. anthracis in Pathogenesis

As stated above, the majority of B. cereus strains are not particularly virulent. However, there are some reports of life-threatening pneumonia whose history, clinical features, and laboratory findings resemble those of inhalational anthrax. The complete genome sequence of one such strain, designated G9241, has been determined [84]. It was isolated from the sputum and blood of a patient who was neither immunocompromised nor had any other evident predisposition. Although phenotypic and 16S rRNA analyses clearly identified this isolate as B. cereus, the genome sequence revealed the presence of a 191 110-bp circular plasmid that has been designated pBCXO1. It has 99.6% amino acid identity and high synteny to the B. anthracis toxin-encoding plasmid pXO1, but lacks the pathogenicity island containing the anthrax toxin genes. Homologues of the pXO2-encoded capsule genes were not detected in G9241, but a cluster for synthesis of a polysaccharide capsule is encoded on a previously unidentified circular plasmid termed pBC218, which encodes 188 putative open reading frames. Among them are genes with similarity to the toxin regulator AtxA from B. anthracis pXO1 (78% amino acid identity), the protective antigen (60% amino acid identity), and the lethal factor (36% amino acid identity). The edema factor and the B. anthracis pXO2-encoded poly-γ-D-glutamic acid capsule biosynthetic genes are absent in G9241, but owing to the genes located on pBC218 the isolate did produce a capsule, which was expressed during growth on sheep blood agar plates or in horse serum. The formation of this capsule did not require CO2 induction as in B. anthracis; however, G9241 forms a thicker capsule during growth with elevated CO2. The genome sequence of G9241 demonstrates strikingly that modern whole-genome sequencing provides a powerful approach to the public health challenges associated with emerging diseases.

12.2.6 Comparison of B. cereus Group Genomes: How Did Pathogenicity Evolve?

Genome sequencing efforts have additionally made available the genome sequences of two B. cereus strains that have not been described as being particularly pathogenic. Together these sequences offer a unique glimpse into the genomic organization of the B. cereus sensu lato group, its ecology and evolution, and the determinants that provoke different levels of virulence, especially when compared to the genome of B. anthracis. One of the studied genomes is from the type strain ATCC 14579, the other from the dairy isolate B. cereus ATCC 10987, which is genetically close to B. anthracis and has several unique metabolic capabilities such as urease and xylose utilization.

The type strain B. cereus ATCC 14579 has a genome size of 5.42 Mbp. Part of the genome is a linear plasmid of 15.1 kbp, designated pBClin15. The genome of the dairy isolate B. cereus ATCC 10987, which was isolated from spoiled cheese in Canada in 1930, is very similar, with a genome size of 5.2 Mbp [85] and a 208-kbp plasmid, pBc10987, that codes for 242 genes. Comparison of pBc10987 with pXO1 from B. anthracis revealed that around 65% of the proteins were homologous and approximately 50% were in a syntenic location, showing a clear relationship between the two plasmids. An important difference from pXO1 is that the pathogenicity island containing the genes for the regulator AtxA, the protective antigen, and the edema factor is absent from pBc10987. This region has been replaced on pBc10987 by genes for a copper-requiring tyrosinase, amino acid transport systems, an arsenate resistance cluster, and regulatory proteins. In addition, pBc10987 includes an MIP family channel protein and a possible metalloprotease, which are two new potential virulence factors.

There is a large core set that includes 75–80% of the genes, sharing 80–100% amino acid identity with orthologous genes in B. anthracis. What is remarkable for reputedly nonpathogenic strains is that this core set includes numerous factors for invasion, establishment, and propagation of bacteria within the host. B. cereus ATCC 14579 includes all but two of the toxins ever identified in clinical B. cereus isolates. B. cereus ATCC 10987 was already known to contain phosphatidylinositol-specific and phosphatidylcholine-preferring phospholipases C, sphingomyelinase, nonhemolytic enterotoxin, and proteases [86, 87]. It is interesting that this common set also comprises genes that can be attributed to insect pathogenesis. Examples are the three homologues of the immune inhibitor A protein (InhA), which selectively cleaves insect antibacterial peptides [73], and a homologue of the metalloprotease enhancin, which confers the ability to cleave intestinal mucin [72].

Common to all genomes seems to be the presence of the PlcR regulon, which includes a number of virulence factors in B. cereus and B. anthracis, despite the fact that PlcR is truncated in B. anthracis. There are 52 putative PlcR-binding sites that have been predicted in silico in the B. anthracis genome, 56 in B. cereus ATCC 14579, and 57 in B. cereus ATCC 10987.

Approximately 15% of the open reading frames found in B. cereus have no similarity to genes present in B. anthracis. For instance, chromosomal clusters of up to 20 kbp that code for capsular polysaccharide biosynthesis, including genes for glycosyltransferases, translocases, and a polysaccharide polymerization machinery, are specific for each B. cereus sequence and are absent from B. anthracis. The presence of genes coding for potential pathogenicity factors in the core genome of B. cereus, B. anthracis, and B. thuringiensis is consistent with the view that the ancestor of the B. cereus group was an opportunistic insect pathogen rather than a benign soil bacterium [88].
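The amino acid identity values quoted in comparisons like these are computed from pairwise alignments of orthologous proteins. The following Python sketch shows one common way to do this for two sequences that have already been aligned. It is an illustration only: the convention of ignoring gapped columns is an assumption (other definitions divide by the full alignment length or by the shorter sequence), and the peptide strings are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Invented toy alignment: six ungapped columns, five of them identical.
identity = percent_identity("MKT-LLV", "MKSALLV")
print(round(identity, 1))
```

Applying such a function to every orthologous pair and counting the pairs that fall within a chosen identity range is one simple way to estimate the size of a shared core gene set.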

Acknowledgement

The authors wish to thank their colleagues in the lab who contributed to the experimental work connected with this review. The work was supported by grants from the Niedersächsisches Ministerium für Wissenschaft und Kultur to the Göttingen Genomics Laboratory and from the Bundesministerium für Bildung und Forschung to the BiotechGenoMik network Göttingen.

References

1 Stackebrandt, E. and F. A. Rainey. 1997. Phylogenetic relationships. In: The clostridia: molecular biology and pathogenesis. J. I. Rood, B. A. McClane, J. G. Songer, and R. W. Titball, editors. Academic Press, San Diego.
2 Finn, C. W., Jr., R. P. Silver, W. H. Habig, M. C. Hardegree, G. Zon and C. F. Garon. 1984. The structural gene for tetanus neurotoxin is on a plasmid. Science 224:881–884.
3 Nolling, J., G. Breton, M. V. Omelchenko, K. S. Makarova, Q. Zeng, R. Gibson, H. M. Lee, J. Dubois, D. Qiu, J. Hitti, Y. I. Wolf, R. L. Tatusov, F. Sabathe, L. Doucette-Stamm, P. Soucaille, M. J. Daly, G. N. Bennett, E. V. Koonin and D. R. Smith. 2001. Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J. Bacteriol. 183:4823–4838.
4 Shimizu, T., K. Ohtani, H. Hirakawa, K. Ohshima, A. Yamashita, T. Shiba, N. Ogasawara, M. Hattori, S. Kuhara and H. Hayashi. 2002. Complete genome sequence of Clostridium perfringens, an anaerobic flesh-eater. Proc. Natl. Acad. Sci. U. S. A. 99:996–1001.
5 Bruggemann, H., S. Baumer, W. F. Fricke, A. Wiezer, H. Liesegang, I. Decker, C. Herzberg, R. Martinez-Arias, R. Merkl, A. Henne and G. Gottschalk. 2003. The genome sequence of Clostridium tetani, the causative agent of tetanus disease. Proc. Natl. Acad. Sci. U. S. A. 100:1316–1321.
6 Rood, J. I. 1998. Virulence genes of Clostridium perfringens. Annu. Rev. Microbiol. 52:333–360.

7 Cole, S. T. and B. Canard. 1997. Structure organization and evolution of the genome of C. perfringens. In: The clostridia: molecular biology and pathogenesis. J. I. Rood, B. A. McClane, J. G. Songer, and R. W. Titball, editors. Academic Press, San Diego, 49–64.
8 Petit, L., M. Gilbert and M. R. Popoff. 1999. Clostridium perfringens: toxinotype and genotype. Trends Microbiol. 7:104–110.
9 Eaton, J. T., C. E. Naylor, A. M. Howells, D. S. Moss, R. W. Titball and A. K. Basak. 2002. Crystal structure of the C. perfringens alpha-toxin with the active site closed by a flexible loop region. J. Mol. Biol. 319:275–281.
10 Rossjohn, J., S. C. Feil, W. J. McKinstry, R. K. Tweten and M. W. Parker. 1997. Structure of a cholesterol-binding, thiol-activated cytolysin and a model of its membrane form. Cell 89:685–692.
11 Smedley, J. G., 3rd, D. J. Fisher, S. Sayeed, G. Chakrabarti and B. A. McClane. 2004. The enteric toxins of Clostridium perfringens. Rev. Physiol. Biochem. Pharmacol. 152:183–204.
12 Barth, H., K. Aktories, M. R. Popoff and B. G. Stiles. 2004. Binary bacterial toxins: biochemistry, biology, and applications of common Clostridium and Bacillus proteins. Microbiol. Mol. Biol. Rev. 68:373–402.
13 Arnon, S. S. 1997. Human tetanus and human botulism. In: The clostridia: molecular biology and pathogenesis. J. I. Rood, B. A. McClane, J. G. Songer, and R. W. Titball, editors. Academic Press, San Diego, 95–115.


14 Schiavo, G., F. Benfenati, B. Poulain, O. Rossetto, P. Polverino de Laureto, B. R. DasGupta and C. Montecucco. 1992. Tetanus and botulinum-B neurotoxins block neurotransmitter release by proteolytic cleavage of synaptobrevin. Nature 359:832–835.
15 World Health Organization. 2000. Maternal and neonatal tetanus elimination by 2005. Strategies for achieving and maintaining elimination. Annual Report (http://www.who.int/vaccines/en/neotetanus.shtml).
16 Mitsui, N., K. Mitsui and J. Hase. 1980. Purification and some properties of tetanolysin. Microbiol. Immunol. 24:575–584.
17 Hara, T., M. Matsuda and M. Yoneda. 1977. Isolation and some properties of nontoxigenic derivatives of a strain of Clostridium tetani. Biken. J. 20:105–115.
18 Raffestin, S., B. Dupuy, J. C. Marvaud and M. R. Popoff. 2005. BotR/A and TetR are alternative RNA polymerase sigma factors controlling the expression of the neurotoxin and associated protein genes in Clostridium botulinum type A and Clostridium tetani. Mol. Microbiol. 55:235–249.
19 Waligora, A. J., C. Hennequin, P. Mullany, P. Bourlioux, A. Collignon and T. Karjalainen. 2001. Characterization of a cell surface protein of Clostridium difficile with adhesive properties. Infect. Immun. 69:2144–2153.
20 Calabi, E. and N. Fairweather. 2002. Patterns of sequence conservation in the S-Layer proteins and related sequences in Clostridium difficile. J. Bacteriol. 184:3886–3897.
21 Bruggemann, H., R. Bauer, S. Raffestin and G. Gottschalk. 2004. Characterization of a heme oxygenase of Clostridium tetani and its possible role in oxygen tolerance. Arch. Microbiol. 182:259–263.
22 Schiavo, G. and C. Montecucco. 1997. The structure and mode of action of botulinum and tetanus toxin. In: The clostridia: molecular biology and pathogenesis. J. I. Rood, B. A. McClane, J. G. Songer, and R. W. Titball, editors. Academic Press, San Diego, 295–322.
23 Henderson, I., T. Davis, M. Elmor and N. P. Minton. 1997. The genetic basis of toxin production in Clostridium botulinum and Clostridium tetani. In: The clostridia: molecular biology and pathogenesis. J. I. Rood, B. A. McClane, J. G. Songer, and R. W. Titball, editors. Academic Press, San Diego, 261–294.
24 Collins, M. D. and A. K. East. 1998. Phylogeny and taxonomy of the food-borne pathogen Clostridium botulinum and its neurotoxins. J. Appl. Microbiol. 84:5–17.
25 Dineen, S. S., M. Bradshaw and E. A. Johnson. 2003. Neurotoxin gene clusters in Clostridium botulinum type A strains: sequence comparison and evolutionary implications. Curr. Microbiol. 46:345–352.
26 Nioche, P., V. Berka, J. Vipond, N. Minton, A. L. Tsai and C. S. Raman. 2004. Femtomolar sensitivity of a NO sensor from Clostridium botulinum. Science 306:1550–1553.
27 Just, I. and R. Gerhard. 2004. Large clostridial cytotoxins. Rev. Physiol. Biochem. Pharmacol. 152:23–47.
28 Schirmer, J. and K. Aktories. 2004. Large clostridial cytotoxins: cellular biology of Rho/Ras-glucosylating toxins. Biochim. Biophys. Acta 1673:66–74.
29 Braun, V., T. Hundsberger, P. Leukel, M. Sauerborn and C. von Eichel-Streiber. 1996. Definition of the single integration site of the pathogenicity locus in Clostridium difficile. Gene 181:29–38.
30 Mani, N. and B. Dupuy. 2001. Regulation of toxin synthesis in Clostridium difficile by an alternative RNA polymerase sigma factor. Proc. Natl. Acad. Sci. U. S. A. 98:5844–5849.
31 Rupnik, M., J. S. Brazier, B. I. Duerden, M. Grabnar and S. L. Stubbs. 2001. Comparison of toxinotyping and PCR ribotyping of Clostridium difficile strains and description of novel toxinotypes. Microbiology 147:439–447.
32 Geric, B., M. Rupnik, D. N. Gerding, M. Grabnar and S. Johnson. 2004. Distribution of Clostridium difficile variant toxinotypes and strains with binary toxin genes among clinical isolates in an American hospital. J. Med. Microbiol. 53:887–894.
33 Ba-Thein, W., M. Lyristis, K. Ohtani, I. T. Nisbet, H. Hayashi, J. I. Rood and T. Shimizu. 1996. The virR/virS locus regulates the transcription of genes encoding extracellular toxin production in Clostridium perfringens. J. Bacteriol. 178:2514–2520.
34 Salkinoja-Salonen, M. S., R. Vuorio, M. A. Andersson, P. Kampfer, M. C. Andersson, T. Honkanen-Buzalski and A. C. Scoging. 1999. Toxigenic strains of Bacillus licheniformis related to food poisoning. Appl. Environ. Microbiol. 65:4637–4645.
35 Veith, B., C. Herzberg, S. Steckel, J. Feesche, K. H. Maurer, P. Ehrenreich, S. Baumer, A. Henne, H. Liesegang, R. Merkl, A. Ehrenreich and G. Gottschalk. 2004. The complete genome sequence of Bacillus licheniformis DSM13, an organism with great industrial potential. J. Mol. Microbiol. Biotechnol. 7:204–211.
36 Rey, M. W., P. Ramaiya, B. A. Nelson, S. D. Brody-Karpin, E. J. Zaretsky, M. Tang, A. Lopez de Leon, H. Xiang, V. Gusti, I. G. Clausen, P. B. Olsen, M. D. Rasmussen, J. T. Andersen, P. L. Jorgensen, T. S. Larsen, A. Sorokin, A. Bolotin, A. Lapidus, N. Galleron, S. D. Ehrlich and R. M. Berka. 2004. Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species. Genome Biol. 5:R77.
37 Mock, M. and A. Fouet. 2001. Anthrax. Annu. Rev. Microbiol. 55:647–671.
38 Lechner, S., R. Mayr, K. P. Francis, B. M. Pruss, T. Kaplan, E. Wiessner-Gunkel, G. S. Stewart and S. Scherer. 1998. Bacillus weihenstephanensis sp. nov. is a new psychrotolerant species of the Bacillus cereus group. Int. J. Syst. Bacteriol. 48(Pt 4):1373–1382.
39 Priest, F. G., M. Barker, L. W. Baillie, E. C. Holmes and M. C. Maiden. 2004. Population structure and evolution of the Bacillus cereus group. J. Bacteriol. 186:7959–7970.
40 Stenfors, L. P. and P. E. Granum. 2001. Psychrotolerant species from the Bacillus cereus group are not necessarily Bacillus weihenstephanensis. FEMS Microbiol. Lett. 197:223–228.
41 Stenfors, L. P., R. Mayr, S. Scherer and P. E. Granum. 2002. Pathogenic potential of fifty Bacillus weihenstephanensis strains. FEMS Microbiol. Lett. 215:47–51.
42 Daffonchio, D., S. Borin, G. Frova, P. L. Manachini and C. Sorlini. 1998. PCR fingerprinting of whole genomes: the spacers between the 16S and 23S rRNA genes and of intergenic tRNA gene regions reveal a different intraspecific genomic variability of Bacillus cereus and Bacillus licheniformis. Int. J. Syst. Bacteriol. 48(Pt 1):107–116.
43 Radnedge, L., P. G. Agron, K. K. Hill, P. J. Jackson, L. O. Ticknor, P. Keim and G. L. Andersen. 2003. Genome differences that distinguish Bacillus anthracis from Bacillus cereus and Bacillus thuringiensis. Appl. Environ. Microbiol. 69:2755–2764.
44 Carlson, C. R. and A. B. Kolsto. 1994. A small (2.4 Mb) Bacillus cereus chromosome corresponds to a conserved region of a larger (5.3 Mb) Bacillus cereus chromosome. Mol. Microbiol. 13:161–169.
45 Beverley, S. M. 1988. Characterization of the ‘unusual’ mobility of large circular DNAs in pulsed field-gradient electrophoresis. Nucleic Acids Res. 16:925–939.
46 Drobniewski, F. A. 1993. Bacillus cereus and related species. Clin. Microbiol. Rev. 6:324–338.
47 Lund, T. and P. E. Granum. 1996. Characterisation of a non-haemolytic enterotoxin complex from Bacillus cereus isolated after a foodborne outbreak. FEMS Microbiol. Lett. 141:151–156.
48 Lund, T. and P. E. Granum. 1999. The 105-kDa protein component of Bacillus cereus non-haemolytic enterotoxin (Nhe) is a metalloprotease with gelatinolytic and collagenolytic activity. FEMS Microbiol. Lett. 178:355–361.
49 Lund, T., M. L. De Buyser and P. E. Granum. 2000. A new cytotoxin from Bacillus cereus that may cause necrotic enteritis. Mol. Microbiol. 38:254–261.
50 Turnbull, P. C., T. A. French and E. G. Dowsett. 1977. Severe systemic and pyogenic infections with Bacillus cereus. Br. Med. J. 1:1628–1629.
51 Turnbull, P. C., K. Jorgensen, J. M. Kramer, R. J. Gilbert and J. M. Parry. 1979. Severe clinical conditions associated with Bacillus cereus and the apparent involvement of exotoxins. J. Clin. Pathol. 32:289–293.
52 Turnbull, P. C. 1981. Bacillus cereus toxins. Pharmacol. Ther. 13:453–505.
53 Helgason, E., O. A. Okstad, D. A. Caugant, H. A. Johansen, A. Fouet, M. Mock, I. Hegna and A. B. Kolsto. 2000. Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis – one species on the basis of genetic evidence. Appl. Environ. Microbiol. 66:2627–2630.
54 Helgason, E., D. A. Caugant, I. Olsen and A. B. Kolsto. 2000. Genetic structure of population of Bacillus cereus and B. thuringiensis isolates associated with periodontitis and other human infections. J. Clin. Microbiol. 38:1615–1622.
55 Damgaard, P. H., P. E. Granum, J. Bresciani, M. V. Torregrossa, J. Eilenberg and L. Valentino. 1997. Characterization of Bacillus thuringiensis isolated from infections in burn wounds. FEMS Immunol. Med. Microbiol. 18:47–53.
56 Jackson, S. G., R. B. Goodbrand, R. Ahmed and S. Kasatiya. 1995. Bacillus cereus and Bacillus thuringiensis isolated in a gastroenteritis outbreak investigation. Lett. Appl. Microbiol. 21:103–105.
57 Turnbull, P. C. 2002. Introduction: anthrax history, disease and ecology. Curr. Top. Microbiol. Immunol. 271:1–19.
58 Mourez, M. 2004. Anthrax toxins. Rev. Physiol. Biochem. Pharmacol. 152:135–164.
59 Jernigan, J. A., D. S. Stephens, D. A. Ashford, C. Omenaca, M. S. Topiel, M. Galbraith, M. Tapper, T. L. Fisk, S. Zaki, T. Popovic, R. F. Meyer, C. P. Quinn, S. A. Harper, S. K. Fridkin, J. J. Sejvar, C. W. Shepard, M. McConnell, J. Guarner, W. J. Shieh, J. M. Malecki, J. L. Gerberding, J. M. Hughes and B. A. Perkins. 2001. Bioterrorism-related inhalational anthrax: the first 10 cases reported in the United States. Emerg. Infect. Dis. 7:933–944.
60 Guidi-Rontani, C., M. Levy, H. Ohayon and M. Mock. 2001. Fate of germinated Bacillus anthracis spores in primary murine macrophages. Mol. Microbiol. 42:931–938.

61 Beall, F. A., M. J. Taylor and C. B. Thorne. 1962. Rapid lethal effect in rats of a third component found upon fractionating the toxin of Bacillus anthracis. J. Bacteriol. 83:1274–1280.
62 Smith, H., J. Keppie and J. L. Stanley. 1955. The chemical basis of the virulence of Bacillus anthracis. V. The specific toxin produced by B. anthracis in vivo. Br. J. Exp. Pathol. 36:460–472.
63 Bradley, K. A., J. Mogridge, M. Mourez, R. J. Collier and J. A. Young. 2001. Identification of the cellular receptor for anthrax toxin. Nature 414:225–229.
64 Scobie, H. M., G. J. Rainey, K. A. Bradley and J. A. Young. 2003. Human capillary morphogenesis protein 2 functions as an anthrax toxin receptor. Proc. Natl. Acad. Sci. U. S. A. 100:5170–5174.
65 Gordon, V. M., K. R. Klimpel, N. Arora, M. A. Henderson and S. H. Leppla. 1995. Proteolytic activation of bacterial toxins by eukaryotic cells is performed by furin and by additional cellular proteases. Infect. Immun. 63:82–87.
66 Milne, J. C., D. Furlong, P. C. Hanna, J. S. Wall and R. J. Collier. 1994. Anthrax protective antigen forms oligomers during intoxication of mammalian cells. J. Biol. Chem. 269:20607–20612.
67 Wesche, J., J. L. Elliott, P. O. Falnes, S. Olsnes and R. J. Collier. 1998. Characterization of membrane translocation by anthrax protective antigen. Biochemistry 37:15737–15746.
68 Chopra, A. P., S. A. Boone, X. Liang and N. S. Duesbery. 2003. Anthrax lethal factor proteolysis and inactivation of MAPK kinase. J. Biol. Chem. 278:9402–9406.
69 Read, T. D., S. N. Peterson, N. Tourasse, L. W. Baillie, I. T. Paulsen, K. E. Nelson, H. Tettelin, D. E. Fouts, J. A. Eisen, S. R. Gill, E. K. Holtzapple, O. A. Okstad, E. Helgason, J. Rilstone, M. Wu, J. F. Kolonay, M. J. Beanan, R. J. Dodson, L. M. Brinkac, M. Gwinn, R. T. DeBoy, R. Madpu, S. C. Daugherty, A. S. Durkin, D. H. Haft, W. C. Nelson, J. D. Peterson, M. Pop, H. M. Khouri, D. Radune, J. L. Benton, Y. Mahamoud, L. Jiang, I. R. Hance, J. F. Weidman, K. J. Berry, R. D. Plaut, A. M. Wolf, K. L. Watkins, W. C. Nierman, A. Hazen, R. Cline, C. Redmond, J. E. Thwaite, O. White, S. L. Salzberg, B. Thomason, A. M. Friedlander, T. M. Koehler, P. C. Hanna, A. B. Kolsto and C. M. Fraser. 2003. The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423:81–86.
70 Turnbull, P. C. 2000. Current status of immunization against anthrax: old vaccines may be here to stay for a while. Curr. Opin. Infect. Dis. 13:113–120.
71 Cossart, P. 2002. Molecular and cellular basis of the infection by Listeria monocytogenes: an overview. Int. J. Med. Microbiol. 291:401–409.
72 Wang, P. and R. R. Granados. 1997. An intestinal mucin is the target substrate for a baculovirus enhancin. Proc. Natl. Acad. Sci. U. S. A. 94:6977–6982.
73 Fedhila, S., P. Nel and D. Lereclus. 2002. The InhA2 metalloprotease of Bacillus thuringiensis strain 407 is required for pathogenicity in insects infected via the oral route. J. Bacteriol. 184:3296–3304.
74 Bellmann, A., M. Vrljic, M. Patek, H. Sahm, R. Kramer and L. Eggeling. 2001. Expression control and specificity of the basic amino acid exporter LysE of Corynebacterium glutamicum. Microbiology 147:1765–1774.
75 Okinaka, R. T., K. Cloud, O. Hampton, A. R. Hoffmaster, K. K. Hill, P. Keim, T. M. Koehler, G. Lamke, S. Kumano, J. Mahillon, D. Manter, Y. Martinez, D. Ricke, R. Svensson and P. J. Jackson. 1999. Sequence and organization of pXO1, the large Bacillus anthracis plasmid harboring the anthrax toxin genes. J. Bacteriol. 181:6509–6515.
76 Okinaka, R., K. Cloud, O. Hampton, A. Hoffmaster, K. Hill, P. Keim, T. Koehler, G. Lamke, S. Kumano, D. Manter, Y. Martinez, D. Ricke, R. Svensson and P. Jackson. 1999. Sequence, assembly and analysis of pX01 and pX02. J. Appl. Microbiol. 87:261–262.
77 Uchida, I., S. Makino, C. Sasakawa, M. Yoshikawa, C. Sugimoto and N. Terakado. 1993. Identification of a novel gene, dep, associated with depolymerization of the capsular polymer in Bacillus anthracis. Mol. Microbiol. 9:487–496.
78 Mignot, T., M. Mock, D. Robichon, A. Landier, D. Lereclus and A. Fouet. 2001. The incompatibility between the PlcR- and AtxA-controlled regulons may have selected a nonsense mutation in Bacillus anthracis. Mol. Microbiol. 42:1189–1198.
79 Keim, P., L. B. Price, A. M. Klevytska, K. L. Smith, J. M. Schupp, R. Okinaka, P. J. Jackson and M. E. Hugh-Jones. 2000. Multiple-locus variable-number tandem repeat analysis reveals genetic relationships within Bacillus anthracis. J. Bacteriol. 182:2928–2936.
80 Keim, P., A. M. Klevytska, L. B. Price, J. M. Schupp, G. Zinser, K. L. Smith, M. E. Hugh-Jones, R. Okinaka, K. K. Hill and P. J. Jackson. 1999. Molecular diversity in Bacillus anthracis. J. Appl. Microbiol. 87:215–217.
81 Price, L. B., M. Hugh-Jones, P. J. Jackson and P. Keim. 1999. Genetic diversity in the protective antigen gene of Bacillus anthracis. J. Bacteriol. 181:2358–2362.
82 Smith, K. L., V. DeVos, H. Bryden, L. B. Price, M. E. Hugh-Jones and P. Keim. 2000. Bacillus anthracis diversity in Kruger National Park. J. Clin. Microbiol. 38:3780–3784.
83 Read, T. D., S. L. Salzberg, M. Pop, M. Shumway, L. Umayam, L. Jiang, E. Holtzapple, J. D. Busch, K. L. Smith, J. M. Schupp, D. Solomon, P. Keim and C. M. Fraser. 2002. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. Science 296:2028–2033.
84 Hoffmaster, A. R., J. Ravel, D. A. Rasko, G. D. Chapman, M. D. Chute, C. K. Marston, B. K. De, C. T. Sacchi, C. Fitzgerald, L. W. Mayer, M. C. Maiden, F. G. Priest, M. Barker, L. Jiang, R. Z. Cer, J. Rilstone, S. N. Peterson, R. S. Weyant, D. R. Galloway, T. D. Read, T. Popovic and C. M. Fraser. 2004. Identification of anthrax toxin genes in a Bacillus cereus associated with an illness resembling inhalation anthrax. Proc. Natl. Acad. Sci. U. S. A. 101:8449–8454.
85 Rasko, D. A., J. Ravel, O. A. Okstad, E. Helgason, R. Z. Cer, L. Jiang, K. A. Shores, D. E. Fouts, N. J. Tourasse, S. V. Angiuoli, J. Kolonay, W. C. Nelson, A. B. Kolsto, C. M. Fraser and T. D. Read. 2004. The genome sequence of Bacillus cereus ATCC 10987 reveals metabolic adaptations and a large plasmid related to Bacillus anthracis pXO1. Nucleic Acids Res. 32:977–988.
86 Okstad, O. A., I. Hegna, T. Lindback, A. L. Rishovd and A. B. Kolsto. 1999. Genome organization is not conserved between Bacillus cereus and Bacillus subtilis. Microbiology 145(Pt 3):621–631.
87 Lindback, T., O. A. Okstad, A. L. Rishovd and A. B. Kolsto. 1999. Insertional inactivation of hblC encoding the L2 component of Bacillus cereus ATCC 14579 haemolysin BL strongly reduces enterotoxigenic activity, but not the haemolytic activity against human erythrocytes. Microbiology 145(Pt 11):3139–3146.
88 Salamitou, S., F. Ramisse, M. Brehelin, D. Bourguet, N. Gilois, M. Gominet, E. Hernandez and D. Lereclus. 2000. The plcR regulon is involved in the opportunistic properties of Bacillus thuringiensis and Bacillus cereus in mice and insects. Microbiology 146(Pt 11):2825–2832.


13 The Genomes of Pathogenic Bartonella Species

Carolin Frank, Eva Berglund, and Siv G. E. Andersson

13.1 Introduction

The genus Bartonella contains several important human pathogens responsible for a wide range of disease manifestations including trench fever, cat-scratch disease, and Carrión’s disease. Pathogenic Bartonella are unique among bacteria in that they can cause tumor-like lesions of the skin in HIV-positive individuals. The recent sequencing of the 1.6-Mb genome of Bartonella quintana, the agent of trench fever, and the 1.9-Mb genome of Bartonella henselae, the agent of cat-scratch disease, has provided insights into the origin of B. quintana as a human-specific pathogen, as well as into the putative role of auxiliary replicons, phages, and genomic islands in the evolution of these bacteria.

Bartonella are intracellular bacteria that naturally circulate between mammals and arthropod vectors. There are currently about 20 described Bartonella species that infect a wide variety of domestic and wild animals, using a broad diversity of blood-sucking arthropods for transmission. Humans serve as the natural host for two Bartonella species, B. quintana and B. bacilliformis, but several others can incidentally infect humans.

13.1.1 Bartonella in a Phylogenetic Context

The genus Bartonella belongs to the Rhizobiales in the α-subdivision of the Proteobacteria, which consist of bacteria that live in close association with eukaryotes and are widely distributed in many different environments. The Rhizobiales encompass facultative intracellular pathogens like Bartonella and Brucella as well as plant-associated soil bacteria such as Agrobacterium tumefaciens and Sinorhizobium meliloti. Species of the genus Brucella may cause abortions in pregnant animals, but differ from Bartonella in that they are not vector-borne. The Rickettsiales consist of obligate intracellular bacteria, such as Rickettsia and Wolbachia. Like Bartonella, Rickettsia species are vector-borne bacteria that circulate among mammalian hosts and arthropod vectors. Interestingly, both genera contain louse-borne human pathogens: Rickettsia prowazekii, the typhus pathogen, and B. quintana, the trench fever agent. These are, however, not phylogenetically related but have developed similar lifestyles independently of each other.

Phylogenetic reconstructions have predicted that the common ancestor of the α-proteobacteria contained 3000–5000 genes [1]. There are two different evolutionary routes from this ancestor to the modern species: intracellular bacteria like Bartonella, Rickettsia, and Wolbachia have evolved by gene loss, whereas the soil-growing bacteria associated with plants have evolved by genome expansion [1].

13.1.2 Hosts and Vectors for Bartonella Species

The tree in Fig. 13.1 shows the phylogenetic relationship of the currently described Bartonella species, excluding Bartonella washoensis, for which there is very little sequence information available. Until recently, the genus was represented by only one species, B. bacilliformis, which has been recognized for almost 100 years. B. bacilliformis, which is probably human-specific, is the earliest diverging species in the tree. Other species are, with many exceptions, related in a way that mirrors the taxonomy of the host species, suggesting that parasites and hosts have evolved together for some time. The degree of species specificity varies from strict specialists with a single host and a single vector to generalists that have been isolated from several host species. B. henselae and Bartonella koehlerae infect felines. B. koehlerae, which appears to be rare, has been isolated from domestic and stray cats [3, 4]. B. henselae has the cat as its natural reservoir, not only domestic cats but also other feline species such as lions and cheetahs [5]. B. quintana is closely related to B. henselae and B. koehlerae, but has humans as its natural reservoir. This relationship is probably the result of cross-transmission from cats to humans. Another species, Bartonella clarridgeiae, has been isolated from domestic cats [6, 7] but is not related to species in the cat group, neither does it reliably cluster with any other Bartonella species. Bartonella weissi has been isolated both from cats and ruminants (see below). One group of species is associated with rodents. Bartonella taylorii and Bartonella tribocorum were isolated from rats [8, 9]; Bartonella elizabethae was first isolated from a patient [10], then from a rat [11]. Bartonella grahamii has been isolated from several species of small rodents [12]. Members of the vinsonii group are found in both dogs and rodents. Bartonella vinsonii subsp. berkhoffii was isolated from dogs [13] and coyotes [14], and B. vinsonii subsp. 
vinsonii was isolated from voles [15]. B. vinsonii subsp. arupensis was isolated from the blood of a cattle rancher and proved similar to a strain found in mice exclusively in conjunction with Borrelia burgdorferi and Babesia microti [16]. Bartonella alsatica, which is related to the vinsonii group, was isolated from the blood of wild rabbits [17]. Bartonella doshiae, which is not closely related to any other Bartonella species, was isolated from voles [8]. Another group of related species is predominantly associated with ruminants; Bartonella chomelii and Bartonella bovis were isolated from domestic cattle [18, 19]. Bartonella schoenbuchii and Bartonella capreoli were isolated from wild roe deer [18, 20]. In this group there are also species associated with other hosts: Bartonella birtlesii, which was isolated from mice [18], and B. weissi, which has been isolated both from domestic cats [21] and from a number of wild and domestic ruminants [22]. Some of the assumptions about reservoir hosts are based on isolation alone, and we cannot be sure that all relationships represent real reservoir–pathogen associations. Bartonella has a large potential to exploit alternative hosts, so the host–parasite relationships in Fig. 13.1 may be the result of cross-transmissions, both incidental and permanent. For example, seven species with animal reservoir hosts have been implicated in human disease [23]. Likewise, B. henselae and B. clarridgeiae have been isolated from dogs [24, 25], and B. henselae has been isolated from mice [26]. Finally, species specificity may, at least to some extent, be associated with the arthropod vector rather than with the mammalian host.

Fig. 13.1 Bartonella phylogeny and host specificity. Maximum likelihood tree of Bartonella species, inferred from a concatenated alignment of six nucleotide genes (ftsZ, groEL, 16S rRNA, ITS1, rpoB, and rnpB), using the program PHYML [2]. A line is drawn between each Bartonella species and the larger taxonomic group of reservoir species from which the Bartonella species have been isolated.

13.2 Bartonella Species and Pathogenicity

13.2.1 Infection of Reservoir and Incidental Host

Bartonella species are usually inoculated through feeding of the arthropod vector on the mammalian host. In the reservoir host, Bartonella species occupy a unique niche in the blood [27, 28] where they invade and persistently colonize mature erythrocytes. The intraerythrocyte habitat is thought to protect the Bartonella organisms from the host immune response, and also guarantees transmission by the vector. Experiments with animal models have shown that before the invasion of erythrocytes, the bacteria colonize an as yet unknown primary niche [29], where they replicate during the first days of infection. There is some evidence suggesting that endothelial cells represent the primary niche [30]. Approximately 5 days after infection the bacteria reappear in the bloodstream and adhere to and invade erythrocytes, where they remain during the lifespan of the erythrocyte [30]. Bacterial infections in the blood are normally associated with severe disease, but Bartonella species infection in the reservoir host is in most cases asymptomatic. B. henselae infections in humans may cause cat-scratch disease, typically manifested as regional swollen lymph nodes. The disease is spread from cats to humans indirectly by the cat flea or directly by a cat bite or scratch. In the United States, B. henselae causes 22 000 diagnosed cases of cat-scratch disease each year. Bartonella species that incidentally infect humans, like B. henselae, do not colonize erythrocytes but vascular endothelial cells. Studies of Bartonella infection of cultured human umbilical vein endothelial cells (HUVEC) have revealed two distinct routes of entry, both induced by the bacterium. The classical route of uptake is reminiscent of bacterium-directed phagocytosis. In B. bacilliformis, uptake is triggered by key regulators of actin reorganization [31]. B. henselae uses an additional uptake route, where aggregates are formed on the cell surface and are subsequently engulfed [32]. Both processes are dependent on actin rearrangements.


13.2.2 Bartonella Species as Human Pathogens

The two human pathogens B. quintana and B. bacilliformis and the feline species B. henselae are responsible for most human Bartonella infections. In addition to these, at least five other species have been associated with human disease [23]. Disease manifestations depend on the immune status of the host; in an immunocompetent individual the infection has only local symptoms and can usually be controlled, whereas it can have a severe outcome in an immunocompromised person. B. quintana is the causative agent of trench fever, a disease that affected more than one million soldiers during World War I and is now reemerging among homeless and alcoholic individuals. However, many infected persons develop a chronic asymptomatic carrier state. The bacteria are transmitted among humans via the human body louse, and may persist in the blood long after all clinical signs have disappeared. In immunocompromised individuals, especially HIV-infected patients, B. quintana can cause bacillary angiomatosis. Symptoms include tumor-like lesions of the skin, and the infection is usually fatal unless treated with antibiotics. Like B. quintana, B. henselae can also cause bacillary angiomatosis in immunocompromised persons and induce vascular proliferation in the liver and the spleen (bacillary peliosis), resulting in the formation of blood-filled cysts. B. bacilliformis is the agent of Carrión’s disease, the spread of which is limited to the areas where the sand fly is found, primarily the Peruvian Andes. Carrión’s disease is a biphasic disease consisting of an acute, highly fatal febrile anemia resulting from the invasion of erythrocytes, and a chronic phase where bacteria colonize the vascular endothelium and cause vasoproliferative eruptions of the skin. B. bacilliformis is probably the only Bartonella species that triggers hemolysis, i.e., the rupture of erythrocytes [29]. B. quintana, B. bacilliformis, and B. henselae are unique among all bacterial pathogens in that they induce proliferation of their host cells. Several strategies are involved: in addition to stimulating proliferation of endothelial cells, these bacteria are also able to inhibit endothelial cell apoptosis [33] and induce proinflammatory activity through activation of the transcription factor NF-κB [34]. The resulting tumor-like lesions normally comprise proliferating endothelial cells, bacteria, and infiltrates of macrophages/monocytes and neutrophils. Nonfimbrial adhesins, designated Bartonella adhesin (BadA) or variably expressed outer membrane proteins (Vomp), mediate binding to endothelial cells and are immunodominant in infected patients, with a potential role in the induction of vasoproliferative disorders [35]. Variable expression of B. quintana Vomp family members in a macaque animal model appears to be mediated by gene deletion [36].


13.3 The Bartonella Genomes

Complete genome sequence data are available for B. henselae (Houston-1 strain) and B. quintana (Toulouse strain) [37]. B. henselae and B. quintana are closely related to each other (Fig. 13.1), but differ in host and vector preferences. The B. henselae genome is 1.93 Mbp long with 1491 genes, as compared to 1.58 Mbp and 1143 genes in B. quintana. There is a striking similarity in the identity of genes in the different functional categories for B. henselae and B. quintana, with one exception: genes belonging to the phage and pathogenicity categories are far more common in B. henselae. Most of these are found on one of three genomic islands or on a prophage, all unique to B. henselae. The close relatives Brucella melitensis and Brucella suis have two replicons, of 2.1 Mbp and 1.2 Mbp respectively [38, 39]. Both sequenced Bartonella genomes are reduced versions of chromosome I from Brucella. Interestingly, they also contain a segment of approximately 280 kbp in B. henselae and 200 kbp in B. quintana containing Bartonella-specific genes as well as genes showing similarity to genes located on chromosome II of B. melitensis. It is speculated that this segment was acquired by the integration of an auxiliary replicon in a common ancestor of the Bartonella species [37]. The segment has a higher noncoding content than the rest of the genome, amounting to 50% in B. quintana, suggesting that it is being degraded. Because the major difference between the two species is in the chromosome II-like segment and the genomic islands, these will be discussed in greater detail below.

13.4 Genomic Islands and Phages

Microbes often carry a substantial set of genes that are phylogenetically related to genes from distantly related species [40]. It is now widely recognized that many of these genes have been laterally transferred through integration of mobile DNA into a bacterial host chromosome. Clusters of laterally transferred genes were first discovered in pathogenic strains of bacterial species, whilst they were absent from nonpathogens [41, 42], and were therefore termed pathogenicity islands. However, it now seems clear that the concept of islands must be extended to nonpathogens, and could in principle be any distinct and unstable piece of DNA encoding genes used to exploit new environments. One example is the 500-kbp island capable of converting nonsymbiotic strains of the α-proteobacterium Mesorhizobium into legume symbionts [43]. Therefore, “genomic island” is a more appropriate name [44]. In addition to carrying host-adaptive genes and being present in some strains but not in others, genomic islands often have a mosaic structure, carrying cryptic or functional genes encoding integrases or other mobility factors, and have a base composition that differs from the rest of the genome [45]. Genomic islands may be introduced into the genome via bacteriophages, plasmids, or other accessory DNA [45]. The most commonly observed features associated with genomic islands are phage integrases, tRNAs, and flanking direct repeats [46, 47]. These characteristics suggest a phage origin, since phages are known to integrate site-specifically into tRNAs. Phages are prevalent in nature and their role in horizontal transfer of genes between bacterial hosts in natural ecosystems is probably very important [48–51]. Phage genomes are mosaic with potential access to a large common gene pool, and recombination between phages appears to be frequent [52–55]. It is therefore likely that they provide a source of innovation, in which new variants and combinations of host-adaptive genes can appear. Several authors have put forward the idea that phages play a role in the evolution of their host [39, 56–58]. The pattern of presence and absence of phages in closely related strains and species suggests that they are generally short-lived in a bacterial genome. This could be the result of a deletion bias in the host, selected as a mechanism for removal of deleterious parasite-encoded DNA [59]. Of course, prophage DNA may be both deleterious and adaptive; the part of the phage that confers no advantage to the host is at best neutral and perhaps a burden. Most of the prophage genes (like the phage structural genes) are expected to be selectively neutral to the bacterium, and once a prophage is immobilized, nonessential parts will be eliminated by purifying selection. This is probably the reason for the observed relationship between genomic islands and phages; genomic islands might be remnants of phages that have kept the host-adaptive genes and lost the phage structural proteins.
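One of the island features mentioned above, a base composition that differs from the rest of the genome, can be illustrated with a small calculation. The following Python sketch computes the G+C fraction in non-overlapping windows and flags windows that deviate from the whole-sequence value. It is only an illustration and not the method used in the studies cited here; the function names, the window size, the step, and the threshold are all assumptions chosen for a toy sequence, and in practice such a scan would be combined with other evidence such as flanking tRNA genes and integrase genes.

```python
def gc_fraction(seq):
    """Return the G+C fraction of a DNA string, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq, window, step):
    """Yield (start, gc_fraction) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def flag_deviating_windows(seq, window=20, step=20, threshold=0.5):
    """Return windows whose G+C fraction differs from the whole-sequence
    value by more than `threshold` (all defaults are illustrative only)."""
    overall = gc_fraction(seq)
    return [(start, gc) for start, gc in windowed_gc(seq, window, step)
            if abs(gc - overall) > threshold]


# Toy sequence (hypothetical): AT-rich background with a GC-rich
# 20-bp insert starting at position 100.
toy = "AT" * 50 + "GC" * 10 + "AT" * 50
print(flag_deviating_windows(toy))
```

On the toy sequence only the window covering the GC-rich insert is reported. For real genomes the compositional difference is usually much smaller (the B. henselae islands are described below as differing only slightly from their surroundings), so window size and threshold would need to be chosen far more carefully.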

13.5 Genomic Islands and Phages in Bartonella Species

The 350-kbp difference in size between B. henselae and B. quintana is almost exclusively due to the presence of large genomic islands in B. henselae along with a higher extent of genetic redundancy in this genome. The four largest islands are a 55-kbp prophage and three genomic islands of size 72 kbp, 34 kbp, and 9 kbp respectively. Additionally, there are 68 smaller unique regions in B. henselae (not found in B. quintana), referred to here as islets (A.C. Frank and S.G.E Andersson, unpublished results). These are 3.1 kb long on average. As many as 17 of 44 tRNAs in the B. henselae genome are associated with an island or islet, highlighting their role as anchors for foreign DNA. There appears to be a close relationship between the prophage and the islands. All islands carry genes or remnants of genes that are present on the prophage, and there is also a high degree of repetition within and between the islands. In total, 46% of the genes from these regions belong to a repeat family, and 73% of the repeat families have a member from the phage or one of the islands. In addition, many of the repeated genes in the prophage and the islands have paralogs in pseudogenes or truncated copies within the islands themselves or in the islets.
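The size figures quoted in this paragraph can be tallied in a few lines. The Python sketch below is a back-of-the-envelope check that uses only numbers given in the text (genome sizes from Section 13.3, the sizes of the four largest regions, and the islet count and mean length); the variable and function names are hypothetical, and the islet total is an estimate because only the average islet length is reported.

```python
# All values are taken from the text; sizes are in kbp.
GENOME_KBP = {"B. henselae": 1930, "B. quintana": 1580}
LARGE_REGIONS_KBP = {"prophage": 55, "island_1": 72, "island_2": 34, "island_3": 9}
ISLET_COUNT = 68
ISLET_MEAN_KBP = 3.1  # reported average length of an islet


def tally():
    """Summarize how much sequence the B. henselae-specific regions represent."""
    size_gap = GENOME_KBP["B. henselae"] - GENOME_KBP["B. quintana"]
    large_total = sum(LARGE_REGIONS_KBP.values())
    islet_total = ISLET_COUNT * ISLET_MEAN_KBP  # estimate from the mean
    return {
        "size_gap_kbp": size_gap,
        "large_regions_kbp": large_total,
        "islets_kbp_estimate": round(islet_total, 1),
        "large_regions_share_of_gap": round(large_total / size_gap, 2),
    }


for name, value in tally().items():
    print(f"{name}: {value}")
```

The prophage and the three islands together account for roughly half of the size gap. The estimated islet total cannot simply be added on top of this, since B. quintana also carries sequence of its own that is not found in B. henselae (Section 13.5.3), so the tally should be read as an order-of-magnitude check rather than an exact accounting.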


13.5.1 The B. henselae Prophage

The prophage is an evolutionary mosaic, flanked on one side by a gene coding for tRNA(Leu) and on the other side by an integrase gene. Only a few of the phage genes have homologues in other α-proteobacteria; these include the integrase gene which has homologues in all sequenced close relatives, two blocks of genes with similarity to a putative prophage in Wolbachia pipientis [60, 61], and a gene coding for single-stranded DNA-binding protein. Although there are a number of unknown genes, most genes encoded by the prophage have an assigned phage-related function. For instance, there are 12 copies of a phage antirepressor and some phage-related proteins. Taken together, this suggests that the inserted prophage may be able to direct the synthesis of a functional phage particle. Interestingly, phage-like particles have been detected in several Bartonella species including B. quintana, B. henselae, and B. bacilliformis [61–63]. The particles contained 14-kbp linear segments of double-stranded heterologous DNA, and it seems likely that the structural proteins of the phage particles should be encoded by the prophage. However, the reported presence of a 14-kbp fragment also in B. quintana [63] is surprising since B. quintana does not carry the prophage. It remains to be determined whether the phage particles and the 14-kbp fragment are related to the B. henselae prophage. The presence of the integrase in other α-proteobacteria and the similarities to Wolbachia prophage genes suggest that the common ancestor of the modern α-proteobacteria was infected by a phage, or possibly that the phages infecting Bartonella and Wolbachia species have access to a common gene pool. Interestingly, Bartonella, Wolbachia, and Rickettsia have all been identified in French cat fleas by polymerase chain reaction (PCR) using species-specific primers [64]. Thus, the cat flea may be involved in the horizontal transmission of sequences from Rickettsia, Wolbachia, and Bartonella via phage infections.
13.5.2 B. henselae Genomic Islands and Islets

The three B. henselae genomic islands are all flanked by tRNAs, encode mobility factors, and have a base composition slightly different from the surrounding regions. The two largest islands have a defunct copy of the prophage integrase, suggesting a phage origin. The intact prophage is most likely the result of a site-specific integration into a tRNA(Leu) gene [37]. The islands, however, appear to have used different integration sites since they are found next to other tRNA genes. The two largest islands encode several copies of a gene cluster consisting of genes coding for filamentous hemagglutinin (fhaB) and an outer-membrane hemolysin activator/transporter (fhaC/hecB). These form a two-partner secretion system, where the fhaC/hecB product mediates transport of the fhaB product. The 9-kbp island also contains hecB, adjacent to a presumably inactivated copy of an fhaB homologue.


There are many phage genes present on the islands, both with and without homologues on the prophage. Examples of genes that are not present on the prophage include multiple copies of a gene homologous to a virulence-associated gene from B. melitensis and two apparently truncated genes related to a virulence locus in Photorhabdus luminescens, a symbiont of nematodes and pathogen of insects [65]. A few other genes related to the parasitism of insects are found outside the genomic islands, e.g., six copies of the phage lysozyme, which is related to a protein from a plasmid-encoded virulence determinant from the insect pathogen Serratia entomophila. This plasmid also encodes proteins homologous to insecticidal toxins from the virulence locus of P. luminescens [66], suggesting horizontal transfer between the genomes of insect-associated species. Also encoded on the islands are a number of genes that have paralogues in the backbone of the genome, i.e., the large segments found in both B. henselae and B. quintana. This backbone copy is often located next to an integrase remnant, as for instance two inactivated copies of a DNA helicase on the largest genomic island and a region of ten phage genes on the intermediate island, suggesting that they might have been duplicated and relocated in the genome by the action of a phage or island. The islands also harbor a few plasmid maintenance genes, suggesting that at least some of the island genes were originally plasmid-borne. The largest islands both contain blocks of plasmid genes, and like the prophage they encode a homologue of vapA, a killer suppressor protein involved in plasmid maintenance, from a plasmid-borne virulence-associated island of the animal pathogen Dichelobacter nodosus. Of the 68 B. henselae islets, 41 are remnants of genes present on the phage or large genomic islands, 18 carry genes or pseudogenes found elsewhere in the B. henselae genome, and 9 seem to be strictly noncoding, possibly representing former genes that are degraded beyond recognition. Twenty-five of the phage-related islets contain only an integrase remnant and are not located next to a tRNA, suggesting that they were integrated with a different mechanism than the one used by the prophage. In total, there are 43 integrase copies in various stages of decay in the B. henselae genome and they all appear to be closely related to the prophage integrase. However, since some of them are heavily degraded, it cannot be ruled out that they are remnants of a different, albeit related phage. Many of the integrase remnants represent only the 5′ part of the gene [37]. A possible explanation is that this region carries the leftward phage promoter that is altering the expression of downstream genes. B. quintana has only four integrase remnants; however, there is evidence that B. quintana once had a larger number of integrase genes. For example, B. quintana spacers orthologous to B. henselae integrase remnants are significantly longer than other spacers in the genome, perhaps indicating that they represent degraded genes [37].


13.5.3 B. quintana Harbors Remnants of the B. henselae Islands

B. quintana contains 26 islets with a total size of 79 kbp [37]. Very few genes in B. quintana lack a gene or pseudogene homologue in B. henselae, and only two of them show similarity to other organisms. One of them is a homologue of yopP from Yersinia enterocolitica, located between a prophage lysozyme homologue and a tRNA gene. YopP, which is encoded on a virulence plasmid [67], is a secreted effector molecule causing apoptosis in macrophages [68, 69]. The other is a putative toxin/hemolysin secretion transporter, present in Sinorhizobium loti, that is located adjacent to tRNA(Gln). The location of these two genes next to tRNAs and the association with phage remnants strongly suggest that they were transferred horizontally to B. quintana. However, it is not known whether this happened before or after the divergence of B. henselae and B. quintana. The breakpoints for rearrangements between B. henselae and B. quintana coincide with long repeated sequences that consist of both genes and noncoding sequences located at the genomic islands in B. henselae. About half of the B. quintana islets contain remnants of the prophage or the B. henselae islands, and many are located at the breakpoint positions, indicating that the islands were present in a common ancestor of the two species. Comparison with B. melitensis suggests that recombination at repeated sequences on the islands mediated excision of sequence fragments and rearrangements in B. quintana [37].

13.5.4 Role of Phages and Islands in the Evolution of Bartonella

Comparisons of B. henselae and B. quintana have revealed that most of the differences between them are related to the B. henselae prophage. The largest genomic island unique to B. henselae shares extensive sequence homology and possibly integration mechanisms with the prophage, and many of the B. henselae islets carry remnants of what appears to be the same phage. Even the two B. quintana-specific genes with assigned function are located adjacent to a phage and a tRNA gene respectively. All these phage remnants along with evidence of horizontal transfer possibly performed by the phage suggest that the phage has played a crucial role in the evolution of Bartonella. One of the most striking features of the prophage and the genomic islands is their high degree of repetition. This redundancy could be the result of multiple phage integration events, or intragenomic duplication or recombination events, or a combination of the above. Duplication of phage genes may be beneficial for the phage or its host; slightly different copies of the fhaC/hecB–fhaB cluster could for instance be used in the colonization of different bacterial species, vectors, or host cells, or in different stages of infection. It is possible that the prophage proteins regulate the transcription of the virulence genes located on the islands, which would imply a selective advantage associated with their overexpression.


The presence of both a complete prophage and heavily degraded integrase copies suggests on the one hand that the phage could still be active, and on the other hand that the phage association is old and could have contributed to diversification of the whole Bartonella genus. Since the variation at the sequence level between Bartonella species is low, it is likely that the genetic basis for differences in host preference and pathogenicity between them lies in the presence or absence of clusters of virulence genes, like the islands in B. henselae. It is possible that phages are involved not only in the diversification of Bartonella species, but also in their evolution from the common ancestor of Brucella and Bartonella and, before that, from plant pathogens and symbionts. Considering the whole group of α-proteobacteria, it is clear that much of the diversity in gene content is due to the presence of highly dynamic auxiliary replicons. In the α-proteobacteria, auxiliary replicons are much less conserved than main chromosomes, suggesting that they are more amenable to influx of genetic material. It is possible that the acquisition of the prophage and its contributed gene sets, which potentially was an important step in the formation of the Bartonella genus, was mediated by an auxiliary replicon which was later integrated in the main chromosome. In plant-associated bacteria, genes involved in host interactions have been found located at mobile and unstable genetic elements [70]. Likewise, in Bartonella, many species-specific genes potentially involved in interaction with the animal host are located in a region that may have originated as an integrated auxiliary replicon.

13.6 The Chromosome II-Like Segment in Bartonella

The region in B. henselae and B. quintana with the majority of Bartonella-specific genes is located in the fourth quarter of the genome (Fig. 13.2). This segment of the genome has a slightly elevated G+C content, a low coding potential (50–60%), and a higher than normal density of short tandem repeats and tandem duplication remnants. Most of the B. henselae islets are also located here, indicative of continuous sequence degradation in this region of the B. quintana genome (from which the corresponding islets appear to have been lost). Flanked by one of the two rRNA operons, and containing the other, is a segment of 280 kbp in B. henselae and 200 kbp in B. quintana with as many as 20 genes with top hits to chromosome II (Chr II) in B. suis, compared to only two genes that phylogenetically cluster with the main chromosome (Chr I) [37]. This stands in contrast to the rest of the genome, where most genes are orthologues of B. suis Chr I genes. A suggested explanation for the observed pattern is integration of an auxiliary replicon (or parts thereof) analogous to Chr II in Brucella, by homologous recombination at the rRNA operons in an ancestor of Bartonella, followed by sequence loss and rearrangements [37].
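The coding potential quoted for this segment (50–60%) is the fraction of the region that is covered by annotated genes. As a minimal sketch of how such a value can be computed from gene coordinates, the Python function below merges overlapping gene intervals before summing their lengths, so that overlapping genes are not counted twice. The function name, the half-open coordinate convention, and the example coordinates are assumptions made for illustration; they are not taken from the genome annotations discussed here.

```python
def coding_fraction(gene_intervals, region_start, region_end):
    """Fraction of the half-open region [region_start, region_end) that is
    covered by at least one gene. Genes are (start, end) pairs, also
    half-open; overlapping genes are merged so no base is counted twice."""
    length = region_end - region_start
    if length <= 0:
        return 0.0
    # Keep only genes that overlap the region, clipped to its boundaries.
    clipped = sorted(
        (max(start, region_start), min(end, region_end))
        for start, end in gene_intervals
        if end > region_start and start < region_end
    )
    covered = 0
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent gene: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / length


# Hypothetical gene coordinates in a 1000-bp region; two genes overlap.
genes = [(0, 300), (200, 500), (800, 1000)]
print(coding_fraction(genes, 0, 1000))
```

In this toy example 700 of 1000 positions are covered by genes. Applied to a real annotation, the result depends on which features are counted as coding (for instance, whether pseudogenes are included), which is a choice the sketch leaves to the caller.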


Fig. 13.2 Genome circle. Schematic view of the genomes of B. henselae and B. quintana, showing the location of the chromosome II region, the prophage, and the genomic islands in B. henselae and remnants thereof in B. quintana, and the known virulence factors such as BadA/Vomp, Trw, and VirB-Bep.

The Chr II-like region in Bartonella harbors many genes that are not present in other sequenced α-proteobacteria. These parts may represent islands that were acquired before the divergence of B. henselae and B. quintana and may yield clues to how Bartonella evolved to become an intraerythrocytic parasite. Located in a segment with the characteristics of a genomic island are a number of surface protein homologues and many short sequences repeated in tandem. Because of their hypermutability, tandem repeats can alter the expression of associated genes, a mechanism used by pathogens to vary expression of surface-exposed antigens.

13.6.1 Type IV Secretion Systems in Bartonella Species

Located in the Chr II-like segment in Bartonella species is an operon coding for a type IV secretion system (T4SS) called the virB-D4 operon, which is positioned between the rRNA operons (Fig. 13.2). Another T4SS is encoded by the trw operon, which is located downstream of the Chr II-like region (Fig. 13.2). Many bacteria that live in close relationship with eukaryotic cells use T4SSs to transfer DNA and macromolecules to the host [71]. Both T4SSs in Bartonella are required for invasion of host cells, and both are recent imports to the Bartonella genome, with plasmid conjugation systems as their closest relatives [72]. Knowledge about the origin and function of these systems is highly relevant to understanding how Bartonella adapted to its specific ecological niche.

13.6.2 The virB-D4 Operon

The VirB-D4 T4SS in Bartonella consists of an operon with 10 genes (virB2–10 and the downstream-located virD4 gene), and is most closely related to a conjugation system encoded by the A. tumefaciens plasmid AT [72]. The VirB-D4 T4SS has been shown to mediate most of the virulence attributes associated with the interaction of B. henselae and human endothelial cells [73]. These include rearrangement of the actin cytoskeleton, resulting in the formation of bacterial aggregates and their internalization by the invasome structure; proinflammatory activation, leading to cell adhesion molecule expression and chemokine secretion; and inhibition of apoptotic cell death, resulting in enhanced endothelial cell survival. The virB effector molecules are encoded by a set of duplicated genes located immediately downstream of the virB operon [74]. The C-terminus of these Bartonella-translocated effector proteins (Beps) harbors an intracellular delivery domain and a short, positively charged tail sequence that together constitute a transfer signal.

13.6.3 The trw Operon

The other T4SS (encoded by the trw operon) is very closely related to the conjugation system of E. coli plasmid R388 (up to 80% amino acid identity) and must have been acquired rather recently by horizontal gene transfer [72, 75]. This system is essential for establishing intraerythrocytic bacteremia, and the expression has been shown to be upregulated during endothelial cell infection [75]. A unique feature of this operon is that genes coding for pilus structures are tandemly duplicated, possibly reflecting a need for variably expressed pilus genes as a mechanism to evade the host immune system. Since this system lacks the VirD4 homologue required for substrate export by most other T4SS, it is likely that its primary role is to establish contact with erythrocytes by attaching to them with the pilus structure [30].


13.7 B. quintana’s Evolution into a Human Pathogen

The phylogenetic grouping of B. quintana suggests that it is the result of a host switch from cats to humans (Fig. 13.1). Bartonella infections are in general asymptomatic in the reservoir host, probably as a result of adaptation. That B. quintana causes disease in the human host supports the idea of a recent reservoir switch. The genomic comparisons suggest that B. quintana, which is a genomic subset of B. henselae, has evolved through loss, mainly of the prophage and associated islands. However, the order of events is still unclear. B. quintana might have evolved from a cat strain that incidentally infected a human, lost the genes needed for colonization of the cat, and became trapped in humans. Alternatively, the process could have started with a cat strain acquiring a factor that enabled it to colonize a human-specific vector. Initiation of a different transmission route would eventually result in the loss of regions dispensable for cat colonization.

The timing of the event is also unclear. Synonymous substitution frequencies between B. quintana and B. henselae are fairly high (Ks = 0.62 on average). Thus, if cross-species transmission predates the divergence of B. henselae and B. quintana, we would expect to find Bartonella in other primates as well. However, to date, there are few studies in nonhuman primates. Alternatively, a highly diverged feline strain may have switched hosts later on, for example as a result of cat domestication, which may have altered patterns of flea infestation as well as the Bartonella species composition of felines. B. henselae is currently the most successful cat-infecting species, more frequently isolated than B. clarridgeiae, B. weissi and B. koehlerae. Thus, of the presumably many different Bartonella species infecting felines at the time of the cross-transmission event, B. quintana-like clones may have been outcompeted by the more successful B. henselae-like species, explaining why there currently seem to be no close relatives of B. quintana circulating among domestic cats. More extensive studies of the genetic diversity of Bartonella species in wild felines are necessary to make predictions about the timing and order of events that induced permanent host switching in the case of B. quintana.

Also, the question of why B. quintana has lost the phage and island genes remains. To the extent that these sequences are under selective constraints, different retention patterns may result from lifestyle differences. If so, the reason for the presence of genomic islands in B. henselae should perhaps be sought in the selective demands posed by persistent growth in the feline host. If, on the other hand, these sequences are not under selective constraints, their absence in B. quintana may be explained by a more efficient removal of junk DNA as a result of its particular host and vector preferences. Interestingly, lice have been shown to have higher rates of evolution than mammals and other insects, possibly as a result of their short generation time and/or transmission dynamics [76]. Presumably, factors that lead to high rates of evolution in the insect vector, such as the shorter generation time, could also cause high rates of evolution in the microbial parasite. Possibly, B. quintana has gone through more cycles of infections since the divergence of the two species.
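The link between the synonymous substitution frequency quoted above (Ks = 0.62) and the timing of the host switch can be made concrete with a simple molecular-clock calculation. The sketch below is only illustrative: it assumes a constant synonymous substitution rate in both lineages, and the rates passed in are hypothetical placeholders, not values estimated for Bartonella.

```python
def divergence_time_years(ks, rate_per_site_per_year):
    """Estimate time since divergence under a strict molecular clock.

    Substitutions accumulate on both lineages, so Ks = 2 * rate * time.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


# Ks between B. quintana and B. henselae, as quoted in the text.
KS_QUINTANA_HENSELAE = 0.62

# Hypothetical rates (substitutions per synonymous site per year), chosen
# only to show how strongly the estimate depends on the assumed rate.
for assumed_rate in (1e-9, 1e-8, 1e-7):
    years = divergence_time_years(KS_QUINTANA_HENSELAE, assumed_rate)
    print(f"rate {assumed_rate:.0e}: about {years:.2e} years")
```

A tenfold change in the assumed rate shifts the estimate tenfold, so any dating of the host switch from Ks alone depends heavily on the rate chosen.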


13.8 Conclusions and Future Perspectives

To conclude, comparison of the B. henselae and B. quintana genomes suggests that phages, plasmids, and auxiliary replicons have played an essential role in the evolution of Bartonella species. However, it remains to be determined whether phages and their associated sequences were the major contributors to the diversification of all Bartonella species. It should also be of interest to find out whether the genomic islands and the prophage can be transmitted to other species and if they are related to the reported phage particles. It is also relevant to determine whether they interact genetically in expression and/or transduction of their genes, and whether they serve as mediators of lateral transfer of genes between Bartonella strains and species in nature. Identification and analysis of prophages in other Bartonella species will also tell us if there has been interspecies transfer of phages, as was reported for the phage of the insect endosymbiont Wolbachia [60].

Since B. henselae and B. quintana are very closely related, they only represent a small fraction of the Bartonella genetic space. Comparative genomics with more species will likely reveal important clues about the evolution and pathogenic mechanisms of Bartonella. Furthermore, since the laboratory strains have been propagated for some time, it cannot be excluded that sequences necessary for host colonization and maintenance, but not for survival in the laboratory, have been deleted. To fully appreciate the genetic diversity of the genus Bartonella, the natural environment of the species must also be sampled.

There is an increasing recognition of Bartonella species as human pathogens, with an expanding number of species causing disease in humans and novel symptoms described. The type IV secretion system VirB-D4 has been shown to play an important role in the pathogenicity of several Bartonella species, and is therefore a promising target for vaccine development. Additional targets are likely to be discovered in the future following the many functional genomics initiatives that have been started for this exciting group of bacteria.

References

1 Boussau, B., E.O. Karlberg, A.C. Frank, B.A. Legault, and S.G. Andersson. 2004. Computational inference of scenarios for alpha-proteobacterial genome evolution. Proc Natl Acad Sci U S A 101:9722–9727.

2 Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704.

3 Avidor, B., M. Graidy, G. Efrat, C. Leibowitz, G. Shapira, A. Schattner, O. Zimhony, and M. Giladi. 2004. Bartonella koehlerae, a new cat-associated agent of culture-negative human endocarditis. J Clin Microbiol 42:3462–3468.

4 Droz, S., B. Chi, E. Horn, A.G. Steigerwalt, A.M. Whitney, and D.J. Brenner. 1999. Bartonella koehlerae sp. nov., isolated from cats. J Clin Microbiol 37:1117–1122.

5 Molia, S., B.B. Chomel, R.W. Kasten, C.M. Leutenegger, B.R. Steele, L. Marker, J.S. Martenson, D.F. Keet, R.G. Bengis, R.P. Peterson, L. Munson, and S.J. O’Brien. 2004. Prevalence of Bartonella infection in wild African lions (Panthera leo) and cheetahs (Acinonyx jubatus). Vet Microbiol 100:31–41.

6 Lawson, P.A., and M.D. Collins. 1996. Description of Bartonella clarridgeiae sp. nov. isolated from the cat of a patient with Bartonella henselae septicemia. Med Microbiol Lett 5:64–73.

7 Rolain, J.M., C. Locatelli, L. Chabanne, B. Davoust, and D. Raoult. 2004. Prevalence of Bartonella clarridgeiae and Bartonella henselae in domestic cats from France and detection of the organisms in erythrocytes by immunofluorescence. Clin Diagn Lab Immunol 11:423–425.

8 Birtles, R.J., T.G. Harrison, N.A. Saunders, and D.H. Molyneux. 1995. Proposals to unify the genera Grahamella and Bartonella, with descriptions of Bartonella talpae comb. nov., Bartonella peromysci comb. nov., and three new species, Bartonella grahamii sp. nov., Bartonella taylorii sp. nov., and Bartonella doshiae sp. nov. Int J Syst Bacteriol 45:1–8.

9 Heller, R., P. Riegel, Y. Hansmann, G. Delacour, D. Bermond, C. Dehio, F. Lamarque, H. Monteil, B. Chomel, and Y. Piemont. 1998. Bartonella tribocorum sp. nov., a new Bartonella species isolated from the blood of wild rats. Int J Syst Bacteriol 48(Pt 4):1333–1339.

10 Daly, J.S., M.G. Worthington, D.J. Brenner, C.W. Moss, D.G. Hollis, R.S. Weyant, A.G. Steigerwalt, R.E. Weaver, M.I. Daneshvar, and S.P. O’Connor. 1993. Rochalimaea elizabethae sp. nov. isolated from a patient with endocarditis. J Clin Microbiol 31:872–881.

11 Brenner, D.J., S.P. O’Connor, H.H. Winkler, and A.G. Steigerwalt. 1993. Proposals to unify the genera Bartonella and Rochalimaea, with descriptions of Bartonella quintana comb. nov., Bartonella vinsonii comb. nov., Bartonella henselae comb. nov., and Bartonella elizabethae comb. nov., and to remove the family Bartonellaceae from the order Rickettsiales. Int J Syst Bacteriol 43:777–786.

12 Birtles, R.J., T.G. Harrison, and D.H. Molyneux. 1994. Grahamella in small woodland mammals in the U.K.: isolation, prevalence and host specificity. Ann Trop Med Parasitol 88:317–327.

13 Kordick, D.L., B. Swaminathan, C.E. Greene, K.H. Wilson, A.M. Whitney, S. O’Connor, D.G. Hollis, G.M. Matar, A.G. Steigerwalt, G.B. Malcolm, P.S. Hayes, T.L. Hadfield, E.B. Breitschwerdt, and D.J. Brenner. 1996. Bartonella vinsonii subsp. berkhoffii subsp. nov., isolated from dogs; Bartonella vinsonii subsp. vinsonii; and emended description of Bartonella vinsonii. Int J Syst Bacteriol 46:704–709.

14 Chang, C.C., R.W. Kasten, B.B. Chomel, D.C. Simpson, C.M. Hew, D.L. Kordick, R. Heller, Y. Piemont, and E.B. Breitschwerdt. 2000. Coyotes (Canis latrans) as the reservoir for a human pathogenic Bartonella sp.: molecular epidemiology of Bartonella vinsonii subsp. berkhoffii infection in coyotes from central coastal California. J Clin Microbiol 38:4193–4200.

15 Weiss, E., and G.A. Dasch. 1982. Differential characteristics of strains of Rochalimaea: Rochalimaea vinsonii sp. nov., the Canadian vole agent. Int J Syst Bacteriol 32:302–315.

16 Welch, D.F., K.C. Carroll, E.K. Hofmeister, D.H. Persing, D.A. Robison, A.G. Steigerwalt, and D.J. Brenner. 1999. Isolation of a new subspecies, Bartonella vinsonii subsp. arupensis, from a cattle rancher: identity with isolates found in conjunction with Borrelia burgdorferi and Babesia microti among naturally infected mice. J Clin Microbiol 37:2598–2601.

17 Heller, R., M. Kubina, P. Mariet, P. Riegel, G. Delacour, C. Dehio, F. Lamarque, R. Kasten, H.J. Boulouis, H. Monteil, B. Chomel, and Y. Piemont. 1999. Bartonella alsatica sp. nov., a new Bartonella species isolated from the blood of wild rabbits. Int J Syst Bacteriol 49(Pt 1):283–288.

18 Bermond, D., H.J. Boulouis, R. Heller, G. Van Laere, H. Monteil, B.B. Chomel, A. Sander, C. Dehio, and Y. Piemont. 2002. Bartonella bovis Bermond et al. sp. nov. and Bartonella capreoli sp. nov., isolated from European ruminants. Int J Syst Evol Microbiol 52:383–390.

19 Maillard, R., P. Riegel, F. Barrat, C. Bouillin, D. Thibault, C. Gandoin, L. Halos, C. Demanche, A. Alliot, J. Guillot, Y. Piemont, H.J. Boulouis, and M. Vayssier-Taussat. 2004. Bartonella chomelii sp. nov., isolated from French domestic cattle (Bos taurus). Int J Syst Evol Microbiol 54:215–220.

20 Dehio, C., C. Lanz, R. Pohl, P. Behrens, D. Bermond, Y. Piemont, K. Pelz, and A. Sander. 2001. Bartonella schoenbuchii sp. nov., isolated from the blood of wild roe deer. Int J Syst Evol Microbiol 51:1557–1565.

21 Marano, N., P. Jameson, E.L. Marston, D.C. Jones, C. Green, and R. Regnery. Unpublished.

22 Chang, C.C., B.B. Chomel, R.W. Kasten, R.M. Heller, H. Ueno, K. Yamamoto, V.C. Bleich, B.M. Pierce, B.J. Gonzales, P.K. Swift, W.M. Boyce, S.S. Jang, H.J. Boulouis, Y. Piemont, G.M. Rossolini, M.L. Riccio, G. Cornaglia, L. Pagani, C. Lagatolla, L. Selan, and R. Fontana. 2000. Bartonella spp. isolated from wild and domestic ruminants in North America. Emerg Infect Dis 6:306–311.

23 Boulouis, H.J., C.C. Chang, J.B. Henn, R.W. Kasten, and B.B. Chomel. 2005. Factors associated with the rapid emergence of zoonotic Bartonella infections. Vet Res 36:383–410.

24 Mexas, A.M., S.I. Hancock, and E.B. Breitschwerdt. 2002. Bartonella henselae and Bartonella elizabethae as potential canine pathogens. J Clin Microbiol 40:4670–4674.

25 Gundi, V.A., O. Bourry, B. Davous, D. Raoult, and B. La Scola. 2004. Bartonella clarridgeiae and B. henselae in dogs, Gabon. Emerg Infect Dis 10:2261–2262.

26 Engbaek, K., and P.A. Lawson. 2004. Identification of Bartonella species in rodents, shrews and cats in Denmark: detection of two B. henselae variants, one in cats and the other in the long-tailed field mouse. APMIS 112:336–341.

27 Rolain, J.M., C. Foucault, R. Guieu, B. La Scola, P. Brouqui, and D. Raoult. 2002. Bartonella quintana in human erythrocytes. Lancet 360:226–228.

28 Rolain, J.M., B. La Scola, Z. Liang, B. Davoust, and D. Raoult. 2001. Immunofluorescent detection of intraerythrocytic Bartonella henselae in naturally infected cats. J Clin Microbiol 39:2978–2980.

29 Schulein, R., A. Seubert, C. Gille, C. Lanz, Y. Hansmann, Y. Piemont, and C. Dehio. 2001. Invasion and persistent intracellular colonization of erythrocytes. A unique parasitic strategy of the emerging pathogen Bartonella. J Exp Med 193:1077–1086.

30 Dehio, C. 2004. Molecular and cellular basis of bartonella pathogenesis. Annu Rev Microbiol 58:365–390.

31 Verma, A., and G.M. Ihler. 2002. Activation of Rac, Cdc42 and other downstream signalling molecules by Bartonella bacilliformis during entry into human endothelial cells. Cell Microbiol 4:557–569.

32 Dehio, C., M. Meyer, J. Berger, H. Schwarz, and C. Lanz. 1997. Interaction of Bartonella henselae with endothelial cells results in bacterial aggregation on the cell surface and the subsequent engulfment and internalisation of the bacterial aggregate by a unique structure, the invasome. J Cell Sci 110(Pt 18):2141–2154.

33 Kirby, J.E., and D.M. Nekorchuk. 2002. Bartonella-associated endothelial proliferation depends on inhibition of apoptosis. Proc Natl Acad Sci U S A 99:4656–4661.

34 Fuhrmann, O., M. Arvand, A. Gohler, M. Schmid, M. Krull, S. Hippenstiel, J. Seybold, C. Dehio, and N. Suttorp. 2001. Bartonella henselae induces NF-kappaB-dependent upregulation of adhesion molecules in cultured human endothelial cells: possible role of outer membrane proteins as pathogenic factors. Infect Immun 69:5088–5097.

35 Riess, T., S.G. Andersson, A. Lupas, M. Schaller, A. Schafer, P. Kyme, J. Martin, J.H. Walzlein, U. Ehehalt, H. Lindroos, M. Schirle, A. Nordheim, I.B. Autenrieth, and V.A. Kempf. 2004. Bartonella adhesin A mediates a proangiogenic host cell response. J Exp Med 200:1267–1278.

36 Zhang, P., B.B. Chomel, M.K. Schau, J.S. Goo, S. Droz, K.L. Kelminson, S.S. George, N.W. Lerche, and J.E. Koehler. 2004. A family of variably expressed outer-membrane proteins (Vomp) mediates adhesion and autoaggregation in Bartonella quintana. Proc Natl Acad Sci U S A 101:13630–13635.

37 Alsmark, C.M., A.C. Frank, E.O. Karlberg, B.A. Legault, D.H. Ardell, B. Canback, A.S. Eriksson, A.K. Naslund, S.A. Handley, M. Huvet, B. La Scola, M. Holmberg, and S.G. Andersson. 2004. The louse-borne human pathogen Bartonella quintana is a genomic derivative of the zoonotic agent Bartonella henselae. Proc Natl Acad Sci U S A 101:9716–9721.

38 DelVecchio, V.G., V. Kapatral, R.J. Redkar, G. Patra, C. Mujer, T. Los, N. Ivanova, I. Anderson, A. Bhattacharyya, A. Lykidis, G. Reznik, L. Jablonski, N. Larsen, M. D’Souza, A. Bernal, M. Mazur, E. Goltsman, E. Selkov, P.H. Elzer, S. Hagius, D. O’Callaghan, J.J. Letesson, R. Haselkorn, N. Kyrpides, and R. Overbeek. 2002. The genome sequence of the facultative intracellular pathogen Brucella melitensis. Proc Natl Acad Sci U S A 99:443–448.

39 Paulsen, I.T., R. Seshadri, K.E. Nelson, J.A. Eisen, J.F. Heidelberg, T.D. Read, R.J. Dodson, L. Umayam, L.M. Brinkac, M.J. Beanan, S.C. Daugherty, R.T. Deboy, A.S. Durkin, J.F. Kolonay, R. Madupu, W.C. Nelson, B. Ayodeji, M. Kraul, J. Shetty, J. Malek, S.E. Van Aken, S. Riedmuller, H. Tettelin, S.R. Gill, O. White, S.L. Salzberg, D.L. Hoover, L.E. Lindler, S.M. Halling, S.M. Boyle, and C.M. Fraser. 2002. The Brucella suis genome reveals fundamental similarities between animal and plant pathogens and symbionts. Proc Natl Acad Sci U S A 99:13148–13153.

40 Ochman, H., J.G. Lawrence, and E.A. Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299–304.

41 Hacker, J., M. Ott, G. Blum, R. Marre, J. Heesemann, H. Tschape, and W. Goebel. 1992. Genetics of Escherichia coli uropathogenicity: analysis of the O6:K15:H31 isolate 536. Zentralbl Bakteriol 276:165–175.

42 Blum, G., M. Ott, A. Lischewski, A. Ritter, H. Imrich, H. Tschape, and J. Hacker. 1994. Excision of large DNA regions termed pathogenicity islands from tRNA-specific loci in the chromosome of an Escherichia coli wild-type pathogen. Infect Immun 62:606–614.

43 Sullivan, J.T., H.N. Patrick, W.L. Lowther, D.B. Scott, and C.W. Ronson. 1995. Nodulating strains of Rhizobium loti arise through chromosomal symbiotic gene transfer in the environment. Proc Natl Acad Sci U S A 92:8985–8989.

44 Hacker, J., and E. Carniel. 2001. Ecological fitness, genomic islands and bacterial pathogenicity. A Darwinian view of the evolution of microbes. EMBO Rep 2:376–381.

45 Hacker, J., and J.B. Kaper. 2000. Pathogenicity islands and the evolution of microbes. Annu Rev Microbiol 54:641–679.

46 Kaper, J.B., and J. Hacker. 1993. Pathogenicity islands and other mobile virulence elements. Oxford University Press, Oxford.

47 Hou, Y.M. 1999. Transfer RNAs and pathogenicity islands. Trends Biochem Sci 24:295–298.

48 Boyd, E.F., B.M. Davis, and B. Hochhut. 2001. Bacteriophage–bacteriophage interactions in the evolution of pathogenic bacteria. Trends Microbiol 9:137–144.

49 Yin, X., and G. Stotzky. 1997. Gene transfer among bacteria in natural environments. Adv Appl Microbiol 45:153–212.

50 Jiang, S.C., and J.H. Paul. 1998. Gene transfer by transduction in the marine environment. Appl Environ Microbiol 64:2780–2787.

51 Miller, R.V. 2001. Environmental bacteriophage–host interactions: factors contributing to natural transduction. Antonie Van Leeuwenhoek 79:141–147.

52 Hendrix, R.W., J.G. Lawrence, G.F. Hatfull, and S. Casjens. 2000. The origins and ongoing evolution of viruses. Trends Microbiol 8:504–508.

53 Hendrix, R.W., M.C. Smith, R.N. Burns, M.E. Ford, and G.F. Hatfull. 1999. Evolutionary relationships among diverse bacteriophages and prophages: all the world’s a phage. Proc Natl Acad Sci U S A 96:2192–2197.

54 Toussaint, A., and C. Merlin. 2002. Mobile elements as a combination of functional modules. Plasmid 47:26–35.

55 Nilsson, A.S., and E. Haggard-Ljungquist. 2001. Detection of homologous recombination among bacteriophage P2 relatives. Mol Phylogenet Evol 21:259–269.

56 Ohnishi, M., K. Kurokawa, and T. Hayashi. 2001. Diversification of Escherichia coli genomes: are bacteriophages the major contributors? Trends Microbiol 9:481–485.

57 Filee, J., P. Forterre, and J. Laurent. 2003. The role played by viruses in the evolution of their hosts: a view based on informational protein phylogenies. Res Microbiol 154:237–243.

58 Banks, D.J., S.B. Beres, and J.M. Musser. 2002. The fundamental contribution of phages to GAS evolution, genome diversification and strain emergence. Trends Microbiol 10:515–521.

59 Lawrence, J.G., R.W. Hendrix, and S. Casjens. 2001. Where are the pseudogenes in bacterial genomes? Trends Microbiol 9:535–540.

60 Masui, S., S. Kamoda, T. Sasaki, and H. Ishikawa. 2000. Distribution and evolution of bacteriophage WO in Wolbachia, the endosymbiont causing sexual alterations in arthropods. J Mol Evol 51:491–497.

61 Umemori, E., Y. Sasaki, K. Amano, and Y. Amano. 1992. A phage in Bartonella bacilliformis. Microbiol Immunol 36:731–736.

62 Anderson, B., C. Goldsmith, A. Johnson, I. Padmalayam, and B. Baumstark. 1994. Bacteriophage-like particle of Rochalimaea henselae. Mol Microbiol 13:67–73.

63 Barbian, K.D., and M.F. Minnick. 2000. A bacteriophage-like particle from Bartonella bacilliformis. Microbiology 146(Pt 3):599–609.

64 Rolain, J.M., M. Franc, B. Davoust, and D. Raoult. 2003. Molecular detection of Bartonella quintana, B. koehlerae, B. henselae, B. clarridgeiae, Rickettsia felis, and Wolbachia pipientis in cat fleas, France. Emerg Infect Dis 9:338–342.

65 ffrench-Constant, R., N. Waterfield, P. Daborn, S. Joyce, H. Bennett, C. Au, A. Dowling, S. Boundy, S. Reynolds, and D. Clarke. 2003. Photorhabdus: towards a functional genomic analysis of a symbiont and pathogen. FEMS Microbiol Rev 26:433–456.

66 Hurst, M.R., T.R. Glare, T.A. Jackson, and C.W. Ronson. 2000. Plasmid-located pathogenicity determinants of Serratia entomophila, the causal agent of amber disease of grass grub, show similarity to the insecticidal toxins of Photorhabdus luminescens. J Bacteriol 182:5127–5138.

67 Cornelis, G.R., A. Boland, A.P. Boyd, C. Geuijen, M. Iriarte, C. Neyt, M.P. Sory, and I. Stainier. 1998. The virulence plasmid of Yersinia, an antihost genome. Microbiol Mol Biol Rev 62:1315–1352.

68 Monack, D.M., J. Mecsas, N. Ghori, and S. Falkow. 1997. Yersinia signals macrophages to undergo apoptosis and YopJ is necessary for this cell death. Proc Natl Acad Sci U S A 94:10385–10390.

69 Mills, S.D., A. Boland, M.P. Sory, P. van der Smissen, C. Kerbourch, B.B. Finlay, and G.R. Cornelis. 1997. Yersinia enterocolitica induces apoptosis in macrophages by a process requiring functional type III secretion and translocation mechanisms and involving YopP, presumably acting as an effector protein. Proc Natl Acad Sci U S A 94:12638–12643.

70 Sullivan, J.T., J.R. Trzebiatowski, R.W. Cruickshank, J. Gouzy, S.D. Brown, R.M. Elliot, D.J. Fleetwood, N.G. McCallum, U. Rossbach, G.S. Stuart, J.E. Weaver, R.J. Webby, F.J. De Bruijn, and C.W. Ronson. 2002. Comparative sequence analysis of the symbiosis island of Mesorhizobium loti strain R7A. J Bacteriol 184:3086–3095.

71 Christie, P.J. 2001. Type IV secretion: intercellular transfer of macromolecules by systems ancestrally related to conjugation machines. Mol Microbiol 40:294–305.

72 Frank, A.C., C.M. Alsmark, M. Thollesson, and S.G. Andersson. 2005. Functional divergence and horizontal transfer of type IV secretion systems. Mol Biol Evol 22:1325–1336.

73 Schmid, M.C., R. Schulein, M. Dehio, G. Denecker, I. Carena, and C. Dehio. 2004. The VirB type IV secretion system of Bartonella henselae mediates invasion, proinflammatory activation and antiapoptotic protection of endothelial cells. Mol Microbiol 52:81–92.

74 Schulein, R., P. Guye, T.A. Rhomberg, M.C. Schmid, G. Schroder, A.C. Vergunst, I. Carena, and C. Dehio. 2005. A bipartite signal mediates the transfer of type IV secretion substrates of Bartonella henselae into human cells. Proc Natl Acad Sci U S A 102:856–861.

75 Seubert, A., R. Hiestand, F. de la Cruz, and C. Dehio. 2003. A bacterial conjugation machinery recruited for pathogenesis. Mol Microbiol 49:1253–1266.

76 Page, R.D., P.L. Lee, S.A. Becher, R. Griffiths, and D.H. Clayton. 1998. A different tempo of mitochondrial DNA evolution in birds and their parasitic lice. Mol Phylogenet Evol 9:276–293.


14 Pathogenomics of Gastric and Enterohepatic Helicobacter Species

Sebastian Suerbaum, Sandra Schwarz, and Christine Josenhans

14.1 Introduction

The genus Helicobacter currently comprises 23 validly named species. These can be subdivided into the gastric Helicobacter species, which colonize the mucosa of the stomach of their host organisms, and the enterohepatic Helicobacter species, which colonize the gut and/or the liver. In addition to the fact that the type species of the genus Helicobacter, H. pylori, is one of the most prevalent human pathogens and the only recognized bacterial carcinogen [1], there are multiple other reasons why this genus is of interest in the context of pathogenomics. These bacteria are well adapted to their hosts, causing chronic or life-long infection or colonization. Most species have a very narrow host spectrum, and lack an environmental reservoir. The genus Helicobacter thus offers the opportunity to use comparative genomics to address questions relating to host adaptation, tissue tropism, or coevolution with diverse hosts, to name but a few. H. pylori was the sixth bacterial species to have its complete genome sequence published [2], and in 1999 became the first bacterial species for which two complete sequences from unrelated strains were publicly available [3]. However, to date there is only one other member of the genus Helicobacter, H. hepaticus, whose genome sequence is known [4], so only limited conclusions can yet be drawn from such comparisons. Several genome projects are in progress. These include additional strains of H. pylori, either associated with specific diseases (e.g., MALT lymphoma) or belonging to specific populations of H. pylori, such as the hpAfrica2 population [5]. They also include other Helicobacter species, such as the ferret stomach pathogen, H. mustelae. While the basis for genomic comparisons is currently limited to three genomic sequences from two Helicobacter species, the comparisons with the closely related species Campylobacter jejuni [6] and Wolinella succinogenes [7] are likewise of interest. We will also include in this review the results of studies that have addressed the variability of Helicobacter genomes using microarray hybridizations, and studies concerned with the global variability of H. pylori and its relationship to disease.


14.2 Helicobacter pylori

H. pylori is a widespread human pathogen that colonizes the gastric mucosa, where it persists unless eradicated by specific therapy. Infection is usually acquired in childhood and always leads to an infiltration of the mucosa with neutrophils and lymphocytes (chronic active gastritis). While this infection remains asymptomatic in approx. 85% of cases, it can lead to peptic ulceration of the duodenum or the stomach, to gastric cancer, or to gastric lymphoma of the mucosa-associated lymphoid tissue (for a general review on H. pylori, see Ref. [1]). Since its discovery in 1982 [8], this species has been studied extensively, but many features of the pathogenic lifestyle remain obscure, and the clinical outcome of infection still cannot be predicted from either bacterial or host genetic markers. The complete genomic sequences of two unrelated H. pylori isolates, 26695 and J99, were published by The Institute for Genomic Research (TIGR) and the pharmaceutical company AstraZeneca in 1997 and 1999, respectively [2, 3]. The genomes of the H. pylori strains 26695 and J99 are single circular chromosomes of 1.67 Mbp and 1.64 Mbp, respectively, with a G+C content of 39%. The two genome sequences have been analyzed in great detail with respect to the physiology and metabolic capabilities of H. pylori [9, 10]. The early availability of the complete genome sequence has had a tremendous impact on H. pylori research and is largely responsible for the rapid progress made in the analysis of H. pylori pathogenesis over the last 8 years.

14.2.1 Key Features of the H. pylori Genome Related to Pathogenesis

14.2.1.1 Colonization Factors: Urease and Motility

Urease is essential for the ability of H. pylori to colonize the gastric mucosa. More than 10 genes are required for the synthesis of active urease enzyme, the acquisition of sufficient amounts of nickel, the pH-dependent uptake of the urease substrate, urea, and the regulation of the transcription of all these genes [11]. An even larger number of genes (more than 50) are required for the assembly and operation of the polar bundle of flagella, which confer on H. pylori its exceptional ability to move in viscous environments [12]. Motility and chemotaxis are essential for colonization of the mucosa by H. pylori [13–16], and H. pylori uses the transmucus pH gradient to find its ecological niche in the deep layer of the mucus, close to the gastric epithelial surface [17, 18].

14.2.1.2 Phase Variation

One of the most unusual features of the H. pylori genome is the large number of homopolymeric tracts and dinucleotide repeats which can change their length due to slipped strand mispairing. More than 45 H. pylori genes are predicted to be phase-variable due to such hypermutable repeats, and length variation has actually been observed for 30 of these [19]. Genes switched on or off by this mechanism encode proteins involved in flagellar motility [20], DNA restriction and modification, and lipopolysaccharide biosynthesis [21, 22], as well as many proteins of unknown function. Many pathogens that are phylogenetically related to H. pylori, such as Campylobacter jejuni and H. hepaticus, have since been shown to share this mechanism of gene switching [4, 6].
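The effect of slipped strand mispairing on gene expression can be illustrated with a short script. The following is a minimal sketch, not the method used in the cited studies: the length thresholds, the function names, and the example sequence are all hypothetical and chosen only for illustration. It locates homopolymeric tracts and dinucleotide repeats in a DNA string and checks whether a change in repeat copy number preserves the reading frame.

```python
import re


def find_repeat_tracts(seq, min_homopolymer=8, min_dinucleotide_units=5):
    """Return (start, unit, copies) tuples for simple repeat tracts.

    The thresholds are illustrative assumptions, not published cut-offs.
    """
    seq = seq.upper()
    tracts = []
    # Homopolymeric tracts: a single base repeated min_homopolymer times or more.
    homopolymer = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    for match in homopolymer.finditer(seq):
        tracts.append((match.start(), match.group(1), len(match.group(0))))
    # Dinucleotide repeats: a unit of two different bases repeated in tandem.
    dinucleotide = re.compile(
        r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_dinucleotide_units - 1)
    )
    for match in dinucleotide.finditer(seq):
        tracts.append((match.start(), match.group(1), len(match.group(0)) // 2))
    return tracts


def frame_preserved(old_copies, new_copies, unit_length):
    """True if the change in tract length is a multiple of three bases."""
    return ((new_copies - old_copies) * unit_length) % 3 == 0


# Hypothetical sequence containing a C9 tract and a (CT)6 repeat.
example = "ATGGCA" + "C" * 9 + "GGTAGA" + "CT" * 6 + "AGTAA"
for start, unit, copies in find_repeat_tracts(example):
    print(start, unit, copies)

# Gaining one C shifts the reading frame, which can switch a gene on or off;
# gaining three C residues leaves the frame unchanged.
print(frame_preserved(9, 10, 1), frame_preserved(9, 12, 1))
```

In this simplified picture, a tract inside a coding region acts as a switch whenever its length changes by a number of bases that is not a multiple of three. Repeats in promoter regions can also affect expression, which the sketch does not model.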

14.2.1.3 The H. pylori Outer Membrane Protein Family

The H. pylori genomes contain a large family of 33 (26695) or 32 (J99) paralogous genes encoding putative outer membrane proteins [2, 3]. This group of proteins was initially termed the Hop family; a phylogenetic analysis based on the C-terminal sequences has led to subcategorization of the family into the subfamilies of Hop and Hor proteins [23]. While the functions of most of these proteins are still unknown, the family comprises several proteins that have been shown to be involved in the adherence of H. pylori to gastric epithelial cells, including the Lewis b blood group antigen binding adhesin BabA [24], the sialyl-Lewis x binding adhesin SabA [25], and the putative adhesins AlpA, AlpB [26], and HopZ [27], whose cellular ligands have not yet been identified and whose role in adhesion is thus less clear.

14.2.1.4 Intraspecies Variation of H. pylori Genomes

The comparison of two complete H. pylori genome sequences showed that approx. 7% of genes were unique to each of the two strains [3]. Additional evidence for extensive genomic heterogeneity between H. pylori strains was provided by genomic comparisons with comprehensive DNA microarrays [28]. The comparison of 15 isolates (all of which were isolated in the USA) showed that 1281 genes were present in all strains (core genes), while 362 (22%) were absent from at least one isolate. It is likely that studies that include larger sets of isolates that represent the various H. pylori populations will lead to a more precise (and significantly lower) estimate of the number of core genes. Most of the variable genes are located in two regions of the chromosome that have been termed “plasticity regions” [3]. With the exception of the cag pathogenicity island (see Section 14.2.1.5), the role of the other genes that are outside of the core gene pool is currently largely unknown, but some may well contribute to the adaptation of H. pylori to the individual human host, or to specific niches within the stomach. It is therefore of interest to study the variability of the gene content in sequential and multiple isolates from individual patients.

Israel et al. [29] have analyzed a collection of 36 strains isolated 6 years later from the patient from whom J99 had originally been cultured in 1994. Because the genome of J99 has been sequenced [3], and the H. pylori infection in this patient was never eradicated by medical treatment, this provided a superb opportunity to detect changes with time. These recent isolates were compared with 12 isolates cultured in 1994, using microarray hybridizations and RAPD-PCR (random amplification of polymorphic DNA polymerase chain reaction). The data showed that each of the 36 recent isolates was unique, although all were closely related to J99. Many recent isolates lack genes that are present in J99. In some cases, the recent isolates contained additional genes, which have either been acquired recently, or had been deleted during the microevolution of J99. The functional relevance of these genomic changes is still entirely unclear. First indications that genomic changes may indeed be involved in host adaptation come from a recent study, where Solnick et al. [30] studied serial H. pylori isolates from an experimentally infected rhesus macaque. In all the passaged strains, babA, the gene encoding the Lewis b blood group antigen binding adhesin [24], was nonfunctional, due either to mutations or to gene conversion from babB, a related gene that is present elsewhere on the genome.
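The distinction between core and variable genes drawn from the microarray comparison in this section can be expressed as simple set operations on a presence/absence table. The sketch below uses invented strain and gene names purely for illustration; only the counts 1281 and 362 come from the text. It also checks that the quoted 22% is consistent with 362 of 1643 genes, assuming the percentage refers to the sum of core and variable genes.

```python
def split_core_variable(presence):
    """Split genes into core (in every strain) and variable (missing from some).

    presence maps a strain name to the set of genes detected in that strain.
    """
    gene_sets = list(presence.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    variable = set.union(*gene_sets) - core
    return core, variable


def variable_percentage(n_core, n_variable):
    """Percentage of all surveyed genes that are absent from at least one strain."""
    return 100.0 * n_variable / (n_core + n_variable)


# Hypothetical presence/absence calls for three strains and four genes.
toy_presence = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneB"},
}
core, variable = split_core_variable(toy_presence)
print(sorted(core), sorted(variable))

# Counts reported in the text: 1281 core genes and 362 variable genes.
print(round(variable_percentage(1281, 362)))
```

Because the core is an intersection, adding further strains can only keep it the same size or shrink it, which is why a larger and more diverse strain collection is expected to lower the core gene estimate.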

14.2.1.5 The cag Pathogenicity Island

The cag pathogenicity island (cag PAI) is an approx. 37-kbp section of the H. pylori chromosome flanked by 31-bp direct repeats [31, 32]. It is now well established that the presence of an intact cag PAI enables an H. pylori strain to interact with the gastric epithelial cells much more closely than a strain lacking the island, resulting in stronger inflammation and a higher risk that the infected person will develop clinical disease [33, 34]. One key activity of the cag PAI is the assembly of a machinery capable of delivering the 128-kDa protein CagA into epithelial cells via a type IV secretion system [35, 36]. Contact of epithelial cells with cag PAI-carrying strains also leads to the delivery of peptidoglycan fragments into the host cell cytosol. These fragments stimulate the intracellular pattern receptor Nod1, leading to activation of the nuclear transcription factor NF-κB and subsequently to interleukin 8 (IL-8) release [37] (for reviews of cag PAI-dependent signaling and of the interaction of H. pylori with the innate immune system, see Refs. [38, 39]).

A typical cag PAI contains 27 genes, although islands with up to 33 genes have been reported [40]. At least six of these code for proteins with homology to type IV secretion system components. A systematic mutagenesis study has shown that 17 of the cag island genes are essential for translocation of CagA, and 14 are essential for the ability of H. pylori to strongly induce IL-8 secretion [41]. Filamentous organelles that may represent the structural correlate of the H. pylori type IV secretion system have been observed on the surface of cag PAI-carrying strains (and were not observed in cag island deletion mutants), but the precise structure and role of these organelles have yet to be determined [42, 43].

Due to the 31-bp flanking repeats, the cag PAI is genetically unstable. It can be lost from the chromosome, either by RecA-dependent precise excision [44], or by replacement of the cag PAI-containing region of the H. pylori chromosome with an “empty site allele” from a non-cag PAI-carrying H. pylori strain that simultaneously colonizes the stomach of a patient [45]. In addition, many strains harbor incomplete islands lacking a variable number of genes [28, 46]. The dynamics of cag PAI loss and possible acquisition in vivo are still poorly understood.
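The way flanking direct repeats can lead to precise loss of an island may be pictured with a deliberately simplified string model. The sketch below assumes that recombination between the two repeat copies removes the intervening segment and leaves a single repeat copy behind; the 3-bp repeat and the toy chromosome are hypothetical stand-ins, not the real 31-bp repeat or any H. pylori sequence.

```python
def excise_between_repeats(chromosome: str, repeat: str) -> str:
    """Return the chromosome after deletion of the segment between the first two repeat copies.

    One copy of the repeat is retained, as expected for recombination
    between two direct repeats. If fewer than two copies are found,
    the chromosome is returned unchanged.
    """
    first = chromosome.find(repeat)
    if first == -1:
        return chromosome
    second = chromosome.find(repeat, first + len(repeat))
    if second == -1:
        return chromosome
    # Keep everything up to and including the first repeat copy,
    # then resume directly after the second copy.
    return chromosome[: first + len(repeat)] + chromosome[second + len(repeat):]


# Toy example: left flank, repeat, "island", repeat, right flank (all hypothetical).
toy_chromosome = "AAAA" + "TTG" + "CCCCCC" + "TTG" + "GGGG"
empty_site = excise_between_repeats(toy_chromosome, "TTG")
print(empty_site)  # AAAATTGGGGG
```

The resulting sequence corresponds to what the text calls an empty site: both flanks are joined across a single remaining repeat copy.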


14.2.1.6 Nucleotide Sequence Variation in H. pylori

Even more striking than the variation of genome content between strains of H. pylori is the degree of nucleotide sequence variation (allelic variation), in which H. pylori appears to surpass all other known bacterial pathogens [47]. Allelic diversity is so high that almost every clinical H. pylori isolate will differ from any other isolate even when only a few hundred base pairs from a random gene are sequenced [47, 48]. A relatively high mutation rate [49], and the frequent occurrence of interstrain recombination events leading to the uptake of small pieces (average fragment size 417 bp) of DNA into the chromosome of the recipient strain, contribute to this diversity and the rapid change of sequences even during chronic colonization of a single individual [50]. The mechanisms generating genetic variation in H. pylori and their relevance to host adaptation have been reviewed recently [51, 52].

Since H. pylori is transmitted vertically in families and its diversity is more than 50-fold higher than that of human DNA, H. pylori sequences could be shown to reflect ancient and more recent human migrations, such as the peopling of the Americas, the colonization of the Polynesian islands, or the slave trade between Africa and the Americas [5], and first evidence has recently been presented that H. pylori genotypes can actually be more informative about human migrations than human genetic markers [53].
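Allelic diversity of the kind described here is commonly summarized as the proportion of differing sites between aligned gene fragments. The following sketch assumes equal-length, already aligned fragments and uses invented 8-bp sequences; it is a generic illustration and not the analysis performed in Refs. [47, 48].

```python
from itertools import combinations
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def mean_pairwise_diversity(seqs: Sequence[str]) -> float:
    """Average p-distance over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned fragments from three isolates (illustrative only).
fragments = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
diversity = mean_pairwise_diversity(fragments)
print(f"{diversity:.3f}")
```

In this toy set every fragment differs from every other one, which mirrors the observation that almost any two clinical isolates can be told apart from a few hundred base pairs of a single gene.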

14.3 Helicobacter hepaticus

H. hepaticus was discovered in 1992 as the etiological agent responsible for an outbreak of hepatitis and liver cancer in the control mouse colonies at the United States National Cancer Institute [54]. Experimental infection with H. hepaticus leads to hepatitis and hepatic carcinoma in several strains of mice (A/J, BALB/c, Scid) [55], and causes inflammatory bowel disease and colon cancer in certain strains of immunodeficient mice, such as IL-2-, IL-10-, or Rag-knockout mice [56–58]. H. hepaticus is the prototype organism for a large group of Helicobacter species that colonize the gut and/or the hepatobiliary tract (for a very comprehensive review of enterohepatic Helicobacter species, their natural host organisms, and associated pathologies, see Ref. [59]). Many of these have been associated with both intestinal and hepatobiliary disease in various animal models, and compelling evidence has recently been presented that suggests a causative role of enterohepatic Helicobacter species in gallstone disease [60]. While H. hepaticus itself has only been isolated from rodents, related enterohepatic Helicobacter species have also been found in humans [61], and their role in disease is currently under investigation (for a review, see Ref. [62]).

In 2003, the genome sequence of H. hepaticus was determined in a collaboration between our group, the biotech companies MWG Biotech and GeneData, and scientists at the Massachusetts Institute of Technology and the University of New South Wales, Australia [4]. With approx. 1.8 Mbp coding for 1875 proteins, the H. hepaticus genome is considerably larger than the H. pylori genome. While the genome shares many features with those of both H. pylori and Campylobacter jejuni, it lacks many of the genes that have been shown to play a role in the colonization of the gastric mucosa by H. pylori. Specifically, H. hepaticus has no orthologues of the cag PAI genes or of the vacuolating cytotoxin gene vacA. It also lacks close homologues of the H. pylori adhesins BabA, SabA, AlpA/AlpB and HopZ, although H. hepaticus has 11 genes that encode proteins with homology to the H. pylori Hop/Hor proteins.

Like H. pylori, H. hepaticus produces urease and displays flagellar motility. It thus shares most genes required for urease formation, flagellar biosynthesis, and chemotaxis with H. pylori. However, a more detailed analysis of these systems reveals differences that are likely to reflect the different requirements of the ecological niches that these two pathogens colonize. A good example of this is the nickel uptake system. H. pylori possesses a unique nickel uptake system, which includes the nickel transporter NixA [63]. H. hepaticus lacks NixA and several other proteins involved in nickel uptake in H. pylori, but instead possesses a cluster of genes (nikABCDE) very similar to nickel uptake systems found in Enterobacteriaceae. This situation is an interesting example of modular genome evolution where core modules (e.g., synthesis of urease subunits and assembly of the holoenzyme) are combined with a habitat-specific module ideally suited to acquiring nickel from the respective environment. Both H. pylori and H. hepaticus possess a pH-gated inner membrane urea channel that can alternate between open and closed conformations depending on the extracellular proton concentration [64, 65].

14.3.1 The HHGI1 Genomic Island

H. hepaticus lacks orthologues of the H. pylori cag PAI. However, we identified a large chromosomal region with a conspicuously lower G+C content than the rest of the genome, indicating acquisition by horizontal gene transfer. The 70-kbp region also showed other features suggestive of a PAI, such as a P4-like phage integrase, a very high density of genes lacking homologues in other organisms, and three genes encoding proteins with homology to components of type IV secretion systems, such as the T pilus system of Agrobacterium tumefaciens (VirB10, VirB4, VirD4). Several observations suggest an involvement of the HHGI1 island in the virulence of H. hepaticus. For example, genome comparisons of clinical isolates of H. hepaticus with DNA microarrays have shown that many strains lack the island, or parts of it [4]. These isolates have never been associated with hepatitis or cancer, suggesting a lower pathogenic potential of island-deficient strains. Experiments with isogenic mutants lacking parts of the island are now in progress to confirm the role of the island, and in particular the putative type IV secretion system in H. hepaticus-induced pathology.
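A region of conspicuously low G+C content, as described for this island, is typically found by scanning the chromosome in windows and comparing each window with the genome-wide average. The sketch below is a minimal version of that idea; the window size, step, and margin are arbitrary illustrative parameters, the sequence is synthetic, and nothing here reproduces the analysis of Ref. [4].

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def low_gc_windows(genome: str, window: int, step: int, margin: float) -> List[Tuple[int, float]]:
    """Return (start, G+C fraction) for windows lying more than `margin` below the genome average."""
    overall = gc_content(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if gc < overall - margin:
            hits.append((start, gc))
    return hits


# Synthetic "chromosome": GC-rich flanks around an AT-only insert (illustrative only).
toy_genome = "GC" * 50 + "AT" * 20 + "GC" * 50
hits = low_gc_windows(toy_genome, window=20, step=20, margin=0.2)
print(hits)
```

Only the two windows covering the AT-only insert are reported. On real data, adjacent low-G+C windows would then be merged and inspected for further hallmarks of horizontal transfer, such as integrase genes.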


14.3.2 Other Putative H. hepaticus Virulence Factors

Currently, very little is known about H. hepaticus virulence factors. Urease is thought to be essential for its colonization (S. Ragnum and D. B. Schauer, unpublished data). H. hepaticus produces cytolethal distending toxin (CDT), a multifunctional toxin found in a number of pathogens, such as Campylobacter jejuni, some Escherichia coli strains, Actinobacillus actinomycetemcomitans, and Haemophilus ducreyi. H. hepaticus mutants deficient in CDT synthesis have been generated, and they induced less inflammation than the corresponding wild-type strain in a mouse model of inflammatory bowel disease [66]. Further putative virulence factors include a homologue of the Campylobacter jejuni adhesin PEB-1 [4].

14.4 Genome Comparisons of Gastric and Enterohepatic Helicobacter Species with Related Bacteria

The two bacterial species with known genome sequences that are most closely related to the Helicobacter species are the human diarrheal pathogen Campylobacter jejuni [6] and a commensal anaerobic bacterium isolated from the rumen of cattle, Wolinella succinogenes [7]. The key features of all five genomes available for ε-proteobacteria are listed in Table 14.1. Some differences between the two Helicobacter species, and between Helicobacter and Campylobacter species, have been described above. A detailed comparison is beyond the scope of this chapter, and the reader is referred to two recently published three-way comparisons (H. hepaticus – H. pylori – C. jejuni, Ref. [4]; W. succinogenes – H. pylori – C. jejuni, Ref. [7]) and a four-species comparison (Ref. [67]).

Table 14.1 General features of Helicobacter genomes and of related bacteria.

| Feature | H. hepaticus ATCC 51449 | H. pylori 26695 | H. pylori J99 | C. jejuni NCTC11168 | W. succinogenes DSMZ 1740 |
|---|---|---|---|---|---|
| Total size (bp) | 1 799 146 | 1 667 867 | 1 643 831 | 1 641 481 | 2 110 355 |
| GC content (%) | 35.9 | 38.8 | 39 | 30.5 | 48.5 |
| Coding sequences | 1 875 | 1 590 | 1 495 | 1 654 | 2 046 |
| Average gene length (bp) | 1 082 | 954 | 998 | 948 | 964 |
| Coding density (%) | 93.04 | 91.0 | 90.8 | 94.3 | 94.0 |
| Proteins with assigned functions | 1 022 | 895 | 874 | 1 242 | 1 260 |
| Ribosomal RNA | 1 16S – 23S – 5S | 2 16S – 23S – 3 5S | 2 16S – 23S – 5S | 3 16S – 23S – 5S | 3 23S – 5S 3 16S |
| tRNA | 37 | 36 | 36 | 44 | 40 |
| Phase-variable genes | 17 observed, 33 predicted | 14 observed, 12 predicted | 9 observed, 16 predicted | 23 observed, 9 predicted | Phase-variable genes present* |

* Homopolymeric tracts rare, abundant dinucleotide repeats [67].
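The counts in Table 14.1 can be put on a common scale by expressing the proteins with assigned functions as a percentage of all coding sequences. The short calculation below uses only the two corresponding rows of the table; the dictionary layout and function name are illustrative choices.

```python
# (coding sequences, proteins with assigned functions), copied from Table 14.1
TABLE_14_1 = {
    "H. hepaticus ATCC 51449": (1875, 1022),
    "H. pylori 26695": (1590, 895),
    "H. pylori J99": (1495, 874),
    "C. jejuni NCTC11168": (1654, 1242),
    "W. succinogenes DSMZ 1740": (2046, 1260),
}


def annotated_percentage(coding_sequences: int, assigned: int) -> float:
    """Percentage of predicted coding sequences that have an assigned function."""
    return 100.0 * assigned / coding_sequences


annotated = {
    strain: round(annotated_percentage(cds, assigned), 1)
    for strain, (cds, assigned) in TABLE_14_1.items()
}

for strain, pct in annotated.items():
    print(f"{strain}: {pct}%")
```

On these figures, a little more than half of the predicted proteins of the three Helicobacter genomes carry a functional assignment, compared with about three quarters for C. jejuni. Such percentages depend on the annotation practice of each sequencing project and should be compared with caution.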

14.5 Outlook

The early availability of a complete genome sequence has contributed enormously to the rapid progress made in various areas of H. pylori research, and it is undoubtedly for this reason that H. pylori, less than 25 years after its discovery, is already one of the best studied bacterial pathogens. Because its genome is small, H. pylori has been widely used as a pioneering model system for novel genomic/postgenomic technologies, such as intraspecies whole-genome comparison [3], high-throughput protein interaction mapping [68], comparative proteome analysis [69], and reconstruction of complex bacterial regulatory systems [70, 71] or genome comparisons [28] using whole-genome DNA microarrays. The analysis of the H. hepaticus genome has made a first enterohepatic Helicobacter species amenable to systematic postgenomic analysis of its pathogenic potential in hepatobiliary and enteric disease. Additional genomes that are currently being sequenced include H. mustelae, a gastric pathogen of ferrets (sequenced at the Sanger Centre), and several H. pylori strains associated with different clinical diseases or from specific H. pylori strain populations [5]. As more genome sequences from the genus Helicobacter become available, it should become possible to reconstruct the evolution of these host-adapted bacteria, to generate solid hypotheses regarding the genomic basis of host and tissue tropism, and, perhaps most importantly, to fully elucidate the mechanisms of the carcinogenic effect of H. pylori, H. hepaticus, and possibly other Helicobacter species.

Acknowledgments

Work on H. pylori and H. hepaticus pathogenomics in the authors’ laboratory is funded by grants from the German Research Foundation (SFB479, SFB621, Priority program SPP1047) and by the PathoGenoMik network of the German Ministry of Education and Research (BMBF).

References

1 Suerbaum, S., and P. Michetti. 2002. Helicobacter pylori infection. N. Engl. J. Med. 347:1175–1186.
2 Tomb, J.-F., O. White, A. R. Kerlavage, R. A. Clayton, G. G. Sutton, R. D. Fleischmann, K. A. Ketchum, H. P. Klenk, S. Gill, B. A. Dougherty, K. Nelson, J. Quackenbush, L. Zhou, E. F. Kirkness, S. Peterson, B. Loftus, D. Richardson, R. Dodson, H. G. Khalak, A. Glodek, K. McKenney, L. M. Fitzegerald, N. Lee, M. D. Adams, E. K. Hickey, D. E. Berg, J. D. Gocayne, T. R. Utterback, J. D. Peterson, J. M. Kelley, M. D. Cotton, J. M. Weidman, C. Fujii, C. Bowman, L. Watthey, E. Wallin, W. S. Hayes, M. Borodovsky, P. D. Karp, H. O. Smith, C. M. Fraser, and J. C. Venter. 1997. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388:539–547.
3 Alm, R. A., L.-S. L. Ling, D. T. Moir, B. L. King, E. D. Brown, P. C. Doig, D. R. Smith, B. Noonan, B. C. Guild, B. L. deJonge, G. Carmel, P. J. Tummino, A. Caruso, M. Uria-Nickelsen, D. M. Mills, C. Ives, R. Gibson, D. Merberg, S. D. Mills, Q. Jiang, D. E. Taylor, G. F. Vovis, and T. J. Trust. 1999. Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature 397:176–180.
4 Suerbaum, S., C. Josenhans, T. Sterzenbach, B. Drescher, P. Brandt, M. Bell, M. Droege, B. Fartmann, H.-P. Fischer, Z. Ge, A. Hörster, R. Holland, K. Klein, J. König, L. Macko, G. L. Mendz, G. Nyakatura, D. B. Schauer, Z. Shen, J. Weber, M. Frosch, and J. G. Fox. 2003. The complete genome sequence of the carcinogenic bacterium Helicobacter hepaticus. Proc. Natl. Acad. Sci. U. S. A. 100:7901–7906.
5 Falush, D., T. Wirth, B. Linz, J. K. Pritchard, M. Stephens, M. Kidd, M. J. Blaser, D. Y. Graham, S. Vacher, G. I. Perez-Perez, Y. Yamaoka, F. Megraud, K. Otto, U. Reichard, E. Katzowitsch, X. Wang, M. Achtman, and S. Suerbaum. 2003. Traces of human migrations in Helicobacter pylori populations. Science 299:1582–1585.
6 Parkhill, J., B. W. Wren, K. Mungall, J. M. Ketley, C. Churcher, D. Basham, T. Chillingworth, R. M. Davies, T. Feltwell, S. Holroyd, K. Jagels, A. V. Karlyshev, S. Moule, M. J. Pallen, C. W. Penn, M. A. Quail, M. A. Rajandream, K. M. Rutherford, A. H. van Vliet, S. Whitehead, and B. G. Barrell. 2000. The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences. Nature 403:665–668.
7 Baar, C., M. Eppinger, G. Raddatz, J. Simon, C. Lanz, O. Klimmek, R. Nandakumar, R. Gross, A. Rosinus, H. Keller, P. Jagtap, B. Linke, F. Meyer, H. Lederer, and S. C. Schuster. 2003. Complete genome sequence and analysis of Wolinella succinogenes. Proc. Natl. Acad. Sci. U. S. A. 100:11690–11695.
8 Warren, J. R., and B. Marshall. 1983. Unidentified curved bacilli on gastric epithelium in active chronic gastritis. Lancet 1:1273–1275.
9 Marais, A., G. L. Mendz, S. L. Hazell, and F. Megraud. 1999. Metabolism and genetics of Helicobacter pylori: the genome era. Microbiol. Mol. Biol. Rev. 63:642–674.
10 Doig, P., B. L. de Jonge, R. A. Alm, E. D. Brown, M. Uria-Nickelsen, B. Noonan, S. D. Mills, P. Tummino, G. Carmel, B. C. Guild, D. T. Moir, G. F. Vovis, and T. J. Trust. 1999. Helicobacter pylori physiology predicted from genomic comparison of two strains. Microbiol. Mol. Biol. Rev. 63:675–707.
11 Stingl, K., and H. de Reuse. 2005. Staying alive overdosed: How does Helicobacter pylori control urease activity? Int. J. Med. Microbiol. 1:1.
12 Josenhans, C., and S. Suerbaum. 2001. Motility and chemotaxis. In: Helicobacter pylori: Molecular and Cellular Biology. M. Achtman and S. Suerbaum, editors. Horizon Scientific Press, Wymondham, 171–184.
13 Eaton, K. A., S. Suerbaum, C. Josenhans, and S. Krakowka. 1996. Colonization of gnotobiotic piglets by Helicobacter pylori deficient in two flagellin genes. Infect. Immun. 64:2445–2448.
14 Josenhans, C., and S. Suerbaum. 2002. The role of motility as a virulence factor in bacteria. Int. J. Med. Microbiol. 291:605–614.
15 McGee, D. J., M. L. Langford, E. L. Watson, J. E. Carter, Y. T. Chen, and K. M. Ottemann. 2005. Colonization and inflammation deficiencies in Mongolian gerbils infected by Helicobacter pylori chemotaxis mutants. Infect. Immun. 73:1820–1827.
16 Terry, K., S. M. Williams, L. Connolly, and K. M. Ottemann. 2005. Chemotaxis plays multiple roles during Helicobacter pylori animal infection. Infect. Immun. 73:803–811.
17 Schreiber, S., M. Stüben, C. Josenhans, P. Scheid, and S. Suerbaum. 1999. In vivo distribution of Helicobacter felis in the gastric mucus of the mouse: Experimental method and results. Infect. Immun. 67:5151–5156.
18 Schreiber, S., M. Konradt, C. Groll, P. Scheid, G. Hanauer, H. O. Werling, C. Josenhans, and S. Suerbaum. 2004. The spatial orientation of Helicobacter pylori in the gastric mucus. Proc. Natl. Acad. Sci. U. S. A. 101:5024–5029.
19 Salaun, L., B. Linz, S. Suerbaum, and N. J. Saunders. 2004. The diversity within an expanded and redefined repertoire of phase-variable genes in Helicobacter pylori. Microbiology 150:817–830.
20 Josenhans, C., K. A. Eaton, T. Thevenot, and S. Suerbaum. 2000. Switching of flagellar motility in Helicobacter pylori by reversible length variation of a short homopolymeric sequence repeat in fliP, a gene encoding a basal body protein. Infect. Immun. 68:4598–4603.
21 Appelmelk, B. J., S. L. Martin, M. A. Monteiro, C. A. Clayton, A. A. McColm, P. Zheng, T. Verboom, J. J. Maaskant, D. H. van den Eijnden, C. H. Hokke, M. B. Perry, C. M. Vandenbroucke-Grauls, and J. G. Kusters. 1999. Phase variation in Helicobacter pylori lipopolysaccharide due to changes in the lengths of poly(C) tracts in alpha3-fucosyltransferase genes. Infect. Immun. 67:5361–5366.
22 Wang, G., D. A. Rasko, R. Sherburne, and D. E. Taylor. 1999. Molecular genetic basis for the variable expression of Lewis Y antigen in Helicobacter pylori: analysis of the alpha (1,2) fucosyltransferase gene. Mol. Microbiol. 31:1265–1274.
23 Alm, R. A., J. Bina, B. M. Andrews, P. Doig, R. E. Hancock, and T. J. Trust. 2000. Comparative genomics of Helicobacter pylori: analysis of the outer membrane protein families. Infect. Immun. 68:4155–4168.
24 Ilver, D., A. Arnqvist, J. Ogren, I. M. Frick, D. Kersulyte, E. T. Incecik, D. E. Berg, A. Covacci, L. Engstrand, and T. Boren. 1998. Helicobacter pylori adhesin binding fucosylated histo-blood group antigens revealed by retagging. Science 279:373–377.
25 Mahdavi, J., B. Sonden, M. Hurtig, F. O. Olfat, L. Forsberg, N. Roche, J. Angstrom, T. Larsson, S. Teneberg, K. A. Karlsson, S. Altraja, T. Wadstrom, D. Kersulyte, D. E. Berg, A. Dubois, C. Petersson, K. E. Magnusson, T. Norberg, F. Lindh, B. B. Lundskog, A. Arnqvist, L. Hammarstrom, and T. Boren. 2002. Helicobacter pylori SabA adhesin in persistent infection and chronic inflammation. Science 297:573–578.
26 Odenbreit, S., M. Till, D. Hofreuter, G. Faller, and R. Haas. 1999. Genetic and functional characterization of the alpAB gene locus essential for adhesion of Helicobacter pylori to human gastric tissue. Mol. Microbiol. 31:1537–1548.
27 Peck, B., M. Ortkamp, K. D. Diehl, E. Hundt, and B. Knapp. 1999. Conservation, localization and expression of HopZ, a protein involved in adhesion of Helicobacter pylori. Nucleic Acids Res. 27:3325–3333.
28 Salama, N., K. Guillemin, T. K. McDaniel, G. Sherlock, L. Tompkins, and S. Falkow. 2000. A whole-genome microarray reveals genetic diversity among Helicobacter pylori strains. Proc. Natl. Acad. Sci. U. S. A. 97:14668–14673.
29 Israel, D. A., N. Salama, U. Krishna, U. M. Rieger, J. C. Atherton, S. Falkow, and R. M. J. Peek. 2001. Helicobacter pylori genetic diversity within the gastric niche of a single human host. Proc. Natl. Acad. Sci. U. S. A. 98:14625–14630.
30 Solnick, J. V., L. M. Hansen, N. R. Salama, J. K. Boonjakuakul, and M. Syvanen. 2004. Modification of Helicobacter pylori outer membrane protein expression during experimental infection of rhesus macaques. Proc. Natl. Acad. Sci. U. S. A. 101:2106–2111.
31 Censini, S., C. Lange, Z. Xiang, J. E. Crabtree, P. Ghiara, M. Borodovsky, R. Rappuoli, and A. Covacci. 1996. cag, a pathogenicity island of Helicobacter pylori, encodes type I-specific and disease-associated virulence factors. Proc. Natl. Acad. Sci. U. S. A. 93:14648–14653.
32 Akopyants, N. S., S. W. Clifton, D. Kersulyte, J. E. Crabtree, B. E. Youree, C. A. Reece, N. O. Bukanov, E. S. Drazek, B. A. Roe, and D. E. Berg. 1998. Analyses of the cag pathogenicity island of Helicobacter pylori. Mol. Microbiol. 28:37–53.
33 Crabtree, J. E., J. D. Taylor, J. I. Wyatt, R. V. Heatley, T. M. Shallcross, D. S. Tompkins, and B. J. Rathbone. 1991. Mucosal IgA recognition of Helicobacter pylori 120 kDa protein, peptic ulceration, and gastric pathology. Lancet 338:332–335.
34 Crabtree, J. E., S. M. Farmery, I. J. Lindley, N. Figura, P. Peichl, and D. S. Tompkins. 1994. CagA/cytotoxic strains of Helicobacter pylori and interleukin-8 in gastric epithelial cell lines. J. Clin. Pathol. 47:945–950.
35 Odenbreit, S., J. Püls, B. Sedlmaier, E. Gerland, W. Fischer, and R. Haas. 2000. Translocation of Helicobacter pylori CagA into gastric epithelial cells by type IV secretion. Science 287:1497–1500.
36 Segal, E. D., J. Cha, J. Lo, S. Falkow, and L. S. Tompkins. 1999. Altered states: involvement of phosphorylated CagA in the induction of host cellular growth changes by Helicobacter pylori. Proc. Natl. Acad. Sci. U. S. A. 96:14559–14564.
37 Viala, J., C. Chaput, I. G. Boneca, A. Cardona, S. E. Girardin, A. P. Moran, R. Athman, S. Memet, M. R. Huerre, A. J. Coyle, P. S. DiStefano, P. J. Sansonetti, A. Labigne, J. Bertin, D. J. Philpott, and R. L. Ferrero. 2004. Nod1 responds to peptidoglycan delivered by the Helicobacter pylori cag pathogenicity island. Nat. Immunol. 5:1166–1174.
38 Naumann, M. 2005. Pathogenicity island-dependent effects of Helicobacter pylori on intracellular signal transduction in epithelial cells. Int. J. Med. Microbiol. 1:1.
39 Lee, S. K., and C. Josenhans. 2005. Helicobacter pylori and the innate immune system. Int. J. Med. Microbiol. 1:1.
40 Azuma, T., A. Yamakawa, S. Yamazaki, M. Ohtani, Y. Ito, A. Muramatsu, H. Suto, Y. Yamazaki, Y. Keida, H. Higashi, and M. Hatakeyama. 2004. Distinct diversity of the cag pathogenicity island among Helicobacter pylori strains in Japan. J. Clin. Microbiol. 42:2508–2517.
41 Fischer, W., J. Puls, R. Buhrdorf, B. Gebert, S. Odenbreit, and R. Haas. 2001. Systematic mutagenesis of the Helicobacter pylori cag pathogenicity island: essential genes for CagA translocation in host cells and induction of interleukin-8. Mol. Microbiol. 42:1337–1348.
42 Rohde, M., J. Puls, R. Buhrdorf, W. Fischer, and R. Haas. 2003. A novel sheathed surface organelle of the Helicobacter pylori cag type IV secretion system. Mol. Microbiol. 49:219–234.
43 Tanaka, J., T. Suzuki, H. Mimuro, and C. Sasakawa. 2003. Structural definition on the surface of Helicobacter pylori type IV secretion apparatus. Cell Microbiol. 5:395–404.
44 Bjorkholm, B., A. Lundin, A. Sillen, K. Guillemin, N. Salama, C. Rubio, J. I. Gordon, P. Falk, and L. Engstrand. 2001. Comparison of genetic divergence and fitness between two subclones of Helicobacter pylori. Infect. Immun. 69:7832–7838.
45 Kersulyte, D., H. Chalkauskas, and D. E. Berg. 1999. Emergence of recombinant strains of Helicobacter pylori during human infection. Mol. Microbiol. 31:31–41.
46 Jenks, P. J., F. Megraud, and A. Labigne. 1998. Clinical outcome after infection with Helicobacter pylori does not appear to be reliably predicted by the presence of any of the genes of the cag pathogenicity island. Gut 43:752–758.
47 Suerbaum, S., J. Maynard Smith, K. Bapumia, G. Morelli, N. H. Smith, E. Kunstmann, I. Dyrek, and M. Achtman. 1998. Free recombination within Helicobacter pylori. Proc. Natl. Acad. Sci. U. S. A. 95:12619–12624.
48 Kansau, I., J. Raymond, E. Bingen, P. Courcoux, N. Kalach, M. Bergeret, N. Braimi, C. Dupont, and A. Labigne. 1996. Genotyping of Helicobacter pylori isolates by sequencing of PCR products and comparison with the RAPD technique. Res. Microbiol. 147:661–669.
49 Bjorkholm, B., M. Sjolund, P. G. Falk, O. G. Berg, L. Engstrand, and D. I. Andersson. 2001. Mutation frequency and biological cost of antibiotic resistance in Helicobacter pylori. Proc. Natl. Acad. Sci. U. S. A. 98:14607–14612.
50 Falush, D., C. Kraft, P. Correa, N. S. Taylor, J. G. Fox, M. Achtman, and S. Suerbaum. 2001. Recombination and mutation during long-term gastric colonization by Helicobacter pylori: estimates of clock rates, recombination size and minimal age. Proc. Natl. Acad. Sci. U. S. A. 98:15056–15061.
51 Suerbaum, S., and M. Achtman. 2004. Helicobacter pylori: recombination, population structure and human migrations. Int. J. Med. Microbiol. 294:133–139.
52 Kraft, C., and S. Suerbaum. 2005. Mutation and recombination in Helicobacter pylori: mechanisms and role in generating strain diversity. Int. J. Med. Microbiol. 1:11.
53 Wirth, T., X. Wang, B. Linz, R. P. Novick, J. K. Lum, M. Blaser, G. Morelli, D. Falush, and M. Achtman. 2004. Distinguishing human ethnic groups by means of sequences from Helicobacter pylori: lessons from Ladakh. Proc. Natl. Acad. Sci. U. S. A. 101:4746–4751.
54 Ward, J. M., J. G. Fox, M. R. Anver, D. C. Haines, C. V. George, M. J. Collins, Jr., P. L. Gorelick, K. Nagashima, M. A. Gonda, R. V. Gilden, J. G. Tully, R. J. Russell, R. E. Benveniste, B. J. Paster, F. E. Dewhirst, J. C. Donovan, L. M. Anderson, and J. M. Rice. 1994. Chronic active hepatitis and associated liver tumors in mice caused by a persistent bacterial infection with a novel Helicobacter species. J. Natl. Cancer Inst. 86:1222–1227.
55 Fox, J. G., X. Li, L. Yan, R. J. Cahill, R. Hurley, R. Lewis, and J. C. Murphy. 1996. Chronic proliferative hepatitis in A/JCr mice associated with persistent Helicobacter hepaticus infection: a model of helicobacter-induced carcinogenesis. Infect. Immun. 64:1548–1558.
56 Kullberg, M. C., J. M. Ward, P. L. Gorelick, P. Caspar, S. Hieny, A. Cheever, D. Jankovic, and A. Sher. 1998. Helicobacter hepaticus triggers colitis in specific-pathogen-free interleukin-10 (IL-10)-deficient mice through an IL-12- and gamma interferon-dependent mechanism. Infect. Immun. 66:5157–5166.
57 Erdman, S. E., T. Poutahidis, M. Tomczak, A. B. Rogers, K. Cormier, B. Plank, B. H. Horwitz, and J. G. Fox. 2003. CD4+ CD25+ regulatory T lymphocytes inhibit microbially induced colon cancer in Rag2-deficient mice. Am. J. Pathol. 162:691–702.
58 Cahill, R. J., C. J. Foltz, J. G. Fox, C. A. Dangler, F. Powrie, and D. B. Schauer. 1997. Inflammatory bowel disease: an immunity-mediated condition triggered by bacterial infection with Helicobacter hepaticus. Infect. Immun. 65:3126–3131.
59 Solnick, J. V., and D. B. Schauer. 2001. Emergence of diverse Helicobacter species in the pathogenesis of gastric and enterohepatic diseases. Clin. Microbiol. Rev. 14:59–97.
60 Maurer, K. J., M. M. Ihrig, A. B. Rogers, V. Ng, G. Bouchard, M. R. Leonard, M. C. Carey, and J. G. Fox. 2005. Identification of cholelithogenic enterohepatic Helicobacter species and their role in murine cholesterol gallstone formation. Gastroenterology 128:1023–1033.
61 Fox, J. G., F. E. Dewhirst, Z. Shen, Y. Feng, N. S. Taylor, B. J. Paster, R. L. Ericson, C. N. Lau, P. Correa, J. C. Araya, and I. Roa. 1998. Hepatic Helicobacter species identified in bile and gallbladder tissue from Chileans with chronic cholecystitis. Gastroenterology 114:755–763.
62 Whary, M. T., and J. G. Fox. 2004. Natural and experimental Helicobacter infections. Comp. Med. 54:128–158.
63 Mobley, H. L. T., R. M. Garner, and P. Bauerfeind. 1995. Helicobacter pylori nickel-transport gene nixA: synthesis of catalytically active urease in E. coli independent of growth conditions. Mol. Microbiol. 16:97–109.
64 Weeks, D. L., S. Eskandari, D. R. Scott, and G. Sachs. 2000. A H+-gated urea channel: the link between Helicobacter pylori urease and gastric colonization. Science 287:482–485.
65 Weeks, D. L., G. Gushansky, D. R. Scott, and G. Sachs. 2004. Mechanism of proton gating of a urea channel. J. Biol. Chem. 279:9944–9950.
66 Young, V. B., K. A. Knox, J. S. Pratt, J. S. Cortez, L. S. Mansfield, A. B. Rogers, J. G. Fox, and D. B. Schauer. 2004. In vitro and in vivo characterization of Helicobacter hepaticus cytolethal distending toxin mutants. Infect. Immun. 72:2521–2527.
67 Eppinger, M., C. Baar, G. Raddatz, D. H. Huson, and S. C. Schuster. 2004. Comparative analysis of four Campylobacterales. Nat. Rev. Microbiol. 2:872–885.
68 Rain, J. C., L. Selig, H. de Reuse, V. Battaglia, C. Reverdy, S. Simon, G. Lenzen, F. Petel, J. Wojcik, V. Schachter, Y. Chemama, A. Labigne, and P. Legrain. 2001. The protein–protein interaction map of Helicobacter pylori. Nature 409:211–215.
69 Jungblut, P. R., D. Bumann, G. Haas, U. Zimny-Arndt, P. Holland, S. Lamer, F. Siejak, A. Aebischer, and T. F. Meyer. 2000. Comparative proteome analysis of Helicobacter pylori. Mol. Microbiol. 36:710–725.
70 Niehus, E., H. Gressmann, F. Ye, R. Schlapbach, M. Dehio, C. Dehio, A. Stack, T. F. Meyer, S. Suerbaum, and C. Josenhans. 2004. Genome-wide analysis of transcriptional hierarchy and feedback regulation in the flagellar system of Helicobacter pylori. Mol. Microbiol. 52:947–961.
71 Thompson, L. J., D. S. Merrell, B. A. Neilan, H. Mitchell, A. Lee, and S. Falkow. 2003. Gene expression profiling of Helicobacter pylori reveals a growth-phase-dependent switch in virulence gene expression. Infect. Immun. 71:2643–2655.


15 Genomics of the Opportunistic Pathogen Legionella pneumophila

Christel Cazalet and Carmen Buchrieser

15.1 The Genus Legionella: Epidemiology, Life Cycle, and Pathogenesis

The genus Legionella comprises 42 species with 65 serogroups (Benson and Fields 1998), the majority of which are relatively slow-growing, harmless, ubiquitous, aquatic saprophytes. Legionellae are gram-negative, non-spore-forming bacilli belonging to the γ-subgroup of proteobacteria. Water is the major reservoir worldwide (Fliermans et al. 1981), and in addition to natural environments, high concentrations of Legionella can be detected in man-made hot-water systems (Fields et al. 2002). Legionella parasitizes free-living protozoa (Fields 1996; Kwaik et al. 1998), which are the primary and perhaps sole means of its proliferation in the environment (Fields 1993).

A minority of Legionella species are human pathogens, and foremost amongst these is Legionella pneumophila, the causative agent of Legionnaires’ disease, a pneumonia that is often fatal (mortality rate 5–30%) if not promptly and correctly diagnosed. L. pneumophila was first recognized as a pathogen during a large outbreak of this disease at a convention of the American Legion in Philadelphia (Fraser et al. 1977). Since then many outbreaks and sporadic cases of legionellosis have been identified worldwide. Anyone can acquire legionellosis, but those most susceptible to the infection are elderly people, those who smoke, and immunocompromised persons with underlying conditions such as cancer, kidney failure, diabetes, organ transplant, or HIV infection.

L. pneumophila enters the human body through inhalation of contaminated aerosols. It can then reach the alveoli of the lungs, where it is engulfed by macrophages. Once in the macrophage, it remains in the phagosome, inhibiting phagolysosomal fusion and acidification of the phagosome. L. pneumophila establishes phagosomes that are completely isolated from the endosomal pathway but are surrounded by endoplasmic reticulum. Within this protected vacuole, L. pneumophila converts to a replicative form that is acid-tolerant but no longer expresses factors that block membrane fusion. As a consequence, the pathogen vacuoles merge with lysosomes, which provide a nutrient-rich replication niche. Once the amino acid supply is depleted, progeny accumulate the second messenger guanosine 3′,5′-bispyrophosphate (ppGpp),

which coordinates entry into the stationary phase with expression of traits that promote transmission to a new host cell (for a review see Swanson and Hammer 2000).

L. pneumophila is responsible for more than 98% of cases of legionellosis, and among the 15 L. pneumophila serogroups, serogroup 1 is implicated in about 95% of the cases (Doleans et al. 2004; EWGLI; CDC 2004). In contrast, species such as L. anisa, L. micdadei, L. dumoffii, and L. feeleii are rarely implicated in human disease even though they colonize water distribution systems relatively frequently (Doleans et al. 2004; Muder and Yu 2002; Yu et al. 2002). Australia and New Zealand are exceptions: there, L. pneumophila serogroup 1 accounts for 45.7% of community-acquired legionellosis, and L. longbeachae, which is frequently isolated from potting soil (Steele et al. 1990), accounts for 30.4% of cases (Yu et al. 2002).

Legionellosis emerged in the second half of the twentieth century partly because of human alterations of the environment. Thermally altered aquatic environments can shift the balance between protozoa and bacteria, resulting in rapid multiplication of legionellae to infectious concentrations (for a review see Atlas 1999). The broad protozoal host spectrum in the environment and the exploitation of basic cellular mechanisms of the eukaryotic host allow Legionella species to infect human cells. Intracellular multiplication of Legionella organisms in amoebae has been shown to contribute to disease. As protozoa are essential for the growth of Legionella (Fields et al. 2002), its presence in the aquatic environment also appears to depend on the spectrum of host protozoa present. Whether adaptation to a specific host, or the capacity to multiply in different protozoal hosts, correlates with the virulence and epidemiologic prevalence of L. pneumophila remains an interesting open question.
Despite recent advances towards understanding the virulence of this pathogen, until recently little was known at the genomic level about L. pneumophila or other Legionella species. Genomics, however, has the potential to provide a comprehensive view of the genetics, biochemistry, physiology, and pathogenesis of a microorganism. Furthermore, comparative genomics and related technologies are helping to unravel the molecular basis of the pathogenesis, evolution, and phenotypic differences among different Legionella species. The completion, analysis, and publication of three different L. pneumophila genomes (Cazalet et al. 2004; Chien et al. 2004) now provide the basis for applying powerful new approaches to the understanding of the biology of this organism.

15.2 Genomics of Legionella pneumophila

A major step forward in Legionella research was the determination of the genome sequences of three clinical serogroup 1 isolates of L. pneumophila subsp. pneumophila: strains Paris and Lens (Cazalet et al. 2004) and strain Philadelphia 1 (Chien et al. 2004). Strain Philadelphia 1 is

derived from the original 1976 isolate (Fraser et al. 1977), obtained from a lung specimen of a patient who died of Legionnaires’ disease during the outbreak at the American Legion convention in 1976. It is widely used in many laboratories for the study of the L. pneumophila life cycle and its virulence mechanisms. Most of our knowledge of the biology of L. pneumophila has been deduced from studies conducted with this strain and its derivatives.

In contrast, strain Paris is a very abundant strain in France and Europe. It accounts for 12.7% of cases of legionellosis in France and 33% of those that occur in the Paris area (Aurell et al. 2003). It is associated with hospital- and community-acquired disease that occurs as outbreaks or sporadic cases, and it is the only recognized endemic L. pneumophila strain, which indicates particular adaptations to the environment or the host. Strain Lens is an epidemic strain, which from November 2003 to January 2004 caused the largest outbreak that had occurred in France, with 86 cases resulting in 17 deaths (Miquel et al. 2004), suggesting that it is particularly successful in causing disease. Furthermore, strain Lens was isolated in January 2004 and sequenced in February 2004, and thus has the advantage of being freshly isolated: no genomic rearrangements, mutations, or adaptations acquired in the laboratory should have occurred in this strain.

The genome sequences of L. pneumophila strains Paris and Lens were obtained by the whole-genome shotgun strategy as described earlier (Frangeul et al. 1999). For both genomes, different libraries were used and a scaffold was obtained by end-sequencing clones from a BAC library. For the genome sequencing of strain Philadelphia 1, a combined strategy of whole-genome shotgun sequencing and sequencing of BAC clones, which covered approximately 90% of the Legionella genome, was employed.

The complete genome sequences of L. pneumophila strains Paris, Lens, and Philadelphia 1 comprise 3 503 610 bp, 3 345 687 bp, and 3 397 754 bp, respectively, with an average G+C content of 38% (Table 15.1). The genomes contain approx. 3000 genes, distributed fairly evenly between the two strands (approx. 57% on the leading strand) and accounting for about 88% of the potential coding capacity. Genes were classified into 31 (strains Paris and Lens) and 27 (strain Philadelphia 1) broad functional classes. No function could be predicted for 42.1% of the L. pneumophila Paris genes, 44.1% of the L. pneumophila Lens genes, and 28% of the L. pneumophila Philadelphia 1 genes, a proportion similar to that found in most other sequenced bacterial genomes. A high proportion of the predicted genes (21% for strain Paris, 20.4% for strain Lens, 17% for strain Philadelphia 1) are unique to the genus Legionella.


Table 15.1 General features of the three Legionella pneumophila genomes.

Feature                                   Paris               Lens                Philadelphia 1
Size of the chromosome (bp)               3 503 610           3 345 687           3 397 754
G+C content                               38.3% (37.4%)[a]    38.4% (38.4%)[a]    38%
Total number of protein-coding genes      3076 (141)[a]       2931 (57)[a]        2953
Average length of protein-coding genes    331[b]              333[b]              338[b]
Number of rRNA operons (16S–23S–5S)       3                   3                   3
Number of tRNA genes                      43                  43                  43
Percentage coding                         87.9% (92%)[a]      88% (83.7%)[a]      89.8%
Plasmid                                   1 (131.9 kbp)       1 (59.8 kbp)        –
Number of strain-specific genes           313 (125)[a]        259 (44)[a]         335
Number of orthologous genes               2500                2500                2500

[a] Plasmid-related. [b] Codons.
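As a rough plausibility check on the figures in Table 15.1, the coding percentage can be approximated from the gene count, the average gene length in codons, and the chromosome size. The sketch below is an illustration only: the dictionary layout and function name are hypothetical, and it assumes the tabulated gene counts and average lengths can simply be multiplied, ignoring stop codons, RNA genes, overlapping genes, and whether plasmid genes are included in the totals. Under these assumptions the estimates come out within about two percentage points of the tabulated coding percentages.

```python
# Values transcribed from Table 15.1; the structure and names are illustrative.
GENOMES = {
    "Paris": {"size_bp": 3_503_610, "genes": 3076, "avg_codons": 331},
    "Lens": {"size_bp": 3_345_687, "genes": 2931, "avg_codons": 333},
    "Philadelphia 1": {"size_bp": 3_397_754, "genes": 2953, "avg_codons": 338},
}


def estimated_coding_fraction(size_bp, genes, avg_codons):
    """Approximate coding fraction: genes x average length (codons x 3 bp) / size.

    Assumption: stop codons, RNA genes and gene overlaps are ignored.
    """
    return genes * avg_codons * 3 / size_bp


for strain, features in GENOMES.items():
    fraction = estimated_coding_fraction(**features)
    print(f"{strain}: ~{100 * fraction:.1f}% of the chromosome is protein-coding")
```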

15.3 Specific Features of the Legionella Genomes

The information gleaned from the genome sequences provided new and valuable insight into the biology of L. pneumophila and highlighted the adaptation of this pathogen to intracellular life. One of the most striking features is the exceptionally large number of eukaryotic-like proteins and of proteins containing domains preferentially found in eukaryotic proteins, suggesting that L. pneumophila has the ability to exploit and modulate host cell functions in various ways. In order to act in the eukaryotic cell, these proteins need to be secreted. This could be achieved by various secretion systems, a particularly high number and wide variety of which are present in L. pneumophila, possibly another characteristic of this species. Another important feature identified through genome comparisons of the three sequenced strains is the marked plasticity and high genetic diversity, which may enhance the versatility of L. pneumophila.

15.3.1 Eukaryotic-like Proteins in Legionella pneumophila: Modulation of Host Functions?

A major finding of the genome analysis of L. pneumophila strains Paris and Lens was the identification of a large number and a wide variety of proteins, many identified for the first time in a prokaryote, that either show the strongest similarity to eukaryotic proteins or contain motifs implicated in protein–protein interactions that are present only or primarily in eukaryotes. The presence of these proteins suggests that L. pneumophila has an important capacity to subvert host functions and to modulate them to its advantage. The identified proteins are predicted to be involved in all stages of the Legionella life cycle, namely invasion, trafficking within the host cell, modulation of host cell functions, and evasion from the host cell. The 31 proteins identified in strain Paris that exhibit the highest similarity to eukaryotic proteins are listed in Table 15.2. These proteins might fulfil important functions in L. pneumophila, as they are conserved in all three sequenced genomes, except for two that are missing in strain Philadelphia 1 and three that are lacking in strain Lens (Table 15.2).

More heterogeneity was found among the 30 proteins containing eukaryotic protein–protein interaction domains, in particular among the ankyrin repeat-containing proteins, of which there are 20 in strain Paris, 18 in strain Lens, and 16 in strain Philadelphia 1. Fourteen ankyrin proteins are shared by all three isolates (Table 15.3). Large families of ankyrin repeat proteins have so far been identified only in Coxiella burnetii (Seshadri et al. 2003) and Wolbachia pipientis (Wu et al. 2004), two other intracellular bacteria. Ankyrin proteins are multifunctional proteins involved in many cellular pathways, so predicting the function of the Legionella ankyrin proteins is difficult. However, the fact that this protein family is so prominent in all three strains and in other intracellular pathogens suggests its importance for the intracellular life of Legionella. Ankyrin proteins may be involved in the interaction with the host cytoskeleton (Batrukova et al. 2000) or in targeting proteins to the plasma membrane or to the endoplasmic reticulum (Hryniewicz-Jankowska et al. 2002). They may also be components of transcriptional regulators and influence host gene expression (Caturegli et al. 2000).

The presence of proteins carrying F-box or U-box domains in the Legionella genomes is an intriguing finding, as it suggests that L. pneumophila is able to interfere with the ubiquitin machinery of eukaryotic cells. Two F-box proteins, one of which also contains an ankyrin domain, are present in all three sequenced strains. Strains Paris and Philadelphia 1 each contain an additional, strain-specific F-box protein (Table 15.3). Strains Philadelphia 1 and Paris encode the only U-box protein identified in a prokaryotic genome to date. Interestingly, this protein is missing in strain Lens.

Ubiquitination is a protein modification generally used by cells to tag proteins that are destined for proteasomal degradation. Ubiquitin is a highly conserved 76-amino-acid polypeptide that can be covalently attached to a lysine residue of the target protein by an E3 ubiquitin ligase, which is primarily responsible for providing substrate specificity to ubiquitin conjugation. Several classes of ubiquitin ligases have been described based on the presence of specific domains: the HECT (homologous to E6-AP C terminus) domain, the RING (really interesting new gene) finger or the related PHD (plant homeodomain) finger or U-box, and the F-box domain (Hershko and Ciechanover 1998; Heyninck and Beyaert 2005). Thus the L. pneumophila U-box and F-box proteins might modulate the eukaryotic ubiquitination machinery. F-box proteins assembled into SCF ubiquitin ligase complexes determine which substrate will be targeted for ubiquitination and subsequent proteolysis by the proteasome. Known proteins targeted for degradation by the ubiquitination system of eukaryotic cells are cell cycle regulators, transcription factors, membrane proteins, and signal transduction components (Craig and Tyers 1999; Hershko and Ciechanover 1998). Furthermore, Perrin and colleagues (Perrin et al. 2004) recently reported that the ubiquitination system has a role in the recognition of bacterial pathogens in the cytosol of mammalian cells. In this scenario, cytosolic bacteria are tagged, by an unknown mechanism (i.e., either directly or indirectly), with polyubiquitin chains. In macrophages, ubiquitin-tagged bacteria are recognized and degraded, with the proteasome having an indirect role. The intracellular bacterium Salmonella typhimurium avoids this mechanism by staying in a membrane-surrounded compartment away from the ubiquitination machinery (Cossart and Sansonetti 2004), a strategy that also applies to Legionella. In addition, however, Legionella seems to be able to modulate the eukaryotic ubiquitination machinery.

Table 15.2 Proteins with the highest similarity score to eukaryotic proteins and their distribution in the three sequenced strains.

Predicted product                                         Paris     Lens     Philadelphia 1  G+C   Best hit BLASTp
purC                                                      lpp1647   lpl1640  lpg1675         38%   (AAR06292.1 Nicotiana tabacum)
exoA exodeoxyribonuclease III                             lpp0702   lpl0684  lpg0648         39%   (EAA20230.1 Plasmodium yoelii yoelii)
RNA binding protein precursor                             lpp0321   –        lpg0251         34%   (AAL07519 Solanum tuberosum)
Pyruvate decarboxylase                                    lpp1157   lpl1162  lpg1155         39%   (AAB16855.1 Arabidopsis thaliana)
Thiamine biosynthesis protein NMT-1                       lpp1522   lpl1461  lpg1565         38%   (AAC64375.1 Botryotinia fuckeliana)
nuoE NADH dehydrogenase I chain E                         lpp2832   lpl2701  lpg2785         38%   (BAA25988.1 Homo sapiens)
Hypersensitive induced response protein                   plpp0050  –        –               36%   (AAN17462.1 Hordeum vulgare subsp. vulgare)
Hypothetical protein                                      lpp0634   lpl0618  lpg0584         39%   (XP_306643.1 Anopheles gambiae)
DegP protease                                             lpp0965   lpl0935  lpg0903         39%   (NP_189431.2 Arabidopsis thaliana)
Phytanoyl-CoA dioxygenase                                 lpp2748   lpl2621  lpg2694         36%   (XP_372144.1 Homo sapiens)
Sphingosine-1-phosphate lyase                             lpp2128   lpl2102  lpg2176         41%   (NP_775139.1 Rattus norvegicus)
Glucoamylase                                              lpp0489   lpl0465  lpg0422         39%   (P42042 Arxula adeninivorans)
Cytokinin oxidase                                         lpp0955   lpl0925  lpg0894         39%   (NP_484368.1 Nostoc sp.)
Phytanoyl-CoA dioxygenase                                 lpp0578   lpl0554  lpg0515         36%   (EAA70100.1 Gibberella zeae)
Hypothetical protein                                      lpp0379   lpl0354  lpg0301         39%   (CAD21525.1 Taenia solium)
Ectonucleoside triphosphate diphosphohydrolase (apyrase)  lpp1033   lpl1000  lpg0971         40%   Nucleoside phosphatase signature conserved (Q9MYU4 Sus scrofa)
6-pyruvoyl-tetrahydropterin synthase                      lpp2923   lpl2777  lpg2865         34%   (NP_703938.1 Plasmodium falciparum)
Zinc metalloproteinase                                    lpp3071   lpl2927  lpg2999         38%   (AAF56122.1 Drosophila melanogaster)
SAM dependent methyltransferase                           lpp2134   lpl2109  lpg2182         35%   (BAC98835.1 Bombyx mori)
Ectonucleoside triphosphate diphosphohydrolase (apyrase)  lpp1880   lpl1869  lpg1905         39%   (CAE70887.1 Caenorhabditis briggsae)
SAM dependent methyltransferase                           lpp2747   lpl2620  lpg2693         35%   (EAA20288.1 Plasmodium yoelii yoelii)
Cytochrome P450                                           lpp2468   lpl2326  lpg2403         39%   (NP_487786.1 Nostoc sp.)
Nuclear membrane binding protein                          lpp1824   –        –               34%   (NP_082559.1 Mus musculus)
Uracil DNA glycosylase                                    lpp1665   lpl1659  lpg1700         36%   (EAA36774.1 Giardia lamblia)
Chromosome condensation 1-like                            lpp1959   lpl1953  lpg1976         41%   Pattern of chromosome condensation regulator conserved
Hypothetical protein                                      lpp0358   lpl334   lpg0282         38%   (EAA20288.1 Plasmodium yoelii yoelii)
Ca2+-transporting ATPase                                  lpp1127   lpl1131  lpg1126         37%   (AAB81284.1 Paramecium tetraurelia)
Uridine kinase                                            lpp1167   lpl1173  lpg1165         33%   (AAM09314.2 Dictyostelium discoideum)
Serine threonine protein kinase *                         lpp2626   lpl2481  lpg2556         32%   Conserved domain
Serine threonine protein kinase *                         lpp1439   lpl1545  lpg1483         36%   Conserved domain

lpp indicates predicted coding sequences (CDSs) of L. pneumophila strain Paris; lpl indicates predicted CDSs of L. pneumophila strain Lens; lpg indicates predicted CDSs of L. pneumophila strain Philadelphia 1; an asterisk marks proteins that are also listed in Table 15.3 because of their conserved eukaryotic domains; protein accession numbers are given in parentheses.

Table 15.3 L. pneumophila proteins identified in the three strains that contain domains preferentially found within eukaryotic proteins.

Paris                            Lens                             Philadelphia 1                   G+C content (%)  Motif identified
enhC (lpp2692)                   enhC (lpl2564)                   enhC (lpg2639)                   39               21 sel-1 domains
lidL (lpp1174) – EnhC paralogue  lidL (lpl1180) – EnhC paralogue  lidL (lpg1172) – EnhC paralogue  38               6 sel-1 domains
lpp131 – EnhC paralogue          lpl1307 – EnhC paralogue         lpg1356 – EnhC paralogue         41               4 sel-1 domains
lpp2174 – EnhC paralogue         lpl1303 – EnhC paralogue         lpg2222 – EnhC paralogue         40               3 sel-1 domains
–                                lpl1059 – EnhC paralogue         lpl1062 – EnhC paralogue         45               7 sel-1 domains
ralF (lpp1932)                   ralF lpl1919                     ralF (lpg1950)                   34               sec7 domain
lpp0267                          lpl0262                          lpg0208                          38               Ser/Thr protein kinase domain
lpp2626                          lpl2481                          lpg2556                          32               Ser/Thr protein kinase domain
lpp1439                          lpl1545                          lpg1483                          36               Ser/Thr protein kinase domain
lpp2065                          lp2055                           lpg2214                          37               Ankyrin repeat
lpp0037                          lpl0038                          lpg0038                          38               Ankyrin repeat
plpp0098                         –                                –                                37               Ankyrin repeat
lpp2058                          lpl2048                          –                                38               Ankyrin repeat
lpp0750                          lpl0732                          lpg0695                          35               Ankyrin repeat
lpp2061                          lpl2051                          –                                39               Ankyrin repeat
lpp2270                          lpl2242                          lpg2322                          34               Ankyrin repeat
lpp0503                          lpl0479                          lpg0436                          38               Ankyrin repeat
lpp1905                          –                                –                                35               Ankyrin repeat
lpp1683                          lpl1682                          lpg1718                          33               Ankyrin repeat + SET domain
lpp2248                          lpl2219                          lpg2300                          39               Ankyrin repeat
lpp0202                          –                                –                                38               Ankyrin repeat
lpp0469                          lpl0445                          lpg0403                          38               Ankyrin repeat
lpp2517                          lpl2370                          lpg2452                          36               Ankyrin repeat
lpp1100                          –                                –                                48               Ankyrin repeat
lpp0126                          lpl0111                          lpg0112                          39               Ankyrin repeat
lpp0356                          –                                –                                38               Ankyrin repeat
lpp2522                          lpl2375                          lpg2456                          39               Ankyrin repeat
lpp0547                          lpl0523                          lpg0483                          40               Ankyrin repeat
–                                lpl1681                          –                                34               Ankyrin repeat
–                                lpl2344                          –                                35               Ankyrin repeat
–                                lpl2058                          lpg2128                          40               Ankyrin repeat
–                                –                                lpg0402                          38               Ankyrin repeat
–                                –                                lpg2131                          39               Ankyrin repeat
lpp2082                          lpl2072                          lpg2144                          36               F-box domain + ankyrin repeat
lpp2486                          –                                –                                34               F-box domain + coiled-coil
–                                –                                lpg2224                          43               F-box domain
lpp0233                          lpl0234                          lpg0171                          39               F-box domain
lpp2887                          –                                lpg2830                          35               2 U-box domains

Further evidence for the adaptation of this pathogen during coevolution with eukaryotes may be the identification of proteins having homologues only in eukaryotes. All three L. pneumophila genomes contain a sphingosine-1-phosphate lyase and two secreted apyrases, proteins never predicted in a prokaryotic proteome before (Table 15.2). These enzymes could be implicated in influencing the autophagy pathway, an immediate macrophage response to L. pneumophila (Amer and Swanson 2005). L. pneumophila initially avoids but subsequently tolerates its delivery to lysosomes by a process that resembles autophagy (Swanson and Fernandez-Moreira 2002). In eukaryotes sphingosine-1-phosphate lyase is thought to be a dual modulator of sphingosine-1-phosphate and ceramide metabolism. The level and the balance of both signaling molecules are important for inducing autophagy or apoptosis (Reiss et al. 2004).
Apyrases have been isolated in autophagy vacuoles (Biederbick et al. 1999) and may influence the L. pneumophila phagosome by decreasing the concentration of NTPs and NDPs. In order to act in the eukaryotic cell these proteins should be secreted. This could be achieved by many different secretion systems present in L. pneumophila.

15.3.2 Secretion Machineries of L. pneumophila: Central to Its Life and to Pathogenesis

The importance of protein secretion for L. pneumophila is highlighted by the presence of a wide variety of secretion systems. The two major ones known to be involved in virulence are the icm/dot type IV secretion system and the lsp type II secretion system. However, L. pneumophila possesses several additional ones, such as a second type IV system named lvh, a type I secretion system called lss, a twin arginine translocation pathway, and several tra-like systems. In addition to these conserved secretory pathways, strain Paris encodes a type V secretion system specific to this strain.

15.3.2.1 Type IV Secretion Systems in Legionella

Many bacterial pathogens encode type IV secretion systems. However, L. pneumophila and Coxiella burnetii are the only ones known to encode a type IVB secretion system similar to the Tra/Trb system of IncI plasmids. In addition, like other bacterial pathogens, L. pneumophila encodes type IVA systems similar to the Agrobacterium tumefaciens Vir system. The type IVB secretion system in Legionella is the icm (intracellular multiplication; Marra et al. 1992) and/or dot (defective organelle trafficking; Berger and Isberg 1993) system. It is required for intracellular growth in human macrophages as well as in amoebae and for intracellular trafficking. The type IVA secretion systems are the lvh (Legionella vir homologue) system and several tra systems, which are homologous to Tra proteins of the Escherichia coli F plasmid.

15.3.2.1.1 The dot/icm Type IVB Secretion System

The dot/icm type IVB secretion system of L. pneumophila is probably its most important secretion system for infection, as it is involved in many different stages of the intracellular life cycle. This macromolecular complex is encoded by 25 genes located in two genomic regions: region I contains seven genes (icmV, W, X, dotA, B, C, D) and region II is composed of 18 genes (icmT, S, R, Q, P, O, N, M, L, K, E, G, C, D, J, B, F, H). The system is conserved and present in the same chromosomal location in strains Paris, Lens, and Philadelphia 1. A recent comparison of these regions in 18 L. pneumophila strains and 17 other Legionella species by sequence and hybridization analysis showed that the dot/icm genes are present in all strains examined and that their organization is conserved (Morozova et al. 2004). The nucleotide sequence conservation of the dot/icm genes is high (96–98%) among the different L. pneumophila strains, but among different Legionella species the nucleotide sequence similarity is less pronounced (62–79%) (Morozova et al. 2004). In general, region II genes are more conserved than region I genes. The biological implication of these sequence variations is not yet known. In the icm/dot systems of L. longbeachae and L. micdadei the global organization is also conserved but, interestingly, in region II of L. micdadei the gene icmR is replaced by two genes, migA and migB, which do not show any homology to icmR. The same gene is replaced in L. longbeachae by a gene called ligB (Feldman and Segal 2004).

Given the central role of the dot/icm system in Legionella pathogenesis, many recent studies have aimed at identifying and characterizing its substrates. The first characterized effector was RalF, which is required for recruitment to the phagosome of the host protein ARF-1, a key regulator of vesicle trafficking from the endoplasmic reticulum (Nagai et al. 2002).
RalF is conserved in all three sequenced strains, as are LidA, another substrate, which is involved in recruitment of vesicles during vacuole biogenesis and in maintaining integrity of the dot/icm complex (Conover et al. 2003), and LepA and LepB, which are involved in egress of Legionella from the vacuole during amoeba infection (Chen et al. 2004). More recently, a number of candidate effector proteins named SidA–H were identified in the Philadelphia 1 strain by a two-hybrid screen with IcmG/DotF as bait, followed by a screen for proteins transferred interbacterially using a Cre/loxP-based protein translocation assay (Luo and Isberg 2004). SidA, SidB, SidC, SidE, and SidF are conserved in strains Paris and Lens; in contrast, SidG is missing in strain Lens, SidD is missing in strains Paris and Lens, and SidH is interrupted by two insertion sequence (IS) elements in strain Paris and missing in strain Lens. All Sid proteins except SidD contain a coiled-coil domain, a protein motif known to be involved in protein–protein interactions.

Until recently the translocation signal for the type IV secretion effectors was not known. Nagai and colleagues, who investigated the mechanism of translocation of RalF (Nagai et al. 2005), identified a 20-amino-acid C-terminal region of the RalF protein as necessary and sufficient for translocation. In particular, a hydrophobic residue at the C-terminal –3 position is critical for secretion of RalF, as substitution with hydrophilic residues resulted in a severe defect in translocation. Comparison with other Dot/Icm substrates showed that most of them carry a hydrophobic residue or a proline residue at the –3 or –4 position, supporting the idea that these residues are critical for secretion by the type IV system.
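The C-terminal feature described above lends itself to a simple sequence check. The following sketch is a minimal illustration, not the method used by Nagai and colleagues: the function name is hypothetical, and the set of residues counted as hydrophobic is an assumption made here, since the text only speaks of "a hydrophobic residue or a proline residue" at position –3 or –4 counted from the C-terminus.

```python
# Assumption: which residues count as "hydrophobic" is a choice made for this
# sketch; the chapter does not define the set.
HYDROPHOBIC = set("AVLIMFWC")


def has_cterminal_signal_feature(protein_seq):
    """Return True if position -3 or -4 (from the C-terminus) is hydrophobic or proline."""
    # Normalize case and drop a trailing stop symbol, if present.
    seq = protein_seq.strip().upper().rstrip("*")
    if len(seq) < 4:
        return False
    candidates = (seq[-3], seq[-4])
    return any(res in HYDROPHOBIC or res == "P" for res in candidates)
```

Applied to a set of predicted proteins, a check of this kind would only flag candidates for further study; many proteins that are not Dot/Icm substrates will pass it by chance.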

15.3.2.1.2 The lvh Type IVA Secretion System

The second Legionella type IV secretion system, lvh (Legionella VirB homologues), is dispensable for intracellular growth in both macrophages and amoebae (Segal et al. 1999), but it is implicated in host cell infection by L. pneumophila at 30 °C (Ridenour et al. 2003), and it can also mediate conjugational DNA transfer (Segal et al. 1999). This secretion system consists of 11 genes, which are located on a plasmid-like element that can exist either integrated in the chromosome or excised as a multicopy plasmid in strains Paris and Philadelphia 1, but seems to be stably integrated in strain Lens. However, a 28-bp perfect repeat is present in the flanking regions in strain Lens, suggesting that the element is able to excise in this strain as well. The element is probably of plasmid origin, as it has a higher G+C content than the rest of the chromosome and carries some plasmid-related genes. Elucidating the biological implications of these different forms is another challenging question.
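Perfect repeats flanking an integrated element, such as the 28-bp repeat mentioned above, can be located by looking for identical k-mers on both sides of the element. The sketch below shows one simple way to do this; the function name and the toy sequences used in the test are hypothetical, and it only finds direct repeats on the same strand.

```python
def shared_flank_repeats(left_flank, right_flank, k=28):
    """Return (left_pos, right_pos, kmer) for every k-mer found in both flanks.

    Only perfect direct repeats of exactly length k are reported; longer
    repeats therefore show up as several overlapping hits.
    """
    left_kmers = {left_flank[i:i + k] for i in range(len(left_flank) - k + 1)}
    hits = []
    for j in range(len(right_flank) - k + 1):
        kmer = right_flank[j:j + k]
        if kmer in left_kmers:
            # str.find gives the first occurrence of the k-mer in the left flank.
            hits.append((left_flank.find(kmer), j, kmer))
    return hits
```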

15.3.2.2 A Putative Type I Secretion System in Legionella

L. pneumophila contains a putative type I secretion system encoded by the lssXYZABD locus, which might, as in other bacterial pathogens, mediate transport of toxins, degradative enzymes, or other virulence factors across both bacterial membranes. It seems to be specific to L. pneumophila strains (Jacobi and Heuner 2003). The lssXYZABD locus is highly conserved among the three sequenced strains as well as in L. pneumophila strain Corby (94–98% DNA identity; 95–99% protein identity). The downstream lssE gene, encoding a putative signaling protein, is more variable (75–96% DNA identity; 70–94% protein identity). In most type I secretion systems, the genes encoding the target proteins are located upstream of the secretion machinery itself, but no putative substrate has been identified in L. pneumophila to date. However, a candidate substrate could be the product of rtxA, which encodes an Rtx toxin involved in entry and replication in protozoa and human macrophages (Cirillo et al. 2001, 2002). Interestingly, the rtxA gene is highly variable in the three sequenced genomes and is not adjacent to the lssXYZABD locus.
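Identity values such as those quoted for the lss locus are computed from pairwise alignments. As a minimal sketch, the function below calculates percent identity for two sequences that have already been aligned to equal length; the function name is hypothetical, and skipping gapped columns is one common convention among several, so the exact figures reported in the literature may rest on a different definition.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # convention assumed here: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```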

15.3.2.3 A Type II Secretion System in Legionella

To date L. pneumophila is the only intracellular pathogen known to encode a type II secretion system. Type II machineries are usually associated with organisms that colonize surfaces but do not invade cells. In addition to the prepilin peptidase PilD, which processes the pseudopilins (LspG, LspH, LspI, LspJ, LspK) for assembly into a functional type II secretion apparatus, the outer membrane secretin LspD, the ATPase LspE, and the inner membrane protein LspF are involved in L. pneumophila type II protein secretion. The secretion system is highly conserved in the three sequenced genomes (protein identities of 96–100%) and also seems to be conserved throughout the genus Legionella (Rossier and Cianciotto 2001). The lsp secretion system promotes the ability of L. pneumophila to infect both protozoan and macrophage hosts, to grow in the mammalian lung (Rossier and Cianciotto 2001), and to grow at low temperatures (Soderberg et al. 2004). It is involved in secretion of various enzymes, such as the phospholipases A and C, the zinc metalloprotease Msp, several lipases, RNase, pNPPC hydrolase, tartrate-resistant and tartrate-sensitive acid phosphatases, and lysophospholipase A (Aragon et al. 2001; Flieger et al. 2002; Hales and Shuman 1999; Rossier and Cianciotto 2001).

15.3.2.4 Secretion Across the Cytoplasmic Membrane

All three L. pneumophila strains possess two systems for protein transport across the cytoplasmic membrane. The sec apparatus, which allows protein translocation in an unfolded state, is complete with the exception of secG. In addition, L. pneumophila encodes a twin arginine translocation (TAT) pathway for translocation of folded proteins. In contrast to the usual organization of tatA, tatB, and tatC in an operon, only the tatA and tatB genes are cotranscribed, and tatC is in a different location on the L. pneumophila chromosome. This organization is conserved in all three genomes, and similarity of the Tat proteins is between 97% and 100%. In strains Paris, Lens, and Philadelphia 1, the tatAB operon and the tatC gene are located 24 kbp, 41 kbp, and 27 kbp apart, respectively. Despite these different distances between the tatAB and tatC genes, the flanking regions are homologous in the three strains. The Tat secretion system is implicated in growth under low-iron conditions and in growth within macrophages in an Lsp-independent manner (Rossier and Cianciotto 2005). Tat substrates are characterized by two arginine residues in their signal peptide. Based on this criterion, 20 proteins were identified as putative Tat substrates from analysis of the genome sequence, including 12 transport/binding proteins and lipoproteins and three proteins involved in respiration (De Buck et al. 2004). Secretion of a subunit of cytochrome c oxidase and of phospholipase C via this pathway has been confirmed (Rossier and Cianciotto 2005).
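The criterion mentioned above, two arginine residues in the signal peptide, can be illustrated with a simple N-terminal scan. The sketch below is a deliberately crude stand-in for dedicated Tat signal peptide predictors: the function name is hypothetical, the 35-residue window is an assumption made here rather than a value from the text, and a bare "RR" pair will produce many false positives.

```python
import re


def find_twin_arginine(protein_seq, n_term_window=35):
    """Return 0-based start positions of consecutive 'RR' pairs near the N-terminus.

    Assumption: the signal peptide lies within the first `n_term_window` residues.
    A lookahead is used so that overlapping pairs (e.g. in 'RRR') are all reported.
    """
    region = protein_seq.upper()[:n_term_window]
    return [match.start() for match in re.finditer(r"(?=RR)", region)]
```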

15.3.2.5 A Putative Type V Secretion System (Autotransporter) Specific to Strain Paris

L. pneumophila strain Paris encodes a specific protein showing the typical structure of type V secretion systems. It contains three major domains: an N-terminal leader peptide for secretion across the inner membrane, a passenger domain, and a C-terminal domain that forms a pore in the outer membrane and translocates the passenger domain to the cell surface (Desvaux et al. 2004). The passenger domain exposed at the cell surface usually confers on the protein a function in cell–cell aggregation. Consistent with this, the passenger domain of the L. pneumophila protein is composed of hemagglutinin repeats similar to those of the Escherichia coli autotransporters AIDA-I and Ag43 (Benz and Schmidt 1992; Klemm et al. 2004). The L. pneumophila type V secretion system may thus mediate adherence to mammalian cells and/or autoaggregation during biofilm formation. However, the RGD (Arg-Gly-Asp) interaction motif present in AIDA-I and Ag43 is lacking in the L. pneumophila autotransporter. The presence of many remnants of insertion sequences upstream of this autotransporter, as well as its G+C content of 41%, which is higher than the genome average of 38%, suggests acquisition through horizontal gene transfer (Fig. 15.1).

Fig. 15.1 Chromosomal region containing the L. pneumophila Paris-specific autotransporter and comparison with strains Lens and Philadelphia 1. The black curve indicates variations in G+C content along this region in strain Paris. Homologous genes in the three strains are indicated by the same color code.
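The G+C argument (41% for the autotransporter region against a genome average of 38%) rests on a simple calculation that can be sketched in Python. This is an illustrative sketch only: the window size, step, deviation cutoff, function names, and toy sequence are assumptions, not parameters taken from the chapter.

```python
def gc_content(seq):
    """Return the G+C content of a DNA sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def sliding_gc(seq, window, step):
    """Return (start, G+C %) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def atypical_windows(seq, window, step, genome_average, min_deviation):
    """Return windows whose G+C content deviates from the genome average."""
    return [
        (start, gc)
        for start, gc in sliding_gc(seq, window, step)
        if abs(gc - genome_average) >= min_deviation
    ]


# Hypothetical toy sequence: a 40% G+C background with one 80% G+C insert.
toy_region = "AATGC" * 4 + "GCGCA" * 2 + "AATGC" * 2

flagged = atypical_windows(
    toy_region, window=10, step=10, genome_average=38.0, min_deviation=3.0
)
```

On real data the window would span hundreds to thousands of base pairs, and a deviation in G+C content is only one line of evidence, to be read together with features such as the insertion sequence remnants mentioned above.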

15.3 Specific Features of the Legionella Genomes

15.3.3 Comparative Genomics: Diversity of the Species L. pneumophila

The comparison of the L. pneumophila genomes allows us to detect both conserved and strain-specific characteristics, to identify genomic rearrangements, and to further our knowledge of the evolutionary mechanisms shaping this organism. Whole-sequence alignment of the genomes of strains Paris, Lens, and Philadelphia 1 shows that the three genomes have the same overall organization (Fig. 15.2). The global synteny is disrupted only once, by a 260-kbp inversion in strain Lens with respect to the other two genomes. However, closer comparison reveals many smaller regions that differ in chromosomal location among the three strains, are specific to one strain or another, or differ in part in gene content. Thus, another important feature identified through genome comparisons of the three sequenced strains is marked plasticity and high genetic diversity, which may enhance the versatility of L. pneumophila.

15.3.3.1 Genomic (Pathogenicity) Islands in the L. pneumophila Genomes

Several genomic (pathogenicity) island-like regions have been identified in the three genomes; they appear to represent a new type of genomic island with a probably Legionella-specific organization. These islands are typically integrated in tRNA or tmRNA genes, are larger than 30 kbp, and contain mobile elements and pathogenicity-related genes as previously defined (Hacker et al. 1997). However, comparison of these islands among the three sequenced strains revealed that in Legionella these islands are composed of a conserved region that is flanked in each isolate by a strain-specific region, suggesting that the same conserved regions might be transferred by different phages or mobilizable plasmids or by as yet unknown mechanisms. One example is the lvh region, which is highly conserved in all three sequenced genomes, whereas the flanking DNA regions are specific to each strain and differ from each other (Fig. 15.3). The size of these flanking regions also differs in each strain: the lvh region is flanked by 11 kbp and 22 kbp in strain Paris, by 5.8 kbp and 30 kbp in strain Lens, and by 14 kbp and 30 kbp in strain Philadelphia 1. Furthermore, this region is inserted in the same tmRNA gene in strains Lens and Paris but in an Arg tRNA gene in another chromosomal location in strain Philadelphia 1. In addition, the lvh region has the particular capacity to exist in an integrated form as well as in an excised form as a multicopy plasmid, which varies in size in each strain. A similar element that contributes to phase variation and can likewise exist in plasmid as well as integrated form was previously identified in strain OLDA (Luneberg et al. 2001). Identifying the exact mechanism leading to the excision and integration of these elements will help us to understand the adaptation and versatility mechanisms of Legionella.

Another example of the particular structure of these islands in L. pneumophila is a region containing a 40-kbp genomic island encoding several efflux pumps that are induced on contact with the host cell and are apparently dedicated to detoxification and proper metal ion balance within the bacteria (Rankin et al. 2002); this island is absent in strain Lens. In the two other sequenced strains, this region is located between specific regions, forming a large island of 130 kbp in strain Paris that is flanked by a Met tRNA gene and encodes a putative integrase, and one of 100 kbp in strain Philadelphia 1 that is inserted in a different chromosomal location. One region specific to strain Philadelphia 1 and absent from both the others is the 65-kbp pathogenicity island described previously (Brassinga et al. 2003), although the corresponding chromosomal location in strain Paris contains a 19-kbp duplication carrying a gene cluster homologous to prpA-lvrABC present on this island. Interestingly, despite the presence of many regions that seem to have been acquired by horizontal gene transfer via phages, as suggested by the presence of around 10 integrase genes in each L. pneumophila genome, no complete prophage seems to be present.

Fig. 15.2 Alignment of the complete genome sequences of L. pneumophila strains Paris, Lens, and Philadelphia 1, using the ACT (Artemis comparison tool) software (http://www.sanger.ac.uk/Software/ACT/). Color code: Red blocks represent homologous sequences; blue blocks, homologous sequences, but inverted. 1, Disruption of the synteny by a 260-kbp inversion in strain Lens; 2, cluster encoding several efflux pumps missing in strain Lens; 3, lvh type IV secretion system; 4, 65-kbp pathogenicity island of strain Philadelphia 1. (This figure also appears with the color plates.)

Fig. 15.3 The integrative multicopy plasmid carrying the lvh type IV system and comparison of this region in the three strains. The scale on the right indicates the G+C content values in this region. The black curve indicates variations in G+C content along this region in strain Lens. Red bars indicate homologous regions in the three genomes; light blue indicates genes predicted in this region. Arrows and numbers below indicate the insertion position on the chromosome of each strain.

15.3.3.2 Plasmids and Genetic Diversity of L. pneumophila

Genome comparison of strains Paris, Lens, and Philadelphia 1 identified a conserved backbone of 2500 genes and about 300 strain-specific genes representing 9–11% of the genome (Fig. 15.4). The strain-specific complement of each of these three strains contains several transcriptional regulators, some GGDEF/EAL regulators (Table 15.4), which represent a very prominent family of regulators in L. pneumophila, different ankyrin proteins (Table 15.3), and several restriction modification systems. Many of the strain-specific genes are of as yet unknown function. However, this specific gene complement may allow adaptations to different environments. In addition to the high genetic diversity present in the L. pneumophila chromosomes, strain-specific plasmids were also identified in strains Paris and Lens but none in strain Philadelphia 1. Strain Paris carries a 132-kbp plasmid and strain Lens a 60-kbp plasmid. These plasmids may play a role either in adaptation to the environment or in virulence, as both encode putative virulence factors, mobile genetic elements, and antibiotic resistance genes. In addition, both plasmids encode a paralogue of CsrA, a protein known to act as a repressor of transmission traits of L. pneumophila and as an activator of replication (Molofsky and Swanson 2003), suggesting that these plasmids may encode proteins implicated in virulence. Furthermore, some of the genes present on the plasmid identified in strain Paris have previously been identified on a plasmid of the same size in an L. longbeachae isolate that was shown to be implicated in virulence of this species (Doyle and Heuzenroeder 2002). The role of specific plasmids in virulence of Legionella has not been clearly demonstrated, but a correlation between virulence in a mouse model and the presence of plasmids (Bezanson et al. 1994) and a higher persistence in the environment of strains containing plasmids (Brown et al. 1982) have been reported. Transfer of plasmids from an L. pneumophila donor to another L. pneumophila isolate or another Legionella species may be mediated by the dot/icm type IV secretion system (Vogel et al. 1998). The identification, characterization, and analysis of the functional role of Legionella plasmids might add to the understanding of versatility, adaptation, and virulence mechanisms of this pathogen.

Tab. 15.4 GGDEF/EAL regulator family and its distribution in the three sequenced genomes.

Gene name                              Domains                           TM helices (a)
Paris      Lens       Philadelphia
lpp0029    lpl0030    lpg0029          3 PAS, GGDEF, EAL                 –
lpp0087    lpl0075    lpg0073          GGDEF, EAL                        2
lpp0127    –          –                EAL, COG1639                      –
lpp0219    lpl0219    lpg0155          PAS, GGDEF                        –
lpp0220    lpl0220    lpg0156          EAL                               1
lpp0299    lpl0283    lpg0230          CHASE3, GAF, GGDEF                1
lpp0351    –          –                EAL                               –
lpp0352    lpl0329    lpg0277          Response-Regulator, PAS, GGDEF    –
lpp0440    lpl0416    lpg0373          GGDEF                             5
lpp0809    lpl0780    lpg0744          GGDEF                             –
lpp0891    lpl0860    lpg0829          GGDEF, EAL, HAMP                  3
lpp0942    lpl0912    lpp0879          GGDEF                             5
lpp0952    lpl0922    lpp0891          2 PAS, GGDEF, EAL                 –
lpp1114    lpl1118    lpg1114          PAS, GGDEF, EAL                   –
lpp1170    lpl1176    lpg1168          GGDEF, EAL                        9
lpp1311    lpl1308    lpg1357          PAS, GGDEF, EAL                   7
lpp1425    lpl1559    lpg1469          EAL                               2
lpp1475    lpl1508    lpg1518          PAS, GGDEF, EAL                   13
lpp2071    lpl2061    lpg2132          GGDEF, EAL                        9
lpp2324    lpl1054    lpg1057          GGDEF, EAL                        4
lpp2355    –          lpg1025          GGDEF                             5
lpp2477    –          –                GGDEF                             1
lpp2695    lpl2567b   lpg2642          2 PAS, GAF, GGDEF, EAL            –
lpp2708    lpl2581    lpg2655          GGDEF                             6

a Transmembrane helices.

Fig. 15.4 Core genome and unique gene complement of L. pneumophila strains Paris, Lens, and Philadelphia. Orthologous genes were defined by reciprocal best-match FASTA comparisons. The threshold was set to a minimum of 80% sequence identity and a length ratio of 0.75 to 1.33. * indicates that plasmid genes were not considered in the calculation.
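The ortholog definition given in the caption of Fig. 15.4 (reciprocal best match, at least 80% sequence identity, length ratio between 0.75 and 1.33) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `Hit` record, the use of a generic alignment score to rank hits, and the choice to apply the thresholds before selecting the best hit are all assumptions, and the gene names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one pairwise comparison result (e.g. parsed from
# FASTA output); the field names are assumptions for this sketch.
Hit = namedtuple("Hit", "query subject identity query_len subject_len score")


def passes_thresholds(hit, min_identity=80.0, min_ratio=0.75, max_ratio=1.33):
    """Apply the identity and length-ratio cutoffs stated in the caption."""
    ratio = hit.query_len / hit.subject_len
    return hit.identity >= min_identity and min_ratio <= ratio <= max_ratio


def best_hits(hits):
    """Keep, for each query, the highest-scoring hit that passes the cutoffs."""
    best = {}
    for hit in hits:
        if not passes_thresholds(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return best


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for gene_a, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if back is not None and back.subject == gene_a:
            pairs.append((gene_a, hit.subject))
    return sorted(pairs)


# Hypothetical toy hits between genome A and genome B.
a_vs_b = [
    Hit("a1", "b1", 95.0, 300, 310, 500),
    Hit("a1", "b2", 85.0, 300, 300, 400),
    Hit("a2", "b2", 90.0, 200, 400, 350),  # length ratio 0.5, rejected
    Hit("a3", "b3", 70.0, 250, 250, 300),  # identity below 80%, rejected
]
b_vs_a = [
    Hit("b1", "a1", 95.0, 310, 300, 500),
    Hit("b2", "a1", 85.0, 300, 300, 400),
    Hit("b3", "a3", 70.0, 250, 250, 300),
]

orthologs = reciprocal_best_matches(a_vs_b, b_vs_a)
```

Genes of one strain that end up in no reciprocal pair with either of the other two strains would form that strain's specific complement. As a rough consistency check on the figures in the text, 300 strain-specific genes alongside a backbone of 2500 gives 300/2800, or about 10.7%, which falls within the stated 9–11%.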

15.4 Conclusions

Genomics has provided a wealth of new information about L. pneumophila and has enriched our knowledge of this important organism. Comparative genomics has uncovered a large number of genome rearrangements, such as inversions and insertions of large DNA fragments, as well as deletions and insertions of smaller fragments, some of which show characteristics of pathogenicity (genomic) islands with a Legionella-specific mosaic structure. This high genomic plasticity and versatility is probably a mechanism by which L. pneumophila adapts to changing environments. Genome analysis has identified a large number of proteins predicted to modulate host cell functions and to allow L. pneumophila to subvert host functions to its advantage. This role might be achieved in part by the many eukaryotic-like proteins or eukaryotic protein domains identified for the first time in a prokaryotic genome. The particularly large number and wide variety of secretion systems present in the genomes should allow effective delivery of the substrates. Given the genetic diversity and genomic plasticity identified through genome comparison, it is clear that additional knowledge about the genetic basis of different L. pneumophila strains and of other Legionella species is needed. New, powerful genomics approaches based on the available sequences, such as microarray techniques for comparative genomics, should now be applied to reveal further differences and to analyze gene expression functionally. The next scientific challenge will be to use this information to elucidate the virulence and adaptation mechanisms of L. pneumophila in order to combat legionellosis and to allow effective elimination of this pathogen from water systems and aerosol-producing systems.

Acknowledgments

We would like to thank our many colleagues who have contributed in different ways to this research, in particular P. Glaser for his constant support, his critical view of the results, and many fruitful discussions; F. Kunst for his important help and support; and C. Rusniok for his help and never-ending patience with informatics problems or questions. This work received financial support from the Institut Pasteur, the Centre National de la Recherche, and the Consortium National de Recherche en Génomique du Réseau National des Genopoles, France. C. Cazalet is holder of a fellowship jointly financed by the Centre National de la Recherche, France, and Veolia Water – Anjou Recherche.

References

Amer, A. O. and Swanson, M. S. (2005). Autophagy is an immediate macrophage response to Legionella pneumophila. Cell Microbiol doi:10.1111/j.1462-5822.2005.00509.x.
Aragon, V., Kurtz, S. and Cianciotto, N. P. (2001). Legionella pneumophila major acid phosphatase and its role in intracellular infection. Infect Immun 69, 177–185.
Atlas, R. M. (1999). Legionella: from environmental habitats to disease pathology, detection and control. Environ Microbiol 1, 283–293.
Aurell, H., Etienne, J., Forey, F., Reyrolle, M., Girardo, P., Farge, P., Decludt, B., Vandenesch, F. and Jarraud, S. (2003). Legionella pneumophila serogroup 1 strain Paris: endemic distribution throughout France. J Clin Microbiol 41, 3320–3322.
Batrukova, M. A., Betin, V. L., Rubtsov, A. M. and Lopina, O. D. (2000). Ankyrin: structure, properties, and functions. Biochemistry (Mosc) 65, 395–408.
Benson, R. F. and Fields, B. S. (1998). Classification of the genus Legionella. Semin Respir Infect 13, 90–99.
Benz, I. and Schmidt, M. A. (1992). AIDA-I, the adhesin involved in diffuse adherence of the diarrhoeagenic Escherichia coli strain 2787 (O126:H27), is synthesized via a precursor molecule. Mol Microbiol 6, 1539–1546.
Berger, K. H. and Isberg, R. R. (1993). Two distinct defects in intracellular growth complemented by a single genetic locus in Legionella pneumophila. Mol Microbiol 7, 7–19.
Bezanson, G., Fernandez, R., Haldane, D., Burbridge, S. and Marrie, T. (1994). Virulence of patient and water isolates of Legionella pneumophila in guinea pigs and mouse L929 cells varies with bacterial genotype. Can J Microbiol 40, 426–431.
Biederbick, A., Rose, S. and Elsasser, H. P. (1999). A human intracellular apyrase-like protein, LALP70, localizes to lysosomal/autophagic vacuoles. J Cell Sci 112, 2473–2484.
Brassinga, A. K., Hiltz, M. F., Sisson, G. R., Morash, M. G., Hill, N., Garduno, E., Edelstein, P. H., Garduno, R. A. and Hoffman, P. S. (2003). A 65-kilobase pathogenicity island is unique to Philadelphia-1 strains of Legionella pneumophila. J Bacteriol 185, 4630–4637.
Brown, A., Vickers, R. M., Elder, E. M., Lema, M. and Garrity, G. M. (1982). Plasmid and surface antigen markers of endemic and epidemic Legionella pneumophila strains. J Clin Microbiol 16, 230–235.
Caturegli, P., Asanovich, K. M., Walls, J. J., Bakken, J. S., Madigan, J. E., Popov, V. L. and Dumler, J. S. (2000). ankA: an Ehrlichia phagocytophila group gene encoding a cytoplasmic protein antigen with ankyrin repeats. Infect Immun 68, 5277–5283.
Cazalet, C., Rusniok, C., Bruggemann, H., Zidane, N., Magnier, A., Ma, L., Tichit, M., Jarraud, S., Bouchier, C., Vandenesch, F. et al. (2004). Evidence in the Legionella pneumophila genome for exploitation of host cell functions and high genome plasticity. Nat Genet 36, 1165–1173.
CDC (2004). Centers for Disease Control and Prevention. http://www.cdc.gov/ncidod/dbmd/diseaseinfo/legionellosis_g.htm (May 2005).
Chen, J., de Felipe, K. S., Clarke, M., Lu, H., Anderson, O. R., Segal, G. and Shuman, H. A. (2004). Legionella effectors that promote nonlytic release from protozoa. Science 303, 1358–1361.
Chien, M., Morozova, I., Shi, S., Sheng, H., Chen, J., Gomez, S. M., Asamani, G., Hill, K., Nuara, J., Feder, M. et al. (2004). The genomic sequence of the accidental pathogen Legionella pneumophila. Science 305, 1966–1968.
Cirillo, S. L., Bermudez, L. E., El-Etr, S. H., Duhamel, G. E. and Cirillo, J. D. (2001). Legionella pneumophila entry gene rtxA is involved in virulence. Infect Immun 69, 508–517.
Cirillo, S. L., Yan, L., Littman, M., Samrakandi, M. M. and Cirillo, J. D. (2002). Role of the Legionella pneumophila rtxA gene in amoebae. Microbiology 148, 1667–1677.
Conover, G. M., Derre, I., Vogel, J. P. and Isberg, R. R. (2003). The Legionella pneumophila LidA protein: a translocated substrate of the Dot/Icm system associated with maintenance of bacterial integrity. Mol Microbiol 48, 305–321.
Cossart, P. and Sansonetti, P. (2004). Bacterial invasion: the paradigms of enteroinvasive pathogens. Science 304, 242–248.
Craig, K. L. and Tyers, M. (1999). The F-box: a new motif for ubiquitin dependent proteolysis in cell cycle regulation and signal transduction. Prog Biophys Mol Biol 72, 299–328.
De Buck, E., Lebeau, I., Maes, L., Geukens, N., Meyen, E., Van Mellaert, L., Anne, J. and Lammertyn, E. (2004). A putative twin-arginine translocation pathway in Legionella pneumophila. Biochem Biophys Res Commun 317, 654–661.
Desvaux, M., Parham, N. J. and Henderson, I. R. (2004). The autotransporter secretion system. Res Microbiol 155, 53–60.
Doleans, A., Aurell, H., Reyrolle, M., Lina, G., Freney, J., Vandenesch, F., Etienne, J. and Jarraud, S. (2004). Clinical and environmental distributions of Legionella strains in France are different. J Clin Microbiol 42, 458–460.
Doyle, R. M. and Heuzenroeder, M. W. (2002). A mutation in an ompR-like gene on a Legionella longbeachae serogroup 1 plasmid attenuates virulence. Int J Med Microbiol 292, 227–239.
EWGLI. The European Working Group for Legionella Infections. http://www.ewgli.org/ (May 2005).
Feldman, M. and Segal, G. (2004). A specific genomic location within the icm/dot pathogenesis region of different Legionella species encodes functionally similar but nonhomologous virulence proteins. Infect Immun 72, 4503–4511.
Fields, B. S. (1993). Legionella and protozoa: interaction of a pathogen and its natural host. American Society for Microbiology, Washington DC.
Fields, B. S. (1996). The molecular ecology of legionellae. Trends Microbiol 4, 286–290.
Fields, B. S., Benson, R. F. and Besser, R. E. (2002). Legionella and Legionnaires' disease: 25 years of investigation. Clin Microbiol Rev 15, 506–526.
Flieger, A., Neumeister, B. and Cianciotto, N. P. (2002). Characterization of the gene encoding the major secreted lysophospholipase A of Legionella pneumophila and its role in detoxification of lysophosphatidylcholine. Infect Immun 70, 6094–6106.

Fliermans, C. B., Cherry, W. B., Orrison, L. H., Smith, S. J., Tison, D. L. and Pope, D. H. (1981). Ecological distribution of Legionella pneumophila. Appl Environ Microbiol 41, 9–16.
Frangeul, L., Nelson, K. E., Buchrieser, C., Danchin, A., Glaser, P. and Kunst, F. (1999). Cloning and assembly strategies in microbial genome projects. Microbiology 145, 2625–2634.
Fraser, D. W., Tsai, T. R., Orenstein, W., Parkin, W. E., Beecham, H. J., Sharrar, R. G., Harris, J., Mallison, G. F., Martin, S. M., McDade, J. E. et al. (1977). Legionnaires' disease: description of an epidemic of pneumonia. N Engl J Med 297, 1189–1197.
Hacker, J., Blum-Oehler, G., Mühldorfer, I. and Tschäpe, H. (1997). Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol 23, 1089–1097.
Hales, L. M. and Shuman, H. A. (1999). Legionella pneumophila contains a type II general secretion pathway required for growth in amoebae as well as for secretion of the Msp protease. Infect Immun 67, 3662–3666.
Hershko, A. and Ciechanover, A. (1998). The ubiquitin system. Annu Rev Biochem 67, 425–479.
Heyninck, K. and Beyaert, R. (2005). A20 inhibits NF-kappaB activation by dual ubiquitin-editing functions. Trends Biochem Sci 30, 1–4.
Hryniewicz-Jankowska, A., Czogalla, A., Bok, E. and Sikorski, A. F. (2002). Ankyrins, multifunctional proteins involved in many cellular pathways. Folia Histochem Cytobiol 40, 239–249.
Jacobi, S. and Heuner, K. (2003). Description of a putative type I secretion system in Legionella pneumophila. Int J Med Microbiol 293, 349–358.
Klemm, P., Hjerrild, L., Gjermansen, M. and Schembri, M. A. (2004). Structure-function analysis of the self-recognizing Antigen 43 autotransporter protein from Escherichia coli. Mol Microbiol 51, 283–296.
Kwaik, A. Y., Gao, L. Y., Stone, B. J., Venkataraman, C. and Harb, O. S. (1998). Invasion of protozoa by Legionella pneumophila and its role in bacterial ecology and pathogenesis. Appl Environ Microbiol 64, 3127–3133.
Luneberg, E., Mayer, B., Daryab, N., Kooistra, O., Zahringer, U., Rohde, M., Swanson, J. and Frosch, M. (2001). Chromosomal insertion and excision of a 30 kb unstable genetic element is responsible for phase variation of lipopolysaccharide and other virulence determinants in Legionella pneumophila. Mol Microbiol 39, 1259–1271.
Luo, Z. Q. and Isberg, R. R. (2004). Multiple substrates of the Legionella pneumophila Dot/Icm system identified by interbacterial protein transfer. Proc Natl Acad Sci U S A 101, 841–846.
Marra, A., Blander, S. J., Horwitz, M. A. and Shuman, H. A. (1992). Identification of a Legionella pneumophila locus required for intracellular multiplication in human macrophages. Proc Natl Acad Sci U S A 89, 9607–9611.
Miquel, P. H., Haeghebaert, S., Che, D., Campese, C., Guitard, C., Brigaud, T., Thérouanne, M., Pani, G., Jarraud, S. and Ilef, D. (2004). Épidémie communautaire de légionellose, Pas-de-Calais, France, novembre 2003–janvier 2004. Bull Epidemiol Hebd 36–37, 179–181.
Molofsky, A. B. and Swanson, M. S. (2003). Legionella pneumophila CsrA is a pivotal repressor of transmission traits and activator of replication. Mol Microbiol 50, 445–461.
Morozova, I., Qu, X., Shi, S., Asamani, G., Greenberg, J. E., Shuman, H. A. and Russo, J. J. (2004). Comparative sequence analysis of the icm/dot genes in Legionella. Plasmid 51, 127–147.
Muder, R. R. and Yu, V. L. (2002). Infection due to Legionella species other than L. pneumophila. Clin Infect Dis 35, 990–998.
Nagai, H., Kagan, J. C., Zhu, X., Kahn, R. A. and Roy, C. R. (2002). A bacterial guanine nucleotide exchange factor activates ARF on Legionella phagosomes. Science 295, 679–682.
Nagai, H., Cambronne, E. D., Kagan, J. C., Amor, J. C., Kahn, R. A. and Roy, C. R. (2005). A C-terminal translocation signal required for Dot/Icm-dependent delivery of the Legionella RalF protein to host cells. Proc Natl Acad Sci U S A 102, 826–831.
Perrin, A. J., Jiang, X., Birmingham, C. L., So, N. S. and Brumell, J. H. (2004). Recognition of bacteria in the cytosol of mammalian cells by the ubiquitin system. Curr Biol 14, 806–811.
Rankin, S., Li, Z. and Isberg, R. R. (2002). Macrophage-induced genes of Legionella pneumophila: protection from reactive intermediates and solute imbalance during intracellular growth. Infect Immun 70, 3637–3648.
Reiss, U., Oskouian, B., Zhou, J., Gupta, V., Sooriyakumaran, P., Kelly, S., Wang, E., Merrill, A. H. J. and Saba, J. D. (2004). Sphingosine-phosphate lyase enhances stress-induced ceramide generation and apoptosis. J Biol Chem 279, 1281–1290.
Ridenour, D. A., Cirillo, S. L., Feng, S., Samrakandi, M. M. and Cirillo, J. D. (2003). Identification of a gene that affects the efficiency of host cell infection by Legionella pneumophila in a temperature-dependent fashion. Infect Immun 71, 6256–6263.
Rossier, O. and Cianciotto, N. P. (2001). Type II protein secretion is a subset of the PilD-dependent processes that facilitate intracellular infection by Legionella pneumophila. Infect Immun 69, 2092–2098.
Rossier, O. and Cianciotto, N. P. (2005). The Legionella pneumophila tatB gene facilitates secretion of phospholipase C, growth under iron-limiting conditions, and intracellular infection. Infect Immun 73, 2020–2032.
Segal, G., Russo, J. J. and Shuman, H. A. (1999). Relationships between a new type IV secretion system and the icm/dot virulence system of Legionella pneumophila. Mol Microbiol 34, 799–809.
Seshadri, R., Paulsen, I. T., Eisen, J. A., Read, T. D., Nelson, K. E., Nelson, W. C., Ward, N. L., Tettelin, H., Davidsen, T. M., Beanan, M. J. et al. (2003). Complete genome sequence of the Q-fever pathogen Coxiella burnetii. Proc Natl Acad Sci U S A 100, 5455–5460.
Soderberg, M. A., Rossier, O. and Cianciotto, N. P. (2004). The type II protein secretion system of Legionella pneumophila promotes growth at low temperatures. J Bacteriol 186, 3712–3720.
Steele, T. W., Moore, C. V. and Sangster, N. (1990). Distribution of Legionella longbeachae serogroup 1 and other legionellae in potting soils in Australia. Appl Environ Microbiol 56, 2984–2988.
Swanson, M. S. and Fernandez-Moreira, E. (2002). A microbial strategy to multiply in macrophages: the pregnant pause. Traffic 3, 170–177.
Swanson, M. S. and Hammer, B. K. (2000). Legionella pneumophila pathogenesis: a fateful journey from amoebae to macrophages. Annu Rev Microbiol 54, 567–613.
Vogel, J. P., Andrews, H. L., Wong, S. K. and Isberg, R. R. (1998). Conjugative transfer by the virulence system of Legionella pneumophila. Science 279, 873–876.
Wu, M., Sun, L. V., Vamathevan, J., Riegler, M., Deboy, R., Brownlie, J. C., McGraw, E. A., Martin, W., Esser, C., Ahmadinejad, N. et al. (2004). Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol 2, E69.
Yu, V. L., Plouffe, J. F., Pastoris, M. C., Stout, J. E., Schousboe, M., Widmer, A., Summersgill, J., File, T., Heath, C. M., Paterson, D. L. and Chereshsky, A. (2002). Distribution of Legionella species and serogroups isolated by culture in patients with sporadic community-acquired legionellosis: an international collaborative survey. J Infect Dis 186, 127–128.

16 Genomics of Listeria monocytogenes

Michael Kuhn and Werner Goebel

16.1 Introduction: From Pregenomics to Postgenomics

Since the publication of the first complete bacterial genome sequence in 1995 [1], the genomes of 226 prokaryotes (GOLD, Genomes OnLine Database, March 8, 2005) have been sequenced and published, including three complete genomes from members of the genus Listeria [2, 3]. This tremendous increase in genomic information – especially about pathogenic bacteria – over the past decade has substantially altered our view of bacterial pathogenesis and has also resulted in the establishment of sequence-based high-throughput methods in microbial research laboratories worldwide. The present review briefly summarizes the major molecular genetic and biochemical breakthroughs in the analysis of listerial pathogenicity obtained in the pregenomic era, as well as recently gathered insights resulting from and based upon the genomic approaches of the last several years.

16.2 Listeria monocytogenes: A Facultative Intracellular Pathogen

Listeria spp. are gram-positive, low G+C-content bacteria closely related to the Bacillus group. L. monocytogenes, the only member of the genus comprising six species that is pathogenic for humans, is a facultative intracellular pathogen causing sporadic outbreaks of foodborne infections with a high rate of mortality, especially in immunocompromised individuals. Being relatively harmless to healthy individuals and easy to culture in the laboratory, L. monocytogenes became one of the best-studied model pathogens. Even in the pregenomic era, a number of virulence genes were identified and the encoded proteins characterized in detail, especially with respect to their interaction with the mammalian host cell through the course of the intracellular life cycle (reviewed in detail in Refs. [4, 5]).

The early studies to unravel the details of the interaction of L. monocytogenes with mammalian host cells used epithelial-like and macrophage-like cell lines. Macrophages actively ingest L. monocytogenes, but internalization of the bacterium by normally nonphagocytic cells is triggered by L. monocytogenes-specific factors. Aside from the internalization step, however, the intracellular life cycle of the bacteria in phagocytes and in normally nonphagocytic mammalian cells is similar. The pathogen first appears in a vacuole, which is subsequently lysed by the combined action of the pore-forming protein listeriolysin (LLO) and one or two phospholipases. The bacteria that are released into the cytoplasm begin to replicate while making use of specific transporters to gain carbohydrates from the host cell; those remaining in the phagosome are killed and digested. Concomitant with the onset of intracellular replication, L. monocytogenes induces the expression of the surface protein ActA which, by activating the cellular Arp2/3 complex, induces nucleation of host actin filaments. The formation of a polar tail and the continuous polymerization of F-actin at the interface between the bacteria and their tails produce a propulsive force which moves the bacteria through the cytoplasm. Bacteria that reach the surface of the infected host cell induce the formation of pseudopodium-like structures with the bacterium at the tip and the actin tail behind it. These pseudopodia are taken up by neighboring cells. The bacteria thus entering the neighboring cells are within a vacuole surrounded by a double membrane, which is subsequently lysed by LLO and a broad-specificity phospholipase to release the bacteria into the cytoplasm of the newly infected host cell, where the bacteria again replicate and move intracellularly (reviewed in detail in Refs. [4, 5]).

16.3 Listeria monocytogenes Genetics in the Pregenomic Era: Identification and Characterization of Important Virulence Factors

Probably the two most important virulence loci of L. monocytogenes, the virulence gene cluster and the inlAB operon, were already identified via transposon mutagenesis in the pregenomic era. The next sections will briefly summarize the major findings on the important virulence factors; in-depth descriptions of them have been presented elsewhere [4, 5].

16.3.1 Internalins and the Invasion of Nonprofessional Phagocytic Cells

Several genes encoding internalins were identified in L. monocytogenes and L. ivanovii in the pregenomic era [6–8]. All of them belong to the superfamily of LRR (leucine-rich repeat) proteins, but there are two distinct classes of internalins: (a) those which are relatively large in size and cell-associated and (b) those which are considerably smaller and secreted. While L. monocytogenes possesses a large number of internalins belonging to class a and only one (InlC) belonging to class b, the opposite seems to be the case for L. ivanovii (see below). The first internalin identified was InlA, an acidic protein of 800 amino acids which possesses two extended repeat domains, the first of which consists of 15

16.3 Listeria monocytogenes Genetics in the Pregenomic Era …

LRRs [6]. The InlA protein has a typical N-terminal transport signal sequence and a cell wall anchor in the C-terminal part comprising the sorting motif LPXTG followed by a hydrophobic membrane-spanning region. This distal LPXTG motif has been shown to be responsible for the covalent attachment of InlA to the bacterial cell envelope in a process mediated by the enzyme sortase [9]. InlB [6], a protein of 630 amino acids, also carries an N-terminal transport signal sequence, eight LRRs, and three C-terminal modules each beginning with the amino acids glycine (G) and tryptophan (W) (GW modules), but, in contrast to InlA, has no LPXTG motif and no cell-wall-spanning region. InlB is targeted to the bacterial surface via the noncovalent interaction of the GW modules with lipoteichoic acid in the listerial cell wall [10]. The three-dimensional structures of both internalins were solved at the atomic level [11, 12]. It was found that in InlA and InlB, the N-terminal parts in each protein are combined to form a contiguous internalin domain with the LRR region as the central part. The extended β-sheet resulting from the LRRs constitutes an adaptable concave surface proposed to interact with the respective mammalian receptor molecules during infection.

Both internalins, InlA and InlB, were shown to be involved in the internalization of L. monocytogenes by various nonphagocytic mammalian cells [6, 13]. Human E-cadherin was identified as the InlA receptor [14]. The highly specific interaction of human E-cadherin with InlA is based on the concave interaction domain which surrounds and specifically recognizes E-cadherin [12]. The cytoplasmic domain of E-cadherin interacts with α- and β-catenins, which in turn bind directly to actin filaments and hence complete the link between the listerial surface protein and the host cell cytoskeleton [15].

The receptor tyrosine kinase Met was identified as an InlB receptor required for InlB-dependent entry of L. monocytogenes into various cells [13], which induces rapid tyrosine phosphorylation of Met followed by a complex series of intracellular signal transduction events. PI-3 kinase is at the center of this signal transduction pathway leading to actin rearrangement [16], and is most likely activated upon interaction with adaptor proteins recruited to the phosphorylated cytoplasmic domain of Met [13, 17]. The generation of PIP3 by PI-3 kinase [16] then leads to activation of the downstream kinases. It is currently believed that these events finally lead to the activation of the Arp2/3 complex (see below), which is necessary to induce the actin polymerization involved in the rearrangement of the actin cytoskeleton during the uptake process. The complement receptor for the globular part of the C1q fragment (gC1q-R) is another InlB receptor. Direct interaction of the ubiquitously expressed gC1q-R and InlB has been demonstrated, but the role of gC1q-R in InlB-mediated entry of L. monocytogenes is still unknown [18]. Finally, glycosaminoglycans (GAGs) were identified as a third type of InlB ligand [19]. GAGs are present on the surface of mammalian cells, where they decorate the proteoglycans. InlB binds to GAGs through its C-terminal GW repeats, which anchor the protein to the bacterial cell surface. GAGs are hence believed to detach InlB from the bacterial surface, allowing its interaction with Met [20].
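The LPXTG sorting motif described above (Leu-Pro-any residue-Thr-Gly near the C-terminus) lends itself to a simple pattern search. The following Python sketch is only an illustration under stated assumptions: the function name find_lpxtg, the 60-residue tail window, and the toy sequence are hypothetical and not taken from the studies cited here, and the sketch does not check for the hydrophobic membrane-spanning region that follows the motif in genuine sortase substrates.

```python
import re

# LPXTG sorting motif: Leu-Pro-X-Thr-Gly, where X is any amino acid.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(protein: str, tail_length: int = 60) -> list[tuple[int, str]]:
    """Return (1-based position, motif) for LPXTG hits in the C-terminal tail.

    tail_length is an assumed search window, not a value from the literature.
    """
    start = max(0, len(protein) - tail_length)
    return [(m.start() + 1, m.group()) for m in LPXTG.finditer(protein, start)]


# Hypothetical toy sequence: filler, an LPTTG motif, a hydrophobic stretch,
# and a short charged tail. It is NOT a real internalin sequence.
toy = "M" + "A" * 50 + "LPTTGE" + "VLLLLAVAIIGLLLL" + "KRRK"
hits = find_lpxtg(toy)
```

A protein such as InlB, which carries GW modules but no LPXTG motif, would be expected to yield an empty list in a search of this kind.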


16 Genomics of Listeria monocytogenes

The significance of the listerial surface protein InlA in in vivo host cell invasion was for a long time less clear. To test whether the single amino acid exchange in mouse E-cadherin compared to human E-cadherin (Pro16Glu), which renders the mouse homologue unable to bind InlA [21], is the reason for the missing role of InlA in virulence in the mouse model, Lecuit et al. [22] produced a transgenic mouse expressing human E-cadherin in its enterocytes. With this transgenic mouse the role of InlA in the penetration of the intestinal barrier was clearly demonstrated. L. monocytogenes mutants lacking InlA were significantly less able to cross the intestinal barrier since they were severely impaired in promoting their own uptake by the enterocytes forming the intestinal barrier [22]. The function of InlC, the prototype of the group of secreted small internalins, is less clear at the moment. Deletion of inlC reduces virulence in the mouse sepsis model, and participation of InlC in InlA-mediated internalization has been suggested [7, 23].

16.3.2 Listeriolysin O and Two Listerial Phospholipases Allow Escape from the Phagocytic Vacuole

Hemolytic activity of L. monocytogenes is due to the action of a cytolysin called listeriolysin O (LLO), a secreted protein of 58–60 kDa, belonging to a family of thiol-activated, cholesterol-dependent, pore-forming toxins (CDTX) [24]. The gene encoding LLO, hly, was the first listerial virulence gene cloned [25] and codes for 529 amino acids including an N-terminal signal sequence and a highly conserved undecapeptide essential for hemolytic activity. Nonhemolytic mutants are totally avirulent and are eliminated from the infected animal within a few hours [26]. They are unable to open the phagosome and hence unable to escape into the cytoplasm of the host cells [27]. The low pH optimum of LLO is in agreement with its function in the acidified phagosome. Detailed studies showed that this low pH optimum of LLO is critical for its function since it would otherwise damage the host cell upon escape of the bacteria into the pH-neutral cytoplasm (reviewed in Ref. [28]). In addition, a so-called PEST sequence identified in the N-terminal region of LLO [29] targets LLO to the proteasome degradation pathway, hence dramatically reducing its cytoplasmic half-life. Finally, the expression of LLO in the infected host cell is tightly controlled with strongly induced expression when the bacteria reside inside the phagosome [30].

A gene located immediately upstream of hly, called plcA, encodes a secreted phosphatidylinositol-specific phospholipase C (PI-PLC) of 34 kDa which exhibits high homology to several gram-positive phospholipases [31]. The enzyme has been purified [32] and was found to be highly specific for phosphatidylinositol, with no other detectable activity. Besides the highly specific PI-PLC, L. monocytogenes produces a second phospholipase C which hydrolyzes phosphatidylcholine (lecithin), and is thus a phosphatidylcholine-specific phospholipase C (PC-PLC) or lecithinase [33], also called broad-spectrum phospholipase C. The 29-kDa mature PC-PLC has been purified [33] and is a zinc-dependent phospholipase, the gene of which, plcB, is part of the lecithinase operon [34]. Maturation of the 32-kDa precursor of PC-PLC occurs after secretion and is apparently accomplished by the metalloprotease Mpl of L. monocytogenes [35].

PI-PLC is required for the efficient escape of L. monocytogenes from the phagosome of some cell types [36]. It is assumed that PI-PLC acts in concert with listeriolysin inside the acidified phagosomal vacuole of the host cell to mediate lysis of the vacuolar membrane. The role of PC-PLC in escaping from the primary vacuole is not clear and differs from cell type to cell type. In some human cell lines, where escape of L. monocytogenes occurs at low efficiency independent of LLO [37], PC-PLC is required for lysis of the phagocytic vacuole together with the metalloprotease. The metalloprotease Mpl of L. monocytogenes indirectly contributes to pathogenicity and intracellular replication of the bacteria by proteolytic processing and hence activation of the PC-PLC proform [35]. Located downstream of the hly gene, the mpl gene [38] encoding the zinc-dependent Mpl is the first gene of the lecithinase operon [34, 38].

16.3.3 Intracellular Motility and Cell-to-Cell Spread: The Surface Protein ActA

L. monocytogenes moves rapidly through the cytoplasm by the polymerization of host cell actin which is induced by the listerial surface protein ActA [39, 40]. The actA gene is located downstream of mpl in the lecithinase operon [34] and codes for a proline-rich protein of 639 amino acids, which is sufficient to stimulate F-actin assembly and to promote intracellular movement. ActA consists of three domains: the N-terminal domain with the transport signal sequence, the central proline-rich repeat region, and the C-terminal part, which includes a membrane anchor [39, 40]. The expression of mutated forms of ActA allowed the definition of regions of the ActA protein with specific functions in the recruitment of cellular proteins and hence in actin polymerization and movement. Deletion of the whole N-terminal domain of ActA resulted in a total abolishment of actin polymerization and intracellular movement, showing the absolute necessity of this domain in ActA function [41]. Within the N-terminal part of ActA, two smaller regions were identified which are either required for filament elongation or for the continuity of the actin tail since their deletion led to discontinuous actin tail formation [42]. In contrast, deletion of the C-terminal domain did not inhibit actin assembly [41]. The actin tails produced by L. monocytogenes strains expressing ActA lacking the central proline-rich repeats were significantly shorter and the speed of the movement was drastically reduced [43]. Several actin binding proteins and proteins regulating actin dynamics colocalize with the F-actin tails. From these proteins only VASP, Mena, and the Arp2/3 complex were shown to directly bind to ActA [44, 45, 46]. The Arp2/3 complex initiates


F-actin polymerization due to its nucleation activity, which is significantly activated by the presence of ActA [46], thus mimicking the activity of proteins of the cellular WASP family which are natural ligands and activators of Arp2/3. The phosphoprotein VASP binds directly to the proline-rich repeats of ActA [44]. VASP is also a natural ligand of profilin and could stimulate actin assembly by binding to ActA and enhancing the profilin concentration in the vicinity of the bacterium. ActA itself can bind G-actin with a region in its N-terminal part, but deletion of this region does not interfere with actin tail formation in infected cells [47]. It is believed that inside cells, the VASP-mediated profilin/G-actin recruitment can bypass defects in actin binding of ActA. ActA expression is maximally induced when the bacteria have reached the host cell cytoplasm, and the expression level is increased more than 200-fold in cytosolic bacteria in comparison to broth-grown cultures [48] as shown either by reporter gene fusions to actA [30, 48] or by direct quantification of actA-specific transcripts [30].

L. monocytogenes can spread from cell to cell without leaving the cytoplasm by forming microvilli-like protrusions on the host cell surface which are phagocytosed by neighboring cells [49, 50]. The mechanisms of microvilli formation and of induction of phagocytosis by the neighboring cell are largely unknown. Listeria moves randomly through the cytoplasm and the bacteria are finally propelled into the host's plasma membrane, which is distended to form long protrusions. These protrusions enter neighboring cells and are subsequently taken up, forming a secondary vacuole with a double membrane [51]. This secondary vacuole is disrupted, requiring the action of LLO and the listerial phospholipase PC-PLC [34, 52].

16.3.4 PrfA and the Regulation of Virulence Gene Expression

Virulence genes are coordinately regulated in L. monocytogenes by a trans-acting factor called PrfA (positive regulatory factor A) encoded by the prfA gene located upstream of the hly gene. PrfA, a cytoplasmic protein of 27 kDa, acts as a transcriptional activator [53] and is a member of the Crp/Fnr family of transcriptional activators. Like all members of this family it harbors a helix-turn-helix motif in its C-terminal part that binds to a 14-bp palindromic sequence called PrfA box which is present in the promoters of all PrfA-dependent genes (reviewed in Refs. [4, 54]). The essential structural and biochemical features of a functional PrfA-dependent promoter including the importance of neighboring sequences have been recently studied in detail by the use of an in vitro transcription system [55]. Aside from the well-studied promoters of the known virulence genes, a number of additional PrfA-boxes have been identified in the genome sequence of L. monocytogenes as discussed below. The transcriptional organization of the PrfA-regulated genes is complex. Multiple transcripts of all genes of the virulence gene cluster and also of the inlAB operon have been identified [4, 54]. The full expression of the PrfA-dependent

virulence genes requires the synthesis of the monocistronic prfA transcripts, which are synthesized at two PrfA-independent promoters, one of which is dependent on sigma B (SigB), while the other seems to be SigA-dependent [36, 56, 57]. It is believed that transcription of prfA via these prfA promoters results in the synthesis of a limited amount of PrfA sufficient to activate the high-affinity PrfA-dependent promoter located in front of plcA, which initiates transcription of a bicistronic plcA-prfA mRNA leading to enhanced PrfA synthesis. The resulting higher level of PrfA is believed to activate the mpl and actA promoters, which seem to have a lower affinity for PrfA due to base mismatches in their palindromic PrfA boxes [54].

PrfA-dependent virulence gene expression in L. monocytogenes is also thermoregulated, and the shift from 30 °C to 37 °C results in a dramatic increase in expression of the virulence genes [58]. The low expression of virulence genes at temperatures below 30 °C coincides with the absence of PrfA protein. However, the prfA gene is still transcribed under these conditions from its own promoter, resulting in a monocistronic prfA transcript. At 37 °C prfA is transcribed from the prfA promoter and from the PrfA-dependent plcA promoter, resulting in both a monocistronic and a bicistronic plcA-prfA messenger [58]. Johansson et al. [59] were able to demonstrate that at 30 °C the monocistronic prfA messenger is not translated because the upstream untranslated leader (UTR) preceding prfA forms a secondary structure, which masks the ribosome binding region. This secondary structure is thermosensitive and hence unstable at higher temperatures (37 °C), which then allows efficient translation resulting in an about five-fold increase in PrfA levels.

Besides temperature, an increasing number of environmental signals have been shown to affect virulence gene expression in L. monocytogenes; these can be classified into either physicochemical signals (iron, readily metabolized carbohydrates taken up by phosphoenolpyruvate-dependent phosphotransferase systems such as glucose or cellobiose, higher salt concentration, low pH, activated charcoal) or stress conditions (heat shock, oxidative stress, nutritional stress). Growth inside host cells also activates PrfA, which may be caused by one or more of the above-mentioned signals [54]. The mechanisms of altered gene expression under such conditions are either unknown or only poorly understood. An insertional mutagenesis study [60] and two in vivo expression technology (IVET) studies [61, 62] tried to identify genes preferentially expressed inside mammalian cells. However, in both studies, apart from the well-known virulence genes, only a few genes involved in nucleotide biosynthesis, an arginine transporter, and some unknown genes were identified. The complexity of the regulation of gene expression inside the host cell is thus, despite all progress, far from understood.
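The text above attributes the lower PrfA affinity of the mpl and actA promoters to base mismatches in their palindromic 14-bp PrfA boxes. A minimal Python sketch of how departures from perfect dyad symmetry might be counted in a candidate site is given below. The function names are hypothetical, the example 14-mers are invented for illustration and are not promoter sequences from L. monocytogenes, and a mismatch count is of course not a measured binding affinity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def dyad_mismatches(site: str, spacer: int = 0) -> int:
    """Count half-site positions that break perfect dyad symmetry.

    The left half-site is compared with the reverse complement of the right
    half-site. `spacer` central bases are ignored (assumption: 0 by default,
    i.e. the whole site is treated as a strict palindrome).
    """
    site = site.upper()
    if len(site) <= spacer or (len(site) - spacer) % 2 != 0:
        raise ValueError("site length minus spacer must be a positive even number")
    half = (len(site) - spacer) // 2
    left = site[:half]
    right_rc = reverse_complement(site[-half:])
    return sum(a != b for a, b in zip(left, right_rc))


# Invented 14-mers for illustration only.
perfect = "TTAACAATTGTTAA"       # reads the same on both strands
one_mismatch = "TTAACAATTGTTAG"  # last base altered
two_mismatches = "TTAACAATTGTCAG"

scores = {s: dyad_mismatches(s) for s in (perfect, one_mismatch, two_mismatches)}
```

Ranking candidate boxes by such a score would only be a first approximation; as noted above, neighboring sequences also influence the activity of PrfA-dependent promoters [55].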


16.4 Genome Sequence of L. monocytogenes and Its Comparison with the Closely Related L. innocua

In 2001 the complete genomic sequences of L. monocytogenes Sv1/2a EGD-e and L. innocua Sv6a (CLIP 11262) (Fig. 16.1), were published [2] (accessible via ListiList: http://genolist.pasteur.fr/ListiList/index.html). Both genomes were sequenced by the whole-genome shotgun technique by a European consortium headed by members of the Pasteur Institute. The availability of the complete genomic sequences of L. monocytogenes and its close nonpathogenic relative L. innocua opened up new possibilities for the investigation of listerial virulence. The sequencing of the 2 944 528 base pairs of the circular chromosome of L. monocytogenes revealed the presence of 2853 protein-coding genes, 30% of which are without a predicted function. The comparison of this sequence to the genomes of L. innocua and the closely related species B. subtilis [63] revealed some special features of L. monocytogenes [2, 64]. The first surprise was the presence of many putative surface proteins belonging to six families (19 internalins, 21 other LPXTG proteins, nine GW module-containing proteins, 11 hydrophobic-tail proteins, four p60-like proteins, and up to 68 putative lipoproteins). The high number of related members in each family seems to be – at least in part – due to extensive gene duplications. Of particular interest are surface proteins characterized by the common LPXTG motif in their C-terminus, which is necessary for the covalent linkage of the proteins to the cell wall peptidoglycan. L. monocytogenes harbors a total of 41 genes coding for LPXTG proteins (more than any other bacterial species analyzed), and these can be further grouped into the family of internalins and internalin-like proteins (characterized by the LRR motif), and the LPXTG proteins lacking this LRR. The function of the vast majority of these genes is unknown, but one may speculate that the internalins and internalin-like proteins confer host species and cell type specificity during infection. 
They may also mediate the attachment of bacteria to other surfaces during their life outside the mammalian host [2, 65]. Another surprising feature of L. monocytogenes revealed by analysis of the genome was the large number of transport proteins (331 in total), 88 of which are devoted to carbohydrate transport by phosphoenolpyruvate-dependent phosphotransferase systems (PTS) [2]. Hence, L. monocytogenes has nearly three times as many PTS systems as B. subtilis [63], which probably enables L. monocytogenes to take up and therefore grow on a large number of carbohydrates. As mentioned below, one specific hexose phosphate transporter, hpt, is necessary for intracellular multiplication of L. monocytogenes [66], and further studies will most likely reveal the importance of other uptake systems for adaptation to different environmental conditions. As anticipated, due to the wide range of different conditions faced by L. monocytogenes during extracellular and intracellular growth, as many as 209 transcriptional regulators have been identified. This number is second only to that of Pseudomonas aeruginosa, another ubiquitous opportunistic pathogen. However,

Fig. 16.1 Circular genome maps of L. monocytogenes EGD-e and L. innocua CLIP 11262, showing the position and orientation of genes. From the outside: Circles 1 and 2, L. innocua and L. monocytogenes genes on the + and – strands, respectively. Color coding: green, L. innocua genes; red, L. monocytogenes genes; black, genes specific to L. monocytogenes or L. innocua, respectively; orange, rRNA operons; purple, prophages. Circle 3, G+C content of L. monocytogenes (< 32.5% G+C in light yellow, 32.5–43.5% G+C in yellow, and > 43.5% G+C in dark yellow). The scale in megabasepairs is indicated on the outside of the genome circles, with the origin of replication at position 0. Reprinted with permission from Ref. [2]. (This figure also appears with the color plates.)

L. monocytogenes encodes only five sigma factors, versus 18 in B. subtilis [63] and 13 in Mycobacterium tuberculosis [67]. The Crp/Fnr family of transcriptional regulators comprises 15 members in L. monocytogenes (including the central virulence gene regulator PrfA) [2]. The importance of this regulatory family in Listeria spp. is highlighted by comparison


with other genomes: B. subtilis [63] contains one regulator of this type, E. coli two [68], and P. aeruginosa four [69]. The two largest families of regulatory proteins are the GntR-like regulators and the BglG-like antiterminators comprising 24 and 18 members, respectively [2, 64]. Both Listeria genomes encode 15 histidine kinases and 16 response regulators constituting two-component regulatory systems.

The complete genomic sequence of an L. monocytogenes Sv4b strain (F2365) together with two nearly complete genomic sequences of another Sv1/2a strain (F6854) and another Sv4b strain (H7858) were presented in 2004 [3]. Together with the partial sequence of yet another L. monocytogenes Sv4b strain (CLIP 80489) [70], these data allowed whole-genome comparison within the species L. monocytogenes, resulting in the definition of a core genome of this species. These studies also confirmed that the species L. monocytogenes is highly variable regarding its gene content, with a variation ranging up to 10% between different strains. Finally, an L. monocytogenes Sv4a strain is currently being sequenced (T. Hain, W. Goebel, and T. Chakraborty, unpublished results), which means that at least one representative strain of all major pathogenic lineages of L. monocytogenes will be completely sequenced.

Plasmids and transposons are very rarely present in Listeria: a plasmid, pLI100 of 81.9 kbp, carrying heavy metal resistance genes was detected in L. innocua, and another one, pLM80 of 82.3 kbp, in L. monocytogenes Sv4b strain H7858, with high similarity to the L. innocua plasmid [2, 3]. L. monocytogenes EGD-e harbors a copy of a Tn916-like conjugative transposon on the chromosome, and multiple different transposable elements are also present on the above-mentioned plasmids [2, 3].
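The notion of a core genome and of strain-to-strain variation in gene content can be made concrete with set operations. The Python sketch below is a simplified illustration only: the strain names and gene identifiers are hypothetical, and it assumes that orthologous genes have already been assigned identical identifiers across strains, a step that in practice requires sequence comparison.

```python
def core_and_accessory(strain_genes: dict[str, set[str]]):
    """Split gene content into core (shared by all strains) and accessory genes.

    Returns (core, accessory, variable_fraction), where variable_fraction maps
    each strain to the fraction of its genes that lie outside the core.
    """
    if not strain_genes:
        raise ValueError("at least one strain is required")
    gene_sets = list(strain_genes.values())
    core = set.intersection(*gene_sets)   # present in every strain
    pan = set.union(*gene_sets)           # present in at least one strain
    accessory = pan - core
    variable_fraction = {
        name: len(genes - core) / len(genes)
        for name, genes in strain_genes.items()
    }
    return core, accessory, variable_fraction


# Hypothetical toy data; real genomes contain thousands of genes.
toy_strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g6"},
}
core, accessory, variable_fraction = core_and_accessory(toy_strains)
```

In this toy example half of the genes of strain_A fall outside the core; for the sequenced L. monocytogenes strains the corresponding figure reported above is up to 10%.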

16.5 Genomic Approaches to Studying the other Members of the Genus Listeria

Complete genomic sequences of at least one strain of each of the other species of the genus Listeria will be available soon: the genomes of L. ivanovii Sv5 (PAM 55), L. welshimeri (SLCC5334), and L. seeligeri (SLCC3954) have just been completely sequenced. They are 2 928 879 bp, 2 814 124 bp, and 2 797 634 bp in size, respectively, and will be published soon (T. Hain, W. Goebel, and T. Chakraborty, unpublished results; P. Glaser, C. Buchrieser, and P. Cossart, unpublished results). The sequencing of L. grayi is under way at the Institut Pasteur (P. Glaser, C. Buchrieser, and P. Cossart, unpublished results). Of the five completely sequenced species, L. innocua has the largest genome, slightly more than 3 Mbp. The G+C content of all genomes is in the range of 36.8% (L. welshimeri) to 42.2% (L. ivanovii), and the percentage coding ranges from 88.2% (L. welshimeri) to 90.3% (L. monocytogenes and L. innocua).
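G+C content, as quoted above for whole genomes and as displayed in three classes in Fig. 16.1, is a straightforward calculation on the sequence. The following Python sketch shows one way it could be computed, both for a whole sequence and for consecutive windows. The thresholds of 32.5% and 43.5% are those given in the legend of Fig. 16.1; the function names, the default window size of 1000 bp, and the toy sequence are assumptions made for this illustration.

```python
def gc_percent(seq: str) -> float:
    """Return G+C content in percent, counting only A, C, G, and T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def classify_windows(seq: str, window: int = 1000,
                     low: float = 32.5, high: float = 43.5):
    """Label non-overlapping windows as 'low', 'mid', or 'high' in G+C.

    The window size is an assumed value; the thresholds follow Fig. 16.1.
    A trailing partial window is ignored.
    """
    labels = []
    for start in range(0, len(seq) - window + 1, window):
        gc = gc_percent(seq[start:start + window])
        if gc < low:
            label = "low"
        elif gc > high:
            label = "high"
        else:
            label = "mid"
        labels.append((start, gc, label))
    return labels


# Toy sequence with three 5-bp windows (0%, 40%, and 100% G+C).
toy = "AAATT" + "ACGTA" + "GGCCG"
windows = classify_windows(toy, window=5)
```

Windows classified as low or high relative to the genome average are of interest because regions of atypical G+C content are often examined as candidates for horizontally acquired DNA.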

16.6 Evolutionary Aspects

Based on in vitro data, all Listeria species are regarded as normally noncompetent [71]. It was therefore totally unexpected to find genes in L. monocytogenes and L. innocua coding for putative DNA uptake systems, homologous to B. subtilis competence genes [2, 71]. The uptake apparatus may either not be functional any more, or its regulation or the signals that induce competence may differ from those of B. subtilis and the inducing conditions have not yet been met during laboratory culture. Nevertheless, the possibility of gene transfer by transformation could well explain the genomic differences between the two species L. monocytogenes and L. innocua. These differences are mainly found in blocks dispersed around the chromosome, resulting in a mosaic genome structure. Furthermore, the collinearity identified for the L. monocytogenes and L. innocua chromosomes also extends to numerous regions of the B. subtilis chromosome [2]. The hundreds of insertions found in the three chromosomes are best explained by multiple independent transformation events followed by DNA integration at various sites on the chromosomes. The origin of the known virulence genes in Listeria is still unclear. The large family of internalins seems to have evolved in Listeria after the initial combination of the LPXTG membrane anchor motif with a LRR motif (both of which are of unknown origin) to form a protointernalin. This then (probably) duplicated several times and evolved further by recombinations and point mutations [2]. Even among the completely sequenced L. monocytogenes strains there are significant differences in the numbers and types of internalins, which indicates an unusually high rate of mutation in these genes [3]. The virulence gene cluster was obviously either acquired a long time ago or evolved in Listeria. 
Looking at the individual genes present in the virulence gene cluster, homologues of the listeriolysin gene [24], the metalloprotease gene [38], and the two phospholipase genes [21, 34] are present in many related species but never found in a cluster as in L. monocytogenes. The origin of the actA gene is totally obscure since no bacterial gene with significant homology (besides the actA genes present in L. ivanovii and L. seeligeri [72, 73]) has yet been isolated. It has been speculated, however, that the actA gene may be of eukaryotic origin since parts of ActA show some homology to eukaryotic cytoskeletal proteins [74]. At present, fully functional virulence clusters are found in L. monocytogenes and in the animal pathogen L. ivanovii, and a similar cluster with additional genes is present in the nonpathogenic species L. seeligeri [73–76] (Fig. 16.2). Interestingly, one of the ORFs with unknown function at the right border of the cluster shows some weak homology to genes of Listeria phages [73, 74]. This might point to phage transduction events involved in the early evolution of this gene cluster. The analysis of the sequences of the virulence gene cluster and the flanking regions in the different Listeria species (Fig. 16.2) indicates that the virulence gene cluster was present in the common ancestor of L. monocytogenes, L. innocua, L. ivanovii, L. seeligeri, and L. welshimeri. It is present at exactly the same chromosomal


Fig. 16.2 The virulence gene cluster locus in the six species of the genus Listeria. The cluster is flanked by the housekeeping genes (black boxes) prs, vclB (lmo0209), and ldh in all six species. Genes controlled by PrfA are shown as boxes with black arrowheads. vclA (lmo0208) is present in all species except L. grayi. vclP is present in L. welshimeri, L. seeligeri, and L. ivanovii. vclZ (lmo0207) is present in L. monocytogenes and L. innocua. vclY and vclX are inverted in L. seeligeri. Species-specific genes (medium gray) not under PrfA control include vclJ, vclF1, vclG1, vclG2, vclF2 of L. grayi, and vclC of L. seeligeri. Homologous genes are represented by boxes of the same color. Reprinted with permission from Refs. [73, 75]. (This figure also appears with the color plates.)

position in all hemolytic isolates. The pathogenic capability associated with the virulence gene cluster has been lost in two separate events in L. innocua and L. welshimeri [73, 74, 77]. The two independent deletion events in the two species are indicated by the presence of short DNA sequences believed to originally belong to the virulence gene cluster [73, 77]. The more distantly related species L. grayi obviously never harbored the virulence gene cluster [73]. The availability of the complete genome sequences of at least one strain of each species of the genus Listeria in the near future and their comparison will certainly further enlarge our understanding of the evolution of the genus Listeria. The recent sequencing projects with additional members of the species L. monocytogenes [3, 70] also shed light on the evolutionary history of this species comprising 13 different serovars, out of which only 4 (Sv 1/2a, 1/2c, 1/2b, and 4b) account for 98% of reported human listeriosis cases [4]. In the pregenomic era, different genetic methods, including multilocus sequence typing, restriction fragment length polymorphism, and ribotyping, suggested that L. monocytogenes can be subdivided into either two or three lineages, with most of the epidemic strains found in only one lineage [4, 70, 78]. The construction of a DNA macroarray representing those genes which are unique to L. innocua and the L. monocytogenes Sv1/2a EGD-e strain and the Sv4b strain CLIP 80489 allowed the comparison of more than 100 Listeria strains by genomic hybridization [70]. This study only partially

confirmed the previous classification of L. monocytogenes strains; the evolutionary scheme derived from the genome comparison groups the L. monocytogenes isolates into two main lineages which can be further subdivided. Most interestingly, L. innocua could be grouped together with the L. monocytogenes Sv4 strains, and it is now evident that this species once derived from an ancestor of the L. monocytogenes Sv4 strains by successive gene loss including loss of the virulence gene cluster.

16.7 Identification of Listerial Virulence Factors in the Postgenomic Era

16.7.1 Internalins and Other Surface Proteins

A large number of internalin-related proteins in addition to the well-studied InlA and InlB proteins were identified in the different L. monocytogenes genome sequences [2, 3, 65, 70]. However, so far only those already identified in the pregenomic era (InlC, InlE, InlF, InlG, and InlH) [7, 8, 79] have been studied in any detail, and it was shown that none of them is able to induce phagocytosis in mammalian cells on its own, although at least some of them may play a role in virulence [8]. Still, the precise roles of these and all other internalins and internalin-like proteins in the infection process remain largely unknown. Analysis of the L. ivanovii genome revealed a large number of small internalins, exceeding the number of large cell-associated internalins (P. Glaser, C. Buchrieser, and P. Cossart, unpublished results). Interestingly, most of the genes encoding these small internalins in L. ivanovii are regulated by PrfA (S. Müller-Altrock, N. Mauder, and W. Goebel, unpublished results).

Listeria monocytogenes encodes a large number of putative lipoproteins [2, 65], again with largely unknown functions. By deleting the gene lsp (lmo1844), encoding a putative lipoprotein-specific signal peptidase SPase II, Réglier-Poupet et al. [81] recently gained first insights into the role of at least some members of this protein family. The deletion mutant failed to process several lipoproteins and showed a reduced virulence in the mouse model. The expression of the signal peptidase is strongly induced while the bacteria reside in the phagosome, and the mutant bacteria are clearly impaired in phagosomal escape. The mechanisms by which listerial lipoproteins contribute to the lysis of the phagosomal membrane are, however, still unknown. Another lipoprotein, called LpeA (lipoprotein promoting entry; encoded by lmo1847), identified in silico in the L. monocytogenes sequence [82], shows homology to a Streptococcus pneumoniae adherence factor and was implicated in the invasion of hepatocytes, and to a lesser extent of epithelial cells, since the respective mutant showed a clearly diminished capacity for cellular invasion, but not for adhesion or intracellular growth. LpeA is the only listerial lipoprotein known to date to be involved in invasion.


By in silico comparison of the surface protein repertoires of L. monocytogenes and L. innocua, a gene encoding an L. monocytogenes surface protein absent in L. innocua was identified [83]. The gene, called aut (lmo1076), encodes a protein (Auto) of 572 amino acids containing a signal sequence, an N-terminal autolysin domain, and a C-terminal cell-wall-anchoring domain made up of four GW modules. The aut gene is expressed independently of PrfA and encodes a surface protein with an autolytic activity. Auto is required for entry of L. monocytogenes into nonphagocytic mammalian cells and necessary for full virulence. The autolytic protein Auto may thus represent a novel type of virulence factor [83].

Signature-tagged mutagenesis in combination with knowledge of the genome sequence allowed the identification of an L. monocytogenes gene called fbpA (lmo1829), required for efficient liver colonization [84]. fbpA encodes a 570-amino-acid protein that has strong homologies to atypical fibronectin-binding proteins. FbpA binds human fibronectin and increases adherence of L. monocytogenes to HEp-2 cells in the presence of fibronectin. FbpA is present on the bacterial surface and interestingly co-immunoprecipitates with LLO and InlB, but not with other known virulence factors. FbpA hence acts like a chaperone for two listerial virulence factors and appears to be a novel multifunctional virulence factor of L. monocytogenes [84].

16.7.2 Growth in the Host Cell Cytoplasm

Phagosomal escape is a prerequisite for L. monocytogenes to replicate intracellularly and is hence a critical virulence trait of this species, as it is also for Shigella flexneri and Rickettsia spp. L. monocytogenes starts intracellular multiplication shortly after escape from the vacuole, with intracellular generation times close to those observed in rich broth culture [85]. The host cell cytoplasm hence allows listerial growth with high efficiency. However, the cytosol is poorly characterized as a substrate supporting bacterial growth, and the relative abundance of nutrients is unknown. Whereas various auxotrophic mutants of L. monocytogenes are able to grow intracellularly [86], the expression of several metabolic genes is intracellularly increased [60], indicating that at least some metabolites may be limiting for growth in the cytosol. Upon microinjection into the cytoplasm of mammalian cells, only bacteria naturally capable of intracytoplasmic growth like L. monocytogenes and S. flexneri replicated efficiently in these cells [87]. Furthermore, a ΔprfA mutant of L. monocytogenes multiplied poorly upon microinjection, pointing to the need for specific virulence determinants for efficient intracytoplasmic multiplication. In the search for L. monocytogenes-specific bacterial factors allowing intracellular growth, two genes were identified in the L. monocytogenes genome sequence: a gene with high homology to uhpT of E. coli, encoding a hexose phosphate transporter which also shows similarity to the mammalian glucose-6-phosphate translocase (this listerial transporter is termed Hpt, encoded by lmo0838) [66], and a gene, lplA1 (lmo0931), encoding a lipoate protein ligase (LplA1) [88]. Both genes are necessary for efficient

16.7 Identification of Listerial Virulence Factors in the Postgenomic Era

intracellular proliferation of L. monocytogenes. Expression of the Hpt permease is tightly controlled by the central virulence regulator PrfA. Loss of Hpt resulted in impaired listerial intracytosolic proliferation and attenuated virulence in mice but did not affect bacterial growth in a rich medium [66]. However, Hpt alone is not sufficient for efficient cytoplasmic growth since intracytoplasmic replication of L. innocua expressing Hpt together with LLO is not significantly improved upon infection of macrophages [89]. The lack of LplA1 results in bacteria which cease intracellular replication after about five rounds of replication and which are clearly defective in mouse virulence assays. A major target for LplA is the E2 subunit of the pyruvate dehydrogenase enzyme (PDH) complex, and it was shown that in intracellularly grown lplA1 mutants PDH is no longer lipoylated. Studies of lipoic acid metabolism have furthermore shown that little free lipoic acid exists in the mammalian cytosol. Thus LplA1 may not be important in the replication of L. monocytogenes in culture media where free lipoate is available, but could be required in the host cell where lipoate supply may be limited [88]. 16.7.3 Resistance to Bile

The ability to colonize the gall bladder was recently shown to be an important feature of virulent L. monocytogenes [90]. The bacteria hence have to cope with the cytotoxic effects of bile when residing in the gall bladder but also in the small intestine. The comparison of the L. monocytogenes EGD-e and L. innocua genomes [2] has revealed the presence of a L. monocytogenes-specific gene, termed bsh (lmo2067), encoding a bile salt hydrolase (BSH) [91]. Bile salts are the end products of cholesterol metabolism in the liver; they are stored in the gall bladder and released into the duodenum, where they help fat digestion. In addition, bile salts are known to have antimicrobial activity since they are amphipathic molecules which can attack and degrade lipid membranes. Some intestinal microorganisms have hence evolved mechanisms to resist the detergent action of bile, including the synthesis of porins, efflux pumps and transport proteins [2]. Others produce bile salt hydrolases which transform and inactivate the bile salts. The deletion of the bsh gene from the L. monocytogenes chromosome results in an increase in bile sensitivity, reduced virulence, and reduced liver colonization after infection of mice, demonstrating that BSH is a novel L. monocytogenes virulence factor involved in the intestinal and hepatic phases of listeriosis. In addition, the bsh gene has been reported to be positively regulated by the central listerial virulence regulator PrfA in vivo [91], but PrfA-dependent transcription of bsh could not be demonstrated in vitro [57]. Analysis of L. monocytogenes deletion mutants in two other bile-associated loci (pva or lmo0446 and btlB or lmo0754) revealed a role at least of the btlB gene product in resisting the acute toxicity of bile and bile salts, particularly glycoconjugated bile salts at low pH, but without actual BSH activity [92]. Finally, in silico analysis of the L. monocytogenes EGD-e genome revealed a putative bile exclusion system encoded by the bilEA and bilEB genes (lmo1421 and lmo1422) [93].

The bilE system mediates resistance to bile through the active export of bile from the bacterial cell, especially under conditions which mimic the situation of the upper intestinal tract. Furthermore, a L. monocytogenes ΔbilE mutant shows much higher levels of intracellular bile than the wild-type strain, and the mutant is severely impaired in virulence upon oral administration.

16.7.4 Two-component Systems and the Regulation of Virulence Gene Expression

The signal transduction mechanisms allowing bacteria to modulate gene expression in response to diverse stimuli often involve two-component systems (TCS), which are composed of a sensor kinase and a response regulator. TCS have not been studied intensively in L. monocytogenes: lisRK [94], cheYA [95], agrAC [96], and cesRK [97] were characterized in some detail and shown to contribute to virulence of L. monocytogenes. The availability of the complete genomic sequence of L. monocytogenes EGD-e allowed the in silico identification of 16 putative two-component systems. Surprisingly, only one out of these 16 had no counterpart in the closely related, but apathogenic species L. innocua [2]. The role of the listerial two-component systems for in vitro and in vivo survival and growth was studied systematically by the construction of in-frame deletion mutants in the 15 nonessential two-component systems [98]. Careful analysis of the mutants revealed that the previously uncharacterized response regulator DegU (encoded by lmo2515) contributes to virulence and is necessary for motility, since DegU regulates flagellar gene expression [98, 99]. Interestingly, DegU is the only response regulator identified in the genome sequence which apparently lacks a cognate histidine kinase.

16.7.5 Vitamin B12 Biosynthesis and Anaerobic Use of Ethanolamine

The ability to synthesize vitamin B12 is unevenly distributed in living organisms, and vitamin B12 biosynthesis genes are found in about one-third of the bacteria sequenced. Vitamin B12 can be synthesized in oxygen-independent and oxygen-dependent pathways [100]. A gene cluster (cbi and cob genes) identified in the genome sequences of L. monocytogenes and L. innocua [2, 64] shares high homology with vitamin B12 biosynthesis genes from Salmonella enterica serovar Typhimurium, suggesting that both Listeria species use the oxygen-independent pathway like S. enterica. Furthermore, close to the cobalamin biosynthesis genes both Listeria species contain orthologues of genes necessary in S. enterica for the coenzyme B12-dependent degradation of ethanolamine and propanediol (eut and pdu genes, respectively). All three gene clusters [from cbiP (lmo1208) to pduS (lmo1142)] may have been acquired by Listeria en bloc by horizontal gene transfer in an ancient event [64]. L. monocytogenes is an aerobically growing microaerophile that thrives best at reduced oxygen tension and is able to colonize and survive in the mammalian gut, where it encounters anaerobic conditions [4]. Vitamin B12-dependent anaerobic degradation of ethanolamine and propanediol could enable L. monocytogenes to use ethanolamine and 1,2-propanediol as carbon and energy sources for growth under the anaerobic conditions encountered in the mammalian gut, where both substances are believed to be abundant. Taken together, it is tempting to speculate that the vitamin B12 synthesis genes together with the pdu and eut operons play a role during a listerial infection and hence may represent a novel type of virulence determinant of L. monocytogenes.

16.8 Proteomics

In recent years, several studies used the two-dimensional gel electrophoresis approach in combination with mass spectrometry to study listerial gene expression upon changes in growth conditions including salt stress [101], acid [102], high pressure and freezing [103], carbon starvation [104], or transition to the stationary phase [105], conditions also encountered by the bacteria during food processing and storage. Of particular interest is the finding by Weeks et al. [105] that the bacteria perceived stress and began preparations for stationary phase much earlier (already in mid-exponential phase) than predicted by growth characteristics alone, and that the expression levels of more than 50% of all proteins observed changed significantly during the transition into stationary phase. Transition into stationary phase was also analyzed by Folio et al. [106] in a study primarily aiming to establish a two-dimensional electrophoresis database of L. monocytogenes EGD (see: http://www.clermont.inra.fr/proteome), as also done before by Ramnath et al. [107].

Multiple classes of putative cell-wall-anchored surface proteins which are believed to be important for interaction with the host cell were identified in the L. monocytogenes genome [2, 65]. Thus, the listerial cell wall subproteome is of special interest and was analyzed in detail in two recent studies [108, 109] using either standard two-dimensional protein separation [108] or a two-dimensional nanoliquid chromatography approach designed to specifically identify proteins linked to the peptidoglycan network [109]. For the standard approach [108], the proteins were extracted from the listerial surface by serial treatment with different salts at high concentration, and a total of 55 proteins were identified by N-terminal sequencing and mass spectrometry. Remarkably, besides lipoproteins, transporters, and other proteins (often of unknown function), a relatively high number of proteins with a function in the cytoplasmic compartment were identified in this surface proteome. These proteins had neither predicted nor detectable signal peptides, and no modification could be observed. Among these unexpected proteins are enolase (Lmo2455), DnaK (Lmo1473), elongation factor Tu (Lmo2653), and glyceraldehyde-3-phosphate dehydrogenase (Lmo2459). In contrast, the extraction method used by Calvo et al. [109] resulted in the identification of primarily LPXTG motif-harboring proteins, which were most likely covalently linked to the peptidoglycan. Among them were InlA, InlG, InlH, and seven further LPXTG proteins (putative internalins) of unknown function.
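The LPXTG motif mentioned above (Leu-Pro-any residue-Thr-Gly) lends itself to a simple sequence search. The following is a minimal sketch only, not the procedure used in the cited studies: it assumes that the sorting signal lies near the C-terminus and therefore restricts the search to a tail whose length (`tail_length`) is an arbitrary illustrative choice. The function name and the example sequences are hypothetical.

```python
import re

# LPXTG sorting motif: Leu-Pro-<any residue>-Thr-Gly
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(protein_seq, tail_length=60):
    """Return 0-based start positions of LPXTG matches in the C-terminal tail.

    tail_length is an illustrative assumption, not a value from the text.
    """
    seq = protein_seq.upper()
    offset = max(0, len(seq) - tail_length)  # where the searched tail begins
    tail = seq[offset:]
    return [offset + m.start() for m in LPXTG.finditer(tail)]


# Hypothetical toy sequences (not real Listeria proteins)
with_motif = "M" + "A" * 100 + "LPTTGEES" + "A" * 20
n_terminal_only = "LPATG" + "A" * 100

print(find_lpxtg(with_motif))       # motif inside the C-terminal tail
print(find_lpxtg(n_terminal_only))  # motif too far from the C-terminus
```

A real screen would normally also check for the hydrophobic stretch and charged tail that follow the motif; this sketch deliberately omits those criteria.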

Two-dimensional gel electrophoresis also represents a powerful approach to characterizing regulons once a regulatory mutant is available. Thus, sigma B-controlled and DegU-dependent genes were identified in L. monocytogenes [103, 110]. Nine proteins were found to accumulate in the wild-type strain but not in the ΔsigB strain [103]. These proteins included Pfk, GalE, ClpP, and Lmo1580, all of which are typical proteins involved in acid adaptation. However, the gad operon, which codes for the glutamate decarboxylase (GAD) acid resistance system, a specific mechanism for acid adaptation in L. monocytogenes aimed at maintenance of the internal pH and shown to be SigB-dependent by reverse transcription polymerase chain reaction (RT-PCR), was not identified in the proteomic approach, pointing to limitations of this otherwise powerful technique. The analysis of the supernatant subproteome of L. monocytogenes WT and a ΔdegU mutant resulted in the identification of nine proteins, encoded in three operons coding for the flagellar apparatus, the expression of which is DegU-dependent [110]. Similar experiments can now be performed by comparison of WT bacteria with a wide array of L. monocytogenes in-frame deletion mutants in known or putative regulatory genes.
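The comparison of a wild-type strain with a regulatory mutant described above can be sketched as a simple ratio filter. This is a hedged illustration, not the analysis performed in the cited studies: the function name, the pseudocount, the twofold threshold and all spot names and intensities are hypothetical values chosen for demonstration.

```python
import math


def regulator_dependent(wt, mutant, min_log2_ratio=1.0, pseudocount=1.0):
    """Return {name: log2(WT/mutant)} for entries more abundant in the wild type.

    wt and mutant map a spot (or gene) name to a measured intensity.
    The threshold and pseudocount are illustrative assumptions.
    """
    hits = {}
    for name, wt_value in wt.items():
        mut_value = mutant.get(name, 0.0)  # absent in mutant -> treated as zero
        ratio = math.log2((wt_value + pseudocount) / (mut_value + pseudocount))
        if ratio >= min_log2_ratio:
            hits[name] = ratio
    return hits


# Hypothetical intensities, for illustration only
wt_spots = {"spot_A": 799.0, "spot_B": 100.0, "spot_C": 50.0}
mutant_spots = {"spot_A": 99.0, "spot_B": 95.0}

for name, ratio in sorted(regulator_dependent(wt_spots, mutant_spots).items()):
    print(f"{name}: log2 ratio {ratio:.2f}")
```

In practice such a filter would be combined with replicate measurements and a statistical test; a fixed ratio cut-off alone cannot distinguish regulation from measurement noise.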

16.9 Transcriptomics

Prior to the completion of the genomic sequence of L. monocytogenes [2] our knowledge of the PrfA regulon was very limited. At that time, the only known genes that did not belong to the virulence gene cluster but were also regulated by PrfA were inlAB [111] and inlC [7]. The availability of the complete listerial genome sequence allowed the in silico screening of the sequence for genes preceded by putative PrfA boxes. This screening identified four additional previously unknown genes harboring PrfA boxes. One of the genes, lmo0838, later called hpt, codes for a putative hexose permease, the expression of which was shown to be strictly PrfA-dependent [66]. The other three genes preceded by PrfA boxes are genes of unknown function. A recent systematic approach aiming to elucidate the complete PrfA regulon used a whole-genome macroarray carrying PCR products of the 2853 ORFs of the L. monocytogenes EGD-e genome [112] and compared the expression profiles of the wild-type strain and a prfA-deletion mutant. With this approach three groups of differently regulated genes were identified. Group I comprises, in addition to the 10 already known PrfA-regulated genes, two new genes, both positively regulated and preceded by a PrfA box. Group II comprises eight negatively regulated genes, one of which is preceded by a PrfA box, while the others form an operon. Group III comprises 53 genes, of which only two are preceded by a PrfA box and which are either activated or repressed under different conditions; most of the genes in this group are transcribed from sigma B-dependent promoters. Taken together, the results suggest that PrfA, on the one hand, may positively regulate a core set of 12 genes, preceded by a PrfA box, which are probably expressed from sigma A-dependent promoters, and on the other hand, negatively regulates eight genes. A second set of PrfA-regulated genes lacks PrfA boxes and is expressed from sigma B-dependent promoters. These data reveal that PrfA can act either as an activator or as a repressor, and suggest that PrfA may directly or indirectly activate sets of genes in association with different sigma factors [112]. However, when several promoters of these additional genes, which were affected either positively or negatively by PrfA, were tested in an in vitro transcription system in which all truly PrfA-dependent virulence gene promoters showed PrfA-dependent transcription similar to that observed in vivo, they did not yield PrfA-dependent transcripts in the presence or absence of SigB [57]. It is therefore more likely that the influence of PrfA on the transcription of these genes observed in vivo is indirect.

Besides the above-mentioned sigma factors SigA [RpoD, σ43 (lmo1454)] and SigB [RpoF, σ37 (lmo0895)], three further sigma factors were identified in the L. monocytogenes genome, namely SigH [RpoH, σ30 (lmo0243)], SigL [RpoN, σ54 (lmo2461)], and an ECF-type sigma factor (lmo0423) [2]. Whereas sigA is an essential gene, deletion mutants in the other four sigma factor-encoding genes were obtained. Disruption of the gene coding for the alternative sigma factor SigH resulted in a mutant that demonstrated reduced growth potential in minimal medium but was without a defect in the infectious process [113]; target genes of SigH are, however, unknown to date. The role of the alternative σ54 factor, encoded by the rpoN gene, was investigated by comparing the global gene expression of the wild-type EGD-e strain and an rpoN mutant [114]. Expression profiling using the whole-genome macroarrays mentioned above [112] identified 77 genes whose expression was modulated in the rpoN mutant as compared to the wild-type strain. Most of the changes in gene expression were related to carbohydrate metabolism, and in particular to pyruvate metabolism.
However, (a) further analyses showed that only the mptACD operon was directly controlled by σ54 and (b) in silico analysis suggested that σ54 may directly control the expression of four different phosphotransferase system (PTS) operons, including mptACD. RpoN (σ54) is hence mainly involved in the control of carbohydrate metabolism in L. monocytogenes [114]. SigB-dependent genes in L. monocytogenes were identified in a coupled bioinformatics/microarray strategy [115]: first, candidate SigB-dependent promoters were searched for in the EGD-e sequence by biocomputing, and the data generated were used to develop a specialized 208-gene microarray which included 166 genes downstream of the predicted SigB-dependent promoters as well as selected virulence and stress response genes. This array was hybridized with RNA from WT and a ΔsigB mutant, which resulted in the identification of more than 50 clearly SigB-dependent genes including both stress response genes (e.g., gadB, ctc, and the glutathione reductase gene lmo1433) and virulence genes (e.g., inlA, inlB, and bsh). These data demonstrate that SigB not only regulates the expression of genes important for survival under environmental stress conditions but also contributes to regulation of virulence gene expression in L. monocytogenes [115]. An ECF (lmo0423) deletion mutant showed unaltered virulence in a mouse sepsis model and a basically unchanged gene expression pattern compared to the wild-type strain when grown under aerobic conditions in a rich culture medium (BHI). Growth retardation of the ECF mutant was, however, observed under microaerophilic conditions, suggesting that genes essential for the metabolism under these conditions may be under the control of this ECF sigma factor (M. Rauch and W. Goebel, unpublished results).
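The in silico screening for putative PrfA boxes and for candidate SigB-dependent promoters described in this section amounts to scanning upstream regions for a consensus motif while tolerating a few mismatches. The sketch below illustrates the idea under stated assumptions: the 14-bp palindromic consensus `TTAACANNTGTTAA` is assumed here as the PrfA box (it is not given in the text above), the mismatch tolerance is arbitrary, and the function name and test sequences are hypothetical. It is not the screening procedure used in the cited work.

```python
# Assumed PrfA box consensus (not stated in the text); N matches any base
PRFA_BOX = "TTAACANNTGTTAA"


def scan_motif(sequence, consensus, max_mismatches=2):
    """Return (position, site, mismatches) for every window close to the consensus.

    Only the given strand is scanned; positions are 0-based.
    """
    seq = sequence.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(
            1 for c, b in zip(consensus, window) if c != "N" and c != b
        )
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Hypothetical upstream regions, for illustration only
perfect_site = "GGCATC" + "TTAACAAATGTTAA" + "CCGTA"
imperfect_site = "CC" + "TTAACAAATGTAAA" + "GG"

print(scan_motif(perfect_site, PRFA_BOX, max_mismatches=0))
print(scan_motif(imperfect_site, PRFA_BOX, max_mismatches=1))
```

Because the assumed consensus is palindromic, scanning a single strand is sufficient here; for a non-palindromic motif such as a sigma factor promoter, the reverse complement would have to be scanned as well, and position relative to the start codon would normally be taken into account.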

16.10 Conclusions

The publication of the complete genome sequence of L. monocytogenes EGD-e in 2001 [2] represents the transition from pregenomics to postgenomics in the field of L. monocytogenes research. As described briefly above, the availability of the L. monocytogenes genome sequence (and simultaneously the L. innocua sequence) has not only altered our view of the biology of this interesting pathogen, but also made it possible to use newly developed genome-based methods in the study of L. monocytogenes. The availability of the complete genomic sequences of at least one strain of each species of the genus Listeria in the near future will dramatically expand our genomic knowledge of this group of bacteria and will allow a fundamentally new view of the evolution of the genus Listeria. Genomic information is, however, not only of major interest in basic science; this information is also vital for genome-based typing of bacteria, especially in such a heterogeneous group as the one represented by the different serovars found in L. monocytogenes. For many years, L. monocytogenes was usually characterized by serotyping and subtyped using pulsed-field gel electrophoresis or ribotyping. DNA microarrays based on sequence information of completely or partially sequenced genomes now provide an alternative means to resolve genetic differences among isolates and, unlike pulsed-field gel electrophoresis and ribotyping, microarrays can additionally be used to identify specific genes associated with strains of interest. This approach was used successfully in several recent studies for selective discrimination of epidemic strains of L. monocytogenes [116–118]. Additionally, microarrays can also be used for direct detection of pathogen-specific RNA or DNA in complex environmental samples (reviewed in Ref. [119]). Microarray-based microbial detection systems will probably become routine, although the sensitivity of these systems currently limits their application for pathogen detection.

Acknowledgements

We thank T. Williams and D. Beier for careful and critical reading of this manuscript. This work was supported by the Deutsche Forschungsgemeinschaft through grant SFB 479, the BMBF through the PathoGenoMik competence network, and the Fonds der Chemischen Industrie.

References

1 Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, K. McKenney, G. Sutton, W. Fitzhugh, C. Fields, J. D. Gocyne, J. Scott, R. Shirley, L. I. Liu, A. Glodek, J. M. Kelley, J. F. Weidman, C. A. Phillips, T. Spriggs, E. Hedblum, M. D. Cotton, T. R. Utterback, M. C. Hanna, D. T. Nguyen, D. M. Saudek, R. C. Brandon, L. D. Fine, J. L. Fritchmann, J. L. Fuhrmann, N. S. M. Geoghagen, C. L. Gnehm, L. A. McDonald, K. V. Small, C. M. Fraser, H. O. Smith, and J. C. Venter. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512.

2 Glaser, P., L. Frangeul, C. Buchrieser, C. Rusniok, A. Amend, F. Baquero, P. Berche, H. Bloecker, P. Brandt, T. Chakraborty, A. Charbit, F. Chtouani, E. Couvé, A. de Daruvar, P. Dehoux, E. Domann, G. Domínguez-Bernal, E. Duchaud, L. Durant, O. Dussurget, K.-D. Entian, H. Fsihi, F. Garcia-Del Portillo, P. Garrido, L. Gautier, W. Goebel, N. Gomez-Lopez, T. Hain, J. Hauf, D. Jackson, L.-M. Jones, U. Kaerst, J. Kreft, M. Kuhn, F. Kunst, G. Kurapkat, E. Madueno, A. Maitournam, J. Mata Vicente, E. Ng, H. Nedjari, G. Nordsiek, S. Novella, B. de Pablos, J.-C. Pérez-Diaz, R. Purcell, B. Remmel, M. Rose, T. Schlueter, N. Simoes, A. Tierrez, J.-A. Vázquez-Boland, H. Voss, J. Wehland, and P. Cossart. 2001. Comparative genomics of Listeria species. Science 294:849–852.

3 Nelson, K. E., D. E. Fouts, E. F. Mongodin, J. Ravel, R. T. DeBoy, J. F. Kolonay, D. A. Rasko, S. V. Angiuoli, S. R. Gill, I. T. Paulsen, J. Peterson, O. White, W. C. Nelson, W. Nierman, M. J. Beanan, L. M. Brinkac, S. C. Daugherty, R. J. Dodson, A. S. Durkin, R. Madupu, D. H. Haft, J. Selengut, S. Van Aken, H. Khouri, N. Fedorova, H. Forberger, B. Tran, S. Kathariou, L. D. Wonderling, G. A. Uhlich, D. O. Bayles, J. B. Luchansky, and C. M. Fraser. 2004. Whole genome comparisons of serotype 4b and 1/2a strains of the food-borne pathogen Listeria monocytogenes reveal new insights into the core genome components of this species. Nucleic Acids Res. 32:2386–2395.

4 Vazquez-Boland, J.-A., M. Kuhn, P. Berche, T. Chakraborty, G. Dominguez-Bernal, W. Goebel, B. Gonzalez-Zorn, J. Wehland, and J. Kreft. 2001. Listeria pathogenesis and molecular virulence determinants. Clin. Microbiol. Rev. 14:584–640.

5 Kuhn, M., and W. Goebel. 2005. Molecular virulence determinants of Listeria monocytogenes. In: Listeria, Listeriosis and Food Safety, 3rd edn. E. T. Ryser and E. H. Marth, editors. M. Dekker, New York.

6 Gaillard, J. L., P. Berche, C. Frehel, E. Gouin, and P. Cossart. 1991. Entry of Listeria monocytogenes into cells is mediated by internalin, a repeat protein reminiscent of surface antigens from gram-positive cocci. Cell 65:1127–1141.

7 Engelbrecht, F., S.-K. Chun, C. Ochs, J. Hess, F. Lottspeich, W. Goebel, and Z. Sokolovic. 1996. A new PrfA-regulated gene of Listeria monocytogenes encoding a small, secreted protein which belongs to the family of internalins. Mol. Microbiol. 21:823–837.

8 Raffelsbauer, D., A. Bubert, F. Engelbrecht, J. Scheinpflug, A. Simm, J. Hess, S. H. E. Kaufmann, and W. Goebel. 1998. The gene cluster inlC2DE of Listeria monocytogenes contains additional new internalin genes and is important for virulence in mice. Mol. Gen. Genet. 260:144–158.

9 Bierne, H., S. K. Mazmanian, M. Trost, M. G. Pucciarelli, G. Liu, P. Dehoux, L. Jansch, F. Garcia-del Portillo, O. Schneewind, and P. Cossart. 2002. Inactivation of the srtA gene in Listeria monocytogenes inhibits anchoring of surface proteins and affects virulence. Mol. Microbiol. 43:869–881.

10 Braun, L., S. Dramsi, P. Dehoux, H. Bierne, G. Lindahl, and P. Cossart. 1997. InlB: an invasion protein of Listeria monocytogenes with a novel type of surface association. Mol. Microbiol. 25:285–294.

11 Marino, M., L. Braun, P. Cossart, and P. Ghosh. 1999. Structure of the InlB leucine-rich repeats, a domain that triggers host cell invasion by the bacterial pathogen L. monocytogenes. Mol. Cell. 4:1063–1072.

12 Schubert, W. D., C. Urbanke, T. Ziehm, V. Beier, M. P. Machner, E. Domann, J. Wehland, T. Chakraborty, and D. W. Heinz. 2002. Structure of internalin, a major invasion protein of Listeria monocytogenes, in complex with its human receptor E-cadherin. Cell 111:825–836.

13 Shen, Y., M. Naujokas, M. Park, and K. Ireton. 2000. InlB-dependent internalization of Listeria is mediated by the Met receptor tyrosine kinase. Cell 103:501–510.

14 Mengaud, J., H. Ohayon, P. Gounon, R. M. Mege, and P. Cossart. 1996. E-cadherin is the receptor for internalin, a surface protein required for entry of Listeria monocytogenes into epithelial cells. Cell 84:923–932.

15 Lecuit, M., R. Hurme, J. Pizarro-Cerda, H. Ohayon, B. Geiger, and P. Cossart. 2000. A role for α- and β-catenins in bacterial uptake. Proc. Natl. Acad. Sci. U. S. A. 97:10008–10013.

16 Ireton, K., B. Payrastre, H. Chap, W. Ogawa, H. Sakaue, M. Kasuga, and P. Cossart. 1996. A role for phosphoinositide 3-kinase in bacterial invasion. Science 274:780–782.

17 Ireton, K., B. Payrastre, and P. Cossart. 1999. The Listeria monocytogenes protein InlB is an agonist of mammalian phosphoinositide 3-kinase. J. Biol. Chem. 274:17025–17032.

18 Braun, L., B. Ghebrehiwet, and P. Cossart. 2000. gC1q-R/p32, a C1q-binding protein, is a receptor for the InlB invasion protein of Listeria monocytogenes. EMBO J. 19:1458–1466.

19 Jonquieres, R., J. Pizarro-Cerda, and P. Cossart. 2001. Synergy between the N- and C-terminal domains of InlB for efficient invasion of non-phagocytic cells by Listeria monocytogenes. Mol. Microbiol. 42:955–965.

20 Marino, M., M. Banerjee, R. Jonquieres, P. Cossart, and P. Ghosh. 2002. GW domains of the Listeria monocytogenes invasion protein InlB are SH3-like and mediate binding to host ligands. EMBO J. 21:5623–5634.

21 Lecuit, M., S. Dramsi, C. Gottardi, M. Fedor-Chaiken, B. Gumbiner, and P. Cossart. 1999. A single amino acid in E-cadherin responsible for host specificity towards the human pathogen Listeria monocytogenes. EMBO J. 18:3956–3963.

22 Lecuit, M., S. Vandormael-Pournin, J. Lefort, M. Huerre, P. Gounon, C. Dupuy, C. Babinet, and P. Cossart. 2001. A transgenic model for listeriosis: role of internalin in crossing the intestinal barrier. Science 292:1722–1725.

23 Bergmann, B., D. Raffelsbauer, M. Kuhn, M. Goetz, S. Hom, and W. Goebel. 2002. InlA- but not InlB-mediated internalization of Listeria monocytogenes by non-phagocytic mammalian cells needs the support of other internalins. Mol. Microbiol. 43:557–570.

24 Palmer, M. 2001. The family of thiol-activated, cholesterol-binding cytolysins. Toxicon 39:1681–1689.

25 Cossart, P., M. F. Vicente, J. Mengaud, F. Baquero, J. C. Perez-Diaz, and P. Berche. 1989. Listeriolysin O is essential for virulence of Listeria monocytogenes: direct evidence obtained by gene complementation. Infect. Immun. 57:3629–3636.

26 Kathariou, S., P. Metz, H. Hof, and W. Goebel. 1987. Tn916-induced mutations in the hemolysin determinant affecting virulence of Listeria monocytogenes. J. Bacteriol. 169:1291–1297.

27 Gaillard, J. L., P. Berche, J. Mounier, S. Richard, and P. J. Sansonetti. 1987. In vitro model of penetration and intracellular growth of Listeria monocytogenes in the human enterocyte-like cell line Caco-2. Infect. Immun. 55:2822–2829.

28 Dramsi, S., and P. Cossart. 2002. Listeriolysin O: a genuine cytolysin optimized for an intracellular parasite. J. Cell Biol. 156:943–946.

29 Decatur, A. L., and D. A. Portnoy. 2000. A PEST-like sequence in listeriolysin O essential for Listeria monocytogenes pathogenicity. Science 290:992–995.

30 Bubert, A., Z. Sokolovic, S. K. Chun, L. Papatheodorou, A. Simm, and W. Goebel. 1999. Differential expression of Listeria monocytogenes virulence genes in mammalian host cells. Mol. Gen. Genet. 261:323–336.

31 Leimeister-Wächter, M., E. Domann, and T. Chakraborty. 1991. Detection of a gene encoding a phosphatidylinositol-specific phospholipase C that is co-ordinately expressed with listeriolysin in Listeria monocytogenes. Mol. Microbiol. 5:361–366.

32 Goldfine, H., and C. Knob. 1992. Purification and characterization of Listeria monocytogenes phosphatidylinositol-specific phospholipase C. Infect. Immun. 60:4059–4067.

33 Geoffroy, C., J. Raveneau, J. L. Beretti, A. Lecroisey, J.-A. Vazquez-Boland, J. E. Alouf, and P. Berche. 1991. Purification and characterization of an extracellular 29-kilodalton phospholipase C from Listeria monocytogenes. Infect. Immun. 59:2382–2388.

34 Vazquez-Boland, J.-A., C. Kocks, S. Dramsi, H. Ohayon, C. Geoffroy, J. Mengaud, and P. Cossart. 1992. Nucleotide sequence of the lecithinase operon in Listeria monocytogenes and possible role of lecithinase in cell-to-cell spread. Infect. Immun. 60:219–230.

35 Poyart, C., E. Abachin, I. Razfimanantsoa, and P. Berche. 1993. The zinc metalloprotease of Listeria monocytogenes is required for maturation of the phosphatidylcholine phospholipase C: direct evidence obtained by gene complementation. Infect. Immun. 61:1576–1580.

36 Camilli, A., L. G. Tilney, and D. A. Portnoy. 1993. Dual roles of PlcA in Listeria monocytogenes pathogenesis. Mol. Microbiol. 8:143–157.

37 Marquis, H., V. Doshi, and D. A. Portnoy. 1995. The broad-range phospholipase C and a metalloprotease mediate listeriolysin O-independent escape of Listeria monocytogenes from a primary vacuole in human epithelial cells. Infect. Immun. 63:4531–4534.

38 Mengaud, J., C. Geoffroy, and P. Cossart. 1991. Identification of a new operon involved in Listeria monocytogenes virulence: its first gene encodes a protein homologous to bacterial metalloproteases. Infect. Immun. 59:1043–1049.

39 Domann, E., J. Wehland, M. Rohde, S. Pistor, M. Hartl, W. Goebel, M. Leimeister-Wächter, M. Wuenscher, and T. Chakraborty. 1992. A novel bacterial virulence gene in Listeria monocytogenes required for host cell microfilament interaction with homology to the proline-rich region of vinculin. EMBO J. 11:1981–1990.

40 Kocks, C., E. Gouin, M. Tabouret, P. Berche, H. Ohayon, and P. Cossart. 1992. Listeria monocytogenes-induced actin assembly requires the actA gene product, a surface protein. Cell 68:521–531.

41 Lasa, I., V. David, E. Gouin, J. B. Marchand, and P. Cossart. 1995. The amino-terminal part of ActA is critical for the actin-based motility of Listeria monocytogenes; the central proline-rich region acts as a stimulator. Mol. Microbiol. 18:425–436.

42 Lasa, I., E. Gouin, M. Goethals, K. Vancompernolle, V. David, J. Vandekerckhove, and P. Cossart. 1997. Identification of two regions in the N-terminal domain of ActA involved in the actin comet tail formation by Listeria monocytogenes. EMBO J. 16:1531–1540.

43 Smith, G. A., J. A. Theriot, and D. A. Portnoy. 1996. The tandem repeat domain in the Listeria monocytogenes ActA protein controls the rate of actin-based motility, the percentage of moving bacteria, and the localization of vasodilator-stimulated phosphoprotein and profilin. J. Cell Biol. 135:647–660.

44 Chakraborty, T., F. Ebel, E. Domann, K. Niebuhr, B. Gerstel, S. Pistor, C. J. Temm-Grove, B. M. Jockusch, M. Reinhard, U. Walter, and J. Wehland. 1995. A focal adhesion factor directly linking intracellularly motile Listeria monocytogenes and Listeria ivanovii to the actin-based cytoskeleton of mammalian cells. EMBO J. 14:1314–1321.

45 Pistor, S., L. Grobe, A. S. Sechi, E. Domann, B. Gerstel, L. M. Machesky, T. Chakraborty, and J. Wehland. 2000. Mutations of arginine residues within the 146-KKRRK-150 motif of the ActA protein of Listeria monocytogenes abolish intracellular motility by interfering with the recruitment of the Arp2/3 complex. J. Cell Sci. 113:3277–3287.

46 Zalevsky, J., I. Grigorova, and R. D. Mullins. 2001. Activation of the Arp2/3 complex by the Listeria ActA protein. ActA binds two actin monomers and three subunits of the Arp2/3 complex. J. Biol. Chem. 276:3468–3475.

47 Skoble, J., D. A. Portnoy, and M. D. Welch. 2000. Three regions within ActA promote Arp2/3 complex-mediated actin nucleation and Listeria monocytogenes motility. J. Cell Biol. 150:527–538.

48 Shetron-Rama, L. M., H. Marquis, H. G. Bouwer, and N. E. Freitag. 2002. Intracellular induction of Listeria monocytogenes actA expression. Infect. Immun. 70:1087–1096.

49 Mounier, J., A. Ryter, M. Coquis-Rondon, and P. J. Sansonetti. 1990. Intracellular and cell-to-cell spread of Listeria monocytogenes involves interaction with F-actin in the enterocyte-like cell line Caco-2. Infect. Immun. 58:1048–1058.

50 Tilney, L. G., and D. A. Portnoy. 1989. Actin filaments and the growth, movement, and spread of the intracellular bacterial parasite, Listeria monocytogenes. J. Cell Biol. 109:1597–1608.

51 Robbins, J. R., A. I. Barth, H. Marquis, E. L. de Hostos, W. J. Nelson, and J. A. Theriot. 1999. Listeria monocytogenes exploits normal host cell processes to spread from cell to cell. J. Cell Biol. 146:1333–1350.

52 Gedde, M. M., D. E. Higgins, L. G. Tilney, and D. A. Portnoy. 2000. Role of listeriolysin O in cell-to-cell spread of Listeria monocytogenes. Infect. Immun. 68:999–1003.

53 Leimeister-Wächter, M., C. Haffner, E. Domann, W. Goebel, and T. Chakraborty. 1990. Identification of a gene that positively regulates listeriolysin, the major virulence factor of Listeria monocytogenes. Proc. Natl. Acad. Sci. U. S. A. 87:8336–8340.

54 Kreft, J., and J.-A. Vazquez-Boland. 2001. Regulation of virulence genes in Listeria. Int. J. Med. Microbiol. 291:145–157.

55 Luo, Q., M. Herler, S. Mller-Altrock,

56

57

58

59

60

61

62

63

and W. Goebel. 2005. Supportive and inhibitory elements of a putative PrfAdependent promoter in Listeria monocytogenes. Mol. Microbiol. 55:986–997. Freitag, N. E. and D. A. Portnoy. 1994. Dual promoters of the Listeria monocytogenes prfA transcriptional activator appear essential in vitro but are redundant in vivo. Mol. Microbiol. 12:845– 853. Rauch, M., Q. Luo, S. Mller-Altrock, and W. Goebel. 2005. SigB-dependent in vitro transcription of prfA and some newly identified genes of Listeria monocytogenes whose expression is affected by PrfA in vivo. J. Bacteriol. 187:800–804. Leimeister-W chter, M., E. Domann, and T. Chakraborty. 1992. The expression of virulence genes in Listeria monocytogenes is thermoregulated. J. Bacteriol. 174:947–952. Johansson, J., P. Mandin, A. Renzoni, C. Chiaruttini, M. Springer, and P. Cossart. 2002. An RNA thermosensor controls expression of virulence genes in Listeria monocytogenes. Cell 110:551–561. Klarsfeld, A.D., P.L. Goossens, and P. Cossart. 1994. Five Listeria monocytogenes genes preferentially expressed in infected mammalian cells: plcA, purH, purD, pyrE and an arginine ABC transporter gene arpJ. Mol. Microbiol. 13:585–597. Dubail, I., P. Berche, and A. Charbit. 2000. Listeriolysin O as a reporter to identify constitutive and in vivo-inducible promoters in the pathogen Listeria monocytogenes. Infect. Immun. 68:3242– 3250. Gahan, C.G., and C. Hill. 2000. The use of listeriolysin to identify in vivo induced genes in the gram-positive intracellular pathogen Listeria monocytogenes. Mol. Microbiol. 36:498–507. Kunst, F., N. Ogasawara, I. Moszer, A. M. Albertini, G. Alloni, V. Azevedo, M. G. Bertero, P. Bessieres, A. Bolotin, S. Borchert, R. Borriss, L. Boursier, A. Brans, M. Braun, S. C. Brignell, S. Bron, S. Brouillet, C. V. Bruschi, B. Caldwell, V. Capuano, N. M. Carter, S. K. Choi, J. J. Codani, I. F. Connerton, N. J. Cummings, R. A. Daniel, F. Deni-

References zot, K. M. Devine, A. Dsterh ft, S. D. Ehrlich, P. T. Emmerson, K. D. Entian, J. Errington, C. Fabret, E. Ferrari, D. Foulger, C. Fritz, M. Fujita, Y. Fujita, S. Fuma, A. Galizzi, N. Gallerton S.-Y. Ghim, P. Glaser, A. Goffeau, E. J. Golightly, G. Grandi, G. Guiseppi, B. J. Guy, K. Haga, J. Haiech, C. R. Harwood, A. Hnaut, H. Hilbert, S. Holsappel, S. Hosono, M.-F. Hullo, M. Itaya, L. Jones, B. Joris, D. Karamata, Y. Kasahara, M. Klaerr-Blanchard, C. Klein, Y. Kobayashi, P. Koetter, G. Koningstein, S. Krogh, M. Kumano, K. Kurita, A. Lapidus, S. Lardinois, J. Lauber, V. Lazarevic, S.-M. Lee, A. Levine, H. Liu, S. Masuda, C. Mauel, C. Mdigue, N. Medina, R. P. Mellado, M. Mizuno, D. Moestl, S. Nakai, M. Noback, D. Noone, M. O’Reilly, K. Ogawa, A. Ogiwara, B. Oudega, S.-H. Park, V. Parro, T. M. Pohl, D. Portetelle, S. Porwollik, A. M. Prescott, E. Presecan, P. Pujic, B. Purnelle, G. Rapoport, M. Rey, S. Reynolds, M. Rieger, C. Rivolta, E. Rocha, B. Roche, M. Rose, Y. Sadaie, T. Sato, E. Scanlan, S. Schleich, R. Schroeter, F. Scoffone, J. Sekiguchi, A. Sekowska, S. J. Seror, P. Serror, B.-S. Shin, B. Soldo, A. Sorokin, E. Tacconi, T. Takagi, H. Takahashi, K. Takemaru, M. Takeuchi, A. Tamakoshi, T. Tanaka, P. Terpstra, A. Tognoni, V. Tosato, S. Uchiyama, M. Vandenbol, F. Vannier, A. Vassarotti, A. Viari, R. Wambutt, E. Wedler, H. Wedler, T. Weitzenegger, P. Winters, A. Wipat, H. Yamamoto, K. Yamane, K. Yasumoto, K. Yata, K. Yoshida, H.-F. Yoshikawa, E. Zumstein, H. Yoshikawa, and A. Danchin. 1997. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390:249–256. 64 Buchrieser, C., C. Rusniok, F. Kunst, P. Cossart, P. Glaser and the Listeria Consortium 2003. Comparison of the genome sequences of Listeria monocytogenes and Listeria innocua: clues for evolution and pathogenicity. FEMS Immunol. Med. Microbiol. 35:207–213. 65 Cabanes, D., P. Dehoux, O. Dussurget, L. Frangeul, and P. Cossart. 2002. 
Surface proteins and the pathogenic poten-

tial of Listeria monocytogenes. Trends Microbiol. 10:238–245. 66 Chico-Calero, I., M. Suarez, B. Gonzalez-Zorn, M. Scortti, J. Slaghuis, W. Goebel, and J.-A. Vazquez-Boland. 2002. Hpt, a bacterial homolog of the microsomal glucose- 6-phosphate translocase, mediates rapid intracellular proliferation in Listeria. Proc. Natl. Acad. Sci. USA 99:431–436. 67 Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry 3rd, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Oosborne M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead, and B. G. Barrell. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537–544. 68 Blattner, F. R., G. Plunkett III, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453– 1462. 69 Stover, C. K., X. Q. Pham, A. L. Erwin, S. D. Mizoguchi, P. Warrener, M. J. Hickey, F. S. Brinkman, W. O. Hufnagle, D. J. Kowalik, M. Lagrou, R. L. Garber, L. Goltry, E. Tolentino, S. Westbrock-Wadman, Y. Yuan, L. L. Brody, S. N. Coulter, K. R. Folger, A. Kas, K. Larbig, R. Lim, K. Smith, D. Spencer, G. K. Wong, Z. Wu, I. T. Paulsen, J. Reizer, M. H. Saier, R. E. Hancock, S. Lory, and M. V. Olson. 2000. Complete genome sequence of Pseudomonas aeruginosa PA01, an opportunistic pathogen. Nature 406:959–964. 70 Doumith, M., C. Cazalet, N. Simoes, L. Frangeul, C. Jacquet, F. Kunst, P. Martin, P. Cossart, P. Glaser, and C. Buchrieser. 2004. New aspects

363

364

16 Genomics of Listeria monocytogenes

71

72

73

74

75

76

77

78

79

regarding evolution and virulence of Listeria monocytogenes revealed by comparative genomics and DNA arrays. Infect. Immun. 72:1072–1083. Borezee, E., T., Msadek, L. Durant, and P. Berche. 2000. Identification in Listeria monocytogenes of MecA, a homologue of the Bacillus subtilis competence regulatory protein. J. Bacteriol. 182:5931–5934. Kreft, J., M. Dumbsky, and S. Theiss. 1995. The actin-polymerization protein from Listeria ivanovii is a large repeat protein which shows only limited amino acid sequence homology to ActA from Listeria monocytogenes. FEMS Microbiol. Lett. 126:113–121. Schmid, M. W., E. Y. Ng, R. Lampidis, M. Emmerth, M. Walcher, J. Kreft, W. Goebel, M. Wagner, and K. H. Schleifer. 2005. Evolutionary history of the genus Listeria and its virulence genes. Syst. Appl. Microbiol. 28:1–18. Chakraborty, T., T. Hain, and E. Domann. 2000. Genome organization and the evolution of the virulence gene locus in Listeria species. Int. J. Med. Microbiol. 290:167–174. Ng, E. Y. W. 2001. How did Listeria monocytogenes become pathogenic? Dissertation, University of Wrzburg, Wrzburg, Germany. Gouin, E., J. Mengaud, and P. Cossart. 1994. The virulence gene cluster of Listeria monocytogenes is also present in Listeria ivanovii, an animal pathogen, and Listeria seeligeri, a non-pathogenic species. Infect. Immun. 62:3550–3553. Kreft, J., J.-A. Vazquez-Boland, S. Altrock, G. Dominguez-Bernal, and W. Goebel. 2002. Pathogenicity islands and other virulence elements in Listeria. Cur. Top. Microbiol. Immunol. 264:109–125. Ward, T.J., L. Gorski, M. K. Borucki, R. E. Mandrell, J. Hutchins, and K. Pupedis. 2004. Intraspecific phylogeny and lineage group identification based on the prfA virulence gene cluster of Listeria monocytogenes. J. Bacteriol. 186:4994–5002. Dramsi, S., P. Dehoux, M. Lebrun, P. L. Goossens, and P. Cossart. 1997. Identification of four new members of the

81

82

83

84

85

86

87

88

89

90

internalin multigene family of Listeria monocytogenes EGD. Infect. Immun. 65:1615–1625. Reglier-Poupet, H., C. Frehel, I. Dubail, J. L. Beretti, P. Berche, A. Charbit, C. Raynaud. 2003. Maturation of lipoproteins by type II signal peptidase is required for phagosomal escape of Listeria monocytogenes. J. Biol. Chem. 278:49469–49477. Reglier-Poupet, H., E. Pellegrini, A. Charbit, and P. Berche. 2003. Identification of LpeA, a PsaA-like membrane protein that promotes cell entry by Listeria monocytogenes. Infect. Immun. 71:474–482. Cabanes, D., O. Dussurget, P. Dehoux, and P. Cossart. 2004. Auto, a surface associated autolysin of Listeria monocytogenes required for entry into eukaryotic cells and virulence. Mol. Microbiol. 51:1601–1614. Dramsi, S., F. Bourdichon, D. Cabanes, M. Lecuit, H. Fsihi, and P. Cossart. 2004. FbpA, a novel multifunctional Listeria monocytogenes virulence factor. Mol. Microbiol. 53:639–649. Portnoy, D. A., P. S. Jacks, and D. J. Hinrichs. 1988. Role of hemolysin for the intracellular growth of Listeria monocytogenes. J. Exp. Med. 167:1459–1471. Marquis, H., H. G. Bouwer, D. J. Hinrichs, and D. A. Portnoy. 1993. Intracytoplasmic growth and virulence of Listeria monocytogenes auxotrophic mutants. Infect. Immun. 61:3756–3760. Goetz, M., A. Bubert, G. Wang, I. ChicoCalero, J.-A. Vazquez-Boland, M. Beck, J. Slaghuis, A. A. Szalay, and W. Goebel. 2001. Microinjection and growth of bacteria in the cytosol of mammalian host cells. Proc. Natl. Acad. Sci. U. S. A. 98:12221–12226. O’Riordan, M., M. A. Moors, and D. A. Portnoy. 2003. Listeria intracellular growth and virulence require host-derived lipoic acid. Science 302:462–464. Slaghuis, J., M. Goetz, F. Engelbrecht, and W. Goebel. 2004. Inefficient replication of Listeria innocua in the cytosol of mammalian cells. J. Infect. Dis. 189:393–401. Hardy, J., K. P. Francis, M. DeBoer, P. Chu, K. Gibbs, and C. H. Contag.

References

91

92

93

94

95

96

97

98

2004. Extracellular replication of Listeria monocytogenes in the murine gall bladder. Science 303:851–853. Dussurget, O., D. Cabanes, P. Dehoux, M. Lecuit, C. Buchrieser, P. Glaser, and P. Cossart; European Listeria Genome Consortium. 2002. Listeria monocytogenes bile salt hydrolase is a PrfA-regulated virulence factor involved in the intestinal and hepatic phases of listeriosis. Mol. Microbiol. 45:1095–1106. Begley, M., R. D. Sleator, C. G. Gahan, and C. Hill. 2005. Contribution of three bile-associated loci, bsh, pva, and btlB, to gastrointestinal persistence and bile tolerance of Listeria monocytogenes. Infect. Immun. 73:894–904. Sleator, R. D., H. H. Wemekamp-Kamphuis, C. G. Gahan, T. Abee, and C. Hill. 2005. A PrfA-regulated bile exclusion system (BilE) is a novel virulence factor in Listeria monocytogenes. Mol. Microbiol. 55:1183–1195. Cotter, P. D., N. Emerson, C. G. Gahan, and C. Hill. 1999. Identification and disruption of lisRK, a genetic locus encoding a two-component signal transduction system involved in stress tolerance and virulence in Listeria monocytogenes. J. Bacteriol. 181:6840–6843. Flanary, P. L., R. D. Allen, L. Dons, and S. Kathariou. 1999. Insertional inactivation of the Listeria monocytogenes cheYA operon abolishes response to oxygen gradients and reduces the number of flagella. Can. J. Microbiol. 45:646–652. Autret, N., C. Raynaud, I. Dubail, P. Berche, and A. Charbit. 2003. Identification of the agr locus of Listeria monocytogenes: role in bacterial virulence. Infect. Immun. 71:4463–4471. Kallipolitis, B. H., H. Ingmer, C. G. Gahan, C. Hill, and L. Sogaard-Andersen. 2003. CesRK, a two-component signal transduction system in Listeria monocytogenes, responds to the presence of cell wall-acting antibiotics and affects b-lactam resistance. Antimicrob. Agents Chemother. 47:3421–3429. Williams, T., S. Bauer, D. Beier, and M. Kuhn. 2005. 
Construction and characterisation of Listeria monocytogenes mutants with in-frame deletions in the response regulator genes identified in

the genome sequence. Infect. Immun. 73:3152–3159. 99 Knudsen, G. M., J. E. Olsen, and L. Dons. 2004. Characterization of DegU, a response regulator in Listeria monocytogenes, involved in regulation of motility and contributes to virulence. FEMS Microbiol. Lett. 240:171–179. 100 Raux, E., H. L. Schubert, M. J. Warren. 2000. Biosynthesis of cobalamin (vitamin B12): a bacterial conundrum. Cell. Mol. Life Sci. 57:1880–1893. 101 Duche, O., F. Tremoulet, A. Namane, J. Labadie; European Listeria Genome Consortium. 2002. A proteomic analysis of the salt stress response of Listeria monocytogenes. FEMS Microbiol. Lett. 215:183–188. 102 Phan-Thanh, L. 2002. Proteomic analysis of response to acid in Listeria monocytogenes. Methods Enzymol. 358:256– 276. 103 Wemekamp-Kamphuis, H. H., J. A. Wouters, P. P. de Leeuw, T. Hain, T. Chakraborty, and T. Abee. 2004. Identification of sigma factor sigma B-controlled genes and their impact on acid stress, high hydrostatic pressure, and freeze survival in Listeria monocytogenes EGD-e. Appl. Environ. Microbiol. 70:3457–3466. 104 Helloin, E., L. Jansch, L. Phan-Thanh. 2003. Carbon starvation survival of Listeria monocytogenes in planktonic state and in biofilm: a proteomic study. Proteomics 3:2052–2064. 105 Weeks, M. E, D. C. James, G. K. Robinson, and C. M. Smales. 2004. Global changes in gene expression observed at the transition from growth to stationary phase in Listeria monocytogenes ScottA batch culture. Proteomics 4:123–135. 106 Folio, P., P. Chavant, I. Chafsey, A. Belkorchia, C. Chambon, and M. Hebraud. 2004. Two-dimensional electrophoresis database of Listeria monocytogenes EGDe proteome and proteomic analysis of mid-log and stationary growth phase cells. Proteomics 4:3187–3201. 107 Ramnath, M., K. B. Rechinger, L. Jansch, J. W. Hastings, S. Knochel, and A. Gravesen. 2003. Development of a Listeria monocytogenes EGDe partial pro-

365

366

16 Genomics of Listeria monocytogenes

108

109

110

111

112

113

teome reference map and comparison with the protein profiles of food isolates. Appl. Environ. Microbiol. 69:3368–3376. Schaumburg, J., O. Diekmann, P. Hagendorff, S. Bergmann, M. Rohde, S. Hammerschmidt, L. Jansch, J. Wehland, and U. K rst. 2004. The cell wall subproteome of Listeria monocytogenes. Proteomics 4:2991–3006. Calvo, E., M. G. Pucciarelli, H. Bierne, P. Cossart, J. Pablo Albar, and F. Garcia del Portillo. 2005. Analysis of the Listeria cell wall proteome by two-dimensional nanoliquid chromatography coupled to mass spectrometry. Proteomics 5:433–443. Williams, T., B. Joseph, D. Beier, W. Goebel, and M. Kuhn. 2005. Response regulator DegU of Listeria monocytogenes regulate the expression of flagella-specific genes. FEMS Microbiol. Lett. DOI: 10.1016/j.femsle.2005.09.011 Dramsi, S., C. Kocks, C. Forestier, and P. Cossart. 1993. Internalin-mediated invasion of epithelial cells by Listeria monocytogenes is regulated by the bacterial growth state, temperature and the pleiotropic activator prfA. Mol. Microbiol. 9:931–941. Milohanic, E., P. Glaser, J. Y. Coppee, L. Frangeul, Y. Vega, J.-A. VazquezBoland, F. Kunst F, P. Cossart, and C. Buchrieser. 2003. Transcriptome analysis of Listeria monocytogenes identifies three groups of genes differently regulated by PrfA. Mol. Microbiol. 47:1613–1625. Rea, R. B., C. G. Gahan, and C. Hill. 2004. Disruption of putative regulatory loci in Listeria monocytogenes demon-

114

115

116

117

118

119

strates a significant role for Fur and PerR in virulence. Infect. Immun. 72:717–727. Arous, S., C. Buchrieser, P. Folio, P. Glaser, A. Namane, M. Hebraud, and Y. Hechard. 2004. Global analysis of gene expression in an rpoN mutant of Listeria monocytogenes. Microbiology 150:1581–1590. Kazmierczak, M. J., S. C. Mithoe, K. J. Boor, and M. Wiedmann. 2003. Listeria monocytogenes sigma B regulates stress response and virulence functions. J. Bacteriol. 185:5722–5734. Borucki, M. K., S. H. Kim, D. R. Call, S. C. Smole, and F. Pagotto. 2004. Selective discrimination of Listeria monocytogenes epidemic strains by a mixed-genome DNA microarray compared to discrimination by pulsed-field gel electrophoresis, ribotyping, and multilocus sequence typing. J. Clin. Microbiol. 42:5270–5276. Borucki, M. K., M. J. Krug, W. T. Muraoka, and D. R. Call. 2003. Discrimination among Listeria monocytogenes isolates using a mixed genome DNA microarray. Vet. Microbiol. 92:351–362. Call, D. R., M. K. Borucki, and T. E. Besser. 2003. Mixed-genome microarrays reveal multiple serotype and lineagespecific differences among strains of Listeria monocytogenes. J. Clin. Microbiol. 41:632–639. Call, D. R., M. K. Borucki, and F. J. Loge. 2003. Detection of bacterial pathogens in environmental samples using DNA microarrays. J. Microbiol. Methods 53:235–243.

367

III Genomics of Pathogens and Their Hosts: Applications

17 Genomics of Viruses

Esteban Domingo, Alejandro Brun, José Ignacio Núñez, Juan Cristina, Carlos Briones, and Cristina Escarmís

17.1 Introduction: Wide Scope of Virogenomics

The advent of molecular biology techniques made it possible to determine the nucleotide sequences of the entire genomes of many organisms and of their cellular and subcellular parasites. The knowledge thus generated has revolutionized biology in general and, with it, our understanding of viruses and the diseases associated with them. The main techniques that have given access to the genome structure and organization of viruses are reverse transcription of RNA genomes into cDNA copies, molecular cloning of viral DNA and cDNA into amplification and expression vectors, rapid nucleotide sequencing, polymerase chain reaction (PCR) amplification of viral DNA, and, for RNA genomes, the combination of reverse transcription and PCR (RT-PCR). More recently, nucleic acid hybridization methodology has been extended, in both membrane and microarray formats, to the analysis of viral nucleic acids and proteins. A suitable choice of oligonucleotide primers, including primers with nucleotide changes, permits the construction of virtually any variant form of a gene or regulatory region containing insertions, deletions, or point mutations. Thus, molecular-biology-based methodology has opened the way to determining precisely the nucleotide sequence of viral genomes and, even more rewarding, to introducing this information (as found in nature or with designed modifications) into the intracellular milieu. This reverse genetic approach is a powerful tool for investigating the relationships between structure (of nucleic acids or encoded proteins) and function and, more generally, between genotype and phenotype.
PCR-based techniques also permit the retrieval of sequences from biological specimens containing tiny amounts of virus, and from ancient samples that may still contain remains of viruses [although nucleic acids, like other nitrogen-rich and phosphorus-rich molecules, are unlikely to be preserved in old sediments, and thus we may never find nucleic acids in old fossils, i.e., those older than one million years before the present (> 1 Ma)]. Endogenous retroviruses that have survived as integrated DNA in the genomes of mammalian species can be regarded as fossil versions of viruses, and their sequences may help in defining early viral forms and in tracing the evolution of some mammalian lineages.

The virogenomics revolution has had an impact on several additional fronts. In molecular epidemiology, it has allowed the precise molecular identification of viruses isolated in the field, leading to the calculation of genetic distances from viral relatives and to the establishment of phylogenetic relationships among viruses displaying various degrees of relatedness (further discussed in the next sections). This information, in turn, can often help in tracing the possible origin of a viral disease outbreak and in defining the molecular changes (mutation, recombination, segment reassortment) associated with the emergence and progression of a viral disease. Such progression can have different time–size scales. It may relate to an outbreak on a small community scale, or up to a worldwide scale, as in the current AIDS pandemic or the many historical influenza pandemics. It may also relate to the progression of viral disease symptoms and associated viral genomic changes within single infected individuals, as during chronic HIV or hepatitis C virus infections (see Section 17.4 for further information on this issue). Comparisons with parallel information on the genomes of host organisms, that is, the comparison of phylogenies of viruses and their hosts, may identify coadaptation events, such as those between herpesviruses and their corresponding vertebrate hosts. On an entirely different line of activity, comparative genomics has opened the fields of gene delivery and genetic therapy, with several groups of viruses [retroviruses, herpesviruses, adenoviruses, parvoviruses (adeno-associated viruses), as well as some RNA viruses] playing key roles as vectors.
By reverse genetics and functional studies, essential viral genes can be distinguished from nonessential genes, and the latter can be replaced by marked tags to follow the fate of the viral construct inside organisms, cells, or subcellular compartments. For appropriate targeting, knowledge of the nucleotide sequences of vectors and their target genomes will be increasingly important. Genes encoding proteins that recognize receptors located on cell subsets, tissues, or organs may be introduced into viral vectors for specific gene delivery to the desired cells. Despite the many technical difficulties, this is a highly promising area of research. Nonessential genes can also be replaced by heterologous viral antigens to evoke immune responses; an appropriate choice of vector and of the antigenic determinants to be expressed may direct the immune response to be predominantly cellular, humoral, systemic, or mucosal. Several such constructs are currently under investigation as candidate antiviral vaccines.

Genomics is providing new insight into the nature of viral populations, how they replicate and evolve in infected organisms, and how evolutionary events can be associated with disease. Comparison of the nucleotide sequences of the individual genomes that compose a viral population (obtained by sequencing molecular clones) has revealed that RNA viruses (which amount to about 75% of all viruses that have been identified) consist of collections of nonidentical but closely related genomes, termed viral quasispecies. Although the quasispecies concept originated in theoretical physics, its experimental confirmation came from research in virology, as a remarkable application of the very early developments in virogenomics (see Section 17.4). Quasispecies dynamics has provided an interpretation of the great adaptability of RNA viruses (and also of some DNA viruses that follow the same dynamics), and adaptability has proven to be a relevant element in viral pathogenesis. Disease progression is in some cases associated with in-host evolution of the viral pathogen. Quasispecies dynamics has also contributed a theoretical framework for adjusting antiviral prevention and therapeutic strategies to the nature of the viral populations to be controlled. The very same theoretical foundations of quasispecies have opened the way to a new antiviral design termed “virus entry into error catastrophe”. It consists of forcing a virus to mutate beyond a tolerable level, thereby leading to extinction of the virus. Current results with several viruses are increasingly encouraging research on potential clinical applications of error catastrophe. In all these processes of evolutionary adaptation or collapse of viruses, genomics is permitting close monitoring of the underlying molecular events.

In even more general terms, the comparative genomics of viruses and their host organisms (which comprise most biological phyla from the three lineages, Eukarya, Bacteria, and Archaea) promises new insights into many evolution-related areas of activity: patterns of diversification and extinction, the implication of viral-like sequences in shaping the human genome (40% of which is made of mobile elements or their relics!), deeper characterization of environmental microbiology, developments in agriculture (transgenic animals and plants), as well as new practical applications that have been grouped under the heading of evolutionary biotechnology. The reader will find updated accounts of the concepts summarized in this introduction (the molecular genetics of viruses, general genomics and their evolutionary implications, and the development of new antiviral strategies) in several chapters of Refs. [1–7] and in several articles (e.g., Refs. [8, 9]).
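
The intuition behind entry into error catastrophe, introduced above, can be made concrete with a back-of-the-envelope calculation. It is only a sketch under simplifying assumptions that are not part of the text: a uniform per-nucleotide error rate \mu, independent copying errors, and a genome of L nucleotides; Q denotes the probability that a progeny genome is copied without any error, and the numerical values are purely illustrative.

```latex
\begin{align*}
Q &= (1-\mu)^{L} \approx e^{-\mu L},\\
\mu = 10^{-4},\; L = 10^{4} &\;\Rightarrow\; Q \approx e^{-1} \approx 0.37,\\
\mu = 2\times 10^{-4},\; L = 10^{4} &\;\Rightarrow\; Q \approx e^{-2} \approx 0.14.
\end{align*}
```

Under these assumptions, doubling the error rate lowers the fraction of error-free progeny genomes from roughly 37% to roughly 14%, which suggests how a moderate increase in mutagenesis could erode the genetic information carried by a viral population.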

17.2 Retrieving Information

Determination of complete or partial nucleotide sequences of viral genomes is proceeding at an astonishing rate. For example, the European Molecular Biology Laboratory (EMBL) database included over 247 610 entries for viral sequences at the time of writing (January 2005), representing less than 1% of all sequence entries. The rate of sequence determination greatly exceeds the rate at which the functional significance of the sequences can be established: chemistry outstrips biology. This avalanche of sequence information on viral genomes necessitates appropriate quality control, annotation, and organization of data, as well as easy accessibility for scientists interested in disparate aspects of genome organization, the function of regulatory regions and encoded proteins, and viral evolution in general. A number of databases useful for virogenomics, and their main features, are summarized in Table 17.1.

Tab. 17.1 Some nucleotide sequence data banks for viruses.

| Database | URL | Contents |
|---|---|---|
| EMBL Nucleotide Sequence Database | http://www.ebi.ac.uk/embl/Access/index.html | All reported sequences. General database |
| GenBank, the NIH genetic sequence database | http://www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html | All reported sequences. General database |
| DNA Data Bank of Japan | http://www.ddbj.nig.ac.jp/ | All reported sequences. General database |
| Viral Genomes Project | http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html | Complete or nearly complete viral genome sequences. Additional information |
| The Influenza Sequence Database | http://www.flu.lanl.gov/ | Sequences, tools for the analysis of hemagglutinin and neuraminidase sequences |
| Picornavirus Sequence Database | http://www.iah.bbsr.ac.uk/virus/picornaviridae/SequenceDatabase/index.html | Sequences and specific references for different genera |
| Potyvirus Database | http://www.danforthcenter.org/iltab/potyviridae/ | (under construction) Taxonomy, references, and sequence databases of members of the Potyviridae family |
| HIV sequence database | http://www.hiv.lanl.gov/content/hiv-db/mainpage.html | Sequences, drug resistance. Molecular immunology and vaccine trials. Analysis tools |
| Calicivirus sequence database | http://www.iah.bbsrc.ac.uk/virus/Caliciviridae/database.htm | Sequences, information, and specific references for different calicivirus isolates |
| Classical swine fever virus | http://viro08.tiho-hannover.de/eg/eurl_virus_db.htlm | Sequences and phylogenetic analyses (needs authorization) |
| dsRNA viruses | http://www.iah.bbsrc.ac.uk/dsRNA_virus_proteins/index.html | Sequences and information including specific references |
| HPV sequence database | http://hpv-web.lanl.gov/stdgen/virus/hpv/ | Human papillomavirus, sequences, analysis, and alignment tools |
| HIV RT and protease sequence database | http://hivdb.stanford.edu/ | Sequences, genotype–phenotype and genotype–antiretroviral treatment correlations, sequence analysis tools |
| Hepatitis C virus database | http://hepatitis.ibcp.fr | Sequences and genome analysis tools |
| Hepatitis B and C virus database | http://s2as02.genes.nig.ac.jp/ | Hepatitis B and C sequences |
| Poxvirus Bioinformatics Research Center | http://www.poxvirus.org | Sequence analysis genomes |
| VIRGO | http://www.athena.bioc.uvic.ca/genomes | Pox and herpes genomes |
| Human endogenous retroviruses | http://herv.img.cas.cz/ | Human endogenous retroviruses, and genome analysis tools |
| VIDA | http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html | Homologous protein families from herpes, pox, papilloma, coronaviruses, and arteriviruses |
| Subviral RNA | http://subviral.med.uottawa.ca/cgi-bin/home.cgi | Sequences and prediction of RNA secondary structures |

One promising source of sequence information for viruses is the Viral Genomes Project, involving the National Center for Biotechnology Information (NCBI) at the National Institutes of Health, USA. The URL for the database is http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html (for reviews see Refs. [10, 11]). The reasons for creating the Viral Genomes Project were the increasing number of viral nucleotide sequences deposited in data banks, the uncertain quality of some of the early sequences, and the great variability of viral genomes. This variability has two distinct manifestations: the different types of nucleic acid structures (DNA or RNA, single-stranded or double-stranded, linear or circular, segmented or unsegmented), and the great variation in genomic nucleotide sequences among viruses generally and also within individual, taxonomically defined viruses [12].
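
One way to see why this structural variability burdens a database is to write down the fields a record needs merely to describe the type of genome. The Python sketch below is purely illustrative: the class and field names are hypothetical and do not reflect the schema of the Viral Genomes Project, and the example entries are well-known virus groups that also appear in Table 17.1.

```python
from dataclasses import dataclass
from enum import Enum


class NucleicAcid(Enum):
    DNA = "DNA"
    RNA = "RNA"


class Strandedness(Enum):
    SINGLE = "single-stranded"
    DOUBLE = "double-stranded"


class Topology(Enum):
    LINEAR = "linear"
    CIRCULAR = "circular"


@dataclass(frozen=True)
class GenomeType:
    """Hypothetical record describing the physical form of a viral genome."""
    nucleic_acid: NucleicAcid
    strandedness: Strandedness
    topology: Topology
    segments: int = 1  # number of genome segments; 1 means unsegmented

    @property
    def segmented(self) -> bool:
        return self.segments > 1

    def describe(self) -> str:
        seg = f"segmented ({self.segments} segments)" if self.segmented else "unsegmented"
        return f"{self.strandedness.value} {self.topology.value} {self.nucleic_acid.value}, {seg}"


# Illustrative entries only; a real database needs many more fields.
EXAMPLES = {
    "influenza A virus": GenomeType(NucleicAcid.RNA, Strandedness.SINGLE, Topology.LINEAR, segments=8),
    "hepatitis C virus": GenomeType(NucleicAcid.RNA, Strandedness.SINGLE, Topology.LINEAR),
    "human papillomavirus": GenomeType(NucleicAcid.DNA, Strandedness.DOUBLE, Topology.CIRCULAR),
    "vaccinia virus (a poxvirus)": GenomeType(NucleicAcid.DNA, Strandedness.DOUBLE, Topology.LINEAR),
}

for virus, genome in EXAMPLES.items():
    print(f"{virus}: {genome.describe()}")
```

Even this toy record needs four independent fields before a single nucleotide is stored, which hints at why general-purpose sequence databases and virus-specific ones tend to differ in format.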

Genetic heterogeneity and the potential for rapid evolution reach their maximum manifestation in the quasispecies structure of RNA viruses, in which a distribution of mutants constitutes what was classically called the “wild type” virus (implications of quasispecies for data banks are discussed in Section 17.4). Extreme diversity in genome structure and the presence of mutant spectra in a majority of viral isolates complicate sequence records and may cause inaccuracies in the transfer of information to data banks [10]. Complete sequencing of complex viral genomes has prompted the development of virus-specific or virus family databases that differ in format and contents, and that may include phenotypic traits derived from sequence information, for example RNA structure predictions (reviewed in several chapters of Refs. [7] and [13]) and mutations related to antiviral drug resistance. The construction of homologous protein families (HPFs) is also a feature of some databases, allowing insight into viral protein functions. As an example, the VIDA database contains a collection of HPFs derived from open reading frames from complete and partial virus genomes [14].
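
The way a mutant spectrum complicates a sequence record can be illustrated with a minimal Python sketch. It assumes a set of already aligned clone sequences of equal length; the toy sequences and the function names are hypothetical and are not taken from any database. The point it shows is that the consensus sequence, which is what is usually deposited, may represent only a fraction of the genomes in the population, or none of them.

```python
from collections import Counter


def consensus(clones):
    """Most frequent nucleotide at each aligned position (ties: first seen wins)."""
    if len({len(c) for c in clones}) != 1:
        raise ValueError("clones must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*clones))


def mutant_spectrum(clones):
    """Map each distinct genome to its frequency in the sampled population."""
    total = len(clones)
    return {seq: n / total for seq, n in Counter(clones).items()}


# Hypothetical molecular clones sampled from a single viral isolate.
clones = [
    "ACGTACGT",
    "ACGTACGA",
    "ACCTACGT",
    "ACGTTCGT",
    "ACGTACGT",
]

cons = consensus(clones)
spectrum = mutant_spectrum(clones)
print("consensus:", cons)
for seq, freq in spectrum.items():
    note = "(= consensus)" if seq == cons else ""
    print(f"{seq}  {freq:.2f} {note}")

# A consensus need not exist as a real genome: here it matches no clone at all.
print(consensus(["AAT", "ATA", "TAA"]))
```

In the first toy sample the consensus corresponds to only two of the five clones; in the second, the consensus AAA is not present in any individual genome, so a record holding only the consensus would describe a sequence that was never sampled.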

17.3 Applications of Data Banks to Virology

Web sites should be chosen depending on the objective of the study (Table 17.1). Applications of nucleotide sequence data sets, for one virus group or for several groups compared with one another, include calculation of genetic distances and establishment of phylogenetic relationships using methodology that is well established in evolutionary biology, identification of functional domains in viral nucleic acids and proteins, taxonomic classification of viruses, and information for defining viral disease emergence and reemergence.

Essential to any phylogenetic analysis is an accurate sequence alignment. One commonly used approach is progressive alignment: the most closely related sequences are aligned first and progressively more divergent ones are then added, with gaps introduced at positions that are not present in all sequences [15]. A useful program for the alignment of viral genome sequences is CLUSTAL W [16], available in the main databases, for instance DDBJ (DNA Data Bank of Japan, where an expansion of CLUSTAL W has recently been created). If the number of sequences and their size are appropriate, a sequence alignment can be obtained online. To use CLUSTAL W it is important that the sequences be written in a suitable format, such as FASTA. Some programs display sequences obtained from an automated sequencer and write them in FASTA format directly from the chromatograms (for instance, CHROMAS: http://www.technelysium.com.au/chromas.htlm). Since different programs make use of different formats, it is important to have tools that convert sequences from one format into another; examples are SEQ-CONVERT (http://hcv.lanl.gov/content/hcv-db/SEQCONVERT/seqconvert.htlm) and the EMBOSS sequence conversion
site (http://ngfnblast.gbf.de/).

Phylogenetic reconstructions will generally not be possible for distantly related viral sequences. When the sequences show a minimum of genetic relatedness, the three main groups of methods used to derive evolutionary trees are maximum parsimony, distance, and maximum likelihood (reviews in Refs. [7, 17]).

Maximum parsimony predicts the minimum number of mutational steps required to produce the observed variation from the ancestral sequences. Most programs assume the existence of a molecular clock, which is especially questionable for viruses, in particular for RNA viruses subjected to variations in the levels of population equilibrium (with unpredictable effects on the variations in consensus sequences, which are often the ones entered in the phylogenetic analyses). The method is most appropriate for sequences that have a high degree of similarity. It is time-consuming because often all possible trees are examined before a consensus tree may be produced. The main programs for maximum parsimony analysis in the PHYLIP package (J. Felsenstein’s server: http://evolution.genetics.washington.edu/phylip.htlm) are DNAPARS, DNAPENNY (which limits the number of trees searched), DNAMOVE, DNACOMP (useful when the rate of evolution varies among sites), and other programs, some of which consider only transversion mutations. PROTPARS analyzes protein sequences and does not score silent (synonymous) mutations. Other available programs include PAUP (phylogenetic analysis using parsimony) (D. Swofford’s server: http://paup.csit.fsu.edu), MacClade, MESQUITE, and EMBOSS, among others.

Distance methods are based on the degree of difference (genetic distance) among pairs of sequences of a multiple sequence alignment, which is converted into a distance matrix. They are commonly used in molecular biology and can handle a large number of sequences.
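The conversion of an alignment into a distance matrix can be illustrated with a short sketch. It is not taken from PHYLIP, MEGA, or any other package named in this chapter; the function names are hypothetical, and the choice to skip gapped or ambiguous positions is only one of several possible conventions. It computes the uncorrected proportion of differing sites (p-distance) and the Kimura two-parameter distance, which corrects for multiple substitutions at the same site by treating transitions and transversions separately.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (compared_sites, transitions, transversions) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: positions with gaps or ambiguous bases are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1    # purine <-> pyrimidine
    return sites, transitions, transversions


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of compared sites that differ."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    if sites == 0:
        raise ValueError("no comparable sites")
    return (ts + tv) / sites


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = ts / sites, tv / sites
    # math.log raises ValueError when the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def distance_matrix(aligned, metric=k2p_distance):
    """Map every ordered pair of sequence names to its distance."""
    names = list(aligned)
    return {(x, y): metric(aligned[x], aligned[y]) for x in names for y in names}


if __name__ == "__main__":
    alignment = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAT", "s3": "ACGAACGTAT"}
    for pair, dist in sorted(distance_matrix(alignment).items()):
        print(pair, round(dist, 4))
```

For closely related sequences the two measures are nearly identical; the correction grows as the proportion of differing sites increases.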
As the genetic distance increases, a correction for multiple-step mutations should be applied, and different correction methods are currently available (e.g., Kimura two-parameter distance [18]). Distance methods are suitable when branch lengths vary, and in most distance methods the results are not significantly altered when a molecular clock does not operate. Several distance methods are included in different software packages, for instance PHYLIP and MEGA 2 (or its recent upgrade MEGA 3 [19]). Among them, the neighbor-joining (NJ) method does not assume a molecular clock and produces an unrooted tree, while the unweighted pair group method with arithmetic mean (UPGMA) assumes a molecular clock and produces a rooted tree. The least squares method can be used either without assuming a molecular clock (FITCH program) or with the assumption of an evolutionary clock (KITCH program).

Bayesian statistics (based on conditional probabilities derived by Bayes’ rule) have been used to provide best estimates of evolutionary distances between two nucleotide sequences [20, 21]. Computationally, Bayesian methods are more practical than maximum likelihood methods.

Maximum likelihood methods use probability calculations to derive a branching pattern from the mutations found at different positions of the nucleic acids under comparison. They can be used to estimate both distances and the best mutational pathway between sequences, and are appropriate when sequences are diverse, and
they have been used to analyze mutations in overlapping reading frames of viral genomes. They are computationally very complex because all possible trees are considered and, thus, the methods are limited to small numbers of sequences unless supercomputers are used. Maximum likelihood methods are included in the program PAML and in several programs of the PHYLIP package. Recently, the program TREE-PUZZLE [22] has been developed; it allows maximum likelihood analyses to be performed on personal computers without requiring long computing times.

A number of problems must be carefully evaluated prior to the application of phylogenetic methods to any biological problem, and assumptions must be recognized in relation to the aims of the study [23]. As mentioned above, one problem is the treatment of gaps due to insertions or deletions in sequence alignments. Some programs ignore gaps while others treat them as substitutions. The difficulty arises with divergent viral sequences but not with closely related genomes such as those in viral quasispecies (Section 17.4). Some phylogenetic analyses assume that the rates of evolution at different tree branches are the same (the molecular clock operates). As indicated above, this assumption is questionable for viruses, but it allows prediction of the root of a tree (for different points of view on the operation of a molecular clock during virus evolution, see Refs. [24–26]). An unrooted tree depicts the relationships among sequences but does not provide information on a possible common ancestor of the group.

In considering the choice of a phylogenetic method, when sequences show strong similarity (as in viral quasispecies), parsimony or maximum likelihood methods are preferred. With limited sequence similarity, distance methods are appropriate, although maximum likelihood methods can be used to analyze regions of localized similarity.
When sequence variation is high, alternative multiple alignment methods can be tried (for example, global, progressive or iterative programs, or local alignment of protein motifs). It is always advisable to use two different phylogenetic methods to analyze the same set of sequences, and to examine whether the tree topologies obtained are equivalent. In doing this, it must be taken into consideration whether the phylogenetic analysis assumes the operation of the molecular clock. In addition, a bootstrap resampling of data is recommended to assess the statistical robustness of the trees obtained by any phylogenetic method. It usually involves 100 to 1000 data sets and gives confidence values to each branching point, those with bootstrap coefficients higher than 0.8 (on a scale of 0 to 1) being statistically supported [27].

M. Eigen, A. Dress and colleagues developed a method based on statistical geometry as a means of obtaining information (at the level of nucleic acids or proteins) on the common ancestor of a set of sequences [28–30]. A common branching point of a set of sequences is confined within a portion of sequence space (a concept described in Section 17.4), and distances for the different known sequences are calculated. This method has been applied to very different analyses, such as the dating of ancestral tRNA sequences, the divergence of eukaryotes through comparison of cytochrome c sequences, and defining relationships within highly variable viruses such as influenza virus or human immunodeficiency virus (overview in Ref. [31]). Some computer programs have been developed to identify natural groupings of closely related sequences, and they have found an application to the analysis of viral quasispecies. These programs are described in Section 17.4.

An important application of virogenomics is the identification of functional domains in the viral nucleic acids and encoded proteins (for example, see Ref. [32]). These include a large number of cis-acting and trans-acting functional motifs. For brevity only a few are mentioned here: origins of replication, promoters, transcription termination signals, intergenic regions, splice sites, polyadenylation signals, ribosome-binding sites, protein-binding domains, enzymatic active sites (polymerases, proteases, kinases, etc.), types of protein domains (coiled coils, etc.), nuclear localization or nuclear export signals, and many others under study in molecular virology because they play key roles in the life cycle of viruses. These searches should take into account the multifunctionality of many viral proteins as well as the fact that many viral protein precursors can have functions other than (or additional to) the functions of the corresponding mature, processed proteins. Once identified, these specific domains may guide the identification of complete regulatory regions and entire proteins or protein precursors with a predicted function. The sequences can then be compared with those of related and unrelated taxonomic groups of viruses included in the data banks.

Many viruses still remain unclassified for various reasons, and new groups are continuously being approved (see successive editions of the report of the International Committee on Taxonomy of Viruses, e.g., seventh report [12]; at the time of this writing, the eighth report is in press).
Therefore, virus sequence data banks may help in defining new taxonomic groups and in assigning unclassified viruses to existing groups. Incoherent grouping of regulatory regions and functional proteins may indicate ancient or recent recombination events. Even with the limited sequence data sets that were available more than a decade ago, sequence comparisons revealed with reasonable certainty that some animal viruses originated by recombination between two different parental viruses. For example, western equine encephalitis virus was probably generated as a result of a recombination event involving a virus related to eastern equine encephalitis virus and some New World relative of Sindbis virus [33, 34]. Recent recombination events are playing a crucial role in HIV-1 diversification in the human population [35]. Recombination may produce high-fitness genomes from two debilitated parents or may generate new combinations of genomic sequences with potential evolutionary novelty. As additional sequences enter the virus data sets, statistical procedures may unveil additional historical and recent recombination events, thereby contributing to clarification of the natural history of viral pathogens. Programs to identify putative recombinant sequences include SimPlot (http://sray.med.som.jhmi.edu/RaySoft/SimPlot/; examples of application in Refs. [36, 37]) and LARD [38, 39].

Information on the types of evolutionary forces that have acted in the diversification of viral genomes can be obtained by analyzing the types of mutations that distinguish a set of homologous sequences (number of transitions versus transversions; presence of insertions and deletions; number of nonsynonymous versus synonymous or silent substitutions). Many viral polymerases tend to introduce transitions at a much higher frequency than transversions during template copying [40, 41]. Thus, the proportion of transversion mutations may increase with the evolutionary distance between the sequences under comparison. A ratio of nonsynonymous mutations per nonsynonymous site (dn) to synonymous mutations per synonymous site (ds) greater than 1 (dn > ds) may indicate positive selection in the diversification of the sequences under comparison. However, this assumes that synonymous mutations have a higher probability than nonsynonymous mutations of being selectively neutral. Yet multiple lines of evidence indicate that coding regions in RNA viruses may serve functions other than protein coding (cis-acting regulatory elements, structural roles affected by third base residues in codons, etc.). An additional cautionary note was sounded by Wain-Hobson and colleagues when they documented evidence of positive selection (dn > ds) in an experimental setting in which positive selection was impossible [42]. With these limitations in mind, the reader can quantitate mutation types and calculate dn and ds with programs such as SNAP (http://hcv.lanl.gov/content/hcv-db/SNAP/SNAP.html; examples in Refs. [39, 43]) and K-estimator [44] (http://www.biology.viowa.edu/comeron/page3.html; examples in Refs. [37, 45]).

Comparison of regulatory regions and proteins that belong to viral genomes and are also found in cellular genomes can provide information on the possible sharing of regulatory elements and functional modules. Similarities and differences between functional elements of viruses and different phyla of cellular organisms are relevant to the origin of viruses, to coevolutionary processes between cells and viruses, and to defining lateral (also called horizontal) gene transfer events in general evolution.
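The mutation types discussed above (transitions versus transversions, synonymous versus nonsynonymous substitutions) can be tallied with a few lines of code. The sketch below is a simplified illustration and not the algorithm of SNAP or K-estimator: it returns raw counts between two aligned, in-frame coding sequences and does not normalize by the numbers of synonymous and nonsynonymous sites, so it does not yield dn or ds. The function name is hypothetical, and codons that differ at more than one position are left unclassified because the order of the changes is ambiguous.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = {"A", "G"}


def classify_substitutions(cds_a, cds_b):
    """Count mutation types between two aligned, in-frame coding sequences."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    counts = {
        "transitions": 0,
        "transversions": 0,
        "synonymous": 0,
        "nonsynonymous": 0,
        "unclassified_codons": 0,
    }
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            # Gaps or ambiguous bases: not scored in this sketch.
            counts["unclassified_codons"] += 1
            continue
        diffs = [(a, b) for a, b in zip(codon_a, codon_b) if a != b]
        for a, b in diffs:
            # Same chemical class (both purines or both pyrimidines) is a transition.
            if (a in PURINES) == (b in PURINES):
                counts["transitions"] += 1
            else:
                counts["transversions"] += 1
        if len(diffs) > 1:
            # The mutational pathway between the codons is ambiguous.
            counts["unclassified_codons"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


if __name__ == "__main__":
    print(classify_substitutions("ATGGCTAAA", "ATGGCCAGA"))
```

In the usage example, GCT to GCC is a synonymous transition (both codons encode alanine) and AAA to AGA is a nonsynonymous transition (lysine to arginine).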
Different theories on the origin of viruses have been proposed (reviewed in Ref. [46]). They include (a) viruses as descendants of primitive RNA or RNA-like replicons that preceded formation of the first cells; (b) viruses as the result of regressive evolution of complex, cell-like microbial forms; (c) viruses as descendants of cellular DNA or RNA, or of subcellular organelles; and (d) viruses as ancient autonomous, cell-dependent genetic elements that originated simultaneously with cellular organizations and have coevolved with them. The comparative genomics of viruses and of cells from organisms belonging to different kingdoms may eventually allow coevolutionary pathways to be defined, and may further support or modify current evidence that viruses have survived as mediators of lateral gene transfer and have served as a selective force to promote cellular evolution [46–48].

Defining viral disease emergence and reemergence is not a simple task since several interwoven factors participate in a highly unpredictable fashion (reviews in Refs. [49–52]). In addition to multiple sociological and ecological factors, the adaptive potential and dynamics of viruses play an essential role in viral disease emergence. Surveillance of human and animal viruses with zoonotic potential [53] is regarded as essential for early detection of viral pathogens. As soon as viral nucleotide sequences associated with a disease emergence are determined, comparisons with sequences from the data banks may point to the reemergence of a
known viral pathogen, or identify the emergence of a virus that previously was known to infect unrelated hosts, or even suggest the presence of an entirely new (previously unidentified) viral entity. Thus, data banks for viral genomic sequences will find broad and important applications.

17.4 Beyond Reference Strains: Towards a Second-Generation Virogenomics?

As part of the organization of the increasing number of viral genomic nucleotide sequences, in most data banks one or a few reference sequences are chosen to represent a virus species. Other, related sequences may become “genome neighbors” of the reference sequence [10]. Therefore the basis for comparative genomics that we have outlined in Section 17.3 will imply a set of reference sequences, on the understanding that “neighbor” sequences cannot obscure the main findings based on phylogenetic relationships or searches for domains, motifs, open reading frames, and so forth. Most relevant for evolutionary studies, reference viral proteins are appropriate for incorporation in clusters of related viral proteins that have been constructed or are in preparation (e.g., http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/vog.html) [10].

Although this approach is appropriate for several purposes related to general biological evolution, a point that is often ignored is the extreme biological relevance of minimal genetic changes in virus biology. A wealth of evidence (for reviews, see Refs. [2, 4, 6]) indicates that one or a few nucleotide substitutions may be sufficient to produce a relevant biological alteration in the virus. A phylogenetic tree in which the tips of the branches are mutant clouds rather than defined sequences illustrates this point (Fig. 17.1). This is particularly true for RNA viruses (or viruses which include an RNA step in their replication cycle). During their replication, RNA viruses mutate at average rates of 10⁻⁴ to 10⁻⁵ mutations per nucleotide copied [54]. These rates per nucleotide site are about 10⁵-fold higher than the rates operating normally during the replication of cellular DNA. This constant mutational input (between 0.1 and 1 mutations per genome are produced every time a template RNA is copied into a daughter RNA or DNA strand) gives rise to highly dynamic mutant distributions termed viral quasispecies.
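The figure of 0.1 to 1 mutations per genome follows from multiplying the per-nucleotide rate (10⁻⁴ to 10⁻⁵ mutations per nucleotide copied) by the genome length. The short calculation below makes this explicit. The genome length of 10,000 nucleotides is an assumed round value typical of many RNA viruses, not a figure given in the text, and the fraction of mutated copies additionally assumes that mutations arise independently (a Poisson approximation).

```python
import math


def expected_mutations_per_genome(rate_per_nt, genome_length):
    """Mean number of mutations introduced in one round of copying."""
    return rate_per_nt * genome_length


def fraction_mutated_copies(rate_per_nt, genome_length):
    """Probability that a copy carries at least one mutation.

    Assumes independent mutations (Poisson approximation).
    """
    mean = expected_mutations_per_genome(rate_per_nt, genome_length)
    return 1.0 - math.exp(-mean)


if __name__ == "__main__":
    GENOME_LENGTH = 10_000  # assumed, illustrative genome size in nucleotides
    for rate in (1e-4, 1e-5):
        mean = expected_mutations_per_genome(rate, GENOME_LENGTH)
        mutated = fraction_mutated_copies(rate, GENOME_LENGTH)
        print(f"rate={rate:g}: {mean:.2f} mutations/genome, "
              f"{mutated:.1%} of copies mutated")
```

Under these assumptions, a substantial fraction of progeny genomes differ from their template at every round of copying, which is the basis of the mutant spectra discussed in this section.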
The quasispecies concept was developed on a theoretical basis by M. Eigen, P. Schuster, and their colleagues to describe primitive RNA (or RNA-like) replicons at the onset of life on Earth [31, 55–57]. Determination of the quantitative parameters that underlie the Darwinian principles of genetic variation, competition, and selection has been achieved with model experiments on replication of simple RNA templates in vitro [58, 59]. Quasispecies theory has been instrumental to understanding the adaptive dynamics of RNA viruses. Presently, virologists use an extended definition of “quasispecies” to describe dynamic distributions of nonidentical but closely related mutant and recombinant viral genomes subjected to a continuous process of genetic variation, competition, and selection, and which act as a unit of selection [60]. This general definition captures theoretical developments that have
extended quasispecies to nonequilibrium conditions and regards mutation and recombination as sources of genetic variation (recent reviews of quasispecies theory and its implications for virology are found in Refs. [1, 2, 6, 61]). Genetic variation, together with environmental heterogeneity and bottleneck events, underlies the diversification of viruses within hosts and between hosts.

Sequence space, a concept derived from information theory, describes all possible nucleotide sequences available to a genetic system. Virus adaptation can be viewed as the result of movements in sequence space to reach points of replicative competence [31, 56]. Connectivity between points of the sequence space is key to the adaptive potential of a genetic system. The main reason why a virus generally maintains its biological identity in terms of pathogenic potential, host range, etc., is the operation of structural and functional constraints that have been well documented for several viruses [62]. However, occasional deviations from prototypic behavior occur, and such deviations may be relevant to disease manifestations and disease emergence [63, 51].

In reality, all known viruses consist of multitudes of related sequences. Viral quasispecies are typical not only of RNA viruses but also of some DNA viruses [64–66]. A relevant consequence of quasispecies is that the behavior of an individual genome, with a defined nucleotide sequence, may be conditioned by the mutant cloud that surrounds it. This was initially proposed as a result of in silico experiments as a derivation of quasispecies theory (reviewed in Ref. [56]), and confirmed by several observations with viruses in cell culture and in vivo [67–71]. A current working model is that components of the mutant spectra expressing suboptimal viral functions may act collectively as dominant-negative mutants and, therefore, interfere with the replication of fitter genomes within the same replicative ensemble. A concerted action of dominant-negative mutants provides a biochemical basis for the modulating effects of mutant spectra predicted by quasispecies theory [56]. This model is finding increasing experimental support [71–73].

Fig. 17.1 A rooted phylogenetic tree with branches of different lengths that define related viral genomes. Each virus (tip of a branch) includes in reality a cloud of mutants that despite showing close relatedness may nevertheless express widely different biological properties. On the right are listed a number of variant phenotypes frequently seen in viruses.

How are we to reconcile the reality of the population structure of many important viruses (which dictates that relevant biological traits may depend on quantitatively minor changes in the genome) with the general principles of data banks centered around “reference sequences”? Does it make sense to attempt to organize the “neighbor” sequences into new data sets or subsections of the more general data sets? For what purpose, and how? These are questions without an easy answer.

A computer program named Partition Analysis of Quasispecies (PAQ) was developed to identify natural groupings of nucleotide or amino acid sequences that are very similar, as found in mutant spectra of viral quasispecies. It is a nonhierarchical clustering method that partitions sequences into spherical groups, allowing overlapping groups to occur [74] (program files at http://www.vetmed.iastate.edu/faculty_staff/Users/carplab/PAQ/main.html). The program assumes that the least distant sequences should be grouped together. A radius is selected and each sequence is used as a center to define spherical clusters. One relevant output of the program is compactness, which characterizes the number of variants surrounding the center of the cluster. Increasingly smaller radii can be selected to search for subgroups. This and other clustering techniques are less dependent on evolutionary models than the phylogenetic methods described in Section 17.3.
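The idea of radius-based, possibly overlapping groups can be illustrated with a minimal sketch. This is not the PAQ program and does not reproduce its algorithm or its definition of compactness; it only shows the principle described above, using the Hamming distance between aligned sequences and, as a simplifying assumption, taking the number of sequences inside a sphere as a stand-in for compactness. All function names are hypothetical.

```python
def hamming(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def spherical_groups(sequences, radius):
    """Use every sequence as a center and collect the sequences within `radius`.

    Groups may overlap: one sequence can fall inside several spheres.
    """
    groups = {}
    for center_name, center_seq in sequences.items():
        groups[center_name] = sorted(
            name
            for name, seq in sequences.items()
            if hamming(center_seq, seq) <= radius
        )
    return groups


def most_populated_center(groups):
    """Center whose sphere holds the most sequences (ties broken by name)."""
    return max(sorted(groups), key=lambda name: len(groups[name]))


if __name__ == "__main__":
    spectrum = {
        "m1": "ACGTACGT",
        "m2": "ACGTACGA",
        "m3": "ACGTACCA",
        "x1": "TTTTACGT",
        "x2": "TTTTACGA",
    }
    groups = spherical_groups(spectrum, radius=1)
    for center, members in groups.items():
        print(center, members)
    print("most populated sphere is centered on", most_populated_center(groups))
```

Repeating the call with a smaller radius corresponds to the search for subgroups mentioned in the text.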
PAQ has been applied to the analysis of envelope sequences of human immunodeficiency virus isolated from different brain regions of infected patients, and to rev sequences of a sample of equine infectious anemia virus [74].

There are several reasons for wanting minority genomes to enter the data banks. An obvious one is that modifications of the host range may relate to minor genetic change independent of the phylogenetic position of the virus. This is more likely to occur with RNA viruses because of their generally high mutation rates and their quasispecies behavior, which results in adaptability through point mutations [75]. A second reason is the presence of memory genomes in viral quasispecies, documented in cell culture [76–79] and in vivo [80]. Memory genomes are a subset of the genomes found in mutant spectra of viral quasispecies that reflect those genomes that were dominant at an early stage of the evolution of the same viral lineage. Since memory genomes often represent variants with interesting phenotypic traits (at variance with the traits of the dominant populations in which they are immersed), their detection and recording in data banks is highly relevant. Microarray technology suitable for detecting memory genomes in viral quasispecies is summarized in Section 17.5.

Our proposal is that understanding of the complexity of viral populations and the relationships between limited genetic change and modification of phenotypic traits would benefit from second-generation data banks in which mutant viruses
can be organized around their respective standard sequences as in the PAQ program [74]. The concept is parallel to cataloguing single nucleotide polymorphisms of the human genome, particularly in relation to genetic disease.

Implementation of “second-generation” data banks with point mutations of viral genomes would not, however, be without difficulties. One problem derives from the complex relationships between genotype and phenotype. A current example may serve to illustrate this point. The catalogue of mutations in the HIV-1 genome that are associated with decreased sensitivity to antiretroviral inhibitors (the most frequent being reverse transcriptase and protease inhibitors) has steadily increased with time. It is now evident that combinations of mutations, in a sequence-context-dependent manner, may produce different degrees of resistance to one or several inhibitors. For this reason, a data bank relating point mutations to phenotypes of viruses (such as http://hivdb.stanford.edu/ or other data banks listed in Table 17.1) will have to be periodically updated and will unavoidably suffer from some degree of uncertainty. A second problem may be still more severe: quasispecies behavior may depend on ensembles of mutants (those that constitute mutant spectra), as described previously. Such behavior is often difficult to predict despite knowledge about the types of individual genomes that dominate the mutant spectra. It seems clear that considerable scientific progress will be needed before complex interactions among individual viral genomes that produce a range of phenotypic traits can be incorporated into data sets.
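To make the proposal concrete, the sketch below shows one way in which mutant genomes could be stored relative to a standard sequence. It is purely hypothetical: the class names, the fields (frequency in the mutant spectrum, free-text phenotype), and the example values are illustrative assumptions and do not describe any existing data bank. Storing whole sets of mutations per variant, rather than isolated point mutations, reflects the observation above that phenotypes may depend on combinations of mutations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PointMutation:
    position: int   # 1-based position in the reference sequence
    ref_base: str
    alt_base: str


@dataclass
class VariantRecord:
    mutations: tuple        # PointMutation objects relative to the reference
    frequency: float        # hypothetical: fraction of the mutant spectrum
    phenotype: str = ""     # hypothetical: free-text phenotypic annotation


@dataclass
class ReferenceEntry:
    name: str
    sequence: str
    variants: list = field(default_factory=list)

    def variant_sequence(self, variant):
        """Rebuild the full sequence of a variant from the reference."""
        bases = list(self.sequence)
        for m in variant.mutations:
            if bases[m.position - 1] != m.ref_base:
                raise ValueError(f"reference mismatch at position {m.position}")
            bases[m.position - 1] = m.alt_base
        return "".join(bases)

    def variants_with(self, mutation):
        """All recorded variants that carry a given point mutation."""
        return [v for v in self.variants if mutation in v.mutations]


if __name__ == "__main__":
    g3a = PointMutation(3, "G", "A")
    t8c = PointMutation(8, "T", "C")
    entry = ReferenceEntry("illustrative reference", "ACGTACGT")
    entry.variants.append(VariantRecord((g3a,), 0.05, "example annotation"))
    entry.variants.append(VariantRecord((g3a, t8c), 0.01))
    for variant in entry.variants_with(g3a):
        print(entry.variant_sequence(variant), variant.frequency)
```

A structure of this kind captures individual genomes only; it does not address the second problem noted above, namely that quasispecies behavior may depend on the mutant ensemble as a whole.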

17.5 Virogenomics Through Microarrays

Virogenomics also benefits from recent technological developments in molecular biology and biotechnology. During the last decades, spectacular advances have occurred in nucleic acid and protein hybridization techniques, together with innovations in chemistry and nanotechnology that allow the immobilization of different kinds of molecules onto surfaces. This has permitted the development of a number of biosensors. Among them, rapid advances have been made in the so-called “microarrays” or “biochips,” in which thousands of probe molecules (nucleic acids, proteins, or others) are covalently linked to a solid substrate (generally glass), in arrays of points of about 100 µm in diameter. Target DNA or protein molecules present in a natural sample are usually detected after fluorescent labeling of the sample, which allows the point of interaction on the array to be identified by means of a high-resolution scanner.

Protein microarrays can be used to study different kinds of protein–protein or protein–ligand interactions. Among the most specific and sensitive bioaffinity-based protein microarrays are those that use antibodies as capture probes to detect the presence of an antigen in a sample [81]. In the field of virology, these technologies are currently used as a diagnostic tool to characterize the presence of antigenic viral proteins [82] as well as to monitor the antiviral antibody responses of the host organism [83].

DNA microarrays, in turn, have been extensively developed and have found several applications, including: (a) the study of gene expression profiles in organisms (e.g., of genes altered as a result of a pathogenic state or of infection); (b) the genotyping and detection of mutations or polymorphisms in cellular or viral genes; (c) the sequencing of genomes; (d) the detection of microorganisms in a natural sample; and (e) the study of intramolecular interactions in nucleic acids (reviewed in Ref. [84]).

The use of DNA microarray technology to genotype viral genomes is of special interest for virogenomics. Microarrays of this kind are constructed by immobilizing thousands of single-stranded DNA (ssDNA) oligonucleotide probes, with a reactive group added at their 5′ ends to allow covalent binding of the probes to the chemically modified glass. The length of the ssDNA probes is generally 11–20 nucleotides, and their sequences are chosen (after careful examination of the viral sequences available in data banks) to allow detection of mutations and single nucleotide polymorphisms in the genomes under study. After the extraction of the viral genome and its amplification (via PCR, RT-PCR, or other techniques), target sequences are fluorescently labeled and hybridized to the microarray, which is then washed under controlled conditions. The detection of fluorescence at one point of the array means that the genome harbors the sequence complementary to that of the oligonucleotide at the corresponding position, while molecules that are not perfectly hybridized, even those with only a single mismatch, are washed away [84–86]. This technology allows the characterization of a large number of mutations and single nucleotide polymorphisms in a single hybridization experiment, as has been documented with different viruses [87, 88].
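The logic of genotyping with allele-specific oligonucleotide probes can be reduced to an idealized sketch. It assumes perfect-match hybridization only, that is, a probe gives a signal if and only if the single-stranded target contains its exact reverse complement, and it ignores hybridization kinetics, washing stringency, and fluorescence intensities. The sequences, probe names, and function names are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizing_probes(target, probes):
    """Names of probes with a perfectly complementary site in the target.

    Idealization: a single mismatch abolishes the signal entirely.
    """
    target = target.upper()
    return [
        name
        for name, probe in probes.items()
        if reverse_complement(probe) in target
    ]


if __name__ == "__main__":
    # Two invented 15-nt alleles that differ by a single nucleotide.
    wild_type_site = "ATGGCTAGCAAGGAG"
    mutant_site = "ATGGCTAACAAGGAG"
    probes = {
        "wild_type": reverse_complement(wild_type_site),
        "mutant": reverse_complement(mutant_site),
    }
    sample = "CCC" + mutant_site + "TTT"
    print(hybridizing_probes(sample, probes))
```

In a real experiment a mixed population would give signals of different intensity at both positions, which is what permits the detection of minority genomes.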
In the analysis of complex populations such as viral quasispecies, an outstanding advantage of DNA microarray technology over conventional genotypic methods is its ability to detect minority genomes within the population (such as memory genomes, described in Section 17.4). Conventional genotypic techniques (based on consensus sequencing of the viral population or on its hybridization to nitrocellulose strips), like phenotypic analyses, are unable to detect a minority population of genomes in a complex mixture if it represents less than 20–30% of the total [89, 90]. In contrast, DNA microarrays allow the detection of minority genomes in a complex mixture at frequencies as low as 1% [91], or even 0.1% when microarray technology is combined with mass spectrometry [92, 93]. Other examples of the use of DNA microarrays in virology include the typing and subtyping of influenza viruses [94], rotaviruses [95], or other viruses [96], the detection of molecular recombination in polioviruses [97], and the determination of genomic RNA structure in HIV-1 and hepatitis C virus genomes [98, 99].

Therefore, microarray-based technologies constitute one of the most promising tools in the investigation of the wide scope of virogenomics. A transdisciplinary integration of knowledge from different fields of research has turned virogenomics into an exciting area from which to penetrate the immense world of viruses, their origins, and their behavior.

References

1 Domingo, E. 2005. Microbial evolution and emerging diseases. In: Emerging Neurological Infections. C. Power and R. T. Johnson, eds. Boca Raton, Taylor and Francis, p. 1–34.
2 Domingo, E., ed. 2005. Viruses as Quasispecies: Biological Implications. Curr. Top. Microbiol. Immunol., Vol. 299.
3 Knipe, D. M. and P. M. Howley, eds. 2001. Fields Virology. Philadelphia, Lippincott Williams and Wilkins.
4 Flint, S. J., L. W. Enquist, V. R. Racaniello and A. M. Skalka. 2004. Principles of Virology: Molecular Biology, Pathogenesis, and Control of Animal Viruses. Washington, DC, ASM Press.
5 Domingo, E., R. G. Webster and J. J. Holland, eds. 1999. Origin and Evolution of Viruses. San Diego, Academic Press.
6 Domingo, E., C. Biebricher, M. Eigen and J. J. Holland. 2001. Quasispecies and RNA Virus Evolution: Principles and Consequences. Austin, Landes Bioscience.
7 Mount, D. W. 2004. Bioinformatics. Sequence and Genome Analysis. Cold Spring Harbor, New York, Cold Spring Harbor Laboratory Press.
8 Lander, E. S. 1996. The new genomics: global views of biology. Science 274:536–539.
9 Woese, C. R. 2004. A new biology for a new century. Microbiol. Mol. Biol. Rev. 68:173–186.
10 Bao, Y., S. Federhen, D. Leipe, V. Pham, S. Resenchuk, M. Rozanov, R. Tatusov and T. Tatusova. 2004. National center for biotechnology information viral genomes project. J. Virol. 78:7291–7298.
11 Wheeler, D. L., et al. 2003. Database resources of the National Center for Biotechnology. Nucleic Acids Res. 31:28–33.
12 van Regenmortel, M. H. V., C. M. Fauquet, D. H. L. Bishop, et al., eds. 2000. Virus Taxonomy. Seventh Report of the International Committee on Taxonomy of Viruses. San Diego, Academic Press.
13 Gesteland, R. F., T. R. Cech and J. F. Atkins, eds. 1999. The RNA World. Cold Spring Harbor, New York, Cold Spring Harbor Laboratory Press.
14 Alba, M. M., D. Lee, F. M. Pearl, A. J. Shepherd, N. Martin, C. A. Orengo and P. Kellam. 2001. VIDA: a virus database system for the organization of animal virus genome open reading frames. Nucleic Acids Res. 29:133–136.
15 Feng, D. F. and R. F. Doolittle. 1996. Progressive alignment of amino acid sequences and construction of phylogenetic trees from them. Methods Enzymol. 266:368–382.
16 Thompson, J. D., D. G. Higgins and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
17 Saitou, N. 1996. Reconstruction of gene trees from sequence data. Methods Enzymol. 266:427–449.
18 Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
19 Kumar, S., K. Tamura and M. Nei. 2004. MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.
20 Zhu, J., J. S. Liu and C. E. Lawrence. 1998. Bayesian adaptive sequence alignment algorithms. Bioinformatics 14:25–39.
21 Huelsenbeck, J. P., F. Ronquist, R. Nielsen and J. P. Bollback. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294:2310–2314.
22 Schmidt, H. A., K. Strimmer, M. Vingron and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.
23 Doolittle, W. F. 1999. Phylogenetic classification and the universal tree. Science 284:2124–2129.
24 Jenkins, G. M., A. Rambaut, O. G. Pybus and E. C. Holmes. 2002. Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis. J. Mol. Evol. 54:156–165.
25 Posada, D. 2001. Unveiling the molecular clock in the presence of recombination. Mol. Biol. Evol. 18:1976–1978.
26 Liu, Y., D. C. Nickle, D. Shriner, M. A. Jensen, G. H. Learn, Jr., J. E. Mittler and J. I. Mullins. 2004. Molecular clock-like evolution of human immunodeficiency virus type 1. Virology 329:101–108.
27 Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.
28 Eigen, M. 1987. New concepts for dealing with the evolution of nucleic acids. Cold Spring Harb Symposia in Quantitative Biology 52:307–320.
29 Eigen, M., B. F. Lindemann, M. Tietze, R. Winkler-Oswatitsch, A. Dress and A. von Haeseler. 1989. How old is the genetic code? Statistical geometry of tRNA provides an answer. Science 244:673–679.
30 Dopazo, J., A. Dress and A. von Haeseler. 1993. Split decomposition: a technique to analyze viral evolution. Proc. Natl. Acad. Sci. U. S. A. 90:10320–10324.
31 Eigen, M. 1992. Steps towards life. Oxford University Press.
32 Gorbalenya, A. E. 2001. Big nidovirus genome. When count and order of domains matter. Adv. Exp. Med. Biol. 194:1–17.
33 Hahn, C. S., S. Lustig, E. G. Strauss and J. H. Strauss. 1988. Western equine encephalitis virus is a recombinant virus. Proc. Natl. Acad. Sci. U. S. A. 85:5997–6001.
34 Weaver, S. C., A. Hagenbaugh, L. A. Bellew, L. Gousset, V. Mallampalli, J. J. Holland and T. W. Scott. 1994. Evolution of alphaviruses in the eastern equine encephalomyelitis complex. J. Virol. 68:158–169.
35 Thomson, M. M., L. Perez-Alvarez and R. Najera. 2002. Molecular epidemiology of HIV-1 genetic forms and its significance for vaccine development and therapy. Lancet Infect. Dis. 2:461–471.
36 Lole, K. S., R. C. Bollinger, R. S. Paranjape, D. Gadkari, S. S. Kulkarni, N. G. Novak, R. Ingersoll, H. W. Sheppard and S. C. Ray. 1999. Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 73:152–160.
37 Colina, R., D. Casane, S. Vasquez, L. Garcia-Aguirre, A. Chunga, H. Romero, B. Khan and J. Cristina. 2004. Evidence of intratypic recombination in natural populations of hepatitis C virus. J. Gen. Virol. 85:31–37.
38 Holmes, E. C., M. Worobey and A. Rambaut. 1999. Phylogenetic evidence for recombination in dengue virus. Mol. Biol. Evol. 16:405–409.
39 Korber, B. 2000. HIV sequence signatures and similarities. In: Computational and Evolutionary Analysis of HIV Molecular Sequences. A. Rodrigo and G. H. Learn, Jr., eds. Dordrecht, Kluwer.
40 Domingo, E., D. Sabo, T. Taniguchi and C. Weissmann. 1978. Nucleotide sequence heterogeneity of an RNA phage population. Cell 13:735–744.
41 Kuge, S., N. Kawamura and A. Nomoto. 1989. Strong inclination toward transition mutation in nucleotide substitutions by poliovirus replicase. J. Mol. Biol. 207:175–182.
42 Vartanian, J. P., M. Henry and S. Wain-Hobson. 2001. Simulating pseudogene evolution in vitro: determining the true number of mutations in a lineage. Proc. Natl. Acad. Sci. U. S. A. 98:13172–13176.
43 Cristina, J., F. Lopez, G. Moratorio, L. Lopez, S. Vasquez, L. Garcia-Aguirre and A. Chunga. 2005. Hepatitis C virus F protein sequence reveals a lack of functional constraints and a variable pattern of amino acid substitution. J. Gen. Virol. 86:115–120.
44 Comeron, J. M. 1999. K-Estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15:763–764.
45 Costa-Mattioli, M., V. Fevr, D. Casane, R. Perez-Beroff, M. Coste-Burel, B.-M. Imbert-Marcille, E. C. M. Andr,

385

386

17 Genomics of Viruses C. Bressollette-Bodin, S. Billaudel and J. Cristina. 2003. Evidence of recombination in natural populations of hepatitis A virus. Virology 311:51–59. 46 Domingo, E. and J. J. Holland 2005. The origin and evolution of viruses. In: Topley and Wilson’s Microbiology and Microbial Infections. 1 Virology. B. W. J. Mahy and L. Collier, eds. London and New York, Arnold and Oxford University Press, in press. 47 Baranowski, E., C. M. Ruiz–Jarabo and E. Domingo. 2001. Evolution of cell recognition by viruses. Science 292:1102– 1105. 48 Bushman, F. 2002. Lateral DNA Transfer. Mechanisms and Consequences. Cold Spring Harbor, New York, Cold Spring Harbor Laboratory Press. 49 Morse, S. S., ed. 1993. Emerging Viruses. Oxford, Oxford University Press. 50 Morse, S. S., ed. 1994. The Evolutionary Biology of Viruses. New York, Raven Press. 51 Smolinski, M. S., M. A. Hamburg and J. Lederberg, eds. 2003. Microbial Threats to Health. Emergence, Detection and Response. Washington DC, The National Academies Press. 52 Domingo, E. 2003. Host–microbe interactions: viruses. Complexities of virus– cell interactions. Curr. Opin. Microbiol. 6:383–385. 53 Krauss, H., A. Weber, M. Appel, B. Enders, H. D. Isenberg, H. G. Schiefer, W. Sleczka, v. Graevenitz and H. Zahner. 2003. Zoonoses. Infectious diseases transmissible from animals to humans. Washington DC, ASM Press. 54 Drake, J. W. and J. J. Holland. 1999. Mutation rates among RNA viruses. Proc. Natl. Acad. Sci. U. S. A. 96:13910– 13913. 55 Eigen, M. and P. Schuster. 1979. The hypercycle. A principle of natural selforganization. Berlin, Springer. 56 Eigen, M. and C. K. Biebricher 1988. Sequence space and quasispecies distribution. In: RNA Genetics, Vol. 3. E. Domingo, P. Ahlquist and J. J. Holland, eds. Boca Raton, CRC Press. p. 211–245.

57 Schuster, P. and P. F. Stadler 1999.

58

59

60

61

62

63

64

65

66

67

Nature and evolution of early replicons. In: Origin and Evolution of Viruses. E. Domingo, R. G. Webster and J. J. Holland, eds. San Diego, Academic Press, pp 1–24. Biebricher, C. K. 1999. Mutation, competition and selection as measured with small RNA molecules. In: Origin and Evolution of Viruses. E. Domingo, R. G. Webster and J. J. Holland, eds. San Diego, Academic Press, pp 65–85. Rohde, N., H. Daum and C. K. Biebricher. 1995. The mutant distribution of an RNA species replicated by Qb replicase. J. Mol. Biol. 249:754–762. Domingo, E. 1999. Quasispecies. In: Encyclopedia of Virology. A. Granoff and R. G. Webster, eds. London, Academic Press, pp 1431–1436. Eigen, M. 2000. Natural selection: a phase transition? Biophys. Chem. 85:101–123. Simmonds, P. and D. B. Smith. 1999. Structural constraints on RNA virus evolution. J. Virol. 73:5787–5794. Holland, J. J., J. C. de La Torre and D. A. Steinhauer. 1992. RNA virus populations as quasispecies. Curr. Top. Microbiol. Immunol. 176:1–20. Isnard, M., M. Granier, R. Frutos, B. Reynaud and M. Peterschmitt. 1998. Quasispecies nature of three maize streak virus isolates obtained through different modes of selection from a population used to assess response to infection of maize cultivars. J. Gen. Virol. 79:3091–3099. Lpez-Bueno, A., M. G. Mateu and J. M. Almendral. 2003. High mutant frequency in populations of a DNA virus allows evasion from antibody therapy in an immunodeficient host. J. Virol. 77:2701–2708. Peng, X., A. Kessler, H. Phan, R. A. Garrett and D. Prangishvili. 2004. Multiple variants of the archaeal DNA rudivirus SIRV1 in a single host and a novel mechanism of genomic variation. Mol. Microbiol. 54:366–375. Borrego, B., I. S. Novella, E. Giralt, D. Andreu and E. Domingo. 1993. Distinct repertoire of antigenic variants of foot-and-mouth disease virus in the

References presence or absence of immune selection. J. Virol. 67:6071–6079. 68 de la Torre, J. C. and J. J. Holland. 1990. RNA virus quasispecies populations can suppress vastly superior mutant progeny. J. Virol. 64:6278–6281. 69 Chumakov, K. M., L. B. Powers, K. E. Noonan, I. B. Roninson and I. S. Levenbook. 1991. Correlation between amount of virus with altered nucleotide sequence and the monkey test for acceptability of oral poliovirus vaccine. Proc. Natl. Acad. Sci. U. S. A. 88:199– 203. 70 Teng, M. N., M. B. Oldstone and J. C. de la Torre. 1996. Suppression of lymphocytic choriomeningitis virus-induced growth hormone deficiency syndrome by disease-negative virus variants. Virology 223:113–119. 71 Gonzlez-Lpez, C., A. Arias, N. Pariente, G. Gmez-Mariano and E. Domingo. 2004. Preextinction viral RNA can interfere with infectivity. J. Virol. 78:3319–3324. 72 Grande-Prez, A., E. Lzaro, P. Lowenstein, E. Domingo and S. C. Manrubia. 2005. Suppression of viral infectivity through lethal defection. Proc. Natl. Acad. Sci. U.S.A. 102:4448–4452. 73 Grande-Prez, A., G. Gmez-Mariano, P. Lowenstein and E. Domingo. 2005. Mutagenesis-induced, large fitness variations with an invariant consensus genomic nucleotide sequence. J. Virol. 79:10451–10459. 74 Baccam, P., R. J. Thompson, O. Fedrigo, S. Carpenter and J. L. Cornette. 2001. PAQ: partition analysis of quasispecies. Bioinformatics 17:16–22. 75 Baranowski, E., C. M. Ruz-Jarabo, N. Pariente, N. Verdaguer and E. Domingo. 2003. Evolution of cell recognition by viruses: a source of biological novelty with medical implications. Adv. Virus Res. 62:19–111. 76 Ruiz–Jarabo, C. M., A. Arias, E. Baranowski, C. Escarms and E. Domingo. 2000. Memory in viral quasispecies. J. Virol. 74:3543–3547. 77 Ruiz–Jarabo, C. M., A. Arias, C. MolinaPars, C. Briones, E. Baranowski, C. Escarms and E. Domingo. 2002. Duration and fitness dependence of quasis-

78

79

80

81

82

83

84

85

86

87

88

pecies memory. J. Mol. Biol. 315:285– 296. Ruiz-Jarabo, C. M., E. Miller, G. GmezMariano and E. Domingo. 2003. Synchronous loss of quasispecies memory in parallel viral lineages: a deterministic feature of viral quasispecies. J. Mol. Biol. 333:553–563. Arias, A., C. M. Ruiz–Jarabo, C. Escarmis and E. Domingo. 2004. Fitness increase of memory genomes in a viral quasispecies. J. Mol. Biol. 339:405–412. Briones, C., E. Domingo and C. MolinaPars. 2003. Memory in retroviral quasispecies: experimental evidence and theoretical model for human immunodeficiency virus. J. Mol. Biol. 331:213– 229. Zhu, H., M. Bilgin, R. Bangham, et al. 2001. Global analysis of protein activities using proteome chips. Science 293:2101–2105. Bacarese-Hamilton, T., L. Mezzasoma, A. Ardizzoni, F. Bistoni and A. Crisanti. 2004. Serodiagnosis of infectious diseases with antigen microarrays. J. Appl. Microbiol. 96:10–17. Neuman de Vegvar, H. E. and W. H. Robinson. 2004. Microarray profiling of antiviral antibodies for the development of diagnostics, vaccines, and therapeutics. Clin. Immunol. 111:196–201. Shena, M., Ed. 2000. Microarray biochip technology. Sunnyvale, Calif, Eaton Publishing. Southern, E. M., S. C. Case-Green, J. K. Elder, M. Johnson, K. U. Mir, L. Wang and J. C. Williams. 1994. Arrays of complementary oligonucleotides for analysing the hybridisation behaviour of nucleic acids. Nucleic Acids Res. 22:1368–1373. Hacia, J. G., W. Makalowski, K. Edgemon, M. R. Erdos, C. M. Robbins, S. P. Fodor, L. C. Brody and F. S. Collins. 1998. Evolutionary sequence comparisons using high-density oligonucleotide arrays. Nat. Genet. 18:155–158. Kozal, M. J., et al. 1996. Extensive polymorphisms observed in HIV-1 clade B protease gene using high-density oligonucleotide arrays. Nat. Med. 2:753–759. Amexis, G., P. Oeth, K. Abel, A. Ivshina, F. Pelloquin, C. R. Cantor, A. Brau and

387

388

17 Genomics of Viruses K. Chumakov. 2001. Quantitative mutant analysis of viral quasispecies by chip-based matrix- assisted laser desorption/ ionization time-of-flight mass spectrometry. Proc. Natl. Acad. Sci. U. S. A. 98:12097–12102. 89 Van Laethem, K., et al. 1999. Phenotypic assays and sequencing are less sensitive than point mutation assays for detection of resistance in mixed HIV-1 genotypic populations. J. Acquir. Immune Defic. Syndr. 22:107–118. 90 Wilson, J. W., P. Bean, T. Robins, F. Graziano and D. H. Persing. 2000. Comparative evaluation of three human immunodeficiency virus genotyping systems: the HIV-GenotypR method, the HIV PRT GeneChip assay, and the HIV-1 RT line probe assay. J. Clin. Microbiol. 38:3022–3028. 91 Gerry, N. P., N. E. Witowski, J. Day, R. P. Hammer, G. Barany and F. Barany. 1999. Universal DNA microarray method for multiplex detection of low abundance point mutations. J. Mol. Biol. 292:251–262. 92 Gilles, P. N., D. J. Wu, C. B. Foster, P. J. Dillon and S. J. Chanock. 1999. Single nucleotide polymorphic discrimination by an electronic dot blot assay on semiconductor microchips. Nat. Biotechnol. 17:365–370. 93 Tillib, S. V. and A. D. Mirzabekov. 2001. Advances in the analysis of DNA sequence variations using oligonucleotide microchip technology. Curr. Opin. Biotechnol. 12:53–58.

94 Sengupta, S., K. Onodera, A. Lai and

95

96

97

98

99

U. Melcher. 2003. Molecular detection and identification of influenza viruses by oligonucleotide microarray hybridization. J. Clin. Microbiol. 41:4542–4550. Chizhikov, V., M. Wagner, A. Ivshina, Y. Hoshino, A. Z. Kapikian and K. Chumakov. 2002. Detection and genotyping of human group A rotaviruses by oligonucleotide microarray hybridization. J. Clin. Microbiol. 40:2398–2407. Wang, D., L. Coscoy, M. Zylberberg, P. C. Avila, H. A. Boushey, D. Ganem and J. L. DeRisi. 2002. Microarray-based detection and genotyping of viral pathogens. Proc. Natl. Acad. Sci. U. S. A 99:15687–15692. Cherkasova, E., M. Laassri, V. Chizhikov, E. Korotkova, E. Dragunsky, V. I. Agol and K. Chumakov. 2003. Microarray analysis of evolution of RNA viruses: evidence of circulation of virulent highly divergent vaccine-derived polioviruses. Proc. Natl. Acad. Sci. U. S. A 100:9398– 9403. Ooms, M., K. Verhoef, E. Southern, H. Huthoff and B. Berkhout. 2004. Probing alternative foldings of the HIV1 leader RNA by antisense oligonucleotide scanning arrays. Nucleic Acids Res. 32:819–827. Martell, M., C. Briones, A. de Vicente, M. Piron, J. I. Esteban, R. Esteban, J. Guardia and J. Gomez. 2004. Structural analysis of hepatitis C RNA genome using DNA microarrays. Nucleic Acids Res. 32:e90.

389

18 Genomics of Pathogenic Fungi

Gerwald A. Köhler, Alan Kuo, George Newport, and Nina Agabian

18.1 Introduction

Fungi are eukaryotic microorganisms with typical nuclear membranes, but with several features that distinguish them from plants, animals, and protozoa. Like plants, they have cell walls and are, with very few exceptions, nonmotile; however, they lack chlorophyll. Fungi range from unicellular to multicellular and filamentous forms and propagate sexually and/or asexually. The existence of these two modes of reproduction has led to a dual nomenclature for many species, because sexual states (teleomorphs) often develop under different conditions and with a different morphology than their asexual counterparts (anamorphs). For example, Emericella nidulans is the teleomorph of the well-known and biotechnologically important anamorph Aspergillus nidulans. For many fungi, including some of the most important pathogenic species, the teleomorphic states are unknown or have been lost during evolution. Deuteromycota is a nonphylogenetic form phylum into which these “Fungi Imperfecti” are classified, the majority of them being anamorphs of ascomycetous fungi. In general, fungi have been classified according to morphological criteria, a phenotypic approach that still has its merits in the clinical setting. More recently, cladistic and molecular methods such as sequencing of ribosomal RNA genes or other phylogenetic markers have resolved some ambiguities in fungal phylogeny, especially in the gray zone between protozoa and fungi. For instance, molecular phylogenetic studies have brought the major opportunistic pathogen Pneumocystis carinii, formerly regarded as a protozoan, into the fungal phylum Ascomycota [1, 2]. Similarly, it was discovered that microsporidia are not protozoa, but rather highly specialized fungi [3]. Of the more than 70 000 fungal species known to date [4], only about 150 have been associated with infections of humans or animals [5].
However, with the growing number of immunocompromised individuals, new fungal pathogens are emerging steadily; recent estimates indicate a rate of 20 species per year [6]. Despite this ever-increasing list of opportunistic pathogens, fungal pathogenesis in humans and animals is still a young field of research, and our understanding of it lags far behind our understanding of bacterial pathogenesis. Recent advances in genomics, especially completion of the first eukaryotic genome sequence, that of the fungus Saccharomyces cerevisiae, in 1996 [7] and of the first genome sequence of a pathogenic fungus, Candida albicans, in 2004 [8], will help us to study the intricate relationship of fungi with their higher eukaryotic relatives. The following review summarizes the current status of fungal genomics, focusing on the most important human pathogens.
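
The marker-based reclassification mentioned in this introduction (ribosomal RNA gene sequences placing Pneumocystis carinii and the microsporidia among the fungi) rests on comparing aligned sequences. The following is only a minimal sketch of that idea, assuming pre-aligned fragments and a simple proportion of differing sites (p-distance); the sequences and names are invented for illustration and are not real rRNA data. Real studies rely on multiple sequence alignment and tree-building methods rather than a nearest-reference lookup.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def nearest_reference(query: str, references: dict[str, str]) -> tuple[str, dict[str, float]]:
    """Return the name of the closest reference and all computed distances."""
    distances = {name: p_distance(query, seq) for name, seq in references.items()}
    return min(distances, key=distances.get), distances


# Hypothetical toy fragments, invented for this sketch (NOT real rRNA sequences).
QUERY = "ACGTTGCAAGCT"
REFERENCES = {
    "fungal_reference": "ACGTTGCTAGCT",
    "protozoan_reference": "ACCTAGGAAGTT",
}

closest, distances = nearest_reference(QUERY, REFERENCES)
print(closest, distances)
```

In this toy case the query differs from the fungal reference at one of twelve sites and from the protozoan reference at four, so it is assigned to the fungal reference.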

18.2 Genomics of Primary Fungal Pathogens

Primary fungal pathogens infect individuals who are in generally good health; such hosts may have a predisposition to infection but are not regarded as immunocompromised. Dermatophytes and dimorphic fungi comprise the most important groups causing primary mycoses in humans. Dermatomycoses, which include athlete’s foot and ringworm as well as other skin infections, are probably the most frequent fungal infections worldwide. They are caused by dermatophytes of the genera Trichophyton, Microsporum, or Epidermophyton (see Table 18.1). Despite their medical importance, research on dermatophytes, including genome sequencing, is underrepresented, at least in the public sector. Since to date only one genome project is planned for a dermatophyte species (T. rubrum, see Table 18.2), we reluctantly omit this group from further consideration in this review. In contrast, the other group of primary pathogens, encompassing several genera of dimorphic fungi with yeast and mycelial phases (Blastomyces, Coccidioides, Histoplasma, and Paracoccidioides), has enjoyed much more scientific attention. These fungi cause endemic mycoses in certain subtropical or tropical areas in the Americas or Africa (see Table 18.1). While reduced immune competence of the host certainly plays a role in severity or reactivation of endemic mycoses, healthy individuals can contract these infections and become critically ill. Efforts to sequence the genomes of dimorphic fungi are under way and in varying states of completion (see Table 18.2). We summarize the most advanced below.

Table 18.1 The most important fungal pathogens for humans and animals.

Microorganism | Morphological characteristics | Predominant habitat(s) | Type of infection
Ascomycetes
Aspergillus fumigatus, other Aspergillus spp. | Filamentous fungi (molds) | Ubiquitous saprophytes in the environment | Aspergillosis, ranging from allergic bronchopulmonary aspergillosis (ABPA) and sinusitis to pulmonary and disseminated OIs
Blastomyces dermatitidis | Dimorphic fungi (i.e., saprophytic hyphae and pathogenic yeast form) | Soil and guano in the Ohio and Mississippi River Valleys in US Midwest and South | Blastomycosis, endemic mycosis of the lung in healthy or immunocompromised individuals
Candida albicans, non-albicans Candida spp. | Yeast, pseudohyphae, and hyphae (true hyphae only formed by C. albicans, C. dubliniensis) | C. albicans is a commensal constituent of mucosal microflora. Non-albicans species also found in other environments | Candidiasis, OIs that range from skin to deep, disseminated infections. Predominantly caused by C. albicans, but non-albicans species are emerging
Coccidioides immitis, Co. posadasii | Dimorphic fungi | Soil fungi in Central Valley of California, San Diego, and Baja California; Co. posadasii in North American deserts and parts of South America | Coccidioidomycosis or “valley fever,” endemic mycosis of healthy or immunocompromised individuals. Asymptomatic to fatal infections of the lungs that can disseminate to other organs
Paracoccidioides brasiliensis | Dimorphic fungi | Soil fungi in Mexico, Central and South America | Paracoccidioidomycosis, endemic mycosis of healthy or immunocompromised individuals. Primary infection of the lung, dissemination
Histoplasma capsulatum | Dimorphic fungi | Soil and guano in the Ohio and Mississippi River Valleys in US Midwest and South | Histoplasmosis, endemic mycosis of healthy or immunocompromised individuals. Primary infection of the lung, dissemination to organs of the reticuloendothelial system possible
Dermatophytes: Trichophyton spp., Microsporum spp., Epidermophyton floccosum | Filamentous fungi | Monophyletic group of anthropophilic, zoophilic, and geophilic fungi with worldwide distribution | Dermatophytoses: tinea (ringworm), dermatomycoses
Microsporidia, e.g., Encephalitozoon spp. | Meronts (intracellular forms within host cytoplasm) and spores with highly sophisticated infection apparatus | Obligate intracellular lifestyle in other eukaryotic cells | Microsporidiosis, OIs causing chronic diarrhea, wasting syndrome, keratoconjunctivitis, and many other diseases
Pneumocystis spp. | Trophic form (asexual replication by binary fission), sexual phase with formation of precysts and mature cysts | Closely associated with host cells, no environmental reservoir known | Primarily pulmonary OI, albeit dissemination possible
Basidiomycetes
Cryptococcus neoformans | Yeast as typical vegetative form, can undergo filamentation | Soil, certain plants and fruits, avian excreta | Cryptococcosis, an OI of the central nervous system and lung. Other organs can be affected, especially in the severely immunocompromised

OI, opportunistic infection.

Table 18.2 Current public genome projects on fungi pathogenic to humans and animals.

Microorganism | Genome project(s) | Status February 2005
Aspergillus fumigatus | http://www.sanger.ac.uk/Projects/A_fumigatus/; http://www.tigr.org/tdb/mdb/mdbinprogress.html | Sequencing in progress
Blastomyces dermatitidis | http://www.broad.mit.edu/annotation/fungi/fgi/nominated.html | Fungal Genome Initiative (FGI) candidate
Candida albicans | http://www-sequence.stanford.edu/group/candida/index.htm; http://www.candidagenome.org/ | Genome sequence published [8]
C. dubliniensis | http://www.sanger.ac.uk/Projects/C_dubliniensis/l | Sequencing in progress
C. glabrata | http://cbi.labri.fr/Genolevures/ | Sequence assembly released
C. guilliermondii | http://www.broad.mit.edu/annotation/fungi/candida_guilliermondii/ | 12× sequence assembly released
C. krusei | http://www.broad.mit.edu/annotation/fungi/fgi/nominated.html | FGI nominated candidate
C. lusitaniae | http://www.broad.mit.edu/annotation/fungi/candida_lusitaniae/ | 9× sequence assembly released
C. parapsilosis | http://www.sfi.ie/content/content.asp?section_id=390&language_id=1 | Genome survey in progress
C. tropicalis | http://www.broad.mit.edu/annotation/fungi/candida_tropicalis/ | 10× sequence assembly released
Coccidioides immitis | http://www.broad.mit.edu/annotation/fungi/coccidioides_immitis/ | 10× sequence assembly released
Co. posadasii | http://www.tigr.org/tdb/mdb/mdbinprogress.html | Sequencing in progress
Cryptococcus neoformans | http://www.broad.mit.edu/annotation/fungi/cryptococcus_neoformans/; http://www.broad.mit.edu/annotation/fungi/cryptococcus_neoformans_b/; http://www.tigr.org/tdb/e2k1/cna1/; http://www-sequence.stanford.edu/group/C.neoformans/index.html; http://www.genome.ou.edu/cneo.html | Genomes of two serotype D strains (B-3501A & JEC21) published: [40]
Encephalitozoon cuniculi | http://bioinfo.hku.hk/CunicuList/help/project.html | Genome published: [60]
Histoplasma capsulatum | http://www.genome.wustl.edu/projects/hcapsulatum/; http://www.broad.mit.edu/annotation/fungi/fgi/ | Strains G186A and G217B, sequencing in progress; strain WU24, sequencing in progress
Paracoccidioides brasiliensis | http://www.broad.mit.edu/annotation/fungi/fgi/nominated.html; EST Genome Project | FGI nominated candidate [29]
Pneumocystis carinii | http://www.sanger.ac.uk/Projects/P_carinii/; http://pgp.cchmc.org/ | Sequencing in progress
Trichophyton rubrum | http://www.broad.mit.edu/annotation/fungi/fgi/nominated.html | FGI nominated candidate

18.2.1 Histoplasma

Histoplasma capsulatum (teleomorph: Ajellomyces capsulatus) is the causative agent of histoplasmosis, a common respiratory infection particularly associated with the Ohio and Mississippi River valleys in the United States (Table 18.1; for review see Ref. [9]). Acute infections in healthy individuals cause influenza-like symptoms and are usually self-limiting, but the fungi enter a persistent state that can be reactivated in conditions of host immunodeficiency induced by HIV infection, AIDS, cancer, or iatrogenic immunosuppression. Clinical manifestations of histoplasmosis range from pulmonary to disseminated infections with involvement of liver, spleen, and bone marrow, and the disease may be mild to severe or even fatal. H. capsulatum is a dimorphic ascomycete. Its saprophytic filamentous morphotype, found in soil, is nonpathogenic but infectious and capable of heterothallic sexual reproduction. Inhaled conidia or mycelial fragments transform into the pathogenic yeast form within the host. The dimorphic transition can be triggered reversibly under laboratory conditions by temperature: the mycelial form grows at 25 °C, yeast cells at 37 °C. The H. capsulatum genome is currently being sequenced from three unrelated strains of H. capsulatum var. capsulatum [10], namely two North American clinical isolates, G217B and WU24 (an isolate closely related to the Downs strain), and the Panamanian isolate G186A (genome projects, see Table 18.2). The strains G217B and G186A belong to two different “chemotypes” whose cell walls differ in polysaccharide content [11, 12]. G217B lacks α-(1,3)-glucan in the cell wall (chemotype I), whereas large amounts of this polysaccharide are present in G186A (chemotype II). Spontaneous mutants of G186A that lack α-(1,3)-glucan are avirulent [13]. However, this glucan seems not to be associated with virulence per se, as G217B is fully virulent. Strains closely related to the Downs strain, like WU24, have been isolated from AIDS patients [14], although Downs strains appear to be temperature-sensitive and avirulent in the standard animal models [15]. Analyses of karyotypes and phylogenetic relationships suggest that the three strains also differ considerably in the structure of their genomes (Anita Sil, personal communication), so the genome sequences will bring more clarity to their phylogenetic and phenotypic differences. The genome projects of strains G186A and G217B (see Table 18.2) are currently conducted by the Genome Sequencing Center (GSC) at Washington University in St. Louis. Both genomes have been assembled with the help of a fosmid-based physical map [16].
Gene identification and annotation have been supported by yeast and mycelial cDNA sequencing for both strains. So far, the projects have corroborated significant differences between the genomes of G186A and G217B (Anita Sil, personal communication): the former is much smaller (approximately 30 Mbp; 8923 predicted genes), whereas the G217B genome amounts to 41 Mbp (9153 predicted genes), probably largely due to its higher sequence repeat content (34% versus 7.5% in G186A). Taken at face value, these figures leave roughly 27 to 28 Mbp of nonrepetitive sequence in both strains. Repeat sequencing and gene annotation are still in progress. The third strain, WU24, is currently being sequenced by the Fungal Genome Initiative at the Broad Institute (see Table 18.2). Before completion of the ongoing genome projects, Hwang and coworkers published a first expression analysis of mycelial and yeast cells to identify phase-specific genes using a genomic shotgun microarray representing about one-third of the genome [17]. Mycelia-specific induction was observed in genes probably involved in conidia formation, cell polarity, and melanin production. Genes induced in the yeast phase included genes involved in nutrient uptake, cell cycle regulation, and sulfur metabolism. Interestingly, the study also revealed differential regulation of transcript initiation in the two morphotypes. This observation and further anticipated difficulties in gene structure prediction (introns, lack of known homologous sequences) have prompted the use of tiling microarrays [105]. Tiling microarrays can generate invaluable data for genome annotation because
they enable researchers to detect 5′ and 3′ ends of transcripts as well as exon/intron boundaries [18]. Future whole-genome microarrays will help us to further discern the transcriptional responses of H. capsulatum during host macrophage infection, the dimorphic transition, and the development and germination of conidia. Mating-type-specific gene expression will also be of interest, since the − mating type seems to be more virulent than the + mating type, although both mating types are equally represented in the environment [19, 20]. In summary, genomics of H. capsulatum will provide new insights into how and why these fungi are able to depart from the saprobic lifestyle in soil and become parasitic in the mammalian host.

18.2.2 Coccidioides

Coccidioides is a haploid ascomycete found in soil of semiarid regions of the southwestern US, Mexico, and Central and South America [21, 22]. In its saprobic phase, this dimorphic fungus generates mycelia that produce enterothallic arthroconidia. Aerosolized arthroconidia can be inhaled and then enter the parasitic phase, differentiating into large spherules. At least in vitro, spherulation is induced by elevated temperature (34–41 °C) and CO2 concentration (10–20%); at low temperature arthrospores produce hyphae. Spherules divide internally into endospores and, on maturation, rupture to release thousands of endospores, which in turn are capable of generating new endosporulating spherules [23]. Coccidioidomycosis is characterized by progressive pulmonary disease and/or dissemination to virtually any body site with the exception of the gastrointestinal tract. Meningitis develops in 30–50% of cases of disseminated disease and, if left untreated, is fatal. Coccidioidomycosis can affect previously healthy individuals and, depending on the inhaled dose, genetic predisposition, and immune status of the host, the mycosis ranges from self-limiting infection to disease with high morbidity and mortality [24]. Two very closely related species cause coccidioidomycosis: Co. immitis and Co. posadasii [25]. No sexual phase is known in either species. The genomes of Co. immitis and Co. posadasii are currently being sequenced by the Broad Institute and The Institute for Genomic Research (TIGR), respectively (see Table 18.2).

18.2.3 Blastomyces and Paracoccidioides

Two additional thermally dimorphic fungi are Blastomyces dermatitidis (teleomorph Ajellomyces dermatitidis) [26] and Paracoccidioides brasiliensis [27]. Blastomycosis, endemic in the central US, may remain subclinical, but can also progress to pulmonary and/or extrapulmonary disease with primary involvement of the skin, but also bone, prostate, or central nervous system [26]. B. dermatitidis grows in the mycelial phase in soil. Inhalation of conidia from contaminated soil can lead to pulmonary infection and conversion to the pathogenic yeast form.

Paracoccidioidomycosis is one of the most common systemic mycoses in Latin America. Saprobic mycelia of P. brasiliensis in the environment produce conidia, which transform to the yeast form after inhalation [27]. The adult form of paracoccidioidomycosis occurs predominantly in immunocompetent males. Clinical presentations include primary lung infections, often with resultant fibrosis, and secondary dissemination to skin, lymph nodes, and adrenal glands. The gender bias is probably due to the inhibition of the morphological transition by estrogens [28]. The juvenile form of paracoccidioidomycosis is less frequent (< 10%), but more severe, with dissemination to organs of the reticuloendothelial system and increased mortality. Untreated paracoccidioidomycosis is often fatal, current treatment regimens are prolonged, and persistence of the fungus in a dormant state over years is not uncommon. B. dermatitidis and P. brasiliensis are candidates for future genome sequencing projects at the Broad Institute (http://www.broad.mit.edu/annotation/fungi/fgi/nominated.html). As a first step towards the genomics of P. brasiliensis, an expressed sequence tag (EST) sequencing project has been established, allowing identification of putative virulence genes by homology to genes in other fungi and expression analysis of genes involved in the mycelia-to-yeast transition [29].
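
One simple way in which EST data of this kind can be used for expression analysis is to compare how often a transcript is sampled in libraries made from the two morphotypes. The sketch below is an illustration under stated assumptions, not the procedure used in the cited project: library sizes, contig names, and counts are invented, and the pseudocount is an arbitrary choice that merely avoids division by zero. A real analysis would add a statistical test for differences in sampling frequency.

```python
def per_10k(count: int, library_size: int) -> float:
    """EST count normalized to a library of 10,000 sequenced ESTs."""
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return 10_000 * count / library_size


def yeast_to_mycelium_ratio(
    yeast_count: int,
    mycelium_count: int,
    yeast_total: int,
    mycelium_total: int,
    pseudocount: int = 1,
) -> float:
    """Ratio of normalized EST abundance, yeast library over mycelium library."""
    yeast = per_10k(yeast_count + pseudocount, yeast_total)
    mycelium = per_10k(mycelium_count + pseudocount, mycelium_total)
    return yeast / mycelium


# Hypothetical library sizes and per-contig EST counts (yeast, mycelium).
YEAST_TOTAL = 10_000
MYCELIUM_TOTAL = 8_000
EST_COUNTS = {
    "contig_A": (24, 3),
    "contig_B": (1, 15),
}

ratios = {
    name: yeast_to_mycelium_ratio(y, m, YEAST_TOTAL, MYCELIUM_TOTAL)
    for name, (y, m) in EST_COUNTS.items()
}
for name, ratio in ratios.items():
    print(f"{name}: yeast/mycelium ratio = {ratio:.2f}")
```

With these invented counts, contig_A is sampled about five times more often in the yeast library, whereas contig_B is about ten times more frequent in the mycelium library; such candidates would then be checked by an independent method.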

18.3 Genomics of Opportunistic Fungal Pathogens

Opportunistic fungi are pathogens that normally do not cause infections in the immunocompetent host, for example the normally saprophytic Aspergillus fumigatus, but cause mycoses of various grades of severity in the immunocompromised host. Candida albicans is a minor part of the normal microflora of many healthy individuals that can switch from its commensalism to a virulent state when the host becomes immunocompromised by HIV infection, malignancies, diabetes, or iatrogenic intervention in cancer therapy or transplant medicine. Another ascomycete, Pneumocystis carinii, and the basidiomycete Cryptococcus neoformans have drawn most attention as major opportunists with the advent of the AIDS epidemic. The following sections will give an overview of the genomics of the clinically most relevant opportunistic fungi.

18.3.1 Aspergillus

Aspergilli are filamentous ascomycetes, many of which lack known sexual phases and are therefore grouped in the Deuteromycetes. Of the more than 180 species in the genus Aspergillus, several species emerge as pathogenic to humans and/or produce mycotoxins during food spoilage. A. fumigatus is the leading opportunistic pathogen among the aspergilli, causing about 90% of invasive infections. A. flavus, A. terreus, A. nidulans, and A. niger are less common, but the frequency of opportunistic infections by these species may be on the rise [30]. Aspergillus species are typical saprophytes, commonly recognized in their environmental states as molds that form aerial hyphae with conidia as asexual propagules in characteristic colors. The readily aerosolized conidia are the infective form of the fungi, usually taken up via the respiratory route. Clinical manifestations of Aspergillus infections are numerous and correlate with aberrant host immune responses. They range from allergic bronchopulmonary aspergillosis and sinusitis in atopic individuals to mycetoma, pulmonary aspergillosis, and disseminated aspergillosis in the severely immunocompromised. Invasive aspergilloses are especially associated with high mortality in patients with neutropenia caused by disease or medical intervention. Conidia of A. fumigatus are very small (2–3 µm in diameter) and thus easily aerosolized and less effectively barred from entry into lung tissue. Since conidia of other aspergilli are larger, spore size probably contributes to the fact that A. fumigatus is the most common species causing disease. As is emblematic of opportunistic fungal pathogens, the identification of true virulence factors is difficult for A. fumigatus. Putative virulence determinants include conidial adhesins and hydrophobicity, melanin production for scavenging of reactive oxygen species, secreted phospholipases and proteases [31] as well as production of gliotoxin for immune evasion [32]. Since none of these factors by itself appears to be essential, virulence is likely to be multifactorial. To date genome sequence information is available for four Aspergillus species: A. niger, A. oryzae, A. nidulans, and A. fumigatus (for review see Ref. [33]). A. niger and A. oryzae are species of biotechnological importance for the production of metabolites and enzymes. The latter species plays an important role in sake and soy sauce production. A. nidulans with its known sexual phase (Emericella nidulans) has been a key model organism for classical genetics. The A. niger and A. oryzae genomes, considerably larger than those of A. nidulans and A.
fumigatus, are predicted to harbor approximately 14 000 genes versus 10 000 genes in the latter two species. The A. fumigatus genome, with a haploid size of approx. 30 Mbp, comprises eight chromosomes ranging from 1.7 to approx. 5.3 Mbp in size [34]. At the time of writing this review, annotations and genome-wide comparisons of Aspergillus genomes are still in progress. Nevertheless, genome sequence data have already led to important discoveries about the biology and genetics of the aspergilli. For example, a search of the incomplete genome sequence for A. fumigatus revealed the presence of a putative mating-type gene (MAT1-2 homologue), a pheromone gene, and two pheromone-receptor genes involved in mating in Ascomycetes [35]. Since a recent study suggests a recombining population structure in A. fumigatus [36], future research will have to clarify whether recombination events are due to past meiotic processes, a cryptic sexual stage, or parasexuality [37]. Comparative analysis of the soon to be released Aspergillus genomes will provide a powerful approach to identifying the traits that endow A. fumigatus with such broad pathogenicity, causing allergies and invasive infections.

18.3.2 Cryptococcus

The basidiomycete Cryptococcus neoformans (teleomorph: Filobasidiella neoformans) primarily causes opportunistic infections of the central nervous system in immunocompromised patients. Cryptococcal meningoencephalitis is fatal if untreated [38]. C. neoformans undergoes a complete fungal life cycle characterized by vegetative growth as yeast with the capacity for dimorphic transitions to filamentous forms. Haploid budding yeast with two mating types (a and α) are the predominant environmental forms. Under nutrient-limiting conditions or upon contact with mating pheromones, cell fusion between mating partners is initiated; however, at 25 °C nuclear fusion is delayed and a filamentous heterokaryon is formed. This filamentous dikaryon eventually forms basidia that support karyogamy, meiosis, and basidiospore formation (for review see Ref. [39]). At 37 °C haploid a and α yeast cells form a diploid yeast cell that differentiates to monokaryotic filaments at 25 °C. These filaments can also form basidia and produce haploid a and α spores. In response to nitrogen limitation, desiccation, or the MFa pheromone, cells of the α mating type can form haploid monokaryotic filaments with basidia that produce α spores (haploid fruiting). Basidiospores originating from all three phases of the life cycle are able to germinate and grow as haploid cells and are also considered to be the infectious propagule due to their small size (1–2 µm). Interestingly, almost all clinical and environmental isolates are of the α mating type, indicating that sexual reproduction plays only a minor role in pathogenicity and survival of a healthy population. The formation of a polysaccharide capsule and production of the antioxidant melanin are considered important virulence factors of C. neoformans [38, 39]. Moreover, molecular genetic studies in C. neoformans identified calcineurin, a serine-threonine-specific protein phosphatase, as essential for growth at elevated temperature, and consequently calcineurin mutants are avirulent in animal models. An international consortium initiated a C.
neoformans genome sequencing project in 1999 that comprises sequencing of 2 to 4 strains of these pathogenic fungi. Since strain differences as identified by serotypes A to D seem to play a crucial role in virulence [38], sequencing of several strains could provide valuable information on virulence properties. Very recently, the genomes of the two phenotypically distinct serotype D strains JEC21 and B-3501A were published [40]. These inbred strains are closely related, but differ in their virulence properties: B-3501A is more thermotolerant and more virulent in animal models than JEC21. The C. neoformans JEC21 genome sequence is 19 Mbp in size (approx. 5% rDNA repeats excluded) and consists of 14 chromosomes ranging from 762 kbp to 2.3 Mbp (see Table 18.3). The sequence of strain B-3501A is 18.5 Mbp in 14 linked assemblies. Comparative genomics revealed that 99.7% of genes are more than 98% identical, while three genes encoding a Ras GTPase-activating protein and two proteins of unknown function are specific to B-3501A. Specific to JEC21 are four genes encoding proteins of unknown function which are located among other genes on an approx. 60-kbp duplication segment that is not present in B-3501A. In total, 6572 protein coding regions were identified that contain an average of 6.3 exons of 255 bp and 5.3 introns of 67 bp. The mean transcript size is 1.6 kbp. Overall, the gene structure of the basidiomycete C. neoformans is more complex than in the ascomycetes with available genomic sequences. Parallel sequencing of the ends of cDNA clones generated from JEC21 grown under various conditions revealed alternative splicing for 277 genes (4.2% of the transcriptome) and the presence of endogenous antisense transcripts for 53 genes of various functions [40], indicating sophisticated gene regulatory mechanisms. The functionality of antisense repression and RNA interference [41, 42] in C. neoformans suggests that dsRNA-mediated regulation plays a significant role in this fungus. Genome analysis also revealed new putative factors involved in virulence: additional genes were found in the mating-type (MAT) locus, which is closely associated with virulence. Furthermore, new genes were identified that could be involved in biosynthesis of the capsule, the major virulence factor of C. neoformans. Some of these genes are represented in gene families and appear to be restricted to basidiomycetes or a subset of fungi excluding yeasts. In summary, the C. neoformans genome reflects the phylogenetic relationship to other fungi by encoding core fungal sequences also present in the more distantly related ascomycetes like S. cerevisiae or C. albicans as well as basidiomycete-specific genes with conserved homologues in the genome of the basidiomycete Phanerochaete chrysosporium. The differences in virulence strategies of C. neoformans and C. albicans are in particular revealed by the absence of the pathways for capsule and melanin formation in C. albicans.

Table 18.3 Genome characteristics of completed genomes of pathogenic fungi and S. cerevisiae.

Characteristic                 Saccharomyces cerevisiae  Candida albicans  Candida glabrata  Cryptococcus neoformans     Encephalitozoon cuniculi
Genome size (Mbp)              12.16                     14.9              12.28             19 (JEC21), 18.5 (B-3501A)  2.9
Number of chromosomes          16                        8                 13                14                          11
Average G+C content (mole %)   38.3                      33.5              38.8              48.6                        48.5
Predicted genes                5775                      6354              5283              6572                        1997
Coding (as % of total)         69.9                      61.5              65.0              55                          nd
Mean coding length (bp)        1485                      1439              1479              2195                        1077
Genes with introns             5%                        3.4%              nd                98.4%                       0.7%
No. of introns                 272                       274               nd                35025                       13
Gene density (bp/gene)         2182                      2342              2324              2927                        1255
References                     [40, 73]                  [40, 106]         [73]              [40]                        [49, 60]

nd: not determined.
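The gene-structure figures quoted for C. neoformans JEC21 can be cross-checked against one another with simple arithmetic. The following minimal sketch assumes that the mean transcript is simply the sum of the average exons and that the total intron count is the product of gene number and average introns per gene; the variable names are illustrative and the input values are those given in the text.

```python
# Values quoted in the text for C. neoformans JEC21 [40].
GENES = 6572             # predicted protein-coding regions
EXONS_PER_GENE = 6.3     # average number of exons per gene
EXON_BP = 255            # average exon length (bp)
INTRONS_PER_GENE = 5.3   # average number of introns per gene
ALT_SPLICED = 277        # genes with evidence of alternative splicing

# Assumption: a spliced transcript is approximated by the sum of its exons.
mean_transcript_bp = EXONS_PER_GENE * EXON_BP
# Assumption: total introns = genes x average introns per gene.
total_introns = GENES * INTRONS_PER_GENE
alt_spliced_pct = 100 * ALT_SPLICED / GENES

print(f"mean transcript: {mean_transcript_bp:.0f} bp")
print(f"estimated introns: {total_introns:.0f}")
print(f"alternatively spliced: {alt_spliced_pct:.1f} %")
```

Under these assumptions the estimates (about 1.6 kbp per transcript, roughly 35 000 introns, and 4.2% alternatively spliced genes) agree with the values stated in the text and in Table 18.3 to within rounding.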


18.3.3 Pneumocystis

For decades Pneumocystis carinii was considered a protozoan while ultrastructural investigations suggested an association with fungi. Only molecular genetic techniques could finally solve the question of phylogenetic placement for this microorganism: sequencing of rRNA [43] and rDNA [1] in the late 1980s clearly grouped P. carinii with the Fungi. All further phylogenetic marker sequences and sequences derived from the ongoing genome sequencing effort have confirmed and refined the association to the ascomycetes [44]. Recently, distinct species have been described in the genus Pneumocystis, and for the human form the species designation P. jirovecii has been suggested, while P. carinii was retained for the species infecting rats [44]. Pneumocystis is considered an opportunistic pathogen; however, unlike for other opportunists, no environmental reservoir has been identified. Cryptic infections of immune-competent mammalian hosts might be the only strategy of Pneumocystis species to maintain their presence, preferably in the lung as the primary niche [45, 46]. Albeit relatively rare, disseminated pneumocystosis can affect a wide variety of host sites, especially bone marrow and spleen [47]. While the complete life cycles for any Pneumocystis species are still unknown, evidence for a biphasic life cycle for P. carinii within the mammalian alveolus exists (for review see Ref. [48]): an asexual phase replicates by binary fission, and a presumed sexual phase capable of meiosis occurs in the cyst stages. Both forms remain extracellular in close contact with host cells within the lumen of the alveoli, which eventually become compromised for gas exchange. Genetic evidence for meiosis has been provided by the Pneumocystis Genome Project [49] through the identification of homologues of meiosis-specific genes from other fungi. The genome project also indicates that P. carinii may be able to synthesize cholesterol rather than take it up from the host as P. 
carinii lacks ergosterol as the bulk sterol (unlike most fungi), but still contains several key enzymes for sterol synthesis. Another peculiar feature of the P. carinii genome is the presence of only a single nuclear ribosomal locus [50], whereas most fungi and other eukaryotes have multiple copies (> 100). The Pneumocystis genome sequencing effort is still ongoing (see Table 18.2). The organization of the P. carinii genome reveals a remarkable reduction in size when compared to the apathogenic yeast S. cerevisiae: the yeast genome comprises 13 Mbp in 16 linear chromosomes ranging from 200 to 2200 kbp [51], whereas the P. carinii genome comprises approximately 8 Mbp in 17 linear chromosomes ranging from 300 to 700 kbp [52]. Concomitantly, gene numbers appear to be reduced, with approx. 6000 yeast genes versus approx. 4000 genes in P. carinii, while gene densities are similar at about 1 gene per 2000 bp. Introns are much more common in P. carinii, with an average of 3.7 per gene, whereas the entire S. cerevisiae genome contains only about 270. Chromosome ends in P. carinii present tandem arrays of genes encoding surface antigens, the “major surface glycoproteins” (MSG genes), subtilisin-like proteases as putative processing components (PRT genes), and MSG-related proteins (MSR genes) of unknown function [53]. MSG and MSR gene families account for about 10% of the genome [54]. Their functional roles could be as surface antigens for evasion of the host immune system and/or providing adhesins for cell attachment and aggregation during mating. Completion of the genome sequencing project for P. carinii, predicted for early 2005 (http://pneumocystis.cchmc.org), and the efforts of the Fungal Genome Initiative on the genomes of P. jirovecii and P. carinii f.sp. muris (http://www-genome.wi.mit.edu/annotation/fungi/fgi/) will certainly advance our understanding of the biology and pathogenesis of these enigmatic fungi.

18.3.4 Microsporidia

With more than 1000 species described to date, microsporidia are a large and diverse group of intracellular parasites, virtually restricted to infecting animals (for review see Ref. [3]). While some infect multiple distantly related hosts, most microsporidia are specific to one host or a closely related group of animals. Microsporidiosis in humans is caused by several, mostly opportunistic species and includes a wide variety of diseases [55]. Because of the relative simplicity of their intracellular organelles (e.g., apparent lack of mitochondria, stacked Golgi dictyosomes, flagella, and peroxisomes), microsporidia were considered very primitive protozoa that were grouped in the Archezoa. It was believed that this group of ancient protists, also including the genera Giardia and Entamoeba, evolved before endosymbiotic uptake of mitochondria. In contrast, recent molecular data support the assumption that microsporidia are not primitive and amitochondriate, but rather highly derived and adapted eukaryotes with putative relict mitochondria (mitosomes). Their close phylogenetic relationship with the fungi suggests evolution from a complex fungal ancestor [56]. As obligate intracellular parasites they have completely adapted their cell metabolism to the intracellular lifestyle and only use spores as extracellular forms with complex structures for infection of host cells [57]. This reductive evolution is remarkably reflected in the genomes of some microsporidian species: while a genome size of 19.5 Mbp in Glugea atherinae [58] is not out of the ordinary for a fungal genome, 2.3 Mbp in Encephalitozoon intestinalis is one of the smallest eukaryotic genomes [59]. The first completely sequenced microsporidian genome is that of Encephalitozoon cuniculi, a parasite of a wide range of mammalian hosts causing infections of the eye, respiratory and genitourinary tracts as well as disseminated infections [60]. 
A mere 2.9 Mbp is distributed over 11 chromosomes ranging from 217 to 315 kbp in size. Substantive gene loss has contributed to the reduction in genome size: fewer than 2000 genes encode proteins in E. cuniculi. This process has occurred in nonrandom fashion; specific metabolic pathways and regulatory functions disappeared [60]. For example, E. cuniculi harbors no genes involved in purine, sterol, or fatty acid biosynthesis, and only two amino acid biosynthesis genes [49, 60]. Moreover, the TCA cycle is absent, another indication that some metabolic functions such as the synthesis of small molecules are provided by the host cell, and that genes encoding these functions therefore became redundant and were lost. Vital functions for the parasitic life style such as transporters and other genes for scavenging and the use of host-derived molecules as well as genes encoding important cellular processes such as transcription, DNA replication, and protein synthesis remain represented in the E. cuniculi genome. Additional reduction of genome size was accomplished by increase of gene density and decrease in size of genes and intergenic regions. With approximately one gene every 1000 bp in the gene-rich regions of its chromosomes, E. cuniculi surpasses S. cerevisiae (one coding sequence per 2000 bp) and all other eukaryotes with known complete genome sequences. Genes are substantially compacted through the reduction in size of many proteins and the elimination of most introns (only about 15 short introns of 23–52 bp are left); together with remarkably short intergenic regions (approx. 120 bp) and the absence of highly repetitive DNA, this compaction contributes to the diminutive E. cuniculi genome. Genome compaction may also influence genome stability in the microsporidia since the close association of coding regions to each other may lead to cotranscription and reduced recombination events. A recent survey on gene order between the distantly related microsporidia Antonospora locustae and E. cuniculi revealed that the degree of synteny is higher than between the closely related ascomycetes S. cerevisiae and C. albicans or between humans and fish [61]. In contrast, protein or small-subunit rRNA gene sequences are clearly more divergent in A. locustae and E. cuniculi than in S. cerevisiae and C. albicans. Furthermore, absence or low amounts of repetitive DNA and retroelements might enhance genome stability in microsporidia. In summary, the genome of the microsporidium E. cuniculi provides a paradigm of how obligate intracellular parasitism can lead to streamlined genomes that are reduced to provide only the functions necessary to maintain presence in the relatively constant milieu within host cells, in the absence of the larger environmental fluctuations encountered by free-living fungi.
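The degree of compaction described above for E. cuniculi can be illustrated with a rough calculation. This is only a sketch: it combines the genome size, chromosome range, and intergenic spacing quoted in the text with the gene number and mean coding length listed in Table 18.3, and the variable names are illustrative.

```python
# Figures quoted in the text and in Table 18.3 for E. cuniculi.
GENOME_BP = 2_900_000               # approx. genome size (2.9 Mbp)
GENES = 1997                        # predicted genes
MEAN_CODING_BP = 1077               # mean coding length (bp)
INTERGENIC_BP = 120                 # approx. intergenic region (bp)
CHROMOSOMES = 11
CHR_MIN_KBP, CHR_MAX_KBP = 217, 315

# Average spacing of genes if they were spread evenly over the whole genome.
genome_wide_spacing = GENOME_BP / GENES
# Size of one "gene plus spacer" unit in a tightly packed region.
compact_unit = MEAN_CODING_BP + INTERGENIC_BP
# Genome size bounds implied by the chromosome number and size range (Mbp).
size_bounds_mbp = (CHROMOSOMES * CHR_MIN_KBP / 1000,
                   CHROMOSOMES * CHR_MAX_KBP / 1000)

print(f"genome-wide spacing: {genome_wide_spacing:.0f} bp/gene")   # 1452
print(f"gene + intergenic unit: {compact_unit} bp")                # 1197
print(f"implied genome size: {size_bounds_mbp[0]:.2f}-{size_bounds_mbp[1]:.2f} Mbp")
```

The gene-plus-spacer unit of about 1.2 kbp is of the same order as the "one gene every 1000 bp" quoted for gene-rich regions, while the genome-wide average spacing is somewhat larger; the 2.9 Mbp total lies within the bounds implied by 11 chromosomes of 217 to 315 kbp.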
18.3.5 Candida

Several Candida species are opportunistic pathogens in humans, and various genome projects have been initiated in line with their clinical significance; these are summarized in the following section. C. albicans is part of the normal human mucosal microflora and coexists as a commensal with many other microorganisms in the immunocompetent host without causing disease. Impairment of the host immune defenses, however, can lead to a wide variety of opportunistic infections by C. albicans that range from superficial skin infections to invasive candidiasis with hematogenous dissemination to organs, which is associated with high morbidity and mortality. C. albicans is the most virulent and most frequently isolated Candida species in the immunocompromised host. Its importance as an indicator of the early phases of the HIV-induced immune deficiency and as the main eukaryotic agent causing nosocomial bloodstream infections has led to a publicly funded genome sequencing project. C. albicans has posed some peculiar challenges to molecular genetic research: it has no known haploid phase and exhibits, like many other Candida species, nonstandard decoding of the CUG codon to serine instead of leucine [62]. The diploid nature of the C. albicans genome (see Table 18.2) demanded special methods of sequencing and assembly [8]. The genome was sequenced by whole-genome shotgun to 7.1x PHRED20 coverage (PHRED is a base-calling program for DNA sequence traces [63, 64]). Conventional PHRAP assembly [63] of the sequence produced an excessively large number of contigs (compared to that predicted from the sequence coverage) and an excessively long haploid genome (compared to the approx. 16 Mbp determined empirically), a result of apparently high rates of heterozygosity in the diploid genome. To achieve a more realistic assembly, Jones and coworkers [8] developed a novel computational method that assumed heterozygosity and thus was able to assign many of the PHRAP-generated contigs to a smaller number of larger paired “supercontigs.” With some additional manual intervention, Jones et al. [8] assembled 412 distinct supercontigs, of which 146 were paired, yielding a reference haploid set of 266 supercontigs and 6419 open reading frames (ORFs). As rDNA genes had been removed from consideration prior to assembly, the total assembly length of 14.851 Mbp agreed favorably with the physical mapping measurement (minus rDNA) of 14.855 Mbp. Physical map markers allowed assignment of 121 of the haploid supercontigs (87% of the haploid set) to the eight chromosomes of C. albicans. Of the unassigned supercontigs, only 2 failed to be assigned because of mapping conflicts (0.81% of the haploid set). Jones et al. confirmed the high rates of heterozygosity in C. albicans by developing a statistical test based on the likelihood of polymorphism at a base position given PHRED quality scores. They detected 62 534 polymorphisms, more than 89% of which were single base substitutions. Transitions exceeded transversions by 2:1.
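Two of the quantities used in this assembly can be made concrete with a short calculation. The sketch below relies on the standard definition of a PHRED quality score, Q = -10 log10(P), where P is the probability of an incorrect base call, and divides the polymorphism count by the assembly length to obtain a genome-wide density. The function names are illustrative and are not taken from Refs. [8, 63, 64].

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a PHRED quality score Q into a base-call error probability."""
    return 10 ** (-q / 10)


def polymorphisms_per_kb(count: int, length_mbp: float) -> float:
    """Average number of polymorphisms per kb over an assembly of given length."""
    return count / (length_mbp * 1000)


# A PHRED score of 20 corresponds to a 1% chance of a wrong base call.
print(phred_to_error_probability(20))
# 62 534 polymorphisms over the 14.851 Mbp haploid assembly.
print(round(polymorphisms_per_kb(62534, 14.851), 2))   # 4.21
```

On this reading, "PHRED20 coverage" would count only bases called with an estimated error rate of at most 1%, and the polymorphism count corresponds to a genome-wide average of about 4.2 polymorphisms per kb.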
Rates of heterozygosity ranged from 1.73 polymorphisms/kb across chromosome 7 to 9.48 polymorphisms/kb across chromosome 5, with an average rate of 4.21 polymorphisms/kb across the entire genome. They detected 11 large stretches (over 100 bp) of high heterozygosity, the largest of which (the mating type-like locus, MTL) exceeded 8.7 kbp. Heterozygosity was detected in 3579 ORF pairs, of which 2792 cases (78%) resulted in a difference in translation. In addition, 6699 insertions and deletions were detected, the majority of which were multiples of three bases and therefore not frameshifting. To begin preliminary functional annotation of the C. albicans genome, the haploid ORF set was used to search the S. cerevisiae, Schizosaccharomyces pombe, and Homo sapiens gene lists, and those BLAST hits with an E-value below 10^-8 were counted as suggesting potential homologues. Three thousand twenty-seven C. albicans ORFs (47% of the haploid set) had potential homologues among all three of the other genomes and therefore possibly encode general eukaryotic functions. Nine hundred thirty-nine ORFs (15%) encoded probable ascomycete-specific proteins. Only 1426 ORFs (22%) appeared to be unique to C. albicans. Further scrutiny of the genome sequence revealed some surprising results: although C. albicans appears to have a mostly clonal population and no teleomorph has been identified, the genome sequence uncovered a mating-type like locus [65] which subsequently stimulated new interest in a possible cryptic sexual cycle. Some already expected tendencies were corroborated, including the existence of large multigene families in C. albicans, especially those encoding extracellular proteins like secreted aspartyl proteinases (SAPs), lipases, phospholipases, and the agglutinin-like sequence (ALS) cell surface glycoproteins. New multigene families were discovered, including a large family encoding proteins of unknown function with leucine-rich repeats, and the CTA family of putative transcriptional activators. Additionally, the striking degree of heterozygosity between homologous chromosomes of the diploid genome not only posed a challenge to genome assembly, but also demonstrated that there may be increased functional and regulatory complexity. Comparisons to the S. cerevisiae genome are eagerly awaited to see what makes C. albicans pathogenic, but the evolutionary distance between the two microorganisms, probably as far as between humans and Drosophila, has to be considered. The near future will probably allow better comparative genomics to more closely related apathogenic species. Nevertheless, S. cerevisiae remains the standard model organism and some of the observed disparities may lead to new insights. Oxidative and lipid metabolism, for instance, may play a greater role in C. albicans than they do in S. cerevisiae, as indicated by a stronger representation in the genome of the pathogenic fungus.

C. dubliniensis and C. albicans are the most closely related Candida species causing human infections. Both species are able to form true hyphae, the former species, however, with less efficiency during induction through temperature/pH shifts or growth in serum. C. dubliniensis is also less frequently found in oral candidiasis, virtually absent in systemic infection, and shows reduced virulence in animal models. Additionally, C. dubliniensis appears to be more sensitive to elevated temperature and salt concentration [66, 67]. Ahead of the completion of the C. dubliniensis genome sequencing project (see Table 18.2) Moran and coworkers [68] performed comparative genomic hybridization (CGH) by cohybridization of C. dubliniensis and C. albicans genomic DNA to C. albicans cDNA microarrays to identify genes that are absent in C. dubliniensis and thus could account for its reduced virulence phenotype. This approach was feasible due to the high degree of sequence similarity between the two species. The authors found that 247 C. albicans coding sequences had less than 60% homology or were absent in C. dubliniensis. Divergent sequences were functionally categorized and included a large percentage of genes with unknown function (54%) as well as retrotransposon-encoded sequences, and genes encoding putative transcriptional regulators, membrane transporters, proteins with leucine-rich repeats, and glycosyl phosphatidyl-inositol (GPI)-anchored proteins. Genes encoding factors associated with virulence in C. albicans like the ALS genes, the hyphal adhesin gene HWP1, and the SAP genes also showed significant sequence divergence. Interestingly, a SAP5 orthologue appears to be conspicuously absent from the SAP gene family in C. dubliniensis, and only one gene (CdSAP4) seems to represent the hyphae-specific subfamily SAP4-6 present in C. albicans. Consequently, host tissue penetration by C. dubliniensis hyphae could be impaired since these SAP genes, specifically SAP6, have been shown to be important for C. albicans systemic and intraperitoneal infections [69, 70].
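The principle behind the comparative genomic hybridization experiment described above can be sketched in a few lines: for each array spot, the signal from the test genome is compared with the signal from the reference genome, and spots with a strongly reduced ratio are flagged as candidates for divergent or absent genes. The function name, the log2 threshold, and the toy intensities below are purely illustrative assumptions; they are not the analysis or the data of Ref. [68].

```python
import math


def flag_divergent_genes(signals, threshold=-1.0):
    """Return genes whose log2(test/reference) ratio falls below the threshold.

    signals maps a gene name to (test_intensity, reference_intensity).
    The threshold is an arbitrary illustrative cutoff.
    """
    flagged = []
    for gene, (test, ref) in signals.items():
        if ref <= 0:
            continue  # no usable reference signal for this spot
        ratio = math.log2(test / ref) if test > 0 else float("-inf")
        if ratio < threshold:
            flagged.append(gene)
    return sorted(flagged)


# Made-up intensities for three hypothetical array spots.
example = {
    "orf_a": (40.0, 900.0),    # strongly reduced test signal
    "orf_b": (950.0, 1000.0),  # similar signal in both genomes
    "orf_c": (300.0, 800.0),   # moderately reduced test signal
}
print(flag_divergent_genes(example))   # ['orf_a', 'orf_c']
```

A more stringent cutoff, for example `threshold=-3.0`, would retain only the spot with the strongest signal reduction. A weak hybridization signal alone cannot distinguish a missing gene from a highly divergent one, which is consistent with the wording "less than 60% homology or were absent" used above.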


C. glabrata, previously classified as Torulopsis glabrata, has emerged as the second most common cause of superficial and systemic candidiasis in immunocompromised hosts. The yeast is a minor constituent of the human microbiota colonizing skin, and gastrointestinal and vaginal mucosa, but also can be found in the environment on plants, soil and water. Infections by C. glabrata are of special concern since this yeast displays low susceptibility to azole antifungals [71]. The haploid yeast has no known teleomorph and was considered asexual. However, recent genomic explorations also suggest a cryptic sexual cycle as in C. albicans, since mating pathway genes and meiotic genes are encoded in the C. glabrata genome [72]. Phenotypic switching is another trait that C. glabrata appears to share with C. albicans. Conversely, phylogenetic studies suggest only a distant relationship of these hemiascomycetes. C. glabrata is much more closely related to S. cerevisiae than to C. albicans, thus demonstrating the phylogenetic heterogeneity of the genus Candida. A recent report on comparative genomics of five hemiascomycetous yeasts, S. cerevisiae, C. glabrata, Kluyveromyces lactis, a yeast used in industrial applications, Debaryomyces hansenii, a cryo- and salt-tolerant yeast, and Yarrowia lipolytica, an alkane-using yeast, confirmed this close relationship [73]. S. cerevisiae and C. glabrata are believed to share a common ancestor that underwent a whole-genome duplication which was followed by extensive gene loss in both species. The relics of these events are still traceable and demonstrate that the two species followed a divergent path. The more extensive gene loss in C. glabrata indicates species-specific reductive evolution that probably coincided with the emergence of these fungi as human pathogens.
Affected gene functions include galactose metabolism (five genes), phosphate metabolism (four genes), cell rescue, defense and virulence (three genes), and nitrogen and sulfur metabolism (three genes) [73]. The genomes of three other Candida species are currently being sequenced and annotated by the Broad Institute Fungal Genome Initiative (see Table 18.2): C. tropicalis is one of the most frequent non-albicans species causing systemic candidiasis in humans; in animal models it is the second-most virulent species. The haploid yeasts C. lusitaniae (Clavispora lusitaniae) and C. guilliermondii (Pichia guilliermondii) are able to undergo a complete sexual cycle including meiosis and sporulation. Both are emerging opportunistic pathogens [74] with frequent tolerance or resistance to the polyene antifungals including amphotericin B. Comparative genomics of Candida species will provide invaluable insights into the genetic basis of ploidy, e.g., haploid versus diploid, and the life style with a complete sexual cycle versus rare mating and parasexuality.
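The practical consequence of the nonstandard CUG decoding mentioned above for C. albicans and related Candida species can be shown with a small translation sketch. It builds the standard genetic code and a variant in which CUG (CTG in DNA) is read as serine, which corresponds to NCBI translation table 12 (Alternative Yeast Nuclear Code). The helper names and the short test sequence are invented for illustration.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, in TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Variant code: CUG (CTG in DNA) is decoded as serine instead of leucine.
CUG_SER_CODE = {**STANDARD_CODE, "CTG": "S"}


def translate(dna: str, code: dict) -> str:
    """Translate a DNA coding sequence codon by codon with the given code."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3   # ignore an incomplete trailing codon
    return "".join(code[dna[i:i + 3]] for i in range(0, usable, 3))


orf = "ATGCTGAAACTGTAA"   # made-up sequence containing two CTG codons
print(translate(orf, STANDARD_CODE))   # MLKL*
print(translate(orf, CUG_SER_CODE))    # MSKS*
```

The same DNA therefore yields different proteins depending on the code applied, which is why genes containing CUG codons cannot simply be expressed in, or taken from, organisms that use the standard code without adjusting those codons.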

18.4 The Tool Box for Functional Genomics

A wealth of research tools for genetics and genomics is available for the nonpathogenic model organisms S. cerevisiae and Schizosaccharomyces pombe, while resources for the pathogenic fungi have been scarce. Absent or cryptic sexual phases often prevent the use of classical genetics in these fungi. Furthermore, difficulties in growing the microorganisms under defined laboratory conditions, the lack of stable extrachromosomal elements, poor efficiencies for transformation or homologous recombination, and such peculiarities as noncanonical decoding of the CUG codon in some members of the genus Candida [62] have hampered genetic studies in pathogenic fungi. Genomics and recent developments of molecular genetic tools for many pathogenic fungi, however, promise to narrow the gap to the model organisms and help to identify the molecular basis of pathogenesis. In the following, we highlight some recent developments.

18.4.1 Expression Analysis

With the availability of genome sequences, techniques for measuring and comparing the expression of thousands of genes have been developed. DNA microarrays provide a powerful tool with which to study gene expression on a whole-genome scale and determine how gene transcription is affected by growth phases (e.g., log to stationary phase) and morphological differentiation or environmental cues and mutations. Both oligonucleotide-based and cDNA arrays have been used in gene expression analyses of pathogenic fungi, in particular C. albicans, for which the genome sequence information was first available. Custom Affymetrix GeneChip arrays have been used for transcriptional profiling of C. albicans during iron limitation and during phenotypic switching of C. albicans strain WO-1 [75, 76]. Other examples of the growing numbers of DNA microarray analyses in C. albicans include transcriptional profiling of yeast-to-hyphae transition [77], cAMP signaling [78], pH response [79] and biofilm formation [80] using cDNA microarrays. In vivo expression technology (IVET) is a system that has been developed initially for bacterial pathogens to identify genes that are specifically induced in host tissue [81, 82]. Recently IVET has been adapted for use in C. albicans (see Fig. 18.1 and Ref. [83]) and H. capsulatum [84]. In C. albicans IVET revealed the temporal and spatial expression patterns of the SAP genes, demonstrating a high degree of differential gene regulation, even to the level of individual SAP2 alleles [85, 86].

18.4.2 Transformation and Mutagenesis

The formulation of the molecular Koch’s postulates for research on bacterial pathogenicity [87] provided a framework for validating microbial genes as virulence factors. Mycologists have tried to apply these postulates to fungal pathogenesis and virulence, with mixed success in defining true virulence factors, because fungal virulence is typically multifactorial. A gene is considered to be involved in pathogenesis when its specific inactivation leads to a significant loss or attenuation of pathogenicity or virulence. Restoring the gene’s activity should, conversely, restore these properties.

Fig. 18.1 In vivo expression technology (IVET) in Candida albicans. C. albicans strains are constructed with the site-specific recombinase FLP adapted for use in C. albicans (caFLP) under the control of the promoter of a gene/allele of interest (YFG). Upon activation of the YFG-1 promoter (PYFG-1) the FLP protein is expressed and excises the MPAR resistance marker gene, which confers mycophenolic acid (MPA) resistance and is flanked by FLP recombination target sites (black arrowheads). Following experimental animal infection, fungal cells are isolated from different host organs at various time points, and the MPA susceptibility of the cells is determined by plating on media with low MPA content, where MPA-susceptible (mpaS) cells grow to a smaller colony size. The percentage of mpaS cells in the isolated cell population reflects PYFG-1 induction. Allelic differences in promoter induction can be revealed when the reporter system is used for both alleles. Pconst: constitutive promoter.

Transformation and gene inactivation are crucial techniques for evaluating the role of genes in pathogenesis. Spheroplast or lithium acetate transformation protocols and electroporation [88] are the preferred methods for transformation of Candida, while other methods have been adapted for other fungi: biolistic delivery of DNA for C. neoformans [89] and transformation mediated by Agrobacterium tumefaciens for Co. immitis [90], P. brasiliensis, B. dermatitidis, and H. capsulatum [91]. The rate of homologous recombination differs widely among pathogenic fungi: homologous integrations are frequent in C. albicans, whereas other fungi such as H. capsulatum show a high rate of nonhomologous integration that impedes targeted gene disruption [92].

The construction of isogenic mutant strains that differ only in the targeted virulence gene is highly desirable for comparative and well-controlled virulence studies. Virulence studies in C. albicans have been difficult to interpret where researchers used auxotrophic strains that were inherently avirulent and relied on a selection marker gene to restore prototrophy and virulence. Additional virulence defects caused by differential expression of the selection marker in mutant and control strains, due to position effects, could not always be excluded [93]. An elegant method for sequential disruption of both copies of a gene in the strictly diploid C. albicans has been developed using a dominant selection marker in conjunction with a site-specific recombinase (see Fig. 18.2). The method allows the generation of isogenic mutants in wild-type strains and clinical isolates of C. albicans and non-albicans Candida species without reliance on auxotrophic strains and their inherent drawbacks for virulence studies.

Fig. 18.2 Targeted gene disruption using the MPAR-Flipper. Sequential gene disruption of both alleles of a Candida gene (YFG) employing the MPAR-Flipper with flanking sequences for homologous recombination and subsequent recycling through the inducible FLP recombinase system. Pind: inducible promoter.

As an alternative to permanent gene disruption for functional analysis, antisense techniques and, more recently, RNA interference (RNAi) by double-stranded RNA have been used to transiently decrease (“knock down”) the expression of genes in pathogenic fungi. For example, De Backer and coworkers used an antisense-based approach to identify unknown genes that are critical for growth in C. albicans [94]. RNAi approaches for gene silencing have been reported for C. neoformans and A. fumigatus [42, 95].

Genome-wide functional screens using knock-out strategies (disruption or insertional mutagenesis) have been carried out or are under way in pathogenic fungi. A large-scale gene deletion strategy was recently reported that uses auxotrophic strains of C. albicans with confirmed virulence in a single animal model in combination with heterologous selectable markers [96]. Forward genetic approaches can be used to identify genes involved in a particular phenotype or morphogenetic process, e.g., drug resistance, pH response, filamentous growth, or biofilm formation. For example, Uhl and coworkers recently used transposon mutagenesis to identify genes involved in the switch to filamentous growth of C. albicans [107]. In a genetic screen, mutants with an altered capability to form biofilms were recently isolated from an insertional mutant library of C. glabrata [97]. Signature-tagged mutagenesis (STM; for review see Ref. [98]) can be used to identify individual mutants with a negative phenotype, e.g., attenuation of virulence, within a large pool of mutants. STM was applied in C. neoformans to identify virulence mutants [99] and in C. glabrata, where it led to the identification of an adhesin involved in attachment to epithelial cells [100].
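The logic of an STM screen can be illustrated with a minimal sketch. It assumes that each mutant carries a unique tag and that a signal (for example, a hybridization intensity) has been measured for every tag in the inoculated input pool and in the pool recovered from the host. The function name, the threshold, the pseudocount, and all numbers are hypothetical choices made for illustration; they are not taken from the studies cited above.

```python
from math import log2


def attenuated_tags(input_signal, output_signal, max_log2_ratio=-2.0, pseudo=1.0):
    """Flag signature tags depleted in the output pool relative to the input pool.

    input_signal / output_signal map tag name -> measured signal.
    The pseudocount keeps the ratio defined for tags that vanish completely.
    """
    tags = sorted(input_signal)
    in_vals = {t: input_signal[t] + pseudo for t in tags}
    out_vals = {t: output_signal.get(t, 0.0) + pseudo for t in tags}
    in_total = sum(in_vals.values())
    out_total = sum(out_vals.values())

    flagged = {}
    for t in tags:
        # Compare each tag's share of the pool before and after host passage.
        ratio = log2((out_vals[t] / out_total) / (in_vals[t] / in_total))
        if ratio <= max_log2_ratio:
            flagged[t] = ratio
    return flagged


# Hypothetical signals for four tagged mutants (illustrative numbers only).
input_pool = {"tag01": 1000, "tag02": 1000, "tag03": 1000, "tag04": 1000}
output_pool = {"tag01": 950, "tag02": 20, "tag03": 1100, "tag04": 1000}

candidates = attenuated_tags(input_pool, output_pool)
print(sorted(candidates))  # ['tag02']
```

A mutant flagged in this way is only a candidate: in keeping with the molecular Koch’s postulates discussed above, the attenuation would still have to be confirmed with an individually constructed mutant and a reconstituted strain.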

18.5 Fungal Virulence – From the Genomic Point of View

In a recent commentary, Arturo Casadevall [101] pointed out the link between thermotolerance and fungal virulence: only fungal species that can tolerate the body temperatures of endothermic animals are potential pathogens, and these still have to be able to cope with the slight alkalinity of mammalian body fluids and the immune defense of their prospective host. Most fungi prefer temperatures between 25 °C and 35 °C and acidic pH [5]. In contrast, ectothermic organisms like plants, insects, amphibians, and some reptiles appear more susceptible to fungal disease. Consequently, Casadevall even argues that fungi could have been involved in the demise of the dinosaurs and the rise of mammals about 65 million years ago [101]. Commensal fungi like C. albicans have evolved towards an intimate relationship with their host and only cause endogenous mycoses when the host becomes immunocompromised. Soil fungi such as the dimorphic fungi become pathogens when their infectious propagules are inhaled in sufficient quantities. Here Casadevall and his group suggest that selective processes instigated by amoebae and/or other soil microorganisms might prime certain fungal species for virulence in the mammalian host [102, 103]. For example, the predatory relationship of Acanthamoeba castellanii with C. neoformans and the intracellular survival strategy of the fungus could be a “battle training” for the encounter with mammalian macrophages [104]. Furthermore, once an infection has been “successful,” animal passage could select for strains that are better adapted to the host environment and thus more virulent. Since fungal pathogenicity is polyphyletic and virulence is certainly multifactorial, only detailed study of each fungal pathogen and its close apathogenic relatives will reveal the traits that in combination render one species or strain more virulent than another.
Tolerance of higher temperatures and alkaline conditions may be common denominators, but pathogenic species might have reached these qualities through different evolutionary paths. Additional factors important for virulence, such as adhesins, pigments, capsules, extracellular enzymes, or other secreted products, are very specific to the pathogen.

Comparative genomics of close relatives, pathogenic versus nonpathogenic, will help to identify candidates for virulence determinants, while functional genomics and molecular genetics will be needed to validate these factors. Comparisons between phylogenetically distant species like S. cerevisiae and C. albicans are probably too imprecise due to the long period of divergent evolution, but overall trends may be revealed. The remarkable reductive evolution of some microsporidian species (see above) can hardly be missed. It will be interesting to see whether commensalism or close association with the mammalian host as seen with C. albicans or Pneumocystis spp. has led in all cases to reductive evolution in certain functional categories (e.g., nucleotide metabolism as in E. cuniculi) while other functional areas remain unchanged or are expanded, as for instance the SAP gene family in C. albicans.
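The kind of comparison described here can be sketched in a few lines, assuming that the genes of a pathogen and of a nonpathogenic relative have already been grouped into families and counted. The family names, the copy numbers, and the twofold expansion threshold are hypothetical and serve only to show the bookkeeping; they do not describe any real genome.

```python
def compare_gene_families(pathogen, relative, min_fold=2.0):
    """Sort gene families into pathogen-specific, lost, and expanded groups.

    pathogen / relative map family name -> gene copy number in that genome.
    """
    specific = sorted(
        f for f in pathogen if pathogen[f] > 0 and relative.get(f, 0) == 0
    )
    lost = sorted(
        f for f in relative if relative[f] > 0 and pathogen.get(f, 0) == 0
    )
    expanded = sorted(
        f for f in pathogen
        if relative.get(f, 0) > 0 and pathogen[f] / relative[f] >= min_fold
    )
    return {
        "pathogen_specific": specific,
        "lost_in_pathogen": lost,
        "expanded_in_pathogen": expanded,
    }


# Hypothetical copy numbers per gene family (illustrative only).
pathogen_counts = {
    "secreted_protease": 10,
    "adhesin": 6,
    "ribosomal_protein": 80,
    "nucleotide_synthesis": 0,
}
relative_counts = {
    "secreted_protease": 2,
    "adhesin": 0,
    "ribosomal_protein": 78,
    "nucleotide_synthesis": 5,
}

result = compare_gene_families(pathogen_counts, relative_counts)
for group, families in result.items():
    print(group, families)
```

Families that appear in any of the three groups are candidates only; as stated above, functional genomics and molecular genetics are still needed to validate them as virulence determinants.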

18.6 Conclusion

Within the last century, medical mycology has undergone a transition from a rather eccentric specialty to a field of medical urgency and growing scientific importance. The complexity of interactions of fungal pathogens with the eukaryotic relatives they more or less accidentally damage during parasitic life will be reflected in their genomes. Additionally, most fungi harbor an exceptional variety of functions necessary for a saprobic lifestyle. How these traits are used for propagation, defense, and ultimately infection of a host is inscribed in their genomes, but we are just beginning to realize that we will have to read many more genomes in order to fully comprehend fungal pathogenomics.

References

1 Edman, J.C., J.A. Kovacs, H. Masur, D.V. Santi, H.J. Elwood, and M.L. Sogin. 1988. Ribosomal RNA sequence shows Pneumocystis carinii to be a member of the fungi. Nature 334:519–522.
2 Taylor, J.W., E. Swann, and M.L. Berbee. 1994. Ascomycete Systematics: Problems and Perspectives in the Nineties. Plenum Press, New York. 421 pp.
3 Keeling, P.J., and N.M. Fast. 2002. Microsporidia: biology and evolution of highly reduced intracellular parasites. Annu. Rev. Microbiol. 56:93–116.
4 Hawksworth, D.L. 2001. The magnitude of fungal diversity: the 1.5 million species estimate revisited. Mycol. Res. 105:1422–1432.
5 Kwon-Chung, K.J., and J.E. Bennett. 1992. Medical Mycology. Lea & Febiger, Philadelphia.
6 Guarro, J., J. Gené, and A.M. Stchigel. 1999. Developments in fungal taxonomy. Clin. Microbiol. Rev. 12:454–500.
7 Goffeau, A., B.G. Barrell, H. Bussey, R.W. Davis, B. Dujon, H. Feldmann, F. Galibert, J.D. Hoheisel, C. Jacq, M. Johnston, et al. 1996. Life with 6000 genes. Science 274:546, 563–567.
8 Jones, T., N.A. Federspiel, H. Chibana, J. Dungan, S. Kalman, B.B. Magee, G. Newport, Y.R. Thorstenson, N. Agabian, P.T. Magee, et al. 2004. The diploid genome sequence of Candida albicans. Proc. Natl. Acad. Sci. U.S.A. 101:7329–7334.
9 Retallack, D.M., and J.P. Woods. 1999. Molecular epidemiology, pathogenesis, and genetics of the dimorphic fungus Histoplasma capsulatum. Microbes Infect. 1:817–825.
10 Kasuga, T., T.J. White, G. Koenig, J. McEwen, A. Restrepo, E. Castaneda, C. Da Silva Lacaz, E.M. Heins-Vaccari, R.S. De Freitas, R.M. Zancope-Oliveira, et al. 2003. Phylogeography of the fungal pathogen Histoplasma capsulatum. Mol. Ecol. 12:3383–3401.
11 Reiss, E. 1977. Serial enzymatic hydrolysis of cell walls of two serotypes of yeast-form Histoplasma capsulatum with alpha(1 leads to 3)-glucanase, beta(1 leads to 3)-glucanase, pronase, and chitinase. Infect. Immun. 16:181–188.
12 Davis, T.E., Jr., J.E. Domer, and Y.T. Li. 1977. Cell wall studies of Histoplasma capsulatum and Blastomyces dermatitidis using autologous and heterologous enzymes. Infect. Immun. 15:978–987.
13 Klimpel, K.R., and W.E. Goldman. 1988. Cell walls from avirulent variants of Histoplasma capsulatum lack alpha-(1,3)-glucan. Infect. Immun. 56:2997–3000.
14 Spitzer, E.D., E.J. Keath, S.J. Travis, A.A. Painter, G.S. Kobayashi, and G. Medoff. 1990. Temperature-sensitive variants of Histoplasma capsulatum isolated from patients with acquired immunodeficiency syndrome. J. Infect. Dis. 162:258–261.
15 Medoff, G., M. Sacco, B. Maresca, D. Schlessinger, A. Painter, G.S. Kobayashi, and L. Carratu. 1986. Irreversible block of the mycelial-to-yeast phase transition of Histoplasma capsulatum. Science 231:476–479.
16 Magrini, V., W.C. Warren, J. Wallis, W.E. Goldman, J. Xu, E.R. Mardis, and J.D. McPherson. 2004. Fosmid-based physical mapping of the Histoplasma capsulatum genome. Genome Res. 14:1603–1609.
17 Hwang, L., D. Hocking-Murray, A.K. Bahrami, M. Andersson, J. Rine, and A. Sil. 2003. Identifying phase-specific genes in the fungal pathogen Histoplasma capsulatum using a genomic shotgun microarray. Mol. Biol. Cell 14:2314–2326.
18 Shoemaker, D.D., E.E. Schadt, C.D. Armour, Y.D. He, P. Garrett-Engele, P.D. McDonagh, P.M. Loerch, A. Leonardson, P.Y. Lum, G. Cavet, et al. 2001. Experimental annotation of the human genome using microarray technology. Nature 409:922–927.
19 Kwon-Chung, K.J., M.S. Bartlett, and L.J. Wheat. 1984. Distribution of the two mating types among Histoplasma capsulatum isolates obtained from an urban histoplasmosis outbreak. Sabouraudia 22:155–157.
20 Kwon-Chung, K.J., R.J. Weeks, and H.W. Larsh. 1974. Studies on Emmonsiella capsulata (Histoplasma capsulatum). II. Distribution of the two mating types in 13 endemic states of the United States. Am. J. Epidemiol. 99:44–49.
21 Chiller, T.M., J.N. Galgiani, and D.A. Stevens. 2003. Coccidioidomycosis. Infect. Dis. Clin. North Am. 17:41–57, viii.
22 Hector, R.F., and R. Laniado-Laborin. 2005. Coccidioidomycosis – A Fungal Disease of the Americas. PLoS Med. 2:e2.
23 Stevens, D.A. 1995. Coccidioidomycosis. N. Engl. J. Med. 332:1077–1082.
24 Cox, R.A., and D.M. Magee. 2004. Coccidioidomycosis: host response and vaccine development. Clin. Microbiol. Rev. 17:804–839.
25 Fisher, M.C., G.L. Koenig, T.J. White, and J.W. Taylor. 2002. Molecular and phenotypic description of Coccidioides posadasii sp. nov., previously recognized as the non-Californian population of Coccidioides immitis. Mycologia 94:73–84.
26 Bradsher, R.W., S.W. Chapman, and P.G. Pappas. 2003. Blastomycosis. Infect. Dis. Clin. North Am. 17:21–40, vii.
27 Borges-Walmsley, M.I., D. Chen, X. Shu, and A.R. Walmsley. 2002. The pathobiology of Paracoccidioides brasiliensis. Trends Microbiol. 10:80–87.
28 Restrepo, A., M.E. Salazar, L.E. Cano, E.P. Stover, D. Feldman, and D.A. Stevens. 1984. Estrogens inhibit mycelium-to-yeast transformation in the fungus Paracoccidioides brasiliensis: implications for resistance of females to paracoccidioidomycosis. Infect. Immun. 46:346–353.
29 Goldman, G.H., E. dos Reis Marques, D.C. Duarte Ribeiro, L.A. de Souza Bernardes, A.C. Quiapin, P.M. Vitorelli, M. Savoldi, C.P. Semighini, R.C. de Oliveira, L.R. Nunes, et al. 2003. Expressed sequence tag analysis of the human pathogen Paracoccidioides brasiliensis yeast phase: identification of putative homologues of Candida albicans virulence and pathogenicity genes. Eukaryot. Cell 2:34–48.
30 Marr, K.A., R.A. Carter, F. Crippa, A. Wald, and L. Corey. 2002. Epidemiology and outcome of mould infections in hematopoietic stem cell transplant recipients. Clin. Infect. Dis. 34:909–917.
31 Latge, J.P. 2001. The pathobiology of Aspergillus fumigatus. Trends Microbiol. 9:382–389.
32 Stanzani, M., E. Orciuolo, R. Lewis, D.P. Kontoyiannis, S.L. Martins, L.S. St John, and K.V. Komanduri. 2004. Aspergillus fumigatus suppresses the human cellular immune response via gliotoxin-mediated apoptosis of monocytes. Blood 105:2258–2265.
33 Archer, D.B., and P.S. Dyer. 2004. From genomics to post-genomics in Aspergillus. Curr. Opin. Microbiol. 7:499–504.
34 Anderson, M.J., J.L. Brookman, and D.W. Denning. 2003. Aspergillus. In: Genomics of Plants and Fungi. R.A. Prade and H.J. Bohnert, editors. Marcel Dekker, New York. 1–39.
35 Poggeler, S. 2002. Genomic evidence for mating abilities in the asexual pathogen Aspergillus fumigatus. Curr. Genet. 42:153–160.
36 Varga, J., and B. Toth. 2003. Genetic variability and reproductive mode of Aspergillus fumigatus. Infect. Genet. Evol. 3:3–17.
37 Varga, J. 2003. Mating type gene homologues in Aspergillus fumigatus. Microbiology 149:816–819.
38 Perfect, J.R., and A. Casadevall. 2002. Cryptococcosis. Infect. Dis. Clin. North Am. 16:837–874, v–vi.
39 Hull, C.M., and J. Heitman. 2002. Genetics of Cryptococcus neoformans. Annu. Rev. Genet. 36:557–615.
40 Loftus, B.J., E. Fung, P. Roncaglia, D. Rowley, P. Amedeo, D. Bruno, J. Vamathevan, M. Miranda, I.J. Anderson, J.A. Fraser, et al. 2005. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science 307:1321–1324.
41 Gorlach, J.M., H.C. McDade, J.R. Perfect, and G.M. Cox. 2002. Antisense repression in Cryptococcus neoformans as a laboratory tool and potential antifungal strategy. Microbiology 148:213–219.
42 Liu, H., T.R. Cottrell, L.M. Pierini, W.E. Goldman, and T.L. Doering. 2002. RNA interference in the pathogenic fungus Cryptococcus neoformans. Genetics 160:463–470.
43 Stringer, S.L., J.R. Stringer, M.A. Blase, P.D. Walzer, and M.T. Cushion. 1989. Pneumocystis carinii: sequence from ribosomal RNA implies a close relationship with fungi. Exp. Parasitol. 68:450–461.
44 Cushion, M.T. 2004. Pneumocystis: unraveling the cloak of obscurity. Trends Microbiol. 12:243–249.
45 Vargas, S.L., W.T. Hughes, M.E. Santolaya, A.V. Ulloa, C.A. Ponce, C.E. Cabrera, F. Cumsille, and F. Gigliotti. 2001. Search for primary infection by Pneumocystis carinii in a cohort of normal, healthy infants. Clin. Infect. Dis. 32:855–861.
46 Icenhour, C.R., S.L. Rebholz, M.S. Collins, and M.T. Cushion. 2001. Widespread occurrence of Pneumocystis carinii in commercial rat colonies detected using targeted PCR and oral swabs. J. Clin. Microbiol. 39:3437–3441.
47 Telzak, E.E., R.J. Cote, J.W. Gold, S.W. Campbell, and D. Armstrong. 1990. Extrapulmonary Pneumocystis carinii infections. Rev. Infect. Dis. 12:380–386.
48 Cushion, M.T. 1998. Pneumocystis carinii. In: Topley and Wilson’s Microbiology and Microbial Infections. L. Collier, A. Balows, and M. Sussman, editors. Arnold & Oxford University Press, New York. 645–683.
49 Cushion, M.T. 2004. Comparative genomics of Pneumocystis carinii with other protists: implications for life style. J. Eukaryot. Microbiol. 51:30–37.
50 Giuntoli, D., S.L. Stringer, and J.R. Stringer. 1994. Extraordinarily low number of ribosomal RNA genes in P. carinii. J. Eukaryot. Microbiol. 41:88S.
51 Köhler, G., J. Morschhäuser, and J. Hacker. 2002. Genome structure of pathogenic fungi. Curr. Top. Microbiol. Immunol. 264:149–166.
52 Cushion, M.T., M. Kaselis, S.L. Stringer, and J.R. Stringer. 1993. Genetic stability and diversity of Pneumocystis carinii infecting rat colonies. Infect. Immun. 61:4801–4813.
53 Stringer, J.R., and S.P. Keely. 2001. Genetics of surface antigen expression in Pneumocystis carinii. Infect. Immun. 69:627–639.
54 Stringer, J.R., and M.T. Cushion. 1998. The genome of Pneumocystis carinii. FEMS Immunol. Med. Microbiol. 22:15–26.
55 Franzen, C., and A. Muller. 2001. Microsporidiosis: human diseases and diagnosis. Microbes Infect. 3:389–400.
56 Hirt, R.P., J.M. Logsdon, Jr., B. Healy, M.W. Dorey, W.F. Doolittle, and T.M. Embley. 1999. Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. U.S.A. 96:580–585.
57 Keeling, P.J., and C.H. Slamovits. 2004. Simplicity and complexity of microsporidian genomes. Eukaryot. Cell 3:1363–1369.
58 Biderre, C., M. Pages, G. Metenier, D. David, J. Bata, G. Prensier, and C.P. Vivares. 1994. On small genomes in eukaryotic organisms: molecular karyotypes of two microsporidian species (Protozoa) parasites of vertebrates. C. R. Acad. Sci. III. 317:399–404.
59 Peyretaillade, E., C. Biderre, P. Peyret, F. Duffieux, G. Metenier, M. Gouy, B. Michot, and C.P. Vivares. 1998. Microsporidian Encephalitozoon cuniculi, a unicellular eukaryote with an unusual chromosomal dispersion of ribosomal genes and a LSU rRNA reduced to the universal core. Nucleic Acids Res. 26:3513–3520.
60 Katinka, M.D., S. Duprat, E. Cornillot, G. Metenier, F. Thomarat, G. Prensier, V. Barbe, E. Peyretaillade, P. Brottier, P. Wincker, et al. 2001. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414:450–453.
61 Slamovits, C.H., N.M. Fast, J.S. Law, and P.J. Keeling. 2004. Genome compaction and stability in microsporidian intracellular parasites. Curr. Biol. 14:891–896.
62 Santos, M.A., T. Ueda, K. Watanabe, and M.F. Tuite. 1997. The non-standard genetic code of Candida spp.: an evolving genetic code or a novel mechanism for adaptation? Mol. Microbiol. 26:423–431.
63 Ewing, B., and P. Green. 1998. Base-calling of automated sequencer traces using PHRED. II. Error probabilities. Genome Res. 8:186–194.
64 Ewing, B., L. Hillier, M.C. Wendl, and P. Green. 1998. Base-calling of automated sequencer traces using PHRED. I. Accuracy assessment. Genome Res. 8:175–185.
65 Hull, C.M., and A.D. Johnson. 1999. Identification of a mating type-like locus in the asexual pathogenic yeast Candida albicans. Science 285:1271–1275.
66 Pinjon, E., D. Sullivan, I. Salkin, D. Shanley, and D. Coleman. 1998. Simple, inexpensive, reliable method for differentiation of Candida dubliniensis from Candida albicans. J. Clin. Microbiol. 36:2093–2095.
67 Alves, S.H., E.P. Milan, P. de Laet Sant’Ana, L.O. Oliveira, J.M. Santurio, and A.L. Colombo. 2002. Hypertonic sabouraud broth as a simple and powerful test for Candida dubliniensis screening. Diagn. Microbiol. Infect. Dis. 43:85–86.
68 Moran, G., C. Stokes, S. Thewes, B. Hube, D.C. Coleman, and D. Sullivan. 2004. Comparative genomics using Candida albicans DNA microarrays reveals absence and divergence of virulence-associated genes in Candida dubliniensis. Microbiology 150:3363–3382.
69 Felk, A., M. Kretschmar, A. Albrecht, M. Schaller, S. Beinhauer, T. Nichterlein, D. Sanglard, H.C. Korting, W. Schafer, and B. Hube. 2002. Candida albicans hyphal formation and the expression of the Efg1-regulated proteinases Sap4 to Sap6 are required for the invasion of parenchymal organs. Infect. Immun. 70:3689–3700.
70 Sanglard, D., B. Hube, M. Monod, F.C. Odds, and N.A. Gow. 1997. A triple deletion of the secreted aspartyl proteinase genes SAP4, SAP5, and SAP6 of Candida albicans causes attenuated virulence. Infect. Immun. 65:3539–3546.
71 Hitchcock, C.A., G.W. Pye, P.F. Troke, E.M. Johnson, and D.W. Warnock. 1993. Fluconazole resistance in Candida glabrata. Antimicrob. Agents Chemother. 37:1962–1965.
72 Wong, S., M.A. Fares, W. Zimmermann, G. Butler, and K.H. Wolfe. 2003. Evidence from comparative genomics for a complete sexual cycle in the ‘asexual’ pathogenic yeast Candida glabrata. Genome Biol. 4:R10.
73 Dujon, B., D. Sherman, G. Fischer, P. Durrens, S. Casaregola, I. Lafontaine, J. De Montigny, C. Marck, C. Neuveglise, E. Talla, et al. 2004. Genome evolution in yeasts. Nature 430:35–44.
74 Barnes, C., J. Tuck, S. Simon, F. Pacheco, F. Hu, and J. Portnoy. 2001. Allergenic materials in the house dust of allergy clinic patients. Ann. Allergy Asthma Immunol. 86:517–523.
75 Lan, C.Y., G. Newport, L.A. Murillo, T. Jones, S. Scherer, R.W. Davis, and N. Agabian. 2002. Metabolic specialization associated with phenotypic switching in Candida albicans. Proc. Natl. Acad. Sci. U.S.A. 99:14907–14912.
76 Lan, C.Y., G. Rodarte, L.A. Murillo, T. Jones, R.W. Davis, J. Dungan, G. Newport, and N. Agabian. 2004. Regulatory networks affected by iron availability in Candida albicans. Mol. Microbiol. 53:1451–1469.
77 Nantel, A., D. Dignard, C. Bachewich, D. Harcus, A. Marcil, A.P. Bouin, C.W. Sensen, H. Hogues, M. van het Hoog, P. Gordon, et al. 2002. Transcription profiling of Candida albicans cells undergoing the yeast-to-hyphal transition. Mol. Biol. Cell 13:3452–3465.
78 Harcus, D., A. Nantel, A. Marcil, T. Rigby, and M. Whiteway. 2004. Transcription profiling of cyclic AMP signaling in Candida albicans. Mol. Biol. Cell 15:4490–4499.
79 Bensen, E.S., S.J. Martin, M. Li, J. Berman, and D.A. Davis. 2004. Transcriptional profiling in Candida albicans reveals new adaptive responses to extracellular pH and functions for Rim101p. Mol. Microbiol. 54:1335–1351.
80 Garcia-Sanchez, S., S. Aubert, I. Iraqui, G. Janbon, J.M. Ghigo, and C. d’Enfert. 2004. Candida albicans biofilms: a developmental state associated with specific and stable gene expression patterns. Eukaryot. Cell 3:536–545.
81 Mahan, M.J., J.W. Tobias, J.M. Slauch, P.C. Hanna, R.J. Collier, and J.J. Mekalanos. 1995. Antibiotic-based selection for bacterial genes that are specifically induced during infection of a host. Proc. Natl. Acad. Sci. U.S.A. 92:669–673.
82 Slauch, J.M., M.J. Mahan, and J.J. Mekalanos. 1994. In vivo expression technology for selection of bacterial genes specifically induced in host tissues. Methods Enzymol. 235:481–492.
83 Staib, P., M. Kretschmar, T. Nichterlein, G. Köhler, S. Michel, H. Hof, J. Hacker, and J. Morschhäuser. 1999. Host-induced, stage-specific virulence gene activation in Candida albicans during infection. Mol. Microbiol. 32:533–546.
84 Retallack, D.M., G.S. Deepe, Jr., and J.P. Woods. 2000. Applying in vivo expression technology (IVET) to the fungal pathogen Histoplasma capsulatum. Microb. Pathog. 28:169–182.
85 Staib, P., M. Kretschmar, T. Nichterlein, H. Hof, and J. Morschhäuser. 2000. Differential activation of a Candida albicans virulence gene family during infection. Proc. Natl. Acad. Sci. U.S.A. 97:6102–6107.
86 Staib, P., M. Kretschmar, T. Nichterlein, H. Hof, and J. Morschhäuser. 2002. Host versus in vitro signals and intrastrain allelic differences in the expression of a Candida albicans virulence gene. Mol. Microbiol. 44:1351–1366.
87 Falkow, S. 1988. Molecular Koch’s postulates applied to microbial pathogenicity. Rev. Infect. Dis. 10 Suppl 2:S274–S276.
88 Köhler, G.A., T.C. White, and N. Agabian. 1997. Overexpression of a cloned IMP dehydrogenase gene of Candida albicans confers resistance to the specific inhibitor mycophenolic acid. J. Bacteriol. 179:2331–2338.
89 Toffaletti, D.L., T.H. Rude, S.A. Johnston, D.T. Durack, and J.R. Perfect. 1993. Gene transfer in Cryptococcus neoformans by use of biolistic delivery of DNA. J. Bacteriol. 175:1405–1411.
90 Abuodeh, R.O., M.J. Orbach, M.A. Mandel, A. Das, and J.N. Galgiani. 2000. Genetic transformation of Coccidioides immitis facilitated by Agrobacterium tumefaciens. J. Infect. Dis. 181:2106–2110.
91 Sullivan, T.D., P.J. Rooney, and B.S. Klein. 2002. Agrobacterium tumefaciens integrates transfer DNA into single chromosomal sites of dimorphic fungi and yields homokaryotic progeny from multinucleate yeast. Eukaryot. Cell 1:895–905.
92 Magee, P.T., C. Gale, J. Berman, and D. Davis. 2003. Molecular genetic and genomic approaches to the study of medically important fungi. Infect. Immun. 71:2299–2309.
93 Staab, J.F., and P. Sundstrom. 2003. URA3 as a selectable marker for disruption and virulence assessment of Candida albicans genes. Trends Microbiol. 11:69–73.
94 De Backer, M.D., B. Nelissen, M. Logghe, J. Viaene, I. Loonen, S. Vandoninck, R. de Hoogt, S. Dewaele, F.A. Simons, P. Verhasselt, et al. 2001. An antisense-based functional genomics approach for identification of genes critical for growth of Candida albicans. Nat. Biotechnol. 19:235–241.
95 Mouyna, I., C. Henry, T.L. Doering, and J.P. Latge. 2004. Gene silencing with RNA interference in the human pathogenic fungus Aspergillus fumigatus. FEMS Microbiol. Lett. 237:317–324.
96 Noble, S.M., and A.D. Johnson. 2005. Strains and strategies for large-scale gene deletion studies of the diploid human fungal pathogen Candida albicans. Eukaryot. Cell 4:298–309.
97 Iraqui, I., S. Garcia-Sanchez, S. Aubert, F. Dromer, J.M. Ghigo, C. d’Enfert, and G. Janbon. 2005. The Yak1p kinase controls expression of adhesins and biofilm formation in Candida glabrata in a Sir4p-dependent pathway. Mol. Microbiol. 55:1259–1271.
98 Shea, J.E., J.D. Santangelo, and R.G. Feldman. 2000. Signature-tagged mutagenesis in the identification of virulence genes in pathogens. Curr. Opin. Microbiol. 3:451–458.
99 Nelson, R.T., J. Hua, B. Pryor, and J.K. Lodge. 2001. Identification of virulence mutants of the fungal pathogen Cryptococcus neoformans using signature-tagged mutagenesis. Genetics 157:935–947.
100 Cormack, B.P., N. Ghori, and S. Falkow. 1999. An adhesin of the yeast pathogen Candida glabrata mediating adherence to human epithelial cells. Science 285:578–582.
101 Casadevall, A. 2005. Fungal virulence, vertebrate endothermy, and dinosaur extinction: is there a connection? Fungal Genet. Biol. 42:98–106.
102 Casadevall, A., J.N. Steenbergen, and J.D. Nosanchuk. 2003. ‘Ready made’ virulence and ‘dual use’ virulence factors in pathogenic environmental fungi – the Cryptococcus neoformans paradigm. Curr. Opin. Microbiol. 6:332–337.
103 Steenbergen, J.N., J.D. Nosanchuk, S.D. Malliaris, and A. Casadevall. 2003. Cryptococcus neoformans virulence is enhanced after growth in the genetically malleable host Dictyostelium discoideum. Infect. Immun. 71:4862–4872.
104 Steenbergen, J.N., H.A. Shuman, and A. Casadevall. 2001. Cryptococcus neoformans interactions with amoebae suggest an explanation for its virulence and intracellular pathogenic strategy in macrophages. Proc. Natl. Acad. Sci. U.S.A. 98:15245–15250.
105 Nittler, M.P., D. Hocking-Murray, C.K. Foo, and A. Sil. 2005. Identification of Histoplasma capsulatum transcripts induced in response to reactive nitrogen species. Mol. Biol. Cell:10.1091/mbc.E1005-1005-0434.
106 Braun, B.R., M. van Het Hoog, C. d’Enfert, M. Martchenko, J. Dungan, A. Kuo, D.O. Inglis, M.A. Uhl, H. Hogues, M. Berriman, et al. 2005. A human-curated annotation of the Candida albicans genome. PLoS Genet. 1:36–57.
107 Uhl, M.A., M. Biery, N. Craig, and A.D. Johnson. 2003. Haploinsufficiency-based large-scale forward genetic analysis of filamentous growth in the diploid human fungal pathogen C. albicans. EMBO J. 22:2668–2678.

19 Genomics of Pathogenic Parasites

Gabriele Pradel and Thomas James Templeton

Human diseases caused by protozoans represent a large and neglected global economic and health burden, affecting hundreds of millions of children and adults [1]. Currently, no vaccine or immunotherapy regimen is in use for the treatment of any human parasitic infection, and pharmaceutical approaches increasingly encounter parasite drug resistance. The recent sequencing of several protozoan genomes has provided the opportunity to characterize novel antigens and metabolic enzymes that are essential for the parasite life cycle and might lead to the development of new therapeutic targets. Genome sequences have been completed and published for Plasmodium sp. [2, 3] and Cryptosporidium sp. [4, 5], and genome sequencing projects are ongoing for other parasitic protists such as the apicomplexans Toxoplasma, Theileria, and Eimeria, the kinetoplastids Leishmania and Trypanosoma spp., the diplomonad Giardia, and the parabasalid Trichomonas (Table 19.1). This review seeks to introduce the basic biology of the most important pathogenic protozoans, that is, those parasites that have a significant global impact on human health or are important as laboratory research systems, followed by an outline of the status of their respective genome sequencing projects and postgenomic research.

Table 19.1 Genome sequence resources of parasitic protists.

Parasite | Web address | Source
Plasmodium sp. | www.plasmodb.org, www.sanger.ac.uk/Projects/Protozoa/, www.cbil.upenn.edu/apidots/, www.tigr.org/tdb/parasites/ | Genome sequences, shotgun, ESTs, GSSs, SAGEs, microarrays, annotations
Cryptosporidium sp. | cryptodb.org | Genome sequences, ESTs, GSSs
Toxoplasma gondii | www.toxodb.org, www.sanger.ac.uk/Projects/Protozoa/, www.cbil.upenn.edu/apidots/ | ESTs, shotgun
Gregarina niphandrodes | www.cbil.upenn.edu/apidots/ | ESTs
Sarcocystis neurona | www.cbil.upenn.edu/apidots/ | ESTs
Neospora caninum | www.cbil.upenn.edu/apidots/ | ESTs
Eimeria tenella | www.sanger.ac.uk/Projects/Protozoa/, www.cbil.upenn.edu/apidots/ | Shotgun
Babesia bovis | www.sanger.ac.uk/Projects/Protozoa/, www.cbil.upenn.edu/apidots/ | ESTs
Theileria annulata | www.genedb.org, www.sanger.ac.uk/Projects/Protozoa/ | Shotgun
Leishmania spp. | www.genedb.org, www.sanger.ac.uk/Projects/Protozoa/ | Genome sequence data, shotgun
Trypanosoma cruzi | Tcruzidb.org, www.tigr.org/tdb/parasites/ | Genome sequence data, shotgun, ESTs, GSSs, annotations
Trypanosoma brucei, T. vivax, T. congolense | www.genedb.org, www.sanger.ac.uk/Projects/Protozoa/, www.tigr.org/tdb/parasites/ | Genome sequence data, shotgun, ESTs, GSSs, annotations
Entamoeba spp. | www.genedb.org, www.sanger.ac.uk/Projects/Protozoa/, www.tigr.org/tdb/parasites/ | Shotgun
Trichomonas vaginalis | www.tigr.org/tdb/parasites/ | Shotgun
Giardia lamblia | www.mbl.edu/Giardia | Genome sequence, annotation, SAGEs

19.1 Exploring the Genomes of Pathogenic Protozoans

Many parasitic protozoan genomes are relatively small, ranging from approximately 9 Mbp in Theileria and Cryptosporidium to upwards of 170 Mbp in Trichomonas vaginalis (Table 19.2), which facilitates the sequencing of multiple new organisms. In concert with knowledge of complete genome sequences, gene expression profiling has emerged as an important tool for studying gene function in genetically intractable organisms and has transformed the traditional “gene-by-gene” analyses originating from functional cloning methodologies. Genome sequence information itself carries considerable biological significance, and data concerning DNA content, gene organization, and chromosomal composition are therefore informative. Nevertheless, the definition of functional pathways using bioinformatics tools (e.g., the Basic Local Alignment Search Tool (BLAST)), microarrays, and proteomics, together with confirmation of function via gene disruption and transfection of experimental chimeras, will have a dramatic impact on exploring parasite biology. The further challenge, however, will be the translation of this knowledge into alleviation of disease.

Table 19.2 Genome sizes of parasitic protists.

Parasite | Genome size | Number of chromosomes
Cryptosporidium parvum | 9 Mbp | 8
Theileria annulata | 10 Mbp | 4
Giardia lamblia | 12 Mbp | 5
Entamoeba histolytica | 20 Mbp | 14
Plasmodium falciparum | 23 Mbp | 14
Leishmania major | 34 Mbp | 36
Trypanosoma cruzi | 40 Mbp | Unknown
Eimeria tenella | 60 Mbp | 14
Toxoplasma gondii | estimated 80 Mbp | 11
Trichomonas vaginalis | estimated 170 Mbp | 6
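
As a rough illustration, and not part of the original analysis, the values in Table 19.2 can be held in a small Python dictionary to recover the size range quoted in the text and to derive a mean chromosome size. The helper names (size_range, mean_chromosome_size) are hypothetical, and the mean chromosome size is only a crude derived figure, since chromosomes within one species differ widely in length.

```python
# Genome sizes (Mbp) and chromosome counts transcribed from Table 19.2.
# None marks the chromosome count listed as "Unknown"; sizes flagged as
# "estimated" in the table are stored here as plain numbers.
GENOMES = {
    "Cryptosporidium parvum": (9, 8),
    "Theileria annulata": (10, 4),
    "Giardia lamblia": (12, 5),
    "Entamoeba histolytica": (20, 14),
    "Plasmodium falciparum": (23, 14),
    "Leishmania major": (34, 36),
    "Trypanosoma cruzi": (40, None),
    "Eimeria tenella": (60, 14),
    "Toxoplasma gondii": (80, 11),
    "Trichomonas vaginalis": (170, 6),
}


def size_range(genomes):
    """Return (smallest, largest) genome size in Mbp."""
    sizes = [size for size, _ in genomes.values()]
    return min(sizes), max(sizes)


def mean_chromosome_size(genomes):
    """Mean chromosome size in Mbp for species with a known chromosome count."""
    return {
        name: size / count
        for name, (size, count) in genomes.items()
        if count is not None
    }


print(size_range(GENOMES))
for name, mean_size in mean_chromosome_size(GENOMES).items():
    print(f"{name}: {mean_size:.2f} Mbp per chromosome")
```

Species without a known chromosome count are simply left out of the derived figure rather than guessed.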

To facilitate data dissemination and utilization, most publicly funded genome sequencing efforts, as well as projects generating expressed sequence tags (ESTs) and genome survey sequences (GSSs), provide download sites for data as they are generated, either at their own website or via GenBank (Table 19.1). Additionally, web-based project sites typically post periodic summaries of significant similarity (BLAST) hits between their data and GenBank databases, and offer web-based BLAST searches of their sequences to facilitate gene identification and discovery. For example, the genome sequence data for Plasmodium malaria parasites are presented on a website, http://www.PlasmoDB.org [6–8], that maintains data from several Plasmodium species. In addition to BLAST search engines, the site permits gene retrieval via text-based searches, gene predictions, microsatellite marker mapping, ePCR, and searches based upon gene features or user-defined motifs. Similar databases for other parasitic protists, e.g., http://www.ToxoDB.org [9], http://TcruziDB.org [10], and http://CryptoDB.org [11], are additional valuable resources (Table 19.1). A comparative genome database, ApiDots (www.cbil.upenn.edu/apidots/), provides integrated access to publicly available mRNA/EST sequences and was constructed for the systematic analysis of apicomplexan genomes [12, 13]. The database currently incorporates ESTs from several parasite species of clinical and/or veterinary interest, including Eimeria tenella, Neospora caninum, Plasmodium falciparum, Sarcocystis neurona, Gregarina niphandrodes, and Toxoplasma gondii. Additional integrated parasite databases are provided by the Wellcome Trust Sanger Institute and The Institute for Genomic Research (TIGR; for web addresses see Table 19.1).
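
The idea behind a search for a user-defined motif can be sketched in a few lines of Python. This is a minimal illustration only and does not reproduce how PlasmoDB or any other database implements its search; the function name find_motif and the toy sequences are invented for the example, and the motif is assumed to be written as a Python regular expression.

```python
import re


def find_motif(proteins, pattern):
    """Return {protein_id: [(start, matched_text), ...]} for every hit.

    `pattern` uses Python regular-expression syntax. Start positions are
    1-based. Matches are non-overlapping, as returned by re.finditer.
    """
    regex = re.compile(pattern)
    hits = {}
    for protein_id, sequence in proteins.items():
        found = [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]
        if found:
            hits[protein_id] = found
    return hits


# Hypothetical toy sequences, invented for illustration only.
toy_proteins = {
    "toy_protein_1": "MKRGDLLNASTV",
    "toy_protein_2": "MSSLLKKA",
    "toy_protein_3": "MNGTRGDNKSA",
}

# Example query: the tripeptide Arg-Gly-Asp (RGD).
print(find_motif(toy_proteins, "RGD"))
```

A character class such as "N[^P][ST]" can be passed in the same way to express a degenerate motif.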

19.2 The Shaping of the Proteomes of the Pathogenic Protists

The availability of numerous complete genome sequences will allow phylogenetic studies on a whole-genome basis, thus improving our understanding of the radiation of the protozoans and their relationships to the metazoans. The protozoans as a group might be envisioned as deeply branched lineages of free-living protozoans, from which parasitic members have independently evolved along multiple branches (Fig. 19.1). The branching of the protozoan lineages, as well as their divergence from the crown group of multicellular eukaryotes, is at present poorly defined; at their extremes, models posit either an explosive “big bang” radiation from a primordial eukaryote (with a generally unresolvable phylogeny and no clear outgroup) or an orderly succession of deeply branching lineages [14–19]. An intuitive grouping places the plants and metazoans together as a crown group distinct from protozoans, and this is supported with respect to the apicomplexans plus Giardia by analyses of concatenated amino acid sequences afforded by whole-genome sequence data [20]. It is hoped that the imminently available kinetoplastid genome sequence information, as well as information from other free-living and parasitic protists, will further refine our understanding of the phylogeny of the protozoans.

Fig. 19.1 Phylogenetic tree showing the relative positions of the important pathogenic protozoans, their notable free-living relatives (indicated by a star), and the crown group composed of plants, fungi, slime molds, and animals. The dashed line at the base of the tree indicates a proposed genome fusion event creating the protozoan lineage ([145], reviewed in Ref. [146]). Question marks refer to unresolved nodes (after Refs. [19, 39]), and the overall tree is drawn in part from Refs. [17, 20].

All parasitic protozoans had free-living ancestors; for example, an ancestor common to Leishmania and Euglena, or common to Tetrahymena and Plasmodium. Whereas the parasitic protozoans engage in an elaborate interplay in evading the innate and adaptive host immune responses (see Ref. [21] for a superb recent discussion), the free-living protozoans are driven by environmental pressures or by evolutionary selection stemming from predation and prey. In fact, it has been proposed that the origin of multicellularity was an invention conferring advantage against predation [22, 23], and in turn the increasing complexity of the metazoans opened the opportunity for the numerous independent inventions of parasitism. There might be more than mere irony in these observations, because cellular mechanisms underpinning avoidance of predation, mediating search for prey, or sensing other environmental changes might have created cellular mechanisms that later facilitated the avoidance of an immune response. A foremost example for study in this regard is the possible relationship between antigenic variation in the free-living ciliates and in the related alveolates, the pathogenic apicomplexans (Fig. 19.2). It is therefore of keen interest to compare the proteomes, particularly the protein-trafficking machineries and repertoires of surface proteins, encoded by the soon-to-be-completed genome sequence of the ciliate Tetrahymena with the available complete genome sequences of the phylogenetically related apicomplexans Plasmodium and Cryptosporidium, as well as to initiate new genome sequencing projects for free-living protozoans.
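
The pooling of amino acid sequences into concatenated alignments, mentioned earlier in this section, can be illustrated with a minimal Python sketch. It is not the method of Ref. [20]: it only joins per-protein alignments taxon by taxon and computes a simple uncorrected pairwise distance (the fraction of differing sites), whereas real analyses rely on dedicated phylogenetic software and substitution models. The taxon names and sequences are hypothetical toy data, and each per-protein alignment is assumed to be already aligned.

```python
from itertools import combinations


def concatenate(alignments, taxa):
    """Join per-protein alignments into one long sequence per taxon."""
    return {taxon: "".join(aln[taxon] for aln in alignments) for taxon in taxa}


def p_distance(seq_a, seq_b):
    """Fraction of differing sites, skipping columns with a gap in either."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignments, taxa):
    """Pairwise p-distances computed on the concatenated alignment."""
    concat = concatenate(alignments, taxa)
    return {
        (s, t): p_distance(concat[s], concat[t])
        for s, t in combinations(taxa, 2)
    }


# Hypothetical toy data: two short aligned "proteins" for three taxa.
taxa = ["taxon_A", "taxon_B", "taxon_C"]
toy_alignments = [
    {"taxon_A": "MKT-L", "taxon_B": "MKTAL", "taxon_C": "MRSAL"},
    {"taxon_A": "GGH", "taxon_B": "GGH", "taxon_C": "GAH"},
]

for pair, dist in distance_matrix(toy_alignments, taxa).items():
    print(pair, round(dist, 3))
```

Pooling many proteins in this way increases the number of compared sites, which is the motivation for using whole-genome data to resolve deep branches.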

19.3 Role of Lateral Gene Transfer in Protozoan Genome Plasticity

The adaptation of pathogenic protozoans to specific parasitic niches required the emergence, or “invention,” of novel metabolic pathways and extracellular proteins. In turn, adaptation to the host environment affords the opportunity to eliminate select pathways rendered obsolete via metabolic scavenging from the host, thereby creating a highly evolved “streamlined” parasite proteome. Acquisition of new metabolic pathways might have occurred following endosymbiotic events and transfer of select genes to the nuclear genome, as evidenced by the many examples of bacterial-like genes in protozoan genomes [20, 24, 25] and by the shuttling of organelle-encoded genes to the nuclear genome, as is known to have occurred extensively from mitochondrial, chloroplast, and apicoplast genomes [20, 26]. For example, the bacterial component of apicomplexans might be largely attributable to the cyanobacterial endosymbiont component [27–31]. In addition, parasites have probably captured discrete genes via multiple lateral transfer events, perhaps by “sampling” of foreign DNA within their intracellular parasitic niche [32]. Although bacteria and archaea were relatively liberal in exchanging genetic information, in the eukaryotic lineages the extent of isolated lateral gene transfer, as opposed to gene acquisition via endosymbiotic events, is unknown. Evidence suggests that within extracellular proteins the apicomplexans have acquired animal- and bacterial-like domains via lateral transfer [20], and to a much greater extent than the kinetoplastids and Giardia (Fig. 19.2); this hypothesis can be tested via whole-genome annotations and phylogenetic analyses.

As might be anticipated, the most rapidly evolving functional protein class is composed of extracellular receptors mediating adhesion, recognition, and response to host cells and tissues, and protection from the environment [20]. The catalogs of surface proteins have been largely invented in a lineage-specific fashion; that is, it is unlikely that a kinetoplastid such as Trypanosoma will possess more than just a few surface proteins sharing a common vertical ancestor with those of the apicomplexan Plasmodium, and likewise Plasmodium and Cryptosporidium share few surface proteins (Fig. 19.1; see Section 19.4). It is hoped that the role of isolated lateral transfer events, as opposed to bulk acquisition of new genes via endosymbiotic events, will be better understood through whole-genome sequence annotations and comparative genomics. Regardless of the origin of captured genes, via these mechanisms parasites probably acquire new metabolic capacities conferring specific adaptations.

19.4 The Apicomplexa

The Apicomplexa have in common with dinoflagellates and ciliates (e.g., Tetrahymena, Paramecium) a subpellicular network of flattened vesicles (“alveoli”) that are the ultrastructural justification for their inclusion in a common protozoan clade and the inspiration for the grouping name, Alveolata. This structural taxonomy is supported by phylogenetic analysis of small subunit rRNA, as well as by combined protein sequence phylogenies. The evolutionary origin of the Apicomplexa within the Alveolata is unknown, but it is thought that they share ancestry with dinoflagellate parasites of marine polychaetes. All apicomplexans are parasitic and possess a unique apical secretory complex and specialized secretory organelles, termed rhoptries and micronemes, mediating locomotion, tissue disruption, and invasion of target host cells. Evolutionary landmarks in the radiation of the Apicomplexa remain speculative but resulted in specialization to parasites of invertebrate, vertebrate, and insect hosts, as well as complex life cycles involving multiple hosts, including insect transmission vectors. The Apicomplexa can be generally classified into five groups: eugregarines, haemogregarines, haemosporines (e.g., Plasmodium, Haemoproteus), eimeriorines (the coccidia, e.g., Toxoplasma, Sarcocystis, Eimeria), and piroplasms (e.g., Babesia, Theileria).

As complete apicomplexan genome sequences are acquired and incorporated into comparative analyses, a better understanding will emerge of the evolutionary adaptations and functional relationships of the apicomplexans, such as metabolic pathways associated with parasitic niches, the molecular basis of different host cell invasion mechanisms and tissue tropisms (e.g., hepatocytes, erythrocytes, and insect tissues in Plasmodium versus lymphocytes and tick tissues in Theileria), and differentially present cellular structures such as flagellated gametes (present in Plasmodium but not in Cryptosporidium), the parasitophorous vacuole (absent in Theileria), and extracellular cyst stages (present in Cryptosporidium and Toxoplasma but not in Plasmodium). The genome sequences for Plasmodium sp. and Cryptosporidium sp. are currently complete, and it is anticipated that the next year or two will see the completion of genome sequences for Toxoplasma gondii, an opportunistic pathogen causing cerebral toxoplasmosis in immunocompromised individuals, and the cattle pathogen Theileria annulata, as well as largely complete DNA sequencing data for the economically important chicken pathogen Eimeria sp. In addition, the pending availability of complete genome sequence information for the alveolate ciliate Tetrahymena will hopefully shed light on evolutionary adaptations that have their origins in the distinction between free-living and parasitic lifestyles. Following are thumbnail sketches of genome projects for the important apicomplexan human pathogens Plasmodium, Cryptosporidium, and Toxoplasma.

19.4.1 Plasmodium, the Malaria Parasite

The tropical disease malaria is caused by four species of the apicomplexan parasite Plasmodium, and is transmitted by anopheline mosquito vectors. Approximately 40% of the world’s population lives in areas where malaria is transmitted, and with an estimated 300–500 million cases every year and an annual death toll of more than two million people, malaria is still considered the most significant tropical disease. Mortality is greatest in sub-Saharan Africa, where children under 5 years of age account for 90% of all deaths due to malaria [33]. In most endemic countries the development of drug and vaccine treatments, as well as of disease control measures, is undermined by the spread of drug resistance, extreme poverty, and the collapse of health services under the impact of HIV or war.

An international effort was launched in 1996 to determine the complete genome sequence of P. falciparum, with the expectation that this database would open new avenues for research. The project took 6 years to complete, a remarkable achievement considering the sequencing and assembly complications stemming from the extreme A+T-richness of the genome sequence (up to 86% A+T in noncoding regions), and the sequence was unveiled in 2002 by Gardner and co-workers [3]. The 22.8-Mbp genome of P. falciparum comprises 14 linear chromosomes, varying in size from 0.643 Mbp to 3.29 Mbp, as well as a circular 35-kbp plastid-like genome and a 6-kbp linear mitochondrial genome. Annotation of the genome predicted 5268 proteins, 60% of which lack a known function or a homologue in any other organism and are defined as hypothetical proteins. These hypothetical proteins either mediate functions that are unique to Plasmodium (e.g., surface proteins involved in antigenic variation, described further below, or proteins involved in trafficking membranes and structures unique to Plasmodium), are conserved proteins of unknown function, or have diverged sufficiently to be unrecognized by simple BLAST searches.

Metabolic pathways are of particular interest for identifying targets for new antiparasitic drug therapies. Analysis of the malaria genome sequence has provided insight into the P. falciparum metabolic pathways, with 14% of proteins being identified as enzymes [3]. In the erythrocytic stages, P. falciparum relies on anaerobic glycolysis for energy production, with regeneration of NAD+ by conversion of pyruvate to lactate. All genes encoding the essential enzymes are now identified, as well as genes necessary for a complete TCA cycle. Genome annotation has also confirmed our understanding of examples of metabolic streamlining; for example, the parasite is incapable of de novo purine synthesis or the synthesis of most amino acids [3], and is reliant on salvage from the host.

Genome sequence information provides a wealth of knowledge concerning protozoan organelles having an endosymbiotic origin. For example, protozoans within the phylum Apicomplexa, as well as related dinoflagellates and ciliates, harbor a plastid organelle homologous to the chloroplasts of plants and algae [34–36].
This apicoplast probably arose through a process of secondary endosymbiosis [37, 38] and, although the full extent of its function is unknown, it appears to be essential for the anabolic synthesis of fatty acids, isoprenoids, and heme. The circular 35-kbp apicoplast genome encodes only 30 proteins, and thus the apicoplast proteome is supplemented by as many as 500 nuclear-encoded proteins (10% of the predicted parasite proteome) that are trafficked into the organelle. Because of its important role in the parasite’s survival, the apicoplast is a promising target for drug treatment.

Surface-expressed P. falciparum proteins mediate adhesive and recognition interactions with the host and are important for malaria pathogenesis; they also constitute catalogs of candidate targets for vaccine development, owing to their potential exposure to human immune system surveillance. Whole-genome sequence comparisons of the catalog of surface protein-encoding genes have demonstrated an extensive role of lineage-specific gene invention (e.g., genes present in Plasmodium but absent in other apicomplexan genera) and paralogous gene expansion [3, 39]. Prominent among these are multigene families such as the variant gene family (var) clustered in the subtelomeric regions of the P. falciparum genome, as well as a second subtelomeric repetitive gene superfamily termed rifin/stevor (Fig. 19.2). The P. falciparum 3D7 isolate genome (the source DNA for the genome sequence project) contains 59 var and 179 rifin/stevor genes (numbers include apparent pseudogenes and gene truncations). The var genes encode the erythrocyte membrane protein PfEMP1, which is expressed on the surface of infected red blood cells and mediates adherence to host endothelial surface proteins, resulting in the sequestration of infected cells in a variety of organs. PfEMP1-class surface proteins are capable of eliciting host protective antibody responses, which are effectively countered by the parasite via transcriptional switching between var genes, resulting in antigenic variation and immune evasion. The combination of antigenic variation, at the level of both individual parasites and parasite communities, and meiotic recombination within var multigene loci mediating antigenic diversity (similarly at the level of individual parasites or communities) conspires to facilitate chronic infections and efficient transmission. PfEMP1 proteins are thus central to the pathogenesis of malaria and to the induction of protective immunity. The functions of rifin/stevor gene products are unknown. Rifins are also expressed on the surface of infected erythrocytes and undergo antigenic variation [40]. Proteins encoded by stevor genes show extensive sequence similarities to rifins, but are less polymorphic. Stevor proteins are thought to be expressed in gametocytes [41, 42] and asexual forms, with some found within Maurer’s clefts [43].

Fig. 19.2 Domain architectures for select surface proteins from the pathogenic protozoa, demonstrating the almost exclusive nonoverlap in the catalogs of extracellular proteins. Red stars indicate proteins composed of domain(s) invented in a lineage- or clade-specific fashion (e.g., a molecule found in Plasmodium but not in the apicomplexan Cryptosporidium or in kinetoplastids). The remaining domains originated either via vertical inheritance (followed by gene loss in widespread lineages) or by lateral transfer from a bacterial or metazoan source. (This figure also appears with the color plates.)

A second class of apicomplexan surface proteins comprises those composed of remarkable combinations of extracellular domains that were probably acquired by the parasite via lateral transfer and accreted into multidomain adhesive molecules (Fig. 19.2; see Section 19.3; see also Ref. [20]). Annotation of complete genome sequences, and screening using panels of domain-specific PSSM profiles (position-specific scoring matrices; see, e.g., http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml), allows rapid identification of all members of this catalog of surface proteins and their subsequent investigation as potential targets for vaccine development. Perhaps the most dramatic example of apicomplexan proteins exhibiting multiple adhesion modules of animal or bacterial origin is the LCCL-domain-containing [44] protein family. This gene family includes five members, which were shown to be expressed predominantly in the sexual stages [45–49].
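
The principle of screening a protein with a position-specific scoring matrix can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used by the NCBI conserved domain resource: the profile is built from a handful of hypothetical, ungapped aligned segments, a uniform amino acid background and a pseudocount of one are assumed, and no statistical significance is computed. The function names build_pssm and best_window are invented for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sites, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Log-odds PSSM from equal-length segments, assuming a uniform background."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("aligned segments must have equal length")
    background = 1.0 / len(alphabet)
    pssm = []
    for col in range(length):
        column = [site[col] for site in aligned_sites]
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in alphabet
        })
    return pssm


def best_window(pssm, sequence):
    """Return (score, 0-based start) of the highest-scoring window."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence shorter than profile")
    return max(
        (sum(pssm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    )


# Hypothetical aligned segments and query sequence, for illustration only.
toy_sites = ["CWGC", "CWAC", "CYGC"]
toy_query = "MKTCWGCLLA"

score, start = best_window(build_pssm(toy_sites), toy_query)
print(f"best window starts at position {start + 1} with score {score:.2f}")
```

In a genome-wide screen the same scoring would be repeated for every predicted protein and every domain profile, keeping hits above a chosen threshold.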
Several LCCL-domain-containing genes are conserved as orthologues present in other apicomplexan parasites, such as Cryptosporidium parvum [20, 50], Toxoplasma gondii, and Theileria annulata, indicating an “apicomplexan” function that is highly conserved across the apicomplexan clade (see “Examples of ‘apicomplexan’ proteins” in Fig. 19.2). Gene disruption experiments on select LCCL proteins indicate an essential role in transmission to the mosquito, highlighting them as candidates for subunits of transmission-blocking vaccines [46, 49].

Comparison of predicted proteins from the sequenced genome of the rodent parasite Plasmodium yoelii [2], as well as from the nearly complete P. chabaudi and P. berghei genome sequences, with those of P. falciparum indicated that of 5878 P. yoelii genes [2] and 766 P. chabaudi genes [51], approximately 50% are similar to P. falciparum predicted proteins. Identification of orthologues within biochemical pathways supports the use of rodent models to investigate novel drug targets as well as mechanisms of drug activity. Gene synteny, the clustering of two or more orthologous genes in two or more species, may indicate evolutionary preservation of linkage due to common function [52, 53]. Investigation of 42 orthologous genes conserved between rodent malaria parasites and P. falciparum showed that 26 appeared to be conserved within ten synteny groups [54], indicating that more than 60% of housekeeping genes shared between these species are likely to lie within conserved syntenic regions. The various Plasmodium genome projects now provide the opportunity to perform comparative genomics in more detail, addressing questions of conservation, evolution, and function. The completed P. falciparum genome sequence further provides a framework on which to model related Plasmodium proteomes in comparative studies.

As mentioned above, considerable genome sequence and computational information on multiple Plasmodium species is available at the Plasmodium genome sequencing consortium website (www.PlasmoDB.org; Table 19.1). The website incorporates the recently completed P. falciparum genome sequence and annotation [3], the recently completed P. yoelii genome sequence [2], as well as emerging draft sequence and annotation from other Plasmodium sequencing projects. The complete genomic sequence of the human host [55, 56] and the genome sequence of the African Anopheles gambiae mosquito vector [57] are also now available, and thus genome sequencing is now complete for the trio of organisms involved in malaria.

19.4.2 Cryptosporidium

The apicomplexan coccidian parasite Cryptosporidium is an important opportunistic pathogen in AIDS and leads to unrelenting disease or death in immunosuppressed individuals. Despite intensive efforts over the past 20 years there is currently no effective therapy for treating or preventing infection in humans. Cryptosporidium is transmitted via external, typically waterborne, oocysts that are highly resistant to conventional water disinfection, making the vulnerability of water supplies to Cryptosporidium contamination a major public health threat in developing countries, as well as a biodefense concern in the United States.

The genome sequences for the zoonotic human and veterinary pathogen C. parvum [4] and the human pathogen C. hominis [5] were recently completed. The two genomes share 97% sequence similarity and apparently identical gene synteny, although gaps in the sequencing of the latter genome have thus far hampered efforts to determine the molecular basis of the host and tissue preferences of the two pathogens, as well as molecular investigations to confirm the recent reclassification into distinct species [58]. Sequence annotation has confirmed the absence in this pathogen of conventional drug targets of the kind currently pursued for the control and treatment of other parasitic protists. This is due to a streamlined metabolism and dependence on the host for nutrients and metabolic substrates. In this vacuum of drug targets, the complete genome sequence has provided an excellent resource for the metabolic reconstruction of important pathways, and has highlighted numerous enzymes that might be targeted for pharmaceutical intervention (discussed further below). The Cryptosporidium genome is relatively small, 9.3 Mbp within eight chromosomes versus the 22.8-Mbp, 14-chromosome P. falciparum genome, the result of a reduced proteome (3807 predicted genes versus 5268 in P. falciparum), shorter intergenic regions, and a relative scarcity of introns (roughly 10% versus 50% or more in P. falciparum).

A comprehensive genome database, http://CryptoDB.org [11], currently houses the genomic sequence data for both C. parvum (bovine Type 2 IOWA strain) and C. hominis (formerly human Type 1 H strain), in addition to available EST and GSS sequences. A second genome browser (http://134.84.110.219/cgi-bin/gbrowse/crypto909) is an excellent resource for studying the ordering of genes within the chromosomes, but is not particularly user friendly.

Genome sequence annotation and comparison with other apicomplexans revealed that Cryptosporidium is a greatly streamlined parasite, highly adapted to an intracellular lifestyle within the anaerobic environment of the host intestinal epithelium. For example, the parasite relies on nucleotide uptake from the host, and has a nucleotide metabolism that bears a striking resemblance to the reduced pathways found in bacterial pathogens carrying minimal genomes. The parasite also lacks the capacity for de novo amino acid synthesis and thus has a further reliance upon salvage. The parasite appears to have families of amplified paralogous genes encoding amino acid and sugar transporters that may function in the uptake of host nutrients [4]. These and numerous other predicted membrane-spanning proteins have sequence similarities to known transport proteins from other species [59] and may be exploited for chemotherapy. Genome annotation has further revealed novel extracellular proteins, many arranged in clusters of paralogous genes, that are unique to Cryptosporidium and might be candidates for vaccine development [4, 20]. Cryptosporidium lacks an apicoplast and its associated genome, and in contrast to other apicomplexans its fatty acid metabolism appears to be entirely cytoplasmic, mediated in part by a type I fatty acid synthetase [60]. These reductions could explain the parasite’s resistance to macrolide antibiotics targeting the plastid [61]. The parasite also has a greatly reduced mitochondrion structure, including loss of the mitochondrial genome, elimination of the Embden–Meyerhof pathway, and loss of most components of the electron transport machinery [4, 62].
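
The compaction of the Cryptosporidium genome relative to that of P. falciparum, described earlier in this section, can be made concrete with a short calculation. The Python sketch below uses only the genome sizes and predicted gene counts quoted in the text; the derived quantities (genes per Mbp and mean genomic span per gene) and the function names are illustrative choices, not figures reported in Refs. [3, 4].

```python
# Figures quoted in this section: genome size (Mbp) and predicted gene count.
GENOME_STATS = {
    "Cryptosporidium parvum": {"size_mbp": 9.3, "genes": 3807},
    "Plasmodium falciparum": {"size_mbp": 22.8, "genes": 5268},
}


def genes_per_mbp(stats):
    """Gene density: predicted genes per megabase pair."""
    return stats["genes"] / stats["size_mbp"]


def mean_kbp_per_gene(stats):
    """Mean genomic span per gene (gene plus flanking DNA), in kbp."""
    return stats["size_mbp"] * 1000 / stats["genes"]


for name, stats in GENOME_STATS.items():
    print(
        f"{name}: {genes_per_mbp(stats):.0f} genes/Mbp, "
        f"{mean_kbp_per_gene(stats):.1f} kbp per gene"
    )
```

On these figures Cryptosporidium carries roughly 409 genes per Mbp against roughly 231 for P. falciparum, consistent with shorter intergenic regions and fewer introns.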
Cryptosporidium is a notoriously unyielding experimental system, hampered by the lack of a continuous culture system and of transformation methodologies. These hurdles have severely thwarted efforts to understand the biology of this parasite and to develop control methods. Recently, transgenic models have been explored in an effort to overcome these limitations, specifically the use of Toxoplasma gondii to take advantage of a compatible apicomplexan biology and excellent experimental accessibility [63, 64]. Toxoplasma has recently been explored as a tool to build genetic screens for the identification of enzymes in C. parvum nucleotide metabolism [25]. The expression of Cryptosporidium enzymes and antigens in Toxoplasma can provide an urgently needed vehicle for the investigation of introduced genes for drug and vaccine screening.

19.4.3 Toxoplasma

Toxoplasma gondii is an intracellular apicomplexan parasite that infects humans and virtually all warm-blooded organisms. Although infection in healthy individuals is normally benign, it can be deadly in association with pregnancy or immunosuppressive diseases. The T. gondii genome is approximately 80 Mbp in size within 11 chromosomes, and the genome sequence is nearing completion via the whole-genome shotgun approach. The database http://www.ToxoDB.org [9] contains all publicly available genomic sequence data for two strains, RH and ME49-B7, and clustered EST assemblies from multiple strains representing all stages of the parasite life cycle. The genes of T. gondii are much more intron-rich than those of Plasmodium or Cryptosporidium (but not to the apparent degree of the cattle apicomplexan parasite Theileria) and are therefore of particular concern for gene prediction. As a general rule, genes for surface antigens and proteins secreted from micronemes, rhoptries, and dense granules have fewer introns than housekeeping genes. This is perhaps a reflection of their more recent evolutionary origin, or may provide clues to mechanisms driving accelerated divergence in genes whose products function directly at the host–parasite interface. As in Plasmodium and Cryptosporidium, many surface protein-encoding genes are telomeric or subtelomeric, thus supporting an emerging theme that telomeric mechanisms might accelerate the mutation rate of surface protein-encoding genes, in addition to mechanisms mediating antigenic switching.

Owing to its easy culture and amenability to genetic manipulation, T. gondii serves as a model system for experimentally onerous apicomplexan parasites such as P. falciparum and C. parvum (reviewed in Ref. [64]). Toxoplasma has long served as a model system for studying apicomplexan motility and cell invasion (reviewed in Refs. [65, 66]). Cell biology studies are more readily performed in T. gondii because of the high efficiency of transient and stable transfection, the availability of numerous cellular markers, and the relative ease with which the parasite can be visualized using advanced microscopic techniques [64]. The first transfection of an apicomplexan was reported in T. gondii in 1993 [67–69], which rapidly led to the development of a variety of tools for genetic manipulation. One of the first effective strategies for stable transformation in T. gondii was the genetic engineering of mapped mutations of malaria dihydrofolate reductase from lines resistant to antifolates [67, 70]. This resulted in an effective selectable marker for T. gondii and subsequently led to the first effective stable selectable marker for Plasmodium species [71]. Various genes have now been expressed in T. gondii, most frequently to test T. gondii as a possible malaria vaccine delivery system, e.g., by expression of the Plasmodium sporozoite circumsporozoite protein [72, 73]. As described above, several groups have successfully used T. gondii as an expression system to explore aspects of C. parvum biology, for instance the expression of the C. parvum sporozoite surface CpGp40/15 protein in T. gondii tachyzoites by transient transfection [74].

19.5 The Pathogenic Kinetoplastids

The pathogenic Kinetoplastida, principally Trypanosoma brucei, Trypanosoma cruzi, and Leishmania spp., are members of the diverse protozoan class Euglenozoa. The euglenids, such as the laboratory standard Euglena gracilis, are distinguished by the presence of chloroplasts, whereas both classes share hallmarks of flagella, a
contractile vacuole emptying into a flagellar pocket [75], and frequently a separate cytopharynx, a cortical microtubule cytoskeleton, and peroxisomes or glycosomes [76]. The kinetoplastids are distinguished by a single mitochondrion containing an amplified circular DNA mass, termed the kinetoplast (kDNA), that is linked to the base of the flagellum [77, 78]. In contrast to study of the technically onerous apicomplexan parasites, kinetoplastid research had a historical head start in the ability to cultivate and isolate select life cycle stages of kinetoplastid populations, as well as the availability of rapid transformation and gene disruption methodologies, all resulting in a tremendous boost in our understanding of their cellular and molecular biology. These biological insights have been punctuated by stellar discoveries, such as the presence of RNA editing and RNA trans-splicing [79–82], discovery of GPI-linked surface proteins and carbohydrates ([83], reviewed in Ref. [84]), and understanding of mechanisms of antigenic variation [85, 86]. As with the free-living and pathogenic alveolates Tetrahymena and Apicomplexa, it would be of great interest to compare the proteome of the free-living Euglena with the pathogenic kinetoplastids. It would be also of interest to have available complete genome sequence information for a free-living kinetoplastid, but both of these sequencing projects await research funds. Trypanosoma cruzi, like Leishmania spp., is an intracellular parasite, whereas Tr. brucei remains in luminal spaces throughout its life cycle. The three parasites are distinct in the choice of an insect vector, with Leishmania utilizing the sand fly Phlebotomus, Tr. cruzi the reduviid Triatoma infestans, and Tr. brucei the tsetse fly (Glossina). Tr. brucei is remarkably distinct from Tr. cruzi and Leishmania spp. 
(as well as the apicomplexans Plasmodium and Cryptosporidium) in that the parasite possesses the molecular machinery supporting RNA interference (RNAi) methodologies [87, 88]. In addition to the impact the RNAi pathway has on the ease of research, the presence or absence of the pathway provides an opportunity to understand pathway components via comparative genomics. Once the complete genome sequences for kinetoplastids become available to complement the available genome sequences for the apicomplexans Plasmodium and Cryptosporidium (Table 19.1) and the fungus Saccharomyces cerevisiae, organisms in which the RNAi pathway is disparately present, it will be possible to use comparative genomics as a foundation supporting biochemical dissection of the pathway and of the use of small RNAs in transcriptional control and heterochromatin assembly.

19.5.1 Trypanosoma

The African parasites Tr. brucei gambiense and Tr. b. rhodesiense cause human sleeping sickness, whereas Tr. b. brucei infects cattle in sub-Saharan Africa, causing the disease nagana. The South American parasite Tr. cruzi, on the other hand, is the causative agent of human Chagas disease. Tr. brucei possesses a nuclear genome as well as a 35-Mbp kinetoplast genome. The nuclear genome is split into three classes of chromosomes: 11 megabase chromosomes (0.9–5.7 Mbp), intermediate chromosomes (300–900 kbp), and minichromosomes (50–100 kbp). Chromosome 1 was recently published [89], and genome sequencing of Tr. brucei and Tr. cruzi species is ongoing (Table 19.1). Genome sequencing has helped our understanding of the genomic organization of the variant surface antigen genes (VSG; Fig. 19.2), which encode the coat proteins that uniformly cover the surface of the Tr. brucei blood-stage parasite. Tr. brucei evades the host immune system by stochastic switching of surface antigen expression in a fraction of the parasite population. The active VSG gene is transcribed in one of approximately 20 telomeric expression sites, whose structures were recently characterized via sequencing [90].

19.5.2 Leishmania

Leishmania parasites are transmitted to humans by the bite of a sandfly and cause visceral, cutaneous, or mucosal lesions. An estimated two million people are infected per year in tropical and temperate regions of the world. Sequencing of the Old World pathogen L. major began in 1994, and the complete sequences of 2 out of 36 chromosomes have been published [91, 92]. The Leishmania genome sequencing project recently confirmed the remarkable observation of large polycistronic transcription units (reviewed in Ref. [93]) composed of blocks of genes encoded on the same DNA strand [91, 92, 94]. All identified genes in the completed chromosomes 1–3 are oriented within such polycistronic clusters, with discrete mRNAs generated by trans-splicing [95]. Leishmania genes, thought to be largely free of introns, constitute a 34-Mbp high-density genome, 45% of which is estimated to be protein-coding. The recent shotgun sequencing of the Latin American pathogen L. braziliensis covered 15% of the parasite haploid genome. BLAST searches revealed 45.3–60.2% similarity to L. major [96].
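The organization described above, in which neighboring genes on the same DNA strand form long blocks, can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the procedure used by the sequencing project: the gene identifiers, coordinates, and the function name same_strand_blocks are hypothetical, and a real analysis would start from an annotated chromosome sequence.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical record layout: (gene_id, start_coordinate, strand)
Gene = Tuple[str, int, str]


def same_strand_blocks(genes: List[Gene]) -> List[List[str]]:
    """Group position-sorted genes into runs that share a coding strand.

    Each run is a candidate polycistronic cluster in the sense used in
    the text: a block of consecutive genes encoded on the same strand.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    return [
        [gene[0] for gene in run]
        for _, run in groupby(ordered, key=lambda gene: gene[2])
    ]


# Toy annotation (invented for illustration): one block on the minus
# strand followed by one block on the plus strand.
toy_chromosome = [
    ("g4", 9000, "+"), ("g1", 1000, "-"), ("g2", 3000, "-"),
    ("g3", 5000, "-"), ("g5", 11000, "+"),
]
blocks = same_strand_blocks(toy_chromosome)
print(blocks)  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

A chromosome on which every gene falls into a few long runs of this kind would be consistent with the polycistronic arrangement reported for the completed L. major chromosomes; many short alternating runs would not.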

19.6 The Pathogenic Diplomonad Giardia and the Parabasalid Trichomonas

The diplomonad Giardia is an exceedingly common waterborne intestinal parasite responsible for global disease, with a chronic burden particularly in children in developing countries [97, 98]. The parasite has the distinctive cellular signatures of a flattened pear shape, a concave ventral disk, and two anterior nuclei in the trophozoite stage (four nuclei in the transmissible nonmotile cyst stage). It has a relatively simplified cellular organization but retains an endoplasmic reticulum, a Golgi apparatus, and a “rudimentary” organelle termed a mitosome (described further below). Giardia possesses both anterior and posterior flagella, as well as two recurrent dorsal flagella. The parasite adheres to intestinal epithelial cells using the unique concave ventral disk, which creates a mechanical suction via contraction of myosin–actin filaments, perhaps aided by the beating of the recurrent
dorsal flagella. Feeding occurs via pinocytosis from the intestinal lumen at the dorsal face of the parasite. Giardia and the apicomplexan Cryptosporidium share two notable characteristics: a highly evolved scavenging parasitism resulting in the loss of multiple metabolic pathways; and an “anaerobic” lifestyle, indicated by the absence of many mitochondrial functions, including lack of the Embden–Meyerhoff pathway and mitochondrial electron transport (reviewed in Ref. [97], see also Ref. [4]). Both parasites have an external environmentally durable cyst stage, but whereas Cryptosporidium invades host intestinal epithelial cells and resides within a parasitophorous vacuole, Giardia remains extracellular throughout its life cycle. This difference might be reflected in the respective catalogs of lineage-specific surface proteins: no adhesive surface proteins have been identified in Giardia, and the large repertoire of cysteine-rich variant surface antigen proteins (vsps) probably function solely in protection from the harsh intestinal environment and evasion of mucosal immune surveillance [99, 100]. The complete nucleotide sequence for Giardia lamblia has been submitted to GenBank (Table 19.1; see www.mbl.edu/ Giardia), but no final project summary has been published at the time of this writing. Parabasalids, such as Trichomonas vaginalis, are also typically “anaerobic,” and some members are pathogenic in the urogenital or intestinal tract. They possess three or more basal flagella plus a recurrent membrane-associated flagellum, most notably in Trichomonas, that creates an undulating wave. Parabasalids are also noteworthy for the presence of acristate dense granules termed hydrogenosomes that are involved in pyruvate metabolism leading to the generation of molecular hydrogen [101–103]. The parasite lacks a cyst stage and is dependent on sexual transmission [104]. 
A Trichomonas genome sequencing project is nearing completion (Table 19.1) [105], and it is hoped that proteome annotation will reveal insights into the hydrogenosome structure, metabolic pathways, and tissue-invasive and tissue-adhesive mechanisms of this pathogenic protozoan. Comparative genomics might reveal common themes in the adaptations to anaerobic, parasitic lifestyles in Trichomonas, Giardia, and Cryptosporidium. For example, the Cryptosporidium “remnant” mitochondrion and the Giardia “mitosome” both appear to be sites of iron–sulfur protein maturation [4, 106, 107]. Little is known regarding Trichomonas virulence factors, and to date the most prominent studies center on a possibly large family of cysteine proteinases [108–110], adhesins [111], and a possibly large family of surface proteins containing a leucine-rich repeat [112], all predicted to be involved in tissue invasion and epithelial adhesion.

Giardia and Trichomonas have traditionally been regarded as “early branching” eukaryotes, based upon the widespread and apparently erroneous belief that they lack a mitochondrion, which supported the attractive hypothesis that they branched off before this organelle appeared in eukaryotes. Although their early branching status may yet hold true, as suggested from phylogenetic analyses of select genes [113–115], it has recently been realized that the vestigial or absent mitochondrion is either the result of a secondary loss or remains as a highly evolved diminutive organelle, as evidenced by nuclear genes of apparent mitochondrial or α-proteobacterial origin [107, 116–118]. Thus it is apparent that all known extant eukaryotes share a common ancestor having both a mitochondrion and a nucleus.

To date phylogenies have been based upon the presence or absence of specific traits, such as morphological structures, diagnostic gene fusions, or diagnostic insertions within genes, or upon the phylogenetic trees of discrete genes (for an excellent discussion see Ref. [19]). However, any approach that is reliant on discrete evolutionary events is prone to errors arising from convergence or lateral gene transfer. It is therefore hoped that phylogenies based upon whole-genome analyses of sequence data from multiple protozoans, both pathogenic and free-living, will refine our understanding of the “last common ancestor” of eukaryotes, delineate the branching of the protozoans, and place the protozoans in proper phylogenetic order with respect to the metazoans, fungi, and plants. More problematic is rooting Giardia and Trichomonas at the base of the protozoan tree, but whole-genome-based comparisons of protozoans, bacteria, and archaea might indeed confirm their place among the earliest branching protozoans. Once a protozoan phylogenetic tree is in place, we can then address with confidence the role of endosymbiotic events and nuclear gene transfer, independent lateral gene transfer, and convergence in the evolution of the protozoans and of the adaptations conferring parasitism.

19.7 Postgenomic Strategies and the Search for Cure

Through the sequencing of various parasite genomes and subsequent gene annotation, it is possible to predict the complete set of theoretical parasite gene products. Post-genomic analyses attempt to confirm, support, and extend the genome annotation via hypothesis-based experimentation into the biological aspects of the parasite’s life cycle. In the parasitic protists for which sufficient sequence information is available, techniques are being applied for studying large-scale gene expression. Recent methodologies to identify proteins as research targets include “post-genomic” techniques such as the serial analysis of gene expression (SAGE), microarray screening, and proteomics. Delineation of the life cycle stages of expression, followed by cellular localization studies, might provide clues as to how complex parasitic protists interact with their hosts, respond to drugs, and develop mechanisms of immune evasion or drug resistance. For the malaria parasite, insights provided by analysis of the annotated genome will probably lead to the discovery of new drug targets based upon novel parasite-specific pathways, as well as the identification of novel extracellular proteins that may serve as candidates for vaccine targets. New information on the basis of resistance to existing drugs will facilitate the development of co-treatments to reduce or reverse drug resistance, rescuing previously highly successful treatment regimens for future use (e.g., chloroquine and the P. falciparum chloroquine-resistance transporter gene). Finally, genetic manipulation of both the parasite and the insect vector to create either attenuated strains or insects refractory to parasite invasion could produce therapeutic benefits. Principal post-genomic strategies are discussed in more detail in the following paragraphs, using P. falciparum as an example.

19.7.1 Gene Expression Analysis

Gene expression studies promise to reveal changes in gene expression between the different stages of the parasite life cycle. Quantification of gene expression in Plasmodium spp. is an essential requirement for a full understanding of parasite biology and has been achieved at the level of transcription using SAGE [119, 120], quantitative real-time PCR [121–123], and microarrays [124–126]. Microarrays allow analysis of gene expression changes under a variety of conditions, or as a function of developmental stage. The Malaria Research and Reference Reagent Resource Center (MR4, www.malaria.mr4.org) has produced P. falciparum microarrays containing 70-nucleotide-long oligomers, representing the complete set of genes. To date, microarray studies have revealed an unusual program for transcription in the blood-stage cycle of the parasite, whereby each gene appears to go through a single episode of induction during the 48-h cycle, with many genes of related function being expressed at a similar point in a tightly regulated program [124–128]. Efforts to perturb this system via environmental stress or drug pressure may give an insight into the degree to which asexual gene expression is “hard wired.” Gene expression studies further helped to identify over 1300 genes transcribed in the sporozoite stage. Sporozoites are the liver-infective stage of the Plasmodium pathogen and, until recently, little was known about protein expression and antigen exposure in these forms. Recent studies identified a variety of novel genes that might code for potentially important surface molecules and proteins essential for the development of exoerythrocytic liver forms [129–131].

19.7.2 Proteomics

Proteomics refers to the large-scale analysis of proteins expressed under described conditions, within specific cellular compartments, or at a particular time in an organism’s life cycle. Advances in proteomics will be crucial in identifying differentially expressed proteins, which, along with comparative genomic analysis, utilization of protein interaction maps, and an understanding of metabolic pathways, will help identify and prioritize targets for therapeutic intervention. Traditional methods of characterizing and identifying large numbers of proteins from complex protein mixtures have relied predominantly on two-dimensional gel electrophoresis [132] combined with N-terminal sequencing or mass spectrometry of individually prepared proteins. New proteomics methods are now available that are based upon resolving small peptides derived from complex protein mixtures by high-resolution liquid chromatography and directly identifying them by tandem mass spectrometry [133, 134], followed by sophisticated computer search algorithms against whole-genome sequence databases. One of the powerful features of this approach to protein identification is the ability to identify proteins associated with membrane fractions. These newer proteomics methods have the potential to promote the identification of large numbers of proteins from various life cycle stages of Plasmodium, in turn supporting a better understanding of parasite biology and leading to the identification of new vaccine and drug targets [135–137].

With the advent of whole-genome sequence data, two different approaches to investigating the P. falciparum proteome were reported in 2002 [47, 138]. Lasonder and coworkers [47] identified 1289 proteins by high-accuracy mass spectrometric proteome analysis of P. falciparum blood stages, of which 714 proteins were identified in asexual blood stages, 931 in gametocytes, and 645 in gametes. In the second study, Florens et al. [138] investigated four stages of the parasite life cycle (sporozoites, merozoites, trophozoites, and gametocytes) using multidimensional protein identification methods and identified more than 2400 proteins. Interestingly, the antigenic variant proteins var and rif, previously defined as molecules on the surface of infected erythrocytes (see Section 19.4.1 and Fig. 19.2), were also largely expressed in sporozoites. The sporozoite proteome appeared markedly different from all other stages, and almost half of the sporozoite proteins were unique to this stage. In contrast to sporozoites, which share an average of 25% of proteins with any other stage, trophozoites, merozoites, and gametocytes had between 20% and 33% unique proteins and shared between 39% and 56% of their proteins. Consequently, only 6% of proteins are common to all four stages, and these were predominantly identified as housekeeping proteins.
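The stage comparisons above reduce to simple set operations on lists of identified proteins. The following Python sketch shows one way such fractions could be computed; it is an illustration under stated assumptions rather than the analysis performed in Ref. [138]. The protein identifiers (p1, p2, ...) and the function name stage_overlap are hypothetical, and the toy numbers are not the published values.

```python
from typing import Dict, Set, Tuple


def stage_overlap(
    proteomes: Dict[str, Set[str]]
) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Return per-stage unique fractions and the fraction common to all stages.

    unique_fraction: share of a stage's proteins found in no other stage.
    The second return value is the share of all detected proteins that
    occur in every stage.
    """
    detected = set().union(*proteomes.values())
    core = set.intersection(*proteomes.values())
    summary: Dict[str, Dict[str, float]] = {}
    for stage, proteins in proteomes.items():
        others = set().union(
            *(p for name, p in proteomes.items() if name != stage)
        )
        unique = proteins - others
        fraction = len(unique) / len(proteins) if proteins else 0.0
        summary[stage] = {"unique_fraction": fraction}
    core_fraction = len(core) / len(detected) if detected else 0.0
    return summary, core_fraction


# Toy data (invented): p1 is detected in all four stages.
toy = {
    "sporozoite": {"p1", "p2", "p3", "p4"},
    "merozoite": {"p1", "p5", "p6"},
    "trophozoite": {"p1", "p5", "p7"},
    "gametocyte": {"p1", "p6", "p8", "p9"},
}
summary, core_fraction = stage_overlap(toy)
print(summary["sporozoite"]["unique_fraction"])  # 0.75
print(round(core_fraction, 3))                   # 0.111
```

In this toy example the sporozoite set is the most distinct, mirroring the qualitative pattern described in the text, while the single protein shared by all stages plays the role of a housekeeping protein.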
The life cycle stage specificity of Plasmodium protein expression suggests that there is a highly coordinated expression of genes involved in common processes. Analysis of coregulated gene groups facilitates both searching for regulatory motifs common to upregulated genes and predicting protein functions on the basis of the “guilt by association” model. When detected proteins were mapped onto all 14 chromosomes, a total of 98 clusters containing 3 loci, 32 clusters containing 4 loci, and 5 clusters containing 6 loci were identified [138]. Focusing analyses on stage-specific and multistage clusters will facilitate the identification of stage-specific and general cis-acting sequences, and will help decipher the regulation of gene expression during the Plasmodium parasite life cycle.

19.7.3 Drug and Vaccine Development

Targeted development of drugs as well as vaccines against the pre-erythrocytic stages (i.e., the sporozoite and intracellular hepatic forms), the disease-causing blood stages, and the sexual stages, which might be relevant for transmission-blocking vaccines, requires the identification of the expressed proteins at each stage, as well as the characterization of protein expression and function (Fig. 19.3). Proteome analysis revealed 439 proteins that have at least one transmembrane segment or a GPI sequence and 304 soluble proteins with a signal sequence, which are potentially secreted or transported to organelles [138]. Well over half of the secreted proteins and integral membrane proteins detected were annotated as hypothetical. The obvious interest in this class of proteins is that, with no homology to known proteins, they represent potential Plasmodium-specific proteins and may provide targets for new vaccine development. The description of metabolic pathways, on the other hand, provides new targets for drug development [139]. Of particular interest are those pathways that differ from the pathways of the human host, which open up the possibility of designing drugs that are lethal to the pathogen but cause minimal side effects in humans. However, it is a long path from bioinformatics and functional genomics to the identification of potential drug and vaccine targets, evaluation of target candidacy and, ultimately, clinical trials (Fig. 19.3).
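The first, purely computational step of this path can be pictured as a simple triage of annotated proteins. The Python sketch below is a deliberately simplified illustration and not a validated target-selection pipeline: the record fields, the function name triage, the rule set, and the example entries are all assumptions made for the purpose of the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProteinRecord:
    """Hypothetical summary of one annotated parasite protein."""
    protein_id: str
    surface_or_secreted: bool  # transmembrane segment, GPI or signal sequence
    metabolic_enzyme: bool     # assigned to a metabolic pathway
    human_homolog: bool        # detectable homology to a human protein


def triage(record: ProteinRecord) -> Optional[str]:
    """Assign a coarse candidate class, or None if the protein is set aside.

    Assumed rule: proteins with a human homolog are deprioritized;
    parasite-specific surface or secreted proteins are treated as vaccine
    candidates and parasite-specific enzymes as drug-target candidates.
    """
    if record.human_homolog:
        return None
    if record.surface_or_secreted:
        return "vaccine candidate"
    if record.metabolic_enzyme:
        return "drug-target candidate"
    return None


# Invented example records.
records = [
    ProteinRecord("hypothetical_1", True, False, False),
    ProteinRecord("enzyme_1", False, True, False),
    ProteinRecord("enzyme_2", False, True, True),
]
classes = {r.protein_id: triage(r) for r in records}
print(classes)
```

Any shortlist produced in this way is only a starting point; as the text notes, evaluation of target candidacy and clinical trials still lie between such a list and a usable drug or vaccine.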

Fig. 19.3 The pipeline in drug and vaccine research of genome sequence determination, proteome annotation, target identification and validation.

19.7.4 Vector Genetics

For protozoan parasites that utilize an arthropod vector (e.g., Plasmodium, Babesia, Theileria), the vector itself constitutes a target for control of pathogen transmission and disease. Recent developments in the generation of transgenic mosquitoes for Aedes aegypti [140], Anopheles stephensi [141], and An. gambiae [142] have greatly facilitated study of the interaction of Plasmodium with the mosquito vector. Genetically modified mosquitoes will allow researchers to study the role of mosquito genes involved in immune recognition and destruction of the parasite, and of mosquito proteins involved in recognition of vector tissue by the parasite. A recently established gene silencing technique [143] enables investigation of the effect of mosquito vector proteins on the Plasmodium parasite [144]. Genetically modified mosquitoes that prevent parasite development within the mosquito, and thereby transmission and spread of the disease, are invaluable tools for studying parasite transmission biology, but it remains to be determined whether transgenic mosquitoes, either released or generated via dissemination of modifying transposable elements, will have utility in the fight against malaria.

References

1 Sachs, J., and P. Malaney. 2002. The economic and social burden of malaria. Nature 415:680–685.
2 Carlton, J. M.-R., S. V. Angiuoli, B. B. Suh, et al. 2002. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 419:512–519.
3 Gardner, M. J., N. Hall, E. Fung, et al. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419:498–511.
4 Abrahamsen, M. S., T. J. Templeton, S. Enomoto, J. E. Abrahante, G. Zhu, C. A. Lancto, M. Deng, C. Liu, G. Widmer, S. Tzipori, G. A. Buck, P. Xu, A. T. Bankier, P. H. Dear, B. A. Konfortov, H. F. Spriggs, L. Iyer, V. Anantharaman, L. Aravind, and V. Kapur. 2004. Complete genome sequence of the apicomplexan Cryptosporidium parvum. Science 304:441–445.
5 Xu, P., G. Widmer, Y. Wang, L. S. Ozaki, J. M. Alves, M. G. Serrano, D. Puiu, P. Manque, D. Akiyoshi, A. J. Mackey, W. R. Pearson, P. H. Dear, A. T. Bankier, D. L. Peterson, M. S. Abrahamsen, V. Kapur, S. Tzipori, and G. A. Buck. 2004. The genome of Cryptosporidium hominis. Nature 431:1107–1112.
6 Kissinger, J. C., B. P. Brunk, J. Crabtree, M. J. Fraunholz, B. Gajria, A. J. Milgram, D. S. Pearson, J. Schug, A. Bahl, S. J. Diskin, H. Ginsburg, G. R. Grant, D. Gupta, P. Labo, L. Li, M. D. Mailman, S. K. McWeeney, P. Whetzel, C. J. Stoeckert, and D. S. Roos. 2002. The Plasmodium genome database. Nature 419:490–492.
7 Bahl, A., B. Brunk, R. L. Coppel, J. Crabtree, S. J. Diskin, M. J. Fraunholz, G. R. Grant, D. Gupta, R. L. Huestis, J. C. Kissinger, P. Labo, L. Li, S. K. McWeeney, A. J. Milgram, D. S. Roos, J. Schug, and C. J. Stoeckert Jr. 2002. PlasmoDB: the Plasmodium genome resource. An integrated database providing tools for accessing, analyzing and mapping expression and sequence data (both finished and unfinished). Nucleic Acids Res. 30:87–90.
8 Bahl, A., B. Brunk, J. Crabtree, M. J. Fraunholz, B. Gajria, G. R. Grant, H. Ginsburg, D. Gupta, J. C. Kissinger, P. Labo, L. Li, M. D. Mailman, A. J. Milgram, D. S. Pearson, D. S. Roos, J. Schug, C. J. Stoeckert Jr., and P. Whetzel. 2003. PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. 31:212–215.
9 Kissinger, J. C., B. Gajria, L. Li, I. T. Paulsen, and D. S. Roos. 2003. ToxoDB. Accessing the Toxoplasma gondii genome. Nucleic Acids Res. 31:234–236.
10 Luchtan, M., C. Warade, D. B. Weatherly, W. M. Degrave, R. L. Tarleton, and J. C. Kissinger. 2004. TcruziDB: an integrated Trypanosoma cruzi genome resource. Nucleic Acids Res. 32:344–346.
11 Puiu, D., S. Enomoto, G. A. Buck, M. S. Abrahamsen, and J. C. Kissinger. 2004. CryptoDB: the Cryptosporidium genome resource. Nucleic Acids Res. 32:329–331.
12 Li, L., B. P. Brunk, J. C. Kissinger, D. Pape, P. Tang, R. Cole, J. Martin, T. Wylie, M. Dante, S. J. Fogarty, D. K. Howe, P. Liberator, C. Diaz, J. Anderson, M. White, M. E. Jerone, E. A. Johnson, J. A. Radke, C. J. Stoeckert Jr., R. H. Waterson, S. W. Clifton, D. S. Roos, and L. D. Sibley. 2003. Gene discovery in the Apicomplexa as revealed by EST sequencing and assembly of a comparative gene database. Genome Res. 13:443–454.
13 Li, L., J. Crabtree, S. Fischer, D. Pinney, C. J. Stoeckert Jr., L. D. Sibley, and D. S. Roos. 2004. ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites. Nucleic Acids Res. 32:326–328.
14 Embley, T. M., and R. P. Hirt. 1998. Early branching eukaryotes? Curr. Opin. Genet. Dev. 8:624–629.
15 Woese, C. 1998. The universal ancestor. Proc. Natl. Acad. Sci. U. S. A. 95:6854–6859.
16 Doolittle, W. F. 1999. Phylogenetic classification and the universal tree. Science 284:2124–2128.
17 Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972–976.
18 Glansdorff, N. 2000. About the last common ancestor, the universal life-tree and lateral gene transfer: a reappraisal. Mol. Microbiol. 38:177–185.
19 Dacks, J. B., and W. F. Doolittle. 2001. Reconstructing/deconstructing the earliest eukaryotes: how comparative genomics can help. Cell 107:419–425.
20 Templeton, T. J., L. M. Iyer, V. Anantharaman, S. Enomoto, J. E. Abrahante, G. M. Subramanian, S. L. Hoffman, M. S. Abrahamsen, and L. Aravind. 2004. Comparative analysis of apicomplexa and genomic diversity in eukaryotes. Genome Res. 14:1686–1695.
21 Hedrick, S. M. 2004. The acquired immune system: a vantage from beneath. Immunity 21:607–615.
22 Kusch, J., and H. J. Schmidt. 2001. Genetically controlled expression of surface variant antigens in free-living protozoa. J. Membrane Biol. 180:101–109.
23 King, N. 2004. The unicellular ancestry of animal development. Dev. Cell 7:313–325.
24 Huang, J., N. Mullapudi, T. Sicheritz-Ponten, and J. C. Kissinger. 2004. A first glimpse into the pattern and scale of gene transfer in Apicomplexa. Int. J. Parasitol. 34:265–274.
25 Striepen, B., A. J. Pruijssers, J. Huang, C. Li, M. J. Gubbels, N. N. Umejiego, L. Hedstrom, and J. C. Kissinger. 2004. Gene transfer in the evolution of parasite nucleotide biosynthesis. Proc. Natl. Acad. Sci. U. S. A. 101:3154–3159.
26 Huang, J., N. Mullapudi, C. A. Lancto, M. Scott, M. S. Abrahamsen, and J. C. Kissinger. 2004. Phylogenomic evidence supports past endosymbiosis, intracellular and horizontal gene transfer in Cryptosporidium parvum. Genome Biol. 5:R88.
27 Wilson, R. J. M., and D. H. Williamson. 1997. Extrachromosomal DNA in the Apicomplexa. Microbiol. Mol. Biol. Rev. 61:1–16.
28 Waller, R. F., et al. 1998. Nuclear-encoded proteins target to the plastid in Toxoplasma gondii and Plasmodium falciparum. Proc. Natl. Acad. Sci. U. S. A. 95:12352–12357.
29 Gleeson, M. T. 2000. The plastid in Apicomplexa: What use is it? Int. J. Parasitol. 30:1053–1070.
30 Fast, N. M., J. C. Kissinger, D. S. Roos, and P. J. Keeling. 2001. Nuclear-encoded, plastid-targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids. Mol. Biol. Evol. 18:418–426.
31 Saldarriaga, J. F., F. J. R. Taylor, P. J. Keeling, and T. Cavalier-Smith. 2001. Dinoflagellate nuclear SSU rRNA phylogeny suggests multiple plastid losses and replacements. J. Mol. Evol. 53:204–213.
32 Doolittle, W. F. 1998. You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet. 14:307–311.
33 Breman, J. G. 2001. The ears of the hippopotamus: manifestations, determinants, and estimates of the malaria burden. Am. J. Trop. Med. Hyg. 64 (1–2 Suppl):1–11.
34 McFadden, G. I., M. E. Reith, J. Munholland, and N. Lang-Unnasch. 1996. Plastid in human parasites. Nature 381:482.
35 Wilson, R. J., P. W. Denny, P. R. Preiser, K. Rangachari, K. Roberts, A. Roy, A. Whyte, M. Strath, D. J. Moore, P. W. Moore, and D. H. Williamson. 1996. Complete gene map of the plastid-like DNA of the malaria parasite Plasmodium falciparum. J. Mol. Biol. 261:155–172.
36 Kohler, S., C. F. Delwiche, P. W. Denny, L. G. Tilney, P. Webster, R. J. Wilson, J. D. Palmer, and D. S. Roos. 1997. A plastid of probable green algal origin in apicomplexan parasites. Science 275:1485–1489.
37 Fichera, M. E., and D. S. Roos. 1997. A plastid organelle as a drug target in apicomplexan parasites. Nature 390:407–409.
38 He, C. Y., B. Striepen, C. H. Pletcher, J. M. Murray, and D. S. Roos. 2001. Targeting and processing of nuclear-encoded apicoplast proteins in plastid segregation mutants of Toxoplasma gondii. J. Biol. Chem. 276:28436–28442.
39 Aravind, L., L. M. Iyer, T. E. Wellems, and L. H. Miller. 2003. Plasmodium biology: genomic gleanings. Cell 115:771–785.
40 Rasti, N., M. Wahlgren, and Q. Chen. 2004. Molecular aspects of malaria pathogenesis. FEMS Immunol. Med. Microbiol. 41:9–26.
41 Sutherland, C. J. 2001. Stevor transcripts from Plasmodium falciparum gametocytes encode truncated polypeptides. Mol. Biochem. Parasitol. 113:331–335.
42 McRobert, L., P. Preiser, S. Sharp, W. Jarra, M. Kaviratne, M. C. Taylor, L. Renia, and C. J. Sutherland. 2004. Distinct trafficking and localization of STEVOR proteins in three stages of the Plasmodium falciparum life cycle. Infect. Immun. 72:6597–6602.
43 Kaviratne, M., S. M. Khan, W. Jarra, and P. R. Preiser. 2002. Small variant STEVOR antigen is uniquely located within Maurer’s clefts in Plasmodium falciparum-infected red blood cells. Eukaryot. Cell 1:926–935.
44 Trexler, M., L. Banyai, and L. Patthy. 2000. The LCCL module. Eur. J. Biochem. 267:5751–5757.
45 Delrieu, I., C. C. Waller, M. M. Mota, M. Grainger, J. Langhorne, and H. H. Holder. 2002. PSLAP, a protein with multiple adhesive motifs, is expressed in Plasmodium falciparum gametocytes. Mol. Biochem. Parasitol. 121:11–20.
46 Claudianos, C., J. T. Dessens, H. E. Trueman, M. Arai, J. Mendoza, G. A. Butcher, T. Crompton, and R. E. Sinden. 2002. A malaria scavenger receptor-like protein essential for parasite development. Mol. Microbiol. 45:1473–1484.
47 Lasonder, E., Y. Ishihama, J. S. Andersen, A. M. Vermunt, A. Pain, R. W. Sauerwein, W. M. Eling, N. Hall, A. P. Waters, H. G. Stunnenberg, and M. Mann. 2002. Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 419:537–542.
48 Dessens, J. T., R. E. Sinden, and C. Claudianos. 2004. LCCL proteins of apicomplexan parasites. Trends Parasitol. 20:102–108.
49 Pradel, G., K. Hayton, L. Aravind, L. M. Iyer, M. S. Abrahamsen, A. Bonawitz, C. Mejia, and T. J. Templeton. 2004. A multidomain adhesion protein family expressed in Plasmodium falciparum is essential for transmission to the mosquito. J. Exp. Med. 199:1533–1544.
50 Tosini, F., A. Agnoli, R. Mele, M. A. Gomez Morales, and E. Pozio. 2004. A new modular protein of Cryptosporidium parvum, with ricin B and LCCL domains, expressed in the sporozoite invasive stage. Mol. Biochem. Parasitol. 134:137–147.
51 Janssen, C. S., M. P. Barrett, D. Lawson, M. A. Quail, D. Harris, S. Bowman, R. S. Phillips, and C. M. Turner. 2001. Gene discovery in Plasmodium chabaudi by genome survey sequencing. Mol. Biochem. Parasitol. 113:251–260.
52 Carlton, J. M.-R. 1999. Gene synteny across Plasmodium spp: could ‘operon-like’ structure exist? Parasitol. Today 15:178–179.
53 Janse, C. J., J. M. Carlton, D. Walliker, and A. P. Waters. 1994. Conserved location of genes on polymorphic chromosomes of four species of malaria parasites. Mol. Biochem. Parasitol. 68:285–296.
54 Carlton, J. M.-R., R. Vinkenoog, A. P. Waters, and D. Walliker. 1998. Gene synteny in species of Plasmodium. Mol. Biochem. Parasitol. 93:285–294.
55 Lander, E. S., L. M. Linton, B. Birren, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.
56 Venter, J. C., M. D. Adams, E. W. Myers, et al. 2001. The sequence of the human genome. Science 291:1304–1351.
57 Holt, R. A., G. M. Subramanian, A. Halpern, et al. 2002. The genome sequence of the malaria mosquito Anopheles gambiae. Science 298:129–148.
58 Morgan-Ryan, U. M., A. Fall, L. A. Ward, N. Hijjawi, I. Sulaiman, R. Fayer, R. C. Thompson, M. Olson, A. Lal, and L. Xiao. 2002. Cryptosporidium hominis n. sp. (Apicomplexa: Cryptosporidiidae) from Homo sapiens. J. Eukaryot. Microbiol. 49:433–440.
59 Bankier, A. T., H. F. Spriggs, B. Fartmann, B. A. Konfortov, M. Madera, C. Vogel, S. A. Teichmann, A. Ivens, and P. H. Dear. 2003. Integrated mapping, chromosomal sequencing and sequence analysis of Cryptosporidium parvum. Genome Res. 13:1787–1799.
60 Zhu, G., Y. Li, X. Cai, J. J. Millership, M. J. Marchewka, and J. S. Keithly. 2004. Expression and functional characterization of a giant Type I fatty acid synthase (CpFAS1) gene from Cryptosporidium parvum. Mol. Biochem. Parasitol. 134:127–135.
61 Fichera, M. E., and D. S. Roos. 1997. A plastid organelle as a drug target in apicomplexan parasites. Nature 390:407–409.
62 Riordan, C. E., J. G. Ault, S. G. Langreth, and J. S. Keithly. 2003. Cryptosporidium parvum Cpn60 targets a relict organelle. Curr. Genet. 44:138–147.
63 Striepen, B., and J. C. Kissinger. 2004. Genomics meets transgenics in search of the elusive Cryptosporidium drug target. Trends Parasitol. 20:355–358.

64 Kim, K., and L. M. Weiss. 2004. Toxo-

plasma gondii: the model apicomplexan. Int. J. Parasitol. 34:423–432. 65 Carruthers, V. B. 2002. Host cell invasion by the opportunistic pathogen Toxoplasma gondii. Acta Trop. 81:111–122. 66 Sibley, L. D. 2004. Intracellular parasite invasion strategies. Science 304:258– 253. 67 Donald, R. G., and D. S. Roos. 1993. Stable molecular transformation of Toxoplasma gondii: a selectable dihydrofolate reductase-thymidylate synthase marker based on drug-resistance mutations in malaria. Proc. Natl. Acad. Sci. U. S. A. 90:11703–7. 68 Kim, K., D. Soldati, and J. C. Boothroyd. 1993. Gene replacement in Toxoplasma gondii with chloramphenicol acetyltransferase as selectable marker. Science 262:911–914. 69 Soldati, D., and J. C. Boothroyd. 1993. Transient transfection and expression in the obligate intracellular parasite Toxoplasma gondii. Science 260:349–352. 70 Reynolds, M.G., and D. S. Roos. 1998. A biochemical and genetic model for parasite resistance to antifolates. Toxoplasma gondii provides insights into pyrimethamine and cycloguanil resistance in Plasmodium falciparum. J. Biol. Chem. 273:3461–3469. 71 Roos, D. S., M. J. Crawford, R. G. Donald, L. M. Fohl, K. M. Hager, J. C. Kissinger, M. G. Reynolds, B. Striepen, and W. J. Sullivan Jr. 1999. Transport and trafficking: Toxoplasma as a model for Plasmodium. Novartis Found. Symp. 226:176–195. 72 Di Cristina, M., F. Ghouze, C. H. Kocken, S. Naitza , P. Cellini, D. Soldati, A. W. Thomas, and A. Crisanti. 1999. Transformed Toxoplasma gondii tachyzoites expressing the circumsporozoite protein of Plasmodium knowlesi elicit a specific immune response in rhesus monkeys. Infect. Immun. 67:1677– 1682. 73 Charest, H., M. Sedegah, G. S. Yap, R. T. Gazzinelli, P. Caspar, S. L. Hoffman, and A. Sher. 2000. Recombinant attenuated Toxoplasma gondii expressing the Plasmodium yoelii circumsporozoite protein provides highly effective priming

References

74

75

76

77

78

79

80

81

82

83

84

for CD8+ T cell-dependent protective immunity against malaria. J. Immunol. 165:2084–2092. O’Connor, R. M., K. Kim, F. Khan, and H. D. Ward. 2003. Expression of Cpgp40/15 in Toxoplasma gondii: a surrogate system for the study of Cryptosporidium glycoprotein antigens. Infect. Immun. 71:6027–6034. Gull, K. 2003. Host–parasite interactions and trypanosome morphogenesis: a flagellar pocketful of goodies. Curr. Opin. Microbiol. 6:365–370. Parsons, M. 2004. Glycosomes: parasites and the divergence of peroxisomal purpose. Mol. Microbiol. 53:717–724. Robinson, D. R., and K. Gull. 1991. Basal body movements as a mechanism for mitochondrial genome segregation in the trypanosome cell cycle. Nature 352:731–733. Ogbadoyi, E. O., D. R. Robinson, and K. Gull. 2003. A high-order trans-membrane structural linkage is responsible for mitochondrial genome positioning and segregation by flagellar basal bodies in trypanosomes. Mol. Biol. Cell 14:1769–1779. Madison-Antenucci, S., J. Grams, and S. L. Hajduk. 2002. Editing machines: the complexities of trypanosome RNA editing. Cell 108:435–438. Liang, X. H., A. Haritan, S. Uliel, and S. Michaeli. 2003. Trans and cis splicing in trypanosomatids: mechanism, factors, and regulation. Eukaryot. Cell 2:830–840. Simpson, L., S. Sbicego, and R. Aphasizhev. 2003. Uridine insertion/deletion RNA editing in trypanosome mitochondria: a complex business. RNA 9:265– 276. Stuart, K., and A. K. Panigrahi. 2002. RNA editing: complexity and complications. Mol. Microbiol. 45:591–596. Low, M., and J. Finean. 1977. Release of alkaline phosphatase from membranes by a phosphatidylinositol-specific phospholipase C. Biochem. J. 167:281–284. Ferguson, M. A. J. 1999. The structure, biosynthesis and functions of glycosylphosphatidylinositol anchors, and the contribution of trypanosome research. J. Cell Science 112:2799–2809.

85 Cross, G. A. M. 1996. Antigenic varia-

tion in trypanosomes: secrets surface slowly. BioEssays 18:283–291. 86 Donelson, J. E. 2002. Antigenic variation and the African trypanosome genome. Acta Trop. 85:391–404. 87 Ngo, H., C. Tschudi, K. Gull, and E. Ullu. 1998. Double-stranded RNA induces mRNA degradation in Trypanosoma brucei. Proc. Natl. Acad. Sci. U. S. A. 95:14687–14692. 88 Ullu, E., C. Tschudi, and T. Chakraborty. 2004. RNA interference in protozoan parasites. Cell Microbiol. 6:509–519. 89 Hall, N., M. Berriman, N.J. Lennard, et al. 2003. The DNA sequence of chromosome I of an African trypanosome: gene content, chromosome organisation, recombination and polymorphism. Nucleic Acids Res. 31:4864–4873. 90 Berriman, M., N. Hall, K. Sheader, F. Bringaud, B. Tiwari, T. Isobe, S. Bowman, C. Corton, L. Clark, G. A. Cross, M. Hoek , T. Zanders, M. Berberof, P. Borst, and G. Rudenko. 2002. The architecture of variant surface glycoprotein gene expression sites in Trypanosoma brucei. Mol. Biochem. Parasitol. 122:131–140. 91 Myler, P. J., L. Audleman, T. deVos, G. Hixson, P. Kiser, C. Lemley, C. Magness, E. Rickel, E. Sisk, S. Sunkin, S. Swartzell, T. Westlake, P. Bastien, G. Fu, A. Ivens, and K. Stuart. 1999. Leishmania major Friedlin chromosome 1 has an unusual distribution of protein-coding genes. Proc. Natl. Acad. Sci. USA 96:2902–2906. 92 Worthey E. A., et al. 2003. Leishmania major chromosome 3 contains two long convergent polycistronic gene clusters separated by a tRNA gene. Nucleic Acids Res. 31:4201–4210. 93 Lee, M. G.-S., and L. H. T. Van de Ploeg. 1997. Transcription of protein-coding genes in trypanosomes by RNA polymerase I. Ann. Rev. Microbiol. 51:463– 489. 94 Martinez-Calvillo, S., S. Yan, D. Nguyen, M. Fox, K. Stuart, and P. J. Myler. 2003. Transcription of Leishmania major Friedlin chromosome 1 initiates in both directions within a single region. Mol. Cell 11:1291–1299.

441

442

19 Genomics of Pathogenic Parasites 95 LeBowitz, J. H., H. Q. Smith, L. Rusche,

and S. M. Beverley. 1993. Coupling of poly(A) site selection and trans-splicing in Leishmania. Genes Dev. 7:996–1007. 96 Laurentino, E. C., J. C. Ruiz, G. Fazelinia, P. J. Myler, W. Degrave, M. AlvesFerreira, J. M. C. Ribeiro, and A. K. Cruz. 2004. A survey of Leishmania braziliensis genome by shotgun sequencing. Mol. Biochem. Parasitol. 137:81– 86. 97 Adam, R. D. 2001. Biology of Giardia lamblia. Clin. Microbiol. Rev. 14:447– 475. 98 Ali, S. A. and D. R. Hill. 2003. Giardia intestinalis. Curr. Opin. Infect. Dis. 16:453–460. 99 Nash, T. E. 2002. Surface antigenic variation in Giardia lamblia. Mol. Microbiol. 45:585–590. 100 Nash, T. E., H. T. Lujan, M. R. Mowatt, and J. T. Conrad. 2001. Variant-specific surface protein switching in Giardia lamblia. Infect. Immun. 69:1922–1923. 101 Lindmark, D. G., and M. Muller. 1973. Hydrogenosome, a cytoplasmic organelle of the anaerobic flagellate Tritichomonas foetus, and its role in pyruvate metabolism. J. Biol. Chem. 248:7724– 7728. 102 Hackstein, J. H., A. Akhmanova, B. Boxma, H. R. Harhangi, and F. G Voncken. 1999. Hydrogenosomes: eukaryotic adaptations to anaerobic environments. Trends Microbiol. 7:441– 447. 103 Dyall, S. D., and P. J. Johnson. 2000. Origins of hydrogenosomes and mitochondria: evolution and organelle biogenesis. Curr. Opin. Microbiol. 3:404– 411. 104 Schwebke, J. R., and D. Burgess. 2004. Trichomoniasis. Clin. Microbiol. Rev. 17:794–803. 105 Lyons, E. J., and J. M.-R. Carlton. 2004. Mind the gap: bridging the divide between clinical and molecular studies of the trichomonads. Trends Parasitol. 20:204–207. 106 LaGier, M. J., J. Tachezy, F. Stejskal, K. Kutisova, and J. S. Keithly. 2003. Mitochondrial-type iron–sulfur cluster biosynthesis genes (IscS and IscU) in

the apicomplexan Cryptosporidium parvum. Microbiology 149:3519–3530. 107 Tovar, J. et al. 2003. Mitochondrial remnant organelles of Giardia function in iron–sulphur maturation. Nature 426:172–176. 108 Arroyo, R. and J. F. Alderete. 1989. Trichomonas vaginalis surface proteinase activity is necessary for parasite adherence to epithelial cells. Infect. Immun. 57:2992–2997. 109 Mendoza-Lopez, M. R., C. Becerril-Garcia, L. V. Fattel-Facenda, L. Avila-Gonzalez, M. E. Ruiz-Tachiquin, J. OrtegaLopez, and R. Arroyo. 2000. CP30, a cysteine proteinase involved in Trichomonas vaginalis cytoadherence. Infect. Immun. 68:4907–4912. 110 Hernandez, H., I. Sariego, G. Garber, R. Delgado, O. Lopez, and J. Sarracent. 2004. Monoclonal antibodies against a 62 kDa proteinase of Trichomonas vaginalis decrease parasite cytoadherence to epithelial cells and confer protection in mice. Parasite Immunol. 26:119–125. 111 Arroyo, R., and J. F. Alderete. 1992. Molecular basis of host epithelial cell recognition by Trichomonas vaginalis. Mol. Microbiol. 6:853–862. 112 Hirt, R. P., N. Harriman, A. V. Kajava, and T. M. Embley. 2002. A novel potential surface protein in Trichomonas vaginalis contains a leucine-rich repeat shared by micro-organisms from all three domains of life. Mol. Biochem. Parasitol. 125:195–199. 113 Sogin, M. L., J. H. Gunderson, H. J. Elwood, R. A. Slonso, and D. A. Peattie. 1989. Phylogenetic meaning of the kingdom concept: an unusual ribosomal RNA from Giardia lamblia. Science 243:75–77. 114 Hashimoto, T., Y. Nakamura, T. Kamaishi, F. Nakamura, J. Adachi, K. Okamoto, and M. Hasegawa. 1995. Phylogenetic place of mitochondrionlacking protozoan, Giardia lamblia, inferred from amino acid sequences of elongation factor 2. Mol. Biol. Evol. 12:782–793. 115 Keeling, P. J., and J. D. Palmer. 2000. Parabasalian flagellates are ancient eukaryotes. Nature 405:635–636.

References 116 Roger, A. J., S. G. Svard, J. Tovar, C. G.

117

118 119

120

121

122

123

124

125

Glarck, M. W. Smith, F. D. Gillin, and M. L. Sogin. 1998. A mitochondrial-like chaperonin 60 gene in Giardia lamblia: evidence that diplomonads once harbored an endosymbiont related to the progenitor of mitochondria. Proc. Natl. Acad. Sci. U. S. A. 95:229–234. Lloyd, D., and J. C. Harris. 2002. Giardia: highly evolved parasite or early branching eukaryote? Trends Microbiol. 10:122–127. Knight, J. 2004. Giardia: not so special, after all? Nature 429:236–237. Munasinghe, A., et al. 2001. Serial analysis of gene expression (SAGE) in Plasmodium falciparum: application of the technique to A–T rich genomes. Mol. Biochem. Parasitol. 113:23–34. Patankar, S., A. Munasinghe, A. Shoaibi, L. M. Cummings, and D. F. Wirth. 2001. Serial analysis of gene expression in Plasmodium falciparum reveals the global expression profile of erythrocytic stages and the presence of antisense transcripts in the malarial parasite. Mol. Biol. Cell 12:3114–3125. Blair, P. L., A. Witney, J. D. Haynes, J. K. Moch, D. J. Carucci, and J. H. Adams. 2002. High-throughput global peptide proteomic analysis by combining stable isotope amino acid labeling and data-dependent multiplexed–Ms/Ms. Anal. Chem. 74:4995–5000. Calvo, E., C. Rubiano, A. Vargas, and M. Wasserman. 2002. Expression of housekeeping genes during the asexual cell cycle of Plasmodium falciparum. Parasitol. Res. 88:267–271. Nirmalan, N., P. Wang, P. F. Sims, and J. E. Hyde. 2002. Transcriptional analysis of genes encoding enzymes of the folate pathway in the human malaria parasite Plasmodium falciparum. Mol. Microbiol. 46:179–190. Bozdech, Z., J. Zhu , M. P. Joachimiak, F. E. Cohen, B. Pulliam, and J. L. DeRisi. 2003. Expression profiling of the schizont and trophozoite stages of Plasmodium falciparum with a long-oligonucleotide microarray. Genome Biol. 4:R9. Le Roch, K. G., Y. Zhou, P. L. Blair, M. Grainger, J. K. Moch, J. D. Haynes,

126

127

128

129

130

131

132

133

P. De La Vega, A. A. Holder, S. Batalov, D. J. Carucci, and E. A. Winzeler. 2003. Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 301:1503–1508. Le Roch, K. G., et al., 2004. Global analysis of transcript and protein levels across the Plasmodium falciparum life cycle. Genome Res. 14:2308–2318. Hayward, R. E., J. L. Derisi, S. Alfadhli, D. C. Kaslow, P. O. Brown, and P. K. Rathod. 2000. Shotgun DNA microarrays and stage-specific gene expression in Plasmodium falciparum malaria. Mol. Microbiol. 35:6–14. Ben Mamoun, C., I. Y. Gluzman, C. Hott, S. K. MacMillan, A. S. Amarakone, D. L. Anderson, J. M. Carlton, J. B. Dame, D. Chakrabarti, R. K. Martin, B. H. Brownstein, and. D. E. Goldberg. 2001. Co-ordinated programme of gene expression during asexual intraerythrocytic development of the human malaria parasite Plasmodium falciparum revealed by microarray analysis. Mol. Microbiol. 39:26–36. Kappe, S. H. I, M. J. Gardner, S. M. Brown, J. Ross, K. Matuschewski, J. M. Ribeiro, J. H. Adams, J. Quackenbush, J. Cho, D. J. Carucci, S. L. Hoffman, and V. Nussenzweig. 2001. Exploring the transcriptome of the malaria sporozoite stage. Proc. Natl. Acad. Sci. U. S. A. 98:9895–9900. Matuschewski, K., J. Ross, S. M. Brown, K. Kaiser, V. Nussenzweig, and S. H. I. Kappe. 2002. Infectivity-associated changes in the transcriptional repertoire of the malaria parasite sporozoite stage. J. Biol. Chem. 277:41948–41953. Kaiser, K., K. Matuschewski, N. Camargo, J. Ross, and S. H. I. Kappe. 2004. Differential transcriptome profiling identifies Plasmodium genes encoding pre-erythrocytic stage-specific proteins. Mol. Microbiol. 51:1221–1232. Horgan, G., A. Creasey, and B. Fenton. 1992. Superimposing two-dimensional gels to study genetic variation in malaria parasites. Electrophoresis 13:871–875. Washburn, M. P., and J. R. Yates III. 2000. Analysis of the microbial proteome. Curr. Opin. Microbiol. 3:292– 297.

443

444

19 Genomics of Pathogenic Parasites 134 Yates III, J. R. 2000. Mass spectrometry.

135

136

137

138

139

140

From genomics to proteomics. Trends Genet. 16:5–8. Carucci, D. J., J. R. Yates III, and L. Florens. 2002. Exploring the proteome of Plasmodium. Int. J. Parasitol. 32:1539– 1542. Hoffman, S. L., G. M. Subramanian, F. H. Collins, and J. C. Venter. 2002. Plasmodium, human and Anopheles genomics and malaria. Nature 415:702– 709. Doolan, D. L., J. C. Aguiar, W. R. Weiss, A. Sette, P. L. Felgner, D. P. Regis, P. Quinones–Casas, J. R. Yates III, P. L. Blair, T. L. Richie, S. L. Hoffman, and D. J. Carucci. 2003. Utilization of genomic sequence information to develop malaria vaccines. J. Exp. Biol. 206:3789– 3802. Florens, L., M. P. Washburn, J. D. Raine, R. M. Anthony, M. Grainger, J. D. Haynes, J. K. Moch, N. Muster, J. B. Sacci, D. L. Tabb, A. A. Witney, D. Wolters, Y. Wu, M. J. Gardner, A. A. Holder, R. E. Sinden, J. R. Yates III, and D. J. Carucci. 2002. A proteomic view of the Plasmodium falciparum life cycle. Nature 419:520–526. Ridley, R. G. 2002. Medical need, scientific opportunity and the drive for antimalarial drugs. Nature 415:685–693. Jasinskiene, N., C. J. Coates, M. Q. Benedict, A. J. Cornel, C. S. Rafferty, A. A. James, and F. H. Collins. 1998. Stable

141

142

143

144

145

146

transformation of the yellow fever mosquito, Aedes aegypti, with the Hermes element from the housefly. Proc. Natl. Acad. Sci. U. S. A. 95:3743–3747. Catteruccia, F., T. Nolan, T. G. Loukeris, C. Blass, C. Savakis, F. C. Kafatos, and A. Crisanti. 2000. Stable germline transformation of the malaria mosquito Anopheles stephensi. Nature 405:959–962. Grossman, G. L., C. S. Rafferty, J. R. Clayton, T. K. Stevens, O. Mukabayire, and M. Q. Benedict. 2001. Germline transformation of the malaria vector, Anopheles gambiae, with the piggyBac transposable element. Insect Mol. Biol. 10:597–604. Blandin, S., L. F. Moita, T. Kocher, M. Wilm, F. C. Kafatos, and E. A. Levashina. 2002. Reverse genetics in the mosquito Anopheles gambiae: targeted disruption of the Defensin gene. EMBO Rep. 3:852–856. Osta, M. A., G. K. Christophides, and F. C. Kafatos. 2004. Effects of mosquito genes on Plasmodium development. Science 303:2030–2032. Rivera, M. C., and J. A. Lake. 2004. The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature 431:152–155. Brown, J. R., and W. F. Doolittle. 1997. Archaea and the prokaryote-to-eukaryote transition. Micro. Mol. Biol. Rev. 61:456–502.

445

20 Model Host Systems: Tools for Comprehensive Analysis of Host–Pathogen Interactions

Michael Steinert and Gernot Glöckner

20.1 Introduction

Models and reality are linked by abstraction and careful interpretation. Host models can provide insights into infections of humans because many fundamental processes in host defense are evolutionarily conserved. This approach is rapidly gaining ground, since in many cases it is much easier to study a particular aspect in a model organism than in the original host. Frequently studied organisms are the plant Arabidopsis thaliana, the invertebrates Dictyostelium discoideum, Caenorhabditis elegans, and Drosophila melanogaster, the nonmammalian vertebrate zebrafish (Danio rerio), and the mammalian mouse model (Mus musculus). While the choice of most of the nonmammalian model organisms was primarily defined by their evolutionary position in the tree of life, others like the mouse were chosen for their prospective ability to answer questions aimed at human-related traits (Fig. 20.1). Important prerequisites for model systems in general are a short life cycle with rapid development, small size, inexpensive availability, and tractability to molecular and genetic tools [1]. In recent years a large body of information has been derived from model organisms. The insights cover behavior, immunity, aging, inheritance, physiology, development, cellular processes, cancer, and neurodegenerative and other diseases. Studies of the genomic organization of widely divergent model organisms have revealed a remarkable degree of evolutionary conservation and enabled meaningful cross-species comparisons. In addition, interactions between organisms have been studied. These interactions range on a continuous scale from symbiosis to pathogenesis. Here we focus on host–pathogen interactions.

Fig. 20.1 Phylogenetic tree of important model organisms and their relation to Homo sapiens.

20.2 Host–Pathogen Interactions

The relationships between pathogens and hosts are antagonistic. The outcome of these interactions depends on the virulence of the pathogen and on the susceptibility and resistance of the host [2]. Infection is not synonymous with disease, since an infection does not always damage the host. Accordingly, the extent of damage during an infection depends on both the potency of the pathogen and the efficiency and mode of the host defense. Pathogens often produce virulence factors such as adhesins and toxins, which enable them to persist or multiply on or within their host. Capsules of bacteria or of fungi such as Cryptococcus neoformans are defensive pathogenicity factors. Numerous important pathogenicity mechanisms rely on the secretion of molecules. The simplest secretory apparatus is the type I secretion system, which consists of only three proteins. The type II secretion pathway is the so-called main branch of the general secretion pathway. Via this pathway, toxins and hydrolytic enzymes with a discrete conformation can be secreted. It is widely distributed among Proteobacteria, where it contributes to pathogenesis in plants as well as in animals [3]. Type III and type IV secretion systems are often encoded by pathogenicity islands. Type III secretion systems occur in human, animal, and plant pathogens and inject proteins directly into the host cell. The injected proteins alter or inhibit the normal function of the host cell, allowing the pathogen to survive in the host environment. Type IV secretion systems are evolutionarily related to the conjugal transfer system used for horizontal gene transfer. Several important human pathogens, such as Pseudomonas aeruginosa and Legionella pneumophila, inject crucial virulence factors through such secretion systems [4, 5]. In addition to effector molecules, virulence is also strongly influenced by regulatory systems. Many pathogens, such as P. aeruginosa, sense cell density and regulate specific sets of genes by quorum sensing [6].
Susceptibility to one particular disease and immunity to others is an innate property of a given host species and is governed by a complex set of factors. Differences in physiology and anatomy, variations in tissue surface receptors, age, diet, and stress are important factors in this respect. Leucine-rich repeat (LRR) proteins seem to play a major role as sensors of pathogen invasion in all eukaryotic cells. LRRs are versatile binding motifs found in a variety of proteins and are involved in protein–protein interactions. The LRR domain is composed of repeats forming a characteristic solenoid horseshoe structure, which provides a scaffold for numerous insertions involved in binding to pathogen-associated molecular patterns and surface receptors. LRRs have been shown to be involved in the host defense systems of both plants (resistance genes) and mammals (Toll-like receptors and nucleotide-binding oligomerization domain proteins), where they sense specific pathogen-associated molecules and activate the innate immune system [7]. Other defense mechanisms are constituted by physical barriers, the complement system, antimicrobial peptides, apoptosis, effector cells of the innate immune system, and the adaptive immune system. The different model organisms provide species-specific advantages for the analysis of the pathogen and host sides of infection. Mammalian models obviously share many basic biological functions with humans, such as the development of organs and the adaptive immune response. However, invertebrate models can have advantages for the study of certain mammalian biological processes. Because of their lower level of complexity, invertebrate models allow the innate immune system to be dissected specifically, undisturbed by superimposed effects of the acquired immune system. Furthermore, a smaller set of genes and/or a smaller proportion of noncoding sequences in the genome can make it easier to trace mutations back to the genes responsible for phenotypic alterations of the host. Last but not least, ethical restrictions are less of a concern in nonvertebrate models. In the following sections we compare and highlight the specific features of different models.

20.3 Arabidopsis thaliana: A Plant as a Model for Human Disease

Arabidopsis thaliana has a long history as a laboratory plant owing to its small size (10 cm) and relatively short life cycle (about 6 weeks from germination to mature seed; for details see http://www.arabidopsis.org). This flowering plant, which belongs to the Brassicaceae family, was established as a model for all higher plants in the early 1980s. Compared to many other plants it has a very small genome (120 Mbp). The analysis of the genome resulted in the prediction of more than 24 000 genes [8]. Furthermore, a collection of mutant strains is available that covers about three-quarters of the genome [9].

At first glance, taking a plant as a model for human infectious diseases seems far-fetched. However, A. thaliana contains numerous genes similar to those related to human diseases ranging from cancer to premature aging. Moreover, the fact that some broad-host-range pathogens such as Pseudomonas spp. are able to infect plants as well as animals led to the conclusion that at least parts of the attack strategies are the same for all host organisms. On the host side, there seem to be larger differences in defense strategies. Whereas plants recognize specific effectors of the pathogen, animals can sense molecular patterns [10] that differ between pathogen species. Generally, plant pathogens do not enter the host cells but occupy the intercellular spaces in leaves, which they reach through the natural leaf openings, the stomata. Many bacteria then activate the Hrp-locus genes, which encode a type III secretion system [11]. This secretion system is then used to inject virulence effector proteins into the plant cell. These effectors are specifically recognized by the plant cell through LRR-containing proteins (NB-LRR), which in turn activate a signal transduction cascade to trigger the host response. Animals also use LRR-containing molecules as receptors, albeit with different specificities and different additional domains [10]. Thus, whereas the plant–pathogen interaction model can tell us much about the strategies a pathogen uses for infection, the host factors responsible for susceptibility and resistance differ between the eukaryote kingdoms.

20.4 Dictyostelium discoideum: Perspectives from a Social Amoeba

Many different evolutionary branches seem to have evolved their own ameboid life form. A common feature of these organisms is their reliance on bacteria as a major food source. On an evolutionary time scale, protozoa–bacteria interactions may have generated a pool of virulence traits, which preadapted some bacterial species as human pathogens. The protozoan mechanisms of phagocytosis use signaling pathways and cytoskeleton proteins closely related to those of macrophages, linking “primitive” organisms to highly organized multicellular species. D. discoideum belongs to a taxonomic entity called the social amoebae, since it is able to transform from a unicellular to a multicellular life stage under adverse living conditions [12]. The entire branch to which this group belongs is thought to constitute a kingdom in its own right in the eukaryote tree of life, besides the plant and animal/fungi kingdoms [13]. The haploid genome is 34 Mbp in size. The six chromosomes carry approximately 13 000 genes, many of which are homologous to those in higher eukaryotes (http://dictybase.org) [14]. Since many D. discoideum mutants exist, it is relatively easy to test these mutants for their ability to alter the outcome of an infection [15]. The pathogens predominantly analyzed in D. discoideum are Legionella pneumophila, Mycobacterium spp., Pseudomonas aeruginosa, and Cryptococcus neoformans [16–22]. In the case of Legionella, the causative agent of Legionnaires’ disease, a road map of infection-relevant host cell factors is being developed [20]. This road map shows that many general features, such as the mode of invasion, seem to be the same for macrophages and amoebae. Phagocytosis assays with specific cellular inhibitors and the effects of well-defined host-cell mutants revealed that cytoplasmic calcium levels, cytoskeleton-associated proteins, and the calcium-binding proteins of the endoplasmic reticulum specifically influence the uptake and intracellular growth of L. pneumophila. In contrast, macroautophagy appears to be dispensable for intracellular replication of L. pneumophila in D. discoideum [15]. For some host factors, further analyses were performed using green fluorescent protein (GFP) fusion proteins. On the pathogen side it was shown that Legionella mutants deficient for growth in macrophages are also unable to grow in D. discoideum. This includes mutants of the Legionella type IV secretion system Icm/Dot [23]. Dictyostelium wild-type cells were also used as a screening system for other mutagenized pathogens. Simple plating assays showed that the extracellular pathogen P. aeruginosa utilizes virulence pathways mediated by quorum sensing to infect D. discoideum [24, 25]. An acapsular mutant of the environmental fungus C. neoformans was found to be avirulent in Dictyostelium spp. and mice [22].
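The genome figures quoted for A. thaliana (120 Mbp, more than 24 000 genes) and D. discoideum (34 Mbp, approximately 13 000 genes) can be turned into rough average gene densities. The following Python sketch is only an illustration: it uses the rounded values given in this chapter and ignores the "more than" and "approximately" qualifiers, and the names GENOMES, genes_per_mbp, and kbp_per_gene are hypothetical helpers introduced here, not part of any database interface.

```python
# Rounded genome figures as quoted in this chapter (Sections 20.3 and 20.4).
# Assumption: the qualifiers "more than" and "approximately" are ignored,
# so the results are coarse estimates only.
GENOMES = {
    # species: (genome size in Mbp, number of predicted genes)
    "Arabidopsis thaliana": (120, 24_000),
    "Dictyostelium discoideum": (34, 13_000),
}


def genes_per_mbp(size_mbp: float, genes: int) -> float:
    """Average number of predicted genes per megabase pair of genome."""
    return genes / size_mbp


def kbp_per_gene(size_mbp: float, genes: int) -> float:
    """Average stretch of genome (in kbp) per predicted gene,
    counting coding and noncoding sequence together."""
    return size_mbp * 1000 / genes


if __name__ == "__main__":
    for species, (size_mbp, genes) in GENOMES.items():
        print(
            f"{species}: "
            f"{genes_per_mbp(size_mbp, genes):.0f} genes/Mbp, "
            f"{kbp_per_gene(size_mbp, genes):.1f} kbp per gene"
        )
```

With these rounded inputs the sketch gives about 200 genes per Mbp (5 kbp per gene) for A. thaliana and about 380 genes per Mbp (2.6 kbp per gene) for D. discoideum, so on these figures the amoeba genome is the more gene-dense of the two.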

20.5 Caenorhabditis elegans: Answers from a Worm

Caenorhabditis elegans is a small free-living soil nematode. This nematode species was first introduced by Sydney Brenner as a model for the determination of the structure of the nervous system [26]. The adults of C. elegans are about 1 mm in length and consist of exactly 959 somatic cells. The life cycle of this roundworm is very short, about 3 days, if grown on a bacterial lawn. The genome sequence of C. elegans was determined in a collaborative effort by Washington University and the Sanger Centre [27]. The analysis of the sequence revealed a coding potential of about 19 000 genes, more than half as many as found in the human genome (25 000) and more than in an organism as complex as Drosophila melanogaster [28]. However, a large part of the gene complement seems to consist of large gene families (http://wormbase.org). C. elegans was extensively tested for its ability to serve as a model host for many different human pathogens [29]. C. elegans adult nematodes die over the course of several days while feeding on lawns of Enterococcus faecalis. The worms are killed after active invasion by the pathogen. Interestingly, E. faecium is only able to kill eggs and hatchlings, not the adult animal [30]. It was shown that there is a remarkable overlap between virulence factors required for mouse and nematode pathogenesis [31]. This includes pathogens like Streptococcus pneumoniae, Salmonella typhimurium, P. aeruginosa, Yersinia pestis, and Serratia marcescens [32, 33]. Staphylococcus aureus virulence factors were also identified by using a high-throughput C. elegans-killing model [34]. For streptococcal species, hydrogen-peroxide-mediated killing was shown in C. elegans larvae [35]. In a large screen of Pseudomonas mutant strains it was demonstrated that the production and secretion of phenazines were required for the fast host-killing phenotype. The slow-killing phenotype was due to the Pseudomonas quorum-sensing system LasR. A killing assay was also used to screen a library of C. neoformans insertion mutants. This method led to the identification of the serine/threonine protein kinase Kin1 [36]. On the host side it was shown that impairment of the apoptosis machinery of C. elegans leads to hypersensitivity towards Salmonella spp. infections [37, 38]. The innate immune response triggered by Salmonella enterica requires intact LPS and is mediated by a mitogen-activated protein kinase (MAPK) pathway [39, 40]. Other signaling cascades involved in antibacterial defense are the transforming growth factor-β (TGF-β) pathway and the insulin/insulin-like growth factor pathway [41, 42].

20.6 Drosophila melanogaster: A Fruitful Model

The fruit fly Drosophila melanogaster has been one of the most extensively used species in genetics and developmental biology for almost a century. Its short life cycle of just 2 weeks and its small size make it easy to keep in large numbers. The sequencing of the entire genome revealed that a set of around 14 000 genes is sufficient to generate a multicellular organism of this kind that is able to perform complex actions. Several thousand mutant fly strains, each with a defect in a single gene, are available for the genetic dissection of traits (http://flybase.net). Insects have long been known for their ability to ward off invaders with a variety of immune responses. Since there is a good correlation between the immune responses of mammals and of insects, these systems can be good models of host–pathogen interactions [43]. Drosophila is able to react to different kinds of infections caused by gram-positive or gram-negative bacteria, fungi, or parasitic protozoa [44–47]. Most studies use injection methods to stimulate the immune system directly, since only a few pathogens are capable of naturally infecting D. melanogaster [48], and the genetic tractability of Drosophila has made it possible to define many of the genes involved in innate immunity. Upon infection a signaling cascade is activated which leads to the production of antimicrobial peptides in the fat body of the fly [49]. The activation of peptide production depends mainly on two distinct signal transduction pathways, the Toll and IMD (immune deficiency) pathways. Both pathways share striking similarities with the immune response pathways of mammals [50, 51]. This innate immune system is activated differently depending on the nature of the attacker, thus discriminating between different classes of pathogens: fungi and gram-positive bacteria induce the Toll pathway, whereas gram-negative bacteria are sensed by the IMD pathway.
Other signaling pathways such as the JAK-STAT and JNK pathways may also be involved in the immune response, although the exact role of these pathways is not yet clear [52]. The most prominent pathogen studied in D. melanogaster is P. aeruginosa. For this pathogen it could be shown that the virulence factors required for full infectivity in Drosophila are the same as for mammals [53]. Moreover, the verification that the type III secretion system is activated upon entry of the bacteria into the animal opens up the prospect of Drosophila as a real in vivo model for infection with this pathogen [54]. Further pathogens currently under investigation are Serratia marcescens, Listeria monocytogenes, and Mycobacterium marinum. Infections with fungi can also be monitored. In one study the authors infected immune-deficient flies with different Candida albicans mutants. They found that virulence patterns against Drosophila in these strains reproduced those in a murine model. Importantly, using this insect model they found additional virulence properties undetectable in the murine model system [55].

20.7 Danio rerio: Fishing for Knowledge

The zebrafish, Danio rerio, is a teleost fish and belongs to the family Cyprinidae. Being vertebrates, zebrafish are evolutionarily closer to humans than nematodes and fruit flies. Its well-developed adaptive and innate cellular immune system makes the zebrafish an ideal model for the study of infectious diseases. Danio has immunoglobulins, antigen-processing cells, T cells, B cells, phagocytic cells, and leukocytes capable of producing reactive species of oxygen and nitrogen. In addition, cytokines, complement, lectins, and Toll-like receptors have been identified. The ongoing genomic sequencing (http://www.sanger.ac.uk/Projects/D_rerio) and expressed sequence tag projects have identified a high number of orthologues to human genes (http://www.zfin.org). Experimental infections are usually introduced by incubation in media containing the bacteria or by microinjections. Infection of zebrafish with Streptococcus iniae, a natural fish pathogen, or S. pyogenes, a human pathogen, reproduces many features of human infection caused by a variety of streptococcal pathogens [56, 57]. Zebrafish embryos and larvae have also been exploited to examine Mycobacterium and Salmonella pathogenesis. This has already provided new insight into tuberculous granuloma formation and pathogenic effects of LPS, respectively. Unlike other vertebrate infection models, the transparent zebrafish embryos allow real-time monitoring in live animals.

20.8 Mus musculus: Of Mice and Men

Of the model organisms amenable to genetic analysis, the mouse is by far the most physiologically relevant system. Transgenic and knockout mouse lines have enormously advanced our understanding of infections. Moreover, mice possess an adaptive immune system. During the past ten years, comprehensive genetic maps spanning the genomes of mouse and human have been developed (http://www.informatics.jax.org). Linkage analysis has been used successfully to correlate the inheritance of susceptibility or resistance to a specific infectious challenge with a particular chromosomal region [58]. Moreover, genome-wide gene knockout programs have been initiated. Mouse models have been used to characterize the response to LPS. Remarkably, the C3H/HeJ strain exhibits natural tolerance and an altered inflammatory response to LPS. It has been shown that in this strain Toll-like receptor TLR-4 fails to relay the external signal to the nucleus due to a mutation affecting the cytoplasmic domain of TLR-4 [59].


Infection experiments with inbred mice showed that natural resistance to infection with Mycobacterium bovis (BCG), Salmonella typhimurium, and Leishmania donovani is controlled by the Bcg locus, also known as Ity and Lsh. The corresponding gene was called Nramp1 (natural resistance-associated macrophage protein). The protein belongs to a large family of metal ion transporters, and it appears that Nramp1 contributes to pH homeostasis while depleting the lysosomal milieu of metal ions [60]. On the basis of phenotypic similarity, the beige (bg) mutation in mice has long been regarded as a model for Chédiak–Higashi syndrome, an autosomal recessive disorder of humans characterized by partial albinism and impaired activity of neutrophils and NK cells. The resulting immunodeficiency leads to an increased susceptibility to bacterial infections [61]. Another inherited immunodeficiency is X-linked agammaglobulinemia (XLA). The CBA/N inbred mouse X-linked immunodeficiency (xid) strain is regarded as an experimental model of human XLA. The impaired B-lymphocyte development results in marked reductions in the serum levels of all three major classes of immunoglobulins. Affected patients suffer from frequent infections with enteroviruses, Salmonella enterica, Campylobacter spp., Streptococcus pneumoniae, Haemophilus influenzae, Staphylococcus aureus, and P. aeruginosa [61].

Tab. 20.1 Pathogens analyzed in infection model organisms.

Model organism | Pathogen | Current research areas
Arabidopsis thaliana | Pseudomonas spp. | Type III secretion, LRR proteins
Dictyostelium discoideum | L. pneumophila | Type III secretion, genetically determined infection susceptibility
Dictyostelium discoideum | P. aeruginosa | Quorum sensing
Dictyostelium discoideum | C. neoformans | Capsule
Caenorhabditis elegans | Enterobacteriaceae | LPS, apoptosis
Caenorhabditis elegans | Staph. aureus | Various virulence factors
Caenorhabditis elegans | Strep. pneumoniae | Hydrogen peroxide mediated killing
Caenorhabditis elegans | P. aeruginosa | Quorum sensing
Caenorhabditis elegans | C. neoformans | Serine/threonine protein kinases
Drosophila melanogaster | Various bacteria, protozoa, and fungi | Type III secretion, antimicrobial peptides, Toll pathway, IMD pathway, innate immune system
Danio rerio | Strep. iniae | Adaptive and innate immune system
Danio rerio | Strep. pyogenes | Various virulence factors
Danio rerio | Mycobacterium spp. | Granuloma formation
Danio rerio | Salmonella spp. | LPS
Mus musculus | Various bacteria, protozoa, fungi, and helminths | Physical barriers, microbial flora, genetically determined infection susceptibility, innate and adaptive immune system

20.9 Clean Models and Dirty Reality

Arabidopsis thaliana can reveal pathogenetic strategies of bacteria, Dictyostelium discoideum can serve as a surrogate macrophage, Caenorhabditis elegans and Drosophila melanogaster are established models of innate immune function, and Danio rerio and Mus musculus are suited to the study of both innate and adaptive immunity (Tab. 20.1). With the genomes of both pathogen and host determined, and with an impressive toolbox including microarrays and proteomic analysis, it is envisaged that a comprehensive understanding of the respective interactions is within reach. A common reproach against these model systems is that they do not represent a real human infection. In the present review we set out to demonstrate that model systems can nevertheless reproduce selected aspects of bacteria–host interactions, in some cases better than more complex systems. A prerequisite for this, however, is that conclusions drawn from model organisms take into account the fact that certain properties of the real system are not represented in the model and that some model properties cannot be found in the real system. In consequence, the selection of an appropriate model organism with an appropriate level of complexity appears to be essential to obtain valid results. In the past a combination of different complementary model systems has proven helpful. The Toll proteins, for instance, were originally identified in Drosophila mutants. Further members of this protein family and important agonists were later identified in mice. Meanwhile, related genes that recognize pathogen-associated molecular patterns have also been identified in humans. This demonstrates that rapid advances can be achieved by the translation of knowledge acquired through the study of model organisms.

Acknowledgments

The work was supported by the Competence Network PathoGenoMik and the Deutsche Forschungsgemeinschaft.


References

1 Steinert, M., M. Leippe, and T. Roeder. 2003. Surrogate hosts: invertebrates as models for studying pathogen-host interactions. Int. J. Med. Microbiol. 293:1–12.
2 Hentschel, U., M. Steinert, and J. Hacker. 2000. Common molecular mechanisms of symbiosis and pathogenesis. Trends Microbiol. 8:226–231.
3 Sandkvist, M. 2001. Biology of type II secretion. Mol. Microbiol. 40:271–283.
4 Bitar, D. M., M. Molmeret, and Y. Abu Kwaik. 2004. Molecular and cell biology of Legionella pneumophila. Int. J. Med. Microbiol. 78:519–527.
5 Kagan, J. C., M.-P. Stein, M. Pypaert, and C. R. Roy. 2004. Legionella subvert the functions of Rab1 and Sec22b to create a replicative organelle. J. Exp. Med. 199:1201–1211.
6 Schuster, M., M. L. Urbanowski, and E. P. Greenberg. 2004. Promoter specificity in Pseudomonas aeruginosa quorum sensing revealed by DNA binding of purified LasR. Proc. Natl. Acad. Sci. U. S. A. 101:15833–15839.
7 Bell, J. K., G. E. D. Mullen, C. A. Leifer, A. Mazzoni, D. R. Davies, and D. M. Segal. 2003. Leucine-rich repeats and pathogen recognition in Toll-like receptors. Trends Immunol. 24:528–533.
8 The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815.
9 Alonso, J. M., A. N. Stepanova, T. J. Leisse, et al. 2003. Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301:653–657.
10 Staskawicz, B. J., M. B. Mudgett, J. L. Dangl, and J. E. Galan. 2001. Common and contrasting themes of plant and animal diseases. Science 292:2285–2289.
11 Alfano, J. R., and A. Collmer. 1997. The type III (Hrp) secretion pathway of plant pathogenic bacteria: trafficking harpins, Avr proteins, and death. J. Bacteriol. 179:5655–5662.
12 Loomis, W. F. 1996. Genetic networks that regulate development in Dictyostelium cells. Microbiol. Rev. 60:135–150.
13 Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972–977.
14 Glöckner, G., L. Eichinger, K. Szafranski, et al. 2002. Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 418:79–85.
15 Otto, G. P., M. Y. Wu, M. Clarke, H. Lu, O. R. Anderson, H. Hilbi, H. A. Shuman, and R. H. Kessin. 2004. Macroautophagy is dispensable for intracellular replication of Legionella pneumophila in Dictyostelium discoideum. Mol. Microbiol. 51:63–72.
16 Hägele, S., R. Köhler, H. Merkert, M. Schleicher, J. Hacker, and M. Steinert. 2000. Dictyostelium discoideum: a new host model system for intracellular pathogens of the genus Legionella. Cell. Microbiol. 2:165–171.
17 Skriwan, C., M. Fajardo, S. Hägele, M. Horn, M. Wagener, R. Michel, G. Krohne, M. Schleicher, J. Hacker, and M. Steinert. 2002. Various bacterial pathogens and symbionts infect the amoeba Dictyostelium discoideum. Int. J. Med. Microbiol. 291:615–624.
18 Schreiner, T., M. R. Mohrs, R. Blau-Wasser, A. von Krempelhuber, M. Steinert, M. Schleicher, and A. A. Noegel. 2002. Loss of the F-actin binding and vesicle-associated protein comitin leads to a phagocytosis defect. Euk. Cell 1:906–914.
19 Heuner, K., C. Dietrich, C. Skriwan, M. Steinert, and J. Hacker. 2002. Influence of the alternative σ28 factor on virulence and flagellum expression of Legionella pneumophila. Infect. Immun. 70:1604–1608.
20 Fajardo, M., M. Schleicher, A. Noegel, S. Bozzaro, S. Killinger, K. Heuner, J. Hacker, and M. Steinert. 2004. Calnexin, calreticulin and cytoskeleton associated proteins modulate uptake and growth of Legionella pneumophila in Dictyostelium discoideum. Microbiology 150:2825–2835.
21 Solomon, J. M., G. S. Leung, and R. R. Isberg. 2003. Intracellular replication of Mycobacterium marinum within Dictyostelium discoideum: efficient replication in the absence of host coronin. Infect. Immun. 71:3578–3586.
22 Steenbergen, J. N., J. D. Nosanchuk, S. D. Malliaris, and A. Casadevall. 2003. Cryptococcus neoformans virulence is enhanced after growth in the genetically malleable host Dictyostelium discoideum. Infect. Immun. 71:4862–4872.
23 Solomon, J. M., A. Ruper, J. A. Cardelli, and R. R. Isberg. 2000. Intracellular growth of Legionella pneumophila in Dictyostelium discoideum, a system for genetic analysis of host–pathogen interactions. Infect. Immun. 68:2939–2947.
24 Pukatzki, S., R. H. Kessin, and J. J. Mekalanos. 2002. The human pathogen Pseudomonas aeruginosa utilizes conserved virulence pathways to infect the social amoeba Dictyostelium discoideum. Proc. Natl. Acad. Sci. U. S. A. 99:3159–3164.
25 Cosson, P., L. Zulianello, O. Join-Lambert, F. Faurisson, L. Gebbie, M. Benghezal, C. van Delden, L. K. Curty, and T. Köhler. 2002. Pseudomonas aeruginosa virulence analyzed in a Dictyostelium discoideum host system. J. Bacteriol. 184:3027–3033.
26 Brenner, S. 1974. The genetics of Caenorhabditis elegans. Genetics 77:95–104.
27 C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012–2018.
28 Adams, M. D., S. E. Celniker, R. A. Holt, et al. 2000. The genome sequence of Drosophila melanogaster. Science 287:2185–2195.
29 Kurz, C. L., and J. J. Ewbank. 2000. Caenorhabditis elegans for the study of host–pathogen interactions. Trends Microbiol. 8:142–144.
30 Garsin, D. A., C. D. Sifri, E. Mylonakis, X. Qin, K. V. Singh, B. E. Murray, S. B. Calderwood, and F. M. Ausubel. 2001. A simple model host for identifying Gram-positive virulence factors. Proc. Natl. Acad. Sci. U. S. A. 98:10892–10897.
31 Sifri, C. D., E. Mylonakis, K. V. Singh, X. Qin, D. A. Garsin, B. E. Murray, F. M. Ausubel, and S. B. Calderwood. 2002. Virulence effect of Enterococcus faecalis protease genes and the quorum-sensing locus fsr in Caenorhabditis elegans and mice. Infect. Immun. 70:5647–5650.
32 O’Quinn, A. L., E. M. Wiegand, and J. A. Jeddeloh. 2001. Burkholderia pseudomallei kills the nematode Caenorhabditis elegans using an endotoxin-mediated paralysis. Cell. Microbiol. 3:381–394.
33 Mylonakis, E., F. M. Ausubel, J. R. Perfect, J. Heitman, and S. B. Calderwood. 2002. Killing of Caenorhabditis elegans by Cryptococcus neoformans as a model of yeast pathogenesis. Proc. Natl. Acad. Sci. U. S. A. 99:15675–15680.
34 Begun, J., C. D. Sifri, S. Goldman, S. B. Calderwood, and F. M. Ausubel. 2005. Staphylococcus aureus virulence factors identified by using a high-throughput Caenorhabditis elegans-killing model. Infect. Immun. 73:872–877.
35 Bolm, M., W. T. Jansen, R. Schnabel, and G. S. Chhatwal. 2004. Hydrogen peroxide-mediated killing of Caenorhabditis elegans is a common feature of different streptococcal species. Infect. Immun. 72:1192–1194.
36 Mylonakis, E., A. Idnurm, R. Moreno, J. El Khoury, J. B. Rottman, F. M. Ausubel, J. Heitman, and S. B. Calderwood. 2004. Cryptococcus neoformans Kin1 protein kinase homologue identified through a Caenorhabditis elegans screen promotes virulence in mammals. Mol. Microbiol. 54:407–419.
37 Labrousse, A., S. Chauvet, C. Couillault, C. L. Kurz, and J. J. Ewbank. 2000. Caenorhabditis elegans is a model host for Salmonella typhimurium. Curr. Biol. 10:1543–1545.
38 Aballay, A., and F. M. Ausubel. 2001. Programmed cell death mediated by ced-3 and ced-4 protects Caenorhabditis elegans from Salmonella typhimurium-mediated killing. Proc. Natl. Acad. Sci. U. S. A. 98:2735–2739.
39 Kim, D. H., R. Feinbaum, G. Alloing, F. E. Emerson, D. A. Garsin, H. Inoue, M. Tanaka-Hino, N. Hisamoto, K. Matsumoto, M. W. Tan, and F. M. Ausubel. 2002. A conserved p38 MAP kinase pathway in Caenorhabditis elegans innate immunity. Science 297:623–626.
40 Aballay, A., E. Drenkard, L. R. Hilbun, and F. M. Ausubel. 2003. Caenorhabditis elegans innate immune response triggered by Salmonella enterica requires intact LPS and is mediated by a MAPK signaling pathway. Curr. Biol. 13:47–52.
41 Mallo, G. V., C. L. Kurz, C. Couillault, N. Pujol, S. Granjeaud, Y. Kohara, and J. J. Ewbank. 2002. Inducible antibacterial defense system in C. elegans. Curr. Biol. 12:1209–1214.
42 Garsin, D. A., J. M. Villanueva, J. Begun, D. H. Kim, C. D. Sifri, S. B. Calderwood, G. Ruvkun, and F. M. Ausubel. 2003. Long-lived C. elegans daf-2 mutants are resistant to bacterial pathogens. Science 300:1921.
43 Jander, G., L. G. Rahme, and F. M. Ausubel. 2000. Positive correlation between virulence of Pseudomonas aeruginosa mutants in mice and insects. J. Bacteriol. 182:3843–3845.
44 Flyg, C., K. Kenne, and H. G. Boman. 1980. Insect pathogenic properties of Serratia marcescens: phage-resistant mutants with a decreased resistance to Cecropia immunity and a decreased virulence to Drosophila. J. Gen. Microbiol. 120:173–181.
45 D’Argenio, D. A., L. A. Gallagher, C. A. Berg, and C. Manoil. 2001. Drosophila as a model host for Pseudomonas aeruginosa infection. J. Bacteriol. 183:1466–1471.
46 Boulanger, N., L. Ehret-Sabatier, R. Brun, D. Zachary, P. Bulet, and J. L. Imler. 2001. Immune response of Drosophila melanogaster to infection with the flagellate parasite Crithidia spp. Insect Biochem. Mol. Biol. 31:129–137.
47 Schneider, D., and M. Shahabuddin. 2000. Malaria development in a Drosophila model. Science 288:2376–2379.
48 Cheng, L. W., and D. A. Portnoy. 2003. Drosophila S2 cells: an alternative infection model for Listeria monocytogenes. Cell. Microbiol. 5:875–885.
49 Boman, H. G., I. Nilsson, and B. Rasmuson. 1972. Inducible antibacterial defence system in Drosophila. Nature 237:232–235.
50 Imler, J. L., and J. A. Hoffmann. 2000. Signaling mechanisms in the antimicrobial host defense of Drosophila. Curr. Opin. Microbiol. 3:16–22.
51 Hoffmann, J. A., and J. M. Reichhart. 2002. Drosophila innate immunity: an evolutionary perspective. Nat. Immunol. 3:121–126.
52 De Gregorio, E., P. T. Spellman, P. Tzou, G. M. Rubin, and B. Lemaitre. 2002. The Toll and Imd pathways are the major regulators of the immune response in Drosophila. EMBO J. 21:2568–2579.
53 Lau, G. W., B. C. Goumnerov, C. L. Walendziewicz, J. Hewitson, W. Xiao, S. Mahajan-Miklos, R. G. Tompkins, L. A. Perkins, and L. G. Rahme. 2003. The Drosophila melanogaster Toll pathway participates in resistance to infection by the gram-negative human pathogen Pseudomonas aeruginosa. Infect. Immun. 71:4059–4066.
54 Fauvarque, M. O., E. Bergeret, J. Chabert, D. Dacheux, M. Satre, and I. Attree. 2002. Role and activation of type III secretion system genes in Pseudomonas aeruginosa-induced Drosophila killing. Microb. Pathog. 32:287–295.
55 Alarco, A.-M., A. Marcil, J. Chen, B. Suter, D. Thomas, and M. Whiteway. 2004. Immune-deficient Drosophila melanogaster: a model for the innate immune response to human fungal pathogens. J. Immunol. 172:5622–5628.
56 Neely, M. N., J. D. Pfeifer, and M. Caparon. 2002. Streptococcus–zebrafish model of bacterial pathogenesis. Infect. Immun. 70:3904–3914.
57 Van der Sar, A. M., B. J. Appelmelk, M. J. E. Vandenbroucke-Grauls, and W. Bitter. 2004. A star with stripes: zebrafish as an infection model. Trends Microbiol. 12:451–457.
58 Jordan, E., and F. S. Collins. 1996. A march of genetic maps. Nature 380:111–112.
59 Qureshi, S. T., L. Larivière, G. Leveque, S. Clermont, K. J. Moore, P. Gros, and D. Malo. 1999. Endotoxin-tolerant mice have mutations in Toll-like receptor 4 (Tlr4). J. Exp. Med. 189:615–625.
60 Bellamy, R. 1999. The natural resistance-associated macrophage protein and susceptibility to intracellular pathogens. Microb. Infect. 1:23–27.
61 Qureshi, S. T., E. Skamene, and D. Malo. 1999a. Comparative genomics and host resistance against infectious diseases. Emerg. Infect. Dis. 5:36–47.


21 Expression Analysis of Human Genes During Infection Erwin Bohn and Ingo B. Autenrieth

21.1 Introduction

Infectious diseases are the result of interaction between pathogenic microorganisms and their host. This process is very dynamic for both the pathogen and the host and usually results in dramatic changes of gene expression in both. During the various phases of an infection, many types of host cells are involved, including epithelial cells, endothelial cells, mesenchymal cells, cells representing innate immunity such as polymorphonuclear leukocytes, dendritic cells, and macrophages, and cells representing components of adaptive immunity such as B and T lymphocytes, and some of them even interact directly with the pathogen. In consequence, these interactions result in (a) clearance of the infection by the host, or (b) an arrangement between host and pathogen in which both find their way to survive, or (c) a fatal outcome for the host. It is one of the great goals in infectious disease research to identify genes or gene products that can be used as early diagnostic markers for the different outcomes of infections in order to develop individually risk-adapted treatment strategies. In the postgenomic era a considerable number of studies on host–pathogen interactions have been published in which the gene expression response of the host or the microorganisms or even both of them has been addressed in detail. In fact, the development of novel sophisticated high-throughput technologies including transcriptomics, proteomics, and large-scale genetics has led to new strategies to investigate infectious diseases. Recent data shed new light on how microbial virulence factors affect host responses, particularly immune responses. Moreover, these novel technologies have allowed a more comprehensive approach to:

• identifying the specific bacterial determinants that trigger the expression of distinct sets of host genes and identifying the various signaling pathways that are involved in these gene expression programs
• identifying and investigating which features of the microbe-triggered host gene expression are common responses to diverse pathogens and which are variable and unique for certain pathogens
• dissecting how microbial virulence mechanisms which may be unique to pathogens influence and modify host responses, and unraveling how pathogens exploit host cell pathways to their own advantage in order to establish an infection
• discovering new signaling pathways and effector functions involved in host defense which are as yet unknown

In this chapter we will give an overview on how functional genomics has been used to understand the cross-talk between host and pathogen. Usually, this includes in vitro studies with various cell types and pathogens; however, few studies were done with tissues from infected patients. As many pathogens invade the host at mucosal surfaces, there are several examples of interactions with epithelial cells which will be discussed in detail. Also, many studies deal with the interaction of pathogens with cells of the innate immune system, such as macrophages and dendritic cells, as well as with pathways stimulated by pattern recognition receptors such as Toll-like receptors (TLRs).

21.2 Comparison of Gene Expression Profiles of Macrophages and Dendritic Cells In Vitro Upon Infection with Different Pathogens

Several studies [1–3] compared the responses of macrophages or dendritic cells upon infection with different pathogens. Huang et al. [1] determined the gene expression profiles of human monocyte-derived dendritic cells in response to Escherichia coli SD54, Candida albicans, and influenza virus as well as to their molecular components by using oligonucleotide arrays representing 6800 genes. While 166 genes were found to be regulated in common by all these pathogens, 118 genes were specifically regulated by E. coli and 58 specifically by influenza virus, whereas C. albicans only modulated the expression of a subset of E. coli-regulated genes. The 166 commonly regulated genes may represent a core response of dendritic cells against microbes, while other genes specifically reflect the interaction with a certain type of pathogen. Interestingly, rapidly after cell contact with any of the pathogens, a decline in the transcripts of genes associated with phagocytosis and pathogen recognition was observed, whereas genes encoding cytokines were simultaneously upregulated. Strikingly, these authors also found that the molecular components of the pathogens (lipopolysaccharide, mannan, dsRNA), particularly lipopolysaccharide (LPS), may mimic a large part of the bacterial response. Surprisingly, the fungal component mannan reflected the response to a bacterial pathogen rather than that of the fungal agent. Unfortunately, the nature of the E. coli strains (e.g., expression of pathogenicity factors that might affect gene expression) was not addressed in this study.

Nau et al. [2] infected monocyte-derived macrophages with Listeria monocytogenes EGD, Staphylococcus aureus ISP794, Mycobacterium bovis BCG, Salmonella typhimurium (ATCC 14028), or E. coli O157:H7. In addition, they used several bacterial components such as E. coli or Salmonella LPS, lipoteichoic acid (LTA) and muramyl dipeptide, fMLP, protein A, mannose, or heat shock proteins. Microarray analyses were performed using a microarray comprising 6800 genes. A total of 977 genes were significantly changed upon stimulation by one or more bacteria. Despite the diversity of the bacteria studied, a shared transcriptional response was elicited consisting of 132 induced and 59 repressed genes. The authors defined this response as a common activation program of macrophages against gram-positive bacteria, gram-negative bacteria, and mycobacteria. This activation program included cytokines such as IL-6, TNF-α, and IL-12, chemokines such as IL-8, IP-10, and MCP-1, adhesion molecules such as CD44 and ICAM-1, cytokine receptors, genes involved in tissue remodeling (e.g., MMP1, MMP10) and stress response (GADD45A), and several transcription factors. The activation program induced by several bacteria in macrophages was also elicited by bacterial components which function as ligands for Toll-like receptor (TLR)-2 (e.g., LTA) or TLR-4 (e.g., LPS) but not by other bacterial components such as fMLP, protein A, or mannose, indicating that the activation program is due to signaling mediated by macrophage TLRs. In addition to the common response, distinct alterations of the response due to specific bacteria were noticed. For example, Mycobacterium bovis induced IL-12 p40 and IL-15 only poorly compared to E. coli and S. aureus.

A further study carried out by Boldrick et al. [3] compared infection of peripheral blood mononuclear cells with viable or killed E. coli, S. aureus clinical strains, and various Bordetella pertussis strains. In this study eight time points within 24 h after infection were analyzed. A group of 515 unique genes was found whose expression level changed most dramatically.
Most of these genes responded in a strikingly stereotypic manner to all bacterial treatments and also to pharmacological stimulants such as phorbol myristate acetate (PMA) and ionomycin. More than 200 genes were regulated in common by all bacterial stimuli.

Chaussabel et al. [4] infected monocyte-derived dendritic cells and macrophages from a total of seven healthy donors with five different pathogens which all produce chronic infections, namely the bacterium Mycobacterium tuberculosis, the intracellular protozoa Leishmania major, Leishmania donovani, and Toxoplasma gondii, and the extracellular helminth Brugia malayi, and analyzed gene expression 16 h after infection. One interesting finding of this study was that the constitutive gene expression in dendritic cells and macrophages was very similar, comprising about 3692 shared genes, while only 130 genes were uniquely expressed in macrophages and 286 genes specifically expressed in dendritic cells. However, after exposure to the pathogens, macrophages and dendritic cells exhibited different gene expression profiles. In general, the response to each pathogen was much more diverse and specific in dendritic cells than in macrophages. Interestingly, the extracellular parasite B. malayi induced only up to 14 genes in dendritic cells, depending on the number of parasites, indicating that the transcriptional response to this pathogen is almost silent.

Since dendritic cells responded more diversely upon exposure to these pathogens, the authors focused on four gene clusters found in dendritic cells. Leishmania spp., M. tuberculosis, and T. gondii induced genes belonging to the NF-κB signaling pathway (NFKB1, NFKB2), apoptosis regulators (TRAF1, TRADD), and TNF-related molecules as well as genes involved in cell growth and cytokine production (STAT1, PTPN2, WNT5a, DUSP1) (cluster I). Cluster II comprises inflammatory mediators (e.g., RANTES, MIP-1β, GRO-1-3, IL-8, IL-6, IL-1β, TNF-α) and adhesion molecules (e.g., CD44, ICAM1), which were induced by Leishmania spp. and M. tuberculosis, but not by T. gondii. A third cluster of genes was highly induced by T. gondii and M. tuberculosis, moderately by L. major, and slightly by L. donovani. Most of these genes were interferon-regulated genes involved in signaling (STAT1 and 4, IRF-4 and 7), antiviral activities (MX1, MX2, ISG15, ISG20), or proliferation (IFITM). This subgroup was found to be also transcriptionally regulated in macrophages, but the pattern of regulation induced by the different pathogens was different from that in dendritic cells. A second subgroup included interferon-induced chemokines (IP10, MIG). This subgroup showed a similar expression pattern in dendritic cells and macrophages. In addition, genes involved in antigen processing and presentation were highly upregulated only by T. gondii and M. tuberculosis, and exclusively in dendritic cells, not in macrophages. While genes of cluster I (e.g., NFKB1) were induced in dendritic cells by M. tuberculosis, T. gondii, L. major, and L. donovani, in macrophages only M. tuberculosis and L. major induced these genes. Genes of cluster II, which in dendritic cells were induced by M. tuberculosis, L. major, and L. donovani, but not by T. gondii, were induced by all these pathogens in macrophages. The most diverse pattern was found in cluster III for the subgroups of interferon-induced genes upon infection with different pathogens in dendritic cells and macrophages.
Taken together, the most striking finding of this study is that, besides the general differences of the response of dendritic cells and macrophages to these pathogens, unique and pathogen-specific expression profiles can be identified for both dendritic cells and macrophages. This might reflect the idea that antigen-presenting cells respond with specific signaling pathways to various pathogens, which explains the specific features of host–pathogen interaction resulting in a distinct course of infection. Clearly, such studies need to be extended, and the pattern recognition receptors engaged or modulated by various pathogens have to be defined in detail in order to explain both the specificity and the convergence of gene expression patterns.
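The distinction drawn here between shared, cell-type-specific, and pathogen-specific expression can be pictured as simple set operations on gene lists. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the gene memberships are toy data loosely modeled on the clusters described above (GENE_X is an invented placeholder), and nothing here reproduces the analysis actually performed in [4].

```python
def partition_by_cell_type(expressed_a, expressed_b):
    """Split two collections of expressed genes into shared and specific parts."""
    a, b = set(expressed_a), set(expressed_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


def common_and_specific(induced_by_pathogen):
    """Given {pathogen: induced genes}, return the genes induced by every
    pathogen and, for each pathogen, the genes induced by that pathogen alone."""
    sets = {p: set(g) for p, g in induced_by_pathogen.items()}
    common = set.intersection(*sets.values()) if sets else set()
    specific = {}
    for pathogen, genes in sets.items():
        # Union of everything induced by the other pathogens.
        others = set().union(*(g for q, g in sets.items() if q != pathogen))
        specific[pathogen] = genes - others
    return common, specific


# Toy data (assumption): not measured values, only an illustration.
induced_in_dc = {
    "M. tuberculosis": {"NFKB1", "IL8", "STAT1", "MX1"},
    "L. major": {"NFKB1", "IL8", "STAT1"},
    "T. gondii": {"NFKB1", "STAT1", "MX1", "GENE_X"},
}
common, specific = common_and_specific(induced_in_dc)
```

With these toy sets, `common` contains NFKB1 and STAT1, and only T. gondii retains a pathogen-specific gene (the placeholder GENE_X). Real array data would first require a decision on what counts as "expressed" or "induced", which this sketch leaves open.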

21.3 Septicemia

Despite improvements in hemodynamic monitoring, antibiotics, and other supportive therapies, septicemia remains the most common cause of death in intensive care units. Although there have been modest successes in treating sepsis in animal models, little progress has been made in human studies. Possible reasons for this failure include heterogeneity of the patient population and the complexity and redundancy of the inflammatory pathways triggered by different microbial pathogens. One pilot study examining only eight patients with diverse degrees of concomitant disease, variable sources of infection, and a broad range of sepsis severity addressed the question of whether gene expression profiling may be a helpful tool for the diagnosis of sepsis [5]. The patient group was deliberately selected to be heterogeneous in age, sources of infection, and diagnosis in order to identify robust sepsis-related expression signatures. As a control, four patients were included who had undergone spine surgery. For gene expression analysis, mRNA from whole blood was analyzed using a microarray comprising 340 probes for human genes relevant to inflammation and related processes. A list of 19 upregulated and 31 downregulated genes with the most significant changes was identified. The expression pattern of these genes was strongly similar in all patients. For example, genes such as TGF-β1, DUSP-9 and 10, IL-18, S100A8, and S100A12 were upregulated while TNF, TIMP-1, and GILZ were downregulated. The uniformity of the responses suggests that the principle of this approach can be adapted to the diagnosis of the early stage of sepsis. Moreover, the authors claimed a striking correlation between the conventional diagnostic classification and the gene expression analysis. Therefore, such approaches may be helpful if larger cohorts of patients are used to identify marker genes which can predict the outcome of sepsis at an early phase of the disease and provide clinicians with a tool by which to optimize individual therapeutic strategies.
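One way to picture how a list of upregulated and downregulated marker genes could be condensed into a single number per patient is a simple signature score. The Python sketch below is an assumption made for illustration only: the scoring rule (mean of the "up" genes minus mean of the "down" genes), the function name, the gene-symbol spellings used as keys, and all numerical values are invented and are not taken from the pilot study [5].

```python
from statistics import mean


def signature_score(sample, up_genes, down_genes):
    """Mean log2 ratio of the 'up' genes minus mean log2 ratio of the 'down' genes.

    `sample` maps gene name -> log2(patient / reference) expression ratio.
    Signature genes that are missing from the sample are skipped.
    """
    up = [sample[g] for g in up_genes if g in sample]
    down = [sample[g] for g in down_genes if g in sample]
    if not up or not down:
        raise ValueError("sample covers none of the up or down signature genes")
    return mean(up) - mean(down)


# Genes named in the text as examples; the key spellings are assumptions.
UP = ["TGFB1", "IL18", "S100A8", "S100A12"]
DOWN = ["TNF", "TIMP1", "GILZ"]

# Invented log2 ratios for one hypothetical patient and one hypothetical control.
patient = {"TGFB1": 1.0, "IL18": 2.0, "S100A8": 3.0, "S100A12": 2.0,
           "TNF": -1.0, "TIMP1": -2.0, "GILZ": -3.0}
control = {gene: 0.0 for gene in UP + DOWN}

patient_score = signature_score(patient, UP, DOWN)   # 2.0 - (-2.0) = 4.0
control_score = signature_score(control, UP, DOWN)   # 0.0
```

A score of this kind only becomes meaningful once a threshold has been validated in a larger cohort, which is the limitation the text itself points out.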

21.4 Gene Expression in Epithelial Cells Modulated by Bacteria

Epithelial cells have been recognized to play an important role in mucosal immunity and therefore exhibit a number of immunological functions including expression of adhesion molecules, secretion of effector molecules such as defensins, and expression of cytokines and cytokine receptors [6, 7]. Thus, epithelial cells are an integral part of the mucosal immunity network which by secretion of inflammatory cytokines and chemokines may signal the presence of pathogenic bacteria to the mucosal immune system [7–9]. The inflammatory reaction at mucosal sites can assist pathogen control, but it may also be subverted by the pathogen to promote its dissemination in the host. Moreover, epithelial cells constitute a crucial physical barrier between the underlying mucosa and microorganisms residing in the lumen of the human intestinal tract. To overcome this barrier, many enteric pathogens have evolved the ability to invade and pass through the intestinal epithelium as a key initial step to establish mucosal and, subsequently, systemic infection of the host. Microbial entry into the epithelium is an active process that requires signaling from the invading pathogen to the host cell, although the specific signaling pathways involved differ for various microorganisms.

21.4.1 Helicobacter pylori

H. pylori, which adheres to the luminal surface of gastric epithelial cells, is the causative agent of gastritis and possesses a number of virulence factors that modulate its interaction with the host [10, 11]. These virulence factors include the secreted cytotoxin VacA and the gene products of the pathogenicity island cagPAI. The cagPAI encodes a type IV secretion system that facilitates the injection of bacterial proteins into host cells [10, 11]. One of these proteins, CagA, undergoes tyrosine phosphorylation within epithelial cells, deregulates SHP-2 [12], and mediates alteration of the actin cytoskeleton [13]. Several analyses of the interaction of Helicobacter with epithelial cells and of the role of the cagPAI in the modulation of gene expression have been performed. Cox et al. [14] infected Kato 3 gastric epithelial cells with a wildtype cagPAI+ or an isogenic cagPAI– H. pylori mutant strain and analyzed host gene expression by cDNA arrays. Exposure of cells to the various H. pylori strains revealed 106 genes which were at least 1.3-fold differentially expressed upon infection with cagPAI+ or cagPAI– strains. In line with previous findings [15], several NF-κB-regulated genes were found to be differentially expressed upon cagPAI+ infection. Moreover, several genes involved in regulation of the cell cycle (e.g., cyclin D1) and apoptosis (e.g., Bclx) were more highly expressed upon infection with a cagPAI+ strain, confirming that cagPAI+ strains are associated with reduced apoptosis and higher gastric epithelial cell proliferation [16]. Infection with a cagPAI+ strain was associated with early decreased expression of genes involved in cellular regulation, such as elongin B. Elongin B is part of the multifunctional regulatory elongin BC complex, which is thought to play an important role in negative regulation of hypoxia-inducible proteins by promoting degradation of HIF-1α [17].
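Thresholds such as "at least 1.3-fold differentially expressed" amount to a filter on expression ratios. The Python sketch below shows one plausible form of such a filter. It is an assumption for illustration: the function name, the symmetric treatment of up- and downregulation, and the intensity values are invented, and the normalization and statistics used in the cited array study [14] are not reproduced here.

```python
def differentially_expressed(infected, control, min_fold=1.3):
    """Return {gene: fold change} for genes whose expression differs by at
    least `min_fold` in either direction.

    Fold change is infected / control, so values below 1 indicate lower
    expression in infected cells.
    """
    if min_fold < 1:
        raise ValueError("min_fold must be >= 1")
    hits = {}
    for gene in infected.keys() & control.keys():
        if infected[gene] <= 0 or control[gene] <= 0:
            continue  # ratios are undefined for non-positive intensities
        fold = infected[gene] / control[gene]
        if fold >= min_fold or fold <= 1 / min_fold:
            hits[gene] = fold
    return hits


# Invented signal intensities; GENE_X is a hypothetical unchanged gene.
control = {"cyclin D1": 100.0, "elongin B": 200.0, "GENE_X": 100.0}
infected = {"cyclin D1": 150.0, "elongin B": 100.0, "GENE_X": 110.0}

hits = differentially_expressed(infected, control, min_fold=1.3)
strict_hits = differentially_expressed(infected, control, min_fold=2.0)
```

With the toy values, the 1.3-fold cutoff retains cyclin D1 (1.5-fold up) and elongin B (2-fold down), whereas a two-fold cutoff retains only elongin B. This illustrates why gene counts reported with different thresholds are not directly comparable between studies.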
Several of these genes were also detected in gastric biopsies from patients undergoing routine upper gastrointestinal endoscopy [14]. The mRNA expression of amphiregulin, a member of the epidermal growth factor family which has mitogenic effects on epithelial cells, was confirmed and is regulated independently of the cagPAI. Four genes of the ADAM family of membrane metalloproteinases, which may have important functions in the release of cell surface molecules and in cell–cell and cell–extracellular matrix interactions, were changed upon H. pylori infection. ADAM10, which is expressed in hematological malignancies [18], has collagenase type IV activity [19], and cleaves proTNF-α to its soluble form [20], is expressed more frequently in patients infected with cagPAI strains. Further studies have to show how these findings are linked to H. pylori-induced gastric pathology.
Guillemin et al. [21] infected gastric epithelial AGS cells with H. pylori G27 and isogenic deletion mutants for cagA, vacA, cagE, cagN, or the entire cagPAI. A total of 206 genes were designated as a canonical H. pylori response. The most abundant functional groups were genes encoding proteins involved in either the innate immune response or the regulation of cell shape and adhesion, consistent with the well-characterized cytokine secretion by, and cell elongation of, gastric epithelial AGS cells upon H. pylori infection. Moreover, genes involved in NF-κB signaling were found to be upregulated upon H. pylori infection. A large number of genes involved in cytoskeleton function were found to be upregulated, such as Cdc42 effector protein 2 (CEP2), genes associated with the actin cytoskeleton (αV-integrin) and intermediate filaments (keratin 17), as well as genes encoding components of cell junctions (claudin 1 and 4). In addition, gene repression was also found. Twenty percent of the repressed genes were signal transduction molecules such as the wingless (Wnt) signaling pathway members frizzled 7 and dickkopf, the antagonist of tyrosine kinase signaling DAB2, and the mitoattractant CTGF. Deletion of the entire cagPAI abolishes most of the gene modulation by H. pylori, indicating that the major response of AGS cells to H. pylori is mediated through the cagPAI. Infection with the cagA– mutant differed from wildtype infection in a subset of 18 genes including genes involved in cell shape changes such as pleckstrin homologue domain 1, RhoB, CEP2, claudin 4, and enigma. Since CagA requires CagE to be translocated into host cells, the cagA-dependent genes represent a subset of the cagE-dependent genes. In contrast to the cagA– mutant, the cagE– mutant no longer modulates the expression of genes involved in inflammation such as NF-κB and TNF signaling, which may indicate that other factors of the cagPAI are crucial to elicit these responses. The identification of cagA-mediated gene expression may help to shed new light on cagA-modulated host responses such as CagA signal transduction and small GTPase regulation of the cytoskeleton.
Environmental and genetic factors are important in gastric carcinogenesis. Chronic gastritis caused by H. pylori infection is associated with gastric cancer, and infection with H. pylori is an almost invariant feature of chronic gastritis in patients with gastric cancer [22]. Boussioutas et al.
[23] used more than one hundred tumor and adjacent mucosa samples from Australian and Chinese patients displaying different gastric tumor types, as well as mucosa which showed chronic gastritis or intestinal metaplasia, to perform gene expression profiling using a spotted cDNA array containing 9381 nonredundant gene elements. Each nonneoplastic sample and each gastric cancer subtype could be defined by a distinct gene expression profile [23]. Unsupervised clustering of the data led to a highly structured partitioning of samples into the recognized histological subgroups. Comparison of Chinese and Australian patients showed no clear segregation of gastric cancer samples based on ethnicity. Chronic gastritis was characterized by expression patterns of groups of genes sharing similar biological functions, including the metallothionein and elongation factor gene clusters. In particular, a large number of nuclear genes encoding mitochondrial proteins were found. For VacA it was shown that the p34 fragment localizes specifically to mitochondria [24], causing release of cytochrome c and apoptosis [25, 26]. Furthermore, VacA induces mitochondrial damage when applied to the gastric epithelial cell line AZ521 [26]. Therefore, the finding presented above may be consistent with loss of or damage to mitochondria, which may lead to a high level of mitochondrial biogenesis in chronic gastritis. Interestingly, the above-mentioned in vitro studies using gastric epithelial cell lines for H. pylori infection did not describe any changes in mitochondrial genes, which may be due to the different microarrays or cell lines used. Taken together, this study and others [27] may help to introduce a gene-expression-based classification of gastric carcinoma and may contribute to creating novel hypotheses to explore the pathogenesis of gastritis and the development of gastric cancer at the molecular level.

21.4.2 Yersinia enterocolitica

Y. enterocolitica is an extracellularly located bacterium that by means of a type III secretion system injects at least six pathogenicity factors, the Yersinia outer proteins (Yops), into the cytosol of host cells [28]. Three studies addressed the role of the Yops in modulating the host response at the gene expression level [29–31]; only one of them investigated the gene expression response in human cells [31]. The mutant strains Y. enterocolitica WA-C, lacking the pYV virulence plasmid, and Y. enterocolitica WA-C Δinv, lacking in addition invasin (Inv), were selected to address gene regulation in epithelial cells modulated by Inv and other chromosomally encoded pathogenicity factors. The wildtype strain Y. enterocolitica WA-P and the mutant strain Y. enterocolitica WA-P ΔyopP were used to investigate YopP-specific modulation of gene regulation in epithelial cells. The outer membrane protein Inv is important in the early phase of the infection by promoting intestinal translocation of the pathogen and colonization of Peyer’s patches [32–34]. Inv is a high-affinity ligand for β1 integrins, thereby inducing a zipper-like uptake into host cells [34, 35]. Additionally, Inv may promote proinflammatory host cell responses [36]. In fact, Inv triggers activation of NF-κB and production of cytokines in intestinal epithelial cells [37] in conjunction with Rac-1 and MAP kinase cascades [38]. In contrast, YopP inhibits NF-κB and MAPK signaling pathways by targeting IκB kinase and MAPK kinases, resulting in reduced cytokine production and in apoptosis of macrophages [39, 40]. Analysis of the expression profiles of Yersinia-infected cells revealed a total of 165 genes which were differentially expressed compared to uninfected cells. One striking finding of this study was that only 27 genes were differentially expressed upon infection with wildtype Yersinia bearing the pYV virulence plasmid.
Among those genes, the small GTPase RhoB and transcriptional regulators such as the nuclear receptor Krüppel-like factor (KLF)2 and glucocorticoid-induced leucine zipper (GILZ) were exclusively regulated by virulence-plasmid-bearing bacteria but not by plasmidless bacteria. It is interesting to note that KLF2, RhoB, and GILZ are reported to be involved in silencing of gene transcription [41–46], which is consistent with the fact that pYV-encoded Yops may inhibit immune responses [47]. Only three NF-κB-activated proinflammatory molecules, namely immediate early response 3 (IER3), TNFAIP3 (A20), and cyclooxygenase 2 (COX-2), were expressed 4 h after Y. enterocolitica pYV+ infection. Cysteine-rich angiogenic inducer 61 (CYR61), an extracellular matrix protein involved in cell survival and angiogenesis [48], and serum-induced kinase (SNK), which plays a role in cell division, were both repressed by pYV+ infection. These data suggest that wildtype pYV+ infection promotes weak transcriptional responses in epithelial cells, including proinflammatory responses, antiproliferative responses, and genes whose products sustain silencing of host responses. In contrast, 110 probesets were differentially regulated in HeLa cells exposed to Yersinia lacking the virulence plasmid (pYV–), indicating that the virulence plasmid represses gene expression in epithelial cells very efficiently. Since Inv leads to activation of NF-κB, a large number of known NF-κB-regulated genes are found to be regulated upon infection with Y. enterocolitica WA-C pYV– but not with Y. enterocolitica WA-C Δinv. Interestingly, these genes (e.g., IL-8, Gro1, NFKB1) were also found expressed in gastric epithelial AGS cells infected with an H. pylori cagPAI+ strain. Thus, similar to what has been described for dendritic cells and macrophages, there exists a common response of epithelial cells upon bacterial infection. However, in contrast to H. pylori, Yersinia may also suppress these genes. Comparison of the yopP deletion mutant and wildtype Yersinia revealed that YopP represses the expression of NF-κB-activated genes. This finding is in line with previous data showing that YopP inhibits MAP kinases and IκB kinase (IKK). Interestingly, the inhibitory effect of YopP (a) is restricted to a subset of NF-κB-regulated genes and (b) appears to act only transiently at 2 h after infection, indicating that other Yops translocated into host cells also contribute to the Yersinia-mediated gene repression. The as yet unanswered but crucial question arising from these studies is: in which host cells, and when, upon infection in vivo, is gene expression induced by Inv or repressed by YopP (Fig. 21.1)? To answer this, genomic and molecular in vivo studies are required.

Fig. 21.1 Modulation of gene expression in epithelial cells or macrophages by Y. enterocolitica virulence factors. (a) YadA/Inv in epithelial cells lead to activation of a strong inflammatory response including activation of the transcription factor NF-κB. (b) The translocation of Yersinia outer proteins suppresses the host response directly by interference with signal cascades, and possibly indirectly by induction of a host response which may lead to inhibition of signal cascades. Inv, invasin; ECM, extracellular matrix; YOPS, Yersinia outer proteins.

Comparison of host gene expression upon Yersinia and Helicobacter infections reveals different strategies of the bacteria to interact with epithelial cells. Helicobacter cagPAI+ strains, which are associated with high virulence and severe clinical outcome, induce a strong proinflammatory response which is partially comparable with the host response induced by avirulent plasmidless Yersinia strains. In contrast, highly virulent Yersinia represses gene expression effectively. From this we can speculate that at mucosal surfaces, pathogenic bacteria trigger proinflammatory host gene expression in order to promote infection, whereas invasive pathogens trigger either repression of inflammatory host genes or expression of anti-inflammatory host genes in order to evade the host immune system.

21.4.3 Pseudomonas aeruginosa

P. aeruginosa is an opportunistic pathogen causing pneumonia in cystic fibrosis and hospitalized patients [49, 50]. A study by Ichikawa et al. [51] investigated the host response of the human lung carcinoma epithelial cell line A549 upon exposure to P. aeruginosa strain PAK and an isogenic strain lacking expression of type IV pili. Since type IV pili are essential for adherence, this study dissected the gene expression in host cells infected with a P. aeruginosa strain which is able to adhere to epithelial cells and a strain which cannot. Gene expression analysis was carried out using high-density DNA microarrays consisting of 1506 human cDNA clones. A total of 22 genes were at least two-fold differentially expressed, including several inflammatory genes. Sixteen genes were found to be at least two-fold differentially expressed if the pilA– and pilA+ strains were compared. In addition, 11 genes were expressed upon infection with both the pilA– and pilA+ strains. The response upon pilA+ strain infection showed similarities to the profiles found upon infection with cagPAI+ H. pylori and plasmidless Y. enterocolitica.
Motile P. aeruginosa phenotypes characterized by the presence of flagella are essential in the establishment of acute infection, while mucoid P. aeruginosa phenotypes characterized by the production of the polysaccharide alginate are critical in the development of chronic infections. To compare the host gene expression upon infection with motile and mucoid P. aeruginosa strains, Calu3 human airway epithelial cells were infected with different alginate and flagellin mutants [52]. Subsequently, gene expression profiling was performed using arrays comprising 14 239 human genes. The pattern of gene expression induced by flagellin+ strains includes innate host defense genes, proinflammatory cytokines, and chemokines representing the typical set of NF-κB- and TNF-regulated genes. In contrast, mucoid P. aeruginosa strains led to an overall attenuation of the host response and did not induce NF-κB activity, which is in line with the finding that almost no inflammatory response genes are induced by mucoid P. aeruginosa. The flagellin-induced host response is mediated by TLR-5. While alginate is known to signal through TLR-2 and TLR-4 in macrophages [53], alginate does not elicit a proinflammatory response in epithelial cells [54] because epithelial cells may lack components of TLR signaling.
Another study highlighted the role of the type III secretion system of P. aeruginosa in the modulation of gene expression [55]. P. aeruginosa PA103, a nonmotile strain deficient in synthesis of the complete flagellar filament, secretes ExoU and ExoT [56]. ExoU is a potent cytotoxin whose host cell targets and mechanism of action are not yet known [57–59]. ExoT is a bifunctional protein possessing an N-terminal GTPase-activating domain with GAP activity for RhoA, Rac, and Cdc42 and a C-terminal ADP-ribosyltransferase domain [60, 61]. Microarray analysis (a total of 9243 cDNA elements) was performed on a human tracheal epithelium cell line exposed to P. aeruginosa PA103 or various isogenic PA103 mutants such as exoT, exoU, exoUT, pscJ (which results in a nonfunctional type III secretion apparatus, TTSS), and pilA (a type IV pili mutant) [55]. The authors reported 46 unique genes regulated by PA103. Further analysis defined 28 genes which were regulated independently of the TTSS.
Among these, several genes, such as VEGF-a, IGFBP3, adrenomedullin, stanniocalcin1, and NF-IL3, are known to be regulated by the transcription factor HIF-1α [62–65], suggesting that PA103 products induce expression of genes regulated by HIF-1α. Fourteen genes were found to be regulated by ExoU. This group included a striking number of genes involved in transcriptional regulation, such as RhoB, which may inhibit NF-κB activation [45], tristetraprolin, which attenuates the mRNA stability of TNF-α [66], dual specificity phosphatase 1 (DUSP1, MKP1), which inactivates MAP kinases [67], and transcription factor 8 (TCF8, ZEB, NIL2A), which is known to be a transcriptional repressor [68, 69]. In addition, ExoU induced AP-1 activation, which coincided with expression of AP-1-regulated genes such as MCP1 and IL-6.
Cystic fibrosis is a monogenic disorder characterized by a dysfunction of the cystic fibrosis transmembrane conductance regulator CFTR [50], resulting in disturbed ion transport across epithelia and a severe form of lung disease that is characterized by chronic infection with P. aeruginosa and S. aureus. The mechanism by which CFTR mutation leads to lung disease is not well understood, although recent studies show that a common mutant form of CFTR (ΔF508) leads to a cell stress response resulting in increased NF-κB activation [70]. Virella-Lowell et al. [71] addressed the question of how CFTR deficiency may affect the response of epithelial cells exposed to P. aeruginosa. For this purpose, the epithelial cell line IB3-1, which carries the CFTR ΔF508 genotype, and the isogenic transduced cell line with low-level expression of wildtype CFTR were compared with regard to gene expression. Infection with P. aeruginosa PAO1 showed that CFTR deficiency exaggerated activation of typical NF-κB-activated genes, cytokine receptors such as IL-15Rα and IL-18R, and genes involved in activation of IFN signaling such as IRF3 and STAT1. Moreover, enzymes involved in protease inhibition and enzymes involved in metabolism showed blunted activation. Thus, CFTR mutations change both the constitutive and the P. aeruginosa-induced host gene expression. Further studies will have to reveal which of these changes have an impact on the outcome of the chronic infection.
Taken together, various virulence factors of Pseudomonas account for different activation programs in epithelial cells: (a) flagellin triggers NF-κB activation and proinflammatory gene expression, (b) ExoU accounts for a transcriptional repression program and activation of AP-1, and (c) as yet unrecognized bacterial factors account for a HIF-1α-mediated gene expression program.

21.4.4 Bartonella henselae

B. henselae, the causative agent of cat scratch disease and bacillary angiomatosis, is the only known bacterial pathogen causing vasculoproliferative disorders in humans (hence the name bacillary angiomatosis) [72, 73]. Gene expression analysis upon infection of epithelial HeLa cells with B. henselae revealed only 20 genes to be regulated at 6 h after infection [74]. At least 14 of these genes are known to be regulated by the transcription factor HIF-1α. HIF-1α is a key transcription factor for the induction of angiogenic growth factors that adjust the vascular blood supply to tissues according to metabolic demand [75, 76]. It is speculated that by this particular gene expression profile B. henselae reprograms host cells in order to create a perfect habitat for its own growth: in fact, B. henselae invades and replicates in endothelial cells [77, 78]. The signal cascade which leads to the proangiogenic gene signature is as yet unknown. One interesting question in this context is why and how B. henselae provokes this specific gene expression program in host cells but prevents induction of the typical aforementioned proinflammatory host cell response induced by engagement of, e.g., TLRs. The activation of the transcription factor HIF-1α by bacteria appears not to be restricted to B. henselae. There is evidence that viable as well as killed S. aureus [79] or P. aeruginosa [55] and probably many other bacteria induce activation of HIF-1α and subsequently HIF-1α-regulated genes. Future studies will need to investigate whether HIF-1α-mediated gene expression is a unique or specific host response in infectious diseases.

21.5 Common Signatures

A common signature of the host cell responses to infection with pathogens is that many of the responding genes are known to be regulated by the NF-κB signaling pathway. The Rel/NF-κB family comprises several structurally related proteins that form homodimers and heterodimers. The most common Rel/NF-κB dimer contains p50–p65. These dimers bind to a set of related 10-bp DNA sites, called κB sites, to regulate the expression of genes. NF-κB-regulated gene products play a critical role in orchestrating both innate and adaptive immune responses. More than 150 different stimuli are known to induce activation of NF-κB, and more than 150 target genes are known to be regulated by NF-κB [80, 81]. The most prevalent classes among genes regulated by NF-κB include cytokines such as IL-1α, IL-1β, IL-6, IL-10, TNF-α, lymphotoxin, G-CSF, and GM-CSF, chemokines such as IL-8, MIP1α, MIP1β, MIP2α, and MIP3α, and chemokine receptors such as CCR5, as well as other factors such as COX-2. Genes encoding NF-κB Rel proteins were also found to be stereotypically regulated upon infection, reflecting the autoregulation of this signaling pathway [81].
During the past decade, important questions concerning the mechanisms of innate immunity, particularly the molecular basis of recognition of pathogens by the host, have been addressed. In fact, a number of pathogen-associated molecular patterns (PAMPs) and their recognition by so-called pattern recognition receptors (PRRs) of the host have been identified. Probably the most important system of PRRs is the Toll-like receptor (TLR) system [82]. There are 10 TLRs, which recognize a broad spectrum of bacterial, fungal, viral, and protozoal compounds such as lipoteichoic acids (recognized by TLR2), lipopolysaccharides (TLR4), CpG DNA (TLR9), dsRNA (TLR3), and flagellin (TLR5). Several alternative signaling pathways that are initiated by engagement of TLRs, and the adaptor molecules involved in transmitting these signals, have been identified.
The signaling pathways via TLRs originate from the cytoplasmic TIR domain. There are at least four TIR-domain-containing adaptors (MyD88, TIRAP, TRIF, and TRAM) that play important roles in TLR signaling. MyD88 is essential to signaling pathways of all TLR family members that lead to the production of inflammatory cytokines. In addition, TLRs bind to other adaptor molecules such as TRIF (TLR3 and 4) and TRAM (TLR4), which leads to the activation of the transcription factor IRF-3 and subsequently to the expression of IFN-β and, via activation of STAT1, to the induction of interferon-inducible genes. The TLR signaling pathway explains the common response pattern induced by bacteria in dendritic cells and macrophages as observed by transcriptomics. As an alternative to TLR-mediated NF-κB activation, NF-κB activation can also be triggered either by binding to other surface molecules such as β1-integrins, as was shown for the Y. enterocolitica proteins invasin and YadA [83], or by internalized bacteria or bacterial products via NOD/CARD proteins [84]. Although NF-κB- and interferon-mediated activation might be the most obvious response upon interaction of host cells with bacteria, other pathways which have not yet been worked out in infections, such as HIF-1α-mediated and WNT signaling pathways, might be involved.
Taking all the data together, we can conclude that there are both a common canonical and a variable specific host gene expression response in bacterial infections (Fig. 21.2). This can be explained by the diverse nature of pathogens which, however, is recognized by the host via PAMP–PRR recognition, leading to a convergence of pathways which promote common responses. However, individual virulence factors of pathogens may cause profound changes of this common response. Moreover, as discussed in the next section, mutations in the host genome may also account for the diversity in gene expression upon bacterial infections.

Fig. 21.2 Possible mechanisms for the generation of common and specific host response by various pathogens in a particular cell type. (a) Common host response. Ligand/receptor interaction of different pathogen-associated molecular patterns (PAMPs) such as lipopolysaccharides, lipoproteins, etc., with different pattern recognition receptors (PRRs) such as Toll-like receptors (TLRs) can result in activation of the same stereotypic signal transduction cascades, leading to a common response in host cells. (b) Variation in common response. The different compositions of PAMPs as well as the different compositions of PRRs may also result in the activation of signal transduction cascades which are partially overlapping but in addition may integrate other, nonoverlapping signal transduction cascades, as shown, for example, for TLR2- and TLR4-mediated signaling, leading to variations in a common host response. Moreover, cross-talk between different signal transduction cascades may lead to a different response than activation of only one signal transduction cascade. (c) Modulation of the host response by virulence factor composition. Distinct virulence factors and unique other components of the pathogen may act very specifically with distinct host receptors, or toxic effector mechanisms of a distinct virulence factor may inhibit signaling cascades activated by PRRs. (d) Modulation of the host response by genetic factors. Allelic variation in the host, including point mutations, insertions, and deletions in distinct genes or gene regions (e.g., in a receptor), may lead to modulation of the host response.

21.6 Genetic Polymorphisms and Mutations Affect Gene Expression: Impact on Infection Susceptibility and Infection Course

Gene expression profiles help us to understand the complexity of immune response mechanisms, and animal models including gene-targeted mice help to prove the biological relevance of particular genes for defense against pathogens. The impact of such genes on human disease is apparent in primary immunodeficiencies due to genomic deletions or null mutations of single genes. Furthermore, point mutations in important genes of the immune system are often sufficient to affect the severity and the course of infection. Such genetic predispositions can explain why certain individuals develop chronic infections and others do not. As summarized recently [85, 86], several autosomal dominant primary immunodeficiencies are known which lead to an increased susceptibility to infectious diseases. Data from mouse infection models indicate that TLR signaling pathways play an important role in protection against pathogens but may contribute to the development of inflammatory disorders. Studies of genetic disorders in man may provide evidence that this is also the case for humans. In fact, there are three Mendelian primary immunodeficiencies associated with impaired TLR signaling [86]. These deficiencies are downstream of TLRs and their adaptor molecules and link TLR signaling to NF-κB activation. Thus, NEMO (IKK-γ) is affected in X-linked recessive anhidrotic ectodermal dysplasia with immunodeficiency (EDA-ID), resulting in impaired NF-κB activation in response to TLR agonists. Besides other features (hypotrichosis, hypodontia), the phenotype of these patients is characterized by infections with encapsulated bacteria (H. influenzae, S. pneumoniae) or infections caused by the weakly pathogenic Mycobacterium avium [87–91]. These infections can be found in peripheral tissues such as skin and the respiratory and digestive tracts, but also systemically in spleen, bones, and joints.
Additionally, viral (CMV, herpes simplex virus) and fungal (Pneumocystis jiroveci) infections occur in these patients.

The hypomorphic mutation in NEMO results in an outcome similar to that of the autosomal dominant hypermorphic mutation of IκBα found in one patient [92]. The missense mutation prevents phosphorylation, ubiquitination, and degradation of IκBα, which also impairs NF-κB activation. However, both NEMO and IκBα have a broad immunological impact beyond TLR signaling. Gene expression analyses from these patients are not yet available. IRAK-4 is a kinase that plays a crucial role downstream of individual TLRs and IL-1 receptors and upstream of TNF-receptor-associated factor 6 (TRAF6), which in turn is upstream of NEMO in the signaling cascade that links TLRs to downstream events. IRAK-4 deficiency is an autosomal recessive disorder [85, 86]. Such patients fail to produce TNF-α, IL-6, and IFN-γ in response to IL-1β and IL-18 and, in addition, IL-8, IL-1β, and IL-12 in response to TLR agonists. Despite the broad impairment of inflammation, the clinical phenotype is relatively mild. The most commonly found pathogen in these patients is Streptococcus pneumoniae, followed by S. aureus, which causes cellulitis and furunculosis. In one patient shigellosis was diagnosed, which led to a systemic infection but not to colitis, indicating that IRAK-4 protects from systemic dissemination of the microbe at the cost of intestinal inflammation [86]. IRAK-4 deficiency may point to the importance of TLR signaling pathways in humans, but it does not prove that TLR signaling is essential, because IRAK-4 is also crucial for other signaling pathways such as IL-1 and IL-18 signaling [85, 86]. Microbial stimuli (PAMPs) lead to the secretion of IL-12 and IL-23 by macrophages and dendritic cells. These cytokines stimulate T cells, NK cells, and NKT cells to produce IFN-γ, which binds to the IFN-γR expressed by macrophages and dendritic cells. PAMPs also induce IFN-β.
Both IFN-β and IFN-γ, by binding to IFN-α/βR or IFN-γR, respectively, lead to activation of the transcription factor STAT1 and subsequently to the induction of a number of genes encoding effector functions against pathogens. Patients with inherited deficiency of the interleukin (IL)-12/IL-23/interferon (IFN)-γ pathway show increased susceptibility to invasive infections caused by mycobacteria and Salmonella spp. [93]. Besides these important genes, mutations in chemokine receptors such as CXCR4, in IκBα, which regulates NF-κB signaling, and in elastase 2, which may be involved in myeloid differentiation and whose mutation leads to defective development of neutrophil granulocytes (neutropenia), result in high susceptibility to various bacterial and fungal infections [85].

Genetic predisposition due to genetic polymorphisms is an important determinant of the severity of, or susceptibility to, a large number of disorders including infectious diseases, and of the outcome of therapeutic approaches. Knowledge of genetic polymorphisms may be helpful not only for diagnosis and prognosis, but also for finding crucial targets for therapeutic intervention. Single nucleotide polymorphisms (SNPs) in several cytokines or cytokine receptors have been shown to be linked to disease severity [94]. A polymorphism in the TNF-α gene may be associated with increased mortality from sepsis in ventilated very-low-birth-weight infants [95]. In addition, genetic polymorphisms may also influence antiviral therapy for chronic hepatitis [96]. In these studies it was also shown that polymorphisms in the IL-10 gene and in the interferon-induced MxA promoter, which is often used as a specific surrogate marker for interferon action, influence the response of hepatitis C to therapy with ribavirin and IFN-α2b [97]. As reviewed in Ref. [94], genetic variation in Fcγ receptors, TLR-4, plasminogen activator inhibitor type 1, properdin, or IL-1β, as well as complement deficiencies, may influence disease severity and the susceptibility of individuals to meningococcal infections. Increasing knowledge of SNPs will help us to understand the biology of particular infections much better and may help to improve diagnostic tools so that these diseases can be identified and treated at an early stage in high-risk patients.
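
The ordering of the signaling components described in this section can be made concrete with a small sketch. The Python example below encodes only the relationships stated in the text (TLRs and IL-1 receptors act upstream of IRAK-4, which is upstream of TRAF6, which is upstream of NEMO; degradation of IκBα permits NF-κB activation) as a simple directed graph and asks which receptor inputs lose their connection to NF-κB when one component is missing. The placement of the IL-18 receptor upstream of IRAK-4, the node names, and the function names are illustrative assumptions, and the wiring is a deliberate simplification, not a model of the real network.

```python
from collections import deque

# Simplified, illustrative wiring based only on the relationships stated in
# the text. Placing IL-18R upstream of IRAK-4 is an assumption; real signaling
# networks contain many more components and branches.
CASCADE = {
    "TLR": ["IRAK-4"],
    "IL-1R": ["IRAK-4"],
    "IL-18R": ["IRAK-4"],
    "IRAK-4": ["TRAF6"],
    "TRAF6": ["NEMO"],
    "NEMO": ["IkBa degradation"],
    "IkBa degradation": ["NF-kB activation"],
}

RECEPTORS = ["TLR", "IL-1R", "IL-18R"]
TARGET = "NF-kB activation"


def signal_reaches(start, target, blocked=frozenset()):
    """Breadth-first search: True if target is reachable from start
    without passing through any blocked (deficient) node."""
    if start in blocked:
        return False
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in CASCADE.get(node, []):
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def affected_receptors(deficient_node):
    """List the receptors that can no longer reach NF-kB activation
    when the given node is non-functional."""
    return [r for r in RECEPTORS
            if not signal_reaches(r, TARGET, blocked={deficient_node})]


if __name__ == "__main__":
    for node in ("IRAK-4", "NEMO", "TLR"):
        print(node, "deficiency affects:", affected_receptors(node))
```

In this toy wiring, removing IRAK-4 disconnects the IL-1 and IL-18 receptors as well as the TLRs, which mirrors the caveat above: a phenotype caused by IRAK-4 deficiency cannot be attributed to TLR signaling alone.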

21.7 Concluding Remarks

Gene expression profiling is, despite the problems inherent in such methods, a very powerful tool for studying the global gene expression programs triggered by interactions between pathogens and hosts, and it may generate new hypotheses about the function of genes and single molecules as well as of complex signaling cascades during infection. Although such experimental approaches dramatically accelerate the gain of information in microbiology and infectious diseases, the bottleneck lies in translating gene expression profiles into hypothesis-driven approaches that yield clearly proven functions of single genes and new ideas for diagnosing, treating, and preventing infections.

Acknowledgements

This work was supported by the Bundesministerium für Bildung und Forschung, the program of the Nationales Genomforschungsnetz, and the Deutsche Forschungsgemeinschaft. We thank David Goeppel for generation of the figures and Susanne Berchtold for critical reading of the manuscript.

References

1 Huang, Q., D. Liu, P. Majewski, L. C. Schulte, J. M. Korn, R. A. Young, E. S. Lander, and N. Hacohen. 2001. The plasticity of dendritic cell responses to pathogens and their components. Science 294:870–875.
2 Nau, G. J., J. F. Richmond, A. Schlesinger, E. G. Jennings, E. S. Lander, and R. A. Young. 2002. Human macrophage activation programs induced by bacterial pathogens. Proc. Natl. Acad. Sci. U. S. A. 99:1503–1508.
3 Boldrick, J. C., A. A. Alizadeh, M. Diehn, S. Dudoit, C. L. Liu, C. E. Belcher, D. Botstein, L. M. Staudt, P. O. Brown, and D. A. Relman. 2002. Stereotyped and specific gene expression programs in human innate immune responses to bacteria. Proc. Natl. Acad. Sci. U. S. A. 99:972–977.
4 Chaussabel, D., R. T. Semnani, M. A. McDowell, D. Sacks, A. Sher, and T. B. Nutman. 2003. Unique gene expression profiles of human macrophages and dendritic cells to phylogenetically distinct parasites. Blood 102:672–681.
5 Prucha, M., A. Ruryk, H. Boriss, E. Moller, R. Zazula, I. Herold, R. A. Claus, K. A. Reinhart, P. Deigner, and S. Russwurm. 2004. Expression profiling: toward an application in sepsis diagnostics. Shock 22:29–33.
6 O’Neil, D. A., E. M. Porter, D. Elewaut, G. M. Anderson, L. Eckmann, T. Ganz, and M. F. Kagnoff. 1999. Expression and regulation of the human beta-defensins hBD-1 and hBD-2 in intestinal epithelium. J. Immunol. 163:6718–6724.
7 Kagnoff, M. F., and L. Eckmann. 1997. Epithelial cells as sensors for microbial infection. J. Clin. Invest. 100:6–10.
8 Eckmann, L., M. F. Kagnoff, and J. Fierer. 1995. Intestinal epithelial cells as watchdogs for the natural immune system. Trends Microbiol. 3:118–120.
9 Kagnoff, M. F., and L. Eckmann. 1997. Epithelial cells as sensors for microbial infection. J. Clin. Invest. 100:6–10.
10 Sugiyama, T., and M. Asaka. 2004. Helicobacter pylori infection and gastric cancer. Med. Electron. Microsc. 37:149–157.
11 Blanchard, T. G., M. L. Drakes, and S. J. Czinn. 2004. Helicobacter infection: pathogenesis. Curr. Opin. Gastroenterol. 20:10–15.
12 Hatakeyama, M. 2004. Oncogenic mechanisms of the Helicobacter pylori CagA protein. Nat. Rev. Cancer 4:688–694.
13 Moese, S., M. Selbach, T. Kwok, V. Brinkmann, W. Konig, T. F. Meyer, and S. Backert. 2004. Helicobacter pylori induces AGS cell motility and elongation via independent signaling pathways. Infect. Immun. 72:3646–3649.
14 Cox, J. M., C. L. Clayton, T. Tomita, D. M. Wallace, P. A. Robinson, and J. E. Crabtree. 2001. cDNA array analysis of cag pathogenicity island-associated Helicobacter pylori epithelial cell response genes. Infect. Immun. 69:6970–6980.
15 Keates, S., Y. S. Hitti, M. Upton, and C. P. Kelly. 1997. Helicobacter pylori infection activates NF-kappa B in gastric epithelial cells. Gastroenterology 113:1099–1109.
16 Peek, R. M. Jr., S. F. Moss, K. T. Tham, G. I. Perez-Perez, S. Wang, G. G. Miller, J. C. Atherton, P. R. Holt, and M. J. Blaser. 1997. Helicobacter pylori cagA+ strains and dissociation of gastric epithelial cell proliferation from apoptosis. J. Natl. Cancer Inst. 89:863–868.
17 Schoenfeld, A. R., E. J. Davidowitz, and R. D. Burk. 2000. Elongin BC complex prevents degradation of von Hippel-Lindau tumor suppressor gene products. Proc. Natl. Acad. Sci. U. S. A. 97:8507–8512.
18 Wu, E., P. I. Croucher, and N. McKie. 1997. Expression of members of the novel membrane linked metalloproteinase family ADAM in cells derived from a range of haematological malignancies. Biochem. Biophys. Res. Commun. 235:437–442.
19 Millichip, M. I., D. J. Dallas, E. Wu, S. Dale, and N. McKie. 1998. The metallo-disintegrin ADAM10 (MADM) from bovine kidney has type IV collagenase activity in vitro. Biochem. Biophys. Res. Commun. 245:594–598.
20 Lunn, C. A., X. Fan, B. Dalie, K. Miller, P. J. Zavodny, S. K. Narula, and D. Lundell. 1997. Purification of ADAM 10 from bovine spleen as a TNFalpha convertase. FEBS Lett. 400:333–335.
21 Guillemin, K., N. R. Salama, L. S. Tompkins, and S. Falkow. 2002. Cag pathogenicity island-specific responses of gastric epithelial cells to Helicobacter pylori infection. Proc. Natl. Acad. Sci. U. S. A. 99:15136–15141.
22 Crowe, S. E. 2005. Helicobacter infection, chronic inflammation, and the development of malignancy. Curr. Opin. Gastroenterol. 21:32–38.
23 Boussioutas, A., H. Li, J. Liu, P. Waring, S. Lade, A. J. Holloway, D. Taupin, K. Gorringe, I. Haviv, P. V. Desmond, and D. D. Bowtell. 2003. Distinctive patterns of gene expression in premalignant gastric mucosa and gastric cancer. Cancer Res. 63:2569–2577.
24 Galmiche, A., J. Rassow, A. Doye, S. Cagnol, J. C. Chambard, S. Contamin, V. de Thillot, I. Just, V. Ricci, E. Solcia, E. van Obberghen, and P. Boquet. 2000. The N-terminal 34 kDa fragment of Helicobacter pylori vacuolating cytotoxin targets mitochondria and induces cytochrome c release. EMBO J. 19:6361–6370.
25 Yahiro, K., A. Wada, M. Nakayama, T. Kimura, K. Ogushi, T. Niidome, H. Aoyagi, K. Yoshino, K. Yonezawa, J. Moss, and T. Hirayama. 2003. Protein-tyrosine phosphatase alpha, RPTP alpha, is a Helicobacter pylori VacA receptor. J. Biol. Chem. 278:19183–19189.
26 Kimura, M., S. Goto, A. Wada, K. Yahiro, T. Niidome, T. Hatakeyama, H. Aoyagi, T. Hirayama, and T. Kondo. 1999. Vacuolating cytotoxin purified from Helicobacter pylori causes mitochondrial damage in human gastric cells. Microb. Pathog. 26:45–52.
27 Norsett, K. G., A. Laegreid, H. Midelfart, F. Yadetie, S. E. Erlandsen, S. Falkmer, J. E. Gronbech, H. L. Waldum, J. Komorowski, and A. K. Sandvik. 2004. Gene expression based classification of gastric carcinoma. Cancer Lett. 210:227–237.
28 Cornelis, G. R. 2002. The Yersinia Ysc-Yop ‘type III’ weaponry. Nat. Rev. Mol. Cell Biol. 3:742–752.
29 Sauvonnet, N., B. Pradet-Balade, J. A. Garcia-Sanz, and G. R. Cornelis. 2002. Regulation of mRNA expression in macrophages after Yersinia enterocolitica infection. Role of different Yop effectors. J. Biol. Chem. 277:25133–25142.
30 Hoffmann, R., K. van Erp, K. Trulzsch, and J. Heesemann. 2004. Transcriptional responses of murine macrophages to infection with Yersinia enterocolitica. Cell. Microbiol. 6:377–390.
31 Bohn, E., S. Muller, J. Lauber, R. Geffers, N. Speer, C. Spieth, J. Krejci, B. Manncke, J. Buer, A. Zell, and I. B. Autenrieth. 2004. Gene expression patterns of epithelial cells modulated by pathogenicity factors of Yersinia enterocolitica. Cell. Microbiol. 6:129–141.
32 Pepe, J. C., and V. L. Miller. 1993. Yersinia enterocolitica invasin: a primary role in the initiation of infection. Proc. Natl. Acad. Sci. U. S. A. 90:6473–6477.
33 Pepe, J. C., and V. L. Miller. 1993. The biological role of invasin during a Yersinia enterocolitica infection. Infect. Agents Dis. 2:236–241.
34 Schulte, R., S. Kerneis, S. Klinke, H. Bartels, S. Preger, J. P. Kraehenbuhl, E. Pringault, and I. B. Autenrieth. 2000. Translocation of Yersinia enterocolitica across reconstituted intestinal epithelial monolayers is triggered by Yersinia invasin binding to beta1 integrins apically expressed on M-like cells. Cell. Microbiol. 2:173–185.
35 Isberg, R. R., and J. M. Leong. 1990. Multiple beta 1 chain integrins are receptors for invasin, a protein that promotes bacterial penetration into mammalian cells. Cell 60:861–871.
36 Kampik, D., R. Schulte, and I. B. Autenrieth. 2000. Yersinia enterocolitica invasin protein triggers differential production of interleukin-1, interleukin-8, monocyte chemoattractant protein 1, granulocyte-macrophage colony-stimulating factor, and tumor necrosis factor alpha in epithelial cells: implications for understanding the early cytokine network in Yersinia infections. Infect. Immun. 68:2484–2492.
37 Schulte, R., G. A. Grassl, S. Preger, S. Fessele, C. A. Jacobi, M. Schaller, P. J. Nelson, and I. B. Autenrieth. 2000. Yersinia enterocolitica invasin protein triggers IL-8 production in epithelial cells via activation of Rel p65-p65 homodimers. FASEB J. 14:1471–1484.
38 Grassl, G. A., M. Kracht, A. Wiedemann, E. Hoffmann, M. Aepfelbacher, C. von Eichel-Streiber, E. Bohn, and I. B. Autenrieth. 2003. Activation of NF-κB and IL-8 by Yersinia enterocolitica invasin protein is conferred by engagement of Rac1 and MAP kinase cascades. Cell. Microbiol. 5:957–971.
39 Orth, K. 2002. Function of the Yersinia effector YopJ. Curr. Opin. Microbiol. 5:38–43.
40 Ruckdeschel, K., O. Mannel, K. Richter, C. A. Jacobi, K. Trulzsch, B. Rouot, and J. Heesemann. 2001. Yersinia outer protein P of Yersinia enterocolitica simultaneously blocks the nuclear factor-kappa B pathway and exploits lipopolysaccharide signaling to trigger apoptosis in macrophages. J. Immunol. 166:1823–1831.
41 Ayroldi, E., G. Migliorati, S. Bruscoli, C. Marchetti, O. Zollo, L. Cannarile, F. D’Adamio, and C. Riccardi. 2001. Modulation of T-cell activation by the glucocorticoid-induced leucine zipper factor via inhibition of nuclear factor kappaB. Blood 98:743–753.
42 Banerjee, S. S., M. W. Feinberg, M. Watanabe, S. Gray, R. L. Haspel, D. J. Denkinger, R. Kawahara, H. Hauner, and M. K. Jain. 2003. The Kruppel-like factor KLF2 inhibits peroxisome proliferator-activated receptor-gamma expression and adipogenesis. J. Biol. Chem. 278:2581–2584.
43 Buckley, A. F., C. T. Kuo, and J. M. Leiden. 2001. Transcription factor LKLF is sufficient to program T cell quiescence via a c-Myc-dependent pathway. Nat. Immunol. 2:698–704.
44 Engel, M. E., P. K. Datta, and H. L. Moses. 1998. RhoB is stabilized by transforming growth factor beta and antagonizes transcriptional activation. J. Biol. Chem. 273:9921–9926.
45 Fritz, G., and B. Kaina. 2001. Ras-related GTPase RhoB represses NF-kappaB signaling. J. Biol. Chem. 276:3115–3122.
46 Mittelstadt, P. R., and J. D. Ashwell. 2001. Inhibition of AP-1 by the glucocorticoid-inducible protein GILZ. J. Biol. Chem. 276:29603–29610.
47 Juris, S. J., F. Shao, and J. E. Dixon. 2002. Yersinia effectors target mammalian signalling pathways. Cell. Microbiol. 4:201–211.
48 Chen, C. C., F. E. Mo, and L. F. Lau. 2001. The angiogenic factor Cyr61 activates a genetic program for wound healing in human skin fibroblasts. J. Biol. Chem. 276:47329–47337.
49 Sadikot, R. T., T. S. Blackwell, J. W. Christman, and A. S. Prince. 2005. Pathogen-host interactions in Pseudomonas aeruginosa pneumonia: the state of the art. Am. J. Respir. Crit. Care Med. 164:873–878.
50 Eberl, L., and B. Tummler. 2004. Pseudomonas aeruginosa and Burkholderia cepacia in cystic fibrosis: genome evolution, interactions and adaptation. Int. J. Med. Microbiol. 294:123–131.
51 Ichikawa, J. K., A. Norris, M. G. Bangera, G. K. Geiss, A. B. ’t Wout, R. E. Bumgarner, and S. Lory. 2000. Interaction of Pseudomonas aeruginosa with epithelial cells: identification of differentially regulated genes by expression microarray analysis of human cDNAs. Proc. Natl. Acad. Sci. U. S. A. 97:9659–9664.
52 Cobb, L. M., J. C. Mychaleckyj, D. J. Wozniak, and Y. S. Lopez-Boado. 2004. Pseudomonas aeruginosa flagellin and alginate elicit very distinct gene expression patterns in airway epithelial cells: implications for cystic fibrosis disease. J. Immunol. 173:5659–5670.
53 Flo, T. H., L. Ryan, E. Latz, O. Takeuchi, B. G. Monks, E. Lien, O. Halaas, S. Akira, G. Skjak-Braek, D. T. Golenbock, and T. Espevik. 2002. Involvement of toll-like receptor (TLR) 2 and TLR4 in cell activation by mannuronic acid polymers. J. Biol. Chem. 277:35489–35495.
54 Muir, A., G. Soong, S. Sokol, B. Reddy, M. I. Gomez, A. van Heeckeren, and A. Prince. 2004. Toll-like receptors in normal and cystic fibrosis airway epithelial cells. Am. J. Respir. Cell Mol. Biol. 30:777–783.
55 McMorran, B., L. Town, E. Costelloe, J. Palmer, J. Engel, D. Hume, and B. Wainwright. 2003. Effector ExoU from the type III secretion system is an important modulator of gene expression in lung epithelial cells in response to Pseudomonas aeruginosa infection. Infect. Immun. 71:6035–6044.
56 Kelly-Wintenberg, K., and T. C. Montie. 1989. Cloning and expression of Pseudomonas aeruginosa flagellin in Escherichia coli. J. Bacteriol. 171:6357–6362.
57 Finck-Barbancon, V., and D. W. Frank. 2001. Multiple domains are required for the toxic activity of Pseudomonas aeruginosa ExoU. J. Bacteriol. 183:4330–4344.
58 Finck-Barbancon, V., J. Goranson, L. Zhu, T. Sawa, J. P. Wiener-Kronish, S. M. Fleiszig, C. Wu, L. Mende-Mueller, and D. W. Frank. 1997. ExoU expression by Pseudomonas aeruginosa correlates with acute cytotoxicity and epithelial injury. Mol. Microbiol. 25:547–557.
59 Hauser, A. R., P. J. Kang, and J. N. Engel. 1998. PepA, a secreted protein of Pseudomonas aeruginosa, is necessary for cytotoxicity and virulence. Mol. Microbiol. 27:807–818.
60 Garrity-Ryan, L., B. Kazmierczak, R. Kowal, J. Comolli, A. Hauser, and J. N. Engel. 2000. The arginine finger domain of ExoT contributes to actin cytoskeleton disruption and inhibition of internalization of Pseudomonas aeruginosa by epithelial cells and macrophages. Infect. Immun. 68:7100–7113.
61 Krall, R., G. Schmidt, K. Aktories, and J. T. Barbieri. 2000. Pseudomonas aeruginosa ExoT is a Rho GTPase-activating protein. Infect. Immun. 68:6066–6068.
62 Olenyuk, B. Z., G. J. Zhang, J. M. Klco, N. G. Nickols, W. G. Kaelin Jr., and P. B. Dervan. 2004. Inhibition of vascular endothelial growth factor with a sequence-specific hypoxia response element antagonist. Proc. Natl. Acad. Sci. U. S. A. 101:16768–16773.
63 Bacon, A. L., and A. L. Harris. 2004. Hypoxia-inducible factors and hypoxic cell death in tumour physiology. Ann. Med. 36:530–539.
64 Leonard, M. O., D. C. Cottell, C. Godson, H. R. Brady, and C. T. Taylor. 2003. The role of HIF-1 alpha in transcriptional regulation of the proximal tubular epithelial cell response to hypoxia. J. Biol. Chem. 278:40296–40304.
65 Ang, S. O., H. Chen, K. Hirota, V. R. Gordeuk, J. Jelinek, Y. Guan, E. Liu, A. J. Sergueeva, G. Y. Miasnikova, D. Mole, P. H. Maxwell, D. W. Stockton, G. L. Semenza, and J. T. Prchal. 2002. Disruption of oxygen homeostasis underlies congenital Chuvash polycythemia. Nat. Genet. 32:614–621.
66 Lai, W. S., E. Carballo, J. R. Strum, E. A. Kennington, R. S. Phillips, and P. J. Blackshear. 1999. Evidence that tristetraprolin binds to AU-rich elements and promotes the deadenylation and destabilization of tumor necrosis factor alpha mRNA. Mol. Cell. Biol. 19:4311–4323.
67 Keyse, S. M. 1995. An emerging family of dual specificity MAP kinase phosphatases. Biochim. Biophys. Acta 1265:152–160.
68 Williams, T. M., D. Moolten, J. Burlein, J. Romano, R. Bhaerman, A. Godillot, M. Mellon, F. J. Rauscher III, and J. A. Kant. 1991. Identification of a zinc finger protein that inhibits IL-2 gene expression. Science 254:1791–1794.
69 Yasui, D. H., T. Genetta, T. Kadesch, T. M. Williams, S. L. Swain, L. V. Tsui, and B. T. Huber. 1998. Transcriptional repression of the IL-2 gene in Th cells by ZEB. J. Immunol. 160:4433–4440.
70 DiMango, E., A. J. Ratner, R. Bryan, S. Tabibi, and A. Prince. 1998. Activation of NF-kappaB by adherent Pseudomonas aeruginosa in normal and cystic fibrosis respiratory epithelial cells. J. Clin. Invest. 101:2598–2605.
71 Virella-Lowell, I., J. D. Herlihy, B. Liu, C. Lopez, P. Cruz, C. Muller, H. V. Baker, and T. R. Flotte. 2004. Effects of CFTR, interleukin-10, and Pseudomonas aeruginosa on gene expression profiles in a CF bronchial epithelial cell line. Mol. Ther. 10:562–573.
72 Chomel, B. B., R. W. Kasten, J. E. Sykes, H. J. Boulouis, and E. B. Breitschwerdt. 2003. Clinical impact of persistent Bartonella bacteremia in humans and animals. Ann. N. Y. Acad. Sci. 990:267–278.
73 Andersson, S. G., and V. A. Kempf. 2004. Host cell modulation by human, animal and plant pathogens. Int. J. Med. Microbiol. 293:463–470.
74 Kempf, V. A., M. Lebiedziejewski, K. Alitalo, J. H. Wälzlein, U. Ehehalt, J. Fiebig, B. Schütt, C. Sander, S. Müller, G. Grassl, B. Brem, and I. B. Autenrieth. 2005. Activation of hypoxia-inducible factor-1 in bacillary angiomatosis: evidence for a role of HIF-1 in bacterial infections. Circulation 111:1054–1062.
75 Semenza, G. L. 2001. HIF-1, O(2), and the 3 PHDs: how animal cells signal hypoxia to the nucleus. Cell 107:1–3.
76 Pugh, C. W., and P. J. Ratcliffe. 2003. Regulation of angiogenesis by hypoxia: role of the HIF system. Nat. Med. 9:677–684.
77 Kempf, V. A., M. Schaller, S. Behrendt, B. Volkmann, M. Aepfelbacher, I. Cakman, and I. B. Autenrieth. 2000. Interaction of Bartonella henselae with endothelial cells results in rapid bacterial rRNA synthesis and replication. Cell. Microbiol. 2:431–441.
78 Kempf, V. A., N. Hitziger, T. Riess, and I. B. Autenrieth. 2002. A two-step pathogenicity strategy of bacteria promotes tumour formation: a common strategy shared by plant and human pathogens? Trends Microbiol. 10:269–275.
79 Moreilhon, C., D. Gras, C. Hologne, O. Bajolet, F. Cottrez, V. Magnone, M. Merten, H. Groux, E. Puchelle, and P. Barbry. 2005. Live Staphylococcus aureus and bacterial soluble factors induce different transcriptional responses in human airway cells. Physiol. Genomics 20:244–255.
80 Pahl, H. L. 1999. Activators and target genes of Rel/NF-kappaB transcription factors. Oncogene 18:6853–6866.
81 Hayden, M. S., and S. Ghosh. 2004. Signaling to NF-kappaB. Genes Dev. 18:2195–2224.
82 Takeda, K., and S. Akira. 2005. Toll-like receptors in innate immunity. Int. Immunol. 17:1–14.
83 Schmid, Y., G. A. Grassl, O. T. Buhler, M. Skurnik, I. B. Autenrieth, and E. Bohn. 2004. Yersinia enterocolitica adhesin A induces production of interleukin-8 in epithelial cells. Infect. Immun. 72:6780–6789.
84 Murillo, L. S., S. A. Morre, and A. S. Pena. 2003. Toll-like receptors and NOD/CARD proteins: pattern recognition receptors are key elements in the regulation of immune response. Drugs Today (Barc.) 39:415–438.
85 Lawrence, T., A. Puel, J. Reichenbach, C. L. Ku, A. Chapgier, E. Renner, V. Minard-Colin, M. Ouachee, and J. L. Casanova. 2005. Autosomal-dominant primary immunodeficiencies. Curr. Opin. Hematol. 12:22–30.
86 Ku, C. L., K. Yang, J. Bustamante, A. Puel, H. von Bernuth, O. F. Santos, T. Lawrence, H. H. Chang, H. Al Mousa, C. Picard, and J. L. Casanova. 2005. Inherited disorders of human Toll-like receptor signaling: immunological implications. Immunol. Rev. 203:10–20.
87 Niehues, T., J. Reichenbach, J. Neubert, S. Gudowius, A. Puel, G. Horneff, E. Lainka, U. Dirksen, H. Schroten, R. Doffinger, J. L. Casanova, and V. Wahn. 2004. Nuclear factor kappaB essential modulator-deficient child with immunodeficiency yet without anhidrotic ectodermal dysplasia. J. Allergy Clin. Immunol. 114:1456–1462.
88 Doffinger, R., A. Smahi, C. Bessia, F. Geissmann, J. Feinberg, A. Durandy, C. Bodemer, S. Kenwrick, S. Dupuis-Girod, S. Blanche, P. Wood, S. H. Rabia, D. J. Headon, P. A. Overbeek, F. Le Deist, S. M. Holland, K. Belani, D. S. Kumararatne, A. Fischer, R. Shapiro, M. E. Conley, E. Reimund, H. Kalhoff, M. Abinun, A. Munnich, A. Israel, G. Courtois, and J. L. Casanova. 2001. X-linked anhidrotic ectodermal dysplasia with immunodeficiency is caused by impaired NF-kappaB signaling. Nat. Genet. 27:277–285.
89 Zonana, J., M. E. Elder, L. C. Schneider, S. J. Orlow, C. Moss, M. Golabi, S. K. Shapira, P. A. Farndon, D. W. Wara, S. A. Emmal, and B. M. Ferguson. 2000. A novel X-linked disorder of immune deficiency and hypohidrotic ectodermal dysplasia is allelic to incontinentia pigmenti and due to mutations in IKK-gamma (NEMO). Am. J. Hum. Genet. 67:1555–1562.
90 Orange, J. S., A. Jain, Z. K. Ballas, L. C. Schneider, R. S. Geha, and F. A. Bonilla. 2004. The presentation and natural history of immunodeficiency caused by nuclear factor kappaB essential modulator mutation. J. Allergy Clin. Immunol. 113:725–733.
91 Orange, J. S., S. R. Brodeur, A. Jain, F. A. Bonilla, L. C. Schneider, R. Kretschmer, S. Nurko, W. L. Rasmussen, J. R. Kohler, S. E. Gellis, B. M. Ferguson, J. L. Strominger, J. Zonana, N. Ramesh, Z. K. Ballas, and R. S. Geha. 2002. Deficient natural killer cell cytotoxicity in patients with IKK-gamma/NEMO mutations. J. Clin. Invest. 109:1501–1509.
92 Courtois, G., A. Smahi, J. Reichenbach, R. Doffinger, C. Cancrini, M. Bonnet, A. Puel, C. Chable-Bessia, S. Yamaoka, J. Feinberg, S. Dupuis-Girod, C. Bodemer, S. Livadiotti, F. Novelli, P. Rossi, A. Fischer, A. Israel, A. Munnich, F. Le Deist, and J. L. Casanova. 2003. A hypermorphic IkappaBalpha mutation is associated with autosomal dominant anhidrotic ectodermal dysplasia and T cell immunodeficiency. J. Clin. Invest. 112:1108–1115.
93 MacLennan, C., C. Fieschi, D. A. Lammas, C. Picard, S. E. Dorman, O. Sanal, J. M. MacLennan, S. M. Holland, T. H. Ottenhoff, J. L. Casanova, and D. S. Kumararatne. 2004. Interleukin (IL)-12 and IL-23 are key cytokines for immunity against Salmonella in humans. J. Infect. Dis. 190:1755–1777.
94 Emonts, M., J. A. Hazelzet, R. de Groot, and P. W. Hermans. 2003. Host genetic determinants of Neisseria meningitidis infections. Lancet Infect. Dis. 3:565–577.
95 Hedberg, C. L., K. Adcock, J. Martin, J. Loggins, T. E. Kruger, and R. J. Baier. 2004. Tumor necrosis factor alpha -308 polymorphism associated with increased sepsis mortality in ventilated very low birth weight infants. Pediatr. Infect. Dis. J. 23:424–428.
96 Jiao, J., and J. B. Wang. 2005. Hepatitis C virus genotypes, HLA-DRB alleles and their response to interferon-alpha and ribavirin in patients with chronic hepatitis C. Hepatobiliary Pancreat. Dis. Int. 4:80–83.
97 Vidigal, P. G., J. J. Germer, and N. N. Zein. 2002. Polymorphisms in the interleukin-10, tumor necrosis factor-alpha, and transforming growth factor-beta1 genes in chronic hepatitis C patients treated with interferon and ribavirin. J. Hepatol. 36:271–277.

22 Pathogenomics: Application and New Diagnostic Tools

Sören Schubert and Jürgen Heesemann

22.1 Introduction: “In Our Hands”

During the last 20 years the growing field of genome research has provided pivotal insights into the molecular mechanisms of bacterial infections and the evolution of bacterial pathogenicity. Specific virulence traits have been characterized in different bacterial species at the genome level. A majority of the underlying genes have been shown to be encoded on genetic elements that are expected to be horizontally transferred, e.g., bacteriophages, plasmids, and genomic islands [1, 2], leading to a mosaic genome structure in bacterial pathogens. By defining the blueprint of pathogenicity at the genomic scale, the former distinction between chromosomal (genome) and episomal (plasmid) factors has been expanded into the concepts of a “core genome” and a “flexible gene pool.” The former encompasses the genomic backbone that determines the metabolic functions of the bacteria. The latter encodes the pathogenic and fitness functions that enable the bacteria to exert their virulence properties in contact with host cells, finally leading to cell damage and disease. Before the genomic era only a few virulence genes were known in pathogenic species, which gave a merely fragmentary picture of the interaction of bacteria with the host. The determination of whole-genome sequences of pathogenic bacteria and the comparison of genome sequences from pathogenic and closely related nonpathogenic species revealed a much more detailed picture of bacterial virulence and in some cases invalidated previous concepts. On the other hand, the enormous amount of data collected in genome research often clouds the issue. Even in bacterial species such as E. coli, which has been investigated for several decades, pathogenomic studies provide novel insights [3, 4], revealing a rather complex pattern of virulence factors with a genomic patchwork of chromosomal virulence determinants in the case of extraintestinal pathogenic E. coli (ExPEC) [5, 6].
The same is true for the genomic structure of antibiotic resistance determinants, which are distributed among bacteria in a comparable way by very similar mechanisms of horizontal spread [7, 8]. The emergence of antibiotic resistance and multidrug resistance in bacterial pathogens underscores the need for the development of novel classes of antibiotics. The availability of complete genome sequence data from many important human pathogens has already provided a wealth of fundamental information from which to identify potential molecular targets for drug discovery. Determining the presence or absence of certain pathogenicity islands, or of genomic islands harboring multiple antibiotic resistance genes, in bacterial isolates may further aid in identifying the cause of a disease, estimating the pathogenic potential of the isolate, and predicting its antibiotic resistance. Thus, pathogenomic research has contributed to microbial diagnostics, pathotyping of bacteria, and the detection of novel target structures for the therapy and prevention of microbial infections.
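
The distinction between a core genome and a flexible gene pool, and the idea of checking isolates for the presence or absence of genomic islands, can be illustrated with simple set operations. The following Python sketch assumes that the gene content of each isolate is already available as a set of gene identifiers; all isolate names, gene names, and island definitions are hypothetical placeholders invented for this illustration, and real analyses would rely on sequence comparison and not on exact name matching.

```python
# Hypothetical gene content per isolate; every name here is a placeholder.
ISOLATES = {
    "isolate_A": {"core1", "core2", "core3", "islandX_1", "islandX_2"},
    "isolate_B": {"core1", "core2", "core3", "resY_1"},
    "isolate_C": {"core1", "core2", "core3"},
}

# Hypothetical marker genes that together define each genomic island.
ISLAND_MARKERS = {
    "pathogenicity island X": {"islandX_1", "islandX_2"},
    "resistance island Y": {"resY_1"},
}


def split_gene_pool(isolates):
    """Return (core, flexible): genes shared by all isolates, and genes
    present in at least one isolate but not in all of them."""
    gene_sets = list(isolates.values())
    core = set.intersection(*gene_sets)
    flexible = set.union(*gene_sets) - core
    return core, flexible


def islands_present(genes, markers=ISLAND_MARKERS):
    """Names of the islands whose marker genes are all found in `genes`."""
    return sorted(name for name, required in markers.items()
                  if required <= genes)


if __name__ == "__main__":
    core, flexible = split_gene_pool(ISOLATES)
    print("core genome:", sorted(core))
    print("flexible gene pool:", sorted(flexible))
    for name, genes in ISOLATES.items():
        print(name, "carries:", islands_present(genes))
```

Requiring all marker genes of an island is one possible design choice; a more tolerant rule (for example, a minimum fraction of markers) may be preferable when islands are known to be partially deleted in some isolates.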

22.2 Microbiological Diagnostics of Bacterial Pathogens: Aims, Tasks, and Current Limitations

The microbiological diagnostics of bacteria has historically focused on the detection and identification of bacteria by providing a differentiation at the species level. In the case of medical microbiology a further emphasis is on determination of the antibiotic resistance pattern revealed by the isolate under investigation. Three important fields of microbiological bacterial diagnostics are: (1) medical and food microbiology, (2) environmental microbiology, and (3) the detection of biological warfare agents. These fields differ considerably with regard to the diversity of bacteria to be detected and the time scale for diagnosis [9]. Methods for the detection and identification of microorganisms applied in medical microbiology and food technology are directed towards the reliable detection and/or identification at the species/subspecies/strain level of one or a few microbes out of many that may be present in the diagnostic sample [10, 11]. The further major aspect of clinical microbiology is characterization of the antibiotic resistance pattern revealed by an isolated bacterium. Microbial diagnostic methods in environmental and industrial microbiology, on the other hand, are applied to obtain a picture of the structure of the entire microbial community under analysis [12, 13]. Requirements for this class of diagnostic methods are the parallel detection of many microbes at the level of the species, genus, or even higher taxon, and the potential for some level of quantification. The detection of bioweapons is characterized by the need for fast and unambiguous detection of a circumscriptive panel of agents and rapid genetic correlation of their virulence traits [14, 15]. A detailed analysis of the molecular fine structure of the agent in question may give further clues to its source and origin. In medical microbiology the workflow and interpretation of diagnostic results differ depending on the source of the sample. 
In samples drawn from primarily sterile body sites, e.g., blood or cerebrospinal fluid, the presence of any bacteria is meaningful so long as contamination during harvesting of the sample has been excluded. Fast and reliable differentiation of the bacteria together with a rapid diagnostic process is crucial in these often life-threatening infections. A different
diagnostic scenario is faced when the samples, e.g., sputum or stool samples, are obtained from primarily nonsterile body sites of the skin or mucosa. Here, the diagnostic procedure is directed towards identification of a bacterial pathogen among the varied multitude of nonpathogenic bacteria comprising the normal flora. This task is hampered by the facts that (a) “nonpathogenic” bacteria of the normal flora may cause disease in certain circumstances, e.g., in immunocompromised patients, (b) bacterial pathogens in diagnostic samples are often a minor subset compared to the bacteria of the concomitant normal flora, and (c) facultative pathogenic bacteria, e.g., E. coli, may cause disease depending on their genomic virulence gene repertoire. These virulence traits, however, often remain undetected in a laboratory routine that uses primarily biochemical and serological tests for the identification and characterization of the cultured bacteria. A further drawback of the traditional microbiological diagnostic procedure relates to the detection of fastidious bacteria and of those microorganisms which despite all efforts cannot be cultured. The isolation of others requires laborious procedures that cannot be routinely performed in small laboratories, and, more importantly, the recovery of a large number of organisms commonly involved in human and veterinary infections cannot be achieved in a short time. Consequently, information is not provided rapidly enough to affect initial treatment decisions. In some instances, culture and subsequent identification methods are limited in sensitivity, specificity, or both. During the past 10 years many efforts have focused on the development of new microbiological methods to improve the sensitivity of conventional culture, shorten detection times, and detect nonculturable, fastidious or slow-growing microorganisms.

22.3 The Pregenomic Era: Conventional and Molecular Methods in Microbiological Diagnostics

22.3.1 Conventional Culture-Based Methods in Microbiological Diagnostics

The traditional microbiological diagnostic assays include microscopy and microbial culture techniques as well as the detection of antigens or toxins. The culture-based procedures are complemented by serological methods using immunoassays for the detection of pathogen-specific antibodies in blood or cerebrospinal fluid. All these methods can be limited by poor sensitivity, nongrowing, slow-growing, or poorly viable organisms, narrow detection windows, complex interpretation, immunosuppression of the patient, antimicrobial therapy, high background levels, and nonspecific cross-reactivity. Nonetheless, microbial cultures produce valuable epidemiological data, revealing new, uncharacterized, or atypical microbes and yielding intact or infectious organisms for further study. For this reason, the role of traditional assays continues to be an important one [16, 17]. This is particularly true of antimicrobial susceptibility testing of bacterial pathogens, which is one of the primary functions of a diagnostic microbiology laboratory. Individual results have important therapeutic implications for the patient. The detection of resistant bacteria further provides a fundamental basis for infection control measures and antimicrobial surveillance systems. Antibiotic resistance is routinely determined in diagnostic laboratories using culture-based methods, e.g., disc diffusion assay or broth dilution. These traditional methods, however, are largely confined to rapidly growing bacteria and depend on the expression of antibiotic resistance genes under laboratory culture conditions.

22.3.2 Molecular Microbiological Diagnostic Methods

Medical microbiologists have increasingly used molecular methods over the past 20 years to improve the sensitivity and speed of diagnosis of infectious diseases. Various methods and techniques have since evolved in the molecular diagnostics of microorganisms, the two most important of which are the polymerase chain reaction and the more recent microarray technology. Although the routine use of these techniques is as yet often limited to the detection of pathogens that are difficult to culture in vitro, “real-time” methods, commercial kits, quantification, and automation will increase their potential applications. Molecular methods are now widely used for epidemiological fingerprinting of isolates of public health importance.

22.3.2.1 Typing of Bacterial Isolates Using 16S-rRNA

A major advance in the characterization of bacteria in diagnostic samples has been the determination and analysis of the 16S-/18S-rRNA sequences of the respective isolates. This technique allows the identification of different prokaryotes at the species or genus level [18, 19]. The 16S-rRNA gene has been sequenced from over 70 000 bacterial and archaeal strains, while its eukaryotic equivalent, the 18S-rRNA gene, is known from about 20 000 eukaryotic microbes. Commercially available differentiation kits for bacterial pathogens are based on the specific detection of rRNA. These methods rely on DNA hybridization using rRNA or rDNA as a template for specific PCR amplifications or other nucleic acid amplification techniques [18, 20–24].
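As a rough illustration of sequence-based typing, the following sketch assigns a query 16S-rRNA gene fragment to the most similar entry of a small reference set by percent identity. It assumes that all sequences have already been aligned to equal length; the function names, the toy sequences, and the reference labels are hypothetical, and a real workflow would rely on curated rRNA databases and dedicated alignment software.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns in which neither sequence has a gap character.
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def best_match(query: str, references: dict) -> tuple:
    """Return (reference name, percent identity) of the closest reference."""
    scored = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(scored, key=lambda pair: pair[1])


# Toy data (hypothetical, far shorter than a real 16S-rRNA gene).
references = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTTCGTAA",
}
query = "ACGTACGTAC"

name, identity = best_match(query, references)
print(f"closest reference: {name} ({identity:.1f}% identity)")
```

Whether a given identity value justifies an assignment at the species or only at the genus level depends on the taxon and on the cutoff chosen by the laboratory, so no threshold is built into the sketch.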

22.3.2.2 Fluorescence In Situ Hybridization

Within the last 10 years, fluorescence in situ hybridization (FISH) has become established in microbiological diagnostics, using labeled DNA probes directed against the bacterial 16S-rRNA [25, 26]. The ongoing focus of microbiological diagnostics on molecular genetics and pathogenomics has helped to introduce FISH technology to both clinical laboratories and research facilities. FISH provides information about the presence, number, morphology, spatial distribution, and species of microorganisms. To date, FISH has yielded important insights into the structure of complex microbiological communities in humans, animals, and the environment. Furthermore, rRNA-based antibiotic resistance mechanisms are detectable by means of specific FISH probes [27]. FISH has been successfully used to identify pathogens in smears or tissue preparations from humans and animals; however, the number of target organisms has to be above the detection level for microscopic techniques [28, 29]. FISH using peptide nucleic acid probes (PNA-FISH) is a novel diagnostic technique combining FISH with the unique performance of PNA probes to provide rapid and accurate diagnosis of infectious diseases. Both FISH and PNA-FISH are well suited to routine application and enable clinical microbiology laboratories to report information important for patient treatment within a time frame that is unreachable using classic biochemical methods [30]. However, the FISH technique remains quite laborious, is restricted by the sensitivity of light microscopy, and lends itself only to a limited extent to laboratory automation. A further drawback of the FISH method is the limited number of probes which can be applied in one hybridization experiment. This limitation becomes a distinct bottleneck when the technique is used in clinical diagnostics. In addition, there is a need for multiple probes to check for false-positive and false-negative results caused by individual probes used for identification of the selected target organism. Thus, FISH is not yet appropriate for high-throughput applications or the simultaneous detection of a multitude of bacteria.

22.3.2.3 PCR Methods for Microbial Diagnostics

In vitro amplification of a pathogen-specific nucleic acid sequence by polymerase chain reaction (PCR) allows rapid diagnosis with a high degree of sensitivity and specificity. Over the past 10–15 years, there has been increasing use of nucleic acid amplification tests in the routine diagnosis of infectious disease [31]. PCR technology is applicable to virtually any bacterial pathogen and is commonly used along with pulsed-field gel electrophoresis (PFGE) in molecular epidemiology of bacterial pathogens [32]. The ability to include virtually all known sequence targets, the enormous sensitivity, and the application of the technique in various diagnostic settings have contributed to the widespread acceptance of PCR as the standard method for detecting nucleic acids from a number of sample and microbial types [33]. Conventional PCR is an important tool for pathogen detection, but it has not been possible to accurately identify PCR products without sequencing, digestion by restriction enzymes, or Southern blotting. The more recent development of real-time PCR applications has gained wider acceptance because it is more rapid, sensitive, and reproducible, while the risk of carryover contamination is minimized [34, 35]. An increasing number of chemistries are used to detect PCR products as they accumulate within a closed reaction vessel during real-time PCR, including the nonspecific DNA-binding fluorophores and the specific fluorophore-labeled oligonucleotide probes [36].

In addition to molecular typing methods, growing knowledge of different pathogenic traits and genetically distinct pathotypes offers the opportunity to unravel the pathogenic fine structure of bacterial isolates obtained from diagnostic samples. Virulence genes or gene clusters have been characterized in numerous pathogenic bacteria, leading to a complex picture of virulence traits in several bacterial species [37].

Besides the identification and classification of bacterial pathogens, the detection of specific antibiotic resistance in clinical isolates has always been the second major task of microbiological diagnostics. In recent years the molecular mechanisms of antibiotic resistance have been thoroughly characterized, providing the basis for molecular tools to detect a multitude of antibiotic resistance mechanisms at the level of chromosomal or episomal DNA [34, 38, 39]. The introduction of real-time PCR methods offers a cost-effective, user-friendly format for genetic methods that fuels their use for the detection and characterization of antimicrobial resistance determinants in routine diagnostic microbiology. Of particular interest is the implementation of these assays to detect resistance in clinically important slow-growing organisms (e.g., Mycobacterium tuberculosis), to rapidly identify clinically important resistance mechanisms, and to overcome laborious and time-consuming culture techniques in the control and surveillance of methicillin-resistant Staphylococcus aureus (MRSA) and glycopeptide-resistant enterococci (GRE) carriage. For reference laboratories it is important to have a broad repertoire of genetic assays to confirm defined resistance determinants, to sort out ambiguous phenotypic results, and to provide a reliable scientific basis for molecular surveillance of antimicrobial-resistant bacteria and resistance determinants in a global network. Thus, molecular testing methods based on DNA/RNA amplification have the potential to replace many conventional microbiology laboratory assays.
Although the molecular techniques in microbial diagnostics may be unlikely to replace culture for normally sterile specimens, e.g., blood or cerebrospinal fluid, they are particularly useful for identifying specific pathogens against the background of a complex mixed microflora. Quality criteria include appropriate specimens for analysis, performance characteristics of different analytical methods, optimal specimen processing, the effects of PCR inhibitors, and false-positive results caused by contaminating nucleic acids. Recent refinements in PCR technology have resulted in more user-friendly testing platforms. These platforms are automated, carry a lower risk of contamination, cost less, and are faster than earlier platforms. Additionally, the extreme sensitivity of these techniques coupled with the potential presence of small numbers of pathogenic organisms in asymptomatic individuals should be considered carefully. Moreover, as DNA-based molecular detection techniques do not generally have the ability to determine whether an organism is dead or alive, these techniques have limited benefit in monitoring therapy, e.g., in infections due to M. tuberculosis. Quality control guidelines for molecular microbiological diagnostic assays are in their infancy and require further development. Finally, a major drawback of today's PCR-based methods in microbiological diagnostics is that parallel testing in a single step is usually restricted to a limited number of genes, making these methods inappropriate for the simultaneous scanning of a large number of diagnostic targets.
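Real-time PCR, as described above, follows the accumulation of product through a fluorescence signal read once per cycle. A common way to summarize such a curve is the fractional cycle at which the signal first crosses a fixed threshold. The sketch below illustrates this idea with linear interpolation between two readings; the function name, the threshold value, and the toy readings are assumptions made for illustration and do not reproduce the algorithm of any particular instrument.

```python
def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which the signal first crosses the threshold.

    fluorescence[0] is the reading after cycle 1. Returns None if the
    threshold is never crossed (e.g., a negative sample).
    """
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # prev belongs to cycle i, curr to cycle i + 1;
            # interpolate linearly between the two readings.
            return i + (threshold - prev) / (curr - prev)
    return None


# Toy amplification curve (hypothetical, arbitrary fluorescence units).
readings = [0.1, 0.1, 0.2, 0.4, 0.8, 1.6]
print(threshold_cycle(readings, threshold=0.6))
print(threshold_cycle(readings, threshold=5.0))
```

A lower crossing cycle indicates more starting template, which is the basis of the quantification mentioned earlier for "real-time" methods.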

22.4 The Postgenomic Era: Use of DNA Microarrays in the Diagnosis of Infectious Diseases in Humans and Animals

22.4.1 DNA Arrays: Platforms, Techniques and Targets

DNA array technology is based on the well-established and long-exploited principle of nucleic acid hybridization. However, it now offers for the first time the possibility of conducting tens or hundreds of thousands of hybridizations simultaneously. Structurally different platforms are currently used for DNA arrays. Macroarrays consisting of dot blots on nitrocellulose or nylon membranes have the disadvantage of moderate throughput and uncontrolled binding of oligonucleotides [40]. A modification of this technique, reverse line blot hybridization [41], enables reasonable throughput if the number of probes is limited to about 10–20. With the widespread availability of microarray core technologies, planar glass microarrays have become the most widely used type of array, owing to their general utility and moderate cost. DNA microarrays consist of microscopic checkerboards of hundreds to thousands of different DNA sequences produced by several methods, of which the two most commonly used are spotted glass slide microarray and high-density oligonucleotide array technology. In the spotted microarray, presynthesized single-stranded or double-stranded DNA is bound or "printed" onto glass slides. The DNA can be generated from cloned, synthesized, or PCR-amplified material. Because of the technical simplicity of this approach, spotted microarrays can be produced in-house as well as purchased from commercial providers. High-density oligonucleotide arrays are constructed by synthesizing short (< 25-mer) oligonucleotides in situ on glass wafers using a photolithographic manufacturing process and are thus only available commercially. Regarding the structure of microarrays, those with a selected panel of targets can be distinguished from whole-genome DNA microarrays.
The former may consist of particular virulence genes or virulence-associated DNA fragments and are applied in virulence typing of microbial strains, whereas whole-genome DNA microarrays are designed for comprehensive analyses of single isolates, e.g., microbial transcriptome studies. This whole-genome approach has been used to construct a 4290 open reading frame (ORF) E. coli microarray [42, 43] and a 3834-ORF M. tuberculosis microarray [44] as well as an array containing 1660 unique sequences for Helicobacter pylori [45]. Each of the ORFs can be obtained by PCR amplification that uses ORF-specific oligonucleotides, but photolithographic synthesis of oligonucleotides in situ has also been successfully employed for the production of complete ORF chips.
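To make the notion of a short oligonucleotide probe set more concrete, the sketch below tiles a target ORF into overlapping candidate probes of fixed length and keeps only those within a chosen GC-content window. The probe length, step size, and GC limits are arbitrary assumptions for illustration, as are the function names; actual probe design also takes melting temperature, secondary structure, and cross-hybridization into account.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def tile_probes(orf: str, probe_length: int = 20, step: int = 5,
                gc_min: float = 0.4, gc_max: float = 0.6):
    """Return (start, probe) pairs tiled along the ORF and filtered by GC content."""
    orf = orf.upper()
    probes = []
    for start in range(0, len(orf) - probe_length + 1, step):
        probe = orf[start:start + probe_length]
        if gc_min <= gc_fraction(probe) <= gc_max:
            probes.append((start, probe))
    return probes


# Toy target (hypothetical 40-nt sequence, not a real ORF).
target = "ATGC" * 10
for start, probe in tile_probes(target, probe_length=20, step=10):
    print(start, probe)
```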

22.4.2 Detection and Typing of Microbial Pathogens

The microarray technology offers the most rapid and practical tool to detect the presence or absence of a large set of virulence genes simultaneously within a given bacterial strain. Although microarray technology theoretically permits a genome-scale analysis for the presence of thousands of genes, only a few studies have reported using microarrays as a diagnostic tool (Table 22.1). The majority of published studies are of technical nature; only a few proof-of-concept application experiments have been published [46–49]. Likewise, microbial detection via DNA microarrays has not been the subject of much commercial attention in recent years. One exception is an industrial collaboration between Affymetrix and bioMérieux to develop arrays to identify pathogens and drug resistance genes in human samples. The field has recently received further public attention when a viral-typing microarray was set up to confirm the identity of the severe acute respiratory syndrome (SARS) virus [50]. Although relatively simple in concept, DNA microarrays are powerful tools for pathogen detection and characterization [51]. Direct detection of nucleic acids from bacteria is feasible, but may lack the level of sensitivity needed for routine screening of diagnostic samples. When the amount of nucleic acid is not limiting, however, microarrays may prove very valuable as a fingerprinting tool and as a tool for marker discovery. Coupled to PCR, microarrays have detection sensitivity equal to conventional methods with the added flexibility needed for discriminating multiple PCR reactions and for pathogen detection based on 16S-rDNA sequence.

Table 22.1 Detection and characterization of microbial pathogens using array technology.

Target                                Pathogenic potential    Marker gene                          Reference

Bacteria
  Listeria spp.                       Facultative pathogen    plcA, plcA, inlBm, iap, hly, clpE    [56]
  Campylobacter spp.                  Obligate pathogen       cdtABC, fur, glyA                    [55]
  E. coli, Salmonella, Shigella spp.  Obligate pathogen       gyrB                                 [83]
  Mycobacterium spp.                  Obligate pathogen       gyrB                                 [84]
  Streptococcus pyogenes              Facultative pathogen    Various target sequences             [85]
  Streptococcus pneumoniae            Facultative pathogen    gyrB, parE                           [86]
  E. coli K1 (meningitis)             Facultative pathogen    Various virulence genes              [87, 88]
  Human intestinal flora              Nonpathogenic           16S rRNA                             [89]
  Biowarfare agents                   Obligate pathogen       Various target sequences             [51]

Viruses
  Influenza viruses                   Obligate pathogen       Various target sequences             [90]
  Orthopoxviruses                     Obligate pathogen       crmB                                 [91]
  Rotaviruses                         Obligate pathogen       VP7                                  [92]
  Herpesviruses                       Obligate pathogen       Various target sequences             [93]
  Hepatitis (HBV, HCV, HDV)           Obligate pathogen       Various target sequences             [94, 95]

Parasites
  Cryptosporidium sp.                 Facultative pathogen    hsp70                                [96]
  Entamoeba spp., Giardia spp.        Facultative pathogen    Various target sequences             [97]

22.4.3 Pathoarrays

Bacterial microorganisms can be divided with respect to pathogenicity into at least three groups (nonpathogenic, facultative pathogenic, and obligate pathogenic), yet there is not always a close correlation between a given species and its pathogenic potential. Thus, identification of the bacterial species using surrogate biochemical and immunological markers may not be sufficient to characterize the virulence properties. Virulence traits, which are frequently encoded by plasmids, phages, integrons, and other horizontally transferred elements, are conferred independently of the surrogate chromosomal markers and to a large extent determine the pathogenicity of bacteria [47]. Microarray analysis of microbial virulence factors appears to be very useful for automated identification and characterization of bacterial pathogens [5, 52] (Fig. 22.1). For instance, an E. coli pathoarray carrying different target genes can be used to differentiate E. coli strains causing urinary tract infections or septicemia (Table 22.2). Multiple specific genes can be used to identify each organism, thus turning microbial identification into a pattern recognition process that is amenable to automated, computer-based analysis. Previously developed techniques, such as multiplex PCR, can be rapidly adapted to take advantage of the specificity and speed of pathoarray analysis.
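The pattern recognition step can be pictured with a minimal sketch: spot intensities are first converted into presence/absence calls, and the resulting gene set is then compared with marker panels. The gene names are taken from Table 22.2, but the grouping into two panels, the background factor of three, the intensity values, and the function names are assumptions chosen purely for illustration and are not the scoring scheme of the pathoarray described here.

```python
def call_present(signals: dict, background: float, factor: float = 3.0) -> set:
    """Genes whose spot intensity is at least `factor` times the background."""
    return {gene for gene, value in signals.items() if value >= factor * background}


def score_profiles(present: set, profiles: dict) -> dict:
    """Fraction of each panel's marker genes that were called present."""
    return {
        name: len(present & markers) / len(markers)
        for name, markers in profiles.items()
    }


# Hypothetical marker panels (illustrative grouping only).
profiles = {
    "ExPEC-like": {"PapA", "SfaS", "Cnf1"},
    "intestinal-like": {"EaeA"},
}

# Hypothetical spot intensities for one isolate (arbitrary units).
signals = {"PapA": 5200, "SfaS": 4100, "Cnf1": 3900, "EaeA": 150, "Adk": 6000}

present = call_present(signals, background=200)
print(sorted(present))
print(score_profiles(present, profiles))
```

Including a housekeeping gene such as Adk among the spots gives a simple positive control: if it is not called present, the hybridization itself is suspect.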

Fig. 22.1 Section of an E. coli pathoarray. (a) Composition of target genes spotted on the glass slide. White boxes refer to virulence genes of extraintestinal pathogenic E. coli (ExPEC), light gray boxes represent genes of intestinal pathogenic E. coli (controls); further internal controls are given in dark gray boxes. (b) The results of a hybridization experiment with genomic DNA of a uropathogenic E. coli isolate; light dots represent strong hybridization signals.

Table 22.2 Target genes represented on an E. coli "pathoarray".

Target / source
    Gene          Accession no.  Gene product

Toxins
    CdtB-III      U89305         Cytolethal distending toxin-IIIB. Induces cell growth arrest and apoptosis
    Cnf1          AF483829       Cytotoxic necrotizing factor 1. Induces reorganization of actin cytoskeleton
    Cnf2          U01097         Cytotoxic necrotizing factor 2. Induces reorganization of actin cytoskeleton
    EHEC-HlyA     X79839         Hemolysin toxin protein of enterohemorrhagic E. coli
    UPEC-HlyA                    Hemolysin toxin protein of uropathogenic E. coli
    East1-2       AF411067       Heat-stable enterotoxin 1
    Hbp           AJ223631       Hemoglobin protease
    Pet           AF056581       Serine protease autotransporter. Induces cytopathic effects on epithelial cells
    Pic           AE015310       Serine protease autotransporter. Induces cytopathic effects on epithelial cells
    PicU          NC_004431      Pic homologue in uropathogenic E. coli
    Sat           NC_004431      Secreted autotransporter toxin. Induces cytopathic effects on epithelial cells
    SigA          NC_004741      Serine protease autotransporter. Induces cytopathic effects on epithelial cells

Iron uptake systems
    EntF          AY205565       Enterobactin synthetase
    FepA          AY205565       Ferric enterobactin receptor
    FyuA          AY205565       Ferric yersiniabactin receptor
    IroB          AY205565       Putative UDP-glucuronosyl and UDP-glucosyl transferase
    IroD          AF320691       Putative ferric enterochelin esterase
    IroE          AE013844       Protein of unknown function
    IroN          NC_004431      Receptor for siderophores enterochelin, dihydrobenzoic acid, and salmochelin
    IreA          NC_004431      Iron-responsive element, putative siderophore receptor
    Irp1          AF136296       HMWP1, yersiniabactin biosynthesis protein
    Irp2          NC_003143      HMWP2, yersiniabactin biosynthesis protein
    IucA          NC_004431      Aerobactin biosynthesis protein
    IucC          NC_004431      Aerobactin biosynthesis protein
    IutA          NC_004431      Ferric aerobactin receptor
    TonB          NC_004431      Outer membrane energy transductor

Adhesins
  Afimbrial adhesins
    AfaD-3        X76688         Afimbrial adhesin type III, invasin subunit
    AfaD-8        AF072900       Afimbrial adhesin type VIII, invasin subunit
    AfaE-3        X76688         Afimbrial adhesin type III, adhesin subunit
    Afa/DraBC                    Consensus sequence for Afa/Dr adhesins
    BmaE/AfaE-8   M15677         M-agglutinin subunit/afimbrial adhesin type VIII, adhesin subunit
    GafD          L33969         N-acetyl-D-glucosamine-specific fimbrial lectin
  Fimbrial adhesins
    AafB          AF114828       Aggregative adherence fimbriae II, invasin subunit
    AufA          AE016768       Uropathogenic E. coli-associated fimbriae, putative major fimbrial subunit
    AufC          AE016768       Uropathogenic E. coli-associated fimbriae, hypothetical outer membrane usher protein
    AufG          AE016768       Uropathogenic E. coli-associated fimbriae, putative fimbrial adhesin subunit
    CsgA          NC_000913      Thin aggregative fimbriae, curli, major subunit
    FimH          AE016771       Type 1 fimbriae, D-mannose-specific adhesin subunit
    FocA          AF298200       Type F1C fimbriae, major fimbrial subunit
    MatB          AF325731       Meningitis-associated and temperature-regulated fimbriae, major fimbrial subunit
    PapA          AE016771       Pyelonephritis-associated pilus (P-fimbriae), major fimbrial subunit
    PapF          AE016771       P-fimbriae, minor fimbrial protein
    PapG-I        AF240678       P-fimbriae, type I adhesin subunit
    PapG-II       AE016771       P-fimbriae, type II adhesin subunit
    PapG-III      AF237473       P-fimbriae, type III adhesin subunit
    SfaA/Foc                     Consensus sequence for S-fimbriae and type 1C fimbriae
    SfaA          X16664         S-fimbriae, major fimbrial subunit
    SfaS          X16664         S-fimbriae, adhesin subunit

Pathogenicity islands
  Tag-prrA-modD – PAI CFT073
    FecD          AE016759       Iron(III) dicitrate transport system permease protein
    ModD          AE016759       Putative pyrophosphorylase
    PrrA          AE016759       Putative outer membrane receptor
    YmgE          AE016759       Transglycosylase-associated protein
  PAI II CFT073
    ORF 9         AF447814       R6-like protein, similar to transposase of Chelatobacter heintzii
    ORF 30        AF447814       Iron-chelating periplasmic-binding protein
    ORF 48        AF447814       R4-like protein, Iha, nonhemagglutinating adhesin
    ORF 49        AF447814       Lysine decarboxylase
    ORF 50        AF447814       Cadaverine/lysine antiporter
    ORF 70        AF447814       IrgA-like protein
  PAI I 536
    ORF 2         AJ488511       Putative DNA methylase
    ORF 47        AJ488511       Putative periplasmic solute-binding protein
  PAI II 536
    ORF 4         AJ494981       Hypothetical protein
    ORF 40        AJ494981       Hypothetical protein
    ORF 41        AJ494981       HecB-like conserved hypothetical protein
    ORF 49        AJ494981       Hypothetical protein possibly involved in transport
    Hek           AJ494981       Hek, adhesin/virulence factor
    PrfG          AJ494981       Adhesin of P-related fimbriae
    HlyA-UPEC     AJ494981       Hemolysin toxin protein of uropathogenic E. coli
  PAI III 536
    ORF 11        X16664         Hypothetical protein
    ORF 12        X16664         Hypothetical protein
    ORF 36        X16664         Putative hemin receptor
    ORF 52        X16664         Hypothetical protein, putative adhesin
    MchF          X16664         Microcin transport protein

Others
    23S rRNA      NC_000913      rrlH, 23S ribosomal RNA
    Adk           NC_000913      Adenylate kinase, a housekeeping gene in E. coli
    Ag43          NC_004431      Antigen 43 involved in biofilm formation
    CadA          AE016771       Lysine decarboxylase, a housekeeping gene in E. coli and absent in Shigella spp.
    ChuA          AE016768       Outer membrane heme/hemoglobin receptor
    CvaC          AJ223631       Colicin V of plasmid pColV
    DsdA          NC_004431      D-serine dehydratase
    DsdC          NC_004431      Positive transcriptional regulator, D-serine deaminase activator
    EaeA          AP002566       Enteropathogenic E. coli factor for attachment and effacement, invasion factor gamma intimin
    IbeA          AF289032       Blood–brain barrier invasion protein of NMEC
    Iss           AY205565       Increased serum survival protein
    Mdh           NC_000913      Malate dehydrogenase, a housekeeping gene in E. coli
    OmpT          NC_004431      Outer membrane protease VII
    RfaH          NC_004431      Transcriptional regulator
    TraA          NC_002483      Pilin subunit of F-plasmid
    TraT-EHEC     AE005307       Complement resistance protein
    TspE4.C2      AE016770       Putative lipase
    Usp           AB027193       Uropathogen-specific protein of unknown function
    YjaA          AE000474       Hypothetical protein


22.4.4 16S-/23S-rDNA Arrays

Microarrays offer tremendous potential for microbial community analysis, pathogen detection, and process monitoring in both basic and applied environmental science. In the hybridization fingerprinting approaches, the sensitivity of PCR and the specificity of oligonucleotide microarray hybridization are combined to enable microbial identification through analysis of the 5′ region of prokaryotic 16S-rRNA genes of different bacterial strains. Special attention is paid to the impact of molecular tools and applications on the diagnostics of tuberculosis. Responsible for more than 2 million deaths and 8 million new cases annually, tuberculosis is one of the leading infectious diseases in the world. Because of the slow growth rate of the causative agent Mycobacterium tuberculosis, isolation, identification, and drug susceptibility testing of this organism and other clinically important mycobacteria can take several weeks or longer. High-density oligonucleotide arrays have been developed based on 82 unique 16S-rRNA sequences, and these arrays allow for discrimination of 54 mycobacterial species and 51 sequences that contain unique rpoB gene mutations with a turnaround time of only 4 h when performed on culture-positive specimens [53, 54]. Using 16S-/23S-rRNA microarrays, different studies were dedicated to the identification and typing of common bacterial pathogens such as Campylobacter spp. [55], Listeria spp. [56, 57], Neisseria meningitidis [58], Chlamydia spp. [59], and S. aureus [60–62] (Table 22.1). A recent study described the use of uropathogen-specific 16S-rDNA genes in a novel electrochemical 16-sensor array to demonstrate that rapid molecular hybridization approaches can be adapted to room temperature conditions [63]. Microarrays for the detection of pathogens have also increasingly been used in veterinary medicine. A microarray with 32 oligoprobes targeting the 23S-rRNA gene was developed and successfully applied to detect veterinary pathogens responsible for equine abortion from clinical samples [57]. However, the most widespread application of 16S-/23S-rRNA-based microarrays has been in environmental microbiology. A recent study revealed the specific hybridization and detection of microbial 16S-rRNA directly from a total-RNA soil extract, without further purification or removal of soluble soil constituents [64]. In conclusion, the rRNA-based methods have great potential to be incorporated in commercial diagnostic kits for the routine microbiology laboratory.

22.4.5 Detection of Antibiotic Resistance in Microbial Pathogens Using Microarray Technology

DNA microarray technologies offer a promising method for detection of antimicrobial resistance genes and point mutations in resistance-related sequences (rRNA, katG, gyrA) [9]. Detection and identification of multiple tetracycline resistance genes by a glass-based DNA microarray were recently described [46]. In this study, microarray probes for 17 tet genes, the β-lactamase blaTEM-1 gene, and a 16S-rDNA gene (E. coli) were generated and successfully applied to clinical isolates. Recently, the application of the microarray-based single-base-mutation identification assay for detection of resistance against fluoroquinolones in clinical diagnostics has been demonstrated by identifying prevalent gyrA mutations in 30 clinical E. coli isolates [65]. Moreover, oligonucleotide microarrays offer an attractive option for the identification and epidemiologic monitoring of TEM β-lactamases in the routine clinical diagnostic laboratory. Using these DNA arrays, Grimm et al. describe the identification of the single nucleotide polymorphisms (SNPs) of 96% of the TEM β-lactamase variants known to date which are related to the extended-spectrum β-lactamase (ESBL) and/or inhibitor-resistant TEM (IRT) phenotype [66]. High-density oligonucleotide arrays have also been applied to simultaneous species identification and detection of mutations that confer rifampicin and pyrazinamide resistance in mycobacteria [54, 67]. Consequently, the DNA microarray strategy could be expanded to include parallel testing of various genes mediating drug resistance in M. tuberculosis. The design of such an array for the simultaneous testing of isoniazid, rifampicin, streptomycin, and fluoroquinolone susceptibilities is desirable, as this approach could provide an essential contribution to the rapid diagnosis of drug-resistant tuberculosis. In conclusion, the detection of antibiotic-resistance mutations shows that microarray technology is very promising and will probably soon be among the laboratory tools available to microbiologists. However, it should be emphasized that it is the functional expression rather than the presence of the antibiotic resistance gene that determines the success of antibiotic treatment.
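One simple way to picture single-base-mutation identification on an array is a set of four probes per queried position that differ only in the interrogated base: the probe with the clearly strongest signal gives the base call, which is then compared with the wild-type base. The sketch below follows this idea under stated assumptions; the minimum signal ratio, the intensities, and the function names are hypothetical and do not describe the published assays cited above.

```python
def call_base(intensities: dict, min_ratio: float = 2.0):
    """Call the base whose probe is clearly brightest, or None if ambiguous."""
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    # Require a positive signal that exceeds the runner-up by min_ratio.
    if best <= 0 or best < min_ratio * second:
        return None
    return best_base


def is_mutant(intensities: dict, wild_type_base: str, min_ratio: float = 2.0):
    """True/False for a confident call, None if no confident call was possible."""
    base = call_base(intensities, min_ratio)
    if base is None:
        return None
    return base != wild_type_base


# Hypothetical probe intensities for one queried position (arbitrary units).
clear_signal = {"A": 100, "C": 2500, "G": 120, "T": 90}
ambiguous_signal = {"A": 1000, "C": 1100, "G": 50, "T": 40}

print(call_base(clear_signal))
print(is_mutant(clear_signal, wild_type_base="T"))
print(call_base(ambiguous_signal))
```

Returning no call for ambiguous positions, rather than forcing a base, reflects the caution needed before a genotypic result is used to predict a resistance phenotype.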

22.5 Microarray Technology in Bacteria: Further Areas of Applications

22.5.1 Gene Expression Microarrays and Host–Pathogen Interaction

Bacterial pathogens adapt to their host environments predominantly by switching on complex transcriptional programs, and whole-genome microarray experiments promise to uncover this complexity. By monitoring microbial gene expression arrays, one can predict the functions of uncharacterized genes, probe the physiological adaptations made under various environmental conditions, identify virulence-associated genes, and predict the effects of drugs. Increased experimental efficiency permits high-throughput, whole-genome expression profiling of pathogens and hosts [68]. Such a gene expression approach has recently been used for characterization of the etiological agent of plague, Yersinia pestis. During its life cycle, this pathogen must acclimatize to shifts between the temperature for flea blockage (26 °C) and the body temperature of warm-blooded hosts (37 °C) [69]. The array results provided a genome-wide profile of gene transcription induced by temperature shift and shed light on the pathogenicity and host–microbe interaction of this deadly pathogen.

Gene expression microarrays also promise to accelerate our understanding of the host side of the host–pathogen interaction. The complex interaction between host and pathogen has attracted the particular attention of researchers and is now being explored using microarrays [70, 71]. The basic model consists of ex vivo measurement of gene expression of host cells before and after they are infected with a microorganism. Moreover, in gene expression experiments the bacteria can be used as 'sensors' to report the environment they experience at selected phases of pathogenesis. For example, Shigella flexneri can be used to probe the cytoplasmic compartment of macrophages, whereas Salmonella enterica and M. tuberculosis report on differently modified phagosomal vacuoles. As more expression profiles are obtained for pathogens adapted to the intracellular environment, more complete pictures of intracellular parasitism can be constructed [72].

22.5.2 DNA Microarray Technology in Food Technology

As modern food production has become more and more centralized and food delivery routes have become longer, single foodborne outbreaks have spread over wider and wider areas. This makes such outbreaks harder to trace and resolve. Molecular typing methods used together with relevant analytical epidemiology enable better identification of epidemics. Molecular diagnostics will increasingly play a key role in food safety related to genetically modified foods, foodborne pathogens, and novel nutraceuticals [73]. DNA microarray technology provides flexibility in molecular diagnostics by permitting the simultaneous analysis of large sets of genes. Together with assay automation and novel bioinformatics tools, this makes DNA microarrays a robust technology for diagnostics in food technology as well. Efforts to develop diagnostic custom arrays and simplified bioinformatics tools for field use are warranted [73].

22.5.3 DNA Microarray Technology in Environmental Microbiology

Microarray-based genomic technologies for bacterial detection and microbial community analysis have received a great deal of attention. Their high density and high-throughput capacity have considerably changed the analysis of microbial community structure, function, and dynamics. The initial targets for sequencing projects were microorganisms of medical or biotechnological interest. However, ecologically relevant organisms have recently become the focus of many whole-genome sequencing projects. Large-scale metagenomic approaches, which comprise culture-independent genomic analyses of an assemblage of microorganisms, have been performed to accumulate an increasing number of DNA sequences. The tremendous input of sequence information in the past decades improves our ability to engineer more and better probes that specifically target phylogenetic and functional markers, and microarrays provide the platforms for detailed phylogenetic and functional analyses of environmental samples [74]. These projects reveal a plethora of new genes and a high prokaryotic diversity. On this basis, several different microarray formats have been developed and evaluated for bacterial detection and microbial community analyses of environmental samples. Going beyond the genomes of cultured organisms, metagenomic analyses attempt to gain insight into the genomic potential of microbial communities as a whole. However, more rigorous and systematic assessment and development are needed to realize the full potential of microarrays for microbial ecology studies.

22.5.4 Pathogenomic Tools (Microarrays) in the Diagnosis of Microbiologic Agents as Bioweapons

There is increasing concern within both the scientific and security communities that the ongoing revolution in biology has great potential to be misused in offensive biological weapons programs [75–77]. Interdisciplinary and international efforts to increase the monitoring, surveillance, identification, and reporting of disease agents and to better understand the potential dynamics of disease transmission within human and animal populations in both industrialized and developing country settings will greatly enhance our ability to combat the effects of bioweapons and emerging diseases on biological communities and biodiversity [14, 15, 78–80]. Current approaches to the detection and differentiation of bioweapons are directed towards the development of biodefense microarrays that can detect hundreds of top-priority bacterial and viral biological agents, such as the agents of anthrax, plague, and smallpox. The new generation of biodefense microarray tests is expected to offer researchers the quickest, most comprehensive single test. Earlier DNA tests required a time-consuming approach, testing for one pathogen at a time. Traditional test methods, which include growing a culture of the bacteria and then identifying it by sight, can overlook genetically engineered organisms expressing unusual toxins or antibiotic resistance. The new arrays are expected to work in as little as 4 h and offer three advantages:

1. The biodefense array will present a comprehensive, single-step test to simultaneously identify genetic fingerprints for 26 different bacterial species, 10 viral species, hundreds of their subspecies selected from the National Institute of Allergy and Infectious Disease (NIAID) high-priority pathogen list, and 56 different bacterial toxin genes. As a result, this array could replace dozens of existing tests. It could even detect an attack in which multiple pathogens are used, something current methods may not detect.

2. The novel biodefense arrays are able to detect DNA from pathogens that has been inserted into apparently harmless bacteria, which traditional pathogen identification methods could miss.

3. The new generation of biodefense array will detect whether or not genes that make organisms resistant to antibiotics have been inserted into a pathogen by simultaneously testing for 62 different antibiotic resistance genes. Such a multidrug-resistant phenotype has been described in a wild-type isolate of Yersinia pestis [81]. If an antibiotic resistance gene goes undetected, physicians could end up treating patients with medication that simply would not work. Using current methods, researchers have to test for every antibiotic resistance gene one at a time, a slow and cumbersome process.

While effective biodefense utilizes a variety of tactical tools, microarray technology is a valuable arrow in that quiver [82].
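How a single multiplex test can report several agents and resistance genes at once can be sketched in a few lines of Python. This is an illustrative model only, not the analysis pipeline of any actual biodefense array: it assumes that each target is represented by several probes and that a target is reported when a chosen fraction of its probes exceeds a background threshold. All target names, probe names, signal values, and thresholds below are hypothetical.

```python
from typing import Dict, List


def detect_targets(signals: Dict[str, float],
                   probe_map: Dict[str, List[str]],
                   background: float,
                   min_fraction: float = 0.75) -> List[str]:
    """Return the targets whose probes are mostly above background.

    signals      -- measured intensity per probe (missing probes count as 0)
    probe_map    -- probes assigned to each target (agent, toxin or resistance gene)
    background   -- intensity above which a probe counts as positive
    min_fraction -- fraction of a target's probes that must be positive
    """
    detected = []
    for target, probes in probe_map.items():
        if not probes:
            continue
        positive = sum(1 for p in probes if signals.get(p, 0.0) > background)
        if positive / len(probes) >= min_fraction:
            detected.append(target)
    return sorted(detected)


if __name__ == "__main__":
    # Hypothetical panel: two agents and one resistance gene.
    panel = {
        "agent_A": ["a1", "a2", "a3", "a4"],
        "agent_B": ["b1", "b2", "b3", "b4"],
        "resistance_gene_X": ["r1", "r2"],
    }
    sample = {"a1": 900, "a2": 850, "a3": 40, "a4": 700,
              "b1": 30, "b2": 20, "b3": 500, "b4": 10,
              "r1": 600, "r2": 650}
    print(detect_targets(sample, panel, background=100))
```

Requiring agreement among several probes per target, rather than trusting one probe, is a simple guard against a single cross-hybridizing spot producing a false alarm. Because every target is scored in the same pass, a sample carrying both an agent and an inserted resistance gene is reported as such.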

22.6 Current Limitations on the Use of DNA Microarrays in Diagnostics in Medical Microbiological Laboratories

Currently, the limitation on a broader application of pathogenomic tools (DNA microarrays) in microbial diagnostics basically relates to costs and operative overheads. Upfront investment and molecular expertise are required for the development and validation of in-house genetic techniques and may hinder application in smaller laboratories. Aside from their costs and the difficulties associated with designing and making a suitable array, the most obvious problems are those of quality control, because of the difficulties of standardization and reproducibility associated with the large number of probes on an array, and the large number of slides that need to be made. Commercially available arrays may be inflexible because they are based on just one target sequence, whereas in-house arrays of oligonucleotides or PCR amplicons spotted on glass slides may be contaminated if PCR amplicons are used. Additionally, a number of relevant experimental factors must be considered in order to obtain satisfactory research results. Sample preparation and processing are crucial, and researchers must pay careful attention to RNA isolation, target amplification, target labeling, hybridization, detection, data processing, and data analysis. Further commercialization will certainly improve the user-friendliness of these techniques, increase their use, and make microarray technology more affordable. As microbial evolution continues, the composition of microarrays has to be continuously adapted to the current pattern of microbial virulence factors and antibiotic resistance genes.
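The reproducibility problem mentioned above can be made concrete with a small Python sketch. It assumes, purely for illustration, that each probe is spotted in replicate and that reproducibility is summarized by the coefficient of variation (standard deviation divided by mean) of the replicate signals. The probe names, intensities, and the 20% cut-off are hypothetical and would need to be set during assay validation.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def replicate_cv(values: Sequence[float]) -> float:
    """Coefficient of variation (sample standard deviation / mean) of replicate spots."""
    if len(values) < 2:
        raise ValueError("at least two replicate spots are required")
    m = mean(values)
    if m == 0:
        raise ValueError("mean signal is zero; CV is undefined")
    return stdev(values) / m


def flag_unreliable(replicates: Dict[str, Sequence[float]],
                    max_cv: float = 0.2) -> List[str]:
    """Return the probes whose replicate spots vary more than `max_cv`."""
    return sorted(p for p, v in replicates.items() if replicate_cv(v) > max_cv)


if __name__ == "__main__":
    # Hypothetical replicate intensities for two probes.
    spots = {
        "probe_1": [100, 102, 98],
        "probe_2": [100, 200, 300],
    }
    for probe, values in spots.items():
        print(probe, round(replicate_cv(values), 3))
    print("flagged:", flag_unreliable(spots))
```

A per-probe summary of this kind can be tracked across slides and production batches, which is one way to put the standardization effort described in this section into numbers.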

22.7 Final Remarks

In the past 10 years, clinical microbiology laboratories have undergone important changes with the introduction of molecular biology techniques and laboratory automation. Molecular methods will be used increasingly for the rapid diagnosis and study of the pathogenesis and epidemiology of infectious diseases. The availability of complete genome sequences of large numbers of pathogenic microorganisms will provide a better understanding of their evolutionary genetics, virulence, and host interactions. Microbial and host gene expression profiles have been used to develop tools that allow faster and more precise diagnosis, individually tailored treatment regimens, and outcome prediction. In the future, there will be a need for more rapid diagnosis, increased standardization of testing, and greater adaptability to cope with new threats from infectious microorganisms, such as agents of bioterrorism and emerging pathogens. The combination of the new tools that are now being developed in research laboratories and improved communication between physicians and clinical microbiologists should lead to profound changes in the way that clinical microbiologists work.

References

1 Hacker, J., B. Hochhut, B. Middendorf, G. Schneider, C. Buchrieser, G. Gottschalk, and U. Dobrindt. 2004. Pathogenomics of mobile genetic elements of toxigenic bacteria. Int. J. Med. Microbiol. 293:453–461.
2 Hacker, J., U. Hentschel, and U. Dobrindt. 2003. Prokaryotic chromosomes and disease. Science 301:790–793.
3 Sorsa, L. J., S. Dufke, and S. Schubert. 2004. Identification of novel virulence-associated loci in uropathogenic Escherichia coli by suppression subtractive hybridization. FEMS Microbiol. Lett. 230:203–208.
4 Janke, B., U. Dobrindt, J. Hacker, and G. Blum-Oehler. 2001. A subtractive hybridisation analysis of genomic differences between the uropathogenic E. coli strain 536 and the E. coli K-12 strain MG1655. FEMS Microbiol. Lett. 199:61–66.
5 Dobrindt, U., F. Agerer, K. Michaelis, A. Janka, C. Buchrieser, M. Samuelson, C. Svanborg, G. Gottschalk, H. Karch, and J. Hacker. 2003. Analysis of genome plasticity in pathogenic and commensal Escherichia coli isolates by use of DNA arrays. J. Bacteriol. 185:1831–1840.
6 Dobrindt, U., B. Hochhut, U. Hentschel, and J. Hacker. 2004. Genomic islands in pathogenic and environmental microorganisms. Nat. Rev. Microbiol. 2:414–424.

7 Normark, B. H. and S. Normark. 2002. Evolution and spread of antibiotic resistance. J. Intern. Med. 252:91–106.
8 Oelschlaeger, T. A. and J. Hacker. 2004. Impact of pathogenicity islands in bacterial diagnostics. APMIS 112:930–936.
9 Bodrossy, L. and A. Sessitsch. 2004. Oligonucleotide microarrays in microbial diagnostics. Curr. Opin. Microbiol. 7:245–254.
10 Stenger, D. A., J. D. Andreadis, G. J. Vora, and J. J. Pancrazio. 2002. Potential applications of DNA microarrays in biodefense-related diagnostics. Curr. Opin. Biotechnol. 13:208–212.
11 Clewley, J. P. 2004. A role for arrays in clinical virology: fact or fiction? J. Clin. Virol. 29:2–12.
12 Zhou, J. 2003. Microarrays for bacterial detection and microbial community analysis. Curr. Opin. Microbiol. 6:288–294.
13 Letowski, J., R. Brousseau, and L. Masson. 2004. Designing better probes: effect of probe size, mismatch position and number on hybridization in DNA oligonucleotide microarrays. J. Microbiol. Methods 57:269–278.
14 Nulens, E. and A. Voss. 2002. Laboratory diagnosis and biosafety issues of biological warfare agents. Clin. Microbiol. Infect. 8:455–466.
15 Peruski, L. F., Jr. and A. H. Peruski. 2003. Rapid diagnostic assays in the genomic biology era: detection and identification of infectious disease and biological weapon agents. Biotechniques 35:840–846.
16 Ogilvie, M. 2001. Molecular techniques should not now replace cell culture in diagnostic virology laboratories. Rev. Med. Virol. 11:351–354.
17 Johnson, J. R. 2000. Development of polymerase chain reaction-based assays for bacterial gene detection. J. Microbiol. Methods 41:201–209.
18 Patel, J. B. 2001. 16S rRNA gene sequencing for bacterial pathogen identification in the clinical laboratory. Mol. Diagn. 6:313–321.
19 Ludwig, W., O. Strunk, S. Klugbauer, N. Klugbauer, M. Weizenegger, J. Neumaier, M. Bachleitner, and K. H. Schleifer. 1998. Bacterial phylogeny based on comparative sequence analysis. Electrophoresis 19:554–568.
20 Baker, G. C., J. J. Smith, and D. A. Cowan. 2003. Review and re-analysis of domain-specific 16S primers. J. Microbiol. Methods 55:541–555.
21 Cook, N. 2003. The use of NASBA for the detection of microbial pathogens in food and environmental samples. J. Microbiol. Methods 53:165–174.
22 Rudi, K., H. K. Nogva, B. Moen, H. Nissen, S. Bredholt, T. Moretro, K. Naterstad, and A. Holck. 2002. Development and application of new nucleic acid-based technologies for microbial community analyses in foods. Int. J. Food Microbiol. 78:171–180.
23 Bruce, I. J. 1993. Nucleic acid amplification mediated microbial identification. Sci. Prog. 77(Pt 3–4):183–206.
24 Relman, D. A. and S. Falkow. 1992. Identification of uncultured microorganisms: expanding the spectrum of characterized microbial pathogens. Infect. Agents Dis. 1:245–253.
25 Amann, R. I., W. Ludwig, and K. H. Schleifer. 1995. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59:143–169.
26 Moter, A. and U. B. Göbel. 2000. Fluorescence in situ hybridization (FISH) for direct visualization of microorganisms. J. Microbiol. Methods 41:85–112.

27 Trebesius, K., K. Panthel, S. Strobel, K. Vogt, G. Faller, T. Kirchner, M. Kist, J. Heesemann, and R. Haas. 2000. Rapid and specific detection of Helicobacter pylori macrolide resistance in gastric tissue by fluorescent in situ hybridisation. Gut 46:608–614.
28 Trebesius, K., D. Harmsen, A. Rakin, J. Schmelz, and J. Heesemann. 1998. Development of rRNA-targeted PCR and in situ hybridization with fluorescently labelled oligonucleotides for detection of Yersinia species. J. Clin. Microbiol. 36:2557–2564.
29 Trebesius, K., L. Leitritz, K. Adler, S. Schubert, I. B. Autenrieth, and J. Heesemann. 2000. Culture independent and rapid identification of bacterial pathogens in necrotising fasciitis and streptococcal toxic shock syndrome by fluorescence in situ hybridisation. Med. Microbiol. Immunol. (Berl) 188:169–175.
30 Stender, H. 2003. PNA FISH: an intelligent stain for rapid diagnosis of infectious diseases. Expert Rev. Mol. Diagn. 3:649–655.
31 Millar, B. C. and J. E. Moore. 2004. Molecular diagnostics: current options. Methods Mol. Biol. 266:139–166.
32 Blanc, D. S. 2004. The use of molecular typing for epidemiological surveillance and investigation of endemic nosocomial infections. Infect. Genet. Evol. 4:193–197.
33 Wolk, D., S. Mitchell, and R. Patel. 2001. Principles of molecular microbiology testing methods. Infect. Dis. Clin. North Am. 15:1157–1204.
34 Sundsfjord, A., G. S. Simonsen, B. C. Haldorsen, H. Haaheim, S. O. Hjelmevoll, P. Littauer, and K. H. Dahl. 2004. Genetic methods for detection of antimicrobial resistance. APMIS 112:815–837.
35 Mackay, I. M. 2004. Real-time PCR in the microbiology laboratory. Clin. Microbiol. Infect. 10:190–212.
36 Yang, S., S. Lin, G. D. Kelen, T. C. Quinn, J. D. Dick, C. A. Gaydos, and R. E. Rothman. 2002. Quantitative multiprobe PCR assay for simultaneous detection and identification to species level of bacterial pathogens. J. Clin. Microbiol. 40:3449–3454.
37 Yang, S. and R. E. Rothman. 2004. PCR-based diagnostics for infectious diseases: uses, limitations, and future applications in acute-care settings. Lancet Infect. Dis. 4:337–348.
38 Tan, T. Y. 2003. Use of molecular techniques for the detection of antibiotic resistance in bacteria. Expert Rev. Mol. Diagn. 3:93–103.
39 Aarts, H. J., K. S. Boumedine, X. Nesme, and A. Cloeckaert. 2001. Molecular tools for the characterisation of antibiotic-resistant bacteria. Vet. Res. 32:363–380.
40 Anthony, R. M., T. J. Brown, and G. L. French. 2000. Rapid diagnosis of bacteremia by universal amplification of 23S ribosomal DNA followed by hybridization to an oligonucleotide array. J. Clin. Microbiol. 38:781–788.
41 Zwart, G., E. J. van Hannen, M. P. Kamst-van Agterveld, G. K. Van der, E. S. Lindstrom, J. Van Wichelen, T. Lauridsen, B. C. Crump, S. K. Han, and S. Declerck. 2003. Rapid screening for freshwater bacterial groups by using reverse line blot hybridization. Appl. Environ. Microbiol. 69:5875–5883.
42 Tao, H., C. Bausch, C. Richmond, F. R. Blattner, and T. Conway. 1999. Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J. Bacteriol. 181:6425–6440.
43 Richmond, C. S., J. D. Glasner, R. Mau, H. Jin, and F. R. Blattner. 1999. Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res. 27:3821–3835.
44 Wilson, M., J. DeRisi, H. H. Kristensen, P. Imboden, S. Rane, P. O. Brown, and G. K. Schoolnik. 1999. Exploring drug-induced alterations in gene expression in Mycobacterium tuberculosis by microarray hybridization. Proc. Natl. Acad. Sci. U. S. A. 96:12833–12838.
45 Salama, N., K. Guillemin, T. K. McDaniel, G. Sherlock, L. Tompkins, and S. Falkow. 2000. A whole-genome microarray reveals genetic diversity among Helicobacter pylori strains. Proc. Natl. Acad. Sci. U. S. A. 97:14668–14673.
46 Call, D. R., M. K. Bakko, M. J. Krug, and M. C. Roberts. 2003. Identifying antimicrobial resistance genes with DNA microarrays. Antimicrob. Agents Chemother. 47:3290–3295.
47 Chizhikov, V., A. Rasooly, K. Chumakov, and D. D. Levy. 2001. Microarray analysis of microbial virulence factors. Appl. Environ. Microbiol. 67:3258–3263.
48 Cho, J. C. and J. M. Tiedje. 2001. Bacterial species determination from DNA–DNA hybridization by using genome fragments and DNA microarrays. Appl. Environ. Microbiol. 67:3677–3682.
49 Murray, A. E., D. Lies, G. Li, K. Nealson, J. Zhou, and J. M. Tiedje. 2001. DNA/DNA hybridization to microarrays reveals gene-specific differences between closely related microbial genomes. Proc. Natl. Acad. Sci. U. S. A. 98:9853–9858.
50 Long, W. H., H. S. Xiao, X. M. Gu, Q. H. Zhang, H. J. Yang, G. P. Zhao, and J. H. Liu. 2004. A universal microarray for detection of SARS coronavirus. J. Virol. Methods 121:57–63.
51 Wilson, W. J., C. L. Strout, T. Z. DeSantis, J. L. Stilwell, A. V. Carrano, and G. L. Andersen. 2002. Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. Mol. Cell. Probes 16:119–127.
52 Schoolnik, G. K. 2002. Microarray analysis of bacterial pathogenicity. Adv. Microb. Physiol. 46:1–45.
53 Gingeras, T. R., G. Ghandour, E. Wang, A. Berno, P. M. Small, F. Drobniewski, D. Alland, E. Desmond, M. Holodniy, and J. Drenkow. 1998. Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic Mycobacterium DNA arrays. Genome Res. 8:435–448.
54 Troesch, A., H. Nguyen, C. G. Miyada, S. Desvarenne, T. R. Gingeras, P. M. Kaplan, P. Cros, and C. Mabilat. 1999. Mycobacterium species identification and rifampin resistance testing with high-density DNA probe arrays. J. Clin. Microbiol. 37:49–55.
55 Volokhov, D., V. Chizhikov, K. Chumakov, and A. Rasooly. 2003. Microarray-based identification of thermophilic Campylobacter jejuni, C. coli, C. lari, and C. upsaliensis. J. Clin. Microbiol. 41:4071–4080.


56 Volokhov, D., A. Rasooly, K. Chumakov, and V. Chizhikov. 2002. Identification of Listeria species by microarray-based assay. J. Clin. Microbiol. 40:4720–4728.
57 Mitterer, G., M. Huber, E. Leidinger, C. Kirisits, W. Lubitz, M. W. Mueller, and W. M. Schmidt. 2004. Microarray-based identification of bacteria in clinical samples by solid-phase PCR amplification of 23S ribosomal DNA sequences. J. Clin. Microbiol. 42:1048–1057.
58 Swiderek, H., H. Claus, M. Frosch, and U. Vogel. 2005. Evaluation of custom-made DNA microarrays for multilocus sequence typing of Neisseria meningitidis. Int. J. Med. Microbiol. 295:39–45.
59 Sachse, K., H. Hotzel, P. Slickers, T. Ellinger, and R. Ehricht. 2005. DNA microarray-based detection and identification of Chlamydia and Chlamydophila spp. Mol. Cell. Probes 19:41–50.
60 Trad, S., J. Allignet, L. Frangeul, M. Davi, M. Vergassola, E. Couve, A. Morvan, A. Kechrid, C. Buchrieser, P. Glaser, and N. El Solh. 2004. DNA macroarray for identification and typing of Staphylococcus aureus isolates. J. Clin. Microbiol. 42:2054–2064.
61 van Leeuwen, W. B., C. Jay, S. Snijders, N. Durin, B. Lacroix, H. A. Verbrugh, M. C. Enright, A. Troesch, and A. Van Belkum. 2003. Multilocus sequence typing of Staphylococcus aureus with DNA array technology. J. Clin. Microbiol. 41:3323–3326.
62 Couzinet, S., C. Jay, C. Barras, R. Vachon, G. Vernet, B. Ninet, I. Jan, M. A. Minazio, P. Francois, D. Lew, A. Troesch, and J. Schrenzel. 2005. High-density DNA probe arrays for identification of staphylococci to the species level. J. Microbiol. Methods 61:201–208.
63 Sun, C. P., J. C. Liao, Y. H. Zhang, V. Gau, M. Mastali, J. T. Babbitt, W. S. Grundfest, B. M. Churchill, E. R. McCabe, and D. A. Haake. 2005. Rapid, species-specific detection of uropathogen 16S rDNA and rRNA at ambient temperature by dot-blot hybridization and an electrochemical sensor array. Mol. Genet. Metab. 84:90–99.
64 Small, J., D. R. Call, F. J. Brockman, T. M. Straub, and D. P. Chandler. 2001. Direct detection of 16S rRNA in soil extracts by using oligonucleotide microarrays. Appl. Environ. Microbiol. 67:4708–4716.
65 Yu, X., M. Susa, C. Knabbe, R. D. Schmid, and T. T. Bachmann. 2004. Development and validation of a diagnostic DNA microarray to detect quinolone-resistant Escherichia coli among clinical isolates. J. Clin. Microbiol. 42:4083–4091.
66 Grimm, V., S. Ezaki, M. Susa, C. Knabbe, R. D. Schmid, and T. T. Bachmann. 2004. Use of DNA microarrays for rapid genotyping of TEM beta-lactamases that confer resistance. J. Clin. Microbiol. 42:3766–3774.
67 Wade, M. M., D. Volokhov, M. Peredelchuk, V. Chizhikov, and Y. Zhang. 2004. Accurate mapping of mutations of pyrazinamide-resistant Mycobacterium tuberculosis strains with a scanning-frame oligonucleotide microarray. Diagn. Microbiol. Infect. Dis. 49:89–97.
68 Kato-Maeda, M., Q. Gao, and P. M. Small. 2001. Microarray analysis of pathogens and their interaction with hosts. Cell Microbiol. 3:713–719.
69 Han, Y., D. Zhou, X. Pang, Y. Song, L. Zhang, J. Bao, Z. Tong, J. Wang, Z. Guo, J. Zhai, Z. Du, X. Wang, X. Zhang, J. Wang, P. Huang, and R. Yang. 2004. Microarray analysis of temperature-induced transcriptome of Yersinia pestis. Microbiol. Immunol. 48:791–805.
70 Cummings, C. A. and D. A. Relman. 2000. Using DNA microarrays to study host–microbe interactions. Emerg. Infect. Dis. 6:513–525.
71 Manger, I. D. and D. A. Relman. 2000. How the host 'sees' pathogens: global gene expression responses to infection. Curr. Opin. Immunol. 12:215–218.
72 Hinton, J. C., I. Hautefort, S. Eriksson, A. Thompson, and M. Rhen. 2004. Benefits and pitfalls of using microarrays to monitor bacterial gene expression during infection. Curr. Opin. Microbiol. 7:277–282.
73 Liu-Stratton, Y., S. Roy, and C. K. Sen. 2004. DNA microarray technology in nutraceutical and food safety. Toxicol. Lett. 150:29–42.
74 Riesenfeld, C. S., P. D. Schloss, and J. Handelsman. 2004. Metagenomics: genomic analysis of microbial communities. Annu. Rev. Genet. 38:525–552.
75 Hilleman, M. R. 2002. Overview: cause and prevention in biowarfare and bioterrorism. Vaccine 20:3055–3067.
76 Dudley, J. P. and M. H. Woodford. 2002. Bioweapons, bioterrorism and biodiversity: potential impacts of biological weapons attacks on agricultural and biological diversity. Rev. Sci. Tech. 21:125–137.
77 Bhalla, D. K. and D. B. Warheit. 2004. Biological agents with potential for misuse: a historical perspective and defensive measures. Toxicol. Appl. Pharmacol. 199:71–84.
78 Kiechle, F. L. and X. Zhang. 2002. The postgenomic era: implications for the clinical laboratory. Arch. Pathol. Lab. Med. 126:255–262.
79 Fraser, C. M. and M. R. Dando. 2001. Genomics and future biological weapons: the need for preventive action by the biomedical community. Nat. Genet. 29:253–256.
80 Krafft, A. E. and D. A. Kulesh. 2001. Applying molecular biological techniques to detecting biological agents. Clin. Lab. Med. 21:631–660.
81 Galimand, M., A. Guiyoule, G. Gerbaud, B. Rasoamanana, S. Chanteau, E. Carniel, and P. Courvalin. 1997. Multidrug resistance in Yersinia pestis mediated by a transferable plasmid. N. Engl. J. Med. 337:677–680.
82 Weinberg, S. 2004. Emergent FDA biodefense issues for microarray technology: process analytical technology. Expert Rev. Mol. Diagn. 4:779–781.
83 Kakinuma, K., M. Fukushima, and R. Kawaguchi. 2003. Detection and identification of Escherichia coli, Shigella, and Salmonella by microarrays using the gyrB gene. Biotechnol. Bioeng. 83:721–728.
84 Fukushima, M., K. Kakinuma, H. Hayashi, H. Nagai, K. Ito, and R. Kawaguchi. 2003. Detection and identification of Mycobacterium species isolates by DNA microarray. J. Clin. Microbiol. 41:2605–2615.
85 Smoot, J. C., K. D. Barbian, J. J. Van Gompel, L. M. Smoot, M. S. Chaussee, G. L. Sylva, D. E. Sturdevant, S. M. Ricklefs, S. F. Porcella, L. D. Parkins, S. B. Beres, D. S. Campbell, T. M. Smith, Q. Zhang, V. Kapur, J. A. Daly, L. G. Veasy, and J. M. Musser. 2002. Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proc. Natl. Acad. Sci. U. S. A. 99:4668–4673.
86 Roth, S. B., J. Jalava, O. Ruuskanen, A. Ruohola, and S. Nikkari. 2004. Use of an oligonucleotide array for laboratory diagnosis of bacteria responsible for acute upper respiratory infections. J. Clin. Microbiol. 42:4268–4274.
87 Korczak, B., J. Frey, J. Schrenzel, G. Pluschke, R. Pfister, R. Ehricht, and P. Kuhnert. 2005. Use of diagnostic microarrays for determination of virulence gene patterns of Escherichia coli K1, a major cause of neonatal meningitis. J. Clin. Microbiol. 43:1024–1031.
88 van Ijperen, C., P. Kuhnert, J. Frey, and J. P. Clewley. 2002. Virulence typing of Escherichia coli using microarrays. Mol. Cell. Probes 16:371–378.
89 Wang, R. F., M. L. Beggs, L. H. Robertson, and C. E. Cerniglia. 2002. Design and evaluation of oligonucleotide-microarray method for the detection of human intestinal bacteria in fecal samples. FEMS Microbiol. Lett. 213:175–182.
90 Sengupta, S., K. Onodera, A. Lai, and U. Melcher. 2003. Molecular detection and identification of influenza viruses by oligonucleotide microarray hybridization. J. Clin. Microbiol. 41:4542–4550.
91 Lapa, S., M. Mikheev, S. Shchelkunov, V. Mikhailovich, A. Sobolev, V. Blinov, I. Babkin, A. Guskov, E. Sokunova, A. Zasedatelev, L. Sandakhchiev, and A. Mirzabekov. 2002. Species-level identification of orthopoxviruses with an oligonucleotide microchip. J. Clin. Microbiol. 40:753–757.
92 Chizhikov, V., M. Wagner, A. Ivshina, Y. Hoshino, A. Z. Kapikian, and K. Chumakov. 2002. Detection and genotyping of human group A rotaviruses by oligonucleotide microarray hybridization. J. Clin. Microbiol. 40:2398–2407.
93 Boriskin, Y. S., P. S. Rice, R. A. Stabler, J. Hinds, H. Al Ghusein, K. Vass, and P. D. Butcher. 2004. DNA microarrays for virus detection in cases of central nervous system infection. J. Clin. Microbiol. 42:5811–5818.
94 Zhao, W., J. M. Wan, W. Liu, Q. J. Liu, L. Zhang, Z. X. Zhou, X. J. Liu, and H. R. Zhang. 2003. Hepatitis gene chip in detecting HBV DNA, HCV RNA in serum and liver tissue samples of hepatitis patients. Hepatobiliary Pancreat. Dis. Int. 2:234–241.
95 Zhaohui, S., Z. Wenling, Z. Bao, S. Rong, and M. Wenli. 2004. Microarrays for the detection of HBV and HDV. J. Biochem. Mol. Biol. 37:546–551.
96 Straub, T. M., D. S. Daly, S. Wunshel, P. A. Rochelle, R. DeLeon, and D. P. Chandler. 2002. Genotyping Cryptosporidium parvum with an hsp70 single-nucleotide polymorphism microarray. Appl. Environ. Microbiol. 68:1817–1826.
97 Wang, Z., G. J. Vora, and D. A. Stenger. 2004. Detection and genotyping of Entamoeba histolytica, Entamoeba dispar, Giardia lamblia, and Cryptosporidium parvum by oligonucleotide microarray. J. Clin. Microbiol. 42:3262–3271.
98 Guyer, D. M., J. S. Kao, and H. L. Mobley. 1998. Genomic analysis of a pathogenicity island in uropathogenic Escherichia coli CFT073: distribution of homologous sequences among isolates from patients with pyelonephritis, cystitis, and catheter-associated bacteriuria and from fecal samples. Infect. Immun. 66:4411–4417.
99 Kao, J. S., D. M. Stucker, J. W. Warren, and H. L. Mobley. 1997. Pathogenicity island sequences of pyelonephritogenic Escherichia coli CFT073 are associated with virulent uropathogenic strains. Infect. Immun. 65:2812–2820.


23 The Search for New Antibiotics

Harald Labischinski, Christoph Freiberg, and Heike Brötz-Oesterhelt

23.1 The Need for Novel Antibiotics

There is little doubt that the discovery, development, and clinical introduction of antibiotics to combat bacterial infections are the most important contributions of medical science and the pharmaceutical industry to length and quality of life, especially in the more developed countries. The introduction of effective means to treat life-threatening infections raised human life expectancy dramatically and paved the way for almost all aspects of progress in modern medicine, such as organ transplantation, aggressive surgical procedures, and cancer chemotherapy. In addition, the impaired immune status of an aging population increases the severity of infections by pathogens previously thought to be almost nonpathogenic. This is particularly true where immunocompetence is lost for other reasons, such as HIV infection.

In the 60 years following the discovery of the first antibiotics, several important antibacterial classes were introduced into clinical practice, especially in the often cited “golden years” of the 1950s and 1960s (Fig. 23.1). The steady flow of novel, ever improved antibacterial drugs tempted public health authorities as well as academic and industrial research and development groups to forget that several factors constantly undermine the effectiveness of antibiotics and threaten the progress achieved:

1. The perception of having solved at least large parts of the problem of infectious diseases led to a subsequent decline in antibacterial research and development activities.

2. Medical progress itself led to an increase in the numbers of elderly and immune-compromised patients, who increasingly became infected by a wide spectrum of pathogens for which the older antibiotics often have not been optimized (e.g., enterococci, Acinetobacter spp., Legionella spp., atypical mycobacteria, and so forth).

3. The availability of a large number of cheap, generic antibiotics, the rising costs of development (currently estimated at close to a billion US$ per successful drug [1]), the complexity of drug discovery and regulatory processes [2, 3], as well as the tendency of many prescribers to give priority to the cheapest available antibiotics, often even if inappropriate, persuaded many big pharmaceutical companies to shift resources to other disease areas.

4. Most importantly, owing to their relatively high mutation rates (as compared to eukaryotes), short generation times, and various mechanisms of gene transfer, bacteria responded to the selective pressure exerted by antibiotic use by developing resistance through diverse mechanisms, including target mutations, enzymatic drug metabolism, reduced uptake, and overexpression of drug efflux pumps, thus rendering many previously effective antibiotics useless.

As a result, the current arsenal of clinically available antibiotics is in large part dominated by compounds with increasing and serious resistance problems. Just three large product classes (β-lactams, including penicillins, cephalosporins, and penems; quinolones; and macrolides) dominate the market in terms of sales. Smaller, but nevertheless important classes include the glycopeptide vancomycin and the tetracyclines as well as the oxazolidinones and daptomycin, the latter two of which were recently launched. While resistance is emerging even against these newest antibiotics, the older ones are all plagued with problems of resistance, in some cases already very serious, in others only just emerging.

Fig. 23.1 In the 60 years following the discovery of the first antibiotics, several important antibacterial classes were introduced into clinical practice, especially in the often cited “golden years” of the 1950s and 1960s. Dark boxes represent antibiotics derived from natural compound classes.

Among the gram-positive organisms, the most problematic pathogens with respect to resistance include MRSA/E (methicillin-resistant Staphylococcus aureus/epidermidis, often multiresistant against all β-lactams, macrolides, quinolones, and, especially worryingly, increasingly also against vancomycin), VRE (vancomycin-resistant enterococci), and DRSP [drug (i.e., penicillin and macrolide)-resistant Streptococcus pneumoniae]. An additional problem is posed by Mycobacterium tuberculosis, which is showing an increasing trend towards multiresistant variants. Among gram-negative organisms, the greatest problems are posed by Pseudomonas aeruginosa, which owing to its extremely effective permeation barrier and efflux systems is naturally resistant to many available antibiotics, and by the growing number of Enterobacteriaceae producing extended-spectrum β-lactamases. For more details on current resistance issues, the reader is referred to some excellent recent reviews [4–6].

Measures such as improved infection control and hygiene procedures, together with more prudent use of antibiotics, are certainly of great importance in dealing with these issues and should help to preserve the usefulness of antibiotics already on the market for longer. However, it should be clear from what has been said that, eventually, novel antibiotic classes free from cross-resistance with those currently on the market need to be discovered and developed to avert a public health disaster. The question then arises: Where will the new antibiotics come from, and what are the chances of discovering, developing, and introducing them into clinical practice?

23.2 Where Will the New Antibiotics Come From?

As it is impossible to understand the present situation or even future directions without analyzing the historical background, we will start by discussing the experiences of the very successful past.

23.2.1 The Past

In the so-called golden age of antibiotic discovery, all existing antibiotic classes were found by fairly simple MIC (minimal inhibitory concentration) type tests, in which the suppression of the growth of a low inoculum of bacteria in a test tube to visible optical densities was measured in rich growth media. Only much later did it become possible to understand at the molecular level how these compounds acted. As a rule, it took several competing groups of scientists years, or even decades, of work to elucidate the molecular details of their mode of action. In the very early days, many compounds (e.g., penicillin G, erythromycin, streptomycin, and tetracycline, to mention just a few) were detected in such simple growth tests with extracts from bacteria and fungi. After isolation of the active principle and structure determination they were brought right into the clinic without further modification.

Thus, the early antibiotic classes stemmed almost exclusively from natural products produced by bacteria or fungi, and it is believed that microorganisms produce those so-called secondary metabolites to combat other microorganisms which compete for their ecological niches. In fact, the term “antibiotic” reflects that origin, although it is nowadays used for all antibacterial compounds, synthetic as well as natural-product-derived. It is interesting to note that a huge number of such antibiotics have indeed been discovered by these simple MIC screening procedures; an excellent overview of the large number of structurally diverse natural compounds with antibacterial properties can be found in Ref. [7]. However, most representatives of this pool of antibacterial agents showed some unwanted properties, for example, with respect to antibacterial spectrum or toxicity. Since, seemingly, enough novel compounds without such unwanted properties could be found, only the latter were selected for further development, and, eventually, taken into the hospitals.

While it is hardly possible to overestimate the role of natural compound chemistry in those pioneer years, it soon became obvious that man-made, synthetic chemistry had a very important role to play, too. Firstly, the examples of the sulfonamides and the quinolones showed that it was possible to discover and introduce into clinical practice fully synthetic compounds with selective broad-spectrum antibacterial activity. The high chemical variability of such synthetic compounds was exploited with extremely gratifying success to optimize their properties with respect to, e.g., microbial spectrum, side-effect profile, kinetics (including oral availability), and drug interactions.
Secondly, it turned out that similarly beneficial properties could be reached when semisynthetic or even fully synthetic variants of the natural products were produced. An additional rationale for chemical variation within the same compound class was the observation that bacterial resistance development can be overcome at least partially by compound modification. A striking example is provided by penicillin and its derivatives. When penicillin G was discovered, almost all staphylococcal strains were susceptible to the drug, but the first β-lactamase-producing strains were detected soon after its introduction into clinical practice [8], and shortly afterwards more than 80% of all staphylococcal isolates proved to be resistant by this mechanism. However, chemical modification of the β-lactam side chain resulted in novel penicillin derivatives (e.g., methicillin) which were no longer susceptible to the hydrolyzing activity of the staphylococcal β-lactamases. Needless to say, this race between bacteria and scientists continues to this day. Thus, shortly after the introduction of methicillin in 1960, the first methicillin-resistant S. aureus strains were detected [9], which evaded the drug action by a completely different mechanism, namely the acquisition of an extra target protein, PBP 2a, with a much lower affinity for β-lactam antibiotics than the standard set of staphylococcal target proteins, PBPs 1–4 (for review, see, e.g., Ref. [10]).

The seamless stream of novel compound classes and the ability to overcome resistance problems at least to some extent within a certain class led to a clear decline in the antimicrobial discovery activities of industrial, academic, and public health groups, because the “problem” appeared to be largely solved. Not until the late 1980s and 1990s did it become clear how misconceived this expectation was. In fact, an unprecedented development of resistance took place, at the beginning mostly confined to hospitals, but since then also spreading alarmingly into the community. Fortunately, this led to a revitalization of antibiotic drug discovery, and, for those companies and academic groups who never stopped, to an increase in efforts to effectively combat bacterial pathogens. The unprecedented progress made in the molecular understanding of microbial pathogens and the mechanisms of drug action in just that time period played an important role in this process, which brings us right to the present era in antibiotic discovery efforts.

23.2.2 The Present

The deciphering of the complete genome of Haemophilus influenzae in 1995 [11] heralded a flood of unprecedented prokaryotic sequence information and triggered a paradigm shift in antibacterial drug discovery. It therefore seems appropriate to define this event as the starting point of the “present era” of antibiotic discovery strategies. Since that time, more than 190 complete bacterial genomic sequences have become available. That large collection of fully sequenced bacterial genomes includes those of important pathogens such as H. influenzae, S. aureus, Enterococcus faecium, Enterococcus faecalis, S. pneumoniae, P. aeruginosa, M. tuberculosis, and Escherichia coli, which are the focus of antibacterial drug discovery today (http://www.ncbi.nlm.nih.gov/genomes/Complete.html). This development not only allowed analysis and comparison of the inherently static genomes of various important pathogens, but also paved the way for the introduction of genome-wide gene expression technologies such as transcriptome and proteome analysis (for review, see, e.g., Refs. [12–15]). We will discuss the importance of that progress for drug discovery in detail in Section 23.3. Suffice it to say here that this development really fuelled the hope that the urgently needed novel antibiotics could now be discovered and developed at an unprecedented pace. In fact, those technologies helped dramatically in the selection of novel antibacterial targets, in validating their importance for bacterial survival and pathogenesis, and in rapid determination of the molecular mode of action of novel lead structures in vitro and in vivo. In addition, other, unrelated technologies added to this belief, most prominently the development of high-throughput robotic screening, combinatorial and parallel synthesis methods, and the development of highly predictive animal models for all important antibacterial indications.
Today, pharmaceutical companies are able to screen more than a million compounds against a specific target in biochemical or cellular test systems based on 1536-well plates in less than a week.

However, a look at the number and nature of novel antibiotics in clinical development (i.e., those that could reach the market by 2010) appears to contradict the expectation of rapid antibiotic discovery and development for the near future (Table 23.1). Firstly, if we look just at the number of compounds, only a very limited number of novel antibacterials can be expected to become clinically available in that time span. This calculation takes into account the historical probabilities of success during clinical development, which are much higher in anti-infective development than in all other indications, but nevertheless only of the order of 30% to reach the market after first dosing to humans in phase I [1, 16]. Even more revealing is the fact that most, if not all, of those compounds are derived from existing classes (glycopeptides, quinolones, β-lactams, macrolides), and even the peptide deformylase inhibitors such as LBM415 stem from a well-known natural compound, actinonin [17], although they have sometimes been called the first compound class derived from genomics/target-driven approaches.

On the other hand, the compounds listed in Table 23.1 also tell us that much progress can still be attained within existing classes [18, 19]. For example, BAL 9141, now also known as ceftobiprole, represents the most advanced of a series of novel β-lactams which, for the first time, show promising activity against MRSA [20]. Similarly, novel molecules derived from the tetracycline class such as tigecycline and the aminomethylcycline BAY 73-7388 reveal excellent broad-spectrum activity even against gram-positive pathogens which have developed resistance to tetracyclines via various pump and ribosomal protection mechanisms, including MRSA, VRE, DRSP, and many more [21–24]. Also, for some of the quinolones, e.g., DX-619, some remarkable activity against MRSA and quinolone-resistant gram-positive organisms in general has been reported [25, 26]. Finally, glycopeptides with enhanced killing activity and long-lasting pharmacokinetics in humans deserve mention, as these properties might translate into clinical advantages [20, 27].
It remains to be seen how successful those novel antibiotics derived from already marketed classes will be in the hospital and in the market. However, such compounds clearly demonstrate that almost all pharmaceutical companies, the big ones as well as the smaller, more biotech-like ones, are still willing and able to bring novel derivatives into development if they fulfill a medical need, especially with respect to overcoming resistance traits of clinical importance. In fact, at least some such novel compounds, even though derived from existing classes, deserve to be described as novel classes, as they have overcome several important and previously class-inherent disadvantages. Nevertheless, despite this (still insufficient) progress, it is generally noted with some disappointment that the large investment put into modern target- and genomic-based discoveries has apparently failed to deliver the urgently needed novel antibiotics. This leads us to the next question: What direction will the future search for novel antibiotics take?

Table 23.1 Overview of the most important novel antibiotics in clinical development.

Product        Class            Main segment   Status
Garenoxacin    Quinolone        Community      Phase III
DX-619         Quinolone        Community      Phase I
Doripenem      β-Lactam         Hospital       Phase III
CS-023         β-Lactam         Hospital       Phase II
Ceftobiprole   β-Lactam         Hospital       Phase II
Dalbavancin    Glycopeptide     Hospital       Phase III
Telavancin     Glycopeptide     Hospital       Phase II
Tigecycline    Glycylcycline    Hospital       Registered
Iclaprim       Trimethoprim     Hospital       Phase II
LBM 415        PDF inhibitor    Community      Phase I

23.2.3 Future Directions

As has already been mentioned in Section 23.2.2, only a very few new antibiotics will reach the clinics by 2010. This is because it takes about 10 years to bring a new drug from early research through development onto the market. In addition, the costs of that endeavor are steadily rising (presently more than US$ 1 billion [1]), and it is becoming increasingly apparent that society is willing to direct large amounts of money into other therapy areas, but is very reluctant to accept high-priced novel antibiotics. For instance, the best-selling drug of today, the lipid-lowering agent Lipitor (atorvastatin calcium), has sales of more than US$ 10 billion annually, whereas the sales of linezolid, the novel antibiotic marketed by the same company, are presently of the order of US$ 250 million.

Because of this dilemma, several organizations are actively trying to reverse the trend that many pharmaceutical companies (with several notable exceptions) have been reducing or abandoning their discovery efforts [3]. A good example is the initiative of the Infectious Diseases Society of America (IDSA) described in their brochure “Bad Bugs, No Drugs – As Antibiotic Discovery Stagnates, a Public Health Crisis Brews” [28]. They call for a commission to prioritize antimicrobial discovery, supplemental intellectual property protections, statutory incentives for antibacterial research and development, accelerated review processes, and other regulatory support, as well as increased funding for antibacterial research and development in all sectors. While such measures are crucial for future social and public health preferences, we will concentrate in the following on the more scientific directions of future antimicrobial drug research, on the assumption that funding and support of research and development by industry, academia, public health authorities, and society as a whole continue. In general, there is good reason to believe that future antibiotic research will be successful [3, 29]:

1. Never before have we had such a deep understanding of bacterial physiology and the pathogenesis of disease at the molecular and genomic level. Current genomics and functional genomics technologies allow unprecedented accuracy in target selection, target validation, and mode of action determination, and are supported by intelligent screening procedures and highly predictive animal models.

2. These newly available technologies have, indeed, already led to a large series of novel preclinical lead structures [20, 30–41], which are devoid of cross-resistance to other classes due to their unprecedented targets, thereby illustrating the potential for the discovery of novel antibiotic classes.

3. The knowledge base of why previous approaches failed has dramatically increased and is taken into account. This includes experience relating to physicochemical properties, toxicological hurdles, kinetic peculiarities, drug interactions, microbial spectrum problems of certain targets, spontaneous resistance frequencies associated with certain targets, and so forth.

4. The large number of underexploited natural compound classes with drug-like properties are increasingly used as starting points for novel antibiotics. The progress achieved in total synthesis capabilities, biocombinatorial technologies, and enzymatic structure manipulation greatly facilitates natural product derivatizations.

5. Ultrarapid diagnostic procedures for bacterial infections appear to be becoming feasible in the near future, which may make accurate species diagnosis as well as resistance profiling fast enough to allow the use of extremely narrow-spectrum drugs. Such antibiotics are by their very nature much easier to discover, but would be useless where broad-spectrum coverage is needed, since it is well known that the first appropriate drug choice is crucial for clinical success [42, 43].

6. Alternative approaches, while still in their early days, could in the future add to our anti-infective armory and may become applicable in conjunction with classical antibiotics. These include phage therapy, virulence-based and pathogenicity-based approaches, the use of bacteriolytic enzymes, and immunologically active therapies.

In the next two sections, Sections 23.3 and 23.4, we will discuss in more detail the current status of research with regard to two of these aspects, namely the contributions of genomics to antibacterial drug discovery, and the potential role of alternative, nonclassical approaches.

23.3 Contributions of Genomic Technologies to Antibacterial Research


The availability of complete genome sequences provided novel strategies for antibacterial drug discovery in pharmaceutical companies. The antibiotic classes currently in clinical application inhibit only a limited number of essential cellular functions (the ribosome, RNA polymerase, DNA gyrase, late stages of cell wall synthesis, and folic acid metabolism) [44, 45]. Agents inhibiting essential functions not yet addressed by clinically applied antibiotics would circumvent preexisting resistances. Thus, the pharmaceutical industry started to exploit genomic information on bacteria as early as possible, driven by the expectation of identifying numerous novel target structures (mainly proteins) which could be used in screening for novel inhibitory compounds.

23.3.1 Target Identification and Validation

Comparative genome analysis allows identification of the proteins that are conserved across the medically important pathogens. However, broad conservation of genes does not necessarily mean that each protein carries out a function essential for bacterial growth. Genetic analyses in different species have revealed that the same gene may encode an essential function in one organism but not in another. Such a function can be made dispensable by the presence of biochemical bypasses or additional analogous enzymes, which are structurally unrelated but catalyze the same reaction. Genetic diversity needs to be considered in target evaluation even for different strains of the same species. Differences between S. pneumoniae strains in the essentiality of the broadly conserved methionyl-tRNA synthetase were detected by researchers at GlaxoSmithKline [46]. They discovered and optimized an antibacterial compound class active against this target. However, 40% of the clinical isolates of S. pneumoniae were resistant to this compound class, since these isolates harbor an additional, insensitive methionyl-tRNA synthetase. This finding demonstrates that we have not reached the end of the genomic sequencing era. Continued sequencing of numerous isolates of important pathogens is an essential prerequisite for the study of such variation within a species.

A gene is regarded as essential if the bacterium cannot tolerate its genetic inactivation. For this reason, genome-wide gene inactivation studies represent the starting point for target-based drug discovery. In many studies, genes are disrupted by insertion of transposons. Such mobile genetic elements are inserted randomly into genomes, mediated either by intracellularly expressed transposases or by delivery of transposon–transposase complexes (“transposomes”) to the cell [47–50]. Alternatively, such insertions are achieved by in vitro transposon insertion into purified chromosomes [51] or defined chromosomal areas [52, 53], followed by cellular transformation and genomic recombination. Genetic footprinting using diverse hybridization and polymerase chain reaction (PCR) techniques enables mapping of transposon insertion sites in the genomes. Only in vitro transposition into defined chromosomal fragments, which are usually generated by PCR, reduces this mapping effort [52, 53].

Other genome-wide gene inactivation studies either clone a portion of the targeted gene into a plasmid suicide vector, which is integrated into the genome by single recombination [54, 55], or insert marker sequences into PCR-amplified sequences using crossover PCR techniques before integration into the bacterial genome by double recombination [35]. The most precise way of gene inactivation is base pair position-specific gene deletion without leaving long marker sequences (such as resistance cassettes) in the genome. Methods for this, generally based on PCR techniques, are more elaborate and are mainly reported from the best studied model bacterium, E. coli [56, 57].

In any case, the fact that a gene cannot be inactivated is not final proof of its essentiality; there may be experimental reasons why gene insertion or deletion is impaired. Only conditional mutants, such as temperature-sensitive (ts) mutants and strains with controlled gene expression systems, allow the essential role of a gene to be demonstrated. Today, genes of interest are generally put under the control of regulatable promoter systems. In most cases inducible promoters are used, regulated by arabinose, rhamnose, tetracycline, or lactose (IPTG) in gram-negative species, and by tetracycline, lactose (IPTG), xylose, fucose, or acetamide in gram-positives including mycobacteria (for a review see Ref. [15]). Instead of directly controlling the gene of interest, gene-specific antisense DNA can be cloned together with inducible promoters, enabling genome-wide approaches to conditional gene expression. Forsyth et al. and Ji et al. applied the naturally occurring RNA antisense principle for gene silencing by conditionally expressing random genomic fragments and then screening for fragments whose expression blocks growth of S. aureus [58, 59]. Recently, a tetracycline-dependent repression system has been developed [60]. Unfortunately, an example of the effect of repression of essential bacterial genes has not yet been published.

The genome-wide conditional and nonconditional gene inactivation studies make it possible to estimate how many genes are essential for growth in the different bacterial species (Table 23.2). The number of genes encoding putative broad-spectrum targets is below 200 for major gram-positive/gram-negative or exclusively gram-positive pathogenic bacterial species (Table 23.3). The tests are generally performed in nutrient-rich medium, since one assumes that most of the essential genes identified in such a medium are also required for growth in the host. Only a few genes shown to be essential in vitro have already been checked for their importance for bacterial growth in animal models, such as in the study performed by Ji et al., who used inducible antisense constructs [59]. Whatever growth condition is chosen for conditional mutants (in vitro or in vivo), one has to check thoroughly how the mutant was generated and whether any polar or other side effect that might affect another essential gene besides the one of interest can be excluded.
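The inference behind the transposon-based studies described above can be illustrated with a minimal Python sketch. It assumes that insertion sites have already been mapped to genome coordinates; the function name, the gene names, and all coordinates are hypothetical and serve only as an illustration. Genes without any mapped insertion are reported as candidates only, since, as noted above, failure to inactivate a gene is not final proof of essentiality.

```python
from bisect import bisect_left


def candidate_essential_genes(genes, insertion_sites):
    """Return names of genes that contain no mapped transposon insertion.

    genes: dict mapping gene name -> (start, end), inclusive coordinates.
    insertion_sites: iterable of genome positions with a mapped insertion.
    """
    sites = sorted(insertion_sites)
    untouched = []
    for name, (start, end) in genes.items():
        # Index of the first insertion located at or after the gene start.
        i = bisect_left(sites, start)
        hit = i < len(sites) and sites[i] <= end
        if not hit:
            untouched.append(name)
    return untouched


# Hypothetical toy genome: three genes and four mapped insertions.
genes = {"geneA": (100, 400), "geneB": (500, 900), "geneC": (1000, 1300)}
insertions = [150, 320, 1100, 1250]

print(candidate_essential_genes(genes, insertions))  # ['geneB']
```

In a real study the candidate list would still have to be confirmed with conditional mutants, and insertion density would need to be high enough that a gene free of insertions is unlikely to have been missed by chance.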

Table 23.2 Results of genome-wide gene inactivation studies.

Organism                               Gene inactivation method             Total number   Number of (potentially)
                                                                            of genes       essential genes[a]
Bacillus subtilis                      Plasmid-insertion mutagenesis;       4101           271
                                       conditional mutants; estimations
                                       derived from literature study [102]
Escherichia coli                       Transposon mutagenesis [49]          4279           620
Haemophilus influenzae                 Transposon mutagenesis [52]          1709           256
Helicobacter pylori                    Transposon mutagenesis [51]          1552           344
Mycoplasma genitalium                  Transposon mutagenesis [101]         484            256–350
Staphylococcus aureus                  Antisense RNA expression [58, 59]    2595           150–658
Staphylococcus aureus (strain Newman)  Transposon mutagenesis [50]          2700           450–550
Streptococcus pneumoniae               Plasmid-insertion mutagenesis [55]   2043           113 out of 347 studied genes

[a] (Potentially) essential genes are defined as genes which could not be inactivated. Validation of essentiality using conditional expression systems will reduce the number of essential genes. Essentiality of genes has been studied in vitro in complex medium. Genes validated this way are considered to be probably indispensable in vivo, too. The best validated study has been performed in B. subtilis, so that in this case the number of essential genes seems to be realistic.
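To put the figures of Table 23.2 on a common scale, the share of each genome reported as (potentially) essential can be computed directly from the table. The short Python sketch below does only this arithmetic; the dictionary layout and the function name are illustrative choices, and the S. pneumoniae entry is left out because its count refers to 347 studied genes rather than to the whole genome.

```python
# Values copied from Table 23.2: (total genes, low estimate, high estimate).
table_23_2 = {
    "Bacillus subtilis": (4101, 271, 271),
    "Escherichia coli": (4279, 620, 620),
    "Haemophilus influenzae": (1709, 256, 256),
    "Helicobacter pylori": (1552, 344, 344),
    "Mycoplasma genitalium": (484, 256, 350),
    "Staphylococcus aureus (antisense RNA)": (2595, 150, 658),
    "Staphylococcus aureus (transposon)": (2700, 450, 550),
}


def essential_fraction(total, low, high):
    """Return (low %, high %) of all genes reported as potentially essential."""
    return (round(100 * low / total, 1), round(100 * high / total, 1))


for organism, row in table_23_2.items():
    low_pct, high_pct = essential_fraction(*row)
    if low_pct == high_pct:
        print(f"{organism}: {low_pct}%")
    else:
        print(f"{organism}: {low_pct}-{high_pct}%")
```

On these numbers the best validated data set, B. subtilis, gives roughly 7% of genes, whereas the small genome of M. genitalium gives more than half; the wide ranges for S. aureus reflect the methodological uncertainty pointed out in the footnote.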

Another way to identify novel targets is to look at the molecular mechanisms which bacteriophages apply in order to arrest critical cellular processes. By sequencing 26 S. aureus phages Liu et al. identified 31 novel polypeptide families that inhibited growth upon expression in S. aureus [61]. For one of these polypeptides, they found DnaI as the target by affinity chromatography. DnaI is an essential protein, which is a helicase loader required for primosome assembly during the initiation of DNA replication. Screening for small molecule inhibitors of the phage protein–DnaI interaction using a fluorescence polarization assay even enabled them to identify antibiotic compounds acting via the helicase loader. A phage genomics platform of this kind could be expanded to other species and could provide valuable information as well as screening tools for novel or unexplored targets.


Table 23.3 Number of potential targets for discovery of gram-positive/gram-negative and gram-positive-only problem solvers (enzymes subdivided into enzyme classes).

                                              Gram-positive/   Gram-positive
                                              gram-negative    only
Conserved essential proteins[a]               167              183
Essential functional systems[b]               112              125
Enzymes                                       79               88
  Oxidoreductases                             4                5
  Transferases                                28               32
    Kinases                                   5                7
    Nucleotidyl transferases                  6                8
  Hydrolases                                  9                10
    Proteases/peptidases                      4                4
  Lyases                                      6                7
  Isomerases                                  4                4
  Ligases (incl. aminoacyl-tRNA synthetases)  28               30

[a] Genes (i.e., so-called orthologous genes) are counted which occur in E. coli, P. aeruginosa, S. aureus, S. pneumoniae, S. pyogenes, and E. faecalis and which are described as being probably essential in at least two of the mentioned species. Where data for only one strain were available, the essentiality test results from B. subtilis were also considered. The species spectrum of a gram-positive/gram-negative problem solver drug is mainly represented by the bacteria mentioned above plus E. faecium, of which no complete genome data have been published yet. The species E. coli and P. aeruginosa do not need to be considered when focusing on a gram-positive-only spectrum. It is not yet known whether all of these genes are really essential in all of the listed organisms. Thus, the number of target structures will decrease. One example is the broadly conserved and essential methionyl-tRNA synthetase, which is counted in this table (enzyme class: ligases), but which is nonessential in 40% of clinical isolates of S. pneumoniae (see text). Genome comparison for counting orthologous genes and enzyme classification were performed using the genome analysis software-database system Phylosopher (Genedata, Basel, Switzerland).

[b] For definition of functional systems, complexes of two or more proteins carrying out one enzymatic function were counted as one entity. For instance, the ribosomal subunits were summarized as one functional system, and the four bacterial acetyl-CoA carboxylase subunits represent one enzymatic function. The definitions were made according to common knowledge deduced from the literature.
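The counting rule in footnote [a] can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the Phylosopher workflow: the gene families and their presence/essentiality data are invented, the function name is hypothetical, the fallback to B. subtilis data is omitted, and essentiality is assumed to be evaluated only within the species of the chosen spectrum.

```python
GRAM_POS_NEG = ["E. coli", "P. aeruginosa", "S. aureus",
                "S. pneumoniae", "S. pyogenes", "E. faecalis"]
GRAM_POS_ONLY = ["S. aureus", "S. pneumoniae", "S. pyogenes", "E. faecalis"]


def broad_spectrum_candidates(orthologs, essential, species, min_essential=2):
    """Select gene families conserved in all species and probably essential
    in at least `min_essential` of them.

    orthologs: dict family -> set of species carrying an orthologue.
    essential: dict family -> set of species in which inactivation failed.
    """
    required = set(species)
    selected = []
    for family, present_in in orthologs.items():
        conserved = required <= present_in
        n_essential = len(essential.get(family, set()) & required)
        if conserved and n_essential >= min_essential:
            selected.append(family)
    return sorted(selected)


# Hypothetical toy data for three gene families.
orthologs = {
    "fam1": set(GRAM_POS_NEG),
    "fam2": set(GRAM_POS_ONLY),
    "fam3": set(GRAM_POS_NEG),
}
essential = {
    "fam1": {"E. coli", "S. aureus", "S. pneumoniae"},
    "fam2": {"S. aureus", "S. pneumoniae"},
    "fam3": {"E. coli"},
}

print(broad_spectrum_candidates(orthologs, essential, GRAM_POS_NEG))   # ['fam1']
print(broad_spectrum_candidates(orthologs, essential, GRAM_POS_ONLY))  # ['fam1', 'fam2']
```

As in Table 23.3, dropping E. coli and P. aeruginosa from the required spectrum can only enlarge the conserved set, which is why the gram-positive-only column is the larger one.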


23.3.2 Target Prioritization

Even proteins that are conserved in relevant pathogens and essential for bacterial growth might not be equally suitable as antibacterial targets. For instance, significant dissimilarity of targets to homologous human proteins or, even better, nonexistence of homologues in humans diminishes the danger of target-based toxicity of specific inhibitors in the host. Such a bacterial selectivity criterion, which can easily be evaluated with bioinformatics tools, is by no means sufficient for prioritizing proteins according to their suitability as targets. Considerable experimental effort is necessary to clarify target-related physiological aspects. Conditional mutants represent valuable tools with which to address the relevant questions. Firstly, they are instrumental in measuring the degree of target downregulation required for reaching bacteriostatic or bacteriolytic effects. However, no systematic study of this kind has been performed to date. Secondly, conditional mutants might aid in clarifying whether bypass mutations can readily suppress the defect in an essential gene, which would make the target less interesting for drug screening. For instance, in some organisms such as S. aureus the essential role of the peptide deformylase PDF is suppressed by mutation of the formyltransferase Fmt, which makes peptide deformylation superfluous [62]. In other organisms such as S. pneumoniae, Fmt itself proved to be essential, which precludes a suppressor mutation in Fmt [63]. However, cross-species genome-wide target prioritization is still in its infancy, since for many targets information about the cellular consequences of inhibiting their function in different species is insufficient, not to mention that some targets themselves are not yet functionally characterized. The number of under-explored targets increases especially when focusing on functions conserved in narrowed spectra of bacteria.
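The bioinformatic selectivity criterion mentioned above can be sketched as a simple filter. The following Python snippet assumes that the percent sequence identity of each candidate to its closest human homologue has already been determined with some alignment tool; the target names, the identity values, and the 30% cutoff are all hypothetical, and, as stressed in the text, passing such a filter is only a first step in prioritization.

```python
def prioritize_by_selectivity(human_identity, max_identity=30.0):
    """Order candidate targets by dissimilarity to human proteins.

    human_identity: dict target -> percent identity to the closest human
        homologue, or None if no human homologue was found.
    Targets without a human homologue come first, followed by targets at or
    below `max_identity` in order of increasing identity; all others are dropped.
    """
    no_homologue = sorted(t for t, v in human_identity.items() if v is None)
    dissimilar = sorted(
        (t for t, v in human_identity.items()
         if v is not None and v <= max_identity),
        key=lambda t: human_identity[t],
    )
    return no_homologue + dissimilar


# Hypothetical identity values for four candidate targets.
candidates = {"targetA": None, "targetB": 22.5, "targetC": 61.0, "targetD": 18.0}

print(prioritize_by_selectivity(candidates))  # ['targetA', 'targetD', 'targetB']
```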
Growth studies with various supplements, cytological evaluation, metabolic labeling experiments, and transcriptome and proteome analyses with conditional mutants under semipermissive conditions might aid in the functional characterization of the respective genes. 23.3.3 Genetic Tools for Drug Screening and Mode-of-Action Determination

The genomics era has enabled the generation of numerous genetically modified bacteria which not only serve as valuable tools by which to functionally evaluate the pool of targets which might be addressed by novel antibiotics, but also offer novel cellular mechanism-based screening opportunities to complement the protein-based screening approaches in drug discovery. They have also become indispensable cellular tools for verifying the target-mediated antibiotic activities of screening hits and for mode-of-action characterization of unexplored agents that come from traditional whole-cell screening for antibacterial activities. The applications of genetically modified bacteria for antibiotic drug discovery are summarized below.

23 The Search for New Antibiotics

1. Resistance determination. A common way of defining the mode of action of antimicrobials is the generation and characterization of resistant mutants, since point mutations in a target gene can confer resistance. However, traditional methods of mutation mapping after selection of random mutants are too time-consuming for characterizing compound collections. Belanger et al. developed a rapid PCR-based method to map resistance genes in the naturally competent bacterium S. pneumoniae [64]. They generated an ordered library of overlapping amplicons (4 kbp each) carrying random mutations introduced by error-prone PCR. Some of the amplicons contain mutations in drug-resistance genes including the drug target. Pools of defined amplicons are used to transform S. pneumoniae in order to introduce mutations into defined chromosomal regions. In this way, individual amplicons that confer high frequencies of resistance can easily be identified. Using this method, not only possible target mutations, but also other resistance mechanisms such as efflux, altered gene regulation, and bypass mutations may be found. However, target-related resistance mutations against compounds that work on more than one target may not be identifiable, which could also be true for gene expression assay systems measuring hyper- or hyposusceptibility to antimicrobial compounds as described below.

2. Underexpression systems. Conditional mutants with reduced target gene expression are generally more sensitive to distinct target-specific antibiotics. Comparing the relative growth inhibition of such strains to the wild type provides a simple screen for identifying the target specificity of antimicrobials, such as published by DeVito et al., who used arabinose-regulable expression systems [65], and Forsyth et al. using antisense RNA [58].
Instead of using individual mutant/wild-type strain pairs, conditional mutants with regulable antisense RNA constructs can also be pooled to concomitantly identify different drug targets of novel antimicrobials. A proof-of-concept study revealed that within a pool of 78 diverse antisense-RNA-expressing strains, the ones with increased susceptibility to known antibiotics could be selectively detected via DNA dot blot or microarray hybridization [40]. Such sensitization assays on a miniaturized high-throughput scale remain challenging, since the degree of increased susceptibility is dependent on the extent of target gene repression and growth phase. These parameters might vary for different target genes. In addition, certain types of inhibition might not be detected by such assays, e.g., formation of toxic enzyme-compound complexes (as in the case of the quinolones targeting topoisomerases II and IV).

3. Overexpression systems. In contrast to the hypersusceptibility tests described above, desensitization assays represent another tool for target identification. For instance, Huang et al. generated an expression library of 2300 unique open reading frames (ORFs) in S. aureus. Overexpression of these ORFs led to reduced antibiotic susceptibility and enabled identification of targets for antimicrobials and of resistance mechanisms such as the multidrug resistance efflux protein MdeA [66]. Such tests might be helpful in mechanistic analysis for novel antimicrobials. However, the low sensitivity of overexpression assays and the high amounts of compound needed for testing make such tests unsuitable for screening approaches.

4. Promoter induction assays. Promoter induction assays represent an attractive complement for cell-based mechanistic characterization of antimicrobials. Some publications describe promoters coupled to reporter genes in order to measure their specific response to certain types of antibiotic stress [67–71]. Compendia of antibiotics-triggered expression profiles now make it possible to identify regulatory networks responsive to antibiotics and to select and functionally characterize the promoters most appropriate for such purposes. Coregulated genes and operons are generally controlled by the same transcriptional regulator, so it is likely that they share common regulatory elements. The combination of DNA sequence-motif detection algorithms and expression-based correlation analyses allows systematic prediction of promoters controlling specific bacterial stress responses. A genome-wide, systematic approach based on a transcriptome data compendium has recently been reported. Fischer et al.
described the identification and high-throughput screening application of FapR regulator-dependent promoters that selectively and strongly respond to inhibitors of fatty acid biosynthesis in B. subtilis [72]. Concomitantly, antibiotics-triggered expression profiles enabled identification of a panel of novel B. subtilis reporter strains indicative of various modes of action [73, 74]. Whole-cell-based reporter assays have limitations especially due to the limited concentration window in which compounds may be detected as inducing agents. In addition, some effort needs to be invested afterwards to identify the precise target of an inducing agent. Nevertheless, such assays are elegant tools for the detection of bioactive compounds that interfere with specific pathways. The application of diverse promoter induction assays represents one


outcome of expression profiling. However, holistic genomics technologies such as transcriptome- and proteome-based expression analysis offer many more opportunities in antibiotic drug discovery.

23.3.4 Genome-Wide Expression Profiling for Mode-of-Action Characterization

Transcriptome- and proteome-based expression profiling techniques are applied for the identification of modes of action of novel antibiotics, since expression profiles reflect the bacterial defense mechanisms against antibiotic stress. In order to deduce characteristic expression signatures for antibiotics, comparative global expression analyses need to be performed on the basis of a large collection of diverse expression profiles that represent different cellular stress states. Such so-called reference compendia of expression profiles are suitable for comparison with profiles of novel antibacterial agents with unknown function [75]. Bandow et al. investigated the effects of 30 antibacterial compounds on the B. subtilis proteome [17], while Freiberg et al. and Hutter et al. studied the genome-wide transcriptional response of B. subtilis to 14 and 37 antibiotics, respectively [73, 76]. As a proof of principle, the mechanism of action of the novel natural-product-derived pyrimidinone antibiotic BAY 50-2369 was correctly predicted as peptidyltransferase inhibition by proteome analysis. An additional example of successful mode-of-action identification is provided by the novel class of phenylthiazolylurea sulfonamides, which originate from a lead optimization program on a screening hit from a biochemical target assay. Compendia of proteome and transcriptome profiles revealed that these compounds triggered the increased expression of the direct target phenylalanyl-tRNA synthetase as well as the stringent response [33, 76]. An essential prerequisite for these findings is that antibiotics with closely related modes of action are members of the reference data set. The identification of completely novel mechanisms may remain difficult, but can be facilitated by inclusion of profiles derived from mutants conditionally expressing antibacterial targets. The equivalence of conditional mutant profiles and antibiotics-triggered profiles was demonstrated with a B.
subtilis mutant downregulating the peptide deformylase. The proteome profile of this mutant correlated well with the profile of the wild type treated with the deformylase inhibitor actinonin [17]. An example of prediction of a new antibiotic mode of action using conditional mutant profiles is provided by the recently published mechanistic characterization of the natural product antibiotic moiramide B. The study was based on a reference compendium of transcriptome profiles of antibiotic-stressed B. subtilis cells as well as of diverse strains with downregulated essential genes. Moiramide B was predicted to be the first antibiotic targeting the bacterial acetyl-CoA carboxylase (ACC), since it triggered a transcriptional response strongly resembling the one of a mutant downregulating subunits of the ACC [76]. To be sure, such predictions need to be verified by additional biochemical and genetic tests such as performed for moiramide B [72].
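
The comparison of a new compound's profile with a reference compendium can be illustrated with a minimal sketch. It assumes that each profile is a vector of log expression ratios over the same ordered set of genes and that similarity is scored by Pearson correlation; the cited studies may well have used other similarity measures. The function names, the compendium labels, and all numbers below are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vectors of log ratios."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 genes)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A flat profile has no variance; treat it as uncorrelated.
    return numerator / denominator if denominator else 0.0


def rank_reference_profiles(query, compendium):
    """Return (label, correlation) pairs, most similar reference first."""
    scores = [(label, pearson(query, profile))
              for label, profile in compendium.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy compendium: log2 expression ratios for five marker genes.
# The values are invented for illustration, not taken from any study.
compendium = {
    "fatty acid synthesis inhibitor": [2.1, 1.8, -0.2, 0.1, -1.5],
    "protein synthesis inhibitor": [-0.3, 0.2, 2.4, 1.9, 0.0],
    "ACC-downregulating mutant": [1.9, 2.0, 0.1, -0.1, -1.2],
}

# Hypothetical profile of an uncharacterized compound.
query = [2.0, 1.7, 0.0, 0.0, -1.4]

ranking = rank_reference_profiles(query, compendium)
for label, r in ranking:
    print(f"{label}: r = {r:.2f}")
```

In this toy setting the query correlates strongly with the two fatty-acid-related reference profiles and negatively with the protein synthesis profile. As the text emphasizes, such a ranking is only a hypothesis about the mode of action and needs biochemical and genetic confirmation.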

23.3.5 Outlook for Genomic Technologies for Antibiotic Drug Discovery

The elucidation of sequence information for dozens to hundreds of microbial genomes represented the basis for the development of new genetic tests in various bacterial pathogens and the establishment of genomics technologies such as transcriptome and proteome analysis. Substantial technical progress in gene chip production, two-dimensional gel electrophoresis, and mass spectrometry greatly improved the reproducibility and throughput of transcriptome and proteome analysis. Equally important, software tools were developed and are constantly being improved to deal efficiently with the large data sets generated by genomics techniques. Although the first decade of genomics is over, the application of genomics techniques in antibiotic drug discovery is still in its early days. Time was needed to develop novel genetic tools and to generate reference compendia of genome-wide expression profiles which open up new avenues for target validation, whole-cell drug screening, and mode-of-action characterization of unexplored antimicrobials [75]. Some examples of successful application of genomics technologies, as described above, raise the hope that genomics-based technologies will significantly improve the success rate of antibiotic drug discovery in the near future. The genomics technologies in combination with the physiological knowledge about pathogens, which has increased enormously in the genomics era, will make it possible to select from the pool of compounds with antibacterial activity those leads with a desired and promising biological profile more precisely than ever before. This will reduce the target-based attrition rates in later, more costly, stages of development.

23.4 Alternative Approaches in Antibacterial Drug Discovery

All aspects discussed so far have centered mostly around classical broad-spectrum antibiotics aimed at in vitro essential targets. One simple reason for this is the outstanding success of this paradigm in the past together with the fact that only a small fraction of such targets has so far been exploited by marketed antibiotics, and that distinct classes of antibiotics can even interfere with the same target. Thus, in principle, there should be ample room for repeating that success story. However, the apparently slow pace at which novel antibiotics have been discovered and successfully developed in recent years, and the desire to at least partially circumvent the resistance development inherent in the classical approaches, have led many scientists, especially in the academic world, to come up with alternative strategies. These include:

1. Nonantibiotic strategies aiming to reverse the resistance against existing antibiotic classes
2. Extremely narrow-spectrum drugs
3. Phage therapies and other nonantibiotic bacteriolytic approaches
4. Strategies to reduce virulence and/or influence pathogenesis
5. Immunology-based approaches

We will review the general status of these alternative strategies in brief 1), as almost all of them are still at an early experimental stage and, with the exception of resistance breaker combination strategies, no clinical proof of concept is available yet.

23.4.1 Targeting the Resistance Mechanism

Instead of searching for a novel antibiotic free of cross-resistance, an alternative strategy would be to combine an existing drug with a compound that overcomes or at least reduces the resistance against that particular antibiotic. This approach is especially appealing because it would restore the power of a previously valuable class, and because it has already been clinically validated by the very successful combination of β-lactams with β-lactamase inhibitors capable of protecting the active β-lactam from its destruction by the enzyme [80]. To pursue this approach, the mode of action and, especially, the molecular nature of the resistance mechanism(s) have to be carefully considered. There are examples where a combination is not even needed, but structural modification of the antibiotic itself is sufficient to protect it from certain resistance mechanisms (e.g., penicillinase-resistant β-lactams such as methicillin, or overcoming inducible MLSB resistance by ketolides, etc.). A similar strategy aims at overcoming clinical resistance by synthesizing a drug derivative with increased potency. Although in this case the resistant bacterial isolates still show higher MIC values than susceptible strains, the power of the more active congener may be sufficient to achieve clinical success. For instance, the potency of the experimental quinolone DX-619 [26] against MRSA might be sufficient also to treat such strains as are resistant to other members of the quinolone class. Also, novel β-lactams like ceftobiprole [20] display exceptionally high affinity for the additional penicillin binding protein PBP 2a. An interesting variation of this approach was provided by the detection of the so-called fem factors (factor essential for methicillin resistance) in staphylococci and many other gram-positive bacteria [81].
These genes, usually involved in certain steps of gram-positive cell wall synthesis [82], were originally detected by transposon mutagenesis experiments aiming to select for the loss of methicillin resistance in staphylococci without interfering with the resistance determinant (PBP 2a) itself. While some of these fem factors such as femX proved to be essential for bacterial growth and thus constitute a valid classical target per se [83], the most interesting finding was that inactivation of some fem factors such as femA led not only to complete loss of methicillin resistance, but even to hypersusceptibility. In addition, it was demonstrated that such strains were also much more susceptible to a wide variety of unrelated drugs including antibiotics with a mode of action completely unrelated to cell-wall synthesis. The later finding that such genes were present not only in staphylococci but also in most other clinically relevant gram-positive bacteria rendered this approach even more attractive [84]. However, to date no useful inhibitor of fem factors has been described.

Another strategy to be mentioned here is based on the fact that besides very compound-specific resistance mechanisms, bacteria have developed pump mechanisms as very general tools to export a wide variety of compounds including almost all antibiotic classes. In fact, overexpression of such pumps is an important resistance mechanism in many pathogens. Thus, in principle, blocking of such pumps would render bacteria much more susceptible towards antibiotics. An additional advantage of this approach is that spontaneous resistance frequencies have been shown to be much smaller if the relevant pumps are deleted. One of the major difficulties, however, is the large number of different pumps usually present in a single species. For example, at least five RND-type pumps can influence sensitivity to many antibiotics in P. aeruginosa [85]. In addition, a single bacterium very often harbors several mechanistically different pump systems, and pump variety is even wider among distinct species. In spite of these obstacles, however, the first examples of pump inhibitors with potential against gram-negative bacteria such as P. aeruginosa have been described [86], and it will be interesting to follow their fate in the future.

In spite of the attractiveness of such approaches, it should be mentioned that, as a rule, a combination therapy of antibiotic plus resistance breaker compound would be required. Therefore, a pharmacokinetic fit between those two components is another prerequisite, and is sometimes difficult to achieve. In addition, additive toxicities of the two components are of concern.

1) With the exception of immunology-based strategies, because this field is too broad to cover within the limits of this chapter. For recent reviews, see, for example, Refs. [77–79].

23.4.2 Extremely Narrow-Spectrum Drugs

The often dramatic course of life-threatening infections, the lack of ultrarapid diagnostic methods to detect the causative agent(s), and, especially, the fact that the appropriateness of the first therapeutic agent chosen is decisive for the further course of the disease [42, 43] have led to a general need for very broad-spectrum antibiotics capable of attacking all important pathogens which might be causative of a certain infectious condition. An obvious drawback of such a broad-spectrum approach is the concomitant unavoidable attack on bacteria that are not responsible for the disease but are beneficial and important for our well-being. It should be kept in mind that most bacteria are, in fact, essential for us, and that we carry about a kilogram of such bacteria within our body (mostly in the gut). Furthermore, a broad-spectrum attack of this kind carries the risk of resistance development not only among the targeted pathogen(s) but also among many other species. However, at present, the market potential of narrow-spectrum drugs, which could be applied only after identification of the causative pathogen, is too small to


warrant the investment necessary to develop them. Even for the most prevalent pathogens the cost of development would be clearly higher than the potential sales, given the present expectations for the price of antibiotics. A good example of this dilemma is provided by the so-called pyrimidinones [87]. This class of antibiotics stems from a natural compound called TAN 1057, which was too toxic to be clinically useful and had a very narrow spectrum of gram-positive activity. Following a total synthesis approach, this class was optimized, resulting in a toxicologically safe derivative with outstanding, extremely bactericidal activity against two of the most pressing resistance problems in today's hospitals, MRSA and VRE. However, all attempts to expand its spectrum to include at least all other relevant gram-positive pathogens have failed so far, and the market potential of such a reserve drug was too low to warrant development. Thus, we cannot expect a narrow-spectrum approach to become reality until either ultrarapid diagnostic methods are introduced as routine tools in clinical practice or the pricing of such special drugs reflects the costs of their development.

23.4.3 Phage Therapies and other Bacteriolytic Approaches

The idea of using bacteriophages as weapons to combat infectious diseases is older than the application of low-molecular-weight antibiotics. In fact, there are a large number of mainly anecdotal reports of the successful use of bacteriophages, especially in Eastern Europe and under military conditions [61]. With the advent of the modern antibiotic era, interest in that approach faded, and, thus, we do not now have access to clinically valid, controlled data that meet today's standards. The increasing threat of rising antibiotic resistance has led to a renewed interest in phage therapy itself as well as in therapeutic approaches derived from the knowledge of modern phage biology. In principle, the concept appears appealing, because phages can be viewed as a self-adapting therapy in that phages could replicate as long as target bacteria are present and would stop doing so once the infection is cured. However, several problems remain so far unresolved, such as the high selectivity of a phage for a single species or even single strains, the high rate of resistance mutations (about three orders of magnitude higher than for conventional antibiotics), and concerns about safety and immunogenicity as well as cost, to mention just a few. A lively discussion about the pros and cons together with conflicting statements about the chances of resolving these open questions can be found in two recent detailed statements published in Nature Biotechnology in 2004 [88, 89]. So long as further experimental data are lacking, especially animal model results and eventually clinical trials, the concept must be regarded as speculative. Inspired by the phage concept, other groups have tried to use phage products involved in the killing of bacteria or even bacteriolytic enzymes not derived from phages as therapeutics per se, thereby avoiding the complexity of a whole phage therapeutic strategy.
For example, lytic enzymes encoded by bacteriophages have been used to kill bacteria with excellent efficacy [90], and other bacteriolytic

enzymes of different origin have also been investigated. One of the most advanced projects involves lysostaphin, an enzyme capable of hydrolyzing the interpeptide bridge of staphylococcal cell walls and thus leading to rapid lysis of the pathogen. A special formulation of this enzyme is presently undergoing clinical phase I/II trials for the nasal eradication of S. aureus by Biosynexus [91], and the enzyme has also shown some promising efficacy in staphylococcal biofilm eradication as well as in an endocarditis model, one of the most challenging animal infection models [92]. However, some questions remain to be solved before systemic application can be envisaged; as for phage therapy itself, these are linked to safety and immunogenicity, narrow spectrum range, and resistance development.

23.4.4 Strategies for Reducing Virulence and/or Influencing Pathogenesis

The general progress in bacterial physiology combined with the rapidly increasing knowledge of the molecular biology of infection and bacterial genomics has given an enormous boost to our understanding of the bacterial infection process. As a consequence we have gained deeper insight into the factors that enable certain bacteria to induce disease in particular hosts, while others behave as innocent bystanders or are even beneficial for the host. The recent literature and the present volume are full of examples of the diverse molecular factors expressed by various pathogens which are required to establish an infection, to promote its further progress, and to cause later sequelae. Based on this almost revolutionary gain in knowledge, and fuelled by the threat of recent resistance developments, the concept of disarming a pathogen of its disease-inducing, disease-promoting, and/or disease-worsening factors [93] might be a viable alternative to classical antibiotics, which aim at killing or at least suppressing the growth of the entire pathogen. Of course, as for most of the alternative strategies discussed before, a drastic improvement in ultrarapid diagnostic measures is important, as bacterial evolution has invented a bewildering range of diverse virulence and pathogenicity factors, which usually differ strongly between different species and even between distinct strains of the same species. Thus, depending on the exact set of such factors, an E. coli strain may behave as a harmless commensal or an enteropathogen, or might be predisposed to cause uroseptic illness. Additionally, many predisposing host factors are usually equally important in determining whether and to what extent a certain disease is induced or how it will progress. The general immune status of the host, in particular, but also other predisposing factors such as organ abnormalities (e.g., in urinary tract infections) have a decisive role to play.
It is also striking that several important hospital pathogens in today's highly developed countries would have been classified as almost nonpathogenic 50 years ago and owe their present role to the immune deficiencies of an ever-aging population and to treatments that drastically impair immune status, such as aggressive surgery, organ transplantation, and cancer therapy. The strategy of targeting virulence and/or pathogenicity factors carries the additional difficulty of


establishing simple in vitro susceptibility tests, comparable to a standard MIC test for classical antibiotics, in order to predict the potential success or failure of the therapy. Thus, because a clinical proof of concept still needs to be demonstrated, the approach has been the subject of both great enthusiasm and strong criticism [94–97]. One often-discussed aspect of targeting virulence and/or pathogenicity factors is the presumed low selection pressure for development of resistance. It should be noted, however, that the experimental basis of this assumption is poor at present, and that, in general terms, mutation frequencies in genes coding for nonessential targets should be much higher than in essential genes, because mutations that interfere with the function of the gene product are also tolerated. In spite of all these open questions, it is important to follow up this approach and to arrive at experimentally proven conclusions to decide whether and under what circumstances the new treatment paradigm would be useful. It goes without saying that in other important areas of medically oriented microbiological research, such as prophylactic and diagnostic approaches, the value of this strategy has already become obvious. An interesting example of the kind of investigations needed to come closer to an answer to these open questions is provided by a recent industrially sponsored study to exploit the type III protein secretion systems (TTPS) of gram-negative bacteria [98]. TTPS was selected as a virulence target because (a) these systems are present and structurally conserved among many clinically relevant gram-negative species including the special problem pathogen P. aeruginosa, (b) they are not present in eukaryotes, and (c) they are expected to be essential for virulence under in vivo conditions, as they translocate a variety of bacterial effector proteins which interfere with eukaryotic signal transduction into host cells. 
Potential inhibitors of TTPS were identified in a whole-cell high-throughput screen measuring the secretion of a reporter protein into the medium [99] and were further optimized by standard medicinal chemistry for specificity of interference with TTPS at low micromolar concentrations as well as low general cytotoxicity. A series of substituted azoles and dipeptides were reported to be especially active against TTPS from P. aeruginosa and Salmonella. As expected, in vitro antibacterial activity of the compounds was minimal or absent, but in a murine model of Pseudomonas sp. lung infection one selected compound showed some initial activity. While the compound alone did not influence the course of the disease under the experimental conditions chosen, a combination of the compound with suboptimal doses of ciprofloxacin resulted in slightly better protection of the animals from death than the same (suboptimal) doses of ciprofloxacin alone [100]. However, it must be mentioned that only the time of death was delayed; the death rate (100%) was not reduced. More such experiments are clearly needed to define the potential therapeutic or prophylactic value of such approaches.

References

1 DiMasi, J.A., R.W. Hansen, and H.G. Grabowski, The price of innovation: new estimates of drug development costs. J Health Econ, 2003. 22(2):151–85.
2 Wenzel, R.P., The antibiotic pipeline – challenges, costs, and values. N Engl J Med, 2004. 351(6):523–526.
3 Thomson, C.J., et al., Antibacterial research and development in the 21(st) Century – an industry perspective of the challenges. Curr Opin Microbiol, 2004. 7(5):445–450.
4 Walsh, F.M. and S.G. Amyes, Microbiology and drug resistance mechanisms of fully resistant pathogens. Curr Opin Microbiol, 2004. 7(5):439–444.
5 Livermore, D.M., Bacterial resistance: origins, epidemiology, and impact. Clin Infect Dis, 2003. 36(Suppl 1):S11–S23.
6 Wright, G.D., Mechanisms of resistance to antibiotics. Curr Opin Chem Biol, 2003. 7(5):563–569.
7 Graefe, U., Biochemie der Antibiotika. 1992, Heidelberg: Spektrum Akademischer Verlag.
8 Abraham, E.P. and E. Chain, An Enzyme from Bacteria Able to Destroy Penicillin. Nature, 1940. 3713:837.
9 Enright, M.C., The evolution of a resistant pathogen – the case of MRSA. Curr Opin Pharmacol, 2003. 3:1–6.
10 Berger-Bächi, B., Resistance mechanisms of gram-positive bacteria. Int J Med Microbiol, 2002. 292(1):27–35.
11 Fleischmann, R.D., et al., Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 1995. 269(5223):496–512.
12 Bandow, J.E., et al., Proteomic approach to understanding antibiotic action. Antimicrob Agents Chemother, 2003. 47(3):948–955.
13 Brotz-Oesterhelt, H., J.E. Bandow, and H. Labischinski, Bacterial proteomics and its role in antibacterial drug discovery. Mass Spectrom Rev, 2005. 24(4):549–565.
14 Freiberg, C., H. Brotz-Oesterhelt, and H. Labischinski, The impact of transcriptome and proteome analyses on antibiotic drug discovery. Curr Opin Microbiol, 2004. 7(5):451–459.
15 Miesel, L., J. Greene, and T.A. Black, Genetic strategies for antibacterial drug discovery. Nat Rev Genet, 2003. 4(6):442–456.
16 DiMasi, J.A., Risks in new drug development: Approval success rates for investigational drugs. Clin Pharmacol Therap, 2001. 69:297–307.
17 Bandow, J.E., et al., The role of peptide deformylase in protein biosynthesis: A proteomic study. Proteomics, 2003. 3(3):299–306.
18 Bush, K., M. Macielag, and M. Weidner-Wells, Taking inventory: antibacterial agents currently at or beyond phase 1. Curr Opin Microbiol, 2004. 7(5):466–476.
19 Bush, K., Antibacterial drug discovery in the 21st century. Clin Microbiol Infect, 2004. 10 Suppl 4:10–17.
20 Abbanat, D., M. Macielag, and K. Bush, Novel antibacterial agents for the treatment of serious gram-positive infections. Expert Opin Investig Drugs, 2003. 12(3):379–399.
21 Zhanel, G.G., et al., The glycylcyclines: a comparative review with the tetracyclines. Drugs, 2004. 64(1):63–88.
22 Macone, A., Donatelli, J., Dumont, T., Weir, S., Levy, S. B., Tanaka, K., Abstract P926: Potent activity of BAY 73-7388, a novel aminomethylcycline, against susceptible and resistant gram-positive and gram-negative organisms. Clin Microbiol Infect, 2004. 10(3):243.
23 Broetz-Oesterhelt, H., Endermann, R., Ladel, C. H., Labischinski, H., Abstract P930: Superior efficacy of BAY 73-7388, a novel aminomethylcycline, compared with linezolid and vancomycin in murine sepsis caused by susceptible or multiresistant staphylococci. Clin Microbiol Infect, 2004. 10(3):244.
24 Postier, R.G., et al., Results of a multicenter, randomized, open-label efficacy and safety study of two doses of tigecycline for complicated skin and skin-structure infections in hospitalized patients. Clin Ther, 2004. 26(5):704–714.
25 Bozdogan, B., Appelbaum, C., Abstract F-1940: Activity of DX-619, a new quinolone, against vancomycin non-susceptible staphylococci. 44th Interscience Conference on Antimicrobial Agents and Chemotherapy, Washington, DC, 2004, 2004:234.
26 Ishida, H., Fujikawa, K., Chiba, M., Tanaka, M., Otani, T., Sao, K., Abstract F1935: DX-619, a novel Des-F(6)-Quinolone-resistant MRSA. 44th Interscience Conference on Antimicrobial Agents and Chemotherapy, Washington, DC, 2004, 2004.
27 Van Bambeke, F., Glycopeptides in clinical development: pharmacological profile and clinical perspectives. Curr Opin Pharmacol, 2004. 4(5):471–478.
28 IDSA, Bad bugs, no drugs. Infectious Diseases Society of America, 2004.
29 Labischinski, H., New antibiotics. Int J Med Microbiol, 2001. 291(5):317–318.
30 Payne, D.J., The potential of bacterial fatty acid biosynthetic enzymes as a source of novel antibacterial agents. Drug News Perspect, 2004. 17(3):187–194.
31 Artsimovitch, I., et al., A new class of bacterial RNA polymerase inhibitor affects nucleotide addition. Science, 2003. 302(5645):650–654.
32 Broetz-Oesterhelt, H., et al., Specific and potent inhibition of NAD+-dependent DNA ligase by pyridochromanones. J Biol Chem, 2003. 278(41):39435–39442.
33 Beyer, D., et al., New class of bacterial phenylalanyl-tRNA synthetase inhibitors with high potency and broad-spectrum activity. Antimicrob Agents Chemother, 2004. 48(2):525–32.
34 Dandliker, P.J., et al., Novel antibacterial class. Antimicrob Agents Chemother, 2003. 47(12):3831–3839.
35 Choudhry, A.E., et al., Inhibitors of pantothenate kinase: novel antibiotics for staphylococcal infections. Antimicrob Agents Chemother, 2003. 47(6):2051–2055.
36 Chen, D., et al., Peptide deformylase inhibitors as antibacterial agents: identification of VRC3375, a proline-3-alkylsuccinyl hydroxamate derivative, by using an integrated combinatorial and medicinal chemistry approach. Antimicrob Agents Chemother, 2004. 48(1):250–261.
37 Gross, M., et al., Pharmacology of novel heteroaromatic polycycle antibacterials. Antimicrob Agents Chemother, 2003. 47(11):3448–3457.

38 Payne, D.J., et al., Discovery of a Novel

39

40

41

42

43

44

45

46

47

and Potent Class of FabI-Directed Antibacterial Agents. Antimicrob Agents Chemother, 2002. 46(10):3118–3124. Butler, M.M., et al., Low Frequencies of Resistance among Staphylococcus and Enterococcus Species to the Bactericidal DNA Polymerase Inhibitor N(3-Hydroxybutyl 6-(3¢-Ethyl-4¢-Methylanilino) Uracil. Antimicrob Agents Chemother, 2002. 46(12):3770–3775. Azoulay-Dupuis, E., J. Mohler, and J.P. Bedos, Efficacy of BB-83698, a novel peptide deformylase inhibitor, in a mouse model of pneumococcal pneumonia. Antimicrob Agents Chemother, 2004. 48(1):80–85. Ruzin, A., et al., Mechanism of action of the mannopeptimycins, a novel class of glycopeptide antibiotics active against vancomycin-resistant gram-positive bacteria. Antimicrob Agents Chemother, 2004. 48(3):728–738. Kang, C.-I., Kim, S.-H., Park, W. B., Lee, K.-D., Kim, H.-B., Kim, E.-C., Oh, M.-D., Choe K.-W, Bloodstream infections caused by antibiotic-resistant gram-negative bacilli: Risk factors for mortality and impact of inappropriate initial antimicrobial therapy on outcome. Antimicrob Agents Chemother, 2005. 49(2):760– 766. Kollef, M., Appropriate empirical antibacterial therapy for nosocomial infections: getting it right the first time. Drugs, 2003. 63(20):2157–2168. Chopra, I., L. Hesse, and A. O’Neill, Exploiting current understanding of antibiotic action for discovery of new drugs. J Appl Microbiol, 2002. 92 Suppl:4S– 15S. Walsh, C., Antibiotics – Actions, origins, resistance. 2003, Washington: ASM Press. Gentry, D.R., et al., Variable sensitivity to bacterial methionyl–tRNA synthetase inhibitors reveals subpopulations of Streptococcus pneumoniae with two distinct methionyl-tRNA synthetase genes. Antimicrob Agents Chemother, 2003. 47(6):1784– 1789. Hutchison, C.A., et al., Global transposon mutagenesis and a minimal mycoplasma

References genome. Science, 1999. 286(5447):2165– 2169. 48 Hare, R.S., et al., Genetic footprinting in bacteria. J Bacteriol, 2001. 183(5):1694– 1706. 49 Gerdes, S.Y., et al., Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J Bacteriol, 2003. 185(19):5673–5684. 50 Bae, T., et al., Staphylococcus aureus virulence genes identified by bursa aurealis mutagenesis and nematode killing. Proc Natl Acad Sci U S A, 2004. 101(33):12312–12317. 51 Salama, N.R., B. Shepherd, and S. Falkow, Global transposon mutagenesis and essential gene analysis of Helicobacter pylori. J Bacteriol, 2004. 186(23):7926– 7935. 52 Akerley, B.J., et al., A genome-scale analysis for identification of genes required for growth or survival of Haemophilus influenzae. Proc Natl Acad Sci U S A, 2002. 99(2):966–971. 53 Kang, Y., et al., Systematic mutagenesis of the Escherichia coli genome. J Bacteriol, 2004. 186(15):4921–4930. 54 Kobayashi, K., et al., Essential Bacillus subtilis genes. Proc Natl Acad Sci U S A, 2003. 100(8):4678–4683. 55 Thanassi, J.A., et al., Identification of 113 conserved essential genes using a highthroughput gene disruption system in Streptococcus pneumoniae. Nucleic Acids Res, 2002. 30(14):3152–3162. 56 Freiberg, C., et al., Identification of novel essential Escherichia coli genes conserved among pathogenic bacteria. J Mol Microbiol Biotechnol, 2001. 3(3):483–489. 57 Arigoni, F., et al., A genome-based approach for the identification of essential bacterial genes. Nat Biotechnol, 1998. 16(9):851–856. 58 Forsyth, R.A., et al., A genome-wide strategy for the identification of essential genes in Staphylococcus aureus. Mol Microbiol, 2002. 43(6):1387–400. 59 Ji, Y., et al., Identification of critical staphylococcal genes using conditional phenotypes generated by antisense RNA. Science, 2001. 293(5538):2266–2269. 60 Kamionka, A., et al., Two mutations in the tetracycline repressor change the indu-

61

62

63

64

65

66

67

68

69

70

71

cer anhydrotetracycline to a corepressor. Nucleic Acids Res, 2004. 32(2):842–847. Liu, J., et al., Antimicrobial drug discovery through bacteriophage genomics. Nat Biotechnol, 2004. 22(2):185–191. Margolis, P.S., et al., Peptide deformylase in Staphylococcus aureus: resistance to inhibition is mediated by mutations in the formyltransferase gene. Antimicrob Agents Chemother, 2000. 44(7):1825– 1831. Margolis, P., et al., Resistance of Streptococcus pneumoniae to deformylase inhibitors is due to mutations in defB. Antimicrob Agents Chemother, 2001. 45(9):2432–2435. Belanger, A.E., et al., PCR-based ordered genomic libraries: a new approach to drug target identification for Streptococcus pneumoniae. Antimicrob Agents Chemother, 2002. 46(8):2507–2512. DeVito, J.A., et al., An array of targetspecific screening strains for antibacterial discovery. Nat Biotechnol, 2002. 20(5):478–483. Huang, J., et al., Novel chromosomally encoded multidrug efflux transporter MdeA in Staphylococcus aureus. Antimicrob Agents Chemother, 2004. 48(3):909–917. Alksne, L.E., et al., Identification and analysis of bacterial protein secretion inhibitors utilizing a SecA-LacZ reporter fusion system. Antimicrob Agents Chemother, 2000. 44(6):1418–1427. Bianchi, A.A. and F. Baneyx, Stress responses as a tool To detect and characterize the mode of action of antibacterial agents. Appl Environ Microbiol, 1999. 65(11):5023–5027. Mascher, T., et al., Cell wall stress responses in Bacillus subtilis: the regulatory network of the bacitracin stimulon. Mol Microbiol, 2003. 50(5):1591–1604. Shapiro, E. and F. Baneyx, Stress-based identification and classification of antibacterial agents: second-generation Escherichia coli reporter strains and optimization of detection. Antimicrob Agents Chemother, 2002. 46(8):2490–2497. Sun, D., et al., A pathway-specific cell based screening system to detect bacterial cell wall inhibitors. J Antibiot (Tokyo), 2002. 55(3):279–287.

529

530

23 The Search for New Antibiotics 72 Fischer, H.P., et al., Identification of anti-

84 Rohrer, S. and B. Berger-Bchi, FemABX

biotic stress-inducible promoters: a systematic approach to novel pathway-specific reporter assays for antibacterial drug discovery. Genome Res, 2004. 14(1):90–98. 73 Hutter, B., et al., Panel of Bacillus subtilis reporter strains indicative of various modes of action. Antimicrob Agents Chemother, 2004. 48(7):2588–2594. 74 Mascher, T., et al., Antibiotic-inducible promoter regulated by the cell envelope stress-sensing two-component system LiaRS of Bacillus subtilis. Antimicrob Agents Chemother, 2004. 48(8):2888–2896. 75 Broetz–Oesterhelt, H., J.E. Bandow, and H. Labischinski, Bacterial proteomics and its role in antibacterial drug discovery. Mass Spectrom Rev. 2005. 24(4):549– 565. 76 Freiberg, C., H.P. Fischer, and N.A. Brunner, Discovering the mechanism of action of novel antibacterial agents through transcriptional profiling of conditional mutants. Antimicrob Agents Chemother, 2005. 49(2):749–759. 77 Ulevitch, R.J., Therapeutics targeting the innate immune system. Nat Rev Immunol, 2004. 4(7):512–520. 78 Aoki, N. and Z. Xing, Use of cytokines in infection. Expert Opin Emerg Drugs, 2004. 9(2):223–236. 79 Bayry, J., et al., Intravenous immunoglobulin for infectious diseases: back to the pre-antibiotic and passive prophylaxis era? Trends Pharmacol Sci, 2004. 25(6):306– 310. 80 Jacobi, G.A. and L.S. Munoz-Price, The new beta-lactamases. N Engl J Med, 2005. 352(4):380–391. 81 Berger-Bchi, B. and S. Rohrer, Factors influencing methicillin resistance in staphylococci. Arch Microbiol, 2002, 178, 165–171. 82 Labischinski, H., K. Ehlert, and B. Berger-Bchi, The targeting of factors necessary for expression of methicillin resistance in staphylococci. J Antimicrob Chemother, 1998. 41:581–584. 83 Rohrer, S., et al., The essential Staphylococcus aureus gene fmhB is involved in the first step of peptidoglycan pentaglycine interpeptide formation. Proc Natl Acad Sci U S A, 1999. 96:9351–9356.

peptidyl transferases: a link between branched-chain cell wall peptide formation and beta-lactam resistance in gram-positive cocci. Antimicrob Agents Chemother, 2003. 47(3):837–846. Lomovskaya, O., et al., Use of a genetic approach to evaluate the consequences of inhibition of efflux pumps in Pseudomonas aeruginosa. Antimicrob Agents Chemother, 1999. 43(6):1340–1346. Lomovskaya, O., et al., Identification and characterization of inhibitors of multidrug resistance efflux pumps in Pseudomonas aeruginosa: novel agents for combination therapy. Antimicrob Agents Chemother, 2001. 45(1):105–116. Brands, M., et al., Novel antibiotics for the treatment of gram-positive bacterial infections. J Med Chem, 2002. 45(19):4246–4253. Projan, S., Phage-inspired antibiotics? Nat Biotechnol, 2004. 22(2):167–168. Schoolnik, G.K., W.C. Summers, and J.D. Watson, Phage offer a real alternative. Nat Biotechnol, 2004. 22(5):505–506. Yoong P., S.R., Nelson D., Fischetti V.A., Identification of a broadly active phage lytic enzyme with lethal activity against antibiotic-resistant Enterococcus faecalis and Enterococcus faecium. J Bacteriol, 2004. 186(14):4808–4812. Kokai–Kun, J.F., et al., Lysostaphin cream eradicates Staphylococcus aureus nasal colonization in a cotton rat model. Antimicrob Agents Chemother, 2003. 47(5):1589–1597. Wu, J.A., et al., Lysostaphin disrupts Staphylococcus aureus and Staphylococcus epidermidis biofilms on artificial surfaces. Antimicrob Agents Chemother, 2003. 47(11):3407–3414. Hacker, J., Heesemann, J., Molecular Infection Biology: Interactions Between Microorganisms and cells. Wiley–Spektrum, Heidelberg, Berlin, 2002. Alksne, L.E. and S.J. Projan, Bacterial virulence as a target for antimicrobial chemotherapy. Curr Opin Biotechnol, 2000. 11(6):625–636. Alksne, L.E., Virulence as a target for antimicrobial chemotherapy. Expert Opin Investig Drugs, 2002. 11(8):1149–1159.

85

86

87

88 89

90

91

92

93

94

95

References 96 Hacker, J. and J.B. Kaper, Pathogenicity

97

98

99

100

101

islands and the evolution of microbes. Annu Rev Microbiol, 2000. 54:641–679. Lee, Y.M., F. Almqvist, and S.J. Hultgren, Targeting virulence for antimicrobial chemotherapy. Curr Opin Pharmacol, 2003. 3(5):513–519. Li, X., Guan, Q., Macielag, M., Murray, W., Fernanadez, J., Montenegro, D., Bush, K., and Goldschmidt, R., Abstract F-711: Synthesis and SAR of inhibitors of bacterial type III protein secretion. 2004. 44th Interscience Conference on Antimicrobial Agents and Chemotherapy, Washington, DC, 2004. Goldschmidt, R.M., Loeloff, M., Fernandez, J., Montenegro, D., Galan, J. E., Macielag, M., and Bush, K., Abstract F-712: Identification and characterization of inhibitors of bacterial type III protein secretion systems (TTPS) as potential antimicrobial agents. 2004. 44th Interscience Conference on Antimicrobial Agents and Chemotherapy, Washington, DC, 2004. Fernanadez, J., Abbanat, D., Bush, K., Hilliard, J., Guan, Q., Li, X., Macielag, M., and Goldschmidt, R. M., Abstract F-710: In vivo efficacy of the bacterial type III protein secretion systems (TTPS) inhibitors JNJ-10275798 and JNJ-10278385. 2004. 44th Interscience Conference on Antimicrobial Agents and Chemotherapy, Washington, DC, 2004. Hutchison, C. A., Peterson, S. N., Gill, S. R., Cline, R. T., White, O., Fraser, C. M., Smith, H. O., and Venter, J. C. Global transposon mutagen-

esis and a minimal mycoplasma genome. Science, 1999. 286:2165–2169. 102 Kobayashi, K., Ehrlich, S. D., Albertini, A., Amati, G., Andersen, K. K., Arnaud, M., Asai, K., Ashikaga, S., Aymerich, S., Bessieres, P., Boland, F., Brignell, S. C., Bron, S., Bunai, K., Chapuis, J., Christiansen, L. C., Danchin, A., Debarbouille, M., Dervyn, E., Deuerling, E., Devine, K., Devine, S. K., Dreesen, O., Errington, J., Fillinger, S., Foster, S. J., Fujita, Y., Galizzi, A., Gardan, R., Eschevins, C., Fukushima, T., Haga, K., Harwood, C. R., Hecker, M., Hosoya, D., Hullo, M. F., Kakeshita, H., Karamata, D., Kasahara, Y., Kawamura, F., Koga, K., Koski, P., Kuwana, R., Imamura, D., Ishimaru, M., Ishikawa, S., Ishio, I., Le Coq, D., Masson, A., Mauel, C., Meima, R., Mellado, R. P., Moir, A., Moriya, S., Nagakawa, E., Nanamiya, H., Nakai, S., Nygaard, P., Ogura, M., Ohanan, T., O’Reilly, M., O’Rourke, M., Pragai, Z., Pooley, H. M., Rapoport, G., Rawlins, J. P., Rivas, L. A., Rivolta, C., Sadaie, A., Sadaie, Y., Sarvas, M., Sato, T., Saxild, H. H., Scanlan, E., Schumann, W., Seegers, J., Sekiguchi, J., Sekowska, A., Seror, S. J., Simon, M., Stragier, P., Studer, R., Takamatsu, H., Tanaka, T., Takeuchi, M., Thomaides, H. B., Vagner, V., van Dijl, J. M., Watabe, K., Wipat, A., Yamamoto, H., Yamamoto, M., Yamamoto, Y., Yamane, K., Yata, K., Yoshida, K., Yoshikawa, H., Zuber, U., and Ogasawara, N. Essential Bacillus subtilis genes. Proc Natl Acad Sci U S A, 2003. 100:4678–4683.

531

533

24 Reverse Vaccinology: Revolutionizing the Approach to Vaccine Design

Laura Serino, Mariagrazia Pizza, and Rino Rappuoli

24.1 Impact of Genomics on Vaccine Design

Despite advances in the treatment of infectious diseases, pathogenic microorganisms remain the single most important threat to health worldwide. Vaccine development has made remarkable progress in the last 200 years, and vaccination prevents illness and death for millions of individuals every year. However, many infectious diseases still lack an efficacious vaccine, and new pathogens continue to emerge. For these reasons, novel vaccines are needed, together with new ways to discover and produce them.

Most of the vaccines currently available are based on killed or live-attenuated microorganisms, toxins detoxified by chemical treatment or site-directed mutagenesis, purified antigens, or polysaccharides or oligosaccharides conjugated to proteins. Knowledge of the pathogenesis of many microorganisms, identification of the main virulence factors, and characterization of the immune response after infection have been fundamental to the design of second-generation vaccines, which are mainly based on highly purified antigenic components [1]. The first important innovation in the vaccine field was the introduction of modern molecular biology and microbiology techniques. This approach generated two efficacious vaccines: the recombinant hepatitis B vaccine, which is based on a highly purified viral surface antigen (HBsAg) [2], and the acellular vaccine against Bordetella pertussis, which is based on three highly purified proteins, including a genetically detoxified toxin [3, 4].

The conventional vaccinology approach requires the pathogen to be grown under laboratory conditions so that individual components can be produced in pure form and in sufficient amounts to be tested for their ability to induce an immune response. This approach has several limitations: it is time-consuming, it cannot be applied to noncultivable pathogens, and in many cases the antigens expressed in vivo during infection are either not produced under laboratory conditions or are variable in sequence.
A second revolution in vaccine design started with the genomic era. The complete genome sequence of a bacterium can be obtained in a brief period of time using the “shotgun sequencing strategy.” By means of this technique, the first complete genome sequence of a free-living organism (Haemophilus influenzae) was obtained in 1995 by The Institute for Genomic Research (TIGR; http://www.tigr.org) [5]. In the last few years the number of available genomes has grown considerably (Fig. 24.1), and the complete genome sequence of a pathogen can now be determined within months and at low cost. To date, 175 bacterial genomes have been completed and published, and nearly 500 other microorganisms are being sequenced (GOLD Genomes OnLine database at http://www.genomesonline.org/). This panel of bacterial genomes already covers most of the pathogens that have a heavy impact on human health and are therefore of interest to vaccine researchers.

As genome sequences become available, it becomes possible to compare related bacteria, pathogens against commensals of the same or related species, and even bacteria with different or similar pathogenic profiles, and thereby to identify putatively disease-related genes (comparative genomics). Bioinformatics is essential for interpreting the immense amount of information contained in whole genome sequences (genomic mining). A variety of software can be used to predict key features of the encoded proteins, such as topology, molecular weight, isoelectric point (pI), and solubility. Moreover, a putative function can be assigned to each open reading frame (ORF) on the basis of homology to known proteins. Sophisticated computer programs are also available to predict the cellular localization of newly identified ORFs, so that potentially surface-exposed proteins can be selected. One of the most interesting applications of the genome analysis of pathogenic bacteria is the screening of the complete set of proteins potentially encoded by a microorganism in search of vaccine candidates, regardless of their abundance or expression conditions.
This new approach has been termed “reverse vaccinology” [6], indicating that, in contrast to conventional vaccinology, the starting point for vaccine design is the in silico analysis of the genome sequence and not the live bacterium (Fig. 24.2). Complementary to in silico antigen discovery are strategies referred to as “functional genomics.” These include the large-scale analysis of gene transcription using DNA microarray technology; the analysis of the whole set of proteins encoded by an organism (proteomics) using two-dimensional gel electrophoresis and mass spectrometry; and comparative genome–proteome technologies. In this chapter we describe how genomic information has been successfully used to identify novel potential vaccine candidates against various human pathogens, and illustrate the application of functional genomics to vaccine research.
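
As a rough illustration of the first step of genome mining described above, the following sketch scans a DNA string for open reading frames on both strands. It is a toy and not the pipeline used in the work cited here: the function names (`reverse_complement`, `find_orfs`), the minimum-length threshold, and the use of ATG as the only start codon are assumptions made for the example. Real gene finders rely on statistical models and accept alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf) tuples for ATG...stop stretches.

    Coordinates are 0-based and refer to the strand that was scanned;
    the stop codon is included in the reported ORF and in min_codons.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    # Hypothetical 16-nt sequence, used only to exercise the function.
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

In a real screen, each predicted ORF would then be translated and passed to localization and homology predictors before any candidate is selected.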

Fig. 24.1 A representative list of available bacterial genomes, showing the rapid increase that has occurred in recent years. The data were obtained from different sources: the TIGR web site (www.tigr.org), the Sanger web site (www.sanger.ac.uk), the NCBI web site (www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html), and the GOLD Genomes OnLine database (www.genomesonline.org).


Fig. 24.2 Application of reverse vaccinology to identify vaccine candidates, showing the different steps of this approach, starting from in silico analysis of the genome (a), to selection of potential vaccine candidates (b), cloning and purification of the selected antigens (c), and analysis of the immune sera in vivo or in vitro for the evaluation of the best candidates to be considered for vaccine development (d).

24.2 MenB Vaccine Approach by Reverse Vaccinology

The first example in which genomic technology was used to identify potential vaccine candidates is the vaccine against the human pathogen Neisseria meningitidis serogroup B (MenB), the major cause of sepsis and meningitis in children and young adults. Over the last forty years, conventional vaccinology approaches have failed to provide an effective and universal vaccine against MenB. For other meningococcal serogroups (A, C, Y, and W135), conjugate vaccines based on the capsular polysaccharide are available, and those based on oligosaccharides are under development. For serogroup B, however, a capsule-based vaccine cannot be used, because the major component of the capsule, α(2→8)-linked N-acetylneuraminic acid, is also a common carbohydrate in human tissues. The MenB capsule is therefore poorly immunogenic and may elicit autoantibodies.

To overcome this obstacle, the new approach named “reverse vaccinology” was applied to MenB [7]. The complete genome of the virulent strain MC58 was sequenced in collaboration with TIGR using the shotgun strategy. The MenB genome consists of 2,272,352 base pairs with an average G+C content of 53%. Eighty-three percent of the genome is coding and comprises 2158 ORFs. Of these, 1158 have a putative biological role assigned on the basis of their similarity to known proteins, whereas the remaining 1000 have no predicted function [8]. On the premise that surface-exposed antigens are more accessible to antibody recognition and are therefore the most suitable vaccine candidates, the full genome was screened with bioinformatics tools to select ORFs coding for putative surface-exposed or secreted proteins. Within 18 months of the start of sequencing, more than 600 potential vaccine candidates had been predicted by computer analysis and classified as secreted or outer membrane proteins (13%), lipoproteins (20%), periplasmic proteins (27%), inner membrane proteins (34%), and proteins with interesting homology (6%). All these ORFs were amplified by PCR and cloned in Escherichia coli in order to express them as N-terminal glutathione-S-transferase (GST) or C-terminal histidine-tag fusions. Three hundred and fifty recombinant proteins were successfully expressed, purified, and used to immunize mice. The antisera obtained were tested on whole bacterial cells by enzyme-linked immunosorbent assay (ELISA) and fluorescence-activated cell sorting (FACS) to evaluate the surface localization of the antigens. In addition, the antisera were tested for bactericidal activity, a property known to correlate with protection in humans.
Ninety-one novel surface-exposed antigens were identified, 29 of which induced a complement-mediated bactericidal antibody response, a strong indication that these proteins can induce protective immunity. One of the main problems in designing a vaccine against MenB is the sequence variability of antigens among different strains. For example, the most abundant antigen of MenB, PorA, is extremely variable and confers protection only against the homologous strain. For this reason, some of the vaccine candidates selected by reverse vaccinology were analyzed for sequence variability using a representative panel of strains. Each gene was amplified by PCR and sequenced, and the sequences were subjected to multiple alignment to determine the level of homology among the different alleles [7, 9, 10]. The conserved antigens were then tested for their ability to induce complement-mediated bacterial killing in a subset of strains representative of the global diversity of the N. meningitidis population. The results demonstrated that the antigens identified by in silico analysis are good candidates for the development of a vaccine against MenB. These promising vaccine candidates are currently under evaluation and have entered development.
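
The conservation check described above (amplify each gene from a panel of strains, align the sequences, and compare the alleles) can be illustrated with a small sketch. It assumes the alleles have already been aligned to equal length by an external alignment tool. The function names, the gap character `-`, and the example sequences are invented for illustration and are not data from the MenB study.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical positions between two aligned sequences.

    Columns where both sequences carry a gap are ignored; a gap
    opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def min_pairwise_identity(alleles):
    """Lowest identity over all allele pairs: a crude conservation score."""
    return min(
        percent_identity(alleles[p], alleles[q])
        for p, q in combinations(alleles, 2)
    )


if __name__ == "__main__":
    # Hypothetical aligned alleles of one antigen from three strains.
    panel = {
        "strainA": "MKTLLA-GV",
        "strainB": "MKTLLAAGV",
        "strainC": "MKSLLA-GV",
    }
    print(f"minimum pairwise identity: {min_pairwise_identity(panel):.1f}%")
```

Taking the minimum over all pairs is a deliberately conservative choice: one divergent strain is enough to lower the score of an otherwise conserved antigen.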


24.3 Following the MenB Experience: Other Pathogens

MenB was the first example in which a genome sequence was used for vaccine discovery. Following its success, several other groups have used the same combination of reverse vaccinology and functional genomics to identify vaccine candidates against other human pathogens. Grandi and coworkers used a similar strategy to discover vaccine candidates against group B streptococcus (GBS) [11]. This bacterium is a major cause of neonatal sepsis in Europe and North America. Early-onset infection is due to acquisition of the bacteria from colonized mothers during delivery. Infections in babies are associated with a lack of maternal immunoglobulin; conversely, if a good antibody response can be induced in women, it can protect their babies from invasive infection. In recent years infections in adults have also been emerging, mostly in the elderly or in patients with underlying diseases. There are nine recognized capsular serotypes, which account for approx. 95% of clinical isolates. High titers of anticapsular maternal antibodies are protective, but the protection is serotype-specific. In addition, many surface-exposed proteins that are able to induce protective immunity are highly variable among isolates. Following the successful reverse vaccinology approach to MenB, we undertook, in collaboration with TIGR, to sequence the complete genome of a serotype V GBS strain and to use the genomic information as the starting point for identifying novel protein antigens for inclusion in a vaccine [11]. Computer-based selection of proteins likely to be secreted or associated with the bacterial surface identified 681 potential ORFs. All proteins predicted to contain five or more transmembrane domains were discarded from further analysis. Of the remaining 473 GBS genes, 359 were successfully cloned and expressed. Each purified recombinant protein was used to immunize mice, and the sera obtained were used to verify the correct localization of the antigens.
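
The selection rule mentioned above (keep proteins predicted to be secreted or surface-associated, then discard those predicted to span the membrane five or more times) is simple to express in code. The sketch below assumes that a topology predictor has already assigned each ORF a localization label and a transmembrane-domain count. The `Orf` record, its field names, the localization labels, and the example entries are all hypothetical.

```python
from dataclasses import dataclass

# Proteins with five or more predicted transmembrane domains are dropped.
MAX_TM_DOMAINS = 4
# Assumed labels produced by some upstream localization predictor.
SURFACE_CLASSES = {"secreted", "surface"}


@dataclass(frozen=True)
class Orf:
    name: str
    localization: str
    tm_domains: int


def select_candidates(orfs):
    """Keep ORFs predicted to be exposed and with at most MAX_TM_DOMAINS."""
    return [
        o for o in orfs
        if o.localization in SURFACE_CLASSES and o.tm_domains <= MAX_TM_DOMAINS
    ]


if __name__ == "__main__":
    predicted = [
        Orf("orf0001", "secreted", 0),
        Orf("orf0002", "surface", 5),      # too many TM domains
        Orf("orf0003", "cytoplasmic", 0),  # not predicted to be exposed
        Orf("orf0004", "surface", 4),
    ]
    for orf in select_candidates(predicted):
        print(orf.name)
```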
In the case of GBS, the animal model used to screen the antigens for their ability to induce immunological protection consisted of passive protection of newborn mice against a lethal dose of a virulent serotype III GBS strain. Of the 296 immune sera tested, 46 gave higher than 25% protection (taken as an arbitrary cutoff) compared to a preimmune control. The same antigens were further evaluated in a second model of active maternal immunization. Female mice were first immunized with the recombinant proteins, and the resulting offspring were then challenged with a lethal dose of GBS within the first 48 h of life. In this model, ten recombinant antigens induced 30–80% protection of the challenged pups. To enhance the level of protection, combinations of the best antigens were also tested; with five of these combinations, protection rose to 95%. These results are encouraging in showing that an efficacious vaccine for use in humans may be developed from the antigens identified in this genome-wide screening.

Similar approaches have since been undertaken for other pathogens, such as group A streptococcus (GAS), Neisseria gonorrhoeae, Chlamydia pneumoniae [12], and Chlamydia trachomatis. Pathogens that have been studied by other groups using the reverse vaccinology approach include Bacillus anthracis [13], Streptococcus pneumoniae [14], Staphylococcus aureus [15, 16], Porphyromonas gingivalis [17], Edwardsiella tarda [18], and Mycobacterium tuberculosis [19].
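
The attrition across the GBS screen can be tallied from the counts quoted in this section. The short sketch below only does the arithmetic. The stage labels are paraphrases, and treating the ten antigens from the active immunization model as a subset of the 46 protective sera is an assumption based on the statement that the same antigens were evaluated in both models.

```python
# Counts as quoted in the text for the GBS screen.
GBS_FUNNEL = [
    ("predicted secreted or surface-associated ORFs", 681),
    ("left after discarding proteins with >= 5 TM domains", 473),
    ("cloned and expressed", 359),
    ("immune sera tested in the passive protection model", 296),
    ("sera giving > 25% protection", 46),
    ("antigens protective after active maternal immunization", 10),
]


def stage_yields(funnel):
    """Return (label, count, percent of previous stage) for stages 2..n."""
    rows = []
    for (_, previous), (label, count) in zip(funnel, funnel[1:]):
        rows.append((label, count, round(100.0 * count / previous, 1)))
    return rows


if __name__ == "__main__":
    for label, count, pct in stage_yields(GBS_FUNNEL):
        print(f"{count:4d}  {pct:5.1f}% of previous stage  {label}")
```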

24.4 Functional Genomics

With the availability of the entire genome sequence of an organism, new disciplines of molecular biology have emerged. Techniques such as in vivo expression technology, signature-tagged mutagenesis, DNA microarrays, and proteomics have the potential to accelerate the identification of protective protein antigens as subunit vaccine targets, as well as the identification of new virulence factors. Moreover, by providing functional correlation of genes and proteins with phenotypes such as the presence and mode of pathogenicity, they offer new perspectives for the production of attenuated mutants that might be used as live vaccines or as delivery systems for heterologous antigens. The applications of some of these technologies in vaccine research are summarized in Fig. 24.3.

Fig. 24.3 Identification of potential vaccine candidates using different functional genomic approaches, i.e., DNA microarray, proteomics, and in vivo gene expression.

24.5 Gene Expression In Vivo: IVET and STM

Growing a pathogenic organism in vitro and studying its gene expression pattern may not tell us which genes are important for causing pathogenicity in a living organism. Genes involved in these processes can only be determined in vivo. A genetic system termed IVET (in vivo expression technology) was developed to identify bacterial genes that are induced when a pathogen infects its host [20]. A subset of these induced genes should include those that encode virulence factors – products specifically required for the infection process. The system is based on complementation of an attenuating auxotrophic mutation by gene fusion, and it is designed to be used in a wide variety of pathogenic organisms. However, alternative systems have been proposed based on reporter genes encoding resistance to antibiotics [21], or encoding the green fluorescent protein (GFP). The IVET system has several applications in the area of vaccine and antimicrobial drug development. The technique was designed for the identification of virulence factors and thus may lead to the discovery of new antigens useful as vaccine components. The IVET system facilitates the isolation of mutations in genes involved in virulence and therefore should aid in the construction of live-attenuated vaccines. In addition, the identification of promoters that are optimally expressed in animal tissues provides a means of establishing in vivo regulated expression of heterologous antigens in live vaccines, an area that has been previously problematic. Finally, this methodology may also be used to uncover many biosynthetic, catabolic, and regulatory genes that are required for growth of microbes in animal tissues. The elucidation of these gene products should provide new targets for antimicrobial drug development. 
Recently, a new version of the IVET method has been proposed, called RIVET (recombination-based in vivo expression technology), which allows the detection of genes that are transiently turned on during adaptation to a new environment [22]. The principle of in vivo induced antigen technology (IVIAT) is vice versa that of identifying those gene products targeted by the host immune system [23]. Although IVET and IVIAT have been both developed and applied to the identification of virulence genes independently of genomics, the genome sequence of the microorganism under study can facilitate their use, by assisting the design of gene libraries intended for the experimental screenings as well as the rapid identification of genes by short fragment sequencing. Another approach in vaccine design, which is also facilitated by genome sequencing, is signature-tagged mutagenesis (STM), developed by David Holden and coworkers [24]. A bacterial pathogen is subjected to random transposon-mediated mutagenesis to identify genes required for in vivo survival. Each mutant

incorporates a specific DNA sequence tag and can be recovered and recognized after infection of the animal. Mutants that fail to be recovered after the infection are likely to be attenuated and therefore altered in virulence or essential genes. The advantage of using this approach is that the technique allows for the identification of attenuated mutants that may be used as live vaccines. Moreover, proteins identified as being essential for infection or disease are likely to be good candidates for inclusion in subunit vaccines. STM has been successfully used to discover virulence genes from a variety of bacterial species including M. tuberculosis [25], S. aureus [26, 27], Salmonella typhimurium [24], Vibrio cholerae [28], Yersinia enterocolitica [29], Streptococcus pneumoniae [30], and N. meningitidis [31]. In the case of N. meningitidis Sun and coworkers [31], combining the use of STM technology with two available genome sequences, identified in an infant rat model 73 genes that are essential for bacteremia, many of them of unknown function. The majority were novel genes, and in particular 16 surface-exposed antigens are currently under investigation as potential vaccine candidates. Other transposon-based approaches for the identification of essential genes required for bacterial growth include genome analysis and mapping by in vitro transposition (GAMBIT) and transposon site hybridization (TraSH). The first approach uses high-density mutagenesis of restricted regions of the genome. Using this technique, Mekalanos and collaborators identified the complete set of genes required by H. influenzae for growth and viability in vitro [32]. Genes essential for growth in Mycoplasma were defined using the same approach [33, 34]. The combined use of transposon mutagenesis with microarray hybridization resulted in the development of TraSH, a method suitable for the identification of bacterial genes that are required for growth under specific conditions [35]. 
TraSH has been applied to identify conditionally essential genes of Mycobacterium bovis BCG, which represent promising targets for rational attenuation.
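The negative-selection logic behind STM can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the published protocol: the tag names, the gene names, and the function `attenuated_candidates` are hypothetical, and real screens compare hybridization signals of tag pools rather than clean lists of tags.

```python
def attenuated_candidates(input_pool, recovered_pool):
    """Return the tags present in the inoculum but not recovered after infection."""
    return sorted(set(input_pool) - set(recovered_pool))


# Hypothetical mapping from signature tag to the gene disrupted by the transposon.
tag_to_gene = {
    "tag01": "geneA",
    "tag02": "geneB",
    "tag03": "geneC",
    "tag04": "geneD",
}

input_pool = list(tag_to_gene)          # all tagged mutants used to infect the animal
recovered_pool = ["tag01", "tag03"]     # tags detected in bacteria recovered from the host

missing_tags = attenuated_candidates(input_pool, recovered_pool)
candidate_genes = [tag_to_gene[tag] for tag in missing_tags]
print(candidate_genes)
```

Mutants whose tags are missing from the recovered pool are only candidates: as noted above, they may be altered in virulence genes or in genes that are essential under the conditions tested.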

24.6 Transcriptome Analysis and Comparative Genomics

The complete set of transcripts of an organism, called the transcriptome, can be studied with a powerful and recently developed genomic tool, the DNA microarray (or DNA chip) [36–39]. DNA microarray technology is particularly attractive in that DNA chips carrying the entire bacterial genome can be easily prepared and can be used for several applications including gene expression profiling, genotyping, and DNA sequencing. The applications of DNA microarray technology in all these biological fields have been extensively described elsewhere in several excellent reviews [40–43]. In this section we describe some of the recent studies that provide strong proof that microarray technology allows examination of the dynamics of a whole biological system by simultaneously interrogating the expression of thousands of genes.
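As an illustration of gene expression profiling, the sketch below classifies genes as upregulated, downregulated, or unchanged from two signal intensities per gene. It is a simplified example under stated assumptions: the twofold threshold, the gene names, the intensity values, and the function `classify_expression` are all hypothetical, and real analyses also involve normalization, replicates, and statistical testing.

```python
import math


def classify_expression(intensities, fold_threshold=2.0):
    """Classify each gene by the log2 ratio of test signal to reference signal.

    intensities maps a gene name to a (reference, test) pair of positive signals.
    """
    cutoff = math.log2(fold_threshold)
    calls = {}
    for gene, (reference, test) in intensities.items():
        log_ratio = math.log2(test / reference)
        if log_ratio >= cutoff:
            calls[gene] = "up"
        elif log_ratio <= -cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical signals: (bacteria grown in vitro, bacteria after host-cell contact).
signals = {
    "geneA": (100.0, 450.0),
    "geneB": (200.0, 60.0),
    "geneC": (150.0, 180.0),
}

calls = classify_expression(signals)
print(calls)
```

Counting the genes in each class gives the kind of summary reported in expression studies, such as the numbers of genes induced or repressed under a given condition.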


24 Reverse Vaccinology: Revolutionizing the Approach to Vaccine Design

Several studies have used DNA microarrays to examine host–pathogen interactions, including the interactions between intestinal epithelial cells and Salmonella [44], human promyelocytic cells and Listeria monocytogenes [45], bronchial epithelial cells and B. pertussis [46], and epithelial cells and Pseudomonas aeruginosa [47], as well as the profile of mouse gastric epithelial cells with and without exposure to Helicobacter pylori [48], HeLa cells infected with C. trachomatis [49], the human macrophage response to infection with M. tuberculosis [50], and the global response of human intestinal epithelial cells to Shigella flexneri invasion [51]. However, these studies consider gene activation from the host perspective. Recently, DNA microarray technology has been applied to study the gene expression profile of N. meningitidis serogroup B during different stages of infection. Grifantini and coworkers [52] demonstrated how this technology could also be exploited for vaccine design. The authors found that bacterial adhesion to epithelial cells altered the expression of approximately 350 genes: 189 genes were upregulated and 151 genes were downregulated, while 7 were either up- or downregulated depending on the time point of infection. Moreover, of the 12 adhesion-induced surface-exposed antigens identified, five were able to induce bactericidal antibodies. These novel MenB antigens had not been identified by the reverse vaccinology approach. This study shows that DNA microarray technology is able to complement other genome mining methods, such as reverse vaccinology, and accelerate the identification and development of a vaccine. Using the same technology, the same group has also been able to identify the iron-activated and iron-repressed genes of N. meningitidis B and define the Fur regulon of this human pathogen [53]. In an independent study, the transcriptional changes of N. meningitidis were investigated in a model system of three key steps of meningococcal infection.
RNA was isolated from meningococci incubated in human serum as well as from meningococci adherent to human epithelial and endothelial cells. The authors discovered that a wide range of surface proteins are induced under in vivo conditions. These antigens could represent novel candidates for a protein-based vaccine for meningococcal diseases [54]. Understanding the mechanism of protection of a vaccine provides important information for the development of new-generation vaccines. Recently, Falkow and collaborators used gene expression profiling and immunohistochemical analysis to elucidate the mechanism of protection of a whole-cell sonicate vaccine of Helicobacter felis in mice [55]. This approach, applied to other immunization strategies, will help us to better understand the mechanism of protection of several vaccine formulations. Virulence gene expression can also be monitored by growing the pathogens in the appropriate in vivo models (cell cultures and/or animals); after recovery of the bacteria for RNA preparation and labeling, gene activity is analyzed and compared with the expression of the same genes under in vitro conditions. Recently, some pioneering work showed how microarray expression profiling could be used to discover new drugs and their mode of action. Wilson and coworkers used a DNA microarray containing 97% of the ORFs of the M. tuberculosis genome


to examine changes in gene expression in response to the antituberculous drug isoniazid. They reported that isoniazid treatment of mid-log phase bacterial cultures induced several genes that encode proteins physiologically relevant to the drug’s mode of action. Other genes were induced and probably mediate processes that are linked to the toxic consequences of the drug [56]. This study illustrates how gene expression analysis can contribute to the drug discovery process: exposure of a microorganism to a drug or compound of unknown mode of action should elicit an expression profile that incriminates the affected pathway and even the target in the pathway [40]. The DNA microarray is also a powerful technology for investigating genome diversity and relatedness from a comparative genomics perspective [57]. The systematic comparison of genomic sequences from different microorganisms represents a central focus of contemporary genome analysis and has provided many new concepts in bacterial pathogenesis. The availability of the different genomes allows comparative analysis of related bacteria, pathogens versus commensals, and even of bacteria with similar pathogenic profiles that occupy different host niches [58, 59]. Nassif and coworkers [60] have applied comparative genomics to N. meningitidis. They used DNA arrays to compare the genome of N. meningitidis with those of N. gonorrhoeae, which colonizes a different host niche, and Neisseria lactamica, a commensal of the nasopharynx. They identified genes either specific for meningococcus or shared with gonococcus, but absent in N. lactamica. However, unlike the situation in many other pathogens, these meningococcal-specific genes are not organized in large chromosomal islands. Taking a different approach, Grifantini et al. [61] used microarrays to compare differential gene expression in N. meningitidis and N. lactamica after bacterial interaction with human epithelial cells.
They found that different subsets of genes were activated by host-cell contact in the pathogen and in the commensal species. The technique of comparative genomic hybridization (CGH) circumvents the need for sequencing of multiple closely related genomes. Briefly, DNA microarrays containing the genome of one strain can be hybridized against total genomic DNA from different strains for which sequence data are not available, permitting the identification of genes present in one strain and absent in another. Recently, using this technique [11] we were able to compare the genomes of 22 strains of Streptococcus agalactiae (GBS) of different serotypes with the genome of the only sequenced strain of serotype V. This analysis revealed a number of highly variable regions in the genome but, most importantly, genes common to all genomes and therefore possibly the best candidates for a vaccine able to induce a cross-protective response. The same technique has also been applied to compare the genome of Bacillus anthracis with 19 strains of related Bacillus species, such as B. thuringiensis and B. cereus [62]. Previously, a comparative genome analysis between B. anthracis and other nonpathogenic Bacillus species had been restricted to the virulence plasmid pXO1 for the identification of potential vaccine candidates [13]. Owing to its intrinsic technical limitations, CGH analysis necessarily fails to detect acquisition events with respect to the reference strains. However, the increasing number of CGH studies directed to detecting gene presence/

absence profiles in bacterial populations [57, 59] permits a more accurate evaluation of the genetic stability of a great number of bacterial pathogens. To analyze genome plasticity in pathogenic and commensal E. coli isolates, Hacker and coworkers made use of a whole-genome approach. Using DNA microarrays, the presence of all translatable ORFs of nonpathogenic E. coli K-12 was investigated in 26 extraintestinal (ExPEC) and intestinal pathogenic E. coli (IPEC) isolates, three pathogenicity island deletion mutants, and commensal and laboratory strains. In addition, they developed an “E. coli pathoarray,” which consists of hundreds of probes specific for virulence-associated genes of ExPEC, IPEC, and Shigella, in order to evaluate the distribution of these genes among the pathogenic and commensal strains used [63]. In recent years, there have been enormous advances in DNA microarray technology and a remarkable amount of literature published supporting its central role in gene discovery and vaccine and drug development (Table 24.1). However, because the results of pathogen gene expression are influenced by the model system used, such results must be interpreted cautiously. Because of these considerations, traditional biological, pathology, and toxicity studies remain important.

Table 24.1 Examples of application of functional genomic approaches to bacterial pathogens.

Bacterium                     Approaches                         References
Bacillus anthracis            Reverse vaccinology                [13]
                              Comparative genome analysis        [62]
                              Serological proteome analysis      [71]
                              Microarray                         [72]
Bordetella pertussis          Microarray[a]                      [46]
Brucella spp.                 Comparative genome analysis        [73]
Campylobacter jejuni          Comparative genome analysis        [74]
Chlamydia pneumoniae          Proteomics/reverse vaccinology     [12]
Chlamydia trachomatis         Microarray[a]                      [49]
                              Microarray                         [75]
Escherichia coli              Comparative genome analysis        [63]
                              Microarray                         [76, 77]
Haemophilus influenzae        Proteomics                         [68, 69]
Helicobacter felis            Microarray                         [55]
Helicobacter pylori           Microarray[a]                      [48]
                              Microarray                         [78]
                              Comparative genome analysis        [79]
                              Comparative proteome analysis      [80, 81]
                              Serological proteome analysis      [82, 83]
Listeria monocytogenes        Microarray[a]                      [45]
                              Comparative genome analysis        [84, 85]
Mycobacterium tuberculosis    Microarray[a]                      [50]
                              Microarray                         [56, 86]
                              STM                                [25]
                              Comparative genome analysis        [87]
                              Comparative proteome analysis      [88, 89]
Mycoplasma pulmonis           DNA vaccination                    [90]
Neisseria meningitidis        Reverse vaccinology                [7]
                              Microarray                         [52–54, 61]
                              STM                                [31]
                              Whole-genome expression library    [91]
                              Comparative genome analysis        [60]
                              Proteomics                         [92]
Porphyromonas gingivalis      Reverse vaccinology                [17]
Pseudomonas aeruginosa        Microarray[a]                      [47]
                              Microarray                         [93]
                              IVET                               [94, 95]
Salmonella typhimurium        Microarray[a]                      [44]
                              STM                                [24]
                              IVET                               [21]
                              DFI                                [96, 97]
Shigella flexneri             Microarray[a]                      [51]
Staphylococcus aureus         Microarray                         [98]
                              STM                                [26, 27]
                              IVET                               [99, 100]
                              Genomic peptide libraries          [15]
                              Serological proteome analysis      [16]
Streptococcus agalactiae      Comparative genome analysis        [11]
                              Proteomics                         [66]
Streptococcus pneumoniae      Reverse vaccinology                [14]
                              STM                                [30]
                              Comparative genome analysis        [101]
                              Microarray                         [102]
Streptococcus pyogenes        Comparative genome analysis        [103]
Vibrio cholerae               Microarray                         [104]
                              STM                                [28]
                              IVET                               [105]
Yersinia enterocolitica       STM                                [29]
                              IVET                               [106]
Yersinia pestis               Comparative genome analysis        [107, 108]

[a] Microarray studies evaluating host gene activation; in all the other studies the bacterial gene expression profile was analyzed. STM, signature-tagged mutagenesis; IVET, in vivo expression technology; DFI, differential fluorescence induction.
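The gene presence/absence comparison that underlies comparative genomic hybridization, as described in this section, can be sketched as follows. This is an illustrative example only: the strain names, the gene names, and the function `core_and_variable` are hypothetical, and the sketch assumes that each hybridization has already been converted into a set of reference genes scored as present.

```python
def core_and_variable(presence):
    """Split reference genes into those detected in every strain and the rest.

    presence maps a strain name to the set of reference-array genes detected in it.
    Genes absent from the reference array cannot be seen by this kind of analysis.
    """
    gene_sets = list(presence.values())
    core = set.intersection(*gene_sets)
    variable = set.union(*gene_sets) - core
    return core, variable


# Hypothetical hybridization results against an array built from one sequenced strain.
presence = {
    "strain1": {"gene1", "gene2", "gene3", "gene4"},
    "strain2": {"gene1", "gene2", "gene4"},
    "strain3": {"gene1", "gene2", "gene3"},
}

core, variable = core_and_variable(presence)
print(sorted(core))
print(sorted(variable))
```

Genes in the core set are present in all strains tested and are therefore the starting point when looking for antigens that might induce a cross-protective response; whether any of them is a useful antigen still has to be established experimentally.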

24.7 Proteomics and Vaccine Design

Recent improvements in high-sensitivity biological mass spectrometry have provided a powerful adjunct to traditional 2D SDS-PAGE. The entire complement of proteins expressed by a cell (the proteome) can now be defined, which makes proteomics a valuable tool for antigen discovery [64]. This kind of approach has already been used to provide insight into the function of a specific subset of the proteome, such as the cell envelope of Salmonella typhimurium [65]. Proteome studies are made even more powerful when applied to an organism whose genome has been sequenced. In a recent study, Montigiani and colleagues used genomics combined with proteomics to characterize the surface proteins of C. pneumoniae [12]. Other examples where proteomics has been used to study bacterial pathogenesis and identify vaccine candidates include Streptococcus agalactiae [66, 67] and H. influenzae [68, 69]. Further genome-wide studies are underway for S. aureus and Streptococcus pneumoniae, where integrated proteomic strategies have been successfully applied in the discovery of vaccine antigens. The combination of proteomics and serological analysis allowed the development of SERPA (serological proteome analysis), a technology that has been applied to screen and select new in vivo immunogens as potential vaccine candidates [70]. In conclusion, classical proteomics and immunoproteomics approaches have proved to be powerful tools for identifying novel bacterial antigens and vaccine components and for understanding protein function. Their use is likely to increase in the years to come.

24.8 Conclusions

Analysis of whole-genome sequences from a range of pathogens shows their diversity and adaptability to different environments. Valuable information can be obtained from complete genome sequences, and this has revolutionized the approach to vaccine development. The new approach starts with the complete information about the genome and the gene products, and then identifies among these the important factors involved in virulence. The analysis of the transcriptome and proteome allows a better understanding of the pathogen’s biology as well as its interactions with the host immune system. The encouraging results obtained with meningococcus B have opened the way for reverse vaccinology to be applied to many other infectious diseases for which effective vaccines are still to be developed. A crucial step in expanding its potential will be the development of biological assays for large-scale testing of vaccine candidates.

References

1 Del Giudice, G. and R. Rappuoli. 1999. Genetically derived toxoids for use as vaccines and adjuvants. Vaccine 17 Suppl 2:S44–S52.
2 Andre, F.E. 1990. Overview of a 5-year clinical experience with a yeast-derived hepatitis B vaccine. Vaccine 8 Suppl:S74–S78; discussion S79–S80.
3 Pizza, M., A. Covacci, A. Bartoloni, M. Perugini, L. Nencioni, M.T. De Magistris, L. Villa, D. Nucci, R. Manetti and M. Bugnoli. 1989. Mutants of pertussis toxin suitable for vaccine development. Science 246:497–500.
4 Nencioni, L., M. Pizza, M. Bugnoli, T. De Magistris, A. Di Tommaso, F. Giovannoni, R. Manetti, I. Marsili, G. Matteucci and D. Nucci. 1990. Characterization of genetically inactivated pertussis toxin mutants: candidates for a new vaccine against whooping cough. Infect. Immun. 58:1308–1315.
5 Fleischmann, R.D., M.D. Adams, O. White, R.A. Clayton, E.F. Kirkness, A.R. Kerlavage, C.J. Bult, J.F. Tomb, B.A. Dougherty and J.M. Merrick. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512.
6 Rappuoli, R. 2001. Reverse vaccinology, a genome-based approach to vaccine development. Vaccine 19:2688–2691.
7 Pizza, M., V. Scarlato, V. Masignani, M.M. Giuliani, B. Arico, M. Comanducci, G.T. Jennings, L. Baldi, E. Bartolini, B. Capecchi, C.L. Galeotti, E. Luzzi, R. Manetti, E. Marchetti, M. Mora, S. Nuti, G. Ratti, L. Santini, S. Savino, M. Scarselli, E. Storni, P. Zuo, M. Broeker, E. Hundt, B. Knapp, E. Blair, T. Mason, H. Tettelin, D.W. Hood, A.C. Jeffries, N.J. Saunders, D.M. Granoff, J.C. Venter, E.R. Moxon, G. Grandi and R. Rappuoli. 2000. Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing. Science 287:1816–1820.
8 Tettelin, H., N.J. Saunders, J. Heidelberg, A.C. Jeffries, K.E. Nelson, J.A. Eisen, K.A. Ketchum, D.W. Hood, J.F. Peden, R.J. Dodson, W.C. Nelson, M.L. Gwinn, R. DeBoy, J.D. Peterson, E.K. Hickey, D.H. Haft, S.L. Salzberg, O. White, R.D. Fleischmann, B.A. Dougherty, T. Mason, A. Ciecko, D.S. Parksey, E. Blair, H. Cittone, E.B. Clark, M.D. Cotton, T.R. Utterback, H. Khouri, H. Qin, J. Vamathevan, J. Gill, V. Scarlato, V. Masignani, M. Pizza, G. Grandi, L. Sun, H.O. Smith, C.M. Fraser, E.R. Moxon, R. Rappuoli and J.C. Venter. 2000. Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science 287:1809–1815.
9 Masignani, V., M. Comanducci, M.M. Giuliani, S. Bambini, J. Adu-Bobie, B. Arico, B. Brunelli, A. Pieri, L. Santini, S. Savino, D. Serruto, D. Litt, S. Kroll, J.A. Welsch, D.M. Granoff, R. Rappuoli and M. Pizza. 2003. Vaccination against Neisseria meningitidis using three variants of the lipoprotein GNA1870. J. Exp. Med. 197:789–799.
10 Comanducci, M., S. Bambini, B. Brunelli, J. Adu-Bobie, B. Arico, B. Capecchi, M.M. Giuliani, V. Masignani, L. Santini, S. Savino, D.M. Granoff, D.A. Caugant, M. Pizza, R. Rappuoli and M. Mora. 2002. NadA, a novel vaccine candidate of Neisseria meningitidis. J. Exp. Med. 195:1445–1454.
11 Tettelin, H., V. Masignani, M.J. Cieslewicz, J.A. Eisen, S. Peterson, M.R. Wessels, I.T. Paulsen, K.E. Nelson, I. Margarit, T.D. Read, L.C. Madoff, A.M. Wolf, M.J. Beanan, L.M. Brinkac, S.C. Daugherty, R.T. DeBoy, A.S. Durkin, J.F. Kolonay, R. Madupu, M.R. Lewis, D. Radune, N.B. Fedorova, D. Scanlan, H. Khouri, S. Mulligan, H.A. Carty, R.T. Cline, S.E. Van Aken, J. Gill, M. Scarselli, M. Mora, E.T. Iacobini, C. Brettoni, G. Galli, M. Mariani, F. Vegni, D. Maione, D. Rinaudo, R. Rappuoli, J.L. Telford, D.L. Kasper, G. Grandi and C.M. Fraser. 2002. Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae. Proc. Natl. Acad. Sci. U. S. A. 99:12391–12396.
12 Montigiani, S., F. Falugi, M. Scarselli, O. Finco, R. Petracca, G. Galli, M. Mariani, R. Manetti, M. Agnusdei, R. Cevenini, M. Donati, R. Nogarotto, N. Norais, I. Garaguso, S. Nuti, G. Saletti, D. Rosa, G. Ratti and G. Grandi. 2002. Genomic approach for analysis of surface proteins in Chlamydia pneumoniae. Infect. Immun. 70:368–379.
13 Ariel, N., A. Zvi, H. Grosfeld, O. Gat, Y. Inbar, B. Velan, S. Cohen and A. Shafferman. 2002. Search for potential vaccine candidate open reading frames in the Bacillus anthracis virulence plasmid pXO1: in silico and in vitro screening. Infect. Immun. 70:6817–6827.
14 Wizemann, T.M., J.H. Heinrichs, J.E. Adamou, A.L. Erwin, C. Kunsch, G.H. Choi, S.C. Barash, C.A. Rosen, H.R. Masure, E. Tuomanen, A. Gayle, Y.A. Brewah, W. Walsh, P. Barren, R. Lathigra, M. Hanson, S. Langermann, S. Johnson and S. Koenig. 2001. Use of a whole genome approach to identify vaccine molecules affording protection against Streptococcus pneumoniae infection. Infect. Immun. 69:1593–1598.

15 Etz, H., D.B. Minh, T. Henics, A. Dryla, B. Winkler, C. Triska, A.P. Boyd, J. Sollner, W. Schmidt, U. von Ahsen, M. Buschle, S.R. Gill, J. Kolonay, H. Khalak, C.M. Fraser, A. von Gabain, E. Nagy and A. Meinke. 2002. Identification of in vivo expressed vaccine candidate antigens from Staphylococcus aureus. Proc. Natl. Acad. Sci. U. S. A. 99:6573–6578.
16 Vytvytska, O., E. Nagy, M. Bluggel, H.E. Meyer, R. Kurzbauer, L.A. Huber and C.S. Klade. 2002. Identification of vaccine candidate antigens of Staphylococcus aureus by serological proteome analysis. Proteomics 2:580–590.
17 Ross, B.C., L. Czajkowski, D. Hocking, M. Margetts, E. Webb, L. Rothel, M. Patterson, C. Agius, S. Camuglia, E. Reynolds, T. Littlejohn, B. Gaeta, A. Ng, E.S. Kuczek, J.S. Mattick, D. Gearing and I.G. Barr. 2001. Identification of vaccine candidate antigens from a genomic analysis of Porphyromonas gingivalis. Vaccine 19:4135–4142.
18 Srinivasa Rao, P.S., T.M. Lim and K.Y. Leung. 2003. Functional genomics approach to the identification of virulence genes involved in Edwardsiella tarda pathogenesis. Infect. Immun. 71:1343–1351.
19 Betts, J.C. 2002. Transcriptomics and proteomics: tools for the identification of novel drug targets and vaccine candidates for tuberculosis. IUBMB Life 53:239–242.
20 Mahan, M.J., J.M. Slauch, P.C. Hanna, A. Camilli, J.W. Tobias, M.K. Waldor and J.J. Mekalanos. 1993. Selection for bacterial genes that are specifically induced in host tissues: the hunt for virulence factors. Infect. Agents Dis. 2:263–268.
21 Mahan, M.J., J.W. Tobias, J.M. Slauch, P.C. Hanna, R.J. Collier and J.J. Mekalanos. 1995. Antibiotic-based selection for bacterial genes that are specifically induced during infection of a host. Proc. Natl. Acad. Sci. U. S. A. 92:669–673.
22 Slauch, J.M. and A. Camilli. 2000. IVET and RIVET: use of gene fusions to identify bacterial virulence factors specifically induced in host tissues. Methods Enzymol. 326:73–96.
23 Handfield, M., L.J. Brady, A. Progulske-Fox and J.D. Hillman. 2000. IVIAT: a novel method to identify microbial genes expressed specifically during human infections. Trends Microbiol. 8:336–339.
24 Hensel, M., J.E. Shea, C. Gleeson, M.D. Jones, E. Dalton and D.W. Holden. 1995. Simultaneous identification of bacterial virulence genes by negative selection. Science 269:400–403.
25 Camacho, L.R., D. Ensergueix, E. Perez, B. Gicquel and C. Guilhot. 1999. Identification of a virulence gene cluster of Mycobacterium tuberculosis by signature-tagged transposon mutagenesis. Mol. Microbiol. 34:257–267.
26 Coulter, S.N., W.R. Schwan, E.Y. Ng, M.H. Langhorne, H.D. Ritchie, S. Westbrock-Wadman, W.O. Hufnagle, K.R. Folger, A.S. Bayer and C.K. Stover. 1998. Staphylococcus aureus genetic loci impacting growth and survival in multiple infection environments. Mol. Microbiol. 30:393–404.
27 Mei, J.M., F. Nourbakhsh, C.W. Ford and D.W. Holden. 1997. Identification of Staphylococcus aureus virulence genes in a murine model of bacteraemia using signature-tagged mutagenesis. Mol. Microbiol. 26:399–407.
28 Chiang, S.L. and J.J. Mekalanos. 1998. Use of signature-tagged transposon mutagenesis to identify Vibrio cholerae genes critical for colonization. Mol. Microbiol. 27:797–805.
29 Darwin, A.J. and V.L. Miller. 1999. Identification of Yersinia enterocolitica genes affecting survival in an animal host using signature-tagged transposon mutagenesis. Mol. Microbiol. 32:51–62.
30 Polissi, A., A. Pontiggia, G. Feger, M. Altieri, H. Mottl, L. Ferrari and D. Simon. 1998. Large-scale identification of virulence genes from Streptococcus pneumoniae. Infect. Immun. 66:5620–5629.
31 Sun, Y.H., S. Bakshi, R. Chalmers and C.M. Tang. 2000. Functional genomics of Neisseria meningitidis pathogenesis. Nat. Med. 6:1269–1273.

32 Akerley, B.J., E.J. Rubin, V.L. Novick, K. Amaya, N. Judson and J.J. Mekalanos. 2002. A genome-scale analysis for identification of genes required for growth or survival of Haemophilus influenzae. Proc. Natl. Acad. Sci. U. S. A. 99:966–971.
33 Hutchison, C.A., S.N. Peterson, S.R. Gill, R.T. Cline, O. White, C.M. Fraser, H.O. Smith and J.C. Venter. 1999. Global transposon mutagenesis and a minimal Mycoplasma genome. Science 286:2165–2169.
34 Sassetti, C.M., D.H. Boyd and E.J. Rubin. 2003. Genes required for mycobacterial growth defined by high density mutagenesis. Mol. Microbiol. 48:77–84.
35 Sassetti, C.M., D.H. Boyd and E.J. Rubin. 2001. Comprehensive identification of conditionally essential genes in mycobacteria. Proc. Natl. Acad. Sci. U. S. A. 98:12712–12717.
36 Brown, P.O. and D. Botstein. 1999. Exploring the new world of the genome with DNA microarrays. Nat. Genet. 21:33–37.
37 Cheung, V.G., M. Morley, F. Aguilar, A. Massimi, R. Kucherlapati and G. Childs. 1999. Making and reading microarrays. Nat. Genet. 21:15–19.
38 Lipshutz, R.J., S.P. Fodor, T.R. Gingeras and D.J. Lockhart. 1999. High density synthetic oligonucleotide arrays. Nat. Genet. 21:20–24.
39 Lockhart, D.J. and E.A. Winzeler. 2000. Genomics, gene expression and DNA arrays. Nature 405:827–836.
40 Schoolnik, G.K. 2002. Microarray analysis of bacterial pathogenicity. Adv. Microb. Physiol. 46:1–45.
41 Cummings, C.A. and D.A. Relman. 2000. Using DNA microarrays to study host–microbe interactions. Emerg. Infect. Dis. 6:513–525.
42 Kato-Maeda, M., Q. Gao and P.M. Small. 2001. Microarray analysis of pathogens and their interaction with hosts. Cell Microbiol. 3:713–719.
43 Conway, T. and G.K. Schoolnik. 2003. Microarray expression profiling: capturing a genome-wide portrait of the transcriptome. Mol. Microbiol. 47:879–889.

44 Eckmann, L., J.R. Smith, M.P. Housley, M.B. Dwinell and M.F. Kagnoff. 2000. Analysis by high density cDNA arrays of altered gene expression in human intestinal epithelial cells in response to infection with the invasive enteric bacteria Salmonella. J. Biol. Chem. 275:14084–14094.
45 Cohen, P., M. Bouaboula, M. Bellis, V. Baron, O. Jbilo, C. Poinot-Chazel, S. Galiegue, E.H. Hadibi and P. Casellas. 2000. Monitoring cellular responses to Listeria monocytogenes with oligonucleotide arrays. J. Biol. Chem. 275:11181–11190.
46 Belcher, C.E., J. Drenkow, B. Kehoe, T.R. Gingeras, N. McNamara, H. Lemjabbar, C. Basbaum and D.A. Relman. 2000. The transcriptional responses of respiratory epithelial cells to Bordetella pertussis reveal host defensive and pathogen counter-defensive strategies. Proc. Natl. Acad. Sci. U. S. A. 97:13847–13852.
47 Ichikawa, J.K., A. Norris, M.G. Bangera, G.K. Geiss, A.B. van ’t Wout, R.E. Bumgarner and S. Lory. 2000. Interaction of Pseudomonas aeruginosa with epithelial cells: identification of differentially regulated genes by expression microarray analysis of human cDNAs. Proc. Natl. Acad. Sci. U. S. A. 97:9659–9664.
48 Mills, J.C., A.J. Syder, C.V. Hong, J.L. Guruge, F. Raaii and J.I. Gordon. 2001. A molecular profile of the mouse gastric parietal cell with and without exposure to Helicobacter pylori. Proc. Natl. Acad. Sci. U. S. A. 98:13687–13692.
49 Xia, M., R.E. Bumgarner, M.F. Lampe and W.E. Stamm. 2003. Chlamydia trachomatis infection alters host cell transcription in diverse cellular pathways. J. Infect. Dis. 187:424–434.
50 Wang, J.P., S.E. Rought, J. Corbeil and D.G. Guiney. 2003. Gene expression profiling detects patterns of human macrophage responses following Mycobacterium tuberculosis infection. FEMS Immunol. Med. Microbiol. 39:163–172.
51 Pedron, T., C. Thibault and P.J. Sansonetti. 2003. The invasive phenotype of Shigella flexneri directs a distinct gene expression pattern in the human intestinal epithelial cell line Caco-2. J. Biol. Chem. 278:33878–33886.
52 Grifantini, R., E. Bartolini, A. Muzzi, M. Draghi, E. Frigimelica, J. Berger, G. Ratti, R. Petracca, G. Galli, M. Agnusdei, M.M. Giuliani, L. Santini, B. Brunelli, H. Tettelin, R. Rappuoli, F. Randazzo and G. Grandi. 2002. Previously unrecognized vaccine candidates against group B meningococcus identified by DNA microarrays. Nat. Biotechnol. 20:914–921.
53 Grifantini, R., S. Sebastian, E. Frigimelica, M. Draghi, E. Bartolini, A. Muzzi, R. Rappuoli, G. Grandi and C.A. Genco. 2003. Identification of iron-activated and -repressed Fur-dependent genes by transcriptome analysis of Neisseria meningitidis group B. Proc. Natl. Acad. Sci. U. S. A. 100:9542–9547.
54 Kurz, S., C. Hubner, C. Aepinus, S. Theiss, M. Guckenberger, U. Panzner, J. Weber, M. Frosch and G. Dietrich. 2003. Transcriptome-based antigen identification for Neisseria meningitidis. Vaccine 21:768–775.
55 Mueller, A., J. O’Rourke, P. Chu, C.C. Kim, P. Sutton, A. Lee and S. Falkow. 2003. Protective immunity against Helicobacter is characterized by a unique transcriptional signature. Proc. Natl. Acad. Sci. U. S. A. 100:12289–12294.
56 Wilson, M., J. DeRisi, H.H. Kristensen, P. Imboden, S. Rane, P.O. Brown and G.K. Schoolnik. 1999. Exploring drug-induced alterations in gene expression in Mycobacterium tuberculosis by microarray hybridization. Proc. Natl. Acad. Sci. U. S. A. 96:12833–12838.
57 Bryant, P.A., D. Venter, R. Robins-Browne and N. Curtis. 2004. Chips with everything: DNA microarrays in infectious diseases. Lancet Infect. Dis. 4:100–111.
58 Claverie, J.M., C. Abergel, S. Audic and H. Ogata. 2001. Recent advances in computational genomics. Pharmacogenomics 2:361–372.
59 Schoolnik, G.K. 2002. Functional and comparative genomics of pathogenic bacteria. Curr. Opin. Microbiol. 5:20–26.
60 Perrin, A., S. Bonacorsi, E. Carbonnelle, D. Talibi, P. Dessen, X. Nassif and C. Tinsley. 2002. Comparative genomics identifies the genetic islands that distinguish Neisseria meningitidis, the agent of cerebrospinal meningitis, from other Neisseria species. Infect. Immun. 70:7063–7072.
61 Grifantini, R., E. Bartolini, A. Muzzi, M. Draghi, E. Frigimelica, J. Berger, F. Randazzo and G. Grandi. 2002. Gene expression profile in Neisseria meningitidis and Neisseria lactamica upon host-cell contact: from basic research to vaccine development. Ann. N. Y. Acad. Sci. 975:202–216.
62 Read, T.D., S.N. Peterson, N. Tourasse, L.W. Baillie, I.T. Paulsen, K.E. Nelson, H. Tettelin, D.E. Fouts, J.A. Eisen, S.R. Gill, E.K. Holtzapple, O.A. Okstad, E. Helgason, J. Rilstone, M. Wu, J.F. Kolonay, M.J. Beanan, R.J. Dodson, L.M. Brinkac, M. Gwinn, R.T. DeBoy, R. Madupu, S.C. Daugherty, A.S. Durkin, D.H. Haft, W.C. Nelson, J.D. Peterson, M. Pop, H.M. Khouri, D. Radune, J.L. Benton, Y. Mahamoud, L. Jiang, I.R. Hance, J.F. Weidman, K.J. Berry, R.D. Plaut, A.M. Wolf, K.L. Watkins, W.C. Nierman, A. Hazen, R. Cline, C. Redmond, J.E. Thwaite, O. White, S.L. Salzberg, B. Thomason, A.M. Friedlander, T.M. Koehler, P.C. Hanna, A.B. Kolsto and C.M. Fraser. 2003. The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423:81–86.
63 Dobrindt, U., F. Agerer, K. Michaelis, A. Janka, C. Buchrieser, M. Samuelson, C. Svanborg, G. Gottschalk, H. Karch and J. Hacker. 2003. Analysis of genome plasticity in pathogenic and commensal Escherichia coli isolates by use of DNA arrays. J. Bacteriol. 185:1831–1840.
64 Grandi, G. 2001. Antibacterial vaccine design using genomics and proteomics. Trends Biotechnol. 19:181–188.
65 Qi, S.Y., A. Moir and C.D. O’Connor. 1996. Proteome of Salmonella typhimurium SL1344: identification of novel abundant cell envelope proteins and assignment to a two-dimensional reference map. J. Bacteriol. 178:5032–5038.
66 Hughes, M.J., J.C. Moore, J.D. Lane, R. Wilson, P.K. Pribul, Z.N. Younes, R.J. Dobson, P. Everest, A.J. Reason, J.M. Redfern, F.M. Greer, T. Paxton, M. Panico, H.R. Morris, R.G. Feldman and J.D. Santangelo. 2002. Identification of major outer surface proteins of Streptococcus agalactiae. Infect. Immun. 70:1254–1259.
67 Hughes, M.J., R. Wilson, J.C. Moore, J.D. Lane, R.J. Dobson, P. Muckett, Z. Younes, P. Pribul, A. Topping, R.G. Feldman and J.D. Santangelo. 2003. Novel protein vaccine candidates against Group B streptococcal infection identified using alkaline phosphatase fusions. FEMS Microbiol. Lett. 222:263–271.
68 Langen, H., B. Takacs, S. Evers, P. Berndt, H.W. Lahm, B. Wipf, C. Gray and M. Fountoulakis. 2000. Two-dimensional map of the proteome of Haemophilus influenzae. Electrophoresis 21:411–429.
69 Thoren, K., E. Gustafsson, A. Clevnert, T. Larsson, J. Bergstrom and C.L. Nilsson. 2002. Proteomic study of non-typable Haemophilus influenzae. J. Chromatogr. B. Analyt. Technol. Biomed. Life Sci. 782:219–226.
70 Klade, C.S. 2002. Proteomics approaches towards antigen discovery and vaccine development. Curr. Opin. Mol. Ther. 4:216–223.
71 Ariel, N., A. Zvi, K.S. Makarova, T. Chitlaru, E. Elhanany, B. Velan, S. Cohen, A.M. Friedlander and A. Shafferman. 2003. Genome-based bioinformatic selection of chromosomal Bacillus anthracis putative vaccine candidates coupled with proteomic identification of surface-associated antigens. Infect. Immun. 71:4563–4579.
72 Nubel, U., P.M. Schmidt, E. Reiss, F. Bier, W. Beyer and D. Naumann. 2004. Oligonucleotide microarray for identification of Bacillus anthracis based on intergenic transcribed spacers in ribosomal DNA. FEMS Microbiol. Lett. 240:215–223.
73 Rajashekara, G., J.D. Glasner, D.A. Glover and G.A. Splitter. 2004. Comparative whole-genome hybridization reveals genomic islands in Brucella species. J. Bacteriol. 186:5040–5051.
74 Pearson, B.M., C. Pin, J. Wright, K. I’Anson, T. Humphrey and J.M. Wells. 2003. Comparative genome analysis of Campylobacter jejuni using whole genome DNA microarrays. FEBS Lett. 554:224–230.
75 Belland, R.J., G. Zhong, D.D. Crane, D. Hogan, D. Sturdevant, J. Sharma, W.L. Beatty and H.D. Caldwell. 2003. Genomic transcriptional profiling of the developmental cycle of Chlamydia trachomatis. Proc. Natl. Acad. Sci. U. S. A. 100:8478–8483.
76 Patten, C.L., M.G. Kirchhof, M.R. Schertzberg, R.A. Morton and H.E. Schellhorn. 2004. Microarray analysis of RpoS-mediated gene expression in Escherichia coli K-12. Mol. Genet. Genomics 272:580–591.
77 Snyder, J.A., B.J. Haugen, E.L. Buckles, C.V. Lockatell, D.E. Johnson, M.S. Donnenberg, R.A. Welch and H.L. Mobley. 2004. Transcriptome of uropathogenic Escherichia coli during urinary tract infection. Infect. Immun. 72:6373–6381.
78 Thompson, L.J., D.S. Merrell, B.A. Neilan, H. Mitchell, A. Lee and S. Falkow. 2003. Gene expression profiling of Helicobacter pylori reveals a growth-phase-dependent switch in virulence gene expression. Infect. Immun. 71:2643–2655.
79 Salama, N., K. Guillemin, T.K. McDaniel, G. Sherlock, L. Tompkins and S. Falkow. 2000. A whole-genome microarray reveals genetic diversity among Helicobacter pylori strains. Proc. Natl. Acad. Sci. U. S. A. 97:14668–14673.
80 Jungblut, P.R., D. Bumann, G. Haas, U. Zimny-Arndt, P. Holland, S. Lamer, F. Siejak, A. Aebischer and T.F. Meyer. 2000. Comparative proteome analysis of Helicobacter pylori. Mol. Microbiol. 36:710–725.
81 Govorun, V.M., S.A. Moshkovskii, O.V. Tikhonova, E.I. Goufman, M.V. Serebryakova, K.T. Momynaliev, P.G. Lokhov, E.V. Khryapova, L.V. Kudryavtseva, O.V. Smirnova, I.Y. Toropyguine, B.I. Maksimov and A.I. Archakov. 2003. Comparative analysis of proteome maps of Helicobacter pylori clinical isolates. Biochemistry (Mosc) 68:42–49.

82 Utt, M., I. Nilsson, A. Ljungh and

T. Wadstrom. 2002. Identification of novel immunogenic proteins of Helicobacter pylori by proteome technology. J. Immunol. Methods 259:1–10. 83 Baik, S.C., K.M. Kim, S.M. Song, D.S. Kim, J.S. Jun, S.G. Lee, J.Y. Song, J.U. Park, H.L. Kang, W.K. Lee, M.J. Cho, H.S. Youn, G.H. Ko and K.H. Rhee. 2004. Proteomic analysis of the sarcosine-insoluble outer membrane fraction of Helicobacter pylori strain 26695. J. Bacteriol. 186:949–955. 84 Glaser, P., L. Frangeul, C. Buchrieser, C. Rusniok, A. Amend, F. Baquero, P. Berche, H. Bloecker, P. Brandt, T. Chakraborty, A. Charbit, F. Chetouani, E. Couve, A. de Daruvar, P. Dehoux, E. Domann, G. DominguezBernal, E. Duchaud, L. Durant, O. Dussurget, K.D. Entian, H. Fsihi, F. Garciadel Portillo, P. Garrido, L. Gautier, W. Goebel, N. Gomez-Lopez, T. Hain, J. Hauf, D. Jackson, L.M. Jones, U. Kaerst, J. Kreft, M. Kuhn, F. Kunst, G. Kurapkat, E. Madueno, A. Maitournam, J.M. Vicente, E. Ng, H. Nedjari, G. Nordsiek, S. Novella, B. de Pablos, J.C. Perez-Diaz, R. Purcell, B. Remmel, M. Rose, T. Schlueter, N. Simoes, A. Tierrez, J.A. Vazquez-Boland, H. Voss, J. Wehland and P. Cossart. 2001. Comparative genomics of Listeria species. Science 294:849–852. 85 Buchrieser, C., C. Rusniok, F. Kunst, P. Cossart and P. Glaser. 2003. Comparison of the genome sequences of Listeria monocytogenes and Listeria innocua: clues for evolution and pathogenicity. FEMS Immunol. Med. Microbiol. 35:207–213. 86 Schnappinger, D., S. Ehrt, M.I. Voskuil, Y. Liu, J.A. Mangan, I.M. Monahan, G. Dolganov, B. Efron, P.D. Butcher, C. Nathan and G.K. Schoolnik. 2003. Transcriptional adaptation of Mycobacterium tuberculosis within macrophages: insights into the phagosomal environment. J. Exp. Med. 198:693–704. 87 Cockle, P.J., S.V. Gordon, A. Lalvani, B.M. Buddle, R.G. Hewinson and H.M. Vordermeier. 2002. Identification of novel Mycobacterium tuberculosis antigens with potential as diagnostic reagents or subunit vaccine candidates

References

88

89

90

91

92

93

94

95

96

by comparative genomics. Infect. Immun. 70:6996–7003. Jungblut, P.R., U. Zimny-Arndt, E. Zeindl-Eberhart, J. Stulik, K. Koupilova, K.P. Pleissner, A. Otto, E.C. Muller, W. Sokolowska-Kohler, G. Grabher and G. Stoffler. 1999. Proteomics in human disease: cancer, heart and infectious diseases. Electrophoresis 20:2100–2110. Mattow, J., P.R. Jungblut, U.E. Schaible, H.J. Mollenkopf, S. Lamer, U. ZimnyArndt, K. Hagens, E.C. Muller and S.H. Kaufmann. 2001. Identification of proteins from Mycobacterium tuberculosis missing in attenuated Mycobacterium bovis BCG strains. Electrophoresis 22:2936–2946. Barry, M.A., W.C. Lai and S.A. Johnston. 1995. Protection against mycoplasma infection using expression-library immunization. Nature 377:632–635. Pelicic, V., S. Morelle, D. Lampe and X. Nassif. 2000. Mutagenesis of Neisseria meningitidis by in vitro transposition of Himar1 mariner. J Bacteriol.182:5391–5398. Bernardini, G., G. Renzone, M. Comanducci, R. Mini, S. Arena, C. D’Ambrosio, S. Bambini, L. Trabalzini, G. Grandi, P. Martelli, M. Achtman, A. Scaloni, G. Ratti and A. Santucci. 2004. Proteome analysis of Neisseria meningitidis serogroup A. Proteomics 4:2893–2926. Firoved, A.M. and V. Deretic. 2003. Microarray analysis of global gene expression in mucoid Pseudomonas aeruginosa. J. Bacteriol. 185:1071–1081. Wang, J., S. Lory, R. Ramphal and S. Jin. 1996. Isolation and characterization of Pseudomonas aeruginosa genes inducible by respiratory mucus derived from cystic fibrosis patients. Mol. Microbiol. 22:1005–1012. Wang, J., A. Mushegian, S. Lory and S. Jin. 1996. Large-scale isolation of candidate virulence genes of Pseudomonas aeruginosa by in vivo selection. Proc. Natl. Acad. Sci. U. S. A. 93:10434– 10439. Valdivia, R.H. and S. Falkow. 1997. Fluorescence-based isolation of bacterial genes expressed within host cells. Science 277:2007–2011.

97 Valdivia, R.H. and S. Falkow. 1996.

98

99

100

101

102

103

104

Bacterial genetics by flow cytometry: rapid isolation of Salmonella typhimurium acid-inducible promoters by differential fluorescence induction. Mol. Microbiol. 22:367–378. Dunman, P.M., E. Murphy, S. Haney, D. Palacios, G. Tucker-Kellogg, S. Wu, E.L. Brown, R.J. Zagursky, D. Shlaes and S.J. Projan. 2001. Transcription profiling-based identification of Staphylococcus aureus genes regulated by the agr and/or sarA loci. J. Bacteriol. 183:7341– 7353. Lowe, A.M., D.T. Beattie and R.L. Deresiewicz. 1998. Identification of novel staphylococcal virulence genes by in vivo expression technology. Mol. Microbiol. 27:967–976. Benton, B.M., J.P. Zhang, S. Bond, C. Pope, T. Christian, L. Lee, K.M. Winterberg, M.B. Schmid and J.M. Buysse. 2004. Large-scale identification of genes required for full virulence of Staphylococcus aureus. J. Bacteriol. 186:8478– 8489. Oggioni, M.R. and G. Pozzi. 2001. Comparative genomics for identification of clone-specific sequence blocks in Streptococcus pneumoniae. FEMS Microbiol. Lett. 200:137–143. Orihuela, C.J., J.N. Radin, J.E. Sublett, G. Gao, D. Kaushal and E.I. Tuomanen. 2004. Microarray analysis of pneumococcal gene expression during invasive disease. Infect. Immun. 72:5582–5596. Smoot, J.C., K.D. Barbian, J.J. Van Gompel, L.M. Smoot, M.S. Chaussee, G.L. Sylva, D.E. Sturdevant, S.M. Ricklefs, S.F. Porcella, L.D. Parkins, S.B. Beres, D.S. Campbell, T.M. Smith, Q. Zhang, V. Kapur, J.A. Daly, L.G. Veasy and J.M. Musser. 2002. Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proc. Natl. Acad. Sci. U. S. A. 99:4668–4673. Zhu, J., M.B. Miller, R.E. Vance, M. Dziejman, B.L. Bassler and J.J. Mekalanos. 2002. Quorum-sensing regulators control virulence gene expression in Vibrio cholerae. Proc. Natl. Acad. Sci. U. S. A. 99:3129–3134.

553

554

24 Reverse Vaccinology: Revolutionizing the Approach to Vaccine Design 105 Camilli, A. and J.J. Mekalanos. 1995.

Use of recombinase gene fusions to identify Vibrio cholerae genes induced during infection. Mol. Microbiol. 18:671–683. 106 Young, G.M. and V.L. Miller. 1997. Identification of novel chromosomal loci affecting Yersinia enterocolitica pathogenesis. Mol. Microbiol. 25:319–328. 107 Hinchliffe, S.J., K.E. Isherwood, R.A. Stabler, M.B. Prentice, A. Rakin, R.A. Nichols, P.C. Oyston, J. Hinds, R.W. Tit-

ball and B.W. Wren. 2003. Application of DNA microarrays to study the evolutionary genomics of Yersinia pestis and Yersinia pseudotuberculosis. Genome Res. 13:2018–2029. 108 Zhou, D., Y. Han, E. Dai, Y. Song, D. Pei, J. Zhai, Z. Du, J. Wang, Z. Guo and R. Yang. 2004. Defining the genome content of live plague vaccines by use of whole-genome DNA microarray. Vaccine 22:3367–3374.


Index

a

ABC transporter 161 accessory DNA elements 101 accessory genetic elements 86 adaptive immunity 457 aerobactin 95 Affymetrix 23 allelic variation 152, 154 analytical epidemiology 496 annotation 6 annotation of domains – AnDOM 5 – COG 4 – SCOP 5 – SMART 4 antibiotic resistance 481, 507 ff. antibiotics 15, 505 ff. apicomplexa 422 – eimeriorines 423 – eugregarines 423 – haemogregarines 423 – haemosporines 423 – piroplasms 423 Arabidopsis thaliana 447 – bacterial pathogens 448 – genome 447 – type III secretion system 448 Aspergillus 397 ff. – A. flavus 397 – A. fumigatus 397 – A. nidulans 397 – A. niger 397 – A. terreus 397 – genome sequence 398 attB 165 attP 165

b

Bacillus 265 ff. – B. cereus 273 Bacillus anthracis 265 ff. – amino acid and peptide utilization 271 – anthrax 268 – AtxA 272 – edema factor (EF) 269 – endospores 268 ff., 271 – genome 270 – iron-acquisition 271 – IS1627 271 – lethal factor (LF) 269 – molecular diversity 272 – pathogenicity island 271 – PlcR 272 – protective antigen (PA) 269 – pXO1 270, 271 ff. – pXO2 270, 271 ff. – variable-number tandem repeats (VNTRs) 272 – virulence factors 269, 270 Bacillus cereus 265 ff. – pBC218 273 – pBCXO1 273 – toxins 267 Bacillus cereus group 266 ff. – B. anthracis 266 – B. cereus 266 – B. mycoides 266 – B. thuringiensis 266 – B. weihenstephanensis 266 – evolution 273 – food poisoning 266 – PlcR 274 Bacillus licheniformis 265 ff. Bacillus subtilis 43 ff., 46 ff., 75, 265 ff. – CtsR regulon 49 – HrcA regulon 49

– PhoR regulon 50 – σB regulon 49 bacteriophage 99, 152, 154 ff., 162 ff. Bartonella 281 ff., 282 – auxiliary replicon 291 – B. bacilliformis 282, 285 – B. birtlesii 283 – B. bovis 282 – B. capreoli 282 – B. chomelii 282 – B. clarridgeiae 282 – B. doshiae 282 – B. elizabethae 282 – B. grahamii 282 – B. henselae 281 f., 284 ff., 285 – B. koehlerae 282 – B. quintana 281, 282, 285 ff., 290, 294 – B. schoenbuchii 282 – B. taylorii 282 – B. tribocorum 282 – B. vinsonii 282 – B. washoensis 282 – B. weissi 283 – Carrión’s disease 285 – cat-scratch disease 284 – chromosome II-like segment 291 ff. – evolution 290, 294 – genome 286 – genomic island(s) 287 ff., 288 ff., 292 – infection of reservoir 284 – island 294 – islets 289, 290, 291 – (pro-phage) 287 ff., 288, 290, 294 – rearrangements 290 – short tandem repeats 291 – trench fever 285 – trw operon 293 – type IV secretion system 292 – virB-D4 operon 293 bioinformatics 3 ff. – bioconductor 4 – elementary mode analysis 4 – iterative sequence alignments 4 – pathway alignment 4 – prediction of antigenicity 13 – PROSITE 4 – R 4 Blastomyces dermatitidis 396 ff. Bordetella pertussis 71 – filamentous hemagglutinin 71 Brucella melitensis 286 Brucella suis 286

c

C. glabrata 406 Caenorhabditis elegans – Enterococcus faecalis 449 – genome 449 – Streptococcus pneumoniae 449 Candida – C. albicans 403 – C. glabrata 406 – C. guilliermondii 406 – C. lusitaniae 406 – C. tropicalis 406 – comparative genomic hybridization 405 – DNA microarrays 407 – homologous recombination 408 – in vivo expression technology (IVET) 407 – mutagenesis 407 – RNA interference (RNAi) 409 – sequential disruption of genes in diploid organisms 409 – transformation 407, 408 Candida albicans 403 – genome sequence 404 – heterozygosity 404, 405 – mating-type like locus 404 Candida dubliniensis 405 Candida glabrata – gene loss 406 – phenotypic switching 406 cDNA 30 ff. CDT see cytolethal distending toxin CEs see Correia elements Clostridium 247 ff. – C. acetobutylicum 257 – C. baratii 262 – C. botulinum 257 – C. butyricum 262 – C. difficile 257 – C. perfringens 257 – C. tetani 257 – genome 258 – pCP13 258 – toxins 258 – virulence factors 259 Clostridium botulinum 262 – BoNT 262 – botulinum toxin (BoNT) 262 – botulism 262 – genome 262 Clostridium difficile – diarrhea 263 – genome 263 – PaLoc 264

– pseudomembranous colitis 263 – toxin A 263 – toxin B 263 Clostridium tetani – genome 260 – pE88 260 ff. – putative virulence factors 261 – sodium ion-dependent bioenergetics 262 – tetanus disease 260 – tetanus toxin 260, 261 cobalamin 110 Coccidioides 396 – C. immitis 396 – C. posadasii 396 Caenorhabditis elegans – antibacterial defense 450 comparative genomics 101 conjugative transposons 95 core gene pool 94 core genome 481 Cryptococcus neoformans 398 ff., 448 – antisense transcripts 400 – comparative genomics 399 – genome 448 – genome sequence 399 – invasion 448 – mating-type (MAT) locus 400 Cryptosporidium – C. hominis 427 – C. parvum 427 – genome comparison 428 – genome reduction 428 – genome sequence 427 – introns 427 cystic fibrosis 467

d

Danio rerio – adaptive immune system 451 – innate immune system 451 – Mycobacterium 451 – S. pyogenes 451 – Salmonella 451 – Streptococcus iniae 451 data analysis 9 – bioconductor 11, 36 – cluster analysis 11 – false discovery rate 34 – Genespring 36 – graphical representations 35 – image quantification 9 – LIMMA 36 – neural networks 11

– normalization 9 ff. – pattern recognition 35 – R 36 – ScanAlyze 9 – SMA 36 – Spotfinder 9 – TM4suite 36 – YASMA 36 data normalization and analysis – BlueFuse 32 – data analysis 34 – data processing 33 – image quantification 32 – imagene 32 – LOWESS 33 – MAD 33 – normalization 33 – systematic error 33 databases 16 – ApiDots 420 – CryptoDB 419, 427 – Genomes OnLine Database 69 – GOLD 21, 69 – KEGG 55 – PlasmoDB 419 – Staph-2D 56 – TcruziDB 419 – ToxoDB 419, 429 dermatophytes – Epidermophyton 390 – Microsporum 390 – Trichophyton 390 Dictyostelium discoideum 448 dimorphic fungi – Blastomyces 390 – Coccidioides 390 – Histoplasma 390 – Paracoccidioides 390 directed mutagenesis 73 DNA array technology 487 – 16S-/23S-rDNA arrays 494 – detection of antibiotic resistance 494 – detection of bioweapons 497 – expression profiles 496 – food technology 496 – identification of single-base mutation 495 – limitations 498 – macroarrays 487 – microarray 487 ff., 488 – microbial community analysis 496 – molecular diagnostics 496 – monitoring of gene expression 495 – pathoarrays 489


DNA arrays 94 DNA-DNA hybridization 94 domain server 5 Drosophila melanogaster 450 – genome 450 – innate responses 450 drug design 15 drug discovery 506 – alternative approaches 521 – animal models 509 – bacterial genomic sequences 509 – broad-spectrum antibiotics 523 – combinatorial and parallel synthesis methods 509 – complete genome sequences 513 – conditional mutants 514, 517 – essential genes 514 – genetic tools 517 – genome-wide expression profiling 520 – genome-wide gene inactivation studies 514, 515 – high-throughput robotic screening 509 – minimal inhibitory concentration (MIC) 507 – mode-of-action 517 – narrow-spectrum drugs 523 – natural compound chemistry 508 – novel antibiotics 510 ff. – novel target structures 513 – overexpression systems 519 – phage therapy 524 – promoter induction assays 519 – proteome 520 – resistance breaker compound 523 – resistance mechanisms 522 – species-immanent variations 513 – synthetic chemistry 508 – target identification 513 – target prioritization 517 – target validation 513 – target-based drug discovery 513 – targeting virulence and/or pathogenicity factors 425 – targets for discovery 516 – transcriptome 520 – type III protein secretion system 526 – underexpression systems 518

e

E. coli 75 – K-12 86 Enterococcus faecalis 125 ff., 161 – Ace 136 – aggregation substance 136 – bacteremia 126 – bile salt hydrolase 138 – biofilm formation 137 – bloodstream infection 126 – capsular polysaccharide 137 – chromosomally integrated plasmids 129 – competence 130 – conjugative plasmids 129 – cytolysin 135 – EfaA 133 ff. – endocarditis 126 – enterococcal polysaccharide antigen 137 – enterococcal surface protein 137 – environmental adaptation 131 – exfoliative toxin A 134 – fsr locus 135 – GelE 134 – genome 126 – heat shock 131 – hemolysin 134 – hyaluronidase 134 – internalins 136 – intra-abdominal abscesses 126 – IS elements 127 ff. – IS256 129 – metalloprotease 134 – mobile DNA elements 127 ff. – N-acetylmannosamine utilization 139 – oxidative stress 131 – pathogenicity island 127, 138 – pH homeostasis 131 – phage 128 – postsurgical wound infections 126 – quorum sensing 133 – resistance genes 130 – sex pheromone cAD1 135 – SprE 135 – stress response 131 – urinary tract infection 126 – virulence factors 134 Escherichia coli 85 ff. – α-hemolysin 97 – APEC 90 – bundle forming pilus 100 – colicins 100 – commensal 100 – comparative genomics 94

– cytolethal distending toxin 100 – cytotoxic necrotizing factor 99 – E. coli K-12 85, 92 – E. coli O157:H7 92 – EAEC 100 – EAF plasmids 100 – EAST1 100 – EHEC 91, 99, 100 – EIEC 100 – enterohemolysin 100 – enterohemorrhagic E. coli O157:H7 92 ff. – enteroinvasive E. coli 89 – EPEC 90, 100 – ETEC 91, 99 – extraintestinal pathogenic E. coli (ExPEC) 93 ff. – F17 fimbrial adhesin 99 – genetic diversity 100 – high pathogenicity island 96 – intestinal pathogenic E. coli (IPEC) 94 ff. – locus of enterocyte effacement 85, 96 – P fimbriae 97 – pINV 100 – pO157 plasmid 100 – probiotic 101 – REPEC 91 – SEPEC 90 – shigatoxins 100 – Shigella enterotoxin 100 – siderophores 100 – STEC 90 – type III secretion system 100 – uropathogenic E. coli (UPEC) 90, 92 ff., 99 EST see expressed sequence tag ethanolamine 110 ff. evolution 95 expressed sequence tags (EST) 419 expression profile 22

f

fimbriae 113, 117 FISH see fluorescence in situ hybridization fitness 101 flexible gene pool 86, 481 food safety 496 fungi 389 ff. – thermotolerance 410

g

GAS 152 – see also group A streptococcus GBS 155 ff. – see also group B streptococcus GCA see group C streptococcus GEIs 95 ff. – see also genomic island gene context – STRING 6 gene synteny 426 gene-survey sequence tags (GSS) 419 genetic distance – evolutionary trees 375 genetic relatedness 375 – distance 375 – DNACOMP 375 – DNAMOVE 375 – DNAPARS 375 – DNAPENNY 375 – EMBOSS 375 – FITCH 375 – KITCH 375 – MacClade 375 – maximum likelihood 375 – maximum parsimony 375 – MEGA 375 – MESQUITE 375 – PAML 376 – PAUP 375 – PHYLIP 375, 376 – PROTPARS 375 – TREE-PUZZLE 376 genome annotation platform – BioScout 15 – GenDB 15 – MAGPIE 15 – PEDANT 15 genome-scale mutational analyses 72 – Bacillus subtilis 75 – E. coli 75 – Mycoplasma genitalium 76 – Mycoplasma pneumoniae 76 – Neisseria meningitidis 77 – Pseudomonas aeruginosa 76 – Saccharomyces cerevisiae 73 ff. – Staphylococcus aureus 77 genomic island/pathogenicity island 86, 92, 95, 286 Giardia 431 ff. – complete nucleotide sequence 432 – genome sequence 432 glycolysis 46, 56, 60


gonococcus 231 GSS see gene-survey sequence tag

h

H. influenzae 71 heat stress 60 – CtsR 60 – DnaK 60 – GroEL/S 60 Helicobacter 301 ff. – genome comparison 307 – H. hepaticus 301 – H. mustelae 301 – H. pylori 301 Helicobacter hepaticus 305 ff. – cytolethal distending toxin (CDT) 307 – genome sequence 305 – HHGI1 genomic island 306 – motility 306 – urease 306 Helicobacter pylori 302 ff. – allelic variation 305 – cag pathogenicity island 304 – CagA 304 – comparison 303 – genome 303 – genome comparison 303 – motility 302 – outer membrane proteins 303 – phase variation 302 ff. – type IV secretion system 304 – urease 302 HGT see horizontal gene transfer Histoplasma – genome projects 391 – Histoplasma capsulatum 390 ff. – phase-specific genes 391 horizontal gene transfer 154, 161 host models 445 ff. – Arabidopsis thaliana 445 – Caenorhabditis elegans 445 – Danio rerio 445 – Dictyostelium discoideum 445 – Drosophila melanogaster 445 – immunity 446 – Mus musculus 445 – susceptibility 446 host pathogen interaction – LRR proteins 448 host response 457 host restriction 115, 119 host-pathogen interaction 445 ff., 457 – host defense systems 447

– leucine-rich repeat (LRR) proteins 447 HPI 96 Hpt see hexose phosphate transporter human host response – ADAM 462 – adhesion molecules 460 – alginate 466 ff. – apoptosis 462 – apoptosis regulators 460 – Bartonella henselae 468 – cagA-modulated host responses 463 – cell cycle 462 – chemokines 469 – common activation program of macrophages 459 – common signatures 469 – cystic fibrosis transmembrane conductance regulator CFTR 467 ff. – cytokines 469 – dendritic cells 458, 459 – elongin B 462 – environmental and genetic factors 463 – ExoT 467 – ExoU 467 – flagella 466 ff. – gene expression profiles 458 – glucocorticoid-induced leucine zipper (GILZ) 464 – H. pylori response 462 – Helicobacter pylori 462 ff. – HIF-1α 468 – inflammatory mediators 460 – innate immune response 462 – innate immunity 469 – interferon-induced antiviral activities 460 – interferon-induced chemokines 460 – interferon-induced proliferation 460 – interferon-induced signaling 460 – invasins 464 – Kruppel-like factor (KLF)2 464 – macrophages 458, 459 – MAPK signaling pathways 464 – NF-κB 462, 464, 466, 469 – NF-κB signaling pathway 460 – pathogen-associated molecular patterns (PAMPs) 469 – pattern recognition receptors (PRRs) 469 – Pseudomonas aeruginosa 466 – pYV virulence plasmid 464 – RhoB 464 – septicemia 460 – TLR-5 467 – TNF-related molecules 460

– Toll-like receptor (TLR) 469 – transcription profiles 458 ff. – type III secretion system 467 – type IV pili 466, 467 – Yersinia enterocolitica 464 ff. – Yersinia-mediated gene repression 466 – Yersinia-outer proteins (Yops) 464 human host response 457 hydrogen sulfide 110 – 1,2-propanediol 110 ff.

i

IHTs see islands of horizontally transferred DNA infection susceptibility 471 – genetic predisposition 471 – (IL)-12/IL-23-interferon/(IFN)-γ pathway 472 – IRAK-4 472 – NEMO 471 – TLR signaling 471 innate immunity 457 insertion sequence elements 86, 177 ff. interaction between pathogenic microorganisms and their host 457 intragenic recombinations 158 invasion 111 ipaH – Shigella flexneri 92 IS element 87, 89 ff., 152, 162 – see also insertion sequence element island 155

k

kinetoplastids – Leishmania 429 – Trypanosoma brucei 429 – Trypanosoma cruzi 429 ff. Koch’s postulates 407

l

labeling – Cy3 30 – Cy5 30 labeling/reverse transcription 30 ff. LEE 96 – see also locus of enterocyte effacement Legionella 315 ff. – L. anisa 316 – L. dumoffii 316 – L. feeleii 316 – L. longbeachae 316 – L. micdadei 316

Legionella pneumophila 315 ff., 448 – ankyrin repeat-containing proteins 319 – apyrases 324 – autotransporter 328 – comparative genomics 329 – deletions 333 – dot/icm type IV secretion system 325 ff., 333 – eukaryotic-like proteins 318 ff. – F-box domains 319 – genome rearrangements 333 – genome sequence 317 – genomic islands 329 ff., 333 – insertion 333 – LepA 325 – LepB 325 – LidA 325 – lsp 327 – lssXYZABD locus 326 – lvh region 330 – lvh type IV secretion system 326 – macrophages 448 – plasmids 331 ff., 333 – protein secretion 324 – RalF 325 – SidA-G 325 ff. – sphingosine-1-phosphate lyase 324 – twin arginine translocation (TAT) pathway 327 – type I secretion system 326 – type II secretion system 327 – type IV secretion system 325 – type V secretion system 328 – U-box domains 319 Leishmania – genome 431 – introns 431 – L. braziliensis 431 – L. major 431 lipoteichoic acid 138 Listeria 348 ff. – ActA 349 – competence genes 349 – evolution 350 – internalins 349 – L. grayi 348 – L. innocua 348, 349 – L. ivanovii 348, 349 – L. monocytogenes 348, 349 – L. seeligeri 349 – L. welshimeri 348, 349 – listeriolysin 349 – phospholipase 349


– virulence gene cluster 349 ff. Listeria monocytogenes 339 ff. – ActA 343 ff. – actin polymerization 343 – anaerobic degradation of ethanolamine 354 – autolysin 352 – carbohydrate transport 346 – cell-wall-anchored surface protein 355 – E-cadherin 341 – fibronectin-binding proteins 352 – gene duplications 346 – genome comparison 346 – hexose phosphate transporter 352 ff. – host cell invasion 342 – Hpt 352 ff. – InlA 340 ff. – internalins 340 ff., 351 – intracellular growth 352 – intracellular motility 343 ff. – lipoate protein ligase (LplA1) 352 – lipoproteins 351 – listeriolysin O 342 – LLO 342 – LPXTG motif 341, 355 – metalloprotease Mpl 343 – phospholipase 342 – plasmids 348 – postgenomics 351, 358 – pregenomics 340, 358 – PrfA 344 ff. – PrfA box 356 – PrfA regulon 356 – regulation of virulence gene expression 344 ff., 354 – resistance to bile 353 – SigB 356, 357 – SigL 357 – sigma B-regulons 356 – sigma factor 357 – signature-tagged mutagenesis 352 – surface protein 346 – Tn916 348 – transport proteins 346 – two-component system 354 – two-dimensional gel electrophoresis 355 – virulence gene cluster 340 – vitamin B12 biosynthesis 354 LLO see listeriolysin O loss-of-function 88

m

meningococcus 231 metabolic pathway 156 metagenomics 496 methicillin-resistant – S. aureus 53 – Staphylococcus aureus (MRSA) 486 – Spa (“Staphylococcus aureus encoded protein A”) 7 MIC see minimal inhibitory concentration microarray 8 ff., 21 ff., 382, 484 – cold labeling 31 – Cy3 31 – Cy5 31 – data analysis 9 – data normalization and analysis 32 – direct labeling 30 – DNA-DNA arrays 25 – dye switch experiments 9 – experimental design 25 – hybridization 31 – internal standard 9 – labelling/reverse transcription 30 ff. – probes 23, 24 ff. – replicates 28, 29 – RNA extraction 28 – RNA-DNA arrays 25 ff. – RNA-RNA arrays 25 ff. – scanning 31 – standardization 37 microbial – antimicrobial susceptibility 483 – conventional culture-based methods 483 – detection of biological warfare agents 482 – environmental microbiology 482 – medical microbiology 482 ff. microbial diagnostics 482 ff. – 16S-/18S-rRNA 484 – antibiotic resistance 486 – FISH 484 – fluorescence in situ hybridization 484 ff. – molecular typing 485 – polymerase chain reaction (PCR) 485 ff. – real-time PCR 485 microsporidia 402 ff. – Antonospora locustae 403 – Encephalitozoon cuniculi 402 ff. – Encephalitozoon intestinalis 402 ff. – genome compaction 403 – Glugea atherinae 402 ff. – reductive evolution 402 ff. multilocus enzyme electrophoresis (MLEE) 177

multilocus sequence typing (MLST) 178 mobile genetic element 95, 152 mobility factor 178 molecular clock 375 ff. morons 154 MRSA see methicillin-resistant S. aureus MRSE see methicillin-resistant S. epidermidis MSG genes see major surface glycoprotein genes MSR genes see MSG-related protein genes mucosal immunity 461 multilocus enzyme electrophoresis 177 – regions of difference (RDs) 177 Mus musculus – adaptive immune system 451 – genome 451 – immunodeficiency disease 452 – innate immune system 451 – knockout mouse lines 451 – transgenic mouse lines 451 mutagenesis 70 ff. mutation – deletion 378 – insertion 378 – nonsynonymous 378 – silent substitution 378 – synonymous substitution 378 – transition 377 – transversion 377 Mycobacterium 211 – BCG 215 – comparative genomics 215 – evolution 216 – M. africanum 215, 217 – M. avium 212 – M. bovis 215, 217 – M. canettii 215, 217 – M. fortuitum 212 – M. leprae 211, 222 – M. marinum 212 – M. microti 215 ff. – M. smegmatis 212 – M. tuberculosis 72, 211 ff., 217 – M. tuberculosis complex 72, 215 ff. – M. ulcerans 211, 223 – microdeletion 217 – mycolactone 224 – regions of difference (RDs) 215 – single nucleotide polymorphism 217 – transposon site hybridization 213 – TraSH 213 Mycobacterium leprae 222 – pseudogenes 222

– reduction in coding capacity 222 Mycobacterium marinum 450 Mycobacterium tuberculosis 212 – anaerobic respiration 221 – lipid metabolism 220 Mycobacterium ulcerans 223 Mycoplasma genitalium 76 Mycoplasma pneumoniae 76 Mycobacterium spp. 448 – phagocytosis 448

n

natural competence 158 Neisseria 231 ff. – CEs 238 ff. – coding tandem repeats 238 – comparative genomics 240 ff. – Correia elements 238 ff. – cps locus 243 ff. – DNA recombination 236 – DNA uptake sequence 236 ff. – flexible genome 233 – genome sequence 232 – genomic island 236 – IHTs 236 – insertion sequences 238 – islands of horizontally transferred DNA 236 – N. gonorrhoeae 231 ff. – N. lactamica 231 ff. – N. meningitidis 77, 231 ff. – phase variation 237 ff. – plasmids 233 – prophage 233 – REP 239 – repeat-mediated antigenic variation 239 – repetitive DNA sequence elements 236 – repetitive extragenic palindromic 239 – representational difference analysis 241 ff. – resistance 233 – signature-tagged mutagenesis 239 ff. – simple sequence repeats 237 – STM 239 ff. – virulence factors 244 ff. network analysis – METATOOL 7 nonhierarchical clustering 381 nosocomial infection 125 ff., 175, 195 nosocomial pathogen 263 – IS1069 263


o

opportunistic fungal pathogens – Aspergillus fumigatus 397 – Candida albicans 397 – Cryptococcus neoformans 397 – Pneumocystis carinii 397 oxidative damage and protein stress 61 oxidative stress 61

p

PAI see pathogenicity island PAIs 95 ff. – Shigella flexneri 92 – see also pathogenicity island Paracoccidioides brasiliensis 396 ff. – EST sequencing project 397 parasites – Cryptosporidium 417, 423 – Eimeria 417, 423 – Giardia 417 – insect vector genetics 436 – Leishmania 417 – new drug discovery 433 – Plasmodium 417, 423 – post-genome analyses 433 – Theileria 417, 423 – Toxoplasma 417, 423 – Trichomonas 417 – Trypanosoma 417 pathogenicity island 13, 92, 156, 162, 286 – regions of difference (RDs) 177 ff. pathway analysis 6 – elementary mode 14 pseudogene 117 ff. phage 177 ff. phages 287 phase variation 158 plasmid 86, 99, 177 ff. Plasmodium 423 ff. – drug and vaccine development 435 ff. – gene expression analysis 434 – genome sequence 423 – microarray 434 – P. aeruginosa 450 – P. berghei 426 – P. chabaudi 426 – P. falciparum 423 – P. yoelii 426 – proteomics 434 ff. Plasmodium falciparum – antigenic variation 426 – LCCL-domain 426 – linear chromosomes 424

– linear mitochondrial genome 424 – multidomain adhesive molecules 426 – plastid-like genome 424 – surface protein 424 ff. Pneumocystis – genome sequence 401 – major surface glycoproteins 401 – MSG genes 401 – MSG-related proteins (MST genes) 401 – Pneumocystis jirovecii 401 polymerase chain reaction 484 population equilibrium 375 postgenomic era 487 primary fungal pathogens 390 – dermatophytes 390 – dimorphic fungi 390 primer design – genome PRIDE 24 – PrimeArray 24 – Primer 3 24 progressive alignment 374 progressive sequence alignment 374 – CHROMAS 374 – CLUSTAL W 374 – DDBJ 374 – EMBOSS 374 – FASTA 374 – SEQ-CONVERT 374 prophage 86, 89 prophage attachment site 165 protein secretion 62 proteomics 43 ff. – 2-D PAGE 43 – color coding 49 – dual channel imaging 47 – first-level proteomics 65 – proteome signature 51 ff. – second-level proteomics 65 – starvation proteins 47 – stress proteins 47 – vegetative proteome 46, 56 protozoans 417 – apicomplexa 422 ff. – apicoplast 424 – endosymbiosis 424 – genome analysis 418 ff. – horizontal gene transfer 421 – organelle-encoded genes 422 – phylogenetic studies 420 pseudogene 89, 115 Pseudomonas aeruginosa 76, 448


q quasispecies 374, 379 ff., 381, 382, 383

r random mutagenesis 73, 74 RD see regions of difference rearrangement 118, 156, 161, 163 recombination 161 regulon 47 ff. – HrcA 60 – virulence 63 repetitive element 158 resistance 14 ff., 125, 194 ff. reverse genetics 70 reverse vaccinology 247, 533 ff. – group B streptococcus 538 Rhizobiales 281 RNA virus 371, 379, 381 RNAi see RNA interference RTq-PCR 36

s Saccharomyces cerevisiae 73 ff. Salmonella – csrA 112 – lipopolysaccharides 113 ff. – O-antigen 113 ff. – S. arizonae 109 – S. bongori 109 – S. enterica 109 – S. diarizonae 109 – S. houtenae 109 – S. indica 109 – S. enterica serotype Gallinarum 118 – S. enterica serotype Paratyphi A 115 ff. – S. enterica serotype Typhimurium 110 – S. enterica serotype Typhi 115 ff. – S. salamae 109 – S. pathogenicity system 111 ff. – S. signature genes 110 ff. Salmonella typhimurium 71, 449 – C. neoformans 449 – hypersensitivity 449 – innate immune response 449 – P. aeruginosa 449 – Serratia marcescens 449 – Staphylococcus aureus 449 – Yersinia pestis 449 SaPI see Staphylococcus aureus pathogenicity island SCC see staphylococcal cassette chromosome

SCCmec 190 ff. – see also staphylococcal cassette chromosome mec sequence annotation – BLAST 17 – TESS 14 sequence annotation tools – aptamers 4 – Genepredict 4 – Genescan 4 – Orpheus 4 – Prophet 4 – riboswitch 4 – RNA analyzer 4 – UTRscan 4 – Vienna package 4 sequence space – mutant cloud 380 Serratia marcescens, Listeria monocytogenes 450 serial analysis of gene expression (SAGE) 433 serotype conversion 89 SERPA see serological proteome analysis SHI-2 PAI 92 SHI-3 PAI 92 Shigella dysenteriae 88 Shigella flexneri 86 – CadA 88 – OmpT 88 – pINV 89 – SHI-2 95 Shigella sonnei 88 Shigella spp. 85 ff. σB 49, 60, 61, 62 ff., 63 ff. σH 49 σS 133 signature-tagged mutagenesis 71, 213, 410 single nucleotide polymorphisms (SNPs) 152, 495 slipped strand mispairing 237, 302 software platforms – DenDB 6 – Magpie 6 – pedant 6 spotted array 23 ff. Staphylococcus – coagulase-negative 176, 198 – coagulase-positive 176 – genomic island 188 – mecA 190 ff. – methicillin resistance 190


– methicillin-resistant S. epidermidis (MRSE) 198 – νSaγ 187 – phages 188 – S. aureus 175 ff. – S. carnosus 176 – S. chromogenes 187 – S. delphini 176 – S. epidermidis 175 ff., 187 – S. haemolyticus 190 – S. hominis 190 – S. intermedius 176 – S. saprophyticus 175 – S. schleiferi 176 – S. xylosus 187 – staphylococcal cassette chromosome 190 ff. Staphylococcus aureus 43 ff., 53 ff., 77, 175 ff. – agr 63 ff. – antibiotic resistance 177 – ArlR 62, 63 – bacteriophage 192 – biofilm formation 184 – comparative genomics 176 – φ11 193 – genetic diversity 177 – cccF/cccG system 177 – information pathway 180 – IS256 129, 178, 201 ff. – lysogenic conversion 193 – mecA genomic island 177 ff. – metabolic pathway 178 – methicillin-resistant S. aureus (MRSA) 175 – MprF 138 – MRSA 178 – νSaα 186 ff. – νSaβ 186 ff. – pathogenicity island 177 ff., 185 ff. – plasmids 194 – RNA III 62, 181 – SaeR 62, 63 – SaGIm 186 – SaPI 185 ff. – SaPI1 185 – SaPI2 186 – SaPI3 185 – SaPI4 186 – SaPIbov 187 – SaPIbov2 187 – SaPIn1 177 ff. – SaPIn2 177 ff. – SaPIn3 177 ff. – SarA 61, 62, 63 ff.

– σB 180 – SrrA 57, 60 – SrrAB 61 ff. – Tn1546 177, 194 – transcriptional regulator 181 – two-component regulatory system 180 – vancomycin resistance 194 – vancomycin-resistant S. aureus 177 – virulence factors 181 ff. Staphylococcus epidermidis 129, 195 ff. – biofilm formation 199 – comparative genomics 196 – genomic island 197 – icaADBC 200 – insertion sequences 201 – νSe1 197 – νSe2 197 – phage SPβ 197 – S. albus 175 – SCCmec 198 – staphylococcal cassette chromosome 198 – virulence factors 197 statistical significance 34 stimulon 47 ff. – glucose starvation 51 – heat stress 48 ff. – phosphate starvation 48 ff. STM 410 Streptococcus 149 – ADP-ribosyltransferase 154 – C5a peptidase 153 – CAMP factor 153 – comparative genomics 161 – cysteine proteinase 153 – DNases 153 – endoglycosidase 154 – fibronectin binding proteins 153 – GAS 156 – group A streptococcus 152 – group B streptococcus 155 ff. – group C streptococcus 156 – group D streptococcus 157 – group G streptococcus 156 – hyaluronic acid capsule 153 – hyaluronidase 153 – immunoglobulin G degrading enzyme 154 – IS1562 transposon 155 – laminin binding protein 154 – M protein 153 – M-like proteins 153 – M-type 152

– phospholipases 153 – phospholipase A2 154 – prophage 162 ff. – prophage decay 165 – R6 surface protein 154 – S. agalactiae 150, 155 ff. – S. anginosus 159 – S. bovis 157 – S. canis 156 – S. constellatus 159 – S. dysgalactiae 156 – S. equi 151, 156 – S. gordonii 151, 159 – S. intermedius 159 – S. milleri 156 – S. mitis 151, 159 – S. mutans 150, 160 – S. pneumoniae 150, 158 – S. pyogenes 150 – S. salivarius 159 – S. sanguis 151, 159 – S. sobrinus 151, 160 – S. suis 151, 157 – S. thermophilus 150, 160 – S. uberis 151, 157 – S. zooepidemicus 151, 156 – streptokinase 153 – streptolysin O 153 – superantigens 153, 163 – temperate bacteriophage 166 subtractive hybridization 70

t TCA cycle 46, 56, 60 TLR see Toll-like receptor Toll-like receptors (TLRs) 451, 458, 459 Toxoplasma – genetic manipulation 429 – genome 428 ff. – introns 429 – Toxoplasma gondii 428 transcriptome analysis 21 ff. transposon 76 ff., 86, 177 ff., 513 transposon mutagenesis 71 Trichomonas 431 ff. – genome sequence 432 Trypanosoma – kinetoplast genome 430 – nuclear genome 430 – Tr. brucei gambiense 430 – Tr. brucei rhodesiense 430 – Tr. cruzi 430 tuberculosis 211

two-component regulatory system 156 type III secretion system 111 ff. – effector proteins 111 type IV pilus 78

v vaccine development 533 ff. – comparative genomics 534, 541, 543 ff. – conventional vaccinology 533 – functional genomics 534, 539 ff. – gene expression profiles 542 – host-pathogen interactions 542 – in vitro transposition 541 – proteomics 546 – recombinant vaccines 533 – reverse vaccinology 534 – Neisseria meningitidis serogroup B 536 – species-immanent variation 537 – surface-exposed antigens 537 – serological proteome analysis (SERPA) 546 – SERPA (serological proteome analysis) 546 – signature-tagged mutagenesis (STM) 540 – transcriptome analysis 541 – transposon site hybridization 541 – IVET (in vivo expression technology) 540 vancomycin 53, 125, 127 vancomycin-resistant enterococci 507 Vibrio cholerae 72 virogenomics 369 – comparative genomics 370, 379 – databases 371 ff. – DNA microarrays 383 – evolution 371 – genome neighbors 379 – identification of functional domains 377 – memory genomes 381 – molecular epidemiology 370 – multifunctional proteins 377 – mutant cloud 379 – partition analysis of quasispecies 381 – PCR 369 – polymerase chain reaction 369 – quasispecies 370 ff. – recombination 377 – reference sequence 379, 381 – reverse transcription 369 – reverse transcription PCR 369 – RT-PCR 369 – second-generation data banks 382 – viral population 370


virulence factors 96, 446 – adhesin 96 – invasins 100 – iron uptake system 96 – lipopolysaccharides 100 – polysaccharide capsules 100 – proteases 100

– secretion systems 96 – toxins 6 virulence plasmid 89 viruses 369 ff. – origin 378 VNTRs see variable-number tandem repeats