NEW APPROACHES FOR THE GENERATION AND ANALYSIS OF MICROBIAL TYPING DATA
This book is dedicated to the memory of Jan Ursing (1926-2000),
Swedish microbiologist, taxonomist and philosopher
"...taxonomy is on the borders of philosophy because we do not know the natural continuities and discontinuities..."
NEW APPROACHES FOR THE GENERATION AND ANALYSIS OF MICROBIAL TYPING DATA
Edited by
Lenie Dijkshoorn
Department of Infectious Diseases C5-P Leiden University Medical Center P.O. Box 9600 2300 RC Leiden The Netherlands
Kevin J. Towner
Public Health Laboratory University Hospital Queen's Medical Centre Nottingham NG7 2UH United Kingdom
Marc Struelens
Department of Microbiology Hôpital Erasme ULB 808, Route de Lennik B-1070 Bruxelles Belgium
2001 ELSEVIER Amsterdam - London - New York - Oxford - Paris - Shannon - Tokyo
ELSEVIER SCIENCE B.V. Sara Burgerhartstraat 25 P.O. Box 211, 1000 AE Amsterdam, The Netherlands
© 2001 Elsevier Science B.V. All rights reserved.
This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use:

Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use. Permissions may be sought directly from Elsevier Science Global Rights Department, PO Box 800, Oxford OX5 1DX, UK; phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also contact Global Rights directly through Elsevier's home page (http://www.elsevier.nl), by selecting 'Obtaining Permissions'. In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 207 631 5555; fax: (+44) 207 631 5500. Other countries may have a local reprographic rights agency for payments.

Derivative Works
Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.

Electronic Storage or Usage
Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Science Global Rights Department, at the mail, fax and e-mail addresses noted above.

Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
First edition 2001

Library of Congress Cataloging in Publication Data
A catalog record from the Library of Congress has been applied for.

ISBN: 0-444-50740-x

The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper).
Printed in The Netherlands.
Preface

Classification and identification methods for microorganisms have been based for many years on the phenotypic properties exhibited by individual isolates. However, increasing use of molecular identification and typing techniques has resulted in a re-evaluation of the whole process of classification and identification, with the recognition of new relationships and the revision of some previously accepted taxonomic schemes. Organisms can be classified at different levels, of which the most immediately important to microbiologists is the concept of the bacterial species. It is now accepted by microbial taxonomists that a complete genomic DNA sequence should form the reference standard for determining phylogeny, and hence taxonomy and species, but rRNA relatedness has also been used increasingly (and more controversially) to make assumptions about the phylogeny of microorganisms and to delineate new species. Once individual species have been defined, they can be arranged in a hierarchical system of genera and families. The different philosophical approaches underpinning microbial classification are outlined in Chapter 1 of this book, but whichever approach is used, it is important that molecular relationships should be depicted accurately in classification schemes.

Once accurate identification to the species level has been achieved, many branches of microbiology, particularly medical microbiology, require the investigation of diversity below the species level, a process referred to as typing. As with identification to the species level, many well-established typing methods, developed over many years, are based on the study of phenotypic properties. However, most of these methods are restricted to particular groups of organisms and are not capable of rapid development to deal with emerging problems.
While a complete DNA sequence would also form the ultimate reference standard for defining subtypes within a species, it is highly unlikely that routine microbiology laboratories will ever have the resources or capability to routinely sequence all of their isolates requiring investigation at this level. However, for many purposes, it is unnecessary to define the precise 'type' of an isolate. It is often sufficient to provide answers about the relatedness (or otherwise) of small collections of isolates. To this aim, a wide range of readily applicable comparative molecular fingerprinting methods has been developed over the past two decades, several of which have the potential to be used for studying microbial diversity in any competent microbiology laboratory. Details of the most important of these methods can be found in the succeeding chapters of this book. Visual comparison of the different fingerprints is often sufficient for day-to-day analysis (the intuitive pattern recognition capability of the human brain is still superior to all available forms of artificial intelligence for this purpose), but a computer-assisted strategy is required for the analysis of large sets of more complex fingerprints obtained over extended periods of time from different geographical locations. Such a strategy also enables the possibility of constructing reference databases of fingerprint patterns that can be contributed to and accessed by laboratories situated in many different countries. The aim of this book
is to describe the novel methods that are currently available for the generation, analysis and classification of microbial typing fingerprints, and to indicate how they can be used to provide timely and accurate information of use to microbiologists working in a range of different specialisations.

The idea for this book originated from the discussions held among scientists from diverse backgrounds during workshops organised by the European Study Group on Epidemiological Markers (ESGEM) of the European Society of Clinical Microbiology and Infectious Diseases (ESCMID). We hope that more such initiatives will follow as the state of the art rapidly evolves in this field. We are indebted to all the contributing authors for their collaboration, friendship and enthusiasm. We would also like to thank our families and colleagues for their encouragement, patience and understanding.
Lenie Dijkshoorn Kevin Towner Marc Struelens
List of contributors

Antoon D.L. Akkermans
Laboratory of Microbiology Wageningen University Hesselink van Suchtelenweg 4 6703 CT Wageningen The Netherlands
Tel: +31 317 483486 Fax: +31 317 483829 E-mail: antoon.akkermans@algemeen.micr.wau.nl

Dominique A. Caugant
Department of Bacteriology WHO Collaborating Centre for Reference and Research on Meningococci National Institute of Public Health PO Box 4404 Nydalen N-0403 Oslo Norway
Tel: +47 220 42311 Fax: +44 220 42518 E-mail: dominique.caugant@folkehelsa.no
and
Institute of Oral Biology University of Oslo PO Box 1052 Blindern N-0316 Oslo Norway

Willem M. de Vos
Laboratory of Microbiology Wageningen University Hesselink van Suchtelenweg 4 6703 CT Wageningen The Netherlands
Tel: +31 317 483100 Fax: +31 317 483829 E-mail: Willem.deVos@algemeen.micr.wau.nl

Ariane Deplano
Centre for Molecular Diagnostic Microbiology Department of Microbiology Hôpital Erasme Free University of Brussels 808, Route de Lennik B-1070 Brussels Belgium
Tel: +32 2 555 45 18 Fax: +32 2 555 31 10 E-mail: [email protected]

Lenie Dijkshoorn
Department of Infectious Diseases C5-P Leiden University Medical Center P.O. Box 9600 2300 RC Leiden The Netherlands
Tel: +31 (0)71 5263582 Fax: +31 (0)71 5266758 E-mail: L.Dijkshoorn@lumc.nl

Francine Grimont
Unité des Entérobactéries INSERM U389 Institut Pasteur 28 Rue du Docteur Roux F-75724 Paris Cedex 15 France
Tel: +33 1 45 68 83 44 Fax: +33 1 45 68 88 37 E-mail: [email protected]

Patrick A.D. Grimont
Unité des Entérobactéries INSERM U389 Institut Pasteur 28 Rue du Docteur Roux F-75724 Paris Cedex 15 France
Tel: +33 1 45 68 83 40 Fax: +33 1 45 68 88 37 E-mail: [email protected]

Hajo Grundmann
Division of Microbiology University of Nottingham University Hospital Queen's Medical Centre Nottingham NG7 2UH UK
Tel: +44 115 9709163 Fax: +44 115 9422190 E-mail: [email protected]

John H. Hauman
8 Oxford Street St. Clair Dunedin New Zealand
Tel: +64 3 455 9205 Fax: +64 3 455 9205 E-mail: jhhauman@es.co.nz

Herre E Heersma
Division of Public Health Research Management Team Computerisation and Methodological Consultancy National Institute of Public Health and the Environment PO Box 1 3720 BA Bilthoven The Netherlands
Tel: +31 (0) 30 2742067 Fax: +31 (0) 30 2744456 E-mail: [email protected]

Marc Heyndrickx
Department for Animal Product Quality Center for Agricultural Research Brusselse Steenweg 370 B-9090 Melle Belgium
Tel: +32 9 252 1861 Fax: +32 9 252 5085 E-mail: [email protected]

Paul J.D. Janssen
EMBL Outstation - Hinxton European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UK
Tel: +44 (0) 1223 494 418 Fax: +44 (0) 1223 494 468 E-mail: paul.janssen@advalvas.be

Kristin Kremer
Diagnostic Laboratory for Infectious Diseases and Perinatal Screening National Institute of Public Health and the Environment PO Box 1 3720 BA Bilthoven The Netherlands
Tel: +31 (0) 30 2742282 Fax: +31 (0) 30 2744418 E-mail: kristin.kremer@rivm.nl

Arjen van Ooyen
Netherlands Institute for Brain Research Meibergdreef 33 1105 AZ Amsterdam The Netherlands
Tel: +31 20 5665500 Fax: +31 20 5665483 E-mail: A.van.Ooyen@nih.knaw.nl

Raf De Ryck
Centre for Molecular Diagnostic Microbiology Department of Microbiology Hôpital Erasme Free University of Brussels 808, Route de Lennik B-1070 Brussels Belgium
Tel: +32 2 555 45 17 Fax: +32 2 555 64 59 E-mail: Raf.De.Ryck@ulb.ac.be

Nicholas A. Saunders
Molecular Biology Unit Hepatitis and Retrovirus Laboratory Central Public Health Laboratory 61 Colindale Avenue NW9 5HT London UK
Tel: +44 20 8200 4400 Ext. 3072 Fax: +44 20 8200 1569 E-mail: nsaunders@phls.nhs.uk

Dick van Soolingen
Diagnostic Laboratory for Infectious Diseases and Perinatal Screening National Institute of Public Health and the Environment PO Box 1 3720 BA Bilthoven The Netherlands
Tel: +31 (0) 30 2742363 Fax: +31 (0) 30 2744418 E-mail: D.van.soolingen@rivm.nl

Marc J. Struelens
Centre for Molecular Diagnostic Microbiology Department of Microbiology Hôpital Erasme Free University of Brussels 808, Route de Lennik B-1070 Brussels Belgium
Tel: +32 2 555 4519 Fax: +32 2 5556459 E-mail: marc.struelens@ulb.ac.be

Kevin J. Towner
Public Health Laboratory University Hospital Queen's Medical Centre Nottingham NG7 2UH UK
Tel: +44 115 9709163 Fax: +44 115 9422190 E-mail: Kevin.Towner@nott.ac.uk

Mario Vaneechoutte
Department of Clinical Chemistry, Microbiology & Immunology University Hospital Ghent De Pintelaan 185 B-9000 Ghent Belgium
Tel: +32 9 2403692 Fax: +32 9 2403659 E-mail: mario.vaneechoutte@rug.ac.be

Erwin G. Zoetendal
Laboratory of Microbiology Wageningen University Hesselink van Suchtelenweg 4 6703 CT Wageningen The Netherlands
Tel: +31 317 483486 Fax: +31 317 483829 E-mail: Erwin.Zoetendal@algemeen.micr.wau.nl
Contents

Preface
List of contributors

1 An Introduction to the Generation and Analysis of Microbial Typing Data
  L. Dijkshoorn and K. Towner
2 Theoretical Aspects of Pattern Analysis
  A. van Ooyen
3 Setting-Up Intra- and Inter-Laboratory Databases of Electrophoretic Profiles
  H.E Heersma, K. Kremer, D. van Soolingen and J. Hauman
4 Fingerprinting of Microorganisms by Protein and Lipopolysaccharide SDS-PAGE
  L. Dijkshoorn
5 rRNA Gene Restriction Pattern Determination (Ribotyping) and Computer Interpretation
  P.A.D. Grimont and F. Grimont
6 Generation and Analysis of RAPD Fingerprinting Profiles
  K. Towner and H. Grundmann
7 Analysis of Microbial Genomic Macrorestriction Patterns by Pulsed-Field Gel Electrophoresis (PFGE) Typing
  M.J. Struelens, R. De Ryck and A. Deplano
8 Selective Restriction Fragment Amplification by AFLP™
  P.J.D. Janssen
9 Application and Analysis of ARDRA Patterns in Bacterial Identification, Taxonomy and Phylogeny
  M. Vaneechoutte and M. Heyndrickx
10 Insertion Sequence (IS) Typing and Oligotyping
  N.A. Saunders
11 Molecular Characterisation of Microbial Communities Based on 16S rRNA Sequence Diversity
  E.G. Zoetendal, A.D.L. Akkermans and W.M. de Vos
12 From Multilocus Enzyme Electrophoresis to Multilocus Sequence Typing
  D.A. Caugant

Author index
Keyword index
1 An Introduction to the Generation and Analysis of Microbial Typing Data

Lenie Dijkshoorn¹ and Kevin Towner²
¹Department of Infectious Diseases, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands; ²Public Health Laboratory, University Hospital, Queen's Medical Centre, Nottingham NG7 2UH, UK
CONTENTS

1.1 TYPING, WHAT DO WE MEAN?
1.2 THE IMPORTANCE OF MICROBIAL SPECIES IDENTIFICATION AND TYPING
  A. Species identification in microbiology
  B. Discrimination below the species level
1.3 IDENTIFICATION AND TYPING IN TAXONOMIC PERSPECTIVE
  A. The concepts of identification, typing and fingerprinting
  B. Definitions
    (i) Species
    (ii) Subspecies
    (iii) Var / type
    (iv) Strain
    (v) Clone
    (vi) Isolate
  C. Common strategies in taxonomy, population biology and typing
1.4 CATEGORIES OF TYPING METHODS
  A. Phenotypic vs genotypic methods
  B. Identifying vs comparative typing methods
  C. Library typing methods
1.5 APPLICATIONS OF TYPING
  A. Applications in medical microbiology
  B. Applications in other microbiological fields
  C. Applications in eukaryotic gene analysis
1.6 AN INTRODUCTION TO COMMONLY USED TYPING METHODS
  A. Biotyping
  B. Antibiogram typing
  C. Serotyping
  D. Phage-typing
  E. Bacteriocin typing
  F. Overview of the development of genotypic typing methods
  G. Conclusions
1.7 COMPUTER-ASSISTED DATA ANALYSIS
1.8 QUALITY ASPECTS OF MICROBIAL TYPING
  A. Overview
  B. Type definition and delineation
  C. Discriminatory capacity
  D. Epidemiologic concordance
  E. Typing system concordance
  F. Reproducibility
  G. The stability of strains
  H. Typability
  I. Standardisation of data analysis
  J. Conclusions
1.9 STRAIN COLLECTION AND DATA MANAGEMENT
  A. Species identification
  B. Culture collections
  C. Storage of strains and data registration
1.10 PROSPECTS FOR THE FUTURE
REFERENCES

1.1 TYPING, WHAT DO WE MEAN?
The term typing is used frequently, but its interpretation varies depending on the context in which it is used. For example, it can be used in daily life as a very general way to describe distinctive features of an object or organism. In microbiology it is used in a stricter sense to describe a microbial isolate in such a way that it can be distinguished from other isolates. Thus, typing can be an identifying method in which a given character, e.g., a serotype, is used to identify an isolate as one already included in an existing typing scheme. Typing can also be done comparatively, which implies that microbial isolates of a defined set are compared to each other for similarity, without any reference to other isolates not included in the set, and without reference to existing classification schemes. If isolates are compared on the basis of complex characters, such as the DNA fragments comprising an electrophoretic profile, the term fingerprinting can be used instead.

The taxonomic level at which typing is performed can give rise to confusion among microbiologists. Most commonly it is understood that typing (either comparative or identifying) deals with discrimination of isolates below the species level, and this concept is used in this book in most cases. However, the term typing is sometimes used as a synonym for species identification. To some extent this is understandable since the classification of organisms is a dynamic process, and the splitting up of species into novel species previously considered subspecies or variants is not uncommon. Nevertheless, it should be noted that 'identification' implies
a priori allocation of an object or organism to an existing classification, whatever the taxonomic level. In this context, if a subdivision below species level exists, e.g., at strain level, the term identification can be used to allocate isolates to the units of such a classification. Thus, the term 'strain identification' may be appropriate.

The following sections of this chapter deal with general aspects of typing, including data management and analysis, and quality control.

1.2 THE IMPORTANCE OF MICROBIAL SPECIES IDENTIFICATION AND TYPING

A. Species identification in microbiology
Since the early days of microbiology, numerous organisms have been found to be the causative agent of a specific infectious disease. These organisms were described in detail on the basis of methods available at the time (Brock, 1988; Dubos, 1988). In the course of time, more organisms were isolated in pure culture and classified according to their characters. The groups delineated were given names, and characters were sought that allowed their rapid identification. As a result, their association with particular diseases could be clarified and further research performed on their epidemiology and pathogenicity. Hence, strategies could be developed to combat these organisms. Thus, for many infectious agents that were recognised in the first era of microbiology, their identification to a species was, and still is, the signal for the implementation of specific prevention or control measures. Consequently, the threat of severe diseases, including scarlet fever, tuberculosis and typhoid fever, has been reduced drastically in the Western world in the 20th century as compared to the 19th century.

Elucidation of the role of particular microbial species in infectious diseases and their impact on human health has long dominated microbiology. However, although at first sight perhaps less spectacular, the diversity and ecology of microorganisms in general has also been studied from the very beginning, as exemplified by the work of Beijerinck and followers of the Delft School of Microbiology (La Rivière, 1997). Thus, numerous microbial species have been delineated in the past century that have no apparent clinical significance, and many of these are still the subject of study in other fields of microbiology, including microbial genetics, physiology, environmental and food microbiology (Dubos, 1988). As a result, numerous species are exploited in different ways, e.g., Streptomyces spp. for their production of antibiotics (Korn-Wendisch & Kutzner, 1992), or thermophilic archaebacteria and eubacteria for their production of important enzymes (Bergquist et al., 1989). Other organisms, e.g., Lactobacillus spp. (Vogel & Ehrmann, 1996), are indispensable in the food industry, or are important, e.g., Sphingomonas spp. (Ye et al., 1996), for their capacity to degrade xenobiotic compounds. Exploration of the general microbial diversity and precise recognition of species and strains are important for the study and use of these organisms. Novel DNA-based techniques, a number of which are described in this book, are useful tools for exploration of the microbial world.
B. Discrimination below the species level

Exploration of the diversity and biology of microbes during the last few decades has shown that strains of the same species may vary in characters, including pathogenicity or epidemicity. Many infections and outbreaks of infections are caused by specific strains of species which are in general not particularly pathogenic. For example, bacteria of the species Escherichia coli are normal inhabitants of the human colon, but particular strains, e.g., serotype O157:H7, can give rise to severe infections and large-scale foodborne epidemics (Griffin et al., 1995). In hospitals, the advancement of health care over the past decades has been coupled with an increase in the number of severely ill patients, and also with the emergence of nosocomial pathogens. These nosocomial pathogens, including Pseudomonas aeruginosa, Klebsiella pneumoniae, Serratia marcescens, Acinetobacter spp. and Staphylococcus epidermidis, are relatively harmless to the healthy individual, but can give rise to infections in vulnerable hospitalised patients (Bergogne-Bérézin & Towner, 1996; Farmer, 1999; Archer, 2000; Pollack, 2000). Usually these organisms are resistant to multiple antibiotics, and there are indications that, at least for some species, particular strains are responsible for these problems (Pitt et al., 1989; Mulligan et al., 1993; Dijkshoorn et al., 1996). The precise genetic factors that determine the virulence or epidemicity of particular strains within these species are usually unknown, but typing can be an aid in determining strain characters (e.g., a sero- or phage type, or a DNA fingerprint) that are indicative of these pathogenic and/or epidemic strains. Thus, typing can be a tool for tracing and controlling the spread of clinically important strains.

1.3 IDENTIFICATION AND TYPING IN TAXONOMIC PERSPECTIVE

A. The concepts of identification, typing and fingerprinting
The purpose and taxonomic levels of identification, typing and fingerprinting are summarised in Table 1.1. These concepts have been discussed from a taxonomic point of view by Goodfellow & O'Donnell (1993a, b). Identification is both the act and the result of determining whether an unknown organism belongs to a previously defined group, and organisms are given names by recognising that they are members of previously described taxa. The term typing is usually used to denote the differentiation of strains at subspecies level or below, and this activity is often used to trace the spread of a single strain causing an outbreak of infection. Fingerprinting is the characterisation of organisms in such a way that multivariate data are obtained, including graphic curves resulting from mass or infrared spectrometry, or banding patterns resulting from electrophoretic separation of protein or nucleic acid fragments. These fingerprints can be used for calculating the similarity of strains, followed by their grouping on the basis of similarity.

Table 1.1. Purpose and taxonomic levels of identification, typing and fingerprinting

Activity: Identification
  Purpose: Allocation of unknown strains to described taxa
  Taxonomic level: Subspecies, species or higher

Activity: Typing
  Purpose: Allocation of unknown strains to described 'types'. Discrimination of strains (= comparison of strains on the basis of similarity in one or more characters; see fingerprinting)
  Taxonomic level: Below species level

Activity: Fingerprinting
  Purpose: Comparison of organisms on the basis of multivariate, variable, quantitative characterisation data, spectra or electrophoretic profiles
  Taxonomic level: Species level and below
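The calculation of similarity between fingerprints, followed by grouping, can be illustrated with a minimal sketch. It is not the method of any particular software package. It assumes that each fingerprint has already been reduced to a set of band positions, and it uses the Dice coefficient, one commonly used similarity measure for banding patterns, together with simple single-linkage grouping. The isolate names, band positions and the 0.7 threshold are hypothetical.

```python
from itertools import combinations


def dice_similarity(bands_a, bands_b):
    """Dice coefficient between two sets of band positions (0.0 to 1.0)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def group_by_similarity(fingerprints, threshold):
    """Single-linkage grouping: two isolates end up in the same group
    if they are connected by a chain of pairs with similarity >= threshold."""
    names = list(fingerprints)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for x, y in combinations(names, 2):
        if dice_similarity(fingerprints[x], fingerprints[y]) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical band positions in arbitrary units (illustrative only).
fingerprints = {
    "isolate_1": {10, 20, 30, 40},
    "isolate_2": {10, 20, 30, 45},
    "isolate_3": {12, 22, 50},
}
groups = group_by_similarity(fingerprints, threshold=0.7)
print(groups)
```

In this toy data set, isolate_1 and isolate_2 share three of their four bands and fall into one group, while isolate_3 shares none and stands alone. Real banding patterns also require a tolerance for small differences in band position, which this sketch deliberately omits.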
B. Definitions
Organisms can be classified at different levels. The species level, and categories below species level, have practical value in applied microbiology. As an introduction to the chapters dealing with typing methods in daily practice, it is necessary to provide a brief summary of some terms used to denote the different categories.
(i) Species

The species concept and the criteria used to delineate bacterial species are a continuous source of research and discussion (Wayne et al., 1987; Dykhuizen & Green, 1991; Pennington, 1994; Stackebrandt & Goebel, 1994; Dijkshoorn et al., 2000). In Bergey's Manual of Systematic Bacteriology, the bacterial species has been described as a collection of strains that share many features in common and differ considerably from other strains (Staley & Krieg, 1984). An ad hoc committee of the International Committee on Systematic Bacteriology has recommended that the complete DNA sequence of the genome should be the reference standard for phylogeny, and that phylogeny should determine taxonomy (Wayne et al., 1987). It was proposed that a species should be the only taxonomic unit to be defined in phylogenetic terms, and that it should include strains of c. 70% or greater DNA-DNA relatedness and with a difference of 5 °C or less in thermal stability (ΔTm). Furthermore, it was stressed that names should not be allocated to genomic groups that cannot be differentiated by phenotypic properties. The overall concern of the Committee was that phylogenetically-based taxonomic schemes must also show phenotypic consistency. However, some genomic groups that have been delineated by DNA-DNA hybridisation are phenotypically so similar that they cannot be differentiated phenotypically for the time being. In these situations it seems best to allow a nomenspecies to contain more than one genomic group, and to designate the genomic groups (genomovars) by numbers (Ursing et al., 1995).

Recently, rRNA relatedness has increasingly been the basis for assumptions on the phylogeny of organisms (Woese et al., 1987). In the course of the 1990s, 16S
rRNA sequence similarity data, rather than DNA-DNA hybridisation data, have increasingly been used for the creation of novel species that contain only one or a few strains (see e.g., Ravot et al., 1999). However, it has been pointed out by Palleroni (1993) that 16S rRNA may not reflect the evolution of host bacteria sufficiently. Thus, the variability of rRNA sequences can be limited in closely related organisms (Fox et al., 1992), while 16S rRNA sequence variation within species has also been noted (Vaneechoutte et al., 1995; Cilia et al., 1996). Recent studies have suggested that sequences of conserved proteins provide an alternative model for microbial evolution (Gupta, 2000). Data on the homogeneity or heterogeneity of conserved proteins, and of rDNA and other DNA sequences within species, are still limited, but it is expected that the rapidly expanding libraries of microbial DNA and protein sequences may provide information in the near future on the validity of particular DNA or protein sequences used for the inclusion of individual strains in a species. At present, the common practice is to delineate species on the basis of a variety of genotypic and phenotypic characters. Such a so-called polyphasic approach (Colwell, 1970; Vandamme et al., 1996) can be considered as a consensus classification, and numerous examples of species thus recognised are described in the International Journal of Systematic and Evolutionary Microbiology (formerly the International Journal of Systematic Bacteriology). These descriptions usually also include a detailed presentation of the phenotypic characters that can be used for identification of the described taxa.
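The DNA-DNA hybridisation criterion quoted earlier in this section from Wayne et al. (1987) can be written as a simple check. The sketch below is only an illustration of the two numerical thresholds (about 70% relatedness, ΔTm of 5 °C or less). The function name and the example values are hypothetical, and, as the text stresses, such a result would still need to be supported by phenotypic consistency before a species is named.

```python
def meets_genomic_species_criterion(dna_relatedness_pct, delta_tm_celsius):
    """Return True if a strain pair meets both thresholds quoted from
    Wayne et al. (1987): DNA-DNA relatedness of about 70% or greater
    and a difference in thermal stability (delta Tm) of 5 degrees C or less.

    The 70% figure is approximate ("c. 70%") in the recommendation,
    so borderline values call for judgement rather than a hard cut-off.
    """
    return dna_relatedness_pct >= 70.0 and delta_tm_celsius <= 5.0


# Hypothetical measurements for three strain pairs (illustrative only).
print(meets_genomic_species_criterion(82.0, 2.5))  # both thresholds met
print(meets_genomic_species_criterion(65.0, 2.0))  # relatedness too low
print(meets_genomic_species_criterion(75.0, 6.0))  # delta Tm too high
```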
(ii) Subspecies

In Bergey's Manual of Systematic Bacteriology (Staley & Krieg, 1984), the subspecies is described as the lowest taxonomic rank with a nomenclatural standing, but clear-cut criteria for this rank were not given by the authors, nor by the International Committee on Systematic Bacteriology (Wayne et al., 1987). Intra-subspecies ranks (e.g., serovars or phagovars; see below) have been acknowledged to have great practical usefulness, but they have no official standing in nomenclature.
(iii) Var / type

It was recommended in Bergey's Manual of Systematic Bacteriology (Staley & Krieg, 1984) that the term 'type' should not be used to denote, for instance, serotypes or biotypes. Instead it was suggested that the term 'var' (derived from variety) should be used, since the term 'type' should only be used to denote an example of a species or genus. However, this recommendation has received little appreciation or attention, and in the different fields of applied microbiology it is common to describe entities below the species level as types. These so-called types can be distinguished within a set of isolates on the basis of one single typing method (e.g., serotyping or biotyping), or on the basis of a combination of typing methods (Sloos et al., 1996; Bernards et al., 1997; Van Pelt et al., 1999).
(iv) Strain

In laboratory jargon, a strain is usually considered to be a culture of a specific microorganism which is on the bench of the laboratory worker or stored in the refrigerator. As such, this concept corresponds to the description of Staley & Krieg (1984), i.e., 'a strain is made up of descendants of a single isolation in pure culture and is usually made up of a succession of cultures ultimately derived from a single colony'. A strain in this sense can be stored over longer periods under a specific designation and has been denoted as the strain in the taxonomic sense (Dijkshoorn et al., 2000). The strain in the taxonomic sense may change genetically over time, but will keep its designation. In addition, another type of strain was considered, i.e., a strain in nature, which might include (e.g.) different isolates from different body sites of the same patient which are assumed to have been derived from an initial single colony, but not under human control. The concept of the strain in nature also applies to isolates of the same species that have spread among patients by cross infection, and which are assumed to represent the same strain, based on the finding that they exhibit common phenotypic and genotypic traits that are distinctive from those of other isolates of the same species (Struelens et al., 1996).

(v) Clone

In many biological textbooks, the term clone is used to indicate all cells that descended from a common ancestor. This concept is relatively straightforward when applied in hospital epidemiology to isolates in a direct chain of replication and transmission from host to host or from the environment to host (Struelens et al., 1996). In this sense, the terms strain (in nature) and clone are synonyms, and their recognition can be derived from data obtained by one or several typing methods, combined with data on the epidemiological origin of the organisms (Dijkshoorn et al., 2000).
The term clone is also used in a wider context 'to denote bacterial cultures isolated independently from different sources, in different locations, and perhaps at different times, but showing so many identical phenotypic and genetic traits that the most likely explanation for this identity is a common origin' (Orskov & Orskov, 1983). Numerous population studies have suggested that a variety of bacterial species contain clones with a wide geographic spread (e.g., Achtman et al., 1983; Ochman & Selander, 1984; Musser & Selander, 1990), while other species are panmictic or superficially clonal (Maynard Smith et al., 1993; see also Chapter 12). During the numerous generations in a line of descent, the genetic diversity within a clone will increase, and there are no strict rules to decide whether or not isolates belong to a clone. In practice, clones are delineated using methods developed by population biologists, including multilocus enzyme electrophoresis (MLEE) or multilocus sequence typing (MLST), often supplemented by genotypic and phenotypic methods currently used in applied microbiology. The current trend is that population biologists and applied microbiologists are combining their efforts to obtain a better understanding of the diversity or non-diversity within bacterial species and, in particular, of the emergence of specific virulent clones (Selander & Musser, 1990).
(vi) Isolate Any pure culture (or subculture) of a bacterial species on a solid or liquid culture medium can be denoted as an isolate.
C. Common strategies in taxonomy, population biology and typing
At first sight, the delineation of species is the field of the taxonomist, while the recognition of clones is the field of the population biologist or, as the case may be, the applied microbiologist, including the clinical microbiologist or plant biologist. On closer inspection, this distinction is arbitrary. Apart from DNA-DNA hybridisation, the strategies used in these fields are increasingly the same, since many studies use a polyphasic approach combining DNA sequence analysis of specific genes or genomic fingerprinting techniques, and one or several methods for phenotypic characterisation. Thus, the study of diversity below the species level does not basically differ from the approaches used in taxonomy, and might be considered as 'taxonomy below the species level', although there are no formal recommendations for this process and there is a great lack of uniformity in the use of methods and criteria for delineation of the groups.
1.4 CATEGORIES OF TYPING METHODS
Typing methods can be grouped in different categories, including phenotypic vs genotypic methods, and comparative vs definitive (absolute or identifying) methods. Recently another term, library typing, has been proposed.
A. Phenotypic vs genotypic methods
Most commonly used typing methods are denoted as phenotypic or genotypic, depending on the markers used. In the former case, the characters used for distinction of organisms are phenotypic, e.g., an epitope which can be detected with specific antibodies, or the susceptibility to a specific bacteriophage (see section 1.6). If DNA or RNA are the chemical structures to be used for discrimination, then the method is considered as 'genotypic'. The term molecular typing, if used as a synonym for genotyping, is somewhat misleading since it may refer not only to DNA- or RNA-based methods, but also to methods that investigate other chemical classes at the molecular level, including lipopolysaccharides, lipids or proteins. Norris (1980) distinguished different levels of genetic information in living cells, and proposed a scheme for the study of information at these different levels. An updated scheme with currently applied methods is presented in Table 1.2. At the first level, the genetic information is represented by itself, and the methods that differentiate organisms at this level are genotypic methods. At the second level, the genetic information is expressed in the structure of proteins; at the third level, in the structure of all other cell components and products; and at the fourth level, in the morphology and behaviour of cells. The methods that use markers at levels 2-4 are phenotypic methods. The complexity of interactions of gene products increases from level 1 to 4, and accordingly the genetic relatedness of microorganisms is more difficult to deduce.
Table 1.2. Levels of expression of genetic information and typing/identification methods for each level
Level, marker/character: methods
Level 1, the genome: GC determination, DNA/DNA and DNA/RNA hybridisation, PFGE, PCR fingerprinting, DNA sequencing and methods derived from sequencing, RFLP analysis (including ribotyping), AFLP, ARDRA
Level 2, proteins: gel electrophoresis including SDS-PAGE and MLEE, amino-acid sequencing, serology
Level 3, cell components: determination of amino-acid pools and cell wall composition, lipid analysis, infra-red spectrometry, pyrolysis gas liquid chromatography and mass spectrometry, bacteriocin and phage-typing, serology
Level 4, morphology and behaviour: microscopic structure, motility, enzyme tests, physiology, antibiotic susceptibility, nutritional requirements
Adapted from Norris (1980).
B. Identifying vs comparative typing methods
Apart from subdivisions based on the level of expression of the genome, typing methods can be subdivided according to the category of result obtained. The two categories, i.e., identifying and comparative methods, have already been outlined in section 1.1. With identifying (determinative) typing, organisms can be allocated to an already described type in an existing classification (typing) scheme. This would be the case with (e.g.) serotyping or phage-typing. In comparative typing, a group of organisms are compared to each other and grouped according to their similarity. In this approach, a set of organisms is considered as a whole, and comparisons are within this set without reference to an existing classification scheme. This approach is followed when there is no existing classification scheme encompassing all known types. Most of the recently developed genotypic typing methods fall into this second category.
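The contrast between the two categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reaction patterns, the `SCHEME` table, the type names and the function names are all hypothetical and do not correspond to any published typing scheme.

```python
# Hypothetical reference scheme: reaction pattern -> named type.
# Patterns and type names are invented for illustration only.
SCHEME = {
    ("+", "-", "+"): "type 1",
    ("-", "-", "+"): "type 2",
}

def identify(profile):
    """Identifying typing: allocate an isolate to a type that is
    already described in an existing scheme."""
    return SCHEME.get(tuple(profile), "not in scheme")

def compare(isolates):
    """Comparative typing: group the isolates of one set by their
    mutual identity, using arbitrary labels (A, B, ...) that have
    no meaning outside this set."""
    labels = {}
    assignment = {}
    for name, profile in isolates.items():
        key = tuple(profile)
        if key not in labels:
            labels[key] = chr(ord("A") + len(labels))
        assignment[name] = labels[key]
    return assignment

isolates = {"iso1": "+-+", "iso2": "--+", "iso3": "+-+", "iso4": "+++"}
```

Here `identify` can only name isolates whose pattern is already in the scheme, whereas `compare` groups every isolate but yields labels that cannot be carried over to another study.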
C. Library typing methods
A major advantage of some phenotypic typing methods is that the data can be compared between laboratories. Thus, provided that the tools are available, any microbiologist in the world can identify local isolates by reference to the existing classification scheme, and consequently the geographic spread of these types (vars) can be investigated. In contrast, most genotypic fingerprinting methods may allow local recognition of an epidemic strain, but because of the inherent inter-laboratory variation, conclusions as to the geographic distribution of these strains have generally not been possible, despite the vast increase in the use of these methods. It is now realised that so-called library typing methods are essential for large-scale surveillance systems and the study of the prevalence of particular strains or clones in the population (Struelens et al., 1998). These methods must be standardised between laboratories, must use a uniform nomenclature, and should have a high throughput. Apart from serotyping, some promising library methods described elsewhere in this book are ribotyping (Chapter 5), selective amplification of restriction fragments by AFLP (Chapter 8), PCR-RFLP including ARDRA (Chapter 9), and insertion sequence fingerprinting (Chapter 10). In addition, binary typing (Van Leeuwen et al., 1999; Zadoks et al., 2000), combined with DNA chip technology, is a promising method for the future. However, with most of these methods, the necessary high throughput remains to be achieved.
1.5 APPLICATIONS OF TYPING
Microbial typing is probably applied most frequently in clinical, veterinary and food microbiology, and its uses in these fields are diverse (e.g., Maslow et al., 1993; Pitt, 1994; Farber, 1996; Maslow & Mulligan, 1996; Goering, 1998). There are also numerous applications in other fields of microbiology, and genomic fingerprinting methods are also used in eukaryotic genome research. Some of the most obvious applications in the different fields are listed below and many examples can be found in the following chapters.
A. Applications in medical microbiology
Some examples of typing applications in medical microbiology are:
• solving diagnostic problems, e.g., determining in a patient with suspected endocarditis whether serial blood isolates represent the same strain;
• determining whether multiple isolates from the same hospital department represent an epidemic or endemic strain;
• elucidating the ecology or geographical spread of particular strains;
• studying the population structure of certain bacterial species;
• tracing specific types with known pathogenic features, e.g., certain Salmonella or E. coli serovars;
• studying microbial communities.
B. Applications in other microbiological fields
Comparative analysis of the genotypic and phenotypic characters of microorganisms finds many applications in non-medical fields. Some examples of their use are:
• in veterinary microbiology to investigate infections and outbreaks among animals, much along the same lines as in medical microbiology;
• in plant biology to trace plant pathogens or organisms in the rhizosphere;
• in the food industry, where the tracing of specific pathogens such as E. coli O157 or certain Listeria strains in food products is not only relevant for the food industry and public health, but also has legal and financial implications;
• in environmental microbiology where, as is the case in the food industry, identification of microorganisms with pathogenic potential, e.g., Legionella pneumophila, in environmental sources is important for public health and also has legal implications; in addition, fingerprinting methods are increasingly used to assess the microbial diversity of communities in the environment and the changes exerted by (e.g.) pollution;
• in taxonomy, where numerous examples can be found in the International Journal of Systematic and Evolutionary Microbiology.
C. Applications in eukaryotic gene analysis
Numerous applications involving the generation and analysis of genomic fingerprints can be found in studies of eukaryotic genomes. These include:
• mutation analysis in the study of genetic disorders;
• studies in evolutionary biology;
• comparative analysis of genomes in forensic research, human history or anthropology;
• quality control of plant seeds in agriculture.
1.6 AN INTRODUCTION TO COMMONLY USED TYPING METHODS
The previous sections have described how typing methods can be divided into a number of different categories, including the major sub-division into phenotypic as opposed to genotypic methods. A complete DNA sequence would form the ultimate reference standard for recognising sub-types within a species, but short of achieving this ideal, any typing technique relies on finding detectable differences between isolates. Many existing typing systems for bacteria, particularly in clinical microbiology, are based on the recognition of differences in specific phenotypic properties and have largely stood the test of time for their particular applications. Thus, the main phenotypic typing techniques of biotyping, phage-typing, serotyping and bacteriocin typing are well-established and have been applied to a wide range of microorganisms. In addition to these, some newer 'molecular' phenotypic methods, such as typing based on analysis of proteins or lipopolysaccharides (see Chapter 4), also lend themselves readily to analysis by the fingerprinting techniques that are the main subject of this book. While the newer genotypic methods may eventually supersede the existing phenotypic methods, this is unlikely to happen in the short-term for species that have well-established phenotypic typing systems. It is therefore important to briefly consider the main advantages and disadvantages of these established methods in comparison with the newer genotypic methods that may eventually replace them.
A. Biotyping
Initial differentiation within a newly-delineated species is often achieved by examining the cultural and biochemical characteristics of a large collection of individual strains belonging to the species. Such characteristics may include colonial morphology, growth requirements, fermentation ability, carbon source utilisation and antibiotic resistances. However, as will be seen below, a major disadvantage shared by many phenotypic typing methods is that such properties may be rather difficult to interpret or subjective in their determination. Many laboratories use commercially available galleries of tests, such as the API 20E system (bioMérieux, Marcy l'Etoile, France) with 20 different tests, to provide a biochemical profile from which different biotypes can be identified. However, variations in the duration of incubation and the inoculum size may affect interpretation of the results, while strains that are freshly isolated may exhibit different reactions compared with strains that have been stored. Newer automated biochemical fingerprinting systems have been reported to give good reproducibility and discrimination for certain species, e.g., Enterobacter cloacae (Kühn et al., 1991), but in many cases there may only be a very limited number of biochemical types within a species, resulting partly from the fact that traditional taxonomic procedures, particularly in clinical microbiology, may have relied on minor biochemical differences to define the species in the first place. However, if a suitable level of discrimination can be achieved, results can be scored as positive or negative, and then assessed by computer software.
In such cases, following the calculation of similarity coefficients between every possible pair of isolates, the software can use an unweighted pair group method (see Chapter 2) to generate a dendrogram from a matrix of similarity coefficient values as an illustration of the inter-relationships of a particular set of isolates (e.g., Webster et al., 1996).
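The procedure described above can be sketched as follows. This is a minimal illustration, not the method of any particular software package: the simple matching coefficient is assumed as the similarity measure (the text does not prescribe one), and the isolate names and the eight-test profiles are invented.

```python
from itertools import combinations

def simple_matching(a, b):
    """Fraction of tests on which two +/- (1/0) profiles agree.
    Assumed here as the similarity coefficient."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def upgma(profiles):
    """Unweighted pair group method with arithmetic averages.
    Returns the merge steps as (cluster_a, cluster_b, similarity);
    each cluster is a tuple of isolate names. The list of merges
    is the information a dendrogram would be drawn from."""
    active = [(name,) for name in profiles]
    # Similarity matrix, keyed by the unordered pair of clusters.
    sim = {
        frozenset(((a,), (b,))): simple_matching(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    merges = []
    while len(active) > 1:
        # Join the two most similar clusters.
        a, b = max(combinations(active, 2), key=lambda p: sim[frozenset(p)])
        merged = a + b
        for c in active:
            if c in (a, b):
                continue
            # Average similarity, weighted by cluster size.
            sim[frozenset((merged, c))] = (
                len(a) * sim[frozenset((a, c))]
                + len(b) * sim[frozenset((b, c))]
            ) / (len(a) + len(b))
        merges.append((a, b, sim[frozenset((a, b))]))
        active = [c for c in active if c not in (a, b)] + [merged]
    return merges

# Hypothetical biochemical profiles (1 = positive, 0 = negative).
profiles = {
    "iso1": [1, 1, 0, 1, 0, 0, 1, 1],
    "iso2": [1, 1, 0, 1, 0, 0, 1, 0],
    "iso3": [0, 0, 1, 0, 1, 1, 0, 0],
    "iso4": [0, 0, 1, 0, 1, 1, 1, 0],
}
merges = upgma(profiles)
```

With these invented profiles, iso1 and iso2 join at a similarity of 0.875, as do iso3 and iso4, and the two pairs are then linked at 0.125, the mean of the four between-pair similarities.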
B. Antibiogram typing
So far as routine clinical microbiology laboratories are concerned, the first suspicion that an outbreak of infection is occurring is often still based on the observation of an antibiotic susceptibility pattern that is shared by a number of isolates of the same species. Antibiotic susceptibility or resistance profiles are normally easy to determine, but may be associated with potentially unstable extrachromosomal R plasmids. The relatively small number of agents with the potential to give different results also means that discrimination can be poor. Thus, while it is useful to identify recurring resistance patterns in a local situation, such patterns sometimes have little value when considered in isolation in comparative studies involving several different centres. Antibiograms of isolates are often expressed simply in terms of 'resistant' or 'sensitive' depending on the breakpoints used. Such an approach has the effect of 'smoothing-out' subtler differences that may exist between strains. An alternative approach is to use a standardised disk inhibition zone method (Horrevorts et al., 1995) which makes full use of all the available information. This method can be used, at least in the short-term, to generate reproducible and epidemiologically useful results for some organisms (Horrevorts et al., 1995; Webster et al., 1996). In such cases, similarity coefficients can be calculated as described above from the diameters of the inhibition zones (Blanc et al., 1994; 1996). However, in general, it can be concluded that the determination of biotypes or antibiotic resistance profiles within a collection of isolates, while possibly helpful in a limited short-term local investigation, will only be useful in association with other typing methods for long-term investigations covering different epidemiological outbreaks or ecological situations.
C. Serotyping
Serotyping is one of the oldest typing procedures and still represents an important tool for typing isolates belonging to many microbial species (Towner & Cockayne, 1993). The method has been developed in particular detail for members of the Enterobacteriaceae (Ewing, 1986). Some of the original serotyping methods, e.g., slide agglutination and immunofluorescence tests, are still used widely today. Such tests are technically simple, albeit time-consuming if large numbers of isolates are to be examined. Immunoblotting allows the generation of complex antigen-based fingerprints that can be analysed with computers in much the same way as results from the other molecular fingerprinting techniques described in this book. ELISA tests can be used in large-scale studies where polysaccharide or heat-labile protein antigens are important for the differentiation of isolates. All of these techniques can be used with either polyclonal or monoclonal antibodies. Serotyping has the advantage that it can be applied to many different genera, although a given set of reagents can usually only be applied to a single species. Many polyclonal and monoclonal antibodies used for routine typing of clinically important organisms are available commercially or through reference laboratories, but the complexity of raising and cross-testing antibodies means that it may require a considerable time to develop a serotyping scheme for a novel application. In general, it seems that the serotype of a microbe is a relatively stable and reliable typing marker, although some possible changes in structural antigens, related particularly to lysogenic conversion, have been noted occasionally (Meitert & Meitert, 1978). The main disadvantages of serotyping seem to be associated with problems in antisera production, standardisation of methodology and, sometimes, subjective assessment of results. 
It should also be noted that, since the method depends on the production of a range of specific antisera, a task which is logistically and ethically difficult for a routine microbiology laboratory to perform, serotyping schemes for certain genera are normally only available at central reference laboratories.
D. Phage-typing
The basis of phage-typing is the variable sensitivity of isolates to defined collections of bacteriophages which have been selected to provide the maximum sensitivity for differentiating isolates within a particular species. Phage-typing schemes have been developed for numerous bacterial species (Towner & Cockayne, 1993), many of which had not previously been typed successfully by other methods. Phage-typing is the most widely recognised typing method for Staphylococcus aureus (Parker, 1972), and is also still used widely for sub-dividing serotypes of P. aeruginosa (Bergan, 1978) and Salmonella / Shigella spp. (Guinée & van Leeuwen, 1978; Bergan, 1979). Phage-typing schemes are highly sensitive, but have a number of important limitations (Meitert & Meitert, 1978). First, phage-typing is a technically demanding procedure in which environmental conditions and other variables must be controlled carefully. Second, phage-types can change following lysogenic conversion, loss of prophages, or gain or loss of R plasmids, and this variability is coupled with the continuous need to maintain the typing set of bacteriophages in a viable state by regular serial passage. Third, the discrimination is somewhat variable in that some species may contain too few phage-types and other species may contain several phage-types within a single strain, while with certain species (e.g., Acinetobacter spp.; Bouvet, 1991), a phage-typing scheme may only be reliable with isolates from a particular geographical region. Finally, sets of phages for typing purposes have been developed and refined for particular species over several decades, and it is difficult to apply phage-typing to a novel application in response to a sudden emerging clinical problem.
Overall, it seems likely that large laboratories will still continue to perform phage-typing for certain bacterial species in the short to medium term, but smaller laboratories will usually find it necessary to send their isolates to a central reference facility. The phage-typing approach does not lend itself readily to computer-assisted analysis, and phage-typing methods are gradually being superseded by genotypic techniques.
E. Bacteriocin typing
Bactericidal substances, normally proteins, which are active against different strains of bacteria are termed bacteriocins. Typing based on bacteriocins is normally performed by testing, often by cross-streaking, the sensitivity of 'unknown' isolates to bacteriocins produced by a set of standard selected strains. In general, a strain that produces a particular bacteriocin is also resistant to its action. Although bacteriocin production can be encoded by transmissible R plasmids, production of, and sensitivity to, bacteriocins seem to be relatively stable properties. Bacteriocin typing has been applied to a range of different bacteria (Towner & Cockayne, 1993), particularly P. aeruginosa and members of the Enterobacteriaceae, but the method is relatively labour intensive and also requires considerable development work before it can be used for a novel application.
Table 1.3. Development of genotypic typing methods
First generation
• Analysis of plasmid content
• Plasmid DNA restriction digests
Second generation
• Total chromosomal restriction digests
• Analysis of RFLPs by hybridisation with probes
  - Ribotyping
Third generation
• Pulsed-field gel electrophoresis (PFGE)
• PCR-based amplification methods
  - RAPD
  - REP-PCR
  - AFLP
  - PCR-ribotyping
Fourth generation
• Multilocus sequence typing (MLST)
• DNA sequencing
F. Overview of the development of genotypic typing methods
All phenotypic methods suffer from the disadvantage that the observable characteristics are only an expression of the underlying genotype, and may or may not reflect the actual state of genetic relatedness in a group of isolates. In recent years, the scientific community has been confronted by an avalanche of new genotypic fingerprinting techniques, often with confusing or overlapping names and terminologies (reviewed in detail by Vaneechoutte, 1996). Nevertheless, a retrospective examination allows the development of genotypic typing methods to be divided into a number of phases or 'generations' (Goering, 2000; Table 1.3). Thus, the 'first generation' approach evolved in the 1970s and was based on the analysis of bacterial cells for the carriage of plasmids, together with more detailed analysis of restriction fragment length polymorphisms (RFLPs) by restriction endonuclease digestion of plasmid DNA. However, the potentially transient nature of plasmids meant that this approach was largely superseded in the early 1980s by techniques that examined RFLPs in the whole bacterial chromosome. Initial efforts using restriction endonuclease digestion and/or DNA probing techniques can be viewed as the 'second generation' methods. The development of pulsed-field gel electrophoresis (PFGE) in the mid-1980s ushered in the 'third generation', and PFGE has proven itself to be an especially powerful epidemiological tool that allows global chromosomal comparisons. At the same time, the discovery of PCR led to the development of a plethora of amplification-based techniques. Most recently, 'fourth generation' approaches involving a direct comparison of chromosomal sequences have started to assess potential inter-relationships between individual isolates at the most fundamental level. Examples and more details of the most useful genotypic approaches to typing can be found in the subsequent chapters of this book.
Table 1.4. Desired properties for an ideal typing scheme
1. Ability to type the vast majority of isolates of a particular species encountered (i.e., a very low proportion of untypable isolates)
2. Good discrimination, with the ability to recognise a reasonable number of types
3. Good reproducibility over a long period of time and in different centres
4. Readily applicable to natural isolates in addition to laboratory collections
5. Rapid
6. Amenable to computerised analysis and comparison with electronic databases
7. Not too complicated or expensive
Fig. 1.1. Taxonomic resolution of current genotypic and phenotypic typing methods. (Courtesy of Dr Paul Janssen, Cambridge, UK.)
G. Conclusions
As will become apparent in the rest of this book, molecular fingerprinting methods are rapidly becoming the most commonly used approaches for assessing the relatedness of microorganisms in epidemiological studies. Such strategies are most often based on genotypic characteristics but, as described above, certain phenotypic properties can also be used. However, before using a particular strategy, it is important to ensure that the method distinguishes unrelated isolates, is capable of identifying the same strain in separate samples, and reflects genetic relatedness between epidemiologically linked isolates (see section 1.8). From a practical point of view, an ideal typing system should have a number of important characteristics (Table 1.4). Different methods are capable of varying levels of discrimination according to the precise organisms being studied, but an overall impression of the different levels of identification and typing achievable with different approaches is shown in Fig. 1.1. From the above overview, it can be seen that none of the common phenotypic typing methods offers an ideal approach for the sub-dividing of microbial species. The importance and applicability of each of the methods may vary from one species to another, and also according to the precise geographical location in which they are being used. For some species, combined use of several different phenotypic methods may offer a reasonable approach, but such an approach may not be possible, or may take a long time to develop, for organisms which have not been well-studied previously. In contrast, genotypic methods have the potential to be used for studying diversity in any microbial species, with some genotypic methods also offering the possibility of providing a 'universal' approach to microbial typing in which the same basic methodology can be used to study isolates of any microorganism.
1.7 COMPUTER-ASSISTED DATA ANALYSIS
The human eye and brain are very efficient at recognising differences in electrophoretic patterns in two adjoining lanes of a gel and in correcting for distortions in gels due to technical problems. However, visual comparison and grouping of multiple profiles is extremely difficult, and the analysis of complex electrophoretic patterns is therefore dependent on the use of computer software. This is also the case for data analysis of the large number of phenotypic characters that can be generated by automated phenotypic identification systems such as the Biolog system (Biolog Inc., Hayward, CA, USA). The most commonly used analytical method, cluster or pattern analysis, involves the pairwise comparison of isolates or objects, followed by their grouping on the basis of similarity, and their subsequent depiction in a tree-like structure termed a dendrogram. The idea of grouping organisms on the basis of their overall similarity in terms of large numbers of characters was originally suggested by the botanist Michel Adanson (1727-1806). However, it was not until high-speed computers became available for calculating these similarities that this concept could be applied in different disciplines. This approach, called numerical taxonomy, was developed for microbiology in the pioneering studies of Sneath (1957a, b), and was initially used for phenotypic characters that were scored as (+) or (-) (e.g., Baumann et al., 1968). The underlying principles of numerical taxonomy are explained in detail elsewhere (Sneath, 1972; Sneath & Sokal, 1973; Bock, 1974; Aldenderfer & Blashfield, 1984; Everitt, 1998). A detailed description of how to score phenotypic characters, including morphological and physiological traits in numerical taxonomy, can be found in Lockhart & Liston (1970). An overview and practical details of applications in microbial fingerprinting are given in Chapters 2 and 3 of the present book.
The automated classification of microorganisms on the basis of protein electrophoretic fingerprints was first described by Kersters & De Ley (1975). A similar approach was also followed by Jackman (1985). Since the 1980s, there has been an explosive increase in the use of computer-assisted analysis of electrophoretic fingerprints in microbial taxonomy and epidemiology. The computer program originally developed by the group of Kersters and De Ley has evolved over the years into the highly advanced commercial software packages GelCompar and BioNumerics (Applied Maths, Kortrijk, Belgium). Other commercial software packages for the analysis of fingerprints include Taxotron (Institut Pasteur, Paris, France), Dendron (Solltech, Oakdale, IA, USA) and Bio Image Whole Band Analyser (Bio Image Inc., Ann Arbor, MI, USA). As already discussed in previous sections, cluster analysis can be applied to both genotypic and phenotypic data, including DNA fragment electrophoretic patterns, profiles generated by mass spectrometry or gas chromatography, or biochemical profiles. A challenging development is the increase in availability of automated systems that generate complex phenotypic data sets. Examples are (i) fatty acid methyl ester analysis (FAME; MIDI Inc., Newark, DE, USA), (ii) 'fingerprint' analysis of the whole organism composition by mass spectrometry (Micromass, Manchester, UK), (iii) automated phenotypic profiling (Biolog Inc.), and (iv) automated antibiotic susceptibility testing based on diffusion zone image capturing by the use of (e.g.) the Sirscan system (i2a, Montpellier, France) or the Biomic system (Giles Scientific, Santa Barbara, CA, USA). These systems provide either spectra or complex data sets, and comparison of organisms on the basis of these character sets cannot be achieved without cluster or principal component analysis. The challenge is that these systems have a high throughput and that the data generated sometimes need to be analysed in combination with other data sets, e.g., genomic fingerprints or information on the epidemiological origin of the organisms obtained from a hospital's information system. The recently available BioNumerics software package allows combinations of all possible types of characters to be analysed. Such an integrated, polyphasic analysis of numerous genotypic and phenotypic characters, combined with automated data capturing, is a promising novel approach for research on microbial diversity at different taxonomic levels.
1.8 QUALITY ASPECTS OF MICROBIAL TYPING
A. Overview
When a typing system is being developed, a first critical question is whether the system provides distinctive fingerprints from different (i.e., genetically unrelated) isolates, and whether identical or highly similar fingerprints are generated from closely related organisms. If results are positive, further questions concerning the quality of the system become relevant, i.e., how reproducible is the method, how stable is the marker, and how large is the proportion of strains that can be typed? These quality criteria, together with more practical criteria that a typing system should meet, including cost, ease and speed, have been addressed in several previous reviews (e.g., Maslow & Mulligan, 1996; Struelens et al., 1996; Tenover et al., 1997). Some recommendations for typing microorganisms, most of which conform to the guidelines of the European Study Group on Epidemiological Markers (ESGEM) (Struelens et al., 1996) are discussed in the following sections. However, these recommendations and quality criteria cannot be viewed apart from the basic question of what is a type and how to delineate it?
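Some of the quality criteria mentioned above can be expressed as simple proportions, as in the following sketch. The formulas are assumptions chosen for illustration, not definitions taken from the cited guidelines: typeability is taken as the fraction of isolates that yield a result, reproducibility as the fraction of isolates giving the same result on repeat testing, and discrimination is summarised by Simpson's index of diversity, a measure that is not named in the text above. All data are invented.

```python
from collections import Counter

def typeability(results):
    """Proportion of isolates that could be assigned a type.
    Untypable isolates are recorded as None (assumed convention)."""
    return sum(r is not None for r in results) / len(results)

def reproducibility(first_run, second_run):
    """Proportion of isolates given the same result in two
    independent runs of the same method."""
    same = sum(a == b for a, b in zip(first_run, second_run))
    return same / len(first_run)

def simpsons_index(types):
    """Simpson's index of diversity: the probability that two
    different isolates drawn at random from the set belong to
    different types."""
    n = len(types)
    counts = Counter(types)
    return 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Hypothetical typing results for eight isolates (None = untypable).
results = ["A", "A", "B", "C", None, "B", "A", "D"]
typed = [r for r in results if r is not None]

proportion_typed = typeability(results)      # 7 of 8 isolates
diversity = simpsons_index(typed)            # 1 - 8/42
repeatability = reproducibility(["A", "A", "B", "C"],
                                ["A", "A", "B", "D"])  # 3 of 4 agree
```

Stability of the marker over time is not covered here; it would require repeated testing of the same strain after storage or passage.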
B. Type definition and delineation
With serotyping and phage-typing, a type is defined by its reactivity to an antiserum or phage. When fingerprinting methods are used, a limited number of fingerprints, e.g., PFGE profiles from a set of isolates in an outbreak, can be examined visually to determine whether or not they are identical. Patterns can be designated by letters or numbers for the purpose of such a study. Unambiguous discrimination of 'types' can still be difficult, since small differences in patterns between isolates may raise the question of whether or not the isolates are epidemically related. Suggestions on how to interpret visually the pattern variations generated by PFGE and other typing methods have been published (Tenover et al., 1995; 1997). Nevertheless, a straightforward conclusion on the basis of just one or a few band differences is often difficult.
When large numbers of fingerprints are compared, this is usually done by computer-assisted pattern analysis. With this approach, organisms are grouped according to the similarity of their fingerprints. As is the case with visual analysis, it is often difficult to decide where to put the cutting level to delineate clusters that may represent groups of epidemiologically related isolates. If distinctive clusters contain isolates with a high similarity level, and different clusters are well separated, then these clusters may represent distinctive types. Knowledge of the natural discontinuities between strains within a species, as revealed by genotypic fingerprinting methods, is insufficient to define precise cutting levels for such an approach. Thus, type delineation in such situations has to be based on as many features of the organisms being studied as possible, including the origin of the organisms in time and space, their ecology, and the correlation with other (genomic) typing methods. Furthermore, visual inspection of the profiles of isolates in a cluster should intuitively corroborate the decision to use a particular cutting level.
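The dependence of type delineation on the chosen cutting level can be illustrated with a minimal sketch. It assumes that each fingerprint has already been reduced to a set of band positions, and it uses the Dice coefficient as the similarity measure and single-linkage grouping as the clustering rule. These choices, the function names and the band values are illustrative assumptions, not a procedure prescribed by the publications cited above.

```python
from itertools import combinations


def dice_similarity(bands_a, bands_b):
    """Dice coefficient between two sets of band positions (0.0 to 1.0)."""
    if not bands_a and not bands_b:
        return 1.0
    shared = len(bands_a & bands_b)
    return 2 * shared / (len(bands_a) + len(bands_b))


def cluster_at_cutoff(fingerprints, cutoff):
    """Single-linkage grouping: isolates connected by any chain of pairwise
    similarities >= cutoff are placed in the same cluster."""
    parent = {name: name for name in fingerprints}

    def find(name):
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(fingerprints, 2):
        if dice_similarity(fingerprints[a], fingerprints[b]) >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for name in fingerprints:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical band positions for four isolates (arbitrary units).
fingerprints = {
    "iso1": {50, 100, 150, 200, 300},
    "iso2": {50, 100, 150, 200, 310},
    "iso3": {60, 120, 240, 400},
    "iso4": {60, 120, 240, 410},
}

strict = cluster_at_cutoff(fingerprints, 0.90)
lenient = cluster_at_cutoff(fingerprints, 0.70)
```

With these invented data, a cutting level of 0.90 leaves every isolate as a separate 'type', whereas a level of 0.70 yields two clusters. The same fingerprints therefore support different conclusions depending on the cut-off, which is why additional epidemiological information is needed to justify the level chosen.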
C.
Discriminatory capacity
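Discriminatory capacity is commonly summarised by the Simpson index of diversity, D, i.e., the probability that two unrelated strains drawn at random from the test collection are assigned to different types. A minimal sketch of the calculation is given here, assuming the usual formulation D = 1 - [1 / (N(N - 1))] x sum over types of n_j(n_j - 1), where N is the number of strains tested and n_j the number of strains of type j. The function name and the example type labels are hypothetical.

```python
from collections import Counter


def simpson_diversity(type_assignments):
    """Simpson's index of diversity for a list of type labels,
    one label per unrelated strain in the test collection."""
    n_total = len(type_assignments)
    if n_total < 2:
        raise ValueError("at least two strains are needed")
    counts = Counter(type_assignments)
    # Number of ordered strain pairs that share a type.
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1 - same_type_pairs / (n_total * (n_total - 1))


# Ten hypothetical unrelated strains; only two of them share a type.
types = ["A", "A", "B", "C", "D", "E", "F", "G", "H", "I"]
d = simpson_diversity(types)  # 1 - 2/90, about 0.978
```

A collection in which every strain has its own type gives D = 1, and a collection in which all strains share one type gives D = 0.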
Typing systems can differ in their discriminatory capacity depending on the marker used for discrimination. According to the ESGEM guidelines, a large collection (n > 100) of unrelated strains should be used to test the discriminatory capacity of a fingerprinting method and, ideally, each strain should give a distinctive fingerprint. Even larger collections should be tested for surveillance and population studies. The diversity resulting from such an exploration can be quantified by the Simpson index of diversity (D) (Hunter, 1990). This index is a measure of the probability that a method will assign a different type to two unrelated strains sampled randomly, and it should generate a value of >0.95 (Blanc et al., 1998). In theory, these guidelines are excellent but, in practice, smaller sets of strains are usually tested in epidemiological studies. The strain collections used for assessing the discriminatory capacity should reflect the purpose of the typing system, but it is difficult to give detailed guidelines on the composition of the test collections while the diversity (within hospitals and in the community) and the evolutionary dynamics of the organisms being studied are unknown. With phage-typing it is usually recommended to use an international set of phages, supplemented with local phages, to cover the diversity of strains to be typed. Similarly, when using fingerprinting methods, it may be advisable to use strains from different countries, supplemented with local isolates, provided that the latter are not epidemiologically related.
D.
Epidemiologic concordance
The capacity of a system to establish epidemiological relatedness between strains from the same outbreak has to be tested on sets from different outbreaks, each containing multiple (e.g., five) isolates per outbreak.
E.
Typing system concordance
It is quite common to evaluate the performance of a typing system in relation to the performance of other typing systems (Dijkshoorn et al., 1993; Van der Zee et al., 1999; Vogel et al., 2000). The underlying assumption for this approach is that there should be congruence between classification of organisms by different methods. With whole genomic fingerprinting methods this may be the case. However, markers that are subject to rapid evolutionary changes or are encoded by transmissible genetic elements, e.g., serotypes or antibiotic susceptibility spectra, may show deviations from the pattern of grouping found by other methods. An example of such discordance is the variation in antibiotic susceptibility and biotype found in groups of Acinetobacter baumannii isolates that share the same ribotype and/or AFLP type (Dijkshoorn et al., 1996; Nemec et al., 1999). This lack of congruence does not necessarily imply that the methods showing rapid variation are useless. Rather, they can be used for subtyping isolates that otherwise show clonal relationship.
F.
Reproducibility
Guidelines for evaluating the reproducibility of a method have been suggested by ESGEM (Struelens et al., 1996). These comprise serial experiments that assess the influence of all possible steps in the process of type designation, and it is suggested that the reproducibility should ideally be >0.95. During use of a method, controls should always be included to monitor the procedure being used. For example, at least one strain with stable markers (see below) should always be processed in the same way as the test strains, including the initial steps of cultivation and sample preparation. Ideally, a fingerprint generated by a particular typing method should be fully reproducible within and between laboratories. Unfortunately, many genotypic typing methods do not meet this requirement. In particular, early attempts at PCR fingerprinting were notorious for lack of reproducibility (Tyler et al., 1997).
The factors that influence the variability of the results of genotypic methods are numerous, including person-to-person variation, differences in sample preparation, in usage of reagents and equipment, in experimental conditions, or in data analysis. The impact of these sources of variation differs, depending on the setting in which the methods are being used. For example, if a small set of five isolates has to be typed during an outbreak, most factors can be kept constant and conclusions on whether the isolates are indistinguishable may be arrived at without any problems. However, even in the same locality, when many isolates have to be compared during endemic episodes, isolates have to be processed on different occasions and reproducibility is more difficult to control. It is even more difficult to control the many variables existing between laboratories, e.g., in multicentre studies on the geographic spread of specific pathogens.
Although reproducibility is a major problem in genotypic fingerprinting, recent developments have shown that these difficulties can be overcome if standardised reagents and/or uniform procedures are used. Thus, RFLP fingerprints of Mycobacterium tuberculosis have proven to be sufficiently reproducible to set up a large international database containing these profiles (see Chapter 3). In addition, the PCR fingerprints of a set of Acinetobacter strains generated in a multicentre study using standardised and quality controlled commercially available PCR reagents could be compared successfully between different laboratories (see Chapter 6; Grundmann et al., 1997). Rigorous standardisation has also allowed PFGE patterns to be compared between laboratories through the National Molecular Subtyping Network (PulseNet; http://www.cdc.gov/ncidod/dbmd/pulsenet/pulsenet.htm). Perhaps the best example to date of a reproducible system is the fully automated Riboprinter (Qualicon Europe, Warwick, UK), which generates ribotypes for different microorganisms according to a standardised protocol.
G.
The stability of strains
A prerequisite of an epidemiological typing system is that the marker (e.g., the genomic fingerprint) used to type a strain is stable. However, the genetic material of microorganisms may undergo changes both in vitro (Arber et al., 1994; Nakatsu et al., 1998) and in vivo, and these changes may be reflected in the marker used. ESGEM has recommended that the in-vitro stability of at least 10 strains should be tested after 50 serial passages (Struelens et al., 1996). Vogel et al. (1999) tested the in-vitro stability for three strains each of P. aeruginosa, S. marcescens, K. pneumoniae, and K. oxytoca. The strains were serially subcultured and stored at different temperatures, thereby simulating what happens to a strain in the laboratory when it is under investigation. No changes in fingerprints were observed during this test period.
In daily practice, the typing result obtained with a clinical isolate is frequently based on one subculture from a single colony, while the possible variability of the strain is unknown. Pitt (1994) suggested that such variability should be estimated when setting up a typing system by comparing various colonial variants of the same strains; isolates from different body sites; multiple colonies from primary platings of specimens; and antibiotic-sensitive and -resistant isolates from the same patient. The in-vivo changes that occur during the passage from one patient to another can be estimated by comparing multiple isolates from different patients during a clear-cut outbreak (Struelens et al., 1996).
H.
Typability
Phenotypic typing methods such as serotyping or phage-typing do not always allow the characterisation of all isolates. In contrast, genotyping usually provides a characteristic fingerprint for most isolates. An exception is plasmid typing, since plasmids are accessory genetic elements that are not always present in all strains. Occasionally, other genomic fingerprinting methods, e.g., ribotyping or PFGE, do not produce a fingerprint for a specific isolate. A possible explanation is that such isolates are strong endonuclease producers, which results in breakdown of the DNA and prevents further processing of the specimens for typing. Typability can be calculated approximately when the discriminatory capacity of a typing system is tested on a large set of strains (see above). The actual typability will become apparent once the system is used in practice.
I.
Standardisation of data analysis
The analysis of fingerprints, whether done visually or with computer assistance, can also contribute to bias in type delineation. For example, the visual type classification of PCR patterns in a multicentre study varied widely among participating centres (Van Belkum et al., 1995; Deplano et al., 2000). Computer-assisted data analysis may also result in different outcomes between scientists and laboratories. Sources of variation are: data capturing equipment, settings for data capture, software packages for analysis of fingerprints, and settings used with these packages. A severe source of bias can also be the visual designation of bands before comparisons are made (Burr & Pepper, 1997), which is a requirement of some software packages (e.g., Bio Image Whole Band Analyser). There is an urgent need to assess the impact of the many variables on computer-assisted data analysis. Once this is known, guidelines for standardising data capture and analysis can be agreed. As a first step, it would be helpful if it became a standard procedure for authors to give a detailed description in their publications of the precise analytical procedure used.
J.
Conclusions
It is frequently stated that genomic fingerprinting methods are easy to perform, rapid, and can be applied to a wide range of organisms. However, setting up and validating these methods is a lengthy task and has to be performed for each species independently. Large sets of well-described strains are indispensable for this purpose. These requirements can only be met by specialised laboratories, or within frameworks of collaborating laboratories. Such activities have already resulted in the successful exploitation of MLST for Streptococcus pneumoniae (Enright & Spratt, 1998) and Neisseria meningitidis (Maiden et al., 1998), and the realisation of the M. tuberculosis database (see Chapter 3). Further to these collaborative studies, a number of other networks have been initiated for validating typing methods and investigating the diversity and spread of specific bacterial species. It is expected that these activities will have a great impact on the knowledge of the population structure and spread of many pathogens, and may provide an important tool to aid in their control (Stephenson, 1997).
1.9 STRAIN COLLECTION AND DATA MANAGEMENT
A.
Species identification
It follows from the previous sections that specific strain collections are needed to test a typing method. First, the strains of these test sets must have been identified to a particular species according to the latest taxonomy. It should be noted that the taxonomy of many bacterial genera is revised from time to time due to new insights. Commercial identification systems, although easy to use, may not always have been validated with taxonomically well-described strains for all species, and may not use the most recent taxonomic scheme. In case of doubt, it is recommended that experts in the field should be consulted, i.e., specialists on the species being studied or curators of public culture collections.
B.
Culture collections
It may be important to include reference strains in studies, perhaps to enable comparisons with the findings of other workers. Reference strains, e.g., strains from published outbreaks, can be requested directly from the authors or, if they have been deposited, from public culture collections. If specific clones are identified during large surveys, it is important that representatives of these clones are also deposited with culture collections under a specific designation, thus enabling direct comparisons to be made by the rest of the scientific community. As with type strains described in the International Journal of Systematic and Evolutionary Microbiology, the deposition and strain designation should be published in the same paper that describes the strain (clone).
C.
Storage of strains and data registration
Strains should preferably be stored at -145°C, as there is no water movement at this temperature which can damage the cell structural components (Meryman, 1966), or otherwise at -70°C or -80°C in glycerol broth. Other ways to store strains are by lyophilisation, or in 0.8% nutrient agar stab cultures (Kirsop & Doyle, 1991).
Table 1.5. Summary of data required for administration of epidemiological strain collections^a
• Strain designation
• Eventual other designations
• Species name
• Received from
• Specimen (type)
• Patient code
• Date of isolation
• Hospital
• Department
• City
• Country
• Species identification (method)
^a The list can be adapted to the purpose of the collection.
The records of the strains should contain relevant data on their origin, as summarised in Table 1.5. These data are essential to decide whether strains are 'unrelated' or 'related'. It may be necessary for legal reasons to code the data so that the strains cannot be related to a specific hospital or patient. Since there is increasing interest in the wide geographical spread of particular epidemic clones, it is important that the strains in publications are retrievable by other researchers. Strains should therefore be designated by their collection number, and not just a serial number. Unfortunately, this important point has not yet been appreciated by the editors of most microbiological journals, with the significant exception of the International Journal of Systematic and Evolutionary Microbiology. There are several commercial software packages available for data administration, such as SPSS (SPSS Inc., Chicago, IL, USA) or Access (Microsoft, Redmond, WA, USA). These packages can also sort data, which is useful for epidemiological purposes. Furthermore, the databases generated with these packages can be extended by the incorporation of additional data obtained over the course of time.
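As an illustration of how the items of Table 1.5 might be held in a simple record structure and sorted for epidemiological purposes, a minimal sketch follows. The class name, field names, coded identifiers and example values are all hypothetical; any of the database packages mentioned above could hold the same fields.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StrainRecord:
    """One row of a strain collection, following the items of Table 1.5."""
    strain_designation: str       # collection number, not a serial number
    other_designations: tuple     # designations used by other collections
    species_name: str
    received_from: str
    specimen_type: str
    patient_code: str             # coded; must not identify the patient
    date_of_isolation: date
    hospital: str                 # may be coded for legal reasons
    department: str
    city: str
    country: str
    identification_method: str


# Invented example records; none refers to a real strain or institution.
records = [
    StrainRecord("COLL-0003", (), "Acinetobacter baumannii", "Lab X",
                 "wound", "P-31", date(2000, 5, 2), "H-02", "Surgery",
                 "City B", "Country B", "method 1"),
    StrainRecord("COLL-0001", ("X-17",), "Acinetobacter baumannii", "Lab Y",
                 "sputum", "P-12", date(2000, 3, 14), "H-01", "ICU",
                 "City A", "Country A", "method 1"),
    StrainRecord("COLL-0002", (), "Acinetobacter baumannii", "Lab Y",
                 "blood", "P-09", date(2000, 3, 9), "H-01", "ICU",
                 "City A", "Country A", "method 1"),
]

# Sort by place and time so that isolates from the same ward and period
# appear next to each other.
ordered = sorted(
    records,
    key=lambda r: (r.hospital, r.department, r.date_of_isolation),
)
designations = [r.strain_designation for r in ordered]
```

Sorting on hospital, department and date places isolates that may belong to the same episode side by side, which is a first step in deciding whether strains are likely to be 'related' or 'unrelated'.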
1.10 PROSPECTS FOR THE FUTURE
As mentioned earlier, a complete DNA sequence forms the ultimate reference standard for recognising sub-types within a species. It is clear that our increasing ability to rapidly sequence and compare the complete genomes of different isolates will bring many new scientific insights, and it is likely that epidemiology (and our insight into why certain strains have a propensity to become 'epidemic') is one of the areas that will benefit from this flood of sequence data. However, even with rapid automated sequencing techniques, it is highly unlikely that diagnostic microbiology laboratories will ever be in a position (or have the resources) to routinely sequence all their isolates of epidemiological interest. One answer may be to concentrate on developing rapid sequencing methods for particular defined regions of the genome that can provide sufficient epidemiological and evolutionary information, and such techniques for sequencing 16S rDNA are already becoming available (see Chapter 11).
An alternative advanced but simplified approach could eventually involve the use of DNA microarrays (sometimes called 'DNA chips'). Such microarrays normally consist of a very large number of evenly spaced spots of DNA fixed to a microscope slide. Each spot is a unique DNA fragment, typically a gene or part of a gene, transferred by a gridding robot from 96-well plates on to a slide. Standard microarrays may become commercially available and could then be hybridised in diagnostic laboratories with DNA extracted from 'unknown' isolates to generate different patterns of DNA hybridisation to the microarray. Each isolate would yield its own distinctive microarray pattern in a form that would be readily amenable to computerised analysis and comparison with electronic databases. However, for the time being, such technology remains in the future (but perhaps not too far away) and its cost effectiveness remains to be established.
In the meantime, progress in molecular biology has already resulted in the availability of molecular fingerprinting methods, several of which have the potential to be used in any competent microbiology laboratory for studying diversity in any microbial species. If only a few isolates are being compared, and the associated fingerprints are relatively simple, then visual comparison of pattern differences may be sufficient to assess the degree of relatedness. However, at present there is no general consensus as to the number of 'differences' (i.e., changes in fingerprint pattern) required for two isolates to be considered unrelated. For more complex fingerprints, and in cases where isolates from different geographical locations are being compared over significant time periods, a computer-assisted strategy is required that enables the formation of a database of fingerprint patterns. The challenge that is addressed in the rest of this book is to define how computer programs can be used to analyse molecular fingerprinting data and provide timely information of epidemiological importance or evolutionary significance.
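The idea of comparing a pattern from an 'unknown' isolate with an electronic database can be sketched in a few lines. The sketch assumes that each pattern (e.g., a microarray hybridisation result) has been reduced to a fixed-length series of present/absent scores, and that the number of differing positions is an acceptable measure of difference. The function names, reference names, patterns and tolerance are hypothetical.

```python
def count_differences(pattern_a, pattern_b):
    """Number of positions at which two presence/absence patterns differ."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same positions")
    return sum(x != y for x, y in zip(pattern_a, pattern_b))


def closest_matches(query, database, max_differences):
    """Database entries within max_differences of the query pattern,
    returned as (differences, name) pairs with the best match first."""
    hits = [
        (count_differences(query, pattern), name)
        for name, pattern in database.items()
    ]
    return sorted(hit for hit in hits if hit[0] <= max_differences)


# Invented reference patterns: 1 = signal present, 0 = signal absent.
database = {
    "ref-A": (1, 1, 0, 0, 1, 0, 1, 1),
    "ref-B": (1, 1, 0, 0, 1, 0, 1, 0),
    "ref-C": (0, 0, 1, 1, 0, 1, 0, 0),
}
query = (1, 1, 0, 0, 1, 0, 1, 1)
matches = closest_matches(query, database, max_differences=2)
```

Here the query is identical to "ref-A" and differs from "ref-B" at one position, so both are reported, while "ref-C" is excluded. How many differences should be tolerated is exactly the open question noted above, so the tolerance is left as a parameter rather than fixed.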
REFERENCES
Achtman, M., Mercer, A., Kusecek, B., Pohl, A., Heuzenroeder, M., Aaronson, W., Sutton, A. & Silver, R.P. (1983). Six widespread bacterial clones among Escherichia coli K1 isolates. Infection and Immunity 39, 315-335.
Aldenderfer, M.S. & Blashfield, R.K. (1984). Cluster analysis (Sage University Press series on quantitative applications in the social sciences, No. 44). Sage University Press, Beverly Hills.
Arber, W., Naas, T. & Blot, M. (1994). Generation of genetic diversity by DNA rearrangements in resting bacteria. FEMS Microbiology Ecology 15, 5-14.
Archer, G.L. (2000). Staphylococcus epidermidis and other coagulase-negative staphylococci. In Principles and practices of infectious diseases, 5th edn, vol. 2, Mandell, G.L., Bennett, J.E. & Dolin, R., eds, pp. 2092-2100. Churchill Livingstone, Philadelphia.
Baumann, P., Doudoroff, M. & Stanier, R.Y. (1968). A study of the Moraxella group II. Oxidative-negative species (genus Acinetobacter). Journal of Bacteriology 95, 1520-1541.
Bergan, T. (1978). Phage-typing of Pseudomonas aeruginosa. In Methods in microbiology, vol. 10, Bergan, T. & Norris, J.R., eds, pp. 169-199. Academic Press, London.
Bergan, T. (1979). Bacteriophage typing of Shigella. In Methods in microbiology, vol. 13, Bergan, T. & Norris, J.R., eds, pp. 178-286. Academic Press, London.
Bergogne-Bérézin, E. & Towner, K.J. (1996). Acinetobacter spp. as nosocomial pathogens: microbiological, clinical, and epidemiological features. Clinical Microbiology Reviews 9, 148-165.
Bergquist, P.L., Love, D.R., Croft, J.E., Streiff, M.B., Daniel, R.M. & Morgan, H.W. (1989). Genetics and potential biotechnological applications of thermophilic and extremely thermophilic archaebacteria and eubacteria. Biotechnology & Genetic Engineering Reviews 5, 199-244.
Bernards, A.T., de Beaufort, A.J., Dijkshoorn, L. & van Boven, C.P.A. (1997). Outbreak of septicaemia in neonates caused by Acinetobacter junii investigated by amplified ribosomal DNA restriction analysis (ARDRA) and four typing methods. Journal of Hospital Infection 35, 129-140.
Blanc, D.S., Lugeon, C., Wenger, A., Siegrist, H.H. & Francioli, P. (1994). Quantitative antibiogram typing using inhibition zone diameters compared with ribotyping for epidemiological typing of methicillin-resistant Staphylococcus aureus. Journal of Clinical Microbiology 32, 2505-2509.
Blanc, D.S., Petignat, C., Moreillon, P., Wenger, A., Bille, J. & Francioli, P. (1996). Quantitative antibiogram as a typing method for the prospective epidemiological surveillance and control of MRSA: comparison with molecular typing. Infection Control and Hospital Epidemiology 17, 654-659.
Blanc, D.S., Hauser, P.M., Francioli, P. & Bille, J. (1998). Molecular typing methods and their discriminatory power. Clinical Microbiology and Infection 4, 61-63.
Bock, H.H. (1974). Automatische klassifikation. Vandenhoeck & Ruprecht, Göttingen.
Bouvet, P.J.M. (1991). Typing of Acinetobacter. In The biology of Acinetobacter: taxonomy, clinical importance, molecular biology, physiology, industrial relevance, Towner, K.J., Bergogne-Bérézin, E. & Fewson, C.A., eds, pp. 37-51. Plenum, New York.
Brock, T.D. (1988). Robert Koch, a life in medicine and bacteriology. In Scientific revolutionaries: a bibliographic series. Springer-Verlag, Berlin.
Burr, M.D. & Pepper, I.L. (1997). Variability in presence-absence scoring of AP PCR fingerprints affects computer matching of bacterial isolates. Journal of Microbiological Methods 29, 63-68.
Cilia, V., Lafay, B. & Christen, R. (1996). Sequence heterogeneities among 16S ribosomal RNA sequences, and their effect on phylogenetic analyses at the species level. Molecular Biology and Evolution 13, 451-461.
Colwell, R.R. (1970). Polyphasic taxonomy of bacteria. In Culture collections of microorganisms. Proceedings of the international conference on culture collections, Tokyo, Oct. 7-11, 1968, Iizuka, H. & Hasegawa, T., eds, pp. 421-436. University Park Press, Baltimore.
Deplano, A., Schuermans, A., Van Eldere, J., Witte, W., Meugnier, H., Etienne, J., Grundmann, H., Jonas, D., Noordhoek, G.T., Dijkstra, J., van Belkum, A., van Leeuwen, W., Tassios, P.T., Legakis, N.J., Van Der Zee, A., Bergmans, A., Blanc, D.S., Tenover, F.C., Cookson, B.C., O'Neil, G. & Struelens, M.J. (2000). Multicenter evaluation of epidemiological typing of methicillin-resistant Staphylococcus aureus strains by repetitive-element PCR analysis. Journal of Clinical Microbiology 38, 3527-3533.
Dijkshoorn, L., Aucken, H.M., Gerner-Smidt, P., Kaufmann, M.E., Ursing, J. & Pitt, T.L. (1993). Correlation of typing methods for Acinetobacter isolates from hospital outbreaks. Journal of Clinical Microbiology 31, 702-705.
Dijkshoorn, L., Aucken, H.M., Gerner-Smidt, P., Janssen, P., Kaufmann, M.E., Garaizar, J., Ursing, J. & Pitt, T.L. (1996). Comparison of outbreak and nonoutbreak Acinetobacter baumannii strains by genotypic and phenotypic methods. Journal of Clinical Microbiology 34, 1519-1525.
Dijkshoorn, L., Ursing, B.M. & Ursing, J.B. (2000). Strain, clone and species: comments on three basic concepts of bacteriology. Journal of Medical Microbiology 49, 397-401.
Dubos, R.J. (1988). Pasteur and modern science. In Scientific revolutionaries: a bibliographic series. Springer-Verlag, Berlin.
Dykhuizen, D.E. & Green, L. (1991). Recombination in Escherichia coli and the definition of biological species. Journal of Bacteriology 173, 7257-7268.
Enright, M.C. & Spratt, B.G. (1998). A multilocus sequence typing scheme for Streptococcus pneumoniae: identification of clones associated with serious invasive disease. Microbiology 144, 3049-3060.
Everitt, B.S. (1998). Cluster analysis, 3rd edn. Arnold, London.
Ewing, W.H. (1986). Edwards and Ewing's identification of Enterobacteriaceae, 4th edn. Elsevier Science, New York.
Farber, J.M. (1996). An introduction to the hows and whys of molecular typing. Journal of Food Protection 59, 1091-1101.
Farmer, J.J. (1999). Enterobacteriaceae: introduction and identification. In Manual of clinical microbiology, 7th edn, Murray, P.R., Baron, E.J., Pfaller, M.A., Tenover, F.C. & Yolken, R.H., eds. ASM Press, Washington D.C.
Fox, G.E., Wisotzkey, J.D. & Jurtshuk, P. (1992). How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. International Journal of Systematic Bacteriology 42, 166-170.
Goering, R.V. (1998). The molecular epidemiology of nosocomial infection. An overview of principles, application and interpretation. In Rapid detection of infectious agents, Specter, S. & Friedman, H., eds, pp. 131-157. Plenum, New York.
Goering, R.V. (2000). The molecular epidemiology of nosocomial infection: past, present and future. Reviews in Medical Microbiology 11, 145-152.
Goodfellow, M. & O'Donnell, A.G. (1993a). Roots of bacterial systematics. In Handbook of new bacterial systematics, Goodfellow, M. & O'Donnell, A.G., eds, pp. 3-54. Academic Press, London.
Goodfellow, M. & O'Donnell, A.G., eds (1993b). Glossary of taxonomic terms. In Handbook of new bacterial systematics, pp. 525-549. Academic Press, London.
Griffin, P.M. (1995). Escherichia coli O157:H7 and other enterohemorrhagic Escherichia coli. In Infections of the gastrointestinal tract, Blaser, M.L., Ravdin, J.I., Greenberg, H.B. & Guerrant, R.L., eds, pp. 739-761. Raven Press, New York.
Grundmann, H.J., Towner, K.J., Dijkshoorn, L., Gerner-Smidt, P., Maher, M., Seifert, H. & Vaneechoutte, M. (1997). Multicenter study using standardized protocols and reagents for evaluation of PCR-fingerprinting reproducibility with Acinetobacter spp. Journal of Clinical Microbiology 35, 3071-3077.
Guinée, P.A.M. & van Leeuwen, W.J. (1978). Phage-typing of Salmonella. In Methods in microbiology, vol. 11, Bergan, T. & Norris, J.R., eds, pp. 157-191. Academic Press, London.
Gupta, R.S. (2000). The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes. FEMS Microbiology Reviews 24, 367-402.
Horrevorts, A., Bergman, K., Kollée, L., Breuker, I., Tjernberg, I. & Dijkshoorn, L. (1995). Clinical and epidemiological investigations of Acinetobacter genomospecies 3 in a neonatal intensive care unit. Journal of Clinical Microbiology 33, 1567-1572.
Hunter, P.R. (1990). Reproducibility and indices of discriminatory power of microbial typing methods. Journal of Clinical Microbiology 28, 1903-1905.
Jackman, P.J.H. (1985). Bacterial taxonomy based on electrophoretic whole-cell protein patterns. In Chemical methods in bacterial systematics, Goodfellow, M. & Minnikin, D., eds, pp. 115-129. Academic Press, London.
Kersters, K. & De Ley, J. (1975). Identification and grouping of bacteria by numerical analysis of their protein patterns. Journal of General Microbiology 87, 333-342.
Kirsop, B.E. & Doyle, A., eds (1991). Maintenance of microorganisms and cultured cells. A manual of laboratory methods, 2nd edn. Academic Press, San Diego.
Korn-Wendisch, F. & Kutzner, H.J. (1992). The family Streptomycetaceae. In The prokaryotes, 2nd edn, vol. 1, Balows, A., Trüper, H.G., Dworkin, M., Harder, W. & Schleifer, K.-H., eds, pp. 921-995. Springer-Verlag, New York.
Kühn, I., Tullis, K. & Burman, L.G. (1991). The use of the PhP-KE biochemical fingerprinting system in epidemiological studies of faecal Enterobacter cloacae strains from infants in Swedish neonatal wards. Epidemiology and Infection 107, 311-319.
La Rivière, J.W.M. (1997). The Delft School of Microbiology in historical perspective. Antonie van Leeuwenhoek 71, 3-13.
Lockhart, W.R. & Liston, J. (1970). Methods for numerical taxonomy. American Society for Microbiology, Bethesda, MD.
Maiden, M.C.J., Bygraves, J.A., Feil, E., Morelli, G., Russell, J.E., Urwin, R., Zhang, Q., Zhou, J., Zurth, K., Caugant, D.A., Feavers, I.M., Achtman, M. & Spratt, B.G. (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proceedings of the National Academy of Sciences of the United States of America 95, 3140-3145.
Maslow, J. & Mulligan, M.E. (1996). Epidemiologic typing systems. Infection Control and Hospital Epidemiology 17, 595-604.
Maslow, J.N., Mulligan, M.E. & Arbeit, R.D. (1993). Molecular epidemiology: application of contemporary techniques to the typing of microorganisms. Clinical Infectious Diseases 17, 153-164.
Maynard Smith, J., Smith, N.H., O'Rourke, M. & Spratt, B.G. (1993). How clonal are bacteria? Proceedings of the National Academy of Sciences of the United States of America 90, 4384-4388.
Meitert, T. & Meitert, E. (1978). Usefulness, applications and limitations of epidemiological typing methods to elucidate nosocomial infections and the spread of communicable diseases. In Methods in microbiology, vol. 10, Bergan, T. & Norris, J.R., eds, pp. 1-37. Academic Press, London.
Meryman, H.T. (1966). Review of biological freezing. In Cryobiology, Meryman, H.T., ed., pp. 1-114. Academic Press, London.
Mulligan, M.E., Murray-Leisure, K.A., Ribner, B.S., Standiford, H.C., John, J.F., Korvick, J.A., Kauffman, C.A. & Yu, V.L. (1993). Methicillin-resistant Staphylococcus aureus: a consensus review of the microbiology, pathogenesis, and epidemiology with implications for prevention and management. American Journal of Medicine 94, 313-328.
Musser, J.M. & Selander, R.K. (1990). Brazilian purpuric fever: evolutionary genetic relationships of the case clone of Haemophilus influenzae biogroup aegyptius to encapsulated strains of Haemophilus influenzae. Journal of Infectious Diseases 161, 130-133.
Nakatsu, C.H., Korona, R., Lenski, R.E., de Bruijn, F.J., Marsh, T.L. & Forney, L.J. (1998). Parallel and divergent genotypic evolution in experimental Ralstonia sp. Journal of Bacteriology 180, 4325-4331.
Nemec, A., Janda, L., Melter, O. & Dijkshoorn, L. (1999). Genotypic and phenotypic similarity of multiresistant Acinetobacter baumannii isolates in the Czech Republic. Journal of Medical Microbiology 48, 287-296.
Norris, J.R. (1980). Introduction. In Microbial classification and identification, Goodfellow, M. & Board, R.G., eds, pp. 1-10. Academic Press, London.
Ochman, H. & Selander, R.K. (1984). Evidence for clonal population structure in Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America 81, 198-201.
Ørskov, F. & Ørskov, I. (1983). Summary of a workshop on the clone concept in the epidemiology, taxonomy, and evolution of the Enterobacteriaceae and other bacteria. Journal of Infectious Diseases 148, 346-357.
Palleroni, N.J. (1993). Structure of the bacterial genome. In Handbook of new bacterial systematics, Goodfellow, M. & O'Donnell, A.G., eds, pp. 57-113. Academic Press, London.
Parker, M.T. (1972). Phage typing of Staphylococcus aureus. In Methods in microbiology, vol. 7B, Norris, J.R. & Ribbons, D.W., eds, pp. 1-28. Academic Press, London.
Pennington, T.H. (1994). Molecular systematics and traditional medical microbiologists - problems and solutions. Journal of Medical Microbiology 41, 371-3.
Pitt, T.L. (1994). Bacterial typing systems: the way ahead. Journal of Medical Microbiology 40, 1-2.
Pitt, T.L., Livermore, D.M., Pitcher, D., Vatopoulos, A.C. & Legakis, N.J. (1989). Multiresistant serotype O12 Pseudomonas aeruginosa: evidence for a common strain in Europe. Epidemiology and Infection 103, 565-576.
29 Pollack, M. (2000). Pseudomonas aeruginosa. In Principles and practices of infectious diseases, 5th edn, vol. 2, Mandell, G.L., Douglas, R.G. & Bennett J.E., eds, pp. 1673-1691. Churchill Livingstone, New York. Ravot, G., Magot, M., Fardeau, M.L., Patel, B.K., Thomas, E, Garcia, J.L. & Ollivier, B. (1999). Fusibacter paucivorans gen. nov., sp. nov., an anaerobic, thiosulfate-reducing bacterium from an oil-producing well. International Journal of Systematic Bacteriology 49, 1141-1147. Selander, R.K. & Musser, J.M. (1990). Population genetics of bacterial pathogenesis. In Molecular basis of bacterial pathogenesis (The bacteria, vol. II), Iglewski, B.H. & Clark, V.L., eds, pp. 11-36. Academic Press, San Diego. Sloos, J.H., Dijkshoorn, L., Trienekens, T.A.M., Van Harsselaar, B., Van Dijk, Y. & van Boven, C.EA. (1996). Multiresistant Staphylococcus epidermidis in a neonatal care unit. Clinical Microbiology and Infection 2, 44-49. Sneath, EH.A. (1957a). Some thoughts on bacterial classification. Journal of General Microbiology, 17, 184-200. Sneath, P.H.A. (1957b). The application of computers to taxonomy. Journal of General Microbiology, 17, 201-226. Sneath, P.H.A. (1972). Computer taxonomy. In Methods in microbiology, vol. 7A, Norris, J.R. & Ribbons, D.W., eds, pp. 27-98. Academic Press, London. Sneath, P.H.A. & Sokal, R.R. (1973). Numerical taxonomy. Freeman, San Francisco. Stackebrandt, E. & Goebel, B.M. (1994). Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. International Journal of Systematic Bacteriology 44, 846-849. Staley, J.T. & Krieg, N.R. (1984). Classification of procaryotic organisms: an overview. In Bergey's manual of systematic bacteriology, vol. 1, Krieg, N.R., & Holt, J.G., eds, pp. 1-4. Williams & Wilkins, Baltimore. Stephenson, J. (1997). New approaches for detecting and curtailing foodborne microbial infections. 
Journal of the American Medical Association 277, 1337-1340. Struelens, M.J. & the members of the European Study Group on Epidemiological Markers (ESGEM) of the European Society for Clinical Microbiology and Infectious Diseases (ESCMID) (1996). Consensus guidelines for appropriate use and evaluation of microbial epidemiologic typing systems. Clinical Microbiology and Infection 2, 2-11. Struelens, M.J., De Gheldre, Y. & Deplano, A. (1998). Comparative and library epidemiological typing systems: outbreak investigations versus surveillance systems. Infection Control and Hospital Epidemiology 19, 565-569. Tenover, EC., Arbeit, R.D., Goering, R.V., Mickelsen, P.A., Murray, B.E., Persing, D.H. & Swaminathan, B. (1995). Interpreting chromosomal DNA restriction patterns produced by pulsedfield gel electrophoresis: criteria for bacterial strain typing. Journal of Clinical Microbiology 33, 2233-2239. Tenover, EC., Arbeit, R.D., Goering, R.V. & the Molecular Typing Working Group of the Society for Healthcare Epidemiology of America (1997). How to select and interpret molecular strain typing methods for epidemiological studies of bacterial infections: a review for healthcare epidemiologists. Infection Control and Hospital Epidemiology 18, 426-439. Towner, K.J. & Cockayne, A. (1993). Molecular methods for microbial identification and typing. Chapman and Hall, London. Tyler, K.D., Wang, G., Tyler, S.D. & Johnson, W.M. (1997). Factors affecting reliability and reproducibility of amplification-based DNA fingerprinting of representative bacterial pathogens. Journal of Clinical Microbiology 35, 339-346. Ursing, J.B., Rossell6-Mora, R.A., Garda-Valdrs, E. & Lalucat, J. (1995). Taxonomic note: a pragmatic approach to the nomenclature of phenotypically similar genomic groups. International Journal of Systematic Bacteriology 45, 406. 
Van Belkum, A., Kluytmans, J., van Leeuwen, W., Bax, R., Quint, W., Peters, E., Fluit, A., Vandenbroucke-Grauls, C., van den Brule, A., Koeleman, H., Melchers, W., Meis, J., Elaichouni, A.,
30 Vaneechoutte, M., Moonens, E, Maes, N., Struelens, M., Tenover, E & Verbrugh, H. (1995). Multicenter evaluation of arbitrarily primed PCR for typing of Staphylococcus aureus strains. Journal of Clinical Microbiology 33, 1537-1547. Van Leeuwen, W., Verbrugh, H., van der Velden, J., van Leeuwen, N., Heck, M. & van Belkum, A. (1999). Validation of binary typing for Staphylococcus aureus strains. Journal of Clinical Microbiology, 37, 664-674. Van Pelt, C., Verduin, C.M., Goessens, W.H.E, Vos, M.C., Tiimmler, B., Segonds, C., Reubsaet, E, Verbrugh, H. & van Belkum, A. (1999). Identification of Burkholderia spp. in the clinical microbiology laboratory: comparison of conventional and molecular methods. Journal of Clinical Microbiology 37, 2158-2164. Van der Zee, A., Verbakel, H., van Zon, J.C., Frenay, I., van Belkum, A., Peeters, M., Buiting, A. & Bergmans, A. (1999). Molecular genotyping of Staphylococcus aureus strains: comparison of repetitive element sequence-based PCR with various typing methods and isolation of a novel epidemicity marker. Journal of Clinical Microbiology 37, 342-349. Vandamme, P., Pot, B., Gillis, M., de Vos, P., Kersters, K. & Swings, J. (1996). Polyphasic taxonomy, a consensus approach to bacterial classification. Microbiological Reviews 60, 407438. Vaneechoutte, M. (1996). DNA fingerprinting techniques for microorganisms. A proposal for classification and nomenclature. Molecular Biotechnology 6, 115-142. Vaneechoutte, M., Elaichouni, A., Maquelin, K., Claeys, G., Van Liedekerke, A., Louagie, H., Verschraegen, G. & Dijkshoorn, L. (1995). Comparison of arbitrary primed polymerase chain reaction and cell envelope protein electrophoresis for analysis of Acinetobacter baumannii and A. junii outbreaks. Research in Microbiology 146, 457-465. Vogel, R.E & Ehrmann, M. (1996). Genetics of lactobacilli in food fermentations. Biotechnology Annual Reviews 2, 123-150. Vogel, L., Jones, G., Triep, S., Koek, A. & Dijkshoorn, L. (1999). 
RAPD typing of Klebsiella pneumoniae, Klebsiella oxytoca, Serratia marcescens and Pseudomonas aeruginosa isolates using standardized reagents. Clinical Microbiology and Infection 5, 270-276. Vogel, L., van Oorschot, E., Maas, H.M.E., Minderhoud, B. & Dijkshoorn, L. (2000). Epidemiologic typing of Escherichia coli using RAPD analysis, ribotyping and serotyping. Clinical Microbiology and Infection 6, 82-87. Wayne, L., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, L.H., Moore, W.E.C., Murray, R.G.E., Stackebrandt, E., Starr, M.P. & Tffiper, H.G. (1987). Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. International Journal of Systematic Bacteriology 37, 463-464. Webster, C.A., Towner, K.J., Humphreys, H., Ehrenstein, B., Hartung, D. & Grundmann, H. (1996). Comparison of rapid automated laser fluorescence analysis of DNA fingerprints with four other computer-assisted approaches for studying relationships between Acinetobacter baumannii isolates. Journal of Medical Microbiology 44, 185-194. Woese, C.R. (1987). Bacterial evolution. Microbiological Reviews 51, 221-271. Ye, D., Siddiqi, A., Maccubin, A.E., Kumar, S. & Sikka, H.C. (1996). Degradation of polynuclear aromatic hydrocarbons by Sphingomonas paucimobilis. Environmental Scientific Technology 30, 136-142. Zadoks, R., van Leeuwen, W., Barkema, H., Sampimon, O., Verbrugh, H., Schukken, Y.H. & van Belkum, A. (2000). Application of pulsed-field gel electrophoresis and binary typing as tools in veterinary clinical microbiology and molecular epidemiologic analysis of bovine and human Staphylococcus aureus isolates. Journal of Clinical Microbiology 38, 1931-1939.
2
Theoretical Aspects of Pattern Analysis
Arjen van Ooyen Netherlands Institute for Brain Research, Meibergdreef 33, 1105 AZ Amsterdam, The Netherlands
CONTENTS

2.1 INTRODUCTION TO PATTERN DETECTION
2.2 PRINCIPAL COMPONENT ANALYSIS
2.3 CLUSTER ANALYSIS
    A. A simple example of cluster analysis
    B. General protocol for cluster analysis
    C. Similarity measures
       (i) City-block distance
       (ii) Euclidean distance
       (iii) Pearson or product-moment correlation coefficient
       (iv) Band-based similarity coefficients
    D. Clustering methods
       (i) UPGMA or group average
       (ii) Ward's averaging
2.4 EXAMPLES OF APPLICATIONS OF CLUSTER ANALYSIS
2.5 DISCUSSION
REFERENCES

2.1
INTRODUCTION TO PATTERN DETECTION
The purpose of most pattern detection methods is to represent the variation in a data set in a more manageable form by recognising classes or groups. The data typically consist of a set of objects described by a number of characters. An object could be, for example, a strain of bacteria, while a character could define how well that strain grows on a particular C-source, or whether it contains a particular protein. If the objects were always described by only two or three characters, there would not be much need for pattern detection methods: plotting the data in two or three dimensions, respectively, would be sufficient to distinguish groups (the number of dimensions is the number of axes needed to plot the data, with one axis for each character). Typically, however, objects are characterised by more than three characters, so that simply plotting the data is not possible, and other ways need to be found to represent the data.
[Figure 2.1: scatter plot of objects (x) in character space; horizontal axis: character 1.]
Fig. 2.1. Simple example illustrating principal component analysis (see text).
There are two main approaches that can be taken to manage large data sets. The first involves reducing the number of characters by finding two or three new characters that are combinations of the old characters. Using these new characters, the data can again be plotted in two or three dimensions, and groups can be distinguished by visual inspection. This is the approach taken by principal component analysis (see section 2.2). The second approach for managing large data sets does not reduce the number of characters, but involves a stepwise reduction in the number of objects by placing them into groups. This is the approach taken by cluster analysis (see section 2.3). In this chapter, simple examples of both principal component analysis and cluster analysis will be given to explain the ideas behind the methods. Detailed reviews of pattern detection methods and their applications can be found elsewhere (Sokal & Sneath, 1963; Sneath & Sokal, 1973; Bock, 1974; Hogeweg, 1976a; Aldenderfer & Blashfield, 1984; Everitt, 1993; Applied Maths, 1998).

2.2
PRINCIPAL COMPONENT ANALYSIS
Principal component analysis studies large data sets by reducing the number of characters. This is achieved by forming new characters that are combinations of the old ones. A simple example can be used to illustrate the principle behind the method. In the example, the number of characters will be reduced from two to one. In real applications, the method would be used to reduce the number of characters from "many" to two or three. In Fig. 2.1, a number of objects characterised by only two characters are plotted. The space spanned by the two axes is called the character space, which in this case is two-dimensional (i.e., has two axes, the x- and y-axis) as there are only two characters. A line then needs to be drawn so that the variance among the points when projected on to this line will be as large as possible (this line is called the first principal component). This ensures that as much information as possible about the original data set will be retained. When this line has been found, all the points are projected on to it. On this line (i.e., the reduced character space), it may be possible to distinguish clusters by visual inspection. This new line, or character, can be interpreted in terms of the contributions that the original characters have made to it. When principal component analysis is used to reduce the number of characters from "many" to two or three, not only the first but also the second and third principal components are calculated, and the points are projected, not on to a line, but on to a two- or three-dimensional character space.
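The two-to-one reduction described above can be sketched in a few lines of Python. This is a minimal illustration, not a general implementation: it handles only two characters, uses the closed-form orientation of the major axis of a 2 x 2 covariance matrix, and the function names and sample points are hypothetical.

```python
import math


def first_principal_component(points):
    """Return (direction, scores) for objects described by two characters.

    direction is a unit vector along the line of maximal projected variance;
    scores are the coordinates of the centred points along that line.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2 x 2 covariance matrix (population form, divisor n).
    sxx = sum(dx * dx for dx, _ in centred) / n
    syy = sum(dy * dy for _, dy in centred) / n
    sxy = sum(dx * dy for dx, dy in centred) / n

    # Closed-form orientation of the major axis of a symmetric 2 x 2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # Project every centred point on to the line (the new, single character).
    scores = [dx * direction[0] + dy * direction[1] for dx, dy in centred]
    return direction, scores


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Hypothetical objects, each described by (character 1, character 2).
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8), (5.0, 5.0)]
direction, scores = first_principal_component(points)
print("direction of first principal component:", direction)
print("scores along that line:", scores)
```

The variance of `scores` is at least as large as the variance of either original character, which is the sense in which the projection retains as much information as possible. For more than two characters an eigenvalue routine from a numerical library would normally be used instead of the closed form.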
2.3
CLUSTER ANALYSIS
In contrast to principal component analysis, cluster analysis does not reduce the number of characters, but involves a stepwise reduction in the number of objects by placing them into groups. An agglomerative clustering method starts with as many clusters as there are objects (each cluster thus contains a single object), and then sequentially joins objects (or clusters), on the basis of their similarity, to form new clusters. This process continues until one big cluster is obtained that contains all objects. The result of this process is usually depicted as a dendrogram, in which the sequential union of clusters, together with the similarity value leading to this union, is depicted. A dendrogram, therefore, does not define one partitioning of the data set, but contains many different classifications. A particular classification is obtained by "cutting" the dendrogram at some optimal value (defined relative to the dendrogram). In order to interpret the pattern(s) revealed by the cluster analysis, each pattern is studied to determine its relationship with several characteristics of the objects, including characteristics that were not part of the data set proper, i.e., so-called label information such as epidemic sites of origin of strains, dates of sampling, etc. To illustrate the clustering process, a simple example will be given in the next section, followed by a general protocol for cluster analysis and a description of different similarity measures and clustering methods.
A.
A simple example of cluster analysis
The following example illustrates the whole clustering protocol, from the basic data to the formation of a dendrogram (Fig. 2.2). The data set consists of only four objects, each described by only two characters (Fig. 2.2a). Thus, each object is characterised by the values it takes on for these two characters. The objects could be (e.g.) four strains of bacteria, and the characters could (e.g.) describe how well the different strains grow on two different C-sources.

[Figure 2.2, panels as recoverable from the extracted values]

(a) Data set
object   character 1   character 2
1        1             3
2        2             2
3        6             4
4        7             2

(b) Objects plotted in character space (horizontal axis: character 1).

(c) Similarity matrix
object   1   2   3
2        2
3        6   6
4        7   5   3

(d) Similarity matrix after joining objects 1 and 2
object   1,2   3
3        6
4        6     3

(e) Similarity matrix after joining objects 3 and 4
object   1,2
3,4      6

(f) Dendrogram (axis: (dis)similarity, from 8 to 0).

Fig. 2.2. Simple example illustrating the protocol for cluster analysis (see text): (a) data set, consisting of four objects, each characterised by two characters; (b) objects plotted in character space; (c) similarity matrix showing dissimilarity between objects; (d) and (e) derived similarity matrices used in successive steps of the clustering process; (f) dendrogram.

Fig. 2.2b shows what the data look like when plotted. The x-coordinate of an object (point) is taken to be the value that the object takes on for character one, and the y-coordinate is the value that the object takes on for character two. As explained earlier, the space spanned by the two axes is called the character space, which in this case is again two-dimensional (i.e., has two axes, the x- and y-axis) as there are only two characters. In general, there are as many dimensions (i.e., axes) as there are different
characters. Plotting objects that are characterised by more than three characters is not possible because it would require more than three axes. Although these data cannot be plotted, they can still be treated mathematically in the same way. The advantage of this simple example is that the data and the clustering process can be easily visualised.

The aim of the clustering procedure is to join the objects (i.e., points in the figure) into clusters, or groups, of similar objects. Two objects will be similar if they are close together in character space. Thus, the first step in any clustering procedure is to determine the similarity between each pair of objects. In order to determine the similarity between two objects, a similarity measure is required. In principle, there are a large number of different measures that can be used. For example, the distance between two objects in character space can be used as a measure of their similarity (or rather dissimilarity). In this example, an even simpler similarity measure will be used. The similarity between, for example, objects 1 and 2, is defined as the absolute difference in the values for the first character plus the absolute difference in the values for the second character. This is what is called city-block distance and can be expressed formally for this example as

$$D_{i,j} = |C_{1,i} - C_{1,j}| + |C_{2,i} - C_{2,j}|, \tag{1}$$

where $D_{i,j}$ is the dissimilarity between objects $i$ and $j$, and $C_{1,i}$ is the value that object $i$ takes on for character 1. The fact that absolute differences are taken is indicated by $|\ldots|$. Using equation (1), the similarity between each pair of objects is determined, which yields a so-called similarity matrix (Fig. 2.2c). This matrix will have a triangular shape because the similarity between, e.g., objects 1 and 2 is the same as the similarity between objects 2 and 1. The clustering of objects starts by joining the objects that are most similar to each other, i.e., that have the lowest value in the similarity matrix.
In this case, objects 1 and 2 are most similar to each other, and these will be joined to form the first cluster. The new situation is then a cluster consisting of objects 1 and 2 (which is denoted as cluster {1,2}), and two single objects, 3 and 4. The cluster can then be treated as a new object. The next step is to calculate a similarity matrix for the new situation. To do this, the similarities between the cluster and the two single objects need to be calculated, i.e., the similarity between object 3 and cluster {1,2}, and the similarity between object 4 and cluster {1,2}. The similarity between objects 3 and 4 is, of course, not changed. In this example, the similarity between object 3 and cluster {1,2} is simply defined as the average of the following two similarities: (a) the similarity between object 3 and object 1, and (b) the similarity between object 3 and object 2. In the same way, the similarity between object 4 and cluster {1,2} can be defined. Thus,

$$D_{3,\{1,2\}} = \frac{D_{3,1} + D_{3,2}}{2}, \tag{2}$$

where $D_{3,\{1,2\}}$ is the similarity between object 3 and cluster {1,2}. Similarly,

$$D_{4,\{1,2\}} = \frac{D_{4,1} + D_{4,2}}{2}, \tag{3}$$

where $D_{4,\{1,2\}}$ is the similarity between object 4 and cluster {1,2}.
There are other ways to define the similarity between single objects and clusters of objects, and the method used to calculate the new similarity is what is called the clustering criterion or clustering method. In the new similarity matrix (Fig. 2.2d), the lowest value is again searched for, which is that between objects 3 and 4, and these objects are subsequently joined. Again a new similarity matrix is calculated, which now consists only of the similarity between cluster {1,2} and cluster {3,4} (Fig. 2.2e). Using the same clustering criterion as before, we obtain

$$D_{\{1,2\},\{3,4\}} = \frac{D_{\{1,2\},3} + D_{\{1,2\},4}}{2}. \tag{4}$$

The similarities $D_{\{1,2\},3}$ and $D_{\{1,2\},4}$ are given by equations (2) and (3), respectively (note that by definition $D_{\{1,2\},3} = D_{3,\{1,2\}}$ and $D_{\{1,2\},4} = D_{4,\{1,2\}}$). The sequential union of points (groups) is now depicted in a dendrogram (Fig. 2.2f). First, objects 1 and 2 are joined. In the dendrogram, the level at which objects 1 and 2 are connected is the dissimilarity level in the similarity matrix that led to their union. Then, objects 3 and 4 are joined, and finally clusters {1,2} and {3,4}. In the dendrogram, the level at which the clusters are joined is the similarity value as calculated in equation (4); this is a measure for the similarity between cluster {1,2} and cluster {3,4}. Thus, the similarity between, for example, objects 2 and 4 is not shown in the dendrogram.
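The whole example can be retraced with a short Python sketch. The character values below are an assumption: they are chosen to be consistent with the dissimilarities of Fig. 2.2c and should be read as illustrative. The function names are hypothetical, and the averaging rule is the plain two-term average of equations (2) to (4).

```python
from itertools import combinations

# Character values for the four objects (assumed; consistent with Fig. 2.2c).
data = {1: (1, 3), 2: (2, 2), 3: (6, 4), 4: (7, 2)}


def city_block(a, b):
    """Equation (1): sum of the absolute character differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster(data):
    """Agglomerative clustering with the averaging rule of equations (2)-(4).

    Returns the list of merges as (cluster_a, cluster_b, level) tuples,
    in the order in which the merges were made.
    """
    # Every object starts as its own cluster; clusters are tuples of ids.
    dist = {frozenset([(i,), (j,)]): city_block(data[i], data[j])
            for i, j in combinations(sorted(data), 2)}
    clusters = [(i,) for i in sorted(data)]
    merges = []
    while len(clusters) > 1:
        # Join the pair with the lowest value in the current matrix.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: dist[frozenset(pair)])
        level = dist[frozenset([a, b])]
        merged = a + b
        clusters = [c for c in clusters if c not in (a, b)]
        # Dissimilarity to the new cluster: plain average of the
        # dissimilarities to the two parts that were joined.
        for c in clusters:
            dist[frozenset([c, merged])] = (
                dist[frozenset([c, a])] + dist[frozenset([c, b])]) / 2
        clusters.append(merged)
        merges.append((a, b, level))
    return merges


for a, b, level in cluster(data):
    print(a, "+", b, "joined at dissimilarity", level)
```

With these values, objects 1 and 2 are joined at level 2, objects 3 and 4 at level 3, and the two clusters at level 6, which are the levels drawn in the dendrogram. Because the two parts joined in each step have equal sizes here, the plain average gives the same result as a size-weighted average would.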
B.
General protocol for cluster analysis
Keeping in mind the previous example, the general procedure for clustering is as follows (Fig. 2.3):

1. Data set. The starting point is a data set of objects that are described by the values they take on for a number of characters.

2. Transformation. Before calculating a similarity matrix, it may first be necessary to transform the data. This is necessary if the characters are qualitatively different or are expressed in different units. Transformation ensures that equal weight is given to all characters.

3. Similarity matrix. The next step is to choose a similarity measure and calculate the similarity between each pair of objects, yielding a triangular similarity matrix. Similarity measures are usually distance measures, but can also be derived from (e.g.) correlation coefficients. For electrophoresis data, the similarity between two objects can be expressed as the correlation between their banding patterns.

4. Clustering. Once the clustering method has been chosen, which is basically the formula that defines how to calculate the cluster-to-cluster similarities (and object-to-cluster similarities) from the basic object-to-object similarities, the similarity matrix can be used to form clusters.

5. Dendrogram. The result of this sequential joining of clusters is depicted in a dendrogram. In a dendrogram, the sequential union of objects and clusters is represented, together with the similarity value leading to this union. A dendrogram, therefore, does not define one partitioning, or grouping, of the set of objects, but contains many different partitionings of the set of objects. A particular partitioning can be obtained by "cutting" the dendrogram at some optimal value, defined relative to the dendrogram. For criteria to determine this cut-off value, see (e.g.) Blanc et al. (1994) and Hogeweg (1976b). In interpreting the groupings obtained, so-called label information can play an important role. Label information is basically all the information that is known about the objects which was not actually used in the clustering process itself (i.e., in determining the similarity between objects). Label information includes (e.g.) date of sampling, place of sampling, the date of analysis of the sampling, etc. It may be found, sometimes unexpectedly or unwanted, that the groupings obtained in the cluster analysis correlate with certain label information.

Fig. 2.3. The general protocol for cluster analysis (see text): (a) data set; (b) data set after transformation; (c) similarity matrix; (d) dendrogram.

In the next sections, some of the most frequently used similarity measures and clustering methods will be briefly described.
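The transformation in step 2 is not tied to one particular method in the text. One common choice, assumed here purely for illustration, is to rescale every character to zero mean and unit standard deviation so that characters measured in different units contribute equally. The function name and the data are hypothetical.

```python
import math


def standardise(rows):
    """Rescale each character (column) to zero mean and unit standard deviation.

    rows is a list of objects, each given as a list of character values.
    A character that does not vary at all is mapped to 0.0 for every object,
    which avoids a division by zero.
    """
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
           for col, m in zip(columns, means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, sds)]
            for row in rows]


# Hypothetical data: a growth score and an inhibition zone diameter in mm.
raw = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
scaled = standardise(raw)
for row in scaled:
    print(row)
```

After this step both characters span a comparable range, so neither dominates a distance calculated in step 3. Other transformations (for example scaling to the range 0 to 1) serve the same purpose; which one is appropriate depends on the data.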
C.
Similarity measures
(i) City-block distance

The similarity measure used in the simple example, the city-block distance (or character difference), is given by

$$D_{i,j} = \sum_{k=1}^{N} |C_{k,i} - C_{k,j}|, \tag{5}$$

where $D_{i,j}$ is the dissimilarity between objects $i$ and $j$, $N$ is the total number of characters, and $C_{k,i}$ is the value that object $i$ takes on for character $k$ (index $k$ runs from 1 to $N$). To calculate the mean city-block distance, the total number of characters is used as the denominator, i.e.,

$$D_{i,j} = \frac{1}{N} \sum_{k=1}^{N} |C_{k,i} - C_{k,j}|. \tag{6}$$
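Equations (5) and (6) translate directly into code. The sketch below uses hypothetical function names and made-up character values.

```python
def city_block(a, b):
    """Equation (5): sum of absolute differences over all N characters."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of characters")
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_city_block(a, b):
    """Equation (6): city-block distance divided by the number of characters."""
    return city_block(a, b) / len(a)


# Hypothetical objects, each described by four characters.
obj_i = [1, 3, 0, 5]
obj_j = [2, 2, 4, 5]
print(city_block(obj_i, obj_j))       # 1 + 1 + 4 + 0
print(mean_city_block(obj_i, obj_j))  # the same total divided by N = 4
```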
(ii) Euclidean distance

The distance between two objects in character space is used as a measure of their dissimilarity:

$$D_{i,j} = \sqrt{\sum_{k=1}^{N} (C_{k,i} - C_{k,j})^2}, \tag{7}$$

where $D_{i,j}$ is the distance between objects $i$ and $j$, and $C_{k,i}$ is the value that object $i$ takes on for character $k$ (that $D_{i,j}$ represents distance can easily be seen for $N = 2$, using the Pythagorean theorem). To avoid the use of the square root, the value of the distance is often squared, and this expression is referred to as "squared Euclidean distance". In comparing electrophoresis patterns, the matrix of similarities can be based either on the Pearson correlation coefficient or on one of the band-matching coefficients (Applied Maths, 1998).
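The Euclidean distance of equation (7) and its squared form can be sketched as follows; the function names and the two example objects are hypothetical.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared character differences (equation (7) without the root)."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of characters")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Equation (7): straight-line distance in character space."""
    return math.sqrt(squared_euclidean(a, b))


# Two hypothetical objects described by N = 2 characters; the differences
# of 3 and 4 form the legs of a right-angled triangle with hypotenuse 5.
obj_i = (1, 3)
obj_j = (4, 7)
print(euclidean(obj_i, obj_j))
print(squared_euclidean(obj_i, obj_j))
```

Squaring does not change which pair of objects is closest, but it does change the relative sizes of the distances, so dendrogram levels obtained with the two variants are not directly comparable.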
(iii) Pearson or product-moment correlation coefficient

The similarity between two objects is calculated as the correlation between the two arrays of character values (typically densitometric values) taken on by the two objects:

$$S_{i,j} = \frac{\sum_{k=1}^{N} (C_{k,i} - \bar{C}_i)(C_{k,j} - \bar{C}_j)}{\sqrt{\sum_{k=1}^{N} (C_{k,i} - \bar{C}_i)^2 \, \sum_{k=1}^{N} (C_{k,j} - \bar{C}_j)^2}}, \tag{8}$$

where $S_{i,j}$ is the similarity (i.e., correlation coefficient) between objects $i$ and $j$, $C_{k,i}$ is the value that object $i$ takes on for character $k$, and $\bar{C}_i$ is the mean of all the character values of object $i$. The value of the correlation coefficient ranges from +1 for perfect association to -1 for negative association; a value of 0 indicates that there is no association. That a correlation of 1 means perfect association can be seen by correlating object $i$ to itself, i.e.,

$$S_{i,i} = \frac{\sum_{k=1}^{N} (C_{k,i} - \bar{C}_i)(C_{k,i} - \bar{C}_i)}{\sqrt{\sum_{k=1}^{N} (C_{k,i} - \bar{C}_i)^2 \, \sum_{k=1}^{N} (C_{k,i} - \bar{C}_i)^2}} = \frac{\sum_{k=1}^{N} (C_{k,i} - \bar{C}_i)^2}{\sum_{k=1}^{N} (C_{k,i} - \bar{C}_i)^2} = 1. \tag{9}$$
The correlation coefficient is a shape measure; i.e., it is sensitive to the pattern of dips and rises across the character values. Two profiles can have a correlation of + 1 and yet not be truly identical (i.e., take on the same values). This occurs, for example, when the two profiles have the same pattern of dips and rises, but one profile is elevated compared to the other (see also Chapter 3).
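The shape-measure property can be checked numerically. In the sketch below (hypothetical function name and profile values), a profile is compared with an elevated copy of itself and with its mirror image.

```python
import math


def pearson(a, b):
    """Equation (8): product-moment correlation between two profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of characters")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    denominator = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                            * sum((y - mean_b) ** 2 for y in b))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return numerator / denominator


# Hypothetical densitometric profile and two variants of it.
profile = [1.0, 4.0, 2.0, 8.0, 3.0]
elevated = [v + 10.0 for v in profile]   # same dips and rises, shifted upwards
inverted = [-v for v in profile]         # every rise turned into a dip

print(pearson(profile, elevated))   # close to +1 although the values differ
print(pearson(profile, inverted))   # close to -1
```

A distance measure such as equation (5) or (7) would report the elevated profile as clearly different from the original, whereas the correlation coefficient treats the two as equivalent.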
(iv) Band-based similarity coefficients

(a) Coefficient of Jaccard. The similarity between two tracks of bands is the number of matching bands divided by the total number of bands in both tracks (i.e., the corresponding bands plus the track-specific bands):

$$S_{i,j} = \frac{n_{i,j}}{n_i + n_j - n_{i,j}}, \tag{10}$$

where $S_{i,j}$ is the similarity between tracks $i$ and $j$, $n_{i,j}$ is the number of corresponding bands for $i$ and $j$, $n_i$ is the total number of bands in $i$, and $n_j$ is the total number of bands in $j$. So $n_i + n_j - n_{i,j}$ is the total number of bands in both tracks, not double counting the corresponding ones. If all bands in $i$ match those in $j$ and vice versa, then $S_{i,j} = 1$.
(b) Area-sensitive coefficient. This is a more sophisticated similarity measure, which also takes into account the possible differences in areas of the matching bands:

$$S_{i,j} = \frac{A_{i,j}}{n_i + n_j - n_{i,j}}, \tag{11}$$

where

$$A_{i,j} = \sum_{k=1}^{n_{i,j}} [\ldots] \tag{12}$$

(the summand, a function of $\alpha$ and $|B_{i,k} - B_{j,k}|$, is not legible in the source), where $\alpha$ is a constant, and $|B_{i,k} - B_{j,k}|$ is the absolute difference between the areas of the $k$-th corresponding band in $i$ and $j$, where $k$ runs from 1 to $n_{i,j}$. Thus, differences in band areas of the corresponding bands are penalised. If the areas of all corresponding bands of both tracks are equal, this coefficient is reduced to the coefficient of Jaccard: if $B_{i,k} = B_{j,k}$ for all $k$, $A_{i,j} = \sum_{k=1}^{n_{i,j}} 1 = n_{i,j}$.
Fig. 2.4. UPGMA or group average (see text). The dissimilarity between an object or cluster k, and a cluster l formed by joining objects or clusters i and j, is the average of the dissimilarities between k and i, and between k and j, weighted for the number of points in clusters i and j.
(c) Dice coefficient. The Dice coefficient is very similar to the coefficient of Jaccard, but gives more weight to matching bands:

$$S_{i,j} = \frac{2 n_{i,j}}{n_i + n_j}, \tag{13}$$

where $S_{i,j}$ is the similarity between tracks $i$ and $j$, $n_{i,j}$ is the number of matching bands for $i$ and $j$, $n_i$ is the total number of bands in $i$, and $n_j$ is the total number of bands in $j$.
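The coefficients of Jaccard and Dice, equations (10) and (13), can be compared on a small example. The sketch assumes that band matching has already been done, so that each track is simply a set of band-position labels and equal labels denote corresponding bands; the function names and band positions are hypothetical.

```python
def jaccard(bands_i, bands_j):
    """Equation (10): matching bands over all distinct bands in both tracks."""
    if not bands_i and not bands_j:
        raise ValueError("similarity is undefined for two empty tracks")
    n_ij = len(bands_i & bands_j)
    return n_ij / (len(bands_i) + len(bands_j) - n_ij)


def dice(bands_i, bands_j):
    """Equation (13): matching bands counted twice over the summed band totals."""
    if not bands_i and not bands_j:
        raise ValueError("similarity is undefined for two empty tracks")
    n_ij = len(bands_i & bands_j)
    return 2 * n_ij / (len(bands_i) + len(bands_j))


# Hypothetical tracks: sets of band-position labels after matching.
track_i = {10, 25, 40, 55}
track_j = {10, 25, 60}

print(jaccard(track_i, track_j))  # 2 matching bands out of 5 distinct bands
print(dice(track_i, track_j))     # 2 * 2 matching bands out of 4 + 3 bands
```

For the same pair of tracks the Dice value is never lower than the Jaccard value, which reflects the extra weight given to matching bands.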
D.
Clustering methods
(i) UPGMA or group average

This clustering method, termed the unweighted pair group method using arithmetic averages (UPGMA), was used in the simple example discussed earlier in this chapter. It states that the dissimilarity between an object or cluster $k$, and a cluster $l$ formed by joining objects or clusters $i$ and $j$, is simply the average of the dissimilarities between $k$ and $i$, and between $k$ and $j$, taking into account the number of points in clusters $i$ and $j$ (Fig. 2.4). This is given by the formula

$$D_{k,l} = \frac{N_i D_{k,i} + N_j D_{k,j}}{N_i + N_j}, \tag{14}$$

where $k$ is the index used for an existing cluster or object, $l$ is the index used for the newly formed cluster, $D_{k,l}$ is the dissimilarity between $k$ and $l$, $N_i$ is the number of objects in cluster $i$, and $N_j$ is the number of objects in cluster $j$. This clustering method effectively leads to minimisation of the average dissimilarity between the objects in a cluster. This interpretation holds for all types of similarity measures. The clustering structure is less pronounced and the clusters are more limited in diameter than with Ward's clustering method (see below).

(ii) Ward's averaging

With Ward's averaging, those clusters (objects) are joined which lead to a minimal increase in the total within-group variance. This results in the following properties of the method: (a) a cluster of aberrant points is often found which have nothing in common with each other, except that they are dissimilar to the other objects (Fig. 2.5); (b) more groups are distinguished in dense areas of the character space (i.e., where most of the objects are); and (c) every data set shows a clear cluster structure, which does not necessarily imply that there are clear separations.

Fig. 2.5. With Ward's clustering method, a cluster of aberrant points (in this example, the cluster with two points) is often found which have nothing in common with each other except that they are dissimilar to the other objects.
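The two joining criteria can be put side by side in a short sketch. The first function is equation (14). The second is an assumption that goes beyond the text: it measures the "increase in within-group variance" as the increase in the within-group sum of squared deviations and uses the standard identity for that increase in terms of cluster sizes and centroids. All names and numbers are hypothetical.

```python
def upgma_update(d_ki, d_kj, n_i, n_j):
    """Equation (14): dissimilarity between k and the union of i and j."""
    return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)


def ward_increase(centroid_i, centroid_j, n_i, n_j):
    """Increase in the within-group sum of squares when i and j are joined.

    Assumed form (not given in the text): the squared Euclidean distance
    between the two centroids, scaled by n_i * n_j / (n_i + n_j).
    """
    squared = sum((a - b) ** 2 for a, b in zip(centroid_i, centroid_j))
    return n_i * n_j / (n_i + n_j) * squared


# Joining two single objects: equation (14) reduces to the plain average.
print(upgma_update(6, 6, 1, 1))

# Joining a cluster of two objects with a single object: the larger
# cluster counts twice in the average.
print(upgma_update(6, 9, 2, 1))

# One-character check: cluster {0, 2} has centroid 1 and sum of squares 2;
# adding the object at 4 gives {0, 2, 4} with sum of squares 8.
print(ward_increase((1.0,), (4.0,), 2, 1))
```

With Ward's criterion, the pair of clusters giving the smallest such increase is joined at each step, whereas with UPGMA the pair with the smallest average dissimilarity is joined.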
2.4
EXAMPLES OF APPLICATIONS OF CLUSTER ANALYSIS
Among the many possible areas of applications, pattern detection techniques are now widely used in both taxonomy and epidemiology. In taxonomy, the objective is to classify organisms into genera and species on the basis of their genotypic or phenotypic relationships (i.e., taxonomy is not necessarily limited to identifying relationships by ancestry); in epidemiology, the objective is confined to identifying bacterial isolates in terms of their recent ancestry (i.e., their epidemiological origin). Many examples of both applications can be found throughout this book. In this chapter, just three examples are given to illustrate the various goals of cluster analysis. The first example (Coenye et al., 2000) shows how cluster analysis used on different types of data, in combination with the evaluation of the groups obtained in terms of label and other information, can help to unravel the taxonomy of microorganisms. A polyphasic taxonomic study was performed on a group of isolates identified tentatively as Burkholderia cepacia, a bacterial pathogen that causes
42 life-threatening lung infections in cystic fibrosis patients. Using cluster analysis with the Pearson or product-moment correlation coefficient as the similarity measure, and UPGMA as the clustering method, analysis of SDS-PAGE fingerprints of whole-cell proteins (see Chapter 4) and AFLP fingerprints (see Chapter 8) identified at least five different species, and this was confirmed by DNA-DNA hybridisation experiments. Based on genotypic and phenotypic characteristics, these organisms were then classified in a novel genus, Pandoraea. The second example (Sloos et al., 1998) demonstrates the application of cluster analysis to microbial epidemiology. The diversity of strains of Staphylococcus epidermidis in a neonatal care unit of a secondary care hospital in The Netherlands was studied. Samples were taken consecutively from patients, and the isolates obtained were typed by pulsed-field gel electrophoresis (PFGE; see Chapter 7) and quantitative antibiogram analysis. The antibiograms were used to group the organisms (Fig. 2.6), using squared Euclidean distance as the similarity measure and Ward's averaging as the clustering method. The main grouping obtained was evaluated for its correlation with other characteristics of the individual isolates, including PFGE type, length of stay, usage of antibiotics, birth weight and cubicle number. Thus, these characteristics of the isolates were not used in the generation of clusters, but were used as label information to help interpret the grouping. The cluster analysis revealed that 14 isolates from six patients had a common PFGE pattern and were of one multiresistant antibiogram type. The remaining isolates belonged to a variety of PFGE types and were more susceptible to antibiotics. Colonisation with the multiresistant strain correlated with a long period of stay and with the use of specific antibiotics. 
Cluster analysis on the basis of antibiograms was also performed on a combined collection that included multiresistant strains from another hospital in the same area. This analysis revealed that the multiresistant strains from both hospitals were closely related, and suggested that transfer of the multiresistant strain had occurred between hospitals.
In the third example (Blanc et al., 1996), cluster analysis of quantitative antibiograms was performed to test whether a typology based on antibiograms would correspond to typologies based on other characteristics. It was found that the grouping obtained by cluster analysis of antibiograms was equivalent to the grouping obtained by ribotyping (see Chapter 5) when the ribotyping was used as label information to evaluate the clusters.
2.5
DISCUSSION
Cluster analysis is a procedure that starts with a data set containing information about a set of objects, and then attempts to organise these objects into groups that are in some sense optimal for the data set under consideration. Cluster analysis can be used for a variety of goals (Aldenderfer & Blashfield, 1984), including developing typologies or classifications, generating concepts or hypotheses through data exploration, and testing whether typologies or classifications generated by other procedures, or by using other data, are present in the data set under consideration.
Fig. 2.6. Strain characteristics (left), antibiogram susceptibility profiles (middle), and grouping of 53 Staphylococcus epidermidis isolates of neonates on the basis of zone diameters (right). Squared Euclidean distance was calculated between all possible pairs of isolates on the basis of their zone diameters, and grouping was performed using Ward's method. The dotted line denotes the distance at which four clusters are delineated. The inhibition zones were used for classifying isolates into "susceptible" (green), "intermediate resistant" (blue), or "resistant" (red) categories for each antibiotic, using the standard Dutch criteria for susceptibility determination. Taken from Sloos et al. (1998).
These goals are illustrated, respectively, by the studies of Coenye et al. (2000), Sloos et al. (1998) and Blanc et al. (1996), as described above. Although pattern detection is sometimes regarded as yet another form of statistics, there are important conceptual differences (Hogeweg, 1976a):
1. In statistics, deviations from randomness in the data set are looked for, while in pattern detection the structure in the data set is sought. Note that a random data set can also have structure!
2. In statistics, attempts are made to make sample-independent statements. The data under consideration are assumed to be a random sample of the whole population, and the objective is to make statements about the whole population by looking at a representative sample of the population. Ideally, these statements should not change if a different random sample is taken from the population. In pattern detection, the data set under study is not considered a sample from a larger population but is considered all there is. A different structure may be found if new data is added (e.g., in taxonomy when new species are discovered).
3. In statistics, groups (and an underlying distribution) are pre-supposed and tests are made to determine whether these groups differ significantly from each other (i.e., more than can be expected on the basis of random fluctuations alone), while in pattern detection, groups are generated per se. In other words, concepts are tested in statistics (i.e., attempts are made to answer the question as to whether pre-supposed groups are different), while concepts (i.e., groupings) are generated in pattern detection.
Descriptive statistics may be used in pattern detection for characterising the grouping obtained in cluster analysis. Cluster analysis can best be seen as a heuristic, rather than a statistical, method for exploring the diversity in a data set by means of pattern generation. The result of a cluster analysis study can, and usually does, depend on the similarity measure used, the clustering method used, the set of objects in the study, the characters used to describe the objects, and the relative weight different characters are given in calculating the similarity between objects (see Hogeweg, 1976b; Van Ooyen & Hogeweg, 1990).
Rather than trying to find the "right" pattern or classification, the differences in the patterns as revealed by the cluster analysis should be used to gain further understanding of the objects under study (see also Hogeweg, 1976a). Used in this heuristic way, cluster analysis is a powerful tool for data exploration in taxonomy and epidemiology, as well as in many other areas such as functional genomics.
REFERENCES
Aldenderfer, M.S. & Blashfield, R.K. (1984). Cluster analysis. Sage Publications, Newbury Park.
Applied Maths (1998). GelCompar (comparative analysis of electrophoresis patterns) reference manual, version 4.1. Applied Maths, Kortrijk.
Blanc, D.S., Lugeon, C., Wenger, A., Siegrist, H.H. & Francioli, P. (1994). Quantitative antibiogram typing using inhibition zone diameters compared with ribotyping for epidemiological typing of methicillin-resistant Staphylococcus aureus. Journal of Clinical Microbiology 32, 2505-2509.
Blanc, D.S., Petignat, C., Moreillon, P., Wenger, A., Bille, J. & Francioli, P. (1996). Quantitative antibiogram as a typing method for the prospective epidemiological surveillance and control of MRSA: comparison with molecular typing. Infection Control and Hospital Epidemiology 17, 654-659.
Bock, H.H. (1974). Automatische Klassifikation. Vandenhoeck & Ruprecht, Göttingen.
Coenye, T., Falsen, E., Hoste, B., Ohlen, M., Goris, J., Govan, J.R.W., Gillis, M. & Vandamme, P. (2000). Description of Pandoraea gen. nov. with Pandoraea apista sp. nov., Pandoraea pulmonicola sp. nov., Pandoraea pnomenusa sp. nov., Pandoraea sputorum sp. nov. and Pandoraea norimbergensis comb. nov. International Journal of Systematic and Evolutionary Microbiology 50, 887-899.
Everitt, B. (1993). Cluster analysis, 3rd edn. Arnold, London.
Hogeweg, P. (1976a). Topics in biological pattern analysis. PhD Thesis, University of Utrecht.
Hogeweg, P. (1976b). Iterative character weighing in numerical taxonomy. Computers in Biology and Medicine 6, 199-211.
Sloos, J.H., Horrevorts, A.M., Van Boven, C.P.A. & Dijkshoorn, L. (1998). Identification of multiresistant Staphylococcus epidermidis in neonates of a secondary care hospital using pulsed field gel electrophoresis and quantitative antibiogram typing. Journal of Clinical Pathology 51, 62-67.
Sneath, P.H.A. & Sokal, R.R. (1973). Numerical taxonomy. Freeman, San Francisco.
Sokal, R.R. & Sneath, P.H.A. (1963). Principles of numerical taxonomy. Freeman, San Francisco.
Van Ooyen, A. & Hogeweg, P. (1990). Iterative character weighting based on mutation frequency: a new method for constructing phyletic trees. Journal of Molecular Evolution 31, 330-342.
3
Setting-Up Intra- and Inter-Laboratory Databases of Electrophoretic Profiles
Herre F. Heersma1, Kristin Kremer2, Dick van Soolingen2 and John Hauman3
1Management Team Computerisation and Methodological Consultancy, and 2Laboratory for Infectious Diseases Surveillance, National Institute of Public Health and Environment, Bilthoven, The Netherlands; 3Department of Microbiology, University of Otago, Dunedin, New Zealand
CONTENTS
3.1 INTRODUCTION
3.2 PRINCIPLES OF COMPUTER-ASSISTED ANALYSIS OF DNA FINGERPRINTS
    A. Scanning the images
    B. Conversion of images into the analysis software
    C. Normalisation
    D. The analysis phase
       (i) Matching single patterns against a list
       (ii) Group and cluster analysis
       (iii) Use of similarity coefficients (SCs)
3.3 FACTORS DETERMINING THE COMPARABILITY OF DNA PATTERNS
    A. Resolution of the scanned images
    B. Background subtraction and optimising the OD values
    C. Internal size markers (ISM) and external reference strains (ERS)
       (i) ERS
       (ii) ISM
       (iii) General recommendations for selection of ERS and ISM
       (iv) The use of different ISM and ERS within one project
       (v) Tuning ISM and ERS
    D. Comparison of results when using ISM or ERS
    E. Evaluation of the computer-assisted phases
3.4 EXPERIENCES AND APPLICATIONS
    A. The cluster similarity matrix
    B. Band-based similarity vs. Pearson correlation: pros and cons
    C. Exchanging DNA patterns: the bundle concept
    D. A=B and B=C does not mean A=C
3.5 SETTING UP INTRA- AND INTER-LABORATORY DATABASES OF ELECTROPHORETIC PROFILES: RECOMMENDATIONS
3.6 FINAL REMARKS
REFERENCES
3.1
INTRODUCTION
The microbial identification and typing techniques described in this book make use of a range of methodologies leading to the generation of a distinguishing fingerprint, normally combined with the use of electrophoresis as a separation strategy. During a long-term surveillance project at the Laboratory for Infectious Diseases at the National Institute of Public Health and Environment (RIVM), considerable experience has been acquired in constructing databases of typing results in the form of DNA fingerprints, in which laboratory, clinical and patient data are combined. This chapter describes the setting-up of databases of electrophoretic profiles and other laboratory data that can then be used for intra- and inter-laboratory studies, as well as national and international surveillance work investigating transmission routes, sources of infection, population based studies, etc. The work will be illustrated by reference to extensive studies that have been performed at RIVM with Mycobacterium tuberculosis. The discovery of transposable elements and other repetitive DNA elements in strains belonging to the M. tuberculosis complex in the early 1990s led to various DNA fingerprinting methods being developed to differentiate between strains belonging to the complex. Some of these methods rely on PCR-based amplification of repetitive DNA sequences (Haas et al., 1993; Plikaytis et al., 1993; Friedman et al., 1995; Kamerbeek et al., 1997; Otal et al., 1997). Other methods visualise restriction fragments containing particular repetitive DNA elements such as insertion sequences (IS) (Van Soolingen et al., 1992; 1993; 1994a, b; Van Embden et al., 1993; Wiid et al., 1994). A comparison between PCR-based and restriction fragment length polymorphism (RFLP) typing methods revealed that IS6110 RFLP typing is the most discriminative and reproducible typing method currently available (Kremer et al., 1999). 
This chapter will refer throughout to the results of international collaboration within the framework of the EU Concerted Action for Genetic Markers for the Epidemiology of Tuberculosis (CAonTB). Within this project, an inter-laboratory study has been carried out, with the goal of determining the limits of reproducibility of an international database of M. tuberculosis IS6110 RFLP DNA patterns. This database was filled with results from over 30 laboratories worldwide. The laboratory work was already standardised with respect to restriction enzyme, probe and size markers. Based on this inter-laboratory study, standards and guidelines for computer-assisted analysis were established. This concerted effort to develop and standardise DNA fingerprinting and to develop computer-assisted methods has been very successful in establishing a robust infrastructure for studying the molecular epidemiology of TB in Europe. The experiences within this Concerted Action have led to exciting new possibilities for studying the transmission of tuberculosis. These experiences may also be applicable to other organisms. The analysis techniques described in this chapter are based on the application of GelCompar 2 and BioNumerics software (Applied Maths, Kortrijk, Belgium), although the techniques are also applicable for use with other software.
[Figure 3.1: flow diagram of four sequential steps: scanning or digitising, conversion, normalisation, analysis]
Fig. 3.1. The basic steps in computer-assisted analysis of electrophoretic profiles.
In brief, the application of electrophoresis for the computerised characterisation of an organism by its fingerprint involves two steps:
I. Electrophoresis of macromolecules (protein or DNA), resulting in a banding pattern on a gel, a film, an autoradiogram or a densitometric curve. Approaches leading to this first step (Van Soolingen et al., 1999b), often referred to as 'the wet laboratory work', are discussed in the other chapters in this book.
II. Recording of the electrophoretic profiles by a camera, scanner or densitometer into a digitised form (bitmaps or densitometric curves), and subsequent analysis of the bitmap or densitometric curves with computer software.
The process of computer-assisted analysis of the electrophoretic profiles, based on one of a range of typing methods, basically consists of four distinct and sequential steps (Fig. 3.1):
1. creation of a digitised image that can be processed by a computer;
2. conversion of the digitised image into data that the computer analysis software can handle, and the recognition of the different lanes containing the images of the DNA fragments;
3. normalisation, involving interpolation of the different lanes to remove distortions;
4. analysis or comparison of the electrophoretic profiles processed in the previous phases.
Throughout the remainder of this chapter, electrophoretic profiles will be referred to as DNA fingerprints or DNA patterns. If a DNA sequencer is being used to generate the data (e.g., see Chapter 6), steps I and II.1 are combined. The optical detector is located at a fixed position and the bands are recorded directly during electrophoresis while passing the detector. The following sections discuss the basic steps involved and the factors that determine the comparability of DNA patterns and the reproducibility of databases of DNA patterns. The national database of M. tuberculosis IS6110 RFLP patterns in The Netherlands and the international TBase database will be used as examples.
Finally, results from an international inter-laboratory study based on the analysis of DNA patterns will be presented and some recommendations for setting up intra- and inter-laboratory databases of DNA patterns are outlined.
Fig. 3.2. Example of the analysis of two autoradiograms with ISMs. Marker points 1m, 2m, 3m and 4m (not visible here) on the ISM autoradiogram, and marker points 1, 2, 3 and 4 on the autoradiogram with the IS6110 RFLP patterns are used for superimposition. To do so, both autoradiograms are put on a light box. After that step, small holes (5 and 6) are made with a sharp tool in both superimposed autoradiograms. These holes allow better alignment in the computer than the markers, which are sometimes too dark.
3.2
PRINCIPLES OF COMPUTER-ASSISTED ANALYSIS OF DNA FINGERPRINTS
In this section, the recording of the patterns and computerised processing of the digitised information will be discussed. A more comprehensive description can be found in Heersma et al. (1998). The mobility of the DNA fragments in an agarose gel depends on numerous factors, including the DNA concentration, the type and concentration of the agarose, the voltage, buffer, electrophoresis time and temperature. Changes in any of these factors can lead to alterations in the profiles, which then influence the comparability of patterns, both within a single gel and between different gels. Two basic techniques can be applied to improve the comparability of patterns (Van Embden et al., 1993; Friedman et al., 1995), namely the use of internal size markers (ISM) or external reference strains (ERS). The difference between these techniques can be seen in Figs. 3.2 and 3.3. When using ISM, the DNA patterns to be analysed and the reference patterns that enable compensation to be made for misalignments and errors are on different autoradiograms which have to be overlaid very accurately, whereas with ERS, the reference strains used to compensate for misalignments are on the same autoradiogram.
Fig. 3.3. Example of the use of ERS. In this autoradiogram, containing IS6110 RFLP patterns from 22 M. tuberculosis strains, lanes 1, 12 and 22 contain the Mt14323 ERS. A disadvantage of this ERS is that the upper and lower bands are sometimes very weak. The marker points and the small holes (see Fig. 3.2) are not used in this case.
A.
Scanning the images
Before further processing, electrophoretic patterns, whether on agarose gels or autoradiograms, have to be transformed into a digitised image. This can be done with a CCD camera or a scanner, usually a flatbed type. A CCD camera contains an objective lens and a light sensitive device, the charge coupled device (CCD), with a fixed resolution in pixels or dots per inch (dpi) in both directions. A flatbed scanner usually consists of a glass cover, a light source and a computer-controlled array of light sensitive elements. In most cases, the resolution of a flatbed scanner is much higher than that of a CCD camera. With dedicated computer software, the image size and the resolution can be changed. Furthermore, properties of the digitised image, such as contrast and brightness, can be adapted in various ways.
Fig. 3.4. Result of the matching of one strain to a list of patterns. Reference strain Mt14323, which was processed on another autoradiogram, was matched against the patterns of the autoradiogram in Figs. 3.2 and 3.3. Optimisation was enabled and both position tolerance and optimisation were set to 1% (see section 3.2.D).
B.
Conversion of images into the analysis software
In the next phase, the conversion or band pattern recognition phase, the digitised image is used as input, and this is converted to the BioNumerics or GelCompar 2 format. Pattern recognition software then gives the information necessary for the next steps, i.e., the overall borderlines of the desired parts of the image, the borderlines of the different 'raw' DNA fingerprints or 'Gelstrips' and, for the BioNumerics software, a densitometric curve for each DNA fingerprint. At the end of this step, each selected DNA pattern has its own individual entry in the computer database and is thereby separated from the rest of the gel or autoradiogram (Fig. 3.4). In many cases, it can happen that the 256 grey levels are not fully used in the image file. This may result in a dark background and/or weak looking bands. The software offers the possibility of applying background subtraction and optimising use of the grey scale. This is based on the lightest and darkest optical density (OD) values, and the 'stretching' of these values to the full scale of 256 grey levels. In this way it is possible to remove the background 'noise' caused by the grey or blue photographic emulsion (Fig. 3.5).
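The grey-scale optimisation described above can be illustrated with a minimal sketch. It assumes a simple linear stretch between the darkest and lightest values of one track; the function name and the sample values are hypothetical, and this is not the routine used by GelCompar or BioNumerics.

```python
def stretch_grey_levels(pixels, levels=256):
    """Linearly rescale grey values so that the lowest value maps to 0
    and the highest to levels - 1 (assumed simple min-max stretch)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A completely flat track carries no band information.
        return [0 for _ in pixels]
    # Subtracting lo removes the uniform background; the ratio stretches
    # the remaining range over the full grey scale.
    return [round((p - lo) * (levels - 1) / (hi - lo)) for p in pixels]


# Hypothetical track that uses only the grey levels 60..150 of the 0..255 scale.
track = [60, 62, 61, 150, 64, 60, 120, 61]
stretched = stretch_grey_levels(track)
```

After stretching, the background value of 60 becomes 0 and the strongest band becomes 255, so weak bands stand out more clearly against the background.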
Fig. 3.5. Example of background subtraction to remove background noise caused by the photographic material.
C.
Normalisation
The normalisation phase allows the alignment of patterns to be achieved by associating bands to predefined reference patterns. When using ERS, the alignment of reference patterns is done by aligning bands with corresponding predefined reference bands and by subsequent interpolation of the intermediate values. The other (non-reference) tracks are aligned gradually according to the closest neighbouring reference tracks on either side. If ISM are used, each track is interpolated separately and individually. It is possible to correct for distortions in the gel by combining any sample with any other, using one or more bands which are known to be the same in different patterns. In this way, it is possible to perform alignments in each track. This is the crucial difference between the two techniques: ERS can only be used to compensate for global distortions, while with ISM it is possible to minimise local distortions (Figs. 3.6 and 3.7).
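The interpolation step can be sketched as follows, assuming piecewise-linear interpolation between matched reference bands; the actual interpolation scheme of the analysis software may differ, and all names and positions here are hypothetical.

```python
from bisect import bisect_right


def normalise_position(pos, observed_refs, standard_refs):
    """Map a raw band position onto the standard scale by linear
    interpolation between the two enclosing reference bands.

    observed_refs: ascending positions of the reference bands in this lane.
    standard_refs: positions of the same bands in the predefined reference pattern.
    """
    if len(observed_refs) != len(standard_refs) or len(observed_refs) < 2:
        raise ValueError("need at least two matched reference bands")
    # Segment containing pos; positions beyond the outermost reference
    # bands reuse the first or last segment (i.e. extrapolation).
    i = bisect_right(observed_refs, pos) - 1
    i = max(0, min(i, len(observed_refs) - 2))
    x0, x1 = observed_refs[i], observed_refs[i + 1]
    y0, y1 = standard_refs[i], standard_refs[i + 1]
    return y0 + (pos - x0) * (y1 - y0) / (x1 - x0)


observed = [100.0, 210.0, 390.0]   # hypothetical marker positions in one lane
standard = [100.0, 200.0, 400.0]   # hypothetical positions in the reference pattern
aligned = [normalise_position(p, observed, standard) for p in (155.0, 210.0, 300.0)]
```

With ISM every lane carries its own `observed` list, so each lane is corrected individually; with ERS only the reference lanes do, and the correction for the lanes in between has to be derived from the neighbouring reference lanes.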
D.
The analysis phase
After the normalisation phase, the results can be analysed in two ways:
1. single patterns can be matched to a database or a list of patterns to reveal transmissions;
2. complete databases or lists of patterns can be compared and grouped to reveal the relatedness of patterns within a database.
This section discusses the different steps involved in the analysis phase. A comprehensive description of the theoretical background to pattern analysis can be found in Chapter 2.
Fig. 3.6. The result of normalisation using ISMs. Each lane is used as a reference lane. Compare this to the result of the normalisation when using ERS in only three lanes (Fig. 3.7).
(i)
Matching single patterns against a list
When a single pattern is matched against a set (a database or a list) of N patterns, the similarity coefficient (SC) between each pattern in the set and the pattern to be matched is calculated. This SC is a number between 0 (no similarity at all) and 1 (100% identical patterns). The result is an array of N SCs. In the next step, the patterns are sorted by SC in decreasing order and displayed as a list (Fig. 3.4).
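A minimal sketch of this matching step is given below. It assumes that each pattern is stored as a list of band positions, that the SC is the Dice coefficient, and that the position tolerance is an absolute distance rather than a percentage of the pattern length; the strain names and positions are hypothetical.

```python
def dice(bands_a, bands_b, tolerance=0.0):
    """Dice coefficient between two lists of band positions.
    Two bands match when their positions differ by at most `tolerance`."""
    unmatched_b = sorted(bands_b)
    matches = 0
    for a in sorted(bands_a):
        for j, b in enumerate(unmatched_b):
            if abs(a - b) <= tolerance:
                matches += 1
                del unmatched_b[j]   # each band may be matched only once
                break
    total = len(bands_a) + len(bands_b)
    return 2 * matches / total if total else 1.0


def match_against_list(query, database, tolerance=0.0):
    """Return (name, SC) pairs for all N patterns, sorted by decreasing SC."""
    scores = [(name, dice(query, bands, tolerance)) for name, bands in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


database = {
    "strain_A": [12.0, 30.5, 47.0, 80.0],
    "strain_B": [12.3, 30.0, 47.2, 60.0],
    "strain_C": [15.0, 55.0, 90.0],
}
query = [12.0, 30.2, 47.0, 80.0]
ranking = match_against_list(query, database, tolerance=0.5)
```

The first entries of `ranking` are the candidates for identical patterns; how far down the list a match is still accepted depends on the tolerance and cut-off chosen by the user.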
(ii)
Group and cluster analysis
Sets of DNA patterns, in databases or lists, can be compared mutually and grouped. This process is often called cluster analysis. A cluster is considered to be a group of organisms with similar DNA patterns, and is delineated at a certain optimal 'cutting' level in a dendrogram (see Chapter 2). As a first step, the SCs for all patterns (including a self-comparison) are calculated for each of the N patterns. This results in a symmetric N × N similarity matrix D (sometimes also called the dissimilarity or resemblance matrix); each element D_{i,j} of the matrix represents the SC of patterns i and j. The elements D_{i,i} on the main diagonal of the matrix are all 1 (= 100%), reflecting the SCs of each pattern when compared against itself. Again, for all other elements of the matrix, each SC value is a number between 0 (no similarity at all) and 1 (100% identical patterns). The ordering in this similarity matrix is decided by the way in which the patterns are ordered in the list or the database, i.e., it is arbitrary. In the next step, the similarity matrix is transformed into a tree by using a clustering method. This is a series of steps that progressively reduces the similarity matrix and, at the same time, builds up a tree or dendrogram. In this example, a cluster is a set of DNA patterns that the computer considers to be identical according to the SCs. A cluster can contain as few as one pattern, indicating that the list contains no other pattern that matches at the defined cut-off level. The reorganised or optimised matrix is then kept in the computer.
Fig. 3.7. The result of normalisation using ERS. In this case, lanes 1, 12 and 22 were reference lanes. It is clear that misalignments and errors, especially in lanes 15-21, were not well-compensated.
The most frequently used algorithm for clustering DNA patterns is the unweighted pair-group method using arithmetic averages (UPGMA; see Chapter 2). This algorithm first joins the two most similar patterns into a cluster. In the matrix, the two patterns are then replaced by this cluster, whose similarity to every remaining pattern is the unweighted arithmetic average of the similarities of its members to that pattern. In the next step, the most similar pair in the reduced matrix, which may include the new cluster, is joined in the same way. After each clustering step the dimension of the matrix decreases by one, so the clustering is a finite process.
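The reduction of the similarity matrix can be sketched as follows. This is a minimal illustration that assumes the usual formulation of UPGMA, in which the similarity between two clusters is the arithmetic mean of the similarities of all pairs of their members; the pattern names and SC values are hypothetical.

```python
def upgma(names, sim):
    """Cluster with UPGMA on a symmetric similarity matrix (list of lists).
    Returns the merge history as (cluster_a, cluster_b, similarity) tuples."""
    clusters = {i: (name,) for i, name in enumerate(names)}
    # Similarity between current clusters, keyed by a frozenset of two cluster ids.
    s = {frozenset((i, j)): sim[i][j]
         for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        pair = max(s, key=s.get)             # most similar pair of clusters
        a, b = sorted(pair)
        level = s.pop(pair)
        merges.append((clusters[a], clusters[b], level))
        na, nb = len(clusters[a]), len(clusters[b])
        for k in clusters:
            if k in (a, b):
                continue
            sa = s.pop(frozenset((a, k)))
            sb = s.pop(frozenset((b, k)))
            # Size-weighted mean of the two rows equals the unweighted mean
            # over all pairs of original patterns.
            s[frozenset((next_id, k))] = (na * sa + nb * sb) / (na + nb)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


names = ["P1", "P2", "P3", "P4"]
sim = [
    [1.0, 0.9, 0.4, 0.3],
    [0.9, 1.0, 0.5, 0.2],
    [0.4, 0.5, 1.0, 0.8],
    [0.3, 0.2, 0.8, 1.0],
]
merges = upgma(names, sim)
```

Each tuple in `merges` corresponds to one node of the dendrogram: the two branches that are joined and the similarity level at which they join. A matrix of N patterns always yields N - 1 merges.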
(iii) Use of similarity coefficients (SCs)
The two most frequently used measures to calculate SCs are Pearson's (product-moment) correlation coefficient and band-based comparisons.
(a) Pearson's correlation coefficient.
Pearson's correlation coefficient (see Chapter 2) calculates the congruence between arrays of values, for instance the densitometric curves. Each element of the array contains a densitometric value, the OD, as interpreted by the scanner, optimised in the conversion phase and interpolated in the normalisation phase. This densitometric value can be anything between 0 and 255. So if the length of a track is 800 pixels (or BioNumerics positions), the array contains 800 grey-scale values between 0 and 255.
(b) Band-based SCs.
Band-based SCs are calculated from arrays that contain only binary data, and reflect the fact that at a particular location on the autoradiogram, a DNA fragment is present (1) or absent (0). Bands can be assigned manually or automatically. In this way it is possible to avoid or minimise the influence of all kinds of local errors, such as stain variations, non-linearities in the photographic material, differences in thickness of similar bands in different lanes, etc. Although most errors and non-linearities have been removed in the normalisation phase, differences in the morphology of bands can lead to differences in the interpretation of the positions of fragments on a gel. The band-based methods used most frequently to calculate SCs are the Dice and Jaccard methods. There is no difference between these measures with regard to the clustering, although Dice gives more weight to matching bands, whereas Jaccard gives more weight to differences (Figs. 3.8 and 3.9).
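The two band-based coefficients can be written out in a short sketch, using their standard definitions: Dice = 2a / (n_A + n_B) and Jaccard = a / (n_A + n_B - a), where a is the number of shared bands and n_A and n_B are the numbers of bands in each pattern. The binary arrays below are hypothetical.

```python
def dice_and_jaccard(pattern_a, pattern_b):
    """Compute both coefficients from two equal-length binary band arrays
    (1 = band present at this position, 0 = band absent)."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must have the same number of positions")
    shared = sum(1 for a, b in zip(pattern_a, pattern_b) if a and b)
    n_a, n_b = sum(pattern_a), sum(pattern_b)
    if n_a + n_b == 0:
        return 1.0, 1.0          # two empty patterns: treated as identical here
    dice = 2 * shared / (n_a + n_b)
    jaccard = shared / (n_a + n_b - shared)
    return dice, jaccard


a = [1, 0, 1, 1, 0, 1, 0, 0]
b = [1, 0, 1, 0, 0, 1, 1, 0]
d, j = dice_and_jaccard(a, b)    # 3 shared bands, 4 bands in each pattern
```

The two values are linked by J = D / (2 - D), which increases steadily with D. Ranking pairs of patterns by Dice or by Jaccard therefore gives the same order, although the numerical values, and thus the similarity levels read from a dendrogram, differ.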
(c) Misalignments: tolerance settings and optimisation.
Even in the best laboratory situation, the different steps will almost always lead to disturbances or misalignments, thereby influencing the reproducibility of the DNA patterns. When the Pearson correlation coefficient is used with GelCompar or BioNumerics software, clusters of 100% identical DNA patterns will be very rare because of differences in staining, background, or the morphology of bands, etc. Even for electrophoresis profiles of the same isolate on the same gel, small differences will always be present. The normalisation process also causes misalignments. Optimisation allows track-to-track corrections to compensate for the remaining misalignments by shifting one or both tracks with respect to the other until they reach their maximum correlation. The Optimisation value is the maximum shift between two patterns, and can be adjusted by the user.
In cases where the physical and biological techniques lead in theory to binary results (i.e., there either is a fragment or there is not), such as with IS6110 RFLP typing, it is recommended that band-based SCs are used. The lack of complete reproducibility of the patterns again leads to small misalignments when assigning bands. The position tolerance setting specifies the maximum distance between the positions of two bands at which they will still be regarded as matching. This setting can be adjusted by the user. As when using the Pearson correlation coefficient, the optimisation feature then enables track-to-track corrections to be made to compensate for the remaining misalignments by shifting one or both tracks with respect to the other until they reach their maximum similarity; this value is the SC for these patterns.
Fig. 3.8. Clustering of the M. tuberculosis IS6110 RFLP patterns shown in Figs. 3.2 and 3.3. ISM were used with Jaccard and UPGMA, with tolerance and optimisation set at 1%. Compare this result with Fig. 3.9.
Fig. 3.9. Clustering of the M. tuberculosis IS6110 RFLP patterns shown in Figs. 3.2 and 3.3 with the Dice coefficient. Other settings were identical to those used in Fig. 3.8.
3.3
FACTORS DETERMINING THE COMPARABILITY OF DNA PATTERNS
Each of the steps mentioned in the previous section has its own factors that determine the reproducibility of the final results. These factors are discussed in more detail below.
A.
Resolution of the scanned images
A flatbed scanner can be used to scan Polaroid photographs of a gel. When the patterns are reproduced on films or autoradiograms, the scanner should be fitted with a transparency adapter in order to obtain the best results. Most modern flatbed scanners have a maximum resolution of at least 300 dpi and also at least 256 grey tones. This means that for a set of patterns with a total size of 10 × 15 cm (or about 4 × 6 inches), the computer image is about 1200 × 1800 pixels. This is an image size that can be processed by modern computers and gives very good results. It also leaves some room to adapt the scanned area. In most cases the resolution can be adapted to optimise file sizes in the computer. Negatives and slides with a size of 24 × 36 mm can also be transferred into computer images, although an ordinary flatbed scanner with a hardware resolution of 300 or 600 dpi is inappropriate for scanning slides. Although flatbed scanners are supplied with software to increase the resolution of scanned images to 4800 or even 9600 dpi by applying optimisation and interpolation techniques, the results are very poor, at least for the goals of this chapter. In this case a dedicated slide scanner with a higher resolution (at least 2000 dpi) has to be used. A disadvantage of a CCD camera is the limited, and in most cases fixed, number of pixels, e.g., 768 × 480. Further, the absence of a zoom objective may limit the optimal use of the number of pixels, particularly if the relevant area is small compared to the area covered by the camera.
When DNA patterns from slides (24 × 36 mm), autoradiograms (e.g., with a format of 10 × 15 cm) and/or Polaroids (e.g., with a format of 90 × 90 mm) have to be merged into one database, the resolutions have to be tuned to obtain an optimal result. In the example above, 300 dpi was used for the autoradiogram. Since the edges of a slide are about one-quarter the length of those of the autoradiogram used, a slide would have to be scanned in this case with a resolution of about 1200 dpi, whereas a Polaroid would have to be scanned with a resolution of 10/9 × 300 ≈ 330 dpi, assuming that the electrophoresis direction is parallel to the small edge of the autoradiogram.
Images can be saved in different file formats. The most commonly used formats are Bitmap (BMP), Tagged Image File Format (TIF or TIFF), and Graphics Interchange Format (GIF). Image formats like JPEG or JPG are based on compression techniques to decrease image file sizes drastically. These "lossy" formats are very useful for applications where the file sizes play an important role, e.g., Internet applications. However, the compression techniques result in a loss of information and the introduction of errors, and these formats should thus never be used for the goals described here. The GelCompar 2 and BioNumerics software programs only accept TIFF files.
In order to assign the positions of bands on computerised images of DNA patterns accurately, each band has to be covered by at least several pixels. At RIVM, the minimum band thickness of an IS6110 RFLP DNA pattern on an autoradiogram with fingerprints of M. tuberculosis strains is about 1 mm. For an optimal result, initially at least 8 pixels have to cover the area of a band. Within the framework of the CAonTB project, this means that the optimal scanner setting is 200 dpi. If BioNumerics software is being used, the resolution is then transformed to about 160 dpi (700 pixels for the whole normalised image), which means that each band is covered by at least 6 pixels. By doing so, the maximum error, introduced by the number of pixels, is 1/700 or 0.14%.
This error is relatively small and negligible, at least when it is compared to the other possible sources of error. Of course, values and numbers mentioned above are just examples, and may differ for each type of experiment.
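The resolution arithmetic used above can be checked with a few lines of code. The sketch assumes 25.4 mm per inch and uses the band thickness and media sizes quoted in the text; the function names are hypothetical.

```python
MM_PER_INCH = 25.4


def pixels_per_band(dpi, band_mm=1.0):
    """Number of pixels covering a band of the given thickness."""
    return dpi * band_mm / MM_PER_INCH


def matched_dpi(reference_dpi, reference_mm, other_mm):
    """Scan resolution that gives a medium of length other_mm the same
    number of pixels as the reference medium of length reference_mm."""
    return reference_dpi * reference_mm / other_mm


band_200 = pixels_per_band(200)          # about 7.9 pixels per 1 mm band
band_160 = pixels_per_band(160)          # about 6.3 pixels per 1 mm band
slide_dpi = matched_dpi(300, 100, 24)    # 1250 dpi, close to the 1200 dpi quoted for slides
pixel_error = 1 / 700                    # one pixel on a 700-pixel track, about 0.14%
```

The same two functions can be used to derive settings for other media or band thicknesses, which is useful because the values in the text are examples only.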
B.
Background subtraction and optimising the OD values
To optimise the quality of the DNA patterns, it is possible to apply background subtraction and to optimise the use of the OD scale in several of the various steps. The Pearson correlation coefficient is invariant to a global shift in the values of the array elements and to multiplication of the values by a constant factor. Pearson's correlation, r, is expressed by the following formula:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \tag{1} $$
where $\bar{x}$ is the average of the n OD values x and $\bar{y}$ is the average of the n OD values y. If a constant value A is added to each array element of x (or y) (i.e., each element x is replaced with x + A), this shift does not influence the value of the correlation coefficient r (Fig. 3.10). In a similar way, it is easy to prove that if each OD value x (or y) is replaced with Ax (where A is a positive constant factor, so that each array element is multiplied by A), this factor cancels from the formula and thereby does not influence the SC (Fig. 3.11). From this it can be concluded that both background subtraction and optimisation of the OD scale leave the SCs unaltered, and thus that they do not influence the final clustering. Theoretically, r can have any value between -1 and 1 (Everett, 1986; Romesburg, 1990; Armitage et al., 1991; Norusis, 1993); however, due to the types of arrays (the OD values) for DNA patterns, r will never be 0 or less.
Fig. 3.10. Example of a series of 15 points xi and a series of 15 points xi + 3. The points in both series have the same correlation (0.997). The correlation between both series is 1!
Fig. 3.11. Example of a series of 15 points xi and a series of 15 points 1.5*xi. The points in both series have the same correlation (0.022). The correlation between both series of points is 1!
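The invariance described above can be verified numerically. The following is a minimal sketch in plain Python; the two short OD arrays are invented for illustration and do not come from the CAonTB data, and the function name `pearson` is our own.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return numerator / denominator


# Hypothetical OD values along two lanes (illustration only).
lane_a = [0.10, 0.80, 0.30, 0.95, 0.20, 0.15, 0.70, 0.25]
lane_b = [0.12, 0.75, 0.35, 0.90, 0.22, 0.10, 0.65, 0.30]

r_original = pearson(lane_a, lane_b)
# Background subtraction: a constant is removed from every element.
r_shifted = pearson([v - 0.05 for v in lane_a], lane_b)
# Rescaling of the OD axis: every element is multiplied by a positive factor.
r_scaled = pearson([1.5 * v for v in lane_a], lane_b)

print(r_original, r_shifted, r_scaled)  # the three values agree up to rounding
```

Note that the factor must be positive; multiplying by a negative factor would reverse the sign of r.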
Fig. 3.12. Three ERS as used within the CAonTB. Each marker was processed twice on the same gel. Top, H37Rv; middle, TN2650; bottom, Mt14323. It is clear that H37Rv does not cover as much of the area as TN2650 and Mt14323.
C. Internal size markers (ISM) and external reference strains (ERS)
In order to be able to compare DNA patterns from different gels and from different laboratories in a computerised way, it is essential to use ISM and/or ERS. In this way it is possible to compensate for errors and misalignments. This is a necessary condition to be able to build databases with large numbers of DNA patterns for the purposes mentioned in the Introduction.
(i) ERS By loading marker DNA in a few lanes of each gel, differences in migration between DNA fragments of a particular size on different gels and within gels can be corrected by interpolation techniques. Marker DNA should be run in at least three lanes when the gels used contain 22 lanes (i.e., lanes 1, 11 or 12 and 22), but preferably the ERS should be used after every four or five lanes (i.e., lanes 1, 6, 11, 16 and 22). The markers should cover at least the whole molecular size range of the fragments to be analysed. This means that, before selecting an ERS, a representative set of strains has to be fingerprinted. The markers can be a commercially available size ladder or a very specific strain that satisfies the conditions (Fig. 3.12). For similar reasons, this representative set of strains can be used to optimise the area to be scanned at the point when the standards are defined. In cases where the ERS does not cover the whole "spectrum" of the DNA fragments, the results of the normalisation phase will be unpredictable for those bands that reflect fragments with a higher or a lower molecular size than the highest or lowest bands of the ERS. This is because the normalised band positions are calculated through extrapolation. As can be seen in Fig. 3.13, the ERS designated H37Rv, which was used for a long time as the ERS for analysis of M. tuberculosis strains, is not an optimal choice as about 15% of all bands belonging to IS6110 RFLP DNA patterns of M. tuberculosis strains fall outside the area covered by the H37Rv bands. It should also be noted that if a pattern has only one neighbouring ERS, the alignments of the different patterns will also be the result of extrapolation. The normalisation process will then lead to unpredictable results.
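The difference between interpolation and extrapolation during normalisation can be illustrated with a small sketch. It assumes simple piecewise-linear mapping between the ERS band positions measured on a gel and their agreed standard positions; the text does not state which interpolation scheme the analysis software actually uses, and the function name and the numbers are hypothetical.

```python
from bisect import bisect_right


def normalise_position(pos, measured_ref, standard_ref):
    """Map a band position onto the standard scale.

    measured_ref: ERS band positions found on this gel (ascending order).
    standard_ref: agreed standard positions of the same ERS bands.
    Returns (normalised_position, extrapolated), where `extrapolated`
    is True when `pos` lies outside the range covered by the ERS.
    """
    if len(measured_ref) != len(standard_ref) or len(measured_ref) < 2:
        raise ValueError("need two or more paired reference bands")
    extrapolated = pos < measured_ref[0] or pos > measured_ref[-1]
    # Pick the reference segment containing pos; outside the covered
    # range, fall back to the nearest outer segment (extrapolation).
    i = bisect_right(measured_ref, pos) - 1
    i = max(0, min(i, len(measured_ref) - 2))
    x0, x1 = measured_ref[i], measured_ref[i + 1]
    y0, y1 = standard_ref[i], standard_ref[i + 1]
    return y0 + (pos - x0) * (y1 - y0) / (x1 - x0), extrapolated


# Hypothetical ERS with four bands (positions in mm from the slot).
measured = [10.0, 30.0, 60.0, 100.0]
standard = [12.0, 30.0, 58.0, 100.0]

print(normalise_position(20.0, measured, standard))   # inside the ERS range
print(normalise_position(110.0, measured, standard))  # beyond the last ERS band
```

A band flagged as extrapolated corresponds to the situation described for H37Rv: its normalised position depends on extending the outermost segment and should be treated with caution.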
Fig. 3.13. Distribution of 103805 bands in 10867 RFLP patterns. The distribution of the bands of the three ERS in Fig. 3.12 plotted against the distribution of the bands of a large collection of M. tuberculosis strains. About 15% of the bands are not covered by H37Rv, and the results of the normalisation when using H37Rv for the bands representing higher molecular size fragments should be considered unreliable.
(ii) ISM The use of ISM in each lane is a more accurate way to correct for inter- and intra-gel differences in DNA migration. The DNA of the strains to be analysed is mixed with a size marker DNA, which hybridises with a different probe, and two hybridisations are carried out: first to detect the DNA fragments to be analysed, and second to detect the marker DNA. Thus, two autoradiograms are obtained, one with the banding patterns of the strains to be analysed and one with ISM patterns. The basic idea behind this technique is that all disturbances during electrophoresis, leading to distortions in the patterns, affect the mobility of both the DNA fragments and the ISM fragments equally. The most critical step when applying ISM, with regard to the introduction of errors, is the superimposition of the two autoradiograms. The two autoradiograms are therefore marked with marker points to enable the precise superimposition of the two images. These marker points can be made by marking the membrane with a DNA mix, containing both strain and internal marker DNA. At the RIVM, a mix of commercially available DNA size markers is used as internal markers for the normalisation of the M. tuberculosis strains (super-coiled DNA-PvuII/PhiX174-HaeIII; Fig. 3.2). The application of ISM enables the elimination of most of the inaccuracies caused by heterogeneities in the gel and other influences for each individual pattern. Computer software allows both images to be superimposed by using the marker points (Heersma et al., 1998; Van Soolingen et al., 1999b).
Fig. 3.14. The band positions of Mt14323 through the years, showing the ordering of 793 Mt14323 profiles as processed by the computer. Some of the results were rejected (1). Some misalignments were caused by tests with new marker techniques or new superimposition techniques (2) and these results were also rejected.
(iii) General recommendations for selection of ERS and ISM Reference strains and markers suitable for setting up massive databases with many DNA patterns have to satisfy two important conditions:
• They must cover at least the whole spectrum of the DNA patterns, from the visible fragments with the highest molecular sizes to the smallest visible fragments;
• The reference bands have to be distributed equally across the spectrum.
From Fig. 3.2, it can be seen that the marker used at RIVM (viz. super-coiled DNA-PvuII/PhiX174-HaeIII) does not fully satisfy the second condition. However, this has not influenced the reproducibility. This has been tested by comparing the matching of the patterns of the ERS Mt14323 through the years, because four of the 12 bands of Mt14323 are well-distributed in the area in the middle of this ISM (Fig. 3.14).
Fig. 3.15. Tuning ISM and ERS.
(iv) The use of different ISM and ERS within one project Under well-defined circumstances, it is possible to use different ISM and ERS within one project. Within the framework of the CAonTB, all institutes use the same laboratory technique with regard to the choice of restriction enzyme, probe, etc. Although most of the participating institutes use one specific ISM, some use different ISM and/or ERS. At least three different ERS and two ISM were used within the ambit of the project. One problem was that DNA patterns of identical strains, processed at different laboratories, did not cluster when using the standard tolerance settings. When comparing the fragment sizes of the ERS designated TN2650 (Fig. 3.15), which is considered as good an ERS as Mt14323, it appeared that there were small, but significant discrepancies between the molecular sizes of the ERS processed in different laboratories. These discrepancies influence the final analysis of DNA patterns dramatically. Originally, TN2650 was considered as a normal DNA pattern, and therefore the molecular sizes of TN2650 were based on a different ISM used within the CDC tuberculosis network in the USA, whereas the molecular sizes calculated in Bilthoven were based on the CAonTB "standard" ISM, super-coiled DNA-PvuII/PhiX174-HaeIII. Apparently, this reference strain and the size marker were not properly "tuned".
(v) Tuning ISM and ERS It was decided to use the ERS Mt14323, used at RIVM and in most of the laboratories collaborating in the CAonTB, as the 'gold standard' from which all other settings for different ERS and ISM could be derived. In a repeated experiment, all three ERS (Mt14323, H37Rv and TN2650) were processed on several autoradiograms with the same ISM, and then analysed by the computer. The result was a set of two derived standards (sets of reference band positions) based on Mt14323. Next, the experiment was repeated for the other ISM (Marker X; Boehringer), again with the Mt14323 settings as the reference setting. The result was another derived standard, in this case an ISM. It appeared again that there was a significant discrepancy between the molecular sizes as supplied by the manufacturer and those calculated at RIVM. No explanation for this finding is readily available. From this and other experiments, it is clear that laboratory circumstances may differ, and that unexpected differences in databases may appear which influence the reproducibility. Nevertheless, using this approach it was possible to ensure the interchangeability of databases based on different ISM and/or ERS, and to ensure the reproducibility of a central database. Each laboratory was supplied with a set of standard settings, based on the individual laboratory standards with respect to the ERS and/or ISM (Fig. 3.2).
D. Comparison of results when using ISM or ERS
There are pros and cons to the use of the ISM and ERS techniques. In most cases, applying ERS is cheaper for a number of reasons: e.g., no second film material is necessary, no extra marker DNA or extra hybridisation is required, and the procedure is less time-consuming. On the other hand, when applying ERS, at least two lanes, and preferably more, have to be filled with the ERS itself, which means a less economical use of the gels. A major drawback of applying ERS is the inability to remove all the distortions in profiles. Generally, the reproducibility of a profile decreases with its distance from the two neighbouring ERS lanes. The most delicate part of the process when applying ISM is the superimposition of the autoradiogram containing the ISM and the second autoradiogram with the DNA in question. Simple techniques have been developed to ensure comparability (Heersma et al., 1998). Each profile can then be aligned individually to remove distortions and misalignments (Fig. 3.2). Within the framework of the CAonTB project, one of the challenges was to set up an international database of DNA fingerprints of M. tuberculosis strains, based on the analysis of RFLPs associated with IS6110. This database contains DNA patterns produced at about 30 different laboratories, spread throughout the world. The banding patterns usually consist of 1-24 bands of equal intensities, although some IS6110-containing PvuII fragments migrate to similar positions on a gel, which may result in bands which appear broader and have a higher intensity. Each fragment is represented by a band position that is assigned either by the analysis software or manually. Broader and more intense bands are only considered as being two or more individual IS6110-containing fragments if the fragments can be distinguished by eye (Van Soolingen et al., 1999b). An inter-laboratory study was carried out to investigate the differences in the results from the different techniques. A set of autoradiograms with 19 DNA patterns of M. tuberculosis strains and three Mt14323 reference strains in lanes 1, 12 and 22 (Fig. 3.2) was sent to 12 laboratories. Six laboratories processed the fingerprints using ISM and the other six used only the autoradiogram with the IS6110-containing RFLP profiles, and thereby used Mt14323 as the ERS. The results were returned on floppy disks or by email for processing. All band positions were manually compared. Using the GelCompar program with Optimisation switched off, the clustering of all 12 sets of DNA patterns was checked with different tolerance settings. The conclusion was that, with respect to IS6110 RFLP typing and the set of autoradiograms applied, the use of ISM yielded more accurate results (Fig. 3.16). With a tolerance setting of 0.8%, for each of the 19 DNA patterns, all results from the six different laboratories using ISM clustered at 100%, whereas a tolerance setting of 1% was necessary for a 100% clustering of the results from the laboratories using ERS. It is probable that the results from this inter-laboratory study are rather too optimistic because of the good quality of both autoradiograms, and that in a real situation, the results obtained using ERS would be even worse compared to the results obtained using ISM, although this cannot be proved in the research situation.
Fig. 3.16. Results of the inter-laboratory study of the reproducibility of the computer-assisted analyses of the test set shown in Fig. 3.2. When using ISM with a tolerance setting of 0.8% and optimisation disabled, the patterns were found by the computer to be 100% identical, whereas a tolerance setting of 1% was required using ERS to give 100% similarity.
E. Evaluation of the computer-assisted phases
When large numbers of fingerprints from many different gels are to be compared, it should be possible to evaluate the accuracy of all computer-assisted steps. This can only be done by including one or two extra standard strains on each gel. For the work within the framework of the CAonTB, over 50 laboratories worldwide use M. tuberculosis strain Mt14323 as a standard strain, and in some laboratories this strain is also used as an ERS. The band positions of strain Mt14323 are compared to a reference database comprising only Mt14323 strains. The accuracy settings, or "position tolerance", required to obtain 100% matching of the patterns of the Mt14323 strain can be determined for a given set of autoradiograms, and serve as a measure of the accuracy of the analysis of the fingerprints from the clinical isolates (Fig. 3.14). It is important to emphasise that each of the different steps performed by the computer has its own particular tolerance, and will thus introduce new errors that have to be compensated for.
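As an illustration of how such a position tolerance might be determined, the sketch below computes the smallest tolerance at which every processed profile of the standard strain matches the reference band for band. It assumes that positions are expressed as a percentage of the normalised lane length and that each profile has the same number of bands as the reference, paired in order. The function name and all numbers are hypothetical; this is not the procedure implemented in GelCompar or BioNumerics.

```python
def required_tolerance(reference, profiles):
    """Smallest position tolerance giving 100% matching with the reference.

    reference: band positions of the standard strain (percent of lane length).
    profiles:  list of band-position lists measured for the same strain.
    """
    worst = 0.0
    for profile in profiles:
        if len(profile) != len(reference):
            raise ValueError("profile and reference must have the same number of bands")
        for ref_pos, obs_pos in zip(reference, profile):
            # Keep track of the largest deviation seen for any band.
            worst = max(worst, abs(obs_pos - ref_pos))
    return worst


# Hypothetical reference and two processed profiles of the standard strain.
reference = [10.0, 25.0, 40.0, 62.5, 80.0]
profiles = [
    [10.2, 24.9, 40.5, 62.5, 79.6],
    [9.7, 25.3, 40.1, 62.0, 80.2],
]

print(required_tolerance(reference, profiles))  # largest deviation, in percent
```

A profile whose deviation is far larger than that of the others would be a candidate for rejection, in the spirit of the rejected results marked in Fig. 3.14.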
3.4 EXPERIENCES AND APPLICATIONS
Initially, in the early 1990s, the use of IS6110 RFLP analysis for outbreak management of multidrug-resistant M. tuberculosis was reported frequently (Beck-Sagué et al., 1992; Edlin et al., 1992; Greifinger et al., 1992; Coronado et al., 1993). In more recent years, population-based molecular epidemiological studies in San Francisco (Small et al., 1994), New York (Alland et al., 1994), Denmark (Yang et al., 1995), Amsterdam (Van Deutekom et al., 1997), Zürich (Pfyffer et al., 1998) and The Netherlands (Van Soolingen et al., 1999a) have provided insights into the risk factors for the transmission of tuberculosis. Recently, transmission of tuberculosis across country borders (Samper et al., 1997) and the spread of particular M. tuberculosis genotype families in different areas were recognised (Hermans et al., 1995; Kremer et al., 1999). The inter-laboratory database TBase, available via the Internet, with many thousands of IS6110 RFLP patterns of M. tuberculosis from over 30 laboratories worldwide, is a very powerful tool to support these studies. In order to ensure the comparability and reproducibility of a database with an ever-increasing number of DNA patterns, both the laboratory work and the computer-assisted analysis had to be standardised. In The Netherlands, the results of a comparison to a database of IS6110 RFLP DNA patterns are used to reveal transmissions of M. tuberculosis, outbreaks, laboratory cross-contaminations, etc. To support the surveillance of M. tuberculosis in The Netherlands, each week a set of new isolates is fingerprinted and, after a quality check based on ERS Mt14323, the new DNA patterns are compared to the database containing all DNA patterns from M. tuberculosis strains. Reports with relevant patient, laboratory and clinical data are created for the new DNA patterns that are identical, or have a high similarity, to patterns that are already in the database.
These reports are sent to public health organisations involved in the surveillance of tuberculosis in The Netherlands. Usually, both unique patterns, i.e., strains for which the IS6110 DNA patterns are not yet in the database (indicating a potential new clone), and DNA patterns that are identical to patterns already in the database, are added to the database after the checks are carried out with good results. Clusters of two or more strains get a unique cluster number.
Fig. 3.17. Result of clustering using the Pearson correlation coefficient with ISM. It is clear that even for patterns of the same strains (in this case, the ISM was in all lanes), a high tolerance has to be accepted. For this example, with a tolerance setting of 1%, the patterns cluster at about 75%. After adding the bands, this set of patterns clustered at 100% when the Jaccard or Dice coefficient was used with a position tolerance of 1% and optimisation disabled.
A. The cluster similarity matrix
A dendrogram is a representation of the two-dimensional cluster similarity matrix D. When the database grows, the dendrogram grows accordingly and tends to become too complex. In the case of high similarities between DNA patterns, there is usually no problem as the DNA patterns will group in one branch of the dendrogram. However, in the case of lower similarity, DNA patterns will group in different branches. It is often difficult or impossible to recognise the resemblance between DNA patterns in different branches directly from the dendrogram. Theoretically, the dendrogram is built up via a step by step elimination of the similarity matrix. In the computer software, each elimination is recorded as a transformation of the matrix. At the end, the random pattern of SCs in the original similarity matrix is changed. Due to the algorithm, clusters of identical patterns (SC=1) tend to concentrate around the main diagonal. If two or more clusters are related, i.e., have similar but not identical DNA patterns, the program reflects this by shading the matrix in a different colour (Fig. 3.17).
B. Band-based similarity vs. Pearson correlation: pros and cons
Despite the normalisation phase, unwanted laboratory-induced variability in the DNA patterns, caused by differences in gel staining, unequally distributed background noise, etc., will influence the similarity of the patterns. Even when patterns of identical strains are next to each other on the same gel, small differences will always influence the final clustering. When the fingerprinting technique results in banding patterns with a limited number of well-defined and distinguishable bands, and the theory behind the separation technique indicates that each band reflects a DNA fragment, band-based comparison is preferable. In the analysis phase, bands can be assigned to DNA fragments automatically or manually. Agreement between laboratories on such factors as whether bands are assigned to weakly visible fragments on a gel, and whether intense fragments are assigned more than one band position, is essential. Still, with the use of UPGMA and band-based comparison techniques, the loss of one band or the addition of an extra band does not have a dramatic effect. Thus, when applying the Dice SC in the case of a pattern with 10 bands being compared to a pattern where one band is missing, the Dice SC is SC = 2 × 9/(9 + 10) = 0.95. Even in a large database containing thousands of patterns, these two patterns will group in a similar branch of the dendrogram (Figs. 3.17 and 3.18).
Fig. 3.18. Result of clustering the same ISMs shown in Fig. 3.17 with the Dice similarity coefficient. The computer clusters all lanes at 100% similarity.
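The Dice value quoted above, and the corresponding Jaccard value, can be reproduced with a short sketch. It assumes that two bands match when their positions differ by no more than a given position tolerance and that bands are paired greedily in order of position. This matching rule, the function names and the band positions are illustrative assumptions, not the algorithm used by the analysis software.

```python
def count_matches(a, b, tolerance):
    """Count band pairs whose positions differ by at most `tolerance`."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # band in a has no partner in b
        else:
            j += 1  # band in b has no partner in a
    return matches


def dice(a, b, tolerance):
    """Dice coefficient: 2 * matches / (bands in a + bands in b)."""
    return 2 * count_matches(a, b, tolerance) / (len(a) + len(b))


def jaccard(a, b, tolerance):
    """Jaccard coefficient: matches / (all distinct bands in a and b)."""
    m = count_matches(a, b, tolerance)
    return m / (len(a) + len(b) - m)


# Hypothetical 10-band pattern and the same pattern with one band missing.
ten_bands = [5.0, 12.0, 20.0, 28.0, 35.0, 44.0, 52.0, 61.0, 70.0, 83.0]
nine_bands = [p for p in ten_bands if p != 44.0]

print(round(dice(ten_bands, nine_bands, 1.0), 2))     # 2 * 9 / 19
print(round(jaccard(ten_bands, nine_bands, 1.0), 2))  # 9 / 10
```

Both coefficients stay high when a single band is lost, which is why the two patterns remain close together in the dendrogram.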
C. Exchanging DNA patterns: the bundle concept
Once databases are built up in different laboratories, the easy availability and rapid exchangeability of DNA patterns and relevant other data fields is an important consideration for research activities such as inter-laboratory, epidemiological and population-based studies. In principle, if a fast typing technique is available, it is possible to set up "early warning" or "alert" systems, e.g., for surveillance of the transmission of specific phenotypes of an organism, such as multidrug-resistant (MDR) strains. The bundles concept, which is an open and well-documented standard developed by Applied Maths as part of the BioNumerics program, is a very flexible tool to support these goals. From a user-defined selection of information fields (often DNA patterns, but also other experimental types or data such as genus, species information, etc.) from a database, a bundle can be constructed. In the case of DNA patterns, the bundle contains information about the reference system and the molecular size settings. In this way it is very easy to remap or convert bundles to be compatible with databases based on different standards for the same organism. Bundles can be sent on different types of data carriers to other researchers, either as an attachment to an email or on a floppy disk or CD-ROM. The recipient can unpack the bundles in a very easy way (Applied Maths, 1999), add them to the relevant database, and then compare them with other DNA patterns (Fig. 3.19).
Fig. 3.19. The principles of a bundle. Typing results from different sources are grouped and can then be attached to an email or sent on a floppy disk or CD. The recipient laboratory can unpack or detach the bundle, add the results to the local database, and then apply a comparison or clustering.
D. A = B and B = C does not mean A = C
In the world of mathematics, the transitive property says that if A=B and B=C, then A=C. However, this property does not always hold true for DNA fingerprints. As can be seen in Fig. 3.20 (1) and (2), the sets of strains A+B and B+C cluster at 100%. However, strain A does not cluster 100% with strain C, as shown in Fig. 3.20 (3). Moreover, if a dendrogram of the three patterns is created, one pattern does not cluster with the other two. The reason for this is the way that UPGMA clustering takes place. First, pattern one and pattern two are clustered, and the result is one average pattern. This new average pattern does not cluster 100% with the third pattern. If the order in which the clustering is calculated is changed, e.g., if patterns B and C are clustered first, pattern A will not cluster at 100% with the cluster of B and C, as shown in Fig. 3.20 (4) and (5). This is a disadvantage of the UPGMA method that has to be accepted.
Fig. 3.20. The transitive property does not always apply to DNA fingerprints. In (1), strain a clusters 100% with strain b, and in (2), strain b clusters at 100% with strain c. However, in (3), strain c does not cluster with strain a. Further, if the ordering of the strains to be analysed is changed, the dendrogram may change, as shown in (4) and (5).
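One simple way to see why pairwise identity need not be transitive is to look at band matching with a position tolerance. The sketch below is an illustration under that assumption only; it does not reproduce the UPGMA averaging step described in the text, and the patterns and the function name are invented for the example.

```python
def bands_match(p, q, tolerance):
    """True if two patterns have the same number of bands and every
    pair of corresponding bands lies within the position tolerance."""
    if len(p) != len(q):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(sorted(p), sorted(q)))


TOLERANCE = 1.0  # position tolerance in percent of lane length (assumed)

# Hypothetical patterns: the middle band drifts a little from A to B to C.
pattern_a = [20.0, 50.0, 75.0]
pattern_b = [20.0, 50.8, 75.0]
pattern_c = [20.0, 51.6, 75.0]

print(bands_match(pattern_a, pattern_b, TOLERANCE))  # A and B are within tolerance
print(bands_match(pattern_b, pattern_c, TOLERANCE))  # B and C are within tolerance
print(bands_match(pattern_a, pattern_c, TOLERANCE))  # A and C are too far apart
```

Two small shifts that are each acceptable add up to a difference that is not, so A matches B and B matches C while A does not match C.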
3.5 SETTING UP INTRA- AND INTER-LABORATORY DATABASES OF ELECTROPHORETIC PROFILES: RECOMMENDATIONS
As a first stage, the laboratory work must be standardised with respect to the restriction enzyme, probe and sometimes also the size markers, and the fingerprinting technique must fit the goals with respect to the discriminative power. After this complicated process, a start can be made on analysing the laboratory results from different laboratories. This section considers various factors influencing the reproducibility of databases, as well as methods to compensate for these sources of error. It is very difficult to give a general "rule of thumb" for each type of experiment and organism. The following steps are suggested as preparations before setting up a live inter-laboratory database (it is assumed that one laboratory serves as the reference laboratory, responsible for the distribution of strains and fingerprints within the study, and also the collection and analysis of results from the different laboratories):
1. Inter-laboratory reproducibility study of both the laboratory and the computer analysis phases:
• Collect a representative set of strains;
• Select a reference strain that fits the requirements;
• Select an ISM that fits the requirements (if the laboratory technique allows the use of an ISM);
• Distribute an identical set of strains to different laboratories for fingerprinting;
• Collect and analyse the results of the laboratory work;
• Create an identical set of relevant fingerprints for each laboratory;
• Each laboratory processes the strains and the sets of fingerprints;
• The results of the wet laboratory work (in most cases an autoradiogram) are returned to the reference laboratory;
• The results of the analysis of the sets of fingerprints can be returned as bundles, as an attachment to an email, or on another data carrier;
• Both sets of results are analysed at the reference laboratory.
In this way it is possible to get a good estimate of the errors introduced during the wet laboratory work and the computer analysis separately for each laboratory. At this point it is possible to tune the processes as accurately as possible. This is a necessary step to ensure the reproducibility of the database. Then:
2. If it is desired to use different ISM and/or ERS, one of the ISM or ERS should be selected as the "gold standard" for the project:
• Create a database with the band reference positions of the different ISM and ERS;
• Supply each laboratory with the settings.
3. In many cases, additional information on the isolates, such as epidemiological, clinical, laboratory and patient data, is necessary. Before the start of the project, it is very important to have a commonly accepted description of the different database fields. Within the CAonTB, the conversion of these fields required a disproportionate amount of effort. Internationally accepted coding systems, such as ISO codes for countries, national codes for city of origin, ICD-10 codes for disease symptoms and codes for susceptibility and resistance levels, together with agreed formats for date fields (date of birth, date of isolation: yy-mm-dd or dd-mm-yy?) and gender (male, female, unknown or M, F, ?), are all basic things that can help to minimise the work.
4. Database management should be organised. A central approach, in which one laboratory is responsible for the final submission of DNA patterns and additional data to the general database, has proved to be a necessary condition. Apart from the DNA patterns, it is also necessary to exchange the ERS data for validation purposes.
5. If band-based comparison and clustering are used, the methods used to analyse and cluster DNA patterns should be decided, i.e., either Dice or Jaccard. Long-lasting discussions of the pros and cons should not be entered into as the final result will be the same with respect to clustering and branching of the dendrogram.
Sometimes compromises are necessary. Databases with DNA patterns based on applying different ISM and/or ERS and other deviant settings may already exist locally before setting up an inter-laboratory database of DNA patterns. Under certain circumstances it is possible to convert or remap the DNA patterns from one standard to another (Applied Maths, 1999).
3.6 FINAL REMARKS
The guidelines and standards presented in this chapter are based on experience gained during a project lasting from 1993 until the present. Many laboratories and institutes world-wide have collaborated in this project. A database with over 10,000 M. tuberculosis DNA patterns is available for use by scientists, both from within and outside the project. It is impressive, and often surprising, to see how DNA patterns from newly isolated strains cluster exactly to strains already in the database. In many cases, this indicates a cross-border infection with a particular M. tuberculosis strain. Many examples that underline the power of the approach have been published. However, these guidelines and standards should not be interpreted as being universally applicable. Other organisms, different typing techniques, different laboratory circumstances, and many other factors, will determine the reproducibility of databases of DNA patterns. An inter-laboratory study as described in this chapter is therefore a necessary requirement. There is no general prescription on how to calculate or predict the different errors, misalignments, etc. As an example of the possibilities of an international database of DNA patterns, TBase, which is frequently updated, is available via the internet. The URL is: http://www.caontb.rivm.nl. Further questions regarding this article or the setting up of inter-laboratory studies can be sent to
[email protected]. Information on BioNumerics, GelCompar 2, and the bundle concept can be found at the homepage of Applied Maths: http://www.applied-maths.com.
REFERENCES
Alland, D., Kalkut, G.E., Moss, A.R., McAdam, R.A., Hahn, J.A., Bosworth, W., Drucker, E. & Bloom, B.R. (1994). Transmission of tuberculosis in New York City. An analysis by DNA fingerprinting and conventional epidemiologic methods. New England Journal of Medicine 330, 1710-1716.
Applied Maths (1999). BioNumerics manual. Applied Maths, Kortrijk, Belgium.
Armitage, P. & Berry, G. (1991). Statistical methods in medical research. Blackwell, Oxford.
Beck-Sagué, C., Dooley, S.W., Hutton, M.D., Otten, J., Breeden, A., Crawford, J.T., Pitchenik, A.E., Woodley, C., Cauthen, G. & Jarvis, W.R. (1992). Hospital outbreak of multidrug-resistant Mycobacterium tuberculosis infections. Factors in transmission to staff and HIV-infected patients. Journal of the American Medical Association 268, 1280-1286.
Coronado, V.G., Beck-Sagué, C.M., Hutton, M.D., Davis, B.J., Nicholas, P., Villareal, C., Woodley, C.L., Kilburn, J.O., Crawford, J.T., Frieden, T.R., Sinkowitz, R.L. & Jarvis, W.R. (1993). Transmission of multidrug resistant Mycobacterium tuberculosis among persons with human immunodeficiency virus infection in an urban hospital: epidemiologic and restriction fragment length polymorphism analysis. Journal of Infectious Diseases 168, 1052-1055.
Edlin, B.R., Tokars, J.I., Grieco, M.H., Crawford, J.T., Williams, J., Sordillo, E.M., Ong, K.R., Kilburn, J.O., Dooley, S.W., Castro, K.G., Jarvis, W.R. & Holmberg, S.D. (1992). An outbreak of multidrug-resistant tuberculosis among hospitalized patients with the acquired immunodeficiency syndrome. New England Journal of Medicine 326, 1514-1521.
Everitt, B. (1993). Cluster analysis. Arnold, London.
Friedman, C.R., Stoeckle, M.Y., Johnson, W.D. & Riley, L.W. (1995). Double repetitive element PCR method for subtyping Mycobacterium tuberculosis clinical isolates. Journal of Clinical Microbiology 33, 1064-1069.
Greifinger, R., Grabau, J., Quinlan, A., Loeder, A., DiFerdinando, G. & Morse, D.L. (1992). Transmission of multidrug-resistant tuberculosis among immunocompromised persons in a correctional system - New York, 1991. Morbidity and Mortality Weekly Report 41, 507-509.
Haas, W.H., Butler, W.R., Woodley, C.L. & Crawford, J.T. (1993). Mixed-linker polymerase chain reaction: a new method for rapid fingerprinting of isolates of the Mycobacterium tuberculosis complex. Journal of Clinical Microbiology 31, 1293-1298.
Heersma, H.F., Kremer, K. & van Embden, J.D.A. (1998). Computer analysis of IS6110 RFLP patterns of Mycobacterium tuberculosis. In Methods in molecular microbiology, vol. 101, mycobacteria protocols, Parish, T. & Stoker, N.G., eds, pp. 395-422. Humana Press, Totowa, NJ.
Hermans, P.W.M., Massadi, F., Guebrexabher, H., van Soolingen, D., de Haas, P.E.W., Heersma, H., de Neeling, H., Ayoub, A., Portaels, F., Frommel, D., Zribi, M. & van Embden, J.D.A. (1995). Analysis of the population structure of Mycobacterium tuberculosis in Ethiopia, Tunisia, and the Netherlands: usefulness of DNA typing for global tuberculosis epidemiology. Journal of Infectious Diseases 171, 1504-1513.
Kamerbeek, J., Schouls, L., Kolk, A., van Agterveld, M., van Soolingen, D., Kuijper, S., Bunschoten, A., Molhuizen, H., Shaw, R., Goyal, M. & van Embden, J. (1997). Rapid detection and simultaneous strain differentiation of Mycobacterium tuberculosis for diagnosis and tuberculosis control. Journal of Clinical Microbiology 35, 907-914.
Kremer, K., van Soolingen, D., Frothingham, R., Haas, W.H., Hermans, P.W.M., Martin, C., Palittapongarnpim, P., Plikaytis, B.B., Riley, L.W., Yakrus, M.A., Musser, J.M. & van Embden, J.D.A. (1999). Comparison of methods based on different molecular epidemiological markers for typing of Mycobacterium tuberculosis complex strains: interlaboratory study of discriminatory power and reproducibility. Journal of Clinical Microbiology 37, 2607-2618.
Norusis, M.J. (1993). SPSS for Windows professional statistics. SPSS, Chicago.
Otal, I., Samper, S., Asensio, M.P., Victoria, M.A., Rubio, M.C., Gómez-Lus, R. & Martin, C. (1997). Use of PCR method based on IS6110 polymorphism for typing Mycobacterium tuberculosis strains from BACTEC cultures. Journal of Clinical Microbiology 35, 273-277.
Pfyffer, G.E., Strässle, A., Rose, N., Wirth, R., Brändli, O. & Shang, H. (1998). Tuberculosis in the Metropolitan Area of Zurich; a 3-year survey based on DNA fingerprinting. European Respiratory Journal 11, 804-808.
Plikaytis, B.B., Crawford, J.T., Woodley, C.L., Butler, W.R., Eisenach, K.D., Cave, M.D. & Shinnick, T.M. (1993). Rapid, amplification-based fingerprinting of Mycobacterium tuberculosis. Journal of General Microbiology 139, 1537-1542.
Romesburg, H.C. (1990). Cluster analysis for researchers. Krieger, Malabar, FL.
Samper, S., Martin, C., Pinedo, A., Rivero, A., Blazquez, J., Baquero, F., van Soolingen, D. & van Embden, J.D.A. (1997). Transmission between HIV-infected patients of multidrug-resistant tuberculosis caused by Mycobacterium bovis. AIDS 11, 1237-1242.
Small, P.M., Hopewell, P.C., Singh, S.P., Paz, A., Parsonnet, J., Ruston, D.C., Schecter, G.F., Daley, C.L. & Schoolnik, G.K. (1994). The epidemiology of tuberculosis in San Francisco. New England Journal of Medicine 330, 1703-1709.
Van Deutekom, H., Gerritsen, J.J.J., Van Soolingen, D., Van Ameijden, E.J.C., Van Embden, J.D.A. & Coutinho, R.A. (1997). Molecular epidemiological approach to studying the transmission of tuberculosis in Amsterdam. Clinical Infectious Diseases 25, 1071-1077.
Van Embden, J.D.A., Crawford, J.T., Dale, J.W., Gicquel, B., Hermans, P., McAdam, R., Shinnick, T. & Small, P.M. (1993). Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. Journal of Clinical Microbiology 31, 406-409.
Van Soolingen, D., Hermans, P.W.M., de Haas, P.E.W. & van Embden, J.D.A. (1992). Insertion element IS1081-associated restriction fragment length polymorphism in Mycobacterium tuberculosis complex species: a reliable tool to recognize Mycobacterium bovis BCG. Journal of Clinical Microbiology 30, 1772-1777.
Van Soolingen, D., de Haas, P.E.W., Hermans, P.W.M., Groenen, P. & van Embden, J.D.A. (1993). Comparison of various repetitive DNA elements as genetic markers for strain differentiation and epidemiology of Mycobacterium tuberculosis. Journal of Clinical Microbiology 31, 1987-1995.
Van Soolingen, D., de Haas, P.E.W., Haagsma, J., Eger, T., Hermans, P.W.M., Ritacco, V., Alito, A. & Van Embden, J.D.A. (1994a). Use of various genetic markers in differentiation of Mycobacterium bovis strains from animals and humans and for studying epidemiology of bovine tuberculosis. Journal of Clinical Microbiology 32, 2425-2433.
Van Soolingen, D., de Haas, P.E.W., Hermans, P.W.M. & van Embden, J.D.A. (1994b). DNA fingerprinting of Mycobacterium tuberculosis. Methods in Enzymology 235, 196-205.
Van Soolingen, D., Borgdorff, M.W., de Haas, P.E.W., Sebek, M.M.G.G., Veen, J., Dessens, M., Kremer, K. & van Embden, J.D.A. (1999a). Molecular epidemiology of tuberculosis in The Netherlands: a nationwide study from 1993 through 1997. Journal of Infectious Diseases 180, 726-736.
Van Soolingen, D., de Haas, P.E.W. & Kremer, K. (1999b). RFLP typing of mycobacteria. RIVM, Bilthoven, The Netherlands.
Wiid, I.J.E, Werely, C., Beyers, N., Donald, P. & van Helden, P.D. (1994). Oligonucleotide (GTG)5 as a marker for Mycobacterium tuberculosis strain identification. Journal of Clinical Microbiology 32, 1318-1321.
Yang, Z.H., de Haas, P.E.W., Wachman, C.H., van Soolingen, D., van Embden, J.D.A. & Andersen, Å.B. (1995).
Molecular epidemiology of tuberculosis in Denmark in 1992. Journal of Clinical Microbiology 33, 2077-2081.
4
Fingerprinting of Microorganisms by Protein and Lipopolysaccharide SDS-PAGE

Lenie Dijkshoorn
Department of Infectious Diseases, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
CONTENTS

4.1 INTRODUCTION
4.2 PROTEINS AND LIPOPOLYSACCHARIDES AS BACTERIAL CELL COMPONENTS
    A. The bacterial cell
    B. Proteins in the bacterial cell
    C. Lipopolysaccharides in the Gram-negative cell envelope
4.3 PROTEIN SDS-PAGE - AN OVERVIEW
    A. Introduction
    B. Polyacrylamide gel electrophoresis (PAGE)
    C. PAGE in the presence of SDS
    D. Staining
    E. Computer-assisted protein profile analysis
4.4 PROTEIN SDS-PAGE IN PRACTICE
    A. Fractions for investigation
    B. Sample preparation
        (i) Preparation of whole-cell lysates
        (ii) Preparation of cell envelopes and outer membrane fractions
    C. Electrophoresis
        (i) Reagents and equipment
        (ii) Preparation of gels
        (iii) Sample application and electrophoretic separation
        (iv) Running conditions
    D. Fixation and staining
    E. Storage of gels
    F. Computer-assisted profile analysis
        (i) Sample and reference distribution
        (ii) Computer-assisted profile analysis
    G. Quality control and assessment
        (i) Determining the influence of culture conditions on the profiles of individual strains
        (ii) Testing the intra-strain stability of protein profiles
        (iii) Standardisation of protein electrophoresis and its analysis
        (iv) Control of the procedure
        (v) Inter-laboratory variation
    H. Applications of protein electrophoresis of microorganisms
        (i) Taxonomic applications
        (ii) Epidemiological typing
4.5 LIPOPOLYSACCHARIDE ANALYSIS
    A. Bacterial fractions for analysis
    B. Preparation of whole-cell lysates or cell envelopes followed by proteinase K digestion
    C. Electrophoretic separation
    D. Silver-staining
    E. Applications
4.6 CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES

4.1 INTRODUCTION
Characterisation of microorganisms on the basis of their protein electrophoretic profiles and, to a lesser extent, their lipopolysaccharide (LPS) electrophoretic profiles, has been performed in numerous studies since the 1970s. Initially, the primary aim of most studies was to investigate the chemical composition of cells or cell fragments, but it was soon realised that the profiles obtained could also be used to compare large numbers of strains for their relatedness. The two approaches used most commonly for comparative protein analysis of many strains are sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) and multilocus enzyme electrophoresis (MLEE). In particular, one-dimensional protein SDS-PAGE, once set up and standardised, is a robust method that allows the construction of large databases of strains from numerous bacterial species. The same technology has also been found useful for typing microorganisms on the basis of their LPS composition.

Since the mid-1980s, nucleic acid-based technologies have advanced rapidly, and a variety of DNA fingerprinting methods have now largely superseded protein profiling for typing bacterial strains. Even MLEE, which was for a long time the gold-standard method in population biology, is currently being replaced by a DNA sequence-based method, multilocus sequence typing (MLST; see Chapter 12). However, one-dimensional protein SDS-PAGE remains a powerful method for structural studies and for typing and classification of microorganisms. Similarly, LPS analysis is still an important method when screening organisms for their LPS phenotype, and is also still used to type organisms. The focus of the present chapter is on the comparative analysis of microorganisms on the basis of their protein and LPS profiles as observed by SDS-PAGE. Further information on MLEE can be found in Chapter 12.

4.2 PROTEINS AND LIPOPOLYSACCHARIDES AS BACTERIAL CELL COMPONENTS
A. The bacterial cell

A general outline of bacterial cell composition can be found in major textbooks on microbiology (e.g., Brock et al., 1994). A characteristic feature of the bacterial cell that distinguishes it from eukaryotic cells is the complex of layers surrounding the cytoplasm. This complex of layers, denoted the bacterial cell envelope, differs significantly between Gram-negative and Gram-positive bacteria. In Gram-negative bacteria, the basic structure comprises the cell (cytoplasmic) membrane, the periplasmic space, which also includes the peptidoglycan layer, the outer membrane and, depending on the strain or species, capsule, slime and appendages such as flagella or pili. The Gram-positive cell envelope contains a cell membrane surrounded by a complex matrix, including a dense three-dimensional peptidoglycan network interwoven with long, flexible chains of teichoic or teichuronic acids; there is no outer membrane, but capsule or slime may be present. The precise locations of many proteins in the Gram-positive cell wall are unknown; some proteins may be embedded in the matrix, whereas others, including protein A of Staphylococcus aureus, are probably located at the periphery. An extensive overview of the cell surface of bacteria in general can be found in Hammond et al. (1984), while the architecture and function of the outer membrane of Escherichia coli and other Gram-negative bacteria have been reviewed by Lugtenberg & Van Alphen (1983).

B. Proteins in the bacterial cell
Proteins, being a major constituent of any living cell, occur both in the bacterial cytoplasm and in the cell envelope, and serve many functions, e.g., as catalysts, as structural components, as vehicles in transport, as organs of movement (flagella), or as protective agents. Analytical studies using polyacrylamide gel electrophoresis (PAGE), with and without denaturing agents, have contributed considerably to the knowledge of certain proteins in the outer membrane of Gram-negative organisms and their biological function (see e.g., Lugtenberg, 1981; Lugtenberg & Van Alphen, 1983; Koebnik et al., 2000). In comparative protein analysis for epidemiological or taxonomic purposes, microorganisms are grouped according to the protein profiles displayed by SDS-PAGE, usually without any knowledge of the functions of the different proteins composing the profiles.

C. Lipopolysaccharides in the Gram-negative cell envelope
LPS is a major constituent of the external layer of the outer membrane in Gram-negative bacteria, with numerous biological functions (Rietschel & Brade, 1992; Holst et al., 1996; Jacques, 1996). LPS is composed of a hydrophilic heteropolysaccharide portion and a covalently bound lipid component. The latter, named lipid A, anchors LPS in the bacterial outer membrane. In many Gram-negative bacteria, the polysaccharide part comprises a 'core fragment' and a more distally located O-specific polysaccharide (O-antigen, O-chain). These bacteria have smooth (S) colonies, and hence this LPS structure is termed S-form LPS. Mutant bacteria that cannot synthesise the O-antigen grow with rough (R) colonies, and their LPS is termed R-form LPS. LPS without O-antigen is also found in bacteria of the genera Neisseria, Haemophilus, Bordetella and Branhamella (Preston et al., 1996). It has long been the practice to subdivide the core according to its carbohydrate composition into the so-called inner core and outer core, which are connected to the lipid A moiety and to the O-polysaccharide moiety, respectively. However, subdivision of the core on the basis of carbohydrate differences does not hold for all Gram-negative bacteria. Overall, the composition of the core is relatively conserved among enterobacterial species. The O-antigen is a carbohydrate polymer comprising up to several tens of oligosaccharides, termed 'repeating units'. The composition of sugars in the O-antigen shows great variability among bacterial genera and species, which accounts for its usefulness as a taxonomic and epidemiological marker. Exploitation of this variability by the use of antisera specific for the different LPS structures was initiated in the early 1900s, i.e., long before the chemical structure of LPS was elucidated, and has led to comprehensive serotyping schemes for Salmonella spp. and E. coli.

To date, a repertoire of techniques is available for the precise chemical characterisation of LPS, including NMR spectroscopy, gas liquid chromatography (GLC), GLC-mass spectrometry, and serology (Vinogradov et al., 1997; Pantophlet et al., 1999a, b). Rapid screening of Gram-negative bacteria for their LPS phenotype can be done by SDS-PAGE of cell extracts treated with proteinase K, combined with silver-staining of the gels. By this approach, O-polysaccharide chains in S-form LPS are generally displayed as ladder-like banding patterns. The successive steps of the ladder represent molecules that differ by one repeating unit. R-form LPS is smaller than S-form LPS, and migrates relatively fast through the gel; it can be observed following silver-staining as one or more dark spots in the lower part of the gels. The S-form and R-form LPS electrophoretic silver-stained profiles, and immunoblots of these profiles from many Gram-negative organisms, have been explored to determine any possible association of these LPS molecules with virulence. These characters have also been used to compare strains for taxonomic or epidemiological purposes (Aucken & Pitt, 1993; Pantophlet et al., 1999a, b).
4.3 PROTEIN SDS-PAGE - AN OVERVIEW

A. Introduction
Molecules that contain ionisable groups, including proteins, dissociate in solution to form positively and negatively charged species that migrate in an electric field to the cathode (-) or anode (+), respectively. The velocity at which charged molecules migrate in an electric field is determined by three properties of the molecules, namely charge, size and shape. These properties are exploited to separate mixtures of various ionisable components into different fractions during electrophoresis. A pioneer in the field was the Swedish scientist Tiselius who, in the 1930s, separated mixtures of proteins by free electrophoresis. In this approach, proteins move together in free solution without a supporting medium. Under the conditions used, the charge properties of the proteins did not vary considerably, and similar molecules tended to move close together, thus forming bands with boundaries between molecules differing in electrophoretic mobility m (where m is defined as the distance d travelled in time t by a particle under the influence of the potential gradient E; i.e., m = d/(tE)). This approach has been replaced over the course of time by zone electrophoresis, in which electrophoretic separation is achieved on a supporting medium such as paper, agarose, cellulose acetate or polyacrylamide, and the process of separation is continued until the components are separated into discrete zones.
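To make the definition of electrophoretic mobility concrete, the following minimal Python sketch evaluates m = d/(tE) for a single band. The function name and the example values (distance, run time and field strength) are illustrative assumptions, not measurements taken from the studies cited here.

```python
def electrophoretic_mobility(distance_cm, time_s, field_v_per_cm):
    """Return the mobility m = d / (t * E) in cm^2 V^-1 s^-1.

    distance_cm     distance d travelled by the band
    time_s          duration t of the run
    field_v_per_cm  potential gradient E
    """
    if time_s <= 0 or field_v_per_cm <= 0:
        raise ValueError("time and field strength must be positive")
    return distance_cm / (time_s * field_v_per_cm)


# Hypothetical example: a band travelling 5 cm in 1 h at 10 V/cm.
m = electrophoretic_mobility(distance_cm=5.0, time_s=3600.0, field_v_per_cm=10.0)
print(f"mobility = {m:.3e} cm^2/(V s)")
```

Because m is normalised for both run time and field strength, it allows bands from runs performed under different conditions to be compared, at least in principle.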
B. Polyacrylamide gel electrophoresis (PAGE)
Detailed discussions of the principles and practical performance of PAGE can be found in Gordon (1975) and Andrews (1986). Polyacrylamide is the supporting medium of choice in comparative electrophoretic typing of microorganisms on the basis of protein and LPS content. Gels of polyacrylamide are, like those of agarose and starch, composed of intertwined molecular chains, creating a sieve-like structure in which the movement of large molecules is increasingly hindered as the pore size of the gel decreases. The gels are formed by polymerisation of acrylamide with the cross-linking agent N,N'-methylenebisacrylamide (bisacrylamide, Bis) following addition of ammonium persulphate as an initiator and N,N,N',N'-tetramethylethylenediamine (TEMED) as a catalyst. The polymerisation results in random chains of polyacrylamide that incorporate a small proportion of Bis molecules, which in turn react with groups in other chains. As a result, a three-dimensional structure is formed. The composition of polyacrylamide gels is generally expressed in terms of T (the total acrylamide and Bis monomer concentration in g/100 ml) and C (the percentage of cross-linker in the total monomer, i.e., acrylamide plus Bis). Manageable gels can be prepared if T values range between 3% and 30%, while C values within this range should conform to C = 6.5 - 0.3T in order to maintain gel elasticity and prevent shrinking or swelling during the polymerisation process. By varying T, the sieving size is altered and the gel can be designed to give optimum performance for separating the proteins being studied.

Table 4.1. Composition of buffers and gels (Laemmli, 1970; Lugtenberg et al., 1975)

Buffer/gel: Composition
Upper and lower reservoir buffer: 0.025 M Tris, 0.192 M glycine, 0.1% SDS; pH 8.3
Stacking gel buffer: 0.125 M Tris adjusted to pH 6.8 with HCl, 0.1% SDS
Separation gel buffer: 0.375 M Tris adjusted to pH 8.8 with HCl, 0.1% SDS
Stacking gel (Laemmli): T=3.08%, C=2.6%; 3 g acrylamide; 0.080 g Bis; buffer to 100 ml; 0.025 ml TEMED; 0.025 g ammonium persulphate; pH 6.8
Stacking gel (Lugtenberg): T=3.08%, C=2.6%; 3 g acrylamide; 0.080 g Bis; 0.1% SDS w/v; 0.025 g ammonium persulphate; buffer to 100 ml; 0.1 ml TEMED; pH 6.8
Separation gel (Laemmli): T=3.08%, C=2.6%; 3 g acrylamide; 0.080 g Bis; buffer to 100 ml; 0.025 ml TEMED; 0.025 g ammonium persulphate; pH 6.8
Separation gel (Lugtenberg): T=3.08%, C=2.6%; 3 g acrylamide; 0.080 g Bis; 0.1% SDS w/v; 0.025 g ammonium persulphate; buffer to 100 ml; 0.1 ml TEMED; pH 6.8

PAGE is usually applied as disc electrophoresis, in which different buffers are used in the different parts of the gel column (the term 'disc' as used here is an abbreviation of discontinuous, and refers to the different buffer systems, although some books use the term to denote acrylamide electrophoresis in glass tubes, which results in disk-shaped zones of proteins). In disc electrophoresis, the sample mixture is subjected to electrophoresis in a gel system composed of two sections, i.e., the concentration, stacking or spacer gel, and the separation gel. The stacking gel has a high porosity and, in the first step of electrophoresis, sample components are concentrated in a very thin layer before migration into the separation gel. This results from the effects of an isotachophoresis system created by the buffer in the stacking gel. The ion components of the stacking and separation gels are the same, but the pH of the buffers, usually containing Tris-HCl, differs. Thus, the frequently used system of Laemmli (1970) contains a 0.125 M Tris pH 6.8 stacking gel buffer, and a 0.375 M Tris pH 8.8 separation gel buffer (Table 4.1). Proteins migrate through the stacking gel at a rate approaching that in free solution, and the components are stacked into a very thin starting zone before they enter the separation gel. Next, in the separation gel, resolution of molecules is achieved by the difference in electrophoretic mobility within the constraints of molecular sieving. A detailed explanation of this process is given in Williams & Wilson (1981).

Initially, polyacrylamide gels were prepared in glass tubes, and one protein sample could be separated per tube. These systems were replaced rapidly by slab gels, in which multiple samples can be investigated in adjoining lanes.

C. PAGE in the presence of SDS
With SDS-PAGE, as used for the comparative protein analysis of microbial strains (see section 4.4), the extracts are first solubilised in a buffer with the strongly denaturing agent sodium dodecyl sulphate (SDS) and the reducing agent β-mercaptoethanol or dithiothreitol. Depending on the purpose, urea can also be included in the buffer as a denaturing agent. SDS and, where applicable, urea are also added to the electrophoretic chamber buffer and to the stacking and separation buffers. By the action of the denaturing and reducing agents, disulphide linkages are broken and the protein dissociates into subunits (if present), while the polypeptide chains become completely unfolded to form a rod-like SDS-polypeptide complex. In aqueous medium, the hydrocarbon chains of SDS are associated with the polypeptide chain, while the exposed sulphate groups are in the dissociated negative state. These SDS-polypeptide complexes have a constant ratio of SDS to protein. Due to unfolding and the high concentration of SDS (c. 1.4 g/g protein), the original shape and charge of the protein no longer play a determining role in electrophoresis. Hence, differences in the migration of proteins in SDS-PAGE are based on differences in mass of the SDS-protein particle, related to the molecular sieving effect of the separation gel. It has been shown that, provided the pore sizes are small enough, the observed mobility is very nearly linear in the log10 of the molecular size of a protein. Thus, molecular sizes of unknown proteins can be estimated by SDS-PAGE if a specimen with marker proteins of known molecular size is included in one of the lanes of the gel. However, results must be interpreted cautiously since some amino acid substitutions may severely affect the migration rate of a protein (Lugtenberg & Van Alphen, 1983).

D. Staining
Once electrophoresis has been performed, the protein bands in the gel can be fixed and stained. For this purpose, the gel is immersed in fixation/staining solution. Excess stain is removed by immersing the gel in a destaining solution and, as a result, the protein bands become visible. Coomassie Blue R250 is one of the most common dyes used in comparative protein electrophoresis of microbial specimens, particularly for whole-cell preparations (see section 4.4.D). Fast Green FCF has been used in a number of studies of Gram-negative cell envelopes. Silver-staining has a higher sensitivity than Coomassie and Fast Green staining, and can therefore be used to detect minor protein constituents. Kits with reagents and automatic staining systems have been developed to reduce the workload of this method.
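Once the bands are visible, their migration distances can be measured and used for the size estimation described in section 4.3.C. The Python sketch below fits a straight line of log10(molecular size) against relative mobility for the marker lane and reads off the size of an unknown band. The marker sizes and mobilities are hypothetical illustration values, the function names are not taken from any software package, and the linear relation is assumed to hold only within the range covered by the markers.

```python
import math


def fit_calibration(markers):
    """Least-squares fit of log10(size) = intercept + slope * rf.

    markers: list of (rf, size_kda) pairs, where rf is the migration
    distance of a marker band divided by that of the dye front.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(size) for _, size in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_size(rf, intercept, slope):
    """Estimate the molecular size (kDa) of a band from its relative mobility."""
    return 10 ** (intercept + slope * rf)


# Hypothetical marker lane: (relative mobility, size in kDa).
markers = [(0.15, 97.0), (0.30, 66.0), (0.48, 45.0), (0.66, 30.0), (0.85, 20.0)]
intercept, slope = fit_calibration(markers)
print(f"estimated size at rf 0.55: {estimate_size(0.55, intercept, slope):.1f} kDa")
```

Estimates obtained in this way remain subject to the caveat noted above: proteins that migrate anomalously will be assigned misleading sizes.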
E. Computer-assisted protein profile analysis
Protein profiles of microorganisms are relatively complex, with at least several tens of bands, which makes visual comparative analysis difficult. This was realised in the 1970s by Kersters & De Ley (1975), who consequently pioneered computer-assisted analysis of protein profiles. Their approach has been refined extensively over the course of time, and several different software packages (described elsewhere in this book) are now available for analysis of complex electrophoretic profiles, including DNA profiles. The basic principles and technical details of computer-assisted electrophoretic profile analysis are outlined in Chapters 2 and 3; some specific details for protein analysis are given in section 4.4.F.

4.4 PROTEIN SDS-PAGE IN PRACTICE
A. Fractions for investigation
The protein fractions used for comparative analysis of bacteria by SDS-PAGE belong to two major categories, i.e., whole-cell preparations, and cell envelopes or outer membrane preparations. Whole-cell preparations are the fractions used most commonly for protein profiling, both of Gram-positive and Gram-negative bacteria (Pot et al., 1994). The preparations can be obtained easily by boiling cells in a reaction mix which causes lysis of cells, as well as denaturation and solubilisation of cell proteins; however, additional treatment, such as lysozyme or ultrasonic treatment, may sometimes be required. Whole-cell protein electrophoretic profiles are characterised by a large number of protein bands. If Gram-negative bacteria are the subject of study, it can be advantageous to use cell envelopes or outer membrane preparations, since the profiles of these fractions comprise relatively few, densely stained bands and are easier to interpret by eye than whole-cell protein profiles. Methods for preparation of cell envelopes and outer membrane preparations have been reviewed by Lugtenberg & Van Alphen (1983). Cell envelopes can be obtained by ultrasonic disruption of cells combined with fractionated centrifugation, as described by Lugtenberg et al. (1975). Dijkshoorn et al. (1987a, b) used a slight modification of this procedure involving bench-top laboratory centrifugation rather than ultracentrifugation. Outer membrane fractions for comparative protein analysis are prepared most commonly by selective solubilisation of cell envelopes with detergents. However, the profiles of cell envelopes are, to a large extent, determined by the composition of the outer membrane proteins (Lugtenberg et al., 1975); hence, in comparative analysis, it may suffice to use cell envelopes rather than outer membranes.
B. Sample preparation
Sample preparation is preceded by cultivation of the bacteria being studied, but culture conditions may differ depending on the precise organisms being investigated. Once it has been established which medium and conditions will be used, care should be taken to adhere strictly to these conditions in order to obtain reproducible results.

Table 4.2. Sample buffer composition (Laemmli, 1970)

Tris-HCl: 0.0625 M, pH 6.8
SDS: 2% (w/v)
Glycerol: 10% (w/v)
Bromophenol blue: 0.001% (w/v)
β-Mercaptoethanol: 5% (v/v)
(i) Preparation of whole-cell lysates

The modified method described here has been found useful for Acinetobacter (Dijkshoorn et al., 1990) and E. coli (personal unpublished results). Bacterial stocks in glycerol broth at -80°C are subcultured on to blood agar and incubated overnight at the most suitable temperature for the organism being studied (e.g., 30°C for Acinetobacter or 37°C for E. coli). A subculture in 5 ml trypticase soya broth from four morphologically indistinguishable colonies is incubated overnight. From this culture, 20 drops are spread evenly on to a trypticase soya agar plate, which is incubated for 24 h. Next, cells are harvested by flooding the plate with several ml of phosphate buffered saline (PBS) and scraping off the cells. The cells are resuspended in PBS to a final volume of <30 ml, depending on the organism. The resulting bacterial suspension is centrifuged for 20 min at 10,000 g. The supernatant is discarded and the cells are first washed by resuspending them in <30 ml PBS, and then harvested by centrifuging for 10 min at 10,000 g. This washing and centrifugation step is repeated at least once. The supernatant is discarded after the final centrifugation step and quantities of 100 mg (wet weight) of cells are dissolved in 2 ml of sample buffer (Table 4.2). The suspensions are then boiled for 10 min, followed by centrifugation for 10 min at 7000 g and 4°C to sediment large cell fragments and cells that were not lysed. The supernatants are then ready for use, or can be stored at -20°C for short periods or at -80°C for prolonged periods. If stored, they should be heated again for 5 min at 100°C before electrophoresis.

Some modifications of the procedure may be required for organisms that produce large amounts of polysaccharides or that are difficult to lyse, which may be the case for Gram-positive bacteria. Sedimentation of polysaccharide-producing organisms requires centrifugation for longer periods or at higher speed. The polysaccharides may interfere with the electrophoretic separation of proteins, and then have to be removed by repeated cycles of washing and centrifugation of cells (see above), sometimes preceded by mechanical disruption using rotating blades (Ultraturrax; Jahnke & Kunkel, Staufen, Germany). For Gram-positive bacteria, additional treatment before solubilisation in buffer may be required, including ultrasonic disruption or use of lysozyme to digest the peptidoglycan layer. Ultrasonic disruption of washed cells (see above) is done while cooling the cells on ice in order to reduce enzymatic degradation of proteins. The duration and magnitude of the ultrasonic treatments should be optimised for the microorganism being studied. For lysozyme treatment, cells can be washed three times with PBS containing 0.05 M EDTA, and aliquots of 50 mg of washed cells in 0.9 ml buffer are then incubated with 0.2 mg lysozyme dissolved in 20 µl 0.064 M Tris-HCl, pH 7.
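The centrifugation steps above are specified as relative centrifugal force (g), whereas many bench-top centrifuges are set in revolutions per minute. A minimal Python sketch of the standard conversion, RCF = 1.118 x 10^-5 x r x N^2 (r in cm, N in rpm), is given below. The rotor radius in the example is an assumption and should be replaced by the value specified for the rotor actually used.

```python
import math

# Conversion constant for a radius in cm and a speed in rpm.
RCF_CONSTANT = 1.118e-5


def rpm_for_rcf(rcf_g, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rcf_for_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


# Hypothetical rotor with a maximum radius of 10 cm.
radius_cm = 10.0
for rcf in (10_000, 7_000):
    print(f"{rcf} g corresponds to about {rpm_for_rcf(rcf, radius_cm):.0f} rpm")
```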
(ii) Preparation of cell envelopes and outer membrane fractions

The following procedure is essentially based on the method described by Lugtenberg et al. (1975) for E. coli, and has been applied in several epidemiological and taxonomic studies with Acinetobacter spp. (e.g., Dijkshoorn et al., 1987a, b; 1989; 1990). An overnight broth culture (e.g., Nutrient Broth No.2, CM67; Oxoid, Basingstoke, U.K.), usually 40 ml, which has been incubated with vigorous shaking at 30°C or 37°C for Acinetobacter or E. coli, respectively, is centrifuged in order to concentrate the cells. All further steps are performed at 0-4°C. The pelleted cells are resuspended in 5 ml of 50 mM Tris-HCl pH 7.8, 2 mM EDTA pH 8.5 (for slime-producing organisms, it is recommended that the pelleted cells should first be washed several times with chilled buffer, followed by centrifugation to remove the slime). The 5-ml cell suspensions are then sonicated, with cooling, with up to six pulses of 20 s, with interruptions of 10 s between pulses, at an amplitude of 20 µm until the suspension turns from turbid to opaque. The sonicate is then centrifuged for 20 min at 2700 g to sediment unbroken cells and large membrane vesicles. The supernatant, containing cell debris and small membrane vesicles (i.e., the cell envelope-enriched fraction), is then centrifuged for 60 min at 12,300 g, after which the pellet containing the cell envelopes is resuspended in 100-200 µl of 2 mM Tris-HCl buffer, pH 7.8. This suspension is ready for use or can be stored at -20°C. Outer membrane fractions can be prepared from cell envelopes by selective solubilisation of cytoplasmic membranes with 0.5% (v/v) sodium lauryl sarcosinate (Sarkosyl) in the absence of magnesium ions (Filip et al., 1973).

It is important that the protein content of the preparations applied to the gel is standardised in order to allow comparison of the patterns. In addition, differences in protein concentration in adjoining lanes may lead to distortions in the electrophoretic patterns. Therefore, before solubilisation of the cell envelope proteins in sample treatment buffer, the protein content of each cell envelope suspension should be determined. This can be done by using a commercial micromethod, e.g., the Bradford method with bovine serum albumin for calibration (Bio-Rad, Hemel Hempstead, UK). Protein samples are then prepared in sample buffer (Table 4.2) to a final protein concentration of 0.5 mg/ml. The mixtures are heated for 5 min in a boiling water bath, during which time the proteins unfold and bind SDS. The preparations are then ready for use or can be stored at -20°C. Samples can be reused several times, but should be heated each time immediately before application to the gel to ensure complete unfolding of the polypeptides.

It has been noted that the profiles of some organisms may differ, depending on the duration of the heating and the precise temperature used, because of the presence of proteins that are 'heat modifiable' or resistant to denaturing agents (Hancock & Carey, 1979; Armstrong & Parker, 1986). Taking this into account, it is recommended that the optimal conditions for comparative analysis of protein profiles should first be established, and that the conditions chosen should then be strictly adhered to.
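The standardisation of protein content described above amounts to a simple dilution calculation. The following Python sketch computes how much cell envelope suspension and how much sample buffer to combine for a final protein concentration of 0.5 mg/ml. The measured stock concentration and the final volume in the example are hypothetical, and the sketch ignores the slight dilution of the sample buffer components by the added suspension.

```python
def dilution_volumes(stock_mg_per_ml, final_volume_ul, target_mg_per_ml=0.5):
    """Return (suspension_ul, buffer_ul) giving the target protein concentration.

    stock_mg_per_ml   protein content measured in the cell envelope suspension
    final_volume_ul   total volume of sample to prepare
    target_mg_per_ml  desired final concentration (0.5 mg/ml in the text)
    """
    if stock_mg_per_ml < target_mg_per_ml:
        raise ValueError("suspension is more dilute than the target concentration")
    suspension_ul = final_volume_ul * target_mg_per_ml / stock_mg_per_ml
    buffer_ul = final_volume_ul - suspension_ul
    return suspension_ul, buffer_ul


# Hypothetical example: suspension measured at 2.0 mg/ml, 100 µl of sample wanted.
suspension_ul, buffer_ul = dilution_volumes(stock_mg_per_ml=2.0, final_volume_ul=100.0)
print(f"mix {suspension_ul:.1f} µl suspension with {buffer_ul:.1f} µl sample buffer")
```

Raising an error for suspensions that are too dilute is a deliberate choice in this sketch: such samples cannot be brought to the target concentration by dilution and would need to be concentrated or prepared again.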
C. Electrophoresis
(i) Reagents and equipment

Acrylamide and Bis, as monomers, are strongly neurotoxic agents. Skin contact and inhalation should be avoided. Once in the polymerised state, polyacrylamide gel is harmless, and any old stock remaining should be polymerised before discarding. Polymerisation is an exothermic process, and polymerisation of old stocks should be performed in a fume cupboard while taking safety rules into account.
A discontinuous acrylamide gel system with a concentrating (stacking) gel and a separation gel is normally used for comparative protein fingerprinting. An overview of the composition of the buffers and gels used by Laemmli (1970) and by Lugtenberg et al. (1975) is shown in Table 4.1. The purity of the SDS used seems to be an important factor in the resolving capacity of the electrophoresis system. Lugtenberg et al. (1975) found that major outer membrane proteins could be separated with SDS contaminated with C10 and C14 derivatives, but not with more purified SDS unless longer gels were used. Many of the solutions required for electrophoresis, as well as ready-made gels, are now commercially available, but they can also be prepared locally and according to one's own purpose. In general, buffers can be prepared in large quantities and portions stored at -20°C for prolonged periods. It is recommended to store all solutions that are in daily use at 4°C. Concentrated stock solutions of acrylamide and Bis should not be kept for more than two weeks, since hydrolysis to acrylic acid occurs during storage. This leads to a decrease in pH which causes reduced migration of proteins. Ammonium persulphate is very hygroscopic, and stocks of powder should be stored at 4°C to prevent hydration. Small quantities of this chemical can be weighed and stored in small containers in a desiccator at 4°C. Directly before use, demineralised water is added to obtain the required final concentration. SDS solutions solidify when stored at 4°C and have to be heated gently to return them to a liquid state before use. Several different types of electrophoresis apparatus are available commercially.
Mini-systems with gels of approximately 8 × 7 cm (e.g., Mini Protean II; Bio-Rad) are quick and relatively easy to handle, but they are not generally suited for investigation of complex bacterial profiles, particularly when constructing libraries of these profiles, since they do not provide sufficient resolution. Larger systems, e.g., the Protean II system (Bio-Rad), with 16 × 16 cm slab gels are preferable, and are used widely for comparative protein and LPS typing of microorganisms.
(ii) Preparation of gels

Polyacrylamide gels are usually prepared immediately before use, although commercially prepared gels are currently available. When gels are prepared, prior degassing is required to remove oxygen which would otherwise hinder polymerisation. The procedure described here is based on use of the Protean II system, and adaptations may be required if other systems are used. After each use, the glass plates should always be cleaned carefully. Before use they should be inspected for traces of grease or dust and, if appropriate, cleaned with tissue paper moistened with 96% ethanol and/or demineralised water. The plates are assembled between clamps and the risk of leakage is reduced by placing paper tissues and a strip of Parafilm on the rubber strip beneath the mould. Once the mould has been assembled, the front plate is marked with a pencil to indicate a level that is 1 cm below the eventual bottom of the comb. The ingredients of the separation gel mixture (Table 4.1), except for SDS and TEMED, are mixed together. The mixture is degassed for 15 min to remove dissolved oxygen which inhibits polymerisation. Next, SDS solution and TEMED are added, mixed gently, and the mixture transferred immediately to the gel mould using a pipette, placed at one corner of the mould, up to the mark 1 cm below the comb position. A small volume (e.g., 200 µl) of isobutanol is then applied carefully to the surface of the gel, using a syringe with a long needle, to ensure that the gel sets with an even edge and that oxygen is excluded. The top side of the cassette is closed with Parafilm to prevent evaporation, and polymerisation is allowed to take place over a fixed, standardised period to reach a plateau. It is convenient to prepare the separation gel in the afternoon and to adhere to a fixed polymerisation time of (e.g.) 16 h. Polymerisation of the separation gel should be performed overnight to allow the reaction level to reach a defined plateau.
Polymerisation of the stacking gel is less critical, and periods of c. 1 h are acceptable. On the second day, the ingredients of the stacking gel are mixed, except for SDS and TEMED, and the mixture is degassed. The isobutanol layer on the separation gel is removed with a syringe and needle, and the surface is rinsed several times with distilled water. A small layer of water is left on the gel. SDS and TEMED are then added to the other stacking gel components and mixed gently. The remaining water is removed from the separation gel with a syringe, and the surface of the gel is rinsed once with the stacking gel reaction mixture. A Teflon comb is placed between the glass plates, tilted at a 15° angle. The stacking gel mixture is added to the mould and, when it contacts the teeth, the comb is pushed in gently, allowing air bubbles to escape. Polymerisation should again be for a fixed period, e.g., 1 h. During this period, buffer for the upper compartment can be prepared and samples can be boiled.
(iii) Sample application and electrophoretic separation

Before electrophoresis, the sites of the wells are marked on the glass plates with a pencil. The comb is removed carefully without damaging the wells. Application of several ml of electrophoresis buffer on top of the comb may help its smooth removal. The wells are rinsed carefully three times with a few ml of upper compartment electrophoresis buffer. The glass plates containing the gel are then fixed to the cooling core, and buffer is added to the upper reservoir to a level of a few mm above the top of the glass plates. This enables any leakage from the upper compartment to be detected. Protein extracts should be boiled for at least 5 min immediately before application to the wells of the gel. A 20-well comb is normal, and 30 µl of extract containing 15 µg of protein is added to each well, preferably with a Hamilton microsyringe with a long needle rather than with a micropipette. This reduces the risk of cross-contamination into neighbouring wells. Gel loading is normally performed in a fume cupboard because of the β-mercaptoethanol. Once the samples have been loaded, the cooling core with two gels, or one gel and a dummy plate, can be placed in the buffer tank. The upper reservoir is filled to a maximum with freshly prepared buffer. Buffer in the main tank can be used for at least one month, and can be topped-up with demineralised water during use. Air bubbles below the glass plates can be removed by vigorous stirring with a 10-ml pipette or glass rod. The cooling core is connected to the tap water system, and the buffer contained in the main tank is continually mixed with a magnetic stirring system.
(iv) Running conditions

For reasons of reproducibility, it is important that the running conditions (i.e., the temperature of the buffer, current and duration of the run) are as standardised as possible. Recommended electrophoresis conditions are 20 mA constant current for each gel (i.e., 40 mA for two gels run together), with electrophoresis continued until the bromophenol blue dye front is 1 cm above the bottom of the glass plate. Gels can also be run overnight at 6.0 mA/gel. It is advisable to monitor the current, voltage and temperature of the buffer at regular intervals. Deviations of voltage from the start of the run, or deviations in run times, may indicate that one of the components of the gel system, usually one of the buffers, has not been prepared correctly.
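Where run records are kept, the check for deviating voltages or run times can be made routine. The sketch below compares one run with the median of earlier runs; the record keys, the 10% tolerance and all logged values are hypothetical and would need to be set from a laboratory's own experience.

```python
from statistics import median


def flag_run(run, history, tolerance=0.10):
    """Return the quantities of `run` that deviate from the median of
    earlier runs by more than `tolerance` (a fraction of the median)."""
    flagged = []
    for key in ("start_voltage_v", "duration_h"):
        baseline = median(r[key] for r in history)
        if abs(run[key] - baseline) > tolerance * baseline:
            flagged.append(key)
    return flagged


# Hypothetical log entries for runs at constant current; values are illustrative.
history = [
    {"start_voltage_v": 100, "duration_h": 5.0},
    {"start_voltage_v": 104, "duration_h": 5.2},
    {"start_voltage_v": 98, "duration_h": 4.9},
]
today = {"start_voltage_v": 130, "duration_h": 5.1}
print(flag_run(today, history))
```

A flagged start voltage, as in this invented example, would prompt a check of the buffers before the gel is included in a library.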
D. Fixation and staining
After the electrophoresis run is complete, the gels are immediately soaked in a fixing solution to avoid diffusion of proteins, and are then immersed in a staining solution. The fixing agent can also be included in the dye mixture so that fixing and staining are achieved in one step. Once fixing and staining have taken place, excess stain is removed by immersing the gel in a destaining solution, after which the protein bands can be visualised. It is recommended that the whole procedure should be standardised, including preparation of solutions, volumes used, and the duration and temperatures of staining and destaining. For staining with Coomassie Blue, the gel is first immersed in 3% trichloroacetic acid for 15 min with shaking. The fixation solution is then discarded and fresh Coomassie Blue 0.15% (w/v) dissolved in 30:10 (v/v) methanol:acetic acid is added to the gel, which is stained for 1 h at room temperature with shaking. Excess stain is removed by replacing the staining solution with 25:10 (v/v) methanol:acetic acid and shaking. This step is repeated at least three times with fresh destaining solution. Staining with Fast Green FCF has been used in studies with E. coli and Acinetobacter spp. (Lugtenberg et al., 1976; Dijkshoorn et al., 1990). The procedure involves fixation and staining simultaneously for 24 h at room temperature with shaking in 0.1% (w/v) Fast Green FCF dissolved in 50:10 (v/v) methanol:acetic acid, followed by destaining in 45:10 (v/v) methanol:acetic acid. The final staining level of the gel can be standardised by choosing an acceptable level of stain (e.g., 0.0002% (w/v) Fast Green FCF in 45:10 (v/v) methanol:acetic acid), and equilibrating gels in this 'after-staining' solution before photographs or digital images are made. A sample of the 'after-staining' solution is kept as a control for a visual or photometric check during destaining. The dye in the 'after-staining' solution is not only important for standardisation of the staining level, but may also prevent certain bands losing dye during storage (Wilson, 1979).
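Since standardised preparation of the staining solutions is recommended, it may help to fix the weighed amounts in advance. The sketch below converts the % (w/v) values quoted above into dye masses, using the definition that 1% (w/v) is 1 g per 100 ml. The working volume of 500 ml and the function name are assumptions; the text does not specify a volume.

```python
def dye_mass_g(percent_w_v, volume_ml):
    """Grams of dye for a given % (w/v), where 1% (w/v) = 1 g per 100 ml."""
    return percent_w_v * volume_ml / 100.0


volume_ml = 500.0  # assumed working volume; not specified in the text
solutions = [
    ("Coomassie Blue stain", 0.15),
    ("Fast Green FCF stain", 0.1),
    ("Fast Green FCF 'after-staining' solution", 0.0002),
]
for name, percent in solutions:
    mass_mg = dye_mass_g(percent, volume_ml) * 1000
    print(f"{name}: {mass_mg:.1f} mg in {volume_ml:.0f} ml")
```

For 500 ml this corresponds to 750 mg, 500 mg and 1 mg of dye, respectively; the last amount is small enough that dilution from a more concentrated stock may be more practical than direct weighing.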
E. Storage of gels
Gels can be stored for weeks or months in 'after-staining' solution. During such periods, the methanol may evaporate, which will cause swelling of the gels. This process is reversible, and the gels may regain their original size and appearance when they are re-equilibrated in 'after-staining' solution. For long-term preservation it is more practical to dry the gels. There are several systems available for drying gels, including conventional systems which dry gels with heating and/or vacuum. A simpler and more convenient system is based on drying the gels in air with heating. For this purpose, gels are placed in a framework between sheets of cellophane and dried in a heated chamber. Once in the dry state, the gels can be stored easily, and subsequently reinspected or scanned at any time.
F. Computer-assisted profile analysis

(i) Sample and reference distribution
A prerequisite for successful pattern analysis is the preparation of high quality gels with a reference extract for normalisation in three or four lanes distributed equally over the gel. An example of a protein SDS-PAGE gel with 20 lanes is shown in Fig. 4.1. Proteins in the two outer lanes usually show a lower migration speed than proteins in other lanes, a phenomenon termed 'smiling'. These distortions may complicate pattern analysis, and therefore the outer lanes should not be used for test or reference extracts. The smiling effect can be reduced by running sample treatment buffer, of the same composition used for the other protein samples, in lanes 1 and 20 (see Fig. 4.1), or by loading a protein extract which is not part of the analysis. A reference extract for normalisation is included in lanes 3, 8, 13 and 18 of the gel. This reference extract is selected on the basis that its profile contains multiple clear and equally distributed protein bands in the same size range as those of the unknown organisms being studied. A large stock of this reference extract in sample treatment buffer (Table 4.2) is prepared; this can be stored at -80°C for prolonged periods, or at -20°C for shorter periods. One lane (lane 4) contains another preparation from the reference strain, but this extract is freshly prepared at the same time as the test extracts, and serves as a control for the preparation method. In addition, one lane usually contains a calibration standard with proteins of known molecular sizes solubilised in the same buffer as the extracts being investigated. Thus, nine of the 20 lanes of the gel are used for quality and standardisation purposes, and 11 lanes remain for test extracts. The limited number of lanes available for test extracts is compensated for by the fact that extracts on high quality gels which include multiple reference lanes can be used to construct libraries of profiles and do not have to be run again.

Fig. 4.1. Example of SDS-PAGE cell envelope profiles of Acinetobacter strains. Samples and reference extracts have been applied to the gel to conform with the requirements for computer-assisted gel analysis. Lanes 1 and 20, sample treatment buffer without protein (included on the gel in order to reduce smiling); R, reference extracts distributed equally for normalisation purposes; C, freshly prepared extract of the same strain used for normalisation (used as a control for the sample preparation procedure); X, sample extract with bands outside the reference area, and therefore not included in the analysis.
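A fixed lane allocation is easier to adhere to if it is written down once and reused for every gel. The following sketch records the lanes named above (buffer in lanes 1 and 20, reference extract in lanes 3, 8, 13 and 18, fresh control in lane 4) and assigns the remaining lanes to test extracts. The function and role names are illustrative, and the position of the calibration standard (lane 2 here) is an arbitrary assumption, since the text does not state it.

```python
def plan_lanes(n_lanes=20, blanks=(1, 20), references=(3, 8, 13, 18),
               controls=(4,), standards=(2,)):
    """Map each lane number to its role; unassigned lanes take test extracts."""
    roles = {}
    assignments = (
        ("buffer blank", blanks),
        ("reference", references),
        ("control", controls),
        ("size standard", standards),  # lane position assumed, not from the text
    )
    for role, lanes in assignments:
        for lane in lanes:
            if not 1 <= lane <= n_lanes:
                raise ValueError(f"lane {lane} is outside 1..{n_lanes}")
            if lane in roles:
                raise ValueError(f"lane {lane} assigned twice")
            roles[lane] = role
    return {lane: roles.get(lane, "test") for lane in range(1, n_lanes + 1)}


plan = plan_lanes()
for lane, role in plan.items():
    print(f"{lane:2d}  {role}")
```

Passing different tuples adapts the plan to other comb sizes or to additional quality lanes, while the checks guard against a lane being allocated twice.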
(ii) Computer-assisted profile analysis

Images can be captured for computer-assisted analysis by flat-bed scanning of photos, by laser densitometry of dried gels, or by charge coupled device (CCD) video cameras. As discussed in Chapter 3, the resolution of most CCD cameras is lower than that of scanners, but CCD technology is rapidly improving, and resolutions of up to 1360 × 1024 pixels are now achievable. However, this section focuses on image capture by photography, followed by digitisation and analysis of protein gels with GelCompar or BioNumerics software (Applied Maths, Kortrijk, Belgium). Moist or dried gels can be photographed with surface and transverse illumination by high quality laboratory cameras, or by Polaroid cameras. A red or yellow filter is used for gels stained with Fast Green FCF or Coomassie Blue, respectively. Black and white photographs (10 × 13 cm) are scanned with a flat-bed scanner (e.g., Hewlett Packard IICX), and images are stored in the TIFF format (see Chapter 3). The settings of the scanner should be standardised. Usually the file contains the maximum number of pixels that still allows it to be saved on a 3.5" floppy disk. Computer-assisted analysis with GelCompar or BioNumerics starts with conversion of the TIFF files to a file format that can be handled with these packages. Details of the procedure can be found in Chapter 3 and in the software manuals. For normalisation of protein patterns, an automated pattern recognition algorithm is used that aligns patterns by taking into account the whole contour of the curves rather than discrete peak positions (Vauterin et al., 1993). Background subtraction is done by the rolling disk mechanism, with settings made according to the quality of the gel. For cluster analysis of protein profiles, the Pearson product moment correlation coefficient is preferred as a measure of similarity (Kersters & De Ley, 1975).
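The principle of correlation-based grouping can be illustrated with a minimal sketch that computes Pearson correlations between densitometric traces and then groups them by average linkage (UPGMA). It is not the implementation used in GelCompar or BioNumerics, it omits normalisation and background subtraction, and the three short traces are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upgma(profiles):
    """Average-linkage clustering on Pearson similarities.
    Returns the merge history as (members_a, members_b, similarity)."""
    names = list(profiles)
    clusters = {i: (name,) for i, name in enumerate(names)}
    sim = {(i, j): pearson(profiles[names[i]], profiles[names[j]])
           for i in range(len(names)) for j in range(i + 1, len(names))}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two clusters with the highest similarity
        (a, b), s = max(sim.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        new_sim = {}
        for c in clusters:
            if c in (a, b):
                continue
            s_ac = sim[(min(a, c), max(a, c))]
            s_bc = sim[(min(b, c), max(b, c))]
            # Size-weighted mean = unweighted mean over all member pairs
            new_sim[(c, next_id)] = (na * s_ac + nb * s_bc) / (na + nb)
        sim = {k: v for k, v in sim.items() if a not in k and b not in k}
        sim.update(new_sim)
        merges.append((clusters[a], clusters[b], s))
        merged = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical traces (optical density along the lane); values are illustrative.
profiles = {
    "strain A": [1, 5, 2, 8, 3, 1],
    "strain B": [2, 10, 4, 16, 6, 2],  # same pattern as A at twice the intensity
    "strain C": [8, 1, 6, 2, 7, 3],
}
for left, right, s in upgma(profiles):
    print(f"{' + '.join(left)} | {' + '.join(right)} : {100 * s:.0f}%")
```

Strain B is strain A at twice the intensity, so the two traces correlate at 100% and are joined first, which shows why a correlation coefficient tolerates differences in protein loading.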
The Pearson coefficient is, in principle, independent of the relative concentrations of the patterns and is largely insensitive to differences in background (Vauterin & Vauterin, 1992). The Unweighted Pair Group Method using Arithmetic averages (UPGMA) is generally used to group protein profiles. Depending on the purpose of the study, proteins in specific size ranges can be used, while bands in size ranges that are not considered relevant, or that may even interfere, can be excluded from the analysis. It should be noted that computer-assisted analysis should always be accompanied by careful visual inspection of the profiles in order to check whether the grouping agrees with what can be seen by eye.

G. Quality control and assessment
It is sometimes stated that the computer-assisted normalisation process is a tool to compensate for variations between gels. However, the opposite is true since successful computer-assisted intra- and inter-gel analysis requires gels of a high quality that show minimal gel-to-gel variation. Thus, rigorous standardisation of the whole procedure is a prerequisite to computer-assisted data analysis. An overview of the variables that influence reproducibility is given in Table 4.3. Several aspects concerning the quality of protein electrophoresis, and its assessment, are discussed in the following sections.

Table 4.3. Variables that may influence the reproducibility of protein profiles and their analysis

Variable: Comments
Culture conditions: Test different media; use media that can be standardised as well as possible; determine growth conditions and growth phase that lead to reproducible results
Protein concentration of specimens: Standardise by determining the protein concentration or the wet weight of cells
Use of chemicals and solutions: Ammonium persulphate hydration may lead to too low a concentration and delayed polymerisation; limited shelf-life of acrylamide-Bis stocks because of acrylic acid formation; check pH and conductivity of buffers
Running conditions: Standardise current, cooling temperature and duration of run; check voltage, time and temperature of tank buffer
Staining, destaining and storage of gels: Some dyes do not dissolve well or do not form stable complexes with the proteins; see also Wilson (1979)
Photography, digitisation and computer analysis: Adhere to chosen settings
(i) Determining the influence of culture conditions on the profiles of individual strains
Proteins are gene expression products, and their presence or absence in electrophoretic profiles may be influenced by environmental conditions (Lugtenberg et al., 1976; Kawaji et al., 1979). Therefore, when the epidemiological or taxonomic relatedness of organisms is being established on the basis of their protein profiles, it is important to first assess whether the protein profiles of these organisms are stable under different conditions. Introductory experiments on a few representative strains should be performed to test the influence of growth medium, growth temperature and growth phase. Hence, proteins that differ in expression in a strain can be recognised and eventually ignored during comparative analysis (although they may be interesting from the physiological point of view).
(ii) Testing the intra-strain stability of protein profiles
Once the growth conditions for the organisms being studied have been established, several representative strains should be investigated on different occasions for the stability of their profiles over time. In epidemiology, insight into the stability of a marker can also be obtained by investigating multiple colonies from the original culture, and by comparing multiple isolates from different body sites and different sampling times from the same patient. For well-defined single-strain outbreaks, a series of isolates from different patients can be investigated. If the profiles within and between patients are indistinguishable, then the profiles may be useful markers in epidemiology. If the patterns vary within and between patients, this may be of interest in the study of virulence factors.
(iii) Standardisation of protein electrophoresis and its analysis

Apart from the influence of culture conditions and growth phase, numerous experimental factors may influence the reproducibility of protein electrophoretic profiles, including the sample preparation procedure, the chemicals and solutions used during electrophoresis, the electrophoretic separation process, the staining and destaining procedure, and the process of digitisation and computer analysis. With sample preparation, it is important that the protein concentration of specimens is standardised and that the samples applied to the gel do not contain too much protein. Large variations in the protein loading of samples in adjoining lanes lead to distortions in the migration patterns. Chemicals and the solutions used may also influence the results obtained, as was apparent from the finding of Lugtenberg et al. (1975) that use of SDS with contaminating C10 and C14 derivatives resulted in better separation of major outer membrane proteins than purified SDS. Some other reagents need specific attention, including ammonium persulphate, which should be kept dry to prevent hydration, and acrylamide, which has a limited shelf-life because of the formation of acrylic acid. The buffers of the electrophoresis system, particularly the separation gel buffer, have a great influence on the migration of proteins. Adjusting the pH of Tris solutions can be difficult, even with special 'Tris electrodes' (Costas et al., 1990). This may result in the addition of too large an amount of HCl. High ionic strengths in buffers give slower rates of migration. To overcome this, it is recommended that the volume of HCl required should be determined precisely, and that both the conductivity and the pH of the solution should be checked. The process of staining is also subject to many variations, as shown in the detailed study of Wilson (1979) who compared Amido Black, Coomassie Blue and Fast Green dyes.
Further trouble-shooting recommendations can be found in the reviews of Andrews (1986), Jackman (1987) and Costas et al. (1990), and also in the technical bulletins published by manufacturers (e.g., Bulletin 1156; Bio-Rad). In conclusion, once a defined batch of chemicals shows acceptable results, it is recommended to adhere to its use, with solution preparation methods also being controlled as far as possible. Finally, digitisation and computer analysis are important sources of variation, but it appears that inter-laboratory standardisation of these processes can only be achieved in large-scale collaborative projects.

(iv) Control of the procedure

It should now be apparent that the whole procedure, from growth of organisms and sample preparation to final analysis, should be performed under carefully controlled conditions if it is intended to compare large numbers of strains and to generate databases for identification and large-scale surveys. Therefore, standard operating procedures (SOPs) should be used, and the preparation of solutions should be formally recorded so that possible irregularities in results can be linked to deviations in (e.g.) buffer preparation or new batches of chemicals. To control the long-term reproducibility of the complete procedure, one representative strain should always be included as a control strain and processed in an identical way to the unknown samples being studied. The control strain should have a protein profile with a reasonable number of bands covering the molecular size range of fragments that can occur in the species being studied. Inclusion of this strain in each gel means that the reproducibility can be calculated following computer analysis by determining the clustering level of this control from different runs. If this strain is the same as that used to prepare reference extracts for normalisation of the protein profiles, it can be run on the gel in a lane next to the reference extract so that the pattern reproducibility can also be checked visually (see Fig. 4.1, lanes 3 and 4).
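One way to put a number on the reproducibility of the control strain is to correlate its traces from different gels with one another. The sketch below reports the lowest and the mean pairwise Pearson correlation as percentages; this is a simple proxy for the clustering level rather than the value a clustering package would report, and the three traces are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def control_reproducibility(traces):
    """Return (lowest, mean) pairwise correlation, in %, among control traces."""
    values = [100 * pearson(a, b) for a, b in combinations(traces.values(), 2)]
    return min(values), sum(values) / len(values)


# Hypothetical normalised traces of the control strain from three gels.
runs = {
    "gel 1": [1, 5, 2, 8, 3, 1],
    "gel 2": [1, 6, 2, 7, 3, 1],
    "gel 3": [2, 5, 3, 8, 2, 1],
}
lowest, mean = control_reproducibility(runs)
print(f"lowest {lowest:.1f}%, mean {mean:.1f}%")
```

Tracking the lowest value over time gives an early warning when a new batch of chemicals or a change in procedure begins to affect the profiles.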
(v) Inter-laboratory variation
As with many fingerprinting methods, minor variations in performance between laboratories give rise to differences in patterns which hamper inter-laboratory comparisons. One inter-laboratory study compared the same set of Campylobacter strains by their whole-cell protein profiles (Costas et al., 1990). The same pattern of grouping was obtained, although the profiles were clearly different due to the differences in procedures or reagents and, in particular, the different background subtraction methods of the computer analysis. Nevertheless, protein profiling is a robust method, and there is little doubt that protein profiles can be reproducible between laboratories if the procedures are standardised.
H. Applications of protein electrophoresis of microorganisms
Protein SDS-PAGE has been used with a great variety of microorganisms, including bacteria, viruses and fungi. Studies using the technology are often focused primarily on the proteins as cell constituents, with the aim of elucidating (e.g.) their chemical or antigenic character and their role in pathogenicity. For this purpose, a limited number of strains are usually investigated, and protein electrophoresis may be combined with various other techniques, including determination of enzymatic activities, immunoblotting to determine the presence of certain epitopes, or electron microscopy. In contrast, for epidemiology and taxonomy, many strains may be investigated in one run, and the protein profiles are used merely to determine relatedness between the strains. Hundreds of publications dealing with protein analysis of microorganisms are revealed by a simple search in the PubMed database on the Internet, and a comprehensive review is beyond the scope of this book. Table 4.4 lists some examples of studies in which protein SDS-PAGE has been applied and several of these are discussed in more detail below.
(i) Taxonomic applications
Grouping of organisms according to their similarity in protein profiles has been found to correlate well with grouping by DNA-DNA hybridisation (see e.g., Kersters & De Ley, 1975; Owen & Jackman, 1982). Consequently, protein SDS-PAGE has been found to be a powerful adjunct to DNA-DNA hybridisation and other methods, particularly when used in a polyphasic approach to resolve the taxonomy
Table 4.4. Examples of some organisms investigated by protein SDS-PAGE profiles

Organism: Reference
Aeromonas spp.: Kuijper et al. (1989)
Acinetobacter spp.: Dijkshoorn et al. (1987a, b; 1989; 1990)
Campylobacter spp.: Costas et al. (1990)
Coagulase-negative staphylococci: Dryden et al. (1992)
Clostridium difficile: Costas et al. (1994)
Enterobacter cloacae: Costas et al. (1989c)
Helicobacter spp.: Vandamme et al. (2000)
Lactobacillus acidophilus: Gancheva et al. (1999)
Legionella spp.: Verissimo et al. (1996)
MRSA: Costas et al. (1989a)
Providencia alcalifaciens: Holmes et al. (1988)
Providencia rettgeri: Costas et al. (1989b)
Viridans streptococci: Vandamme et al. (1998)
of bacterial genera at the species level. This usefulness is illustrated by Fig. 4.2, which shows the results of a cluster analysis of whole-cell protein profiles for strains belonging to species of the genus Weissella in which the different species are clearly separated. As is the case in this study, most taxonomic studies use whole-cell preparations rather than cell envelopes or outer membrane preparations. Computer analysis is an absolute requirement for grouping these profiles since they contain several tens of bands which makes grouping by eye impossible. One research group has set up a rigorously standardised system for protein SDS-PAGE combined with computer-assisted analysis, and uses the method as part of a polyphasic approach to investigate large collections of bacterial strains. As a result, important contributions have been made to understanding the taxonomy of a variety of bacterial genera (e.g., Vandamme et al., 1998; Gancheva et al., 1999; Heyndrickx et al., 1999). By this standardised approach, libraries of protein profiles can be constructed with well-identified strains and then used to identify unknown strains and resolve misidentifications (Vandamme et al., 2000).

(ii) Epidemiological typing

Another research group used highly standardised whole-cell protein analysis combined with numerical analysis during the 1980s and 1990s (Jackman, 1987; Costas et al., 1990), and explored its usefulness for a variety of organisms, including Clostridium difficile (Costas et al., 1994), Providencia alcalifaciens (Holmes et al., 1988), Providencia rettgeri (Costas et al., 1989b) and methicillin-resistant S. aureus (Costas et al., 1989a). It was recognised that predominant very dense protein bands could have a great impact on the outcome of the cluster analysis, particularly when Gram-negative organisms were being studied. This can be investigated by performing several different analyses, both of the whole profile and
[Dendrogram not reproduced. Scale: 70-100% similarity. The strains (LMG numbers) group under W. cibaria, W. confusa, W. kandleri, W. paramesenteroides, W. hellenica, W. halotolerans, W. viridescens and W. minor.]

Fig. 4.2. Dendrogram showing cluster analysis of Weissella strains based on the unweighted pair-group average linkage of correlation coefficients between the protein profiles of the strains studied. The scale denotes the correlation coefficient as a percentage similarity for convenience (figure generously provided by Dr. Peter Vandamme).
of selected zones to distinguish between or within species. Many other studies on Gram-negative bacteria have used cell envelope or outer membrane profiles. If only a few strains are being compared, the profiles can be inspected easily by eye for relatedness since the number of bands is relatively low when compared to whole-cell profiles. The usefulness of protein electrophoresis for both epidemiology and taxonomy is well-illustrated by studies with Acinetobacter spp. Initial studies (Dijkshoorn et al., 1987a, b) used cell envelope protein SDS-PAGE profiles to type strains for epidemiological purposes. At the time, such profiles were analysed by eye. In later studies, the method was more standardised and numerical analysis of the profiles could be performed. In a comparative study of 120 strains belonging to different genomic species, a good correlation was found between the profiles of the 'minor' bands in the 55-97 kDa size range and the genomic species identification (Fig. 4.3). Thus, a small part of the overall profile could be used successfully for
Fig. 4.3. Cell envelope protein profiles of Acinetobacter strains belonging to different genomic species.
(presumptive) species identification. More heavily stained bands in the 20-55 kDa size range showed considerable strain-to-strain variation and could be used for epidemiological typing (Figs. 4.1 and 4.3) (Dijkshoorn et al., 1990). Whole-cell profiles appeared to be more difficult to interpret than cell envelope profiles (Fig. 4.4) and were less useful for typing purposes.

4.5 LIPOPOLYSACCHARIDE ANALYSIS
A. Bacterial fractions for analysis
LPS can be extracted from bacteria by the phenol-water procedure (Westphal & Jann, 1965), or by the phenol-chloroform-petroleum ether extraction procedure (Galanos et al., 1969) which is more commonly used for R-type LPS. Using SDS-PAGE, LPS can be separated into different moieties, and it has been shown with biochemically defined LPS fractions that differences in O-specific side chains, core oligosaccharides and lipid A are reflected by differences in the electrophoretic profiles obtained (Jann et al., 1975). Tsai & Frasch (1982) developed a highly sensitive silver-staining method to visualise SDS-PAGE LPS profiles. In the comparative study of Hitchcock & Brown (1983), profiles of proteinase K-digested whole-cell lysates were found to be similar to profiles of purified LPS. Hence, it was concluded that proteinase K-treated whole-cell lysates could be used to provide preliminary data concerning the LPS phenotype. Once this was established, proteinase K-digested cell envelopes were also used for electrophoretic LPS analysis.

Fig. 4.4. Protein profiles of sets of Acinetobacter isolates from three outbreaks in hospitals. Left, whole-cell profiles; right, cell envelope profiles of the same isolates.

B. Preparation of whole-cell lysates or cell envelopes followed by proteinase K digestion
The modified method described here, as used by Vinogradov et al. (1997), involves first harvesting bacteria grown on solid medium (blood agar) with a sterile swab, resuspending the cells in 5 ml of 0.15 M NaCl, and then centrifuging at 7200 g for 10 min. The bacterial pellets are solubilised in 2 ml of 1 M Tris-HCl pH 6.8, 4% β-mercaptoethanol, 10% glycerol, and then stored at -20°C. For digestion with proteinase K, 15 µl of cell preparation are added to 15 µl sample buffer (Table 4.2) and heated to 100°C for 10 min. Following heating, 10 µl of sample extract is added to 10 µl of proteinase K solution (2.5 µg in 10 µl sample buffer) and incubated at 60°C for 1 h. The digested sample can be stored at -20°C but should be heated at 100°C for 5 min before use.
Preparation of cell envelopes is done by harvesting the cells, followed by ultrasonic disruption and differential centrifugation as described in section 4.4.B.ii. Degradation of proteins in the cell envelopes is achieved by boiling for 5 min in sample treatment buffer (0.5 mg of protein/ml) and then incubating with proteinase K (50 µg/ml) for 1 h at 60°C.
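As a worked check of the quantities given above for the whole-cell lysate protocol, the following minimal sketch computes the proteinase K concentration in the enzyme solution and in the final digestion mixture. The function name is hypothetical, and the calculation assumes that the two volumes are simply additive.

```python
def concentration_ug_per_ml(mass_ug, volume_ul):
    """Concentration in ug/ml from a mass in ug and a volume in ul."""
    return mass_ug / volume_ul * 1000.0

# Quantities taken from the protocol: 10 ul of heated sample extract
# is mixed with 10 ul of proteinase K solution containing 2.5 ug enzyme.
sample_ul = 10.0
enzyme_ul = 10.0
enzyme_ug = 2.5

# Concentration of the enzyme solution before mixing.
stock = concentration_ug_per_ml(enzyme_ug, enzyme_ul)
# Concentration in the digest, assuming additive volumes (our assumption).
final = concentration_ug_per_ml(enzyme_ug, sample_ul + enzyme_ul)

print(f"proteinase K solution: {stock:.0f} ug/ml")
print(f"proteinase K in digest: {final:.0f} ug/ml")
```

Under these assumptions the digest contains 125 µg/ml of proteinase K, which is of the same order as the 50 µg/ml used for cell envelopes.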
C. Electrophoretic separation
Electrophoretic separation of LPS fragments by SDS-PAGE can be done using 5% acrylamide stacking gels and separation gels with 10 or 15% acrylamide, depending on whether S-type LPS or R-type LPS is to be visualised. Various buffer systems and concentrations of acrylamide have been used (e.g., Hitchcock & Brown, 1983; Maskell, 1991).
D. Silver-staining
The method outlined here corresponds to the method of Tsai & Frasch (1982), except that periodic acid is dissolved in water and ethanol is replaced as a fixative by isopropanol. The whole procedure is performed in a glass dish with gentle shaking, except for the overnight fixation step. All reagents are prepared in glass vessels and rinsed latex gloves are worn when handling the gel to prevent fingerprints. After use, the glass dish is cleaned with concentrated nitric acid, followed by careful rinsing with deionised or distilled water.
In the first step, following electrophoresis, the gel is fixed overnight in 250 ml 40% (v/v) isopropanol in 5% (v/v) acetic acid. Next, LPS is oxidised for 5 min in 200 ml 0.7% (w/v) periodic acid in water, and then washed three times with 500-1000 ml of water for 15 min, 30 min and 60 min, respectively. The water is drained and freshly prepared (5-10 min before use) silver-staining solution is added to the gel for 10 min. The staining solution contains sodium hydroxide, concentrated ammonium hydroxide and silver nitrate (Tsai & Frasch, 1982). It is important to maintain the 'high strength' of the ammonium hydroxide, and it is recommended that small aliquots from a freshly opened bottle should be stored in small vessels which can be tightly sealed (Hitchcock & Brown, 1983). After the staining step, the gel is washed three times as described above. Finally, the silver-stained profiles are visualised with 200 ml of a solution containing 10 mg citric acid and 100 µl 36% formaldehyde in water; however, 10-15 min may be required before an appropriate level of staining is obtained. It may be helpful to compare the results with a photograph showing the desired intensity of staining of a reference strain. Once this intensity is obtained, the gel is washed for 5 min with 500 ml of water and a photograph is taken immediately. Due to the washing procedure, the gel becomes fragile and care should be taken that it does not break into pieces.
E. Applications
LPS electrophoretic profiling has been used in numerous studies, frequently in combination with immunoblotting techniques to examine the antigenic characteristics of the separated LPS moieties. In many cases, these studies were performed to explore the correlation of LPS with pathogenicity or to study the interaction of LPS with the immune system. Exploitation of LPS diversity by SDS-PAGE, with or without immunoblotting, for the purpose of typing is relatively uncommon compared to the many pathogenicity studies, although many serotyping systems are based on the diversity of LPS in microorganisms. A drawback of LPS SDS-PAGE is that the method is rather laborious. However, it has undisputed potential as a typing method, as shown by Aucken & Pitt (1993) who applied the technique to 124 isolates belonging to 12 Gram-negative species and concluded that LPS profiling is useful for epidemiological investigations with small clusters of isolates in order to trace cross-infection between patients.
Another illustration of the usefulness of LPS profiling is shown by a series of recent studies on Acinetobacter LPS. Initial work with Acinetobacter strain NCTC 10305 had shown that its LPS resembled in many ways that of rough mutants belonging to the Enterobacteriaceae family (Brade & Galanos, 1982). However, ladder-type LPS (Fig. 4.5) can be observed in some strains and genomic species of Acinetobacter when proteinase K-treated cell envelopes of these organisms are investigated by SDS-PAGE combined with silver staining (personal unpublished results). This finding is suggestive of S-type LPS in the strains investigated. Occurrence of this type of LPS in Acinetobacter was also suggested by Smith & Alpar (1991) who used immunoblotting. The presence of S-type LPS in Acinetobacter has now been established by both chemical characterisation and immunoblotting of LPS separated by SDS-PAGE using whole-cell extracts (Vinogradov et al., 1996; 1997; Haseley et al., 1997). On the basis of these findings, monoclonal antibodies have been developed that are promising tools for rapid identification and typing of acinetobacters (Pantophlet et al., 1999a, b).
4.6 CONCLUSIONS
The past three decades have shown that SDS-PAGE is a powerful and robust technique that can be used for analysis of proteins and LPS, either alone or in combination with other techniques. Thus, it is an important tool for taxonomy, epidemiology and more fundamental research. In general, the technique is considered to be rather laborious when compared to most DNA fingerprinting techniques. Nevertheless, once established in a standardised way, protein SDS-PAGE can be used to construct fingerprint libraries for the identification of species and strains of a variety of microorganisms. With the availability of commercially prepared reagents, the work load of the method is now comparable to that of other advanced genomic fingerprinting methods such as AFLP (see Chapter 8), and inter-laboratory standardisation has become feasible. Similarly, SDS-PAGE, with or without additional blotting procedures, can be useful for fingerprinting organisms on the basis of their LPS content, although such use will be confined largely to structural analysis rather than to typing. Overall, it can be expected that SDS-PAGE will retain its place for the immediate future on the list of major methods in biology and biochemistry that are suitable for fingerprinting microorganisms.
Fig. 4.5. Electrophoretic profiles of proteinase K-digested cell envelope fractions of Acinetobacter strains following visualisation by silver-staining.
ACKNOWLEDGEMENTS
It is a pleasure to thank Dr. Peter Vandamme for providing Fig. 4.2, Mr. Jan Beentjes for processing the figures, and Dr. Ben de Jong, Mr. Dirk Dewettinck and Dr. Ralph Pantophlet for helpful advice and critical reading of the manuscript.
REFERENCES
Andrews, A.T. (1986). Electrophoresis: theory, techniques, and biochemical and clinical applications. Oxford University Press, New York.
Armstrong, S.K. & Parker, C.D. (1986). Heat-modifiable envelope proteins of Bordetella pertussis. Infection and Immunity 54, 109-117.
Aucken, H.M. & Pitt, T.L. (1993). Lipopolysaccharide profile typing as a technique for comparative typing of gram-negative bacteria. Journal of Clinical Microbiology 31, 1286-1289.
Brade, H. & Galanos, C. (1982). Isolation, purification, and chemical analysis of the lipopolysaccharide and lipid A of Acinetobacter calcoaceticus NCTC 10305. European Journal of Biochemistry 122, 233-237.
Brock, T.D., Madigan, M.T., Martinko, J.M. & Parker, J. (1994). Biology of microorganisms, 7th edn. Prentice Hall, Englewood Cliffs, NJ.
Costas, M., Cookson, B.D., Talsania, H.G. & Owen, R.J. (1989a). Numerical analysis of electrophoretic protein patterns of methicillin-resistant strains of Staphylococcus aureus. Journal of Clinical Microbiology 27, 2574-2581.
Costas, M., Holmes, B., Wood, A.C. & On, S.L. (1989b). Numerical analysis of electrophoretic protein patterns of Providencia rettgeri strains from human faeces, urine and other specimens. Journal of Applied Bacteriology 67, 441-452.
Costas, M., Sloss, L.L., Owen, R.J. & Gaston, M.A. (1989c). Evaluation of numerical analysis of SDS-PAGE of protein patterns for typing Enterobacter cloacae. Epidemiology and Infection 103, 265-274.
Costas, M., Pot, M., Vandamme, P., Owen, R.J. & Hill, L.R. (1990). Interlaboratory comparative study of the numerical analysis of one-dimensional sodium dodecyl sulphate-polyacrylamide gel electrophoretic protein patterns of Campylobacter strains. Electrophoresis 11, 467-474.
Costas, M., Holmes, B., Ganner, M., On, S.L., Hoffman, P.N., Worsley, M.A. & Panigrahi, H. (1994). Identification of outbreak-associated and other strains of Clostridium difficile by numerical analysis of SDS-PAGE protein patterns. Epidemiology and Infection 113, 1-12.
Dijkshoorn, L., Michel, M.F. & Degener, J.E. (1987a). Cell envelope protein profiles of Acinetobacter calcoaceticus strains isolated in hospitals. Journal of Medical Microbiology 23, 313-319.
Dijkshoorn, L., van Vianen, W., Degener, J.E. & Michel, M.F. (1987b). Typing of Acinetobacter calcoaceticus strains isolated from hospital patients by cell envelope protein profiles. Epidemiology and Infection 99, 659-667.
Dijkshoorn, L., Wubbels, J.L., Beunders, A.J., Degener, J.E., Boks, A.L. & Michel, M.F. (1989). Use of protein profiles to identify Acinetobacter calcoaceticus in a respiratory unit. Journal of Clinical Pathology 42, 853-857.
Dijkshoorn, L., Tjernberg, I., Pot, B., Michel, M.F., Ursing, J. & Kersters, K. (1990). Numerical analysis of cell envelope protein profiles of Acinetobacter strains classified by DNA-DNA hybridization. Systematic and Applied Microbiology 13, 338-344.
Dryden, M.S., Talsania, H.G., Martin, S., Cunningham, M., Richardson, J.F., Cookson, B., Marples, R.R. & Phillips, I. (1992). Evaluation of methods for typing coagulase-negative staphylococci. Journal of Medical Microbiology 37, 109-117.
Filip, C., Fletcher, G., Wulff, J.L. & Earhart, C.F. (1973). Solubilization of the cytoplasmic membrane of Escherichia coli by the ionic detergent sodium-lauryl sarcosinate. Journal of Bacteriology 115, 717-722.
Galanos, C., Lüderitz, O. & Westphal, O. (1969). A new method for the extraction of R lipopolysaccharides. European Journal of Biochemistry 9, 245-249.
Gancheva, A., Pot, B., Vanhonacker, K., Hoste, B. & Kersters, K. (1999). A polyphasic approach towards the identification of strains belonging to Lactobacillus acidophilus and related species. Systematic and Applied Microbiology 22, 573-585.
Gordon, A.H. (1975). Electrophoresis of proteins in polyacrylamide and starch gels. In Laboratory techniques in biochemistry and molecular biology, vol. 1, 2nd edn, Work, T.S. & Work, E., eds. Elsevier, Amsterdam.
Hammond, S.M., Lambert, P.A. & Rycroft, A.N. (1984). The bacterial cell surface. Croom Helm, Beckenham.
Hancock, R.E.W. & Carey, A.M. (1979). Outer membrane of Pseudomonas aeruginosa: heat- and 2-mercaptoethanol-modifiable proteins. Journal of Bacteriology 140, 902-910.
Haseley, S.R., Pantophlet, R., Brade, L., Holst, O. & Brade, H. (1997). Structural and serological characterisation of the O-antigenic polysaccharide of the lipopolysaccharide from Acinetobacter junii strain 65. European Journal of Biochemistry 245, 477-481.
Heyndrickx, M., Lebbe, L., Kersters, K., Hoste, B., De Wachter, R., De Vos, P., Forsyth, G. & Logan, N.A. (1999). Proposal of Virgibacillus proomii sp. nov. and emended description of Virgibacillus pantothenticus (Proom and Knight 1950) Heyndrickx et al. 1998. International Journal of Systematic Bacteriology 49, 1083-1090.
Hitchcock, P.J. & Brown, T.M. (1983). Morphological heterogeneity among Salmonella lipopolysaccharide chemotypes in silver-stained polyacrylamide gels. Journal of Bacteriology 154, 269-277.
Holmes, B., Costas, M. & Sloss, L.L. (1988). Numerical analysis of electrophoretic protein patterns of Providencia alcalifaciens strains from human faeces and veterinary specimens. Journal of Applied Bacteriology 64, 27-35.
Holst, O., Ulmer, A.J., Brade, H., Flad, H.-D. & Rietschel, E.T. (1996). Biochemistry and cell biology of bacterial endotoxins. FEMS Immunology and Medical Microbiology 16, 83-104.
Jackman, P.J.H. (1987). Microbial systematics based on electrophoretic whole-cell proteins. In Methods in microbiology, vol. 19, Colwell, R.R. & Grigorova, R., eds. Academic Press, London.
Jacques, M. (1996). Role of lipo-oligosaccharides and lipopolysaccharides in bacterial adherence. Trends in Microbiology 4, 408-410.
Jann, B., Reske, K. & Jann, K. (1975). Heterogeneity of lipopolysaccharides. Analysis of polysaccharide chain lengths by sodium dodecyl-sulphate polyacrylamide gel electrophoresis. European Journal of Biochemistry 60, 239-246.
Kawaji, H., Mizuno, T. & Mizushima, S. (1979). Influence of molecular size and osmolarity of sugars and dextrans on the synthesis of outer membrane proteins O-8 and O-9 of Escherichia coli K12. Journal of Bacteriology 140, 843-847.
Kersters, K. & De Ley, J. (1975). Identification and grouping of bacteria by numerical analysis of their electrophoretic protein patterns. Journal of General Microbiology 87, 333-342.
Koebnik, R., Locher, K.P. & Van Gelder, P. (2000). Structure and function of bacterial outer membrane proteins: barrel in a nutshell. Molecular Microbiology 37, 239-253.
Kuijper, E.J., van Alphen, L., Leenders, E. & Zanen, H.C. (1989). Typing of Aeromonas strains by DNA restriction endonuclease analysis and polyacrylamide gel electrophoresis of cell envelopes. Journal of Clinical Microbiology 27, 1280-1285.
Laemmli, U.K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680-685.
Lugtenberg, B. (1981). Composition and function of the outer membrane of Escherichia coli. Trends in Biochemical Sciences 2, 262-265.
Lugtenberg, B. & Van Alphen, L. (1983). Molecular architecture and functioning of the outer membrane of Escherichia coli and other Gram-negative bacteria. Biochimica et Biophysica Acta 737, 51-115.
Lugtenberg, B., Meijers, J., Peters, R., Van der Hoek, P. & Van Alphen, L. (1975). Electrophoretic resolution of the 'major outer membrane protein' of Escherichia coli K12 into four bands. FEBS Letters 58, 254-258.
Lugtenberg, B., Peters, R., Bernheimer, H. & Berendsen, W. (1976). Influence of cultural conditions and mutations on the composition of the outer membrane proteins of Escherichia coli. Molecular and General Genetics 147, 251-262.
Maskell, J.P. (1991). The resolution of bacteroides lipopolysaccharides by polyacrylamide gel electrophoresis. Journal of Medical Microbiology 34, 253-257.
Owen, R.J. & Jackman, P.J.H. (1982). The similarities between Pseudomonas paucimobilis and allied bacteria derived from analysis of deoxyribonucleic acids and electrophoretic protein patterns. Journal of General Microbiology 128, 2945-2954.
Pantophlet, R., Brade, L. & Brade, H. (1999a). Use of murine O-antigen-specific monoclonal antibody to identify Acinetobacter strains of unnamed genomic species 13 sensu Tjernberg and Ursing. Journal of Clinical Microbiology 37, 1693-1698.
Pantophlet, R., Brade, L. & Brade, H. (1999b). Identification of Acinetobacter baumannii strains with monoclonal antibodies against the O antigens of their lipopolysaccharides. Clinical and Diagnostic Laboratory Immunology 6, 323-329.
Pot, B., Vandamme, P. & Kersters, K. (1994). Analysis of electrophoretic whole-organism protein fingerprints, p. 493-521. In Modern microbial methods. Chemical methods in bacterial systematics. Goodfellow, M. & O'Donnell, A.G., eds. J. Wiley and Sons, Chichester.
Preston, A., Mandrell, R.E., Gibson, B.W. & Apicella, M.A. (1996). The lipooligosaccharides of pathogenic Gram-negative bacteria. Critical Reviews in Microbiology 22, 139-180.
Rietschel, E.T. & Brade, H. (1992). Bacterial endotoxins. Scientific American 267, 26-33.
Smith, A.W. & Alpar, K.E. (1991). Immune response to Acinetobacter calcoaceticus infection in man. Journal of Medical Microbiology 34, 83-88.
Tsai, C.-M. & Frasch, C.E. (1982). A sensitive silver stain for detecting lipopolysaccharides in polyacrylamide gels. Analytical Biochemistry 119, 115-119.
Vandamme, P., Torck, U., Falsen, E., Pot, B., Goossens, H. & Kersters, K. (1998). Whole-cell protein electrophoretic analysis of viridans streptococci: evidence for heterogeneity among Streptococcus mitis biovars. International Journal of Systematic Bacteriology 48, 117-125.
Vandamme, P., Harrington, C.S., Jalava, K. & On, S.L. (2000). Misidentifying helicobacters: the Helicobacter cinaedi example. Journal of Clinical Microbiology 38, 2261-2266.
Vauterin, L. & Vauterin, P. (1992). Computer-aided objective comparison of electrophoresis patterns for grouping and identification of microorganisms. European Microbiology 2, 37-41.
Vauterin, L., Swings, J. & Kersters, K. (1993). Protein electrophoresis and classification. In Handbook of new bacterial systematics, Goodfellow, M. & O'Donnell, A.G., eds, pp. 251-280. Academic Press, London.
Verissimo, A., Morais, P.V., Diogo, A., Gomes, C. & da Costa, M.S. (1996). Characterization of Legionella spp. by numerical analysis of whole-cell protein electrophoresis. International Journal of Systematic Bacteriology 46, 41-49.
Vinogradov, E.V., Pantophlet, R., Dijkshoorn, L., Brade, L., Holst, O. & Brade, H. (1996). Structural and serological characterisation of two O-specific polysaccharides of Acinetobacter. European Journal of Biochemistry 239, 602-610.
Vinogradov, E.V., Pantophlet, R., Haseley, S.R., Brade, L., Holst, O. & Brade, H. (1997). Structural and serological characterisation of the O-specific polysaccharide from lipopolysaccharide of Acinetobacter calcoaceticus strain 7 (DNA group 1). European Journal of Biochemistry 243, 167-173.
Westphal, O. & Jann, K. (1965). Bacterial lipopolysaccharides. Extraction with phenol-water and further applications of the procedure. Methods in Carbohydrate Chemistry 5, 83-91.
Williams, B.L. & Wilson, K. (1983). A biologist's guide to principles and techniques of practical biochemistry, 2nd edn., Willis, A.J. & Sleigh, M.A., eds, pp. 127-154. Arnold, London.
Wilson, C.M. (1979). Studies and critique of Amido Black 10B, Coomassie Blue R, and Fast Green FCF as stains for proteins after polyacrylamide gel electrophoresis. Analytical Biochemistry 96, 263-278.
5
rRNA Gene Restriction Pattern Determination (Ribotyping) and Computer Interpretation
Patrick A.D. Grimont and Francine Grimont
Unité des Entérobactéries, INSERM U389, Institut Pasteur, 28 Rue du Dr Roux, 75724 Paris Cedex 15, France
CONTENTS
5.1 INTRODUCTION
5.2 RIBOTYPING METHODOLOGY
  A. Manual method
    (i) DNA extraction and purification
    (ii) Cleavage of DNA with restriction endonucleases
    (iii) Fragment size markers
    (iv) Horizontal agarose gel electrophoresis of cleaved DNA samples
    (v) Vacuum transfer of DNA fragments to a membrane
    (vi) Probes and labelling
    (vii) Hybridisation of membrane-bound fragments
    (viii) Visualisation of hybridised fragments
  B. Automated method
5.3 COMPUTER ANALYSIS
  A. Need for computer-aided interpretation of typing patterns
  B. Steps in typing pattern analysis
    (i) Image capture
    (ii) Image improvement
    (iii) Data extraction
    (iv) Pattern comparison
    (v) Automated identification
  C. Data extraction and identification using the RiboPrinter
5.4 SOURCES OF VARIATION IN RIBOTYPE ANALYSIS
  A. Sources of variation in restriction pattern determination
  B. Qualitative variations
  C. Quantitative (size) variations
5.5 EXAMPLES OF APPLICATIONS
  A. Ribotyping as an identification tool
  B. Ribotyping as a typing tool for epidemiology
5.6 FUTURE PERSPECTIVES AND CONCLUSIONS
REFERENCES
5.1 INTRODUCTION
Investigations of presumed outbreaks of bacterial infections require strain typing data to identify outbreak-related strains and to distinguish epidemic isolates from endemic isolates. It should be emphasised that precise identification at the species level is the primary epidemiological marker. As described in Chapter 1, traditional typing methods for bacteria were only devised after lengthy efforts and have been established for a limited set of bacterial species. Thus, serotyping schemes require a collection of rabbit sera, with each serum being collected after several immunisation steps and absorbed with cross-reacting bacterial suspensions. Such immunological reagents are normally applicable to only a single genomic species, but very powerful serotyping systems (such as the Kauffmann-White scheme for Salmonella) have been developed thanks to the efforts of several generations of workers. Phage-typing has always been shown to have a better resolution than serotyping. However, a collection of well-chosen and pure phages is required and these phages generally have a limited host-range. A few months are necessary to devise a system for local strains, while several years of work are often needed to achieve an internationally useful system. For some genera (e.g., Legionella), no phage have yet been isolated. Bacteriocin typing has the same limitations and requires the isolation of some bacteriocin-producing strains. Although a few classical typing systems have reached an impressive level of refinement (e.g., Salmonella serotyping), there is a growing need for universal typing methods (i.e., methods giving results with any bacterial species). Early applications of a molecular approach to typing consisted of restriction analysis of plasmid DNA (Meyers et al., 1976). However, this required the presence of a plasmid, its extraction and purification, and comparison of the same sort of plasmid in multiple plasmid-bearing strains. 
Comparison of restriction patterns of bacterial genomic DNA was subsequently proposed for Mollicutes (Bove & Saillard, 1979). However, for bacteria with larger genomes, restriction patterns were initially too complex to serve as fingerprints. The method proposed by Southern (1975) allowed visualisation of those restriction fragments in the overall complex pattern that hybridised with a probe. This method was soon used to analyse restriction fragment length polymorphisms (RFLPs) (Kan & Dozy, 1978). The fact that genes coding for ribosomal ribonucleic acids (rrs genes coding for 16S rRNA and rrl genes coding for 23S rRNA) have large portions which have been conserved during evolution (Fox et al., 1980) raised the possibility that these genes accounted for most of the DNA fragments shared amongst different bacterial species (Ostapchuk et al., 1980). This idea was used to devise a general strategy for preparing species-specific probes (Grimont et al., 1985). In the course of this work, it was observed that different species yielded different RFLP patterns when probed with labelled 16+23S rRNA. Other studies had previously reported the use of rRNA-RFLPs for different purposes (Ostapchuk et al., 1980; Gottlieb & Rudner, 1985). However, the method called rRNA gene restriction pattern determination (Grimont & Grimont, 1986) was the first to propose a single probe mixture (16+23S rRNA from Escherichia coli) for specific and intra-specific identification of bacteria, irrespective of their phylogenetic position. This was the first attempt at universal molecular typing. The method was later named ribotyping (Stull et al., 1988).
5.2 RIBOTYPING METHODOLOGY
A. Manual method
(i) DNA extraction and purification
In order to yield readable and reproducible rRNA gene restriction patterns after cleavage by restriction endonucleases, the extracted DNA must be of high molecular size and free from inhibitors which might interfere with endonuclease activity. Gram-negative bacteria, collected from a 10-ml stationary phase rich broth (e.g., trypto-casein soy broth) culture, are easily lysed by detergents such as sodium dodecyl sulphate (SDS) in Tris-HCl, EDTA, NaCl buffer (Brenner et al., 1982). Concomitant proteinase K treatment is likely to improve the yield. The precise protocol has been published previously (Grimont & Grimont, 1995).
Lysis of Gram-positive bacteria can be trickier. It may be best to collect bacteria in the exponential phase of growth (40 ml) since lysis of some Gram-positive bacterial species in the stationary phase of growth can be very difficult. Different lysis protocols can be used, depending on the bacterial genus being studied (see references in section 5.5). For example, the protocol for coryneforms involves suspending bacterial cells in Tris-HCl, EDTA, NaCl, Triton X-100 buffer and treating first with lysozyme and mutanolysin, followed by proteinase K and SDS, while the protocol for staphylococci involves suspending bacterial cells in Tris-HCl, EDTA, NaCl, Triton X-100 buffer and treating first with lysostaphin, followed by SDS and proteinase K.
DNA purification involves shaking the bacterial lysate vigorously with an equal volume of 25:24:1 (v/v) phenol (saturated with Tris, EDTA, NaCl and protected against oxidation with 8-hydroxyquinoline):chloroform:isoamyl alcohol. After centrifugation, the upper (aqueous) phase containing the nucleic acids is collected and treated again with the phenol:chloroform mixture until there is no longer a white (proteinaceous) interface between phases. The nucleic acids are precipitated with two volumes of cold (-20°C) absolute ethanol and then dried in a vacuum desiccator. The dried precipitate is then dissolved in Tris-HCl, EDTA buffer.
Several kits are available for fast DNA extraction and purification. We have used the IsoQuick Nucleic Acid Extraction Kit (Orca Research, Bothell, WA, USA) and the QIAamp DNA Mini Kit (Qiagen, Valencia, CA, USA) successfully with Gram-negative bacteria, and the Wizard Genomic DNA purification kit (Promega, Madison, WI, USA) with lysed Gram-positive bacteria.
In all cases, it is useful to systematically submit a 100-µl portion of the DNA solution to microdialysis on Millipore type VS filters (diameter 2.5 cm, pore size 0.025 µm) in order to get rid of possible salts and traces of phenol or other compounds which may inhibit endonuclease cleavage. This is done just before endonuclease treatment. When working for the first time with a bacterial group, it is also useful to check the DNA quality by gel electrophoresis (mini-gel with unrestricted DNA). Poor DNA extraction will result in too small an amount of DNA, while hydrolysed DNA will be seen as a smear instead of a thick band. Insufficiently pure DNA is usually not identified as such before restriction attempts.
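The reagent volumes implied by the purification steps earlier in this subsection can be worked out with a few lines of arithmetic. The sketch below is illustrative only: the function names and the example volumes (5 ml of lysate, 4 ml of recovered aqueous phase) are hypothetical, and it assumes that the 'two volumes' of ethanol refer to the volume of the recovered aqueous phase.

```python
def extraction_volumes(lysate_ml):
    """Component volumes (ml) of the 25:24:1 mixture added to a lysate.

    An equal volume of phenol:chloroform:isoamyl alcohol is added,
    so the three components together equal the lysate volume.
    """
    parts = {"phenol": 25, "chloroform": 24, "isoamyl alcohol": 1}
    total_parts = sum(parts.values())
    return {name: lysate_ml * p / total_parts for name, p in parts.items()}

def ethanol_volume(aqueous_ml):
    """Two volumes of cold absolute ethanol (assumed relative to the aqueous phase)."""
    return 2.0 * aqueous_ml

# Hypothetical example: 5 ml of lysate, 4 ml of aqueous phase recovered.
volumes = extraction_volumes(5.0)
for name, ml in volumes.items():
    print(f"{name}: {ml:.2f} ml")
print(f"ethanol: {ethanol_volume(4.0):.1f} ml")
```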
(ii) Cleavage of DNA with restriction endonucleases
Restriction endonucleases recognise specific palindromic sequences of 4-13 nucleotides in length. Short recognition sequences are likely to be more frequent in a genome than longer recognition sequences, which is why endonucleases with short recognition sequences produce more fragments of lower molecular size. Sequences containing only combinations of adenine and thymine nucleotides are likely to be more frequent in a genome with a low G+C ratio and rarer in a genome with a high G+C ratio. When recognition sequences are rare in a genome, only a few fragments are generated, and these will be of a size which is too large to migrate in an agarose gel with the methods described below.
A restriction endonuclease unit is the minimal amount of enzyme required to cleave 1 µg of DNA in 1 h, using the assay conditions recommended by the manufacturer, and yield an enzyme-specific restriction pattern. Restriction endonucleases are usually supplied in a buffer containing 50% (v/v) glycerol. Incubation buffers are usually supplied by the manufacturer in a concentration that is five or 10 times the final required concentration. Digestion is performed at the temperature recommended by the manufacturer (often 37°C) in a water bath for 2-4 h. If the reaction mixture cannot be analysed on a gel in the same day, it should be stored at -20°C.
The choice of endonuclease for ribotyping is of paramount importance. The resulting DNA fragments should range from 1-20 kb in size. For a given bacterial species, different endonucleases may give widely different taxonomic resolutions. For example, HindIII gives nearly a species-specific ribotype with Vibrio cholerae, whereas BglI gives >40 ribotypes. The aesthetic aspect of patterns should also be taken into consideration. Bands should be well-separated to ease their detection by computer. In some instances, endonucleases giving common bands in almost all ribotypes within a species have been chosen because common bands help pattern comparison (e.g., MluI for E. coli; BstEII for Corynebacterium diphtheriae). When new bacterial groups are studied, it is advisable to select a few strains representative of known diversity (i.e., geographic, pathogenic or antigenic diversity) and to run ribotyping experiments using several (about 10) different endonucleases. It is useful to include endonucleases which are commonly used for related bacterial groups because, if these perform well, there is no need to add to the confusion by selecting different endonucleases. The choice of endonuclease should be made only after the precise purpose of the ribotyping has been decided (i.e., species identification, world-wide surveillance, or fine typing of local strains).
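The relationship between recognition-sequence length, G+C content and fragment size described at the start of this subsection can be illustrated with a simple random-sequence model. This is a rough sketch under stated assumptions, not a prediction for any real genome: bases are assumed to be independent, with G and C each occurring at half the G+C fraction and A and T each at half the remainder. The function names are hypothetical, and the 4-base sequence AATT is used only as an example of an AT-only site.

```python
def site_probability(site, gc_fraction):
    """Probability that a random position starts the given recognition site."""
    p_gc = gc_fraction / 2.0          # probability of G, and of C
    p_at = (1.0 - gc_fraction) / 2.0  # probability of A, and of T
    prob = 1.0
    for base in site.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        elif base == "N":
            continue  # any base matches, so the probability is unchanged
        else:
            raise ValueError(f"unsupported symbol: {base}")
    return prob

def expected_fragment_size(site, gc_fraction):
    """Mean distance (bp) between sites under the random-sequence model."""
    return 1.0 / site_probability(site, gc_fraction)

sites = [("AT-only 4-base site", "AATT"), ("HindIII", "AAGCTT"), ("MluI", "ACGCGT")]
for name, site in sites:
    for gc in (0.35, 0.50, 0.65):
        size = expected_fragment_size(site, gc)
        print(f"{name:20s} G+C={gc:.2f}  ~{size:,.0f} bp")
```

In this model a 4-base site is expected every 256 bp and a 6-base site every 4096 bp at 50% G+C, and the AT-only site becomes rarer as the G+C fraction rises. Real genomes deviate from these figures, which is one reason why the text recommends testing several endonucleases empirically.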
(iii) Fragment size markers
To interpolate the size of DNA fragments, it is necessary to include molecular size standards (or markers) in some of the lanes. Each marker lane should contain the same set of standard fragments covering the range of sizes expected to be observed in ribotyping experiments. Each set should contain more than four fragments, with the smallest being smaller than any test fragment, and the largest being larger than any test fragment. Some size markers need to be hybridised with a specific probe so that they can be visualised in ribotyping:
- Marker Raoul I (Appligene, Illkirch, France) contains 22 fragments of 48502, 18520, 14980, 10620, 9007, 7378, 5634, 4360, 3988, 3609, 2938, 2319, 1810, 1416, 1255, 1050, 903, 754, 686, 554, 375 and 234 bp. It should be hybridised with labelled pBR322 DNA.
- Lambda DNA cleaved with HindIII contains seven fragments of 23130, 9416, 6557, 4361, 2322, 2027 and 564 bp. It should be hybridised with labelled lambda DNA.
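Size interpolation from a marker lane can be sketched as follows. The interpolation method is an assumption made for illustration: the logarithm of fragment size is interpolated linearly between the two flanking marker bands, which is one common approach and not necessarily the one used by the authors. The marker sizes are those of the lambda HindIII digest listed above; the migration distances are invented for this example.

```python
import math

# Lambda HindIII marker sizes (bp) from the text. The migration
# distances (mm) are hypothetical values invented for illustration.
MARKER_BP = [23130, 9416, 6557, 4361, 2322, 2027, 564]
MARKER_MM = [22.0, 38.0, 46.0, 56.0, 74.0, 78.0, 118.0]

def interpolate_size(distance_mm, marker_mm=MARKER_MM, marker_bp=MARKER_BP):
    """Estimate fragment size (bp) from migration distance.

    Uses linear interpolation of log10(size) between the two marker
    bands flanking the test band (an assumed, commonly used scheme).
    """
    pairs = sorted(zip(marker_mm, marker_bp))
    if not pairs[0][0] <= distance_mm <= pairs[-1][0]:
        # Markers must bracket every test fragment, as the text requires.
        raise ValueError("band lies outside the marker range")
    for (d1, s1), (d2, s2) in zip(pairs, pairs[1:]):
        if d1 <= distance_mm <= d2:
            frac = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size

for d in (30.0, 42.0, 65.0, 100.0):
    print(f"{d:6.1f} mm -> ~{interpolate_size(d):,.0f} bp")
```

The error raised for bands outside the marker range reflects the requirement that the smallest and largest marker fragments bracket every test fragment; extrapolating beyond the markers would be unreliable.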
The ribotype pattern given by some reference strains can also be used as fragment size markers. The advantage of this approach is the ease of production (total DNA from a strain) and visualisation by hybridisation with the same probe used for ribotyping. The disadvantage is that fragment sizes are rarely known exactly. Two of these markers are listed below:
- Xenorhabdus strain CIP 105189 (Grimont 278) yields 15 ribotype fragments of 13084, 11634, 10121, 8514, 5805, 4599, 3784, 2953, 2556, 2436, 2249, 2130, 1528, 1383 and 775 bp.
- Citrobacter koseri strain CIP 105177 (Grimont 32) yields 13 ribotype fragments of 16752, 12482, 7330, 6650, 6456, 5752, 5098, 4405, 3023, 2778, 1696, 1444 and 1171 bp.
Once the genome of a strain has been fully sequenced, it is possible to derive exact marker sizes:
- E. coli K12 strain MG1655 cleaved in silico with MluI yields 15 ribotype fragments of 19180, 10629, 8935, 6529, 5807, (5260, 5248), (4997, 4898), 3985, (3139, 3136), 2284, 1681 and 745 bp. However, fragments indicated in parentheses cannot be separated experimentally and should be taken as 5254, 4938 and 3138 bp, respectively (a total of 12 observed fragments).
(iv) Horizontal agarose gel electrophoresis of cleaved DNA samples
The choice of electrophoretic equipment and conditions has a strong influence on the quality of the results, and especially on whether pattern digitisation will be easy or difficult. When setting up ribotyping, different electrophoresis tanks, casting plates, and well-forming combs should be tested. Horizontal electrophoresis with a submarine (submerged) system, in which the whole gel is covered by buffer during electrophoresis, is usually preferred. A casting plate which is 20 cm wide and 25 cm long, used in combination with a 20-tooth comb and a 5 mm-thick agarose gel, gives wide enough, well-separated lanes and bands. The agarose brand (e.g., Seakem; FMC, Rockland, ME, USA) should have a low electroendosmotic coefficient. Different brands of agarose should be compared before selecting the brand and quality to provide thin and clearly separated bands in ribotyping experiments. The usual agarose concentration in gels is 0.8% (w/v), although better reproducibility has been observed in size determination of larger (c. 20 kb) fragments when 0.7% agarose is used. Care should be taken to check that the surface is flat and horizontal in all directions when the gel is being poured.
The electrophoresis buffer also influences the quality of results. The choice is between two buffers: Tris-acetate-EDTA (TAE) and Tris-borate-EDTA (TBE). TAE provides faster migration with better resolution. However, electrophoresis in TAE buffer is very sensitive to salt contamination in samples, resulting in serious band distortion. TBE buffer provides less resolution, but is less sensitive to salt contamination and is the normal choice for routine work. Electrophoresis settings should be based on constant voltage (not constant current) at a relatively low voltage to avoid gel distortion. We use 1.5 V/cm for 16 h for ribotyping. Heat generated by electrophoresis is a nuisance, hence the low voltage selected. Furthermore, performing electrophoresis in an air-conditioned room is an advantage.

(v) Vacuum transfer of DNA fragments to a membrane
When first described (Grimont & Grimont, 1986), the ribotyping protocol used the original capillary transfer method of Southern (1975) with nitrocellulose membranes. These membranes are easily torn when dried and were soon replaced by nylon membranes, which are very resistant to tearing and to alkali. DNA fragments transferred to nylon membranes can be hybridised and then stripped of probe several times. The choice of membrane is dependent upon the probe used.
When 32P- or acetylaminofluorene (AAF)-labelled 16+23S rRNA is used, membranes such as Hybond N (Amersham Pharmacia Biotech, Little Chalfont, UK) give good results. Positively charged membranes give excessive background staining with AAF-labelled probes. However, with horseradish peroxidase-labelled probes (followed by chemiluminescence detection with the Amersham ECL system) or digoxigenin (DIG)-labelled oligonucleotide probes, positively charged membranes (e.g., Amersham Hybond-N+) perform better. The use of vacuum transfer instead of capillary transfer is faster and results in thinner ribotype bands. A vacuum transfer system usually consists of a vacuum unit (base and frame) connected to a vacuum pump, a polyethylene porous screen (support for the transfer membrane, the mask and the gel) resting on the inner rim of the base unit, and a polyethylene mask to be placed on top of the screen, overlapping the membrane edges. The mask ensures that the full effect of the vacuum is concentrated on the gel. A typical system is the VacuGene system (Amersham Pharmacia Biotech).
Before transfer, the gel containing the DNA fragments is immersed in a depurination solution (0.25 M HCl), which breaks down large fragments for faster transfer, washed with distilled water, then immersed in a denaturation solution (1.5 M NaCl, 0.5 M NaOH). The gel is then placed on the transfer membrane in the vacuum blotting unit, following the manufacturer's instructions, and transfer is obtained with a transfer solution (1.5 M NaCl, 0.25 M NaOH) under 55 mbar or 55 cm H2O vacuum. The denatured single-stranded DNA is bound permanently to the transfer membrane. After transfer (about 60 min), the membrane is washed with 2 x SSC (1 x SSC is 0.15 M NaCl, 0.015 M tri-sodium citrate, pH 7.0). The membranes can be stored wrapped in filter paper and aluminium foil at room temperature in a dry place. They can be stored almost indefinitely or mailed to another laboratory. Excessive transfer time may allow fragments to pass through the membrane. This can be suspected when bands appear on the membrane side opposite to the side which was in contact with the gel. When a vacuum blotting system is used for the first time, a few experiments with different transfer times should be set up to determine the optimal time for transfer.

(vi) Probes and labelling
The original probe used by Grimont & Grimont (1986) was a commercial mixture of 16+23S rRNA from E. coli that was 5' end-labelled with gamma-32P-ATP. Subsequently, chemically-labelled 16+23S rRNA (AAF-rRNA) was used (Grimont et al., 1989a), but unfortunately this is no longer available commercially. Another non-radioactive labelling procedure with horseradish peroxidase (ECL gene detection system) attached to rRNA has also been used (Koblavi et al., 1990). The E. coli rrnB operon (including rrs and rrl genes) was cloned in pBR322 by Brosius et al. (1981). The recombinant plasmid (pKK3535), or a 7.5-kb BamHI fragment, was used as a probe after labelling with gamma-32P-ATP (Bercovier et al., 1986), biotin (Altwegg & Mayer, 1989) or DIG (Graves et al., 1991). rRNA genes from Bacillus subtilis were cloned by Iglesias et al. (1983). Recombinant plasmid pBA2 contains a 2.3-kb insert which has been used as a probe for ribotyping Gram-positive bacteria after labelling with gamma-32P-ATP (De Buyser et al., 1989). Other authors have cloned parts or all of the rrn operon from the bacterial species studied, e.g., Bacillus spp. (Gottlieb & Rudner, 1985), Salmonella (Demezas & Bell, 1995), or Legionella (Saunders et al., 1988). All cloned probes used for ribotyping differ in the extent of sequence positions covered, lacking part of the rrs or rrl genes and/or including spacer or neighbour regions. DNA probes have also been obtained from purified 16+23S rRNA by random oligopriming using reverse transcriptase in the presence of labelled nucleotides (Pitcher et al., 1987). Other authors have used rRNA from different bacteriological sources labelled with 32P (Picard-Pasquier et al., 1989), biotin (Pitcher et al., 1987) or DIG (Blumberg et al., 1991). The polymerase chain reaction (PCR) has also been used to amplify rRNA genes from diverse bacteria for use as a probe.
In most cases, the probe has consisted of parts of the rrs gene, comprising 544 nucleotides (Stanley et al., 1993), 1057 nucleotides (Ezquerra et al., 1993), 1338 nucleotides (Pelkonen et al., 1994), 420 nucleotides (Ahrne et al., 1995) or 804 nucleotides (Domenech et al., 1994). It should be stressed that, in all cases, probes obtained by using rRNA, DNA reverse transcripts, cloned rRNA genes, or PCR amplification of rRNA genes from a particular bacterium react better with DNA from bacteria phylogenetically closely related than with DNA from remotely related bacteria. Furthermore, probes which do not contain all of the 16+23S rRNA sequences (e.g., obtained after cloning, reverse transcription or PCR) are likely to visualise DNA fragment patterns which are subsets of the full ribotype pattern (i.e., that obtained with 16+23S rRNA).
Oligonucleotide probes hybridising to rRNA genes have been used for molecular typing. These probes consist of a single oligonucleotide recognising a variable region of the rrs gene (Göbel et al., 1987; Romaniuk & Trust, 1987) or a conserved region of the gene (Moureau et al., 1989). Jannes et al. (1993) have also described the use of three radioactive oligonucleotide probes (one targeting rrs and the other two targeting rrl) for ribotyping. Patterns obtained with these oligonucleotide probes are generally subsets of the full ribotype pattern. A probe consisting of a mixture of five oligonucleotides (referred to as OligoMix5) that targets two conserved regions at the extremities of rrs and three conserved regions at the extremities and in the middle of rrl has been published (Regnault et al., 1997). These oligonucleotides are listed in Table 5.1. When labelled with DIG (Oligonucleotide Tailing Kit; Boehringer, Lewes, UK), this probe was found to yield full ribotype patterns with similar intensities, irrespective of the phylogenetic position of the bacteria examined (Regnault et al., 1997). For these reasons, OligoMix5 can be recommended as a universal probe for ribotyping.

Table 5.1. Oligonucleotide probes comprising OligoMix5

Target   Probe   Probe sequence                  Td (a)   Length   Position (b)
rrs      Ad      AgAgTTTgATC(A,C)TggCTCAg (c)    58-60    20       08-28
rrs      rB      TgACgggCggTgTgTACAA             60       18       1408-1390
rrl      O4c     ACCgATAgIgAACCAgTACCgTg         68       23       442-466
rrl      O16g    TACCICAAACCgACACAggTIg          66       23       1601-1623
rrl      O24c    TTTggCACCTCgATgTCggCT           66       21       2490-2512

(a) Td, dissociation temperature in °C.
(b) Corresponding to E. coli base numbering of Brosius et al. (1981).
(c) (A,C), A or C in equal probabilities.

(vii) Hybridisation of membrane-bound fragments
Transfer membranes were originally hybridised in sealed plastic bags, but the use of a hybridisation oven can be strongly recommended. Up to five membranes can be prehybridised and hybridised in a single tube in a hybridisation oven. The use of a hybridisation oven gives more homogeneous results and lower reagent costs. However, the temperature displayed on the oven should be verified with a reference thermometer, especially when OligoMix5 is used as a probe. Since hybridisation with OligoMix5 can fail when conditions are not optimal, the recommended protocol is given below. Prehybridisation is performed by incubating membranes with hybridisation solution containing 5 x SSC, 1% (w/v) blocking solution, 0.1% (w/v) N-lauroylsarcosine, and 0.02% (w/v) SDS at 65°C for 30 min with shaking. Hybridisation and washes then follow one of two procedures:
- According to the instructions provided with the DIG Oligonucleotide Tailing Kit, oligonucleotides (100 pmol each) are added to the hybridisation solution. Hybridisation is performed at 52°C for 16 h with shaking. Washes are performed at 47°C for 2 x 15 min with washing solution A (2 x SSC, 0.1% SDS), and then for 2 x 5 min in 0.1 x SSC, 0.1% SDS at 45°C.
- Alternatively, according to the instructions provided with the DIG Easy Hyb Solution (Boehringer), oligonucleotides (28 pmol each) are added to 14 ml of Easy Hyb Solution. Hybridisation is performed at 37°C for 4 h, with washes at 42°C for 2 x 5 min in 2 x SSC, 0.1% SDS, and then for 2 x 10 min in 0.1 x SSC, 0.1% SDS.
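The need for careful temperature control can be related to the dissociation temperatures (Td) listed in Table 5.1. The chapter does not state how these values were derived; the sketch below assumes the widely used Wallace rule for short oligonucleotides, Td = 2 x (A+T) + 4 x (G+C), with inosine (I) contributing nothing. Under that assumption the rule reproduces the tabulated values for the probes shown. The function name `wallace_td` is hypothetical.

```python
def wallace_td(sequence):
    """Estimate Td (°C) with the Wallace rule: 2 per A/T and 4 per G/C.

    Assumption: any other symbol (e.g. I for inosine) adds nothing.
    """
    s = sequence.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc

# Probe sequences as printed in Table 5.1.
probes = {
    "rB": "TgACgggCggTgTgTACAA",
    "O4c": "ACCgATAgIgAACCAgTACCgTg",
    "O24c": "TTTggCACCTCgATgTCggCT",
}
for name, seq in probes.items():
    print(name, wallace_td(seq))

# Probe Ad carries A or C at one position, hence a range of two values.
ad_range = [wallace_td("AgAgTTTgATC" + base + "TggCTCAg") for base in "AC"]
print("Ad", ad_range)
```

On this estimate, hybridisation at 52°C sits only a few degrees below the lowest Td in the mixture, which is consistent with the advice to verify the oven temperature with a reference thermometer.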
(viii) Visualisation of hybridised fragments
Immunoenzymatic detection of DIG-labelled probes is performed with a Non-Radioactive DNA Labelling and Detection Kit (Boehringer), used according to the manufacturer's instructions. The steps involved are: (i) membrane saturation with a blocking solution; (ii) reaction with a sheep alkaline phosphatase-conjugated anti-DIG antibody; and (iii) incubation of the membrane in alkaline phosphatase buffer containing 5-bromo-4-chloro-3-indolyl-phosphate and 4-nitro-blue-tetrazolium. Progress of the enzymatic reaction is followed visually (bands should appear purple). AAF-labelled rRNA is detected with mouse monoclonal anti-AAF antibodies, sheep anti-mouse alkaline phosphatase-conjugated antibodies, and the phosphatase substrate 5-bromo-4-chloro-3-indolyl phosphate (Grimont et al., 1989a). Horseradish peroxidase conjugated to rRNA is detected by enhanced chemiluminescence with the ECL Gene Detection System (Amersham Pharmacia Biotech) and photography (Koblavi et al., 1990). Biotinylated probes are detected with streptavidin and biotinylated peroxidase or alkaline phosphatase. Radioactive bands obtained following hybridisation with probes labelled with gamma-32P-ATP are detected by autoradiography.

B. Automated method
Ribotyping is the only molecular typing method which has been fully automated. The RiboPrinter (Qualicon, Wilmington, DE, USA) microbial characterisation system consists of the following principal parts:
(i) A special collector/mixer which is used to capture colonies and mix them with buffer before they are transferred to the sample carrier. A customised sample preparation rack allows easy loading and mixing of samples. Each sample is transferred to one well on a sample carrier, which can hold up to eight samples. The eight samples in a carrier create a processing batch.
(ii) A heat-treatment station which inactivates the viable cells at 80°C before the sample carrier is placed in the instrument.
(iii) An automated microbial characterisation unit which includes the following internal modules:
(a) DNA preparation. Bacteria are lysed and the DNA is made available for processing. The DNA is then digested with a chosen restriction enzyme;
(b) Separation/transfer. The DNA fragments are size-separated on a pre-cast agarose gel by electrophoresis and then transferred to and immobilised on a nylon membrane;
(c) Membrane processing. DNA fragments hybridising to a probe are made to chemiluminesce by exposing the membrane to a series of chemical and enzymatic treatments;
(d) Detection. The band pattern image created by the luminescing fragments is captured by a low-light camera. This digitised image is saved and can then be analysed using the software and database capability built into the system or exported as a TIFF (tagged image file format) image.
(iv) A PC with internal or external removable hard drive for database storage, backup and retrieval, connected to a modem.
(v) A laser printer for producing reports and hard copies of the RiboPrint patterns.
Each gel contains 13 lanes, with a fragment size marker in lanes 1, 4, 7, 10 and 13. The commercially available fragment size marker contains six fragments of sizes 48000, 9600, 6500, 3160, 2150 and 1100 bp. However, the 48000 bp fragment migrates as if it were 22000 bp.

5.3 COMPUTER ANALYSIS

A. Need for computer-aided interpretation of typing patterns
Patterns generated by ribotyping and other molecular typing methods have no value in themselves and must be used primarily to compare strains. The question generally asked is whether the isolates being studied derive from a single strain. A taxonomic question is also sometimes raised; i.e., are a set of strains all members of a particular species? Pattern quality (band resolution, number of bands) is very important in order to reduce the level of subjectivity in computer analysis. In order to conclude that two patterns are identical, the number of bands in each pattern must be the same, and each band in one pattern should correspond to an identical band in the other pattern. However, unless the DNA fragments are sequenced, it can never be certain that two bands are identical. In molecular typing, it is usually assumed that two fragments are identical when their sizes are identical. If fragment sizes are deduced from electrophoretic migration, the comparison relies on identical electrophoretic migration. However, electric field variations in agarose gels may cause a given DNA fragment to migrate differently according to lane position in the gel. To overcome this problem, molecular size markers (or ladders) are used in several lanes in a gel (e.g., four marker lanes are included in 20-lane agarose gels).
Visual examination of a gel for pattern identity is somewhat subjective and is very difficult when patterns are on different gels. To reach an objective and reproducible pattern comparison, or when many patterns have to be compared, gel image capture (digitisation), lane and band detection, and automatic pattern comparison are needed. As described elsewhere in this book, different types of equipment and software programs are available, although their principles are quite diverse. In order to illustrate the analysis of ribotyping patterns, the following sections of this chapter will describe the use of three software programs: Taxotron (Taxolab, Institut Pasteur, Paris, France); GelCompar/BioNumerics (Applied Maths, Kortrijk, Belgium); and the microbial characterisation and identification system associated with the RiboPrinter.
B. Steps in typing pattern analysis
Pattern analysis can be subdivided into five steps and these are described below with particular reference to their importance in ribotyping:
(i) Image capture
The gel image is captured by either a CCD (charge-coupled device) camera or a flatbed scanner connected to a computer, as described in Chapter 3. Both should give a TIFF (tagged image file format) image that can be opened by different software programs on various computers (Macintosh, PC, workstations). When only nylon membranes or photographs are to be digitised (i.e., dry material), a flatbed scanner is sufficient. Cameras and flatbed scanners must produce TIFF images with 256 shades of gray and optical (not extrapolated) resolution up to 300 dpi (dots per inch). Image file size should be between 200 kilobytes and 1 megabyte. Higher resolution yields image files that can only be handled by a limited number of software programs for an insignificant gain in data precision. A few programs may handle more gray levels. However, the higher precision in gray levels is not justified when reproducibility is considered. Colour images are just a nuisance as far as ribotyping is concerned. The captured image can be printed on a high resolution printer and saved on a hard drive. In an active ribotyping laboratory, the hard drive on which images are saved will quickly become saturated. Files should be efficiently compressed or stored on removable supports (magneto-optical disks, writable CDs).
(ii) Image improvement
As described in Chapter 3, software programs may erase background noise, increase contrast, redress distorted lanes or bands, or normalise band migrations. Noise generated by a CCD camera (high frequency noise; i.e., pixel to pixel signal instability) is often smoothed by the image capture program. Baseline (low frequency) variations are due to background staining (mostly in Southern blot-type methods). Programs differ in their treatment of noise or background staining. Background subtraction may erase weak bands. Mathematical filters can be applied to pixel values so that a pixel value will be increased or decreased depending on the values of neighbouring pixels (this can be used for smoothing, contrast enhancement, edge detection, or object detection). Straightening of distorted ribotyping lanes is performed by some software programs and raises no major problem. The user needs to select several points on the distorted lane and the program will modify the image so that the points are on a straight line. Normalisation of migration requires the user to select bands in different lanes so that the program modifies the image to bring these bands to the same migration position. Lack of precision (variable sight among users, parallax errors, optical illusions, etc.) in selecting the middle of a band has a strong influence on the reproducibility of the final results. Bands that are deemed to have the same migration should be in marker lanes. It is ethically unacceptable to select bands in test lanes simply because these bands are supposed to have identical migration. Normalisation is an essential step in some programs, but is not recommended in others. Authors of scientific publications should indicate when a ribotyping image has been modified, especially when migration values are affected.
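Two of the operations mentioned above, smoothing with a neighbourhood filter and background subtraction, can be illustrated on a one-dimensional density profile. This is a minimal sketch with hypothetical function names and invented pixel values; it is not the algorithm of any program named in this chapter.

```python
def smooth(profile, window=3):
    """Moving-average filter: each value becomes the mean of its neighbourhood.

    Near the ends of the profile the window is truncated.
    """
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        lo = max(0, i - half)
        hi = min(len(profile), i + half + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed

def subtract_baseline(profile, level):
    """Subtract a constant background level, clipping negative values to zero."""
    return [max(0, value - level) for value in profile]

# A single-pixel spike is spread out and lowered by smoothing.
print(smooth([0, 0, 9, 0, 0]))
# A weak band (value 2) vanishes once a background level of 2 is removed,
# while the strong band (value 8) survives.
print(subtract_baseline([1, 2, 1, 8, 1], level=2))
```

The second call shows in miniature why background subtraction may erase weak bands.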
(iii) Data extraction
The program used should be able to find the lanes in a ribotyping gel image automatically, or the lane positions should be deduced after a minimum of indications from the user. Two different strategies are then used by programs to extract data from gel images. The first strategy requires image normalisation (distortions are corrected and the migration of homologous fragments is set to be identical) to enable track density comparison, usually by calculating Pearson's correlation coefficient (see Chapter 2). This kind of program (e.g., GelCompar or BioNumerics) was originally devised for comparing patterns composed of overlapping bands (e.g., total protein electrophoresis patterns), and has been described in detail in Chapter 3 of this book. An alternative strategy requires finding ribotyping bands in each lane and interpolating DNA fragment sizes from migration data. This process requires the presence of several fragment size markers on the gel in order to obtain precise results and is described in more detail below.
The first step of band detection can be done automatically or the user can select each band manually. The result is a list of migration values in pixels (a pixel is the smallest picture element that can be handled by a computer). Automatic band detection can be done either by identifying the highest density values (peaks) or by calculating these from a theoretical function. Peak maximum values should be determined from track densities on the basis of mean pixel values across the lane width (for a given migration) rather than mid-lane values. When two bands are overlapping with separate extremities, the density across a mid-lane section may show a wide peak, whereas density calculated as the mean density across the lane width may separate two peaks. Calculating theoretical curves as sums of Gaussian curves or fast Fourier transforms may be a useless exercise when the error associated with band migration is taken into account. Furthermore, the migration of a DNA fragment may yield a skewed peak, and sophisticated calculations may have to invent a second smaller band to explain the skewed appearance. This may not correspond to a biological fact.
In all cases, the list of detected ribotyping bands must be modifiable, since some bands may be artifacts or result from incomplete endonuclease cleavage. In addition, some weak or overlapping bands may need to be recognised if the program has overlooked them. Obviously, this degree of subjectivity is the weak point of methods based on band detection. When fragment migration is deduced from bands selected manually, human generated errors can occur (lack of precision, optical effect, parallax error, etc.). Once bands have been detected, the computer will have stored a list of migration values for each lane. Since it is easy to miss a band or to detect a false band, the program should indicate on the gel image which bands were scored as present. Thus, the user can either validate or modify the results.
Once the computer knows which are the molecular size marker lanes, the fragment size associated with each fragment, and the migration value of these fragments, a formula relating size to migration can be used. An interpolation function will correspond to each reference lane. Thus, it is important that a single mixture of reference fragments be used in at least two lanes (preferably in every four or five lanes). The number of reference fragments should be at least four (preferably 5-10), with the smallest fragment being smaller than any test (unknown) fragment, and the largest being larger than any test fragment.
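The recommendation made earlier in this section, to detect peaks on the mean density across the lane width rather than on a mid-lane section, can be sketched as follows. The lane is represented as a small table of invented pixel values (rows are migration positions, columns are pixels across the lane); the function names and the simple local-maximum rule are illustrative assumptions, not the method of any particular program.

```python
def lane_profile(lane_pixels):
    """Mean pixel value across the lane width for each migration position."""
    return [sum(row) / len(row) for row in lane_pixels]

def detect_bands(profile, min_height):
    """Return migration positions (indices) of local maxima above min_height."""
    bands = []
    for i in range(1, len(profile) - 1):
        is_peak = profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
        if is_peak and profile[i] >= min_height:
            bands.append(i)
    return bands

# Two bands that touch in the middle of the lane but are separate at its edges.
lane = [
    [0, 0, 0],
    [9, 9, 9],
    [0, 9, 0],
    [9, 9, 9],
    [0, 0, 0],
]
mid_lane = [row[1] for row in lane]   # density along the mid-lane column only
mean_lane = lane_profile(lane)        # density averaged across the lane width

print(detect_bands(mid_lane, min_height=5))   # one wide peak is found
print(detect_bands(mean_lane, min_height=5))  # two separate peaks are found
```

In this contrived example the mid-lane section yields a single plateau, whereas averaging across the width recovers both bands.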
There is no perfect formula to relate size to migration, but the following formulae are those used most frequently (M, migration value; L, fragment size; a-d, constants):
Spline.
ln(L) = a + bM + cM^2 + dM^3 (Press et al., 1986)
The spline fitting curve goes through each reference point. The curve should be smooth. Errors in band assignment result in distorted curves.

Inverse relationship.
L = a + b/(M - c) (Schaffer & Sederoff, 1981)
In the general model, parameters are obtained by the least-square method. The fitting curve does not necessarily go through each reference point, since new estimates of reference fragment sizes are interpolated. A fitting formula and a fitting error should be provided by the interpolation program for each marker lane. In the local model (Southern, 1979), only three reference points are used at a time, and the fit is thus exact for these points (the fitting curve goes through all reference points). For each unknown fragment, one function is calculated by use of one standard fragment smaller than the unknown and two standard fragments larger than the unknown. The size of the unknown fragment is then interpolated. Another function is calculated by use of two standard fragments smaller than the unknown and one standard fragment larger than the unknown. The size of the unknown fragment is again interpolated. The final size value is the mean of both interpolated values.
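The local model can be sketched in a few lines. The code below solves L = a + b/(M - c) exactly through three reference points, does so for the two triplets that bracket the unknown, and averages the two estimates. Function names and the marker values are hypothetical (the markers were generated from a = -500, b = 2000000, c = -50 so that the expected answer is known), and error handling is deliberately minimal.

```python
import bisect

def fit_reciprocal(p1, p2, p3):
    """Exact fit of L = a + b/(M - c) through three (migration, size) points."""
    (m1, l1), (m2, l2), (m3, l3) = p1, p2, p3
    ratio = ((l1 - l2) * (m3 - m2)) / ((l2 - l3) * (m2 - m1))
    c = (m3 - ratio * m1) / (1.0 - ratio)
    b = (l1 - l2) * (m1 - c) * (m2 - c) / (m2 - m1)
    a = l1 - b / (m1 - c)
    return a, b, c

def southern_local_size(markers, migration):
    """Interpolate a fragment size as the mean of the two local estimates."""
    points = sorted(markers)                      # ascending migration
    migrations = [m for m, _ in points]
    if not migrations[0] <= migration <= migrations[-1]:
        raise ValueError("migration lies outside the marker range")
    i = min(bisect.bisect_right(migrations, migration) - 1, len(points) - 2)
    estimates = []
    # Triplet starting at i-1: two larger standards and one smaller.
    # Triplet starting at i: one larger standard and two smaller.
    for start in (i - 1, i):
        if start >= 0 and start + 2 < len(points):
            a, b, c = fit_reciprocal(*points[start:start + 3])
            estimates.append(a + b / (migration - c))
    if not estimates:
        raise ValueError("at least three marker fragments are needed")
    return sum(estimates) / len(estimates)

# Hypothetical marker lane: (migration in pixels, size in bp).
markers = [(50, 19500), (150, 9500), (350, 4500), (750, 2000), (1200, 1100)]
print(round(southern_local_size(markers, 250), 1))
```

Near either end of the marker range only one triplet is available, so the sketch falls back on a single estimate there; how real programs treat that case is not specified in this chapter.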
Four-parameter logistic model.
M = (a - d)/(1 + (L/c)^b) + d (Oerter et al., 1990)
The parameters are estimated by non-linear least-squares curve fitting using the Marquardt-Levenberg algorithm. The fitting curve does not necessarily go through each reference point, since new estimates of reference fragment sizes are interpolated. A fitting formula and a fitting error should be provided by the interpolation program for each marker lane.

For methods re-estimating reference fragments, the user can suspect a problem with a reference lane (band missing, extra band, poor assignment of fragment size) when the fitting error is unusually high. The regression curve should then be examined to see whether one or more reference points lie away from the curve. For other methods, the fitting curve should always be examined for distortions. After calculating fitting formulae for each standard lane, the program interpolates the size of each fragment in the gel. Interpolation means that fragments of unknown sizes should be smaller than the largest reference fragment and larger than the smallest reference fragment. Extrapolation (for a fragment with a size beyond the size range of the standard fragments) is possible with the inverse relationship method (general or local fit), although the error associated with such a determination is higher than for interpolation. Extrapolation using spline should never be used since the spline is not defined beyond the reference points (the curve may well loop backwards). Methods which re-estimate reference points (e.g., Schaffer and Sederoff inverse relationship) work well for ribotyping, although the largest errors are associated with values close to the fitting curve extremities. Spline gives more accurate interpolation results with fragments close to the fitting curve extremities. However, it should never be used to extrapolate fragment sizes.
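The use of the fitting error as a warning sign can be illustrated with the general inverse relationship L = a + b/(M - c). The sketch below is not the estimation procedure of Schaffer & Sederoff: as a simple stand-in, it scans a list of candidate values for c, fits a and b by ordinary least squares for each, and keeps the combination with the smallest root-mean-square error. The function name and marker values are hypothetical.

```python
def fit_inverse(markers, c_candidates):
    """Least-squares fit of L = a + b/(M - c) over a grid of candidate c values.

    markers: list of (migration, size) pairs; every candidate c must be
    smaller than the smallest migration. Returns (a, b, c, rms_error).
    """
    sizes = [size for _, size in markers]
    n = len(markers)
    best = None
    for c in c_candidates:
        xs = [1.0 / (m - c) for m, _ in markers]
        mean_x, mean_y = sum(xs) / n, sum(sizes) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sizes)) / sxx
        a = mean_y - b * mean_x
        rms = (sum((a + b * x - y) ** 2 for x, y in zip(xs, sizes)) / n) ** 0.5
        if best is None or rms < best[3]:
            best = (a, b, c, rms)
    return best

# Hypothetical marker lane generated from a = -500, b = 2000000, c = -50.
good_lane = [(50, 19500), (150, 9500), (350, 4500), (750, 2000), (1200, 1100)]
# Same lane with a wrong size assigned to the third reference band.
bad_lane = [(50, 19500), (150, 9500), (350, 6000), (750, 2000), (1200, 1100)]

for lane in (good_lane, bad_lane):
    a, b, c, rms = fit_inverse(lane, range(-200, 50))
    print(f"c = {c}, fitting error = {rms:.1f} bp")
```

The correctly assigned lane is fitted almost exactly, whereas the mis-assigned lane gives a clearly larger fitting error, which is the signal to inspect the regression curve.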
(iv) Pattern comparison
There are two fundamentally different approaches to pattern comparison. One compares track densities, while the other scores homologous DNA fragments.
Comparing track (lane) densities (GelCompar/BioNumerics). As described in Chapter 3, migrations should be normalised for comparing track densities (i.e., the image should be modified so that homologous standard fragments in the different marker lanes are positioned at the same vertical pixel position, and each track starts and ends at fixed vertical pixel positions). Normalisation is a critical step which requires great care. Normalised tracks of the same length are scanned and represented as sets of pixel values (track vectors). These sets of values are used for calculating correlations between tracks by Pearson's coefficient (see Chapter 3).
However, when discrete ribotyping patterns are obtained, there is little value in using density values between peaks. When the background staining is erased, only densities on peaks (including peak slopes) are used. However, peak intensity is not an obvious taxonomic feature. When two bands in an experiment are very close, the space between them retains some stain. The same two bands in another experiment may be better separated, with the track density between bands falling to baseline. Some apparent taxonomic distance may result from such artifacts. Irregular staining of fragments, irregular background staining, and the effect of the choice of probe in ribotyping experiments may make things worse. As a result, an analysis of track densities never shows two DNA samples as identical (similarity=100 or distance=0), even when these derive from a single organism. Furthermore, in some studies in which DNA fragments are in approximately similar numbers, band intensity variations may cause a larger distance among qualitatively identical patterns than the presence or absence of a weak band. In these latter cases, scoring homologous DNA fragments may lead to different epidemiological interpretations.
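The effect of residual stain between two close bands can be shown with a small calculation of Pearson's correlation coefficient on two invented track vectors. The function name and pixel values are hypothetical; real tracks contain hundreds of values, and commercial programs may preprocess them differently.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equally long track vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Same three bands in two runs; in run 1 the gap between the first two bands
# retains some stain (value 5), in run 2 it falls to baseline.
run1 = [0, 9, 5, 9, 0, 0, 9, 0]
run2 = [0, 9, 0, 9, 0, 0, 9, 0]
print(round(pearson(run1, run2), 3))

# A uniform change in staining intensity and background leaves r unchanged.
run2_darker = [2 * value + 1 for value in run2]
print(round(pearson(run2, run2_darker), 3))
```

The qualitatively identical patterns in the first comparison correlate at less than 1, whereas a purely linear change in intensity does not lower the coefficient; it is the irregular differences that create apparent distance.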
Scoring homologous DNA fragments (Taxotron). The major issue in scoring homologous DNA fragments is in determining whether two bands in different lanes are indeed homologous; i.e., correspond to fragments of equal size. Since the same fragment run in different lanes is unlikely to migrate at exactly the same position, some degree of tolerance is needed. Some programs consider two bands to be identical when their respective migration distances are within preset limits (e.g., 1% of migration expressed in pixels). Other programs interpolate fragment sizes and consider two bands to be identical when their sizes are within preset limits (e.g., 5% of size). Such tolerance (percentage error) may be fixed or may vary with fragment size (Grimont, 2000). In some cases, a threshold may be set under which the error follows a given rule, with another rule used for size values above the threshold. For example, in manual ribotyping, the error in size determination is usually low between 1 and 9 kb, and much higher above 10 kb. It is important to understand that the tolerance can be set properly only after reproducibility studies have provided knowledge of the error in fragment size determination over the range of sizes that are associated with a particular typing method. It should also be understood that matching within a tolerance is not transitive: fragment A could be considered as identical to fragments B and C (because the size differences between A and B, and between A and C are within the tolerance limit), while fragments B and C could be considered as different. To compare two ribotyping patterns, a distance is calculated which is the number of non-matching fragments divided by the total number of fragments in both patterns. This is in fact the complement of the Dice index (Sneath & Sokal, 1973). Other similarity or distance coefficients have no advantage over the Dice coefficient.
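The distance just described can be sketched as follows. Two points are assumptions made for illustration and are not taken from Taxotron: the tolerance is applied relative to the larger of the two sizes, and fragments are paired greedily, one-to-one, in order of size. The fragment sizes are invented.

```python
def same_size(a, b, tol=0.05):
    """True if two fragment sizes differ by no more than tol of the larger one."""
    return abs(a - b) <= tol * max(a, b)

def match_count(pattern_a, pattern_b, tol=0.05):
    """Greedy one-to-one pairing of fragments whose sizes agree within tol."""
    unused = sorted(pattern_b)
    matches = 0
    for size in sorted(pattern_a):
        for candidate in unused:
            if same_size(size, candidate, tol):
                unused.remove(candidate)
                matches += 1
                break
    return matches

def dice_distance(pattern_a, pattern_b, tol=0.05):
    """Non-matching fragments divided by all fragments (complement of Dice)."""
    total = len(pattern_a) + len(pattern_b)
    return (total - 2 * match_count(pattern_a, pattern_b, tol)) / total

# Two of three fragments match within 5%, so 2 of 6 fragments are unmatched.
print(round(dice_distance([10000, 5000, 2000], [10200, 5100, 3000]), 3))

# Tolerance is not transitive: A matches B and C, but B and C do not match.
a, b, c = 1000, 960, 1040
print(same_size(a, b), same_size(a, c), same_size(b, c))
```

Because of this lack of transitivity, the result of a greedy pairing can depend on the order in which fragments are considered when several candidates fall within the tolerance.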
Whatever the chosen method, pattern comparison yields a triangular distance (or similarity) matrix from which a numerical analysis can be performed and a dendrogram (tree) drawn. The objects to classify are called Operational Taxonomic Units (OTUs), which in this case are ribotyping patterns or strains. Several different clustering methods are available (see Chapters 2 and 3). Some programs (e.g., Taxotron) allow consensus trees to be built by mixing different biological approaches (e.g., RFLP with different enzymes or different methods, ribotyping, AFLP, RAPD, pulsed-field gel electrophoresis, or phenotypic tests). For this, distance matrices need to be obtained for each method or enzyme, with strains in the same order. Then an average distance matrix is calculated and a tree is obtained after choosing the clustering method (Fig. 5.1).

[Figure 5.1: dendrogram with a distance scale (0.8 to 0) shown beside schematic HindIII and EcoRV ribotype patterns (fragment size scales in kbp) of the L. pneumophila strains.]
Fig. 5.1. Dendrogram obtained by comparison of HindIII and EcoRV ribotypes from strains of Legionella pneumophila. The dendrogram was drawn by Dendrograf (Taxotron package). A fragment length tolerance of 5%, and UPGMA were used. Distance is the complement of Dice index.
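The consensus procedure described above (one distance matrix per enzyme, strains in the same order, an average matrix, then clustering) can be sketched with a naive UPGMA implementation. The matrices, strain labels and function names are invented for illustration; the merge level reported is the average between-cluster distance, and this is not the code of Dendrograf or Taxotron.

```python
def average_matrices(matrices):
    """Element-wise mean of several square distance matrices (same strain order)."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]

def upgma(labels, dist):
    """Naive UPGMA: repeatedly merge the two clusters with the smallest
    average pairwise distance. Returns a list of (group_1, group_2, level)."""
    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
                level = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or level < best[0]:
                    best = (level, x, y)
        level, x, y = best
        merges.append((sorted(labels[i] for i in clusters[x]),
                       sorted(labels[i] for i in clusters[y]), level))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges

strains = ["strain 1", "strain 2", "strain 3"]          # hypothetical OTUs
hind3 = [[0, 0.2, 0.6], [0.2, 0, 0.8], [0.6, 0.8, 0]]   # invented distances
ecorv = [[0, 0.0, 0.4], [0.0, 0, 0.4], [0.4, 0.4, 0]]   # invented distances

consensus = average_matrices([hind3, ecorv])
for group_1, group_2, level in upgma(strains, consensus):
    print(group_1, "+", group_2, "at distance", round(level, 2))
```

In this toy case strains 1 and 2 join first, and strain 3 joins them at the mean of its distances to both.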
(v) Automated identification
The same two approaches which were discussed for pattern comparison are also encountered in automated identification.
Identification based on normalised track (lane) densities (GelCompar/BioNumerics). The image of each normalised track is stored in a database and associated with a species and strain name (see Chapter 3). The identification process consists of searching for the reference track giving the highest Pearson's correlation coefficient value when compared to the test track. As a rule, no two tracks can show 100% correlation. To ease interpretation, it is useful to test repeats of the same DNA sample and see what degree of variation is associated with the correlation coefficient values.
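The search for the best-correlated reference track can be illustrated with a short sketch. It assumes that each normalised track is available as a list of densitometric values of equal length; the function names, the library layout and the numbers are hypothetical and do not reflect the internal data structures of GelCompar or BioNumerics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equally long tracks."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def identify_by_track(test_track, library):
    """Return the name and score of the best-correlated reference track."""
    scores = {name: pearson(test_track, track) for name, track in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical densitometric profiles (arbitrary units along the lane).
library = {
    "strain A": [0, 1, 8, 2, 0, 5, 1, 0],
    "strain B": [0, 6, 1, 0, 7, 1, 0, 2],
}
test_track = [0, 2, 7, 2, 1, 5, 1, 0]
best_name, best_score = identify_by_track(test_track, library)
print(best_name, round(best_score, 3))
```

Running repeats of the same DNA sample through such a comparison gives an idea of how far below 1 the coefficient falls for truly identical material, which is the calibration recommended in the text.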
Identification based on the sizes of DNA fragments (Taxotron). A database consists of a set of patterns with pattern names and the sizes of all fragments. The identification process takes into account the tolerance (accepted error) which has been chosen (e.g., 5% of size). The program compares the test pattern with each pattern in the database. DNA fragments are considered homologous when size differences are lower than the tolerance. When patterns A and B have the same number of fragments, and all fragments of test pattern A match homologous fragments in reference pattern B, the distance between A and B is zero. The program sorts out all reference patterns which may have a distance of zero with the test pattern. Then, the largest variation between homologous fragments is given. This is an indication of the reliability of identification. A maximum variation below 2.5% indicates a reliable identification. Between 2.5 and 5% variation, the identification is fair. Above 5% variation, the identification is doubtful.
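The identification logic described above can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from Taxotron: fragments are paired in order of size, the deviation is expressed relative to the reference fragment, and the function names and database contents are invented for illustration.

```python
def max_variation(test, reference, tolerance=0.05):
    """Largest relative size deviation, or None if the patterns do not match."""
    if len(test) != len(reference):
        return None
    largest = 0.0
    for t, r in zip(sorted(test), sorted(reference)):
        deviation = abs(t - r) / r
        if deviation > tolerance:
            return None  # at least one fragment has no homologue
        largest = max(largest, deviation)
    return largest


def identify_by_size(test, database, tolerance=0.05):
    """List the reference patterns at distance zero, best match first."""
    hits = []
    for name, reference in database.items():
        variation = max_variation(test, reference, tolerance)
        if variation is not None:
            hits.append((name, variation))
    return sorted(hits, key=lambda hit: hit[1])


def reliability(variation):
    """Grade an identification with the thresholds quoted in the text."""
    if variation < 0.025:
        return "reliable"
    if variation <= 0.05:
        return "fair"
    return "doubtful"


# Hypothetical database of fragment sizes in kb.
database = {
    "ribotype 1": [10.0, 6.0, 3.0],
    "ribotype 2": [10.0, 4.5, 3.0],
}
hits = identify_by_size([10.2, 6.1, 3.0], database)
for name, variation in hits:
    print(name, f"{variation:.1%}", reliability(variation))
```

A "doubtful" grade can only arise when the tolerance has been set above 5%, since fragments deviating by more than the tolerance are not scored as homologous.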
C. Data extraction and identification using the RiboPrinter
The RiboPrinter microbial characterisation system extracts patterns. These patterns can be used to characterise the samples and to identify the genus and species of the samples. Specific identification patterns can also be reviewed to provide identification at a subspecies level. Subtyping is done using the RiboGroup reference patterns, since these are not limited by specific genus or species labels. All the patterns obtained by each RiboPrinter microbial characterisation system are stored in the system's database. Every sample will be assigned to one, and only one, RiboGroup in the specified recognition library. Although a sample may belong to only one RiboGroup within its own library, it may belong to other RiboGroups in several different libraries. This allows maximum flexibility for data analysis. After a batch image has been generated, the system extracts the data and processes it into a set of eight patterns, one for each sample. The system automatically compares this data to all the existing patterns cut with the same enzyme and stored in the same database library. If the system finds a match so close that it cannot reliably distinguish between the two patterns, it groups the two patterns together as a RiboGroup. The decision is based on a pre-set similarity threshold. The system will only characterise patterns run with the same protocol. RiboGroups are dynamic. This means that each time a new sample is processed, the RiboGroups are recalculated. The membership of a specific sample to a specific RiboGroup can change over time. This occurs because the statistical relationships of the individual samples can change as new samples with different degrees of similarity are added. As new patterns are generated, the relationship among all patterns can change. Once the RiboPrinter has characterised a pattern, it will try to identify it by name. The first step is to match the pattern to the information in the Identification Library. 
This is a special collection of patterns supplied with the system. This collection was created by analysing bacteria and defining a set of RiboPrint patterns that clearly describe a given bacterium below the species level. If the system finds a match with a high similarity, that is, one above the 'Identification Threshold',
it reports the name of the genus, species and, if available, the specific subgroup. The Identification Threshold is a decimal number indicating the lowest acceptable level of similarity between samples (e.g., 0.86). If a pattern matches more than one Identification Pattern, the system can display a list of possible identifications and similarity values.

5.4 SOURCES OF VARIATION IN RIBOTYPE ANALYSIS
A. Sources of variation in restriction pattern determination
In order to validate a new typing system, it is essential to acquire some knowledge about the errors associated with the particular typing system selected. DNA from one or a few strains should be run at least in duplicate on the same gel, and gels should be at least duplicated with the same design (strain order). An analysis of variance (ANOVA) can be done using different parameters (repeat, gel, laboratory, agarose concentration, fragment size marker, interpolation formula). The analysis should evaluate 'repeatability' (replicate samples on the same gel), reproducibility (replicate samples in different gels, run in different conditions or in different laboratories), and accuracy (samples containing DNA fragments of exactly known sizes; i.e., with known sequences).
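As a minimal illustration of such an analysis, the sketch below runs a one-way ANOVA on the size estimates of a single fragment, with the gel as the only factor. The within-gel mean square reflects repeatability and the between-gel mean square reflects reproducibility. The data are invented, the function name is hypothetical, and a real validation study would include the other factors listed above (laboratory, agarose concentration, size marker, interpolation formula).

```python
from statistics import mean


def one_way_anova(groups):
    """Return (between-group mean square, within-group mean square, F ratio)."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(values) - len(groups))
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical size estimates (kb) of one fragment, run in duplicate on three gels.
gels = [[5.00, 5.04], [5.10, 5.14], [4.95, 4.99]]
ms_between, ms_within, f_ratio = one_way_anova(gels)
print(ms_between, ms_within, f_ratio)
```

A large F ratio, as in this invented data set, indicates that gel-to-gel differences dominate over the scatter between replicates on the same gel.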
B. Qualitative variations
Extra bands may be observed after insufficient cleavage of DNA (poor DNA quality or insufficient endonuclease concentration) or overstaining (excessive amount of cleaved DNA). It is best to repeat the endonuclease digestion for DNA samples which show several restriction fragments much larger than those observed with other samples belonging to the same species. Missing bands may be due to weak staining (low DNA amount, transfer, or probe problem). Scanning the gel or membrane with optimised luminosity and contrast may help to visualise faint bands. A mistake in strain assignment, DNA labelling or endonuclease used should be suspected when a known strain gives an unexpected pattern.
C. Quantitative (size) variations
For a given DNA fragment, large variations in DNA concentration result in migration variations, and thus apparent size variations. Variations in electrophoretic conditions (type and concentration of buffer, agarose concentration and quality, size and design of the electrophoretic tank, voltage and time of electrophoresis, temperature) also result in apparent fragment size variations. These conditions should be standardised when data from different laboratories are to be compared. Different formulae relating fragment size to migration give different results.
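The effect of the formula can be seen with a small numerical sketch. Two deliberately simple relationships are compared on the same marker lane: a piecewise-linear interpolation of log10(size) against migration, and a piecewise-linear interpolation of size against the reciprocal of migration. Both are simplified stand-ins chosen for illustration; neither is the Schaffer and Sederoff procedure nor a spline, and the marker values and function names are assumptions.

```python
from math import log10


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of y at x; xs must be increasing."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("migration outside the marker range")


def size_semilog(migration, marker_migrations, marker_sizes):
    """Size from a relationship linear in log10(size) versus migration."""
    logs = [log10(size) for size in marker_sizes]
    return 10 ** interpolate(migration, marker_migrations, logs)


def size_reciprocal(migration, marker_migrations, marker_sizes):
    """Size from a relationship linear in size versus 1/migration."""
    pairs = sorted((1 / m, s) for m, s in zip(marker_migrations, marker_sizes))
    inverse_migrations = [p[0] for p in pairs]
    sizes = [p[1] for p in pairs]
    return interpolate(1 / migration, inverse_migrations, sizes)


# Hypothetical marker lane: migration in mm, fragment size in kb.
marker_migrations = [20, 40, 60, 80]
marker_sizes = [10.0, 5.0, 2.5, 1.25]
semilog = size_semilog(50, marker_migrations, marker_sizes)
reciprocal = size_reciprocal(50, marker_migrations, marker_sizes)
print(round(semilog, 3), round(reciprocal, 3))
```

For a band midway between two markers the two estimates differ by about 1%, which is not negligible when the matching tolerance is a few percent.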
Reproducibility is generally good with the Schaffer and Sederoff inverse relationship; however, accuracy is better with spline interpolation. Comparison of data requires the choice of a single formula. Different fragment size markers also give different results. The best markers are those with bands in sufficient number (more than four), which are well-separated, proportionally distributed across the gel length after electrophoresis, and unambiguously identifiable (no overlapping bands at the top). Comparison of data requires the choice of a single size marker. The position of test lanes (with unknown fragments) relative to reference lanes (size markers) influences the results. It is totally inappropriate to use a single marker lane. Two marker lanes (left and right sides of a gel) allow the gel image to be positioned properly for migration measurements. However, band migration is not uniform across the gel width, and it is much better to have no fewer than four marker lanes in 20-lane gels. The error in fragment size determination is reduced when, for each test lane, the interpolation takes into account the nearest marker lane on the left and the nearest marker lane on the right, giving a weight to each marker lane (in fact, each interpolation formula) in relation to the position of the test lane relative to each marker lane.

5.5 EXAMPLES OF APPLICATIONS
The use of ribotyping in epidemiological surveillance of nosocomial outbreaks has been reviewed previously (Bingen et al., 1994). This previous review described specific applications of ribotyping to Enterobacter cloacae, Ent. sakazakii, Providencia stuartii, Klebsiella pneumoniae, Acinetobacter baumannii, Stenotrophomonas maltophilia, Pseudomonas aeruginosa, Burkholderia cepacia, Legionella pneumophila, Branhamella catarrhalis, Rhodococcus spp., Tsukamurella paurometabolum, Enterococcus spp., Streptococcus pyogenes, Staphylococcus aureus and coagulase-negative Staphylococcus spp. Accordingly, this chapter will simply mention just a few examples that illustrate typical uses of ribotyping. Since the first work on ribotyping with various bacteria (Grimont & Grimont, 1986), it appeared that different genomic species showed different ribotypes (often when a single endonuclease was used, and always when two endonucleases were used). However, the diversity of ribotypes within a species depends on the species and endonuclease chosen. When such diversity is insignificant, ribotyping is an identification tool. When the intra-species diversity is high, ribotyping is a typing tool (high intra-species diversity makes species identification difficult). In a number of cases, ribotyping is both an identification and a typing tool, which means that its power as a typing tool is likely to be reduced compared to other typing methods.
A. Ribotyping as an identification tool
The genus Brucella is extremely homogeneous and contains a single genomic species (Verger et al., 1985). Ribotyping cannot differentiate the different nomenspecies, irrespective of the endonuclease chosen (Grimont et al., 1992). Recently, strains infecting marine mammals were identified as atypical Brucella and showed ribotypes extremely close to that of B. melitensis, the single genomic species of Brucella (Verger et al., 2000).

Ribotyping is not a suitable typing tool for L. pneumophila (Schoonmaker et al., 1992). However, strains belonging to 28 species of Legionella could be identified by ribotyping using HindIII and EcoRV (Grimont et al., 1989b). This work was extended to 123 strains representing 44 species. Specific profiles were shown by each species, including the bluish-white and the red auto-fluorescent species which are difficult to identify by other means. In addition, 45 clinical or environmental isolates could be identified to 14 different species (Salloum et al., 2001). For typing L. pneumophila, pulsed-field gel electrophoresis is more appropriate (Schoonmaker et al., 1992).

Staphylococcus spp. can be identified by ribotyping using HindIII and EcoRI (De Buyser et al., 1989). A total of 110 strains belonging to 12 species yielded 44 ribotypes for each enzyme. Although distinct patterns were observed within some species, a core of common bands could be discerned within each species or subspecies. New species or subspecies were detected by their ribotypes, such as S. pasteuri (Chesneau et al., 1993), S. vitulus (Webster et al., 1994), S. sciuri subsp. sciuri, S. sciuri subsp. carnaticus and S. sciuri subsp. rodentium (Kloos et al., 1997). Strains which were difficult to identify by biochemical tests were identified as S. xylosus, S. equorum (Meugnier et al., 1996) or S. caprae (Vandenesch et al., 1995) by ribotyping.
Within the framework of a European project entitled 'High resolution automated microbial identification', the ability of ribotyping to uncover the taxonomic diversity of a collection of bacteria was tested with 226 strains of Pseudomonas (Brosch et al., 1996). The strains, belonging to more than 40 species, displayed 169 unique rRNA gene restriction patterns with SmaI, and 159 unique rRNA gene restriction patterns with HincII. A combined analysis of both sets of restriction data yielded 79 ribogroups or isolated ribotypes. Most (93%) ribogroups were homogeneous with respect to nomenspecies. Some nomenspecies were split into several ribogroups (e.g., P. putida, P. fluorescens, P. marginalis and P. pseudoalcaligenes). A database was established for ribotypes of Pseudomonas strains using Taxotron in order to establish ribotyping as an alternative procedure for identification of Pseudomonas strains at the molecular level.

The Proteus-Providencia group contains three genera (Proteus, Providencia and Morganella) and 10 species. Ribotyping with EcoRV and HincII showed Prot. mirabilis, Prot. penneri, M. morganii and Prov. heimbachae to contain a single ribogroup (i.e., a group of highly related ribotypes) (Pignato et al., 1999). In contrast, distinct ribogroups were detected within Prov. alcalifaciens (two ribogroups), Prov. rettgeri (five ribogroups), Prov. stuartii (two ribogroups), Prov. rustigianii (two ribogroups), and Prot. vulgaris (two ribogroups). Genetic diversity had already been observed by DNA-DNA hybridisation in some of these species.

The four species of dairy propionibacteria, Propionibacterium freudenreichii, Prop. jensenii, Prop. thoenii and Prop. acidipropionici, give different BamHI and ClaI ribotypes with species-specific fragments (de Carvalho et al., 1994). Moreover, ribotyping allowed the differentiation of Prop. freudenreichii subsp. freudenreichii from Prop. freudenreichii subsp. shermanii. The patterns of dairy propionibacteria were different from those of closely related bacteria and other bacteria used in the dairy industry.

DNA relatedness within the genus Methylophaga can be differentiated by ribotyping using EcoRI and HindIII (Janvier et al., 1999). Furthermore, the same study showed good correlation between manual ribotyping and the automated method when Taxotron software was used.
B. Ribotyping as a typing tool for epidemiology
Ribotyping has been applied to a variety of epidemiological problems (Bingen et al., 1994). However, most studies have aimed to answer limited epidemiological questions. Only a few studies have proposed a typing system for use by other workers. Some examples in which ribotyping is competitive compared to newer methods are reviewed briefly below.

In October 1984, in the town of Promissão, State of São Paulo, Brazil, 10 children died after a purpura fulminans syndrome without any sign of meningitis. These children had suffered from a purulent conjunctivitis 3-15 days before developing high fever, abdominal pain with vomiting and purpura. This syndrome, called Brazilian purpuric fever (BPF), was subsequently associated with the presence of Haemophilus aegyptius in the blood (Brazilian Purpuric Fever Study Group, 1987). A total of 92 isolates of H. aegyptius (now renamed H. influenzae biogroup Aegyptius), associated with conjunctivitis or BPF in the State of São Paulo, were ribotyped with EcoRI (Irino et al., 1988). All strains were classified into 15 ribotypes. All isolates from blood corresponded to ribotype 3, whereas isolates from conjunctivitis were distributed into all 15 ribotypes. This work demonstrated the efficiency of ribotyping as an epidemiological tool.

V. cholerae is subdivided into two biotypes (Cholerae and El Tor) and three serotypes (Inaba, Ogawa and Hikojima). This subdivision is insufficient for international surveillance purposes. Ribotyping using BglI disclosed 17 patterns among 89 V. cholerae O1 strains (Koblavi et al., 1990). There was no correlation with serotypes. However, four ribotypes were associated exclusively with the Cholerae biotype, whereas the other 13 ribotypes were associated exclusively with the El Tor biotype. Ribotype B1 (biotype Cholerae) was identified in isolates from the Bangladesh area in 1949 and in the 1980s, suggesting the persistence of strains from earlier pandemics.
Ribotype B5a (biotype El Tor) represented most strains from the seventh pandemic (which started officially in 1961). It is noteworthy that a strain from the 1937 epidemic in the Celebes islands (first epidemic due to biotype El Tor) uniquely comprised ribotype B17. This ribotyping system was also used by Popovic et al. (1993) who showed that ribotype B9/B10 was limited to Australia. Ribotype B27 was limited to Senegal and Guinea-Bissau, whereas ribotype B21a was found in the Middle East (Pourshafie et al., 2000). Although ribotype B5a dominated in Colombia and South America, a few other ribotypes were introduced at different times (Tamayo et al., 1997). RFLP analysis, using the toxin gene cassette as a probe, allowed ribotypes to be subdivided into toxinogenotypes (Koblavi, 1996; Damian et al., 1998). The current ribotyping system, which presently includes 57 ribotypes, has been shown to be essential for epidemiological studies (Faruque et al., 1993; Desmarchelier et al., 1995).

Ribotyping is also a good tool for epidemiological surveillance of plague. Ribotyping Yersinia pestis with EcoRI and EcoRV allowed 16 ribotypes to be described which could be correlated with biotypes, and which allowed hypotheses concerning the history of plague to be verified (Guiyoule et al., 1994). In the course of this study, ribotyping proved to be more stable after five in-vivo passages than pulsed-field gel electrophoresis.

Ribotyping is the major epidemiological typing method for diphtheria, as shown by studies on the large epidemics involving the former Soviet Union (De Zoysa et al., 1995; Popovic et al., 1996). An international ribotype database for Corynebacterium diphtheriae is being built, based on digestion with BstEII and PvuII (Popovic et al., 2000).

Ribotyping of Leptospira with EcoRI allowed 118 patterns to be correlated with 99 distinct ribogroups (Perolat et al., 1990). Common bands were observed in strains belonging to the same genomic species.

Finally, a collection of 191 strains of E. coli, comprising 164 serovar reference strains and 28 clinical strains, was ribotyped with MluI, ClaI or HindIII (Machado et al., 1998). A wide diversity of ribotypes was observed with endonucleases MluI (104 patterns), ClaI (90 patterns) and HindIII (98 patterns). When MluI was used, 85% of patterns (11-15 fragments) shared five fragments of 17.09, 3.94, 3.06, 2.23 and 1.76 kb in size. When these fragments were used as internal standards, the percentage error in fragment length determination was half that obtained with an external standard. Two fragment size databases of MluI and ClaI ribotypes were built. Automatic identification was obtained after setting a 5% fragment size variation tolerance (error). MluI ribotyping is recommended as a primary epidemiological marker. Strains with similar MluI ribotypes should then be examined by ClaI ribotyping. Ribotyping with HindIII can only be the third choice, since the patterns were often uncertain due to the frequent occurrence of faint bands. Most of the studied serovars gave discrete patterns, and these data provide the basis for a molecular typing system for E. coli which could possibly substitute for serotyping when the latter is not available. This work was extended to include Shigella spp., which belong in the E. coli genomic species. A total of 51 distinct ribotypes was obtained (Coimbra et al., 2001).
5.6 FUTURE PERSPECTIVES AND CONCLUSIONS
Ribotyping was the first universal typing method for bacteria. However, simpler or faster molecular methods have been proposed subsequently, which have resulted in a decreased interest in ribotyping. Why, then, continue using it? One reason is that the taxonomic support provided by ribotyping is unique. According to previous observations, it can be assumed that, in general, strains belonging to different DNA hybridisation groups (genomospecies) show different ribotypes. For a number of species, endonucleases can be found that subdivide species into ribotypes. For typing purposes, pulsed-field gel electrophoresis or other methods will certainly be preferred when only a few local isolates have to be compared (Tenover et al., 1995). However, because ribotypes are readily amenable to computer analysis, it is possible to establish ribotype databases. Automatic identification of ribotypes can then be achieved when the proper database is available and when the error associated with fragment length determination is taken into account. The higher stability of ribotyping makes it suited for national or international surveillance of some epidemic diseases (e.g., cholera, diphtheria, plague or typhoid), even over long time periods. High reproducibility allows ribotype databases to be built. The feasibility of ribotype database-sharing among laboratories still needs to be demonstrated for the manual method. As ribotyping is rather laborious for routine use in the clinical laboratory, whether or not used in conjunction with traditional methods of epidemiological typing, a completely automated ribotyping method was needed. This is now available with the RiboPrinter technology, which is giving a new impetus to ribotyping. It would be useful to be able to use the same databases for manual and automated ribotyping, provided the data can be converted and made compatible. Such an attempt is in progress in our laboratory.
REFERENCES
Ahrne, S., Stenström, I.M., Jensen, N.E., Pettersson, B., Uhlen, M. & Molin, G. (1995). Classification of Erysipelothrix strains on the basis of restriction fragment length polymorphisms. International Journal of Systematic Bacteriology 45, 382-385.
Altwegg, M. & Mayer, L.W. (1989). Bacterial molecular epidemiology based on a non-radioactive probe complementary to ribosomal RNA. Research in Microbiology 140, 325-333.
Bercovier, H., Kafri, O. & Sala, S. (1986). Mycobacteria possess a surprisingly small number of ribosomal RNA genes in relation to the size of their genome. Biochemical and Biophysical Research Communications 136, 1136-1141.
Bingen, E.H., Denamur, E. & Elion, J. (1994). Use of ribotyping in epidemiological surveillance of nosocomial outbreaks. Clinical Microbiology Reviews 7, 311-327.
Blumberg, H.M., Kiehlbauch, J.A. & Wachsmuth, I.K. (1991). Molecular epidemiology of Yersinia enterocolitica O:3 infections - use of chromosomal DNA restriction fragment length polymorphisms of rRNA genes. Journal of Clinical Microbiology 29, 2368-2374.
Bove, J.M. & Saillard, C. (1979). Cell biology of spiroplasma. In The mycoplasma, vol. 3, Whitcomb, R.F. & Tully, J.G., eds, pp. 83-153. Academic Press, New York.
Brazilian Purpuric Fever Study Group (1987). Haemophilus aegyptius bacteremia in Brazilian Purpuric Fever. Lancet ii, 761-763.
Brenner, D.J., McWhorter, A.C., Knutson, J.K.L. & Steigerwalt, A.G. (1982). Escherichia vulneris: a new species of Enterobacteriaceae associated with human wounds. Journal of Clinical Microbiology 15, 1133-1140.
Brosch, R., Lefevre, M., Grimont, F. & Grimont, P.A.D. (1996). Taxonomic diversity of Pseudomonas revealed by computer-interpretation of ribotyping data. Systematic and Applied Microbiology 19, 541-555.
Brosius, J., Ullrich, A., Raker, M.A., Gray, A., Dull, T.J., Gutell, R.R. & Noller, H.F. (1981). Construction and fine mapping of recombinant plasmids containing the rrnB ribosomal RNA operon of E. coli. Plasmid 6, 112-118.
Chesneau, O., Morvan, A., Grimont, F., Labiscinski, H. & El Solh, N. (1993). Staphylococcus pasteuri sp. nov., isolated from human, animal, and food specimens. International Journal of Systematic Bacteriology 43, 237-244.
Coimbra, R.S., Nicastro, G., Grimont, P.A.D. & Grimont, F. (2001). Computer identification of Shigella species by rRNA gene restriction patterns. Research in Microbiology 152, 47-55.
Damian, M., Koblavi, S., Carle, I., Nacescu, N., Grimont, F., Ciufecu, C. & Grimont, P.A.D. (1998). Molecular characterization of Vibrio cholerae strains isolated in Romania. Research in Microbiology 149, 745-755.
De Buyser, M.L., Morvan, A., Grimont, F. & El Solh, N. (1989). Characterization of Staphylococcus species by ribosomal RNA gene restriction patterns. Journal of General Microbiology 135, 989-999.
de Carvalho, A.F., Gautier, M. & Grimont, F. (1994). Identification of dairy Propionibacterium species by rRNA gene restriction patterns. Research in Microbiology 145, 667-676.
Demezas, D. & Bell, J. (1995). Evaluation of low molecular weight RNA profiles and ribotyping to differentiate some Bacillus species. Systematic and Applied Microbiology 18, 582-589.
Desmarchelier, P.M., Wong, F.Y.K. & Mallard, K. (1995). An epidemiological study of Vibrio cholerae O1 in the Australian environment based on rRNA gene polymorphisms. Epidemiology and Infection 115, 435-446.
De Zoysa, A., Efstratiou, A., George, R.C., Jahkola, M., Vuopio-Varkila, J., Deshevoi, S., Tseneva, G. & Rikushin, Y. (1995). Molecular epidemiology of Corynebacterium diphtheriae from northwestern Russia and surrounding countries studied by using ribotyping and pulse-field gel electrophoresis. Journal of Clinical Microbiology 33, 1080-1083.
Domenech, P., Menendez, M.C. & Garcia, M.J. (1994). Restriction fragment length polymorphisms of 16S rRNA genes in the differentiation of fast-growing mycobacterial species. FEMS Microbiology Letters 116, 19-24.
Ezquerra, E., Burnens, A., Jones, C. & Stanley, J. (1993). Genotypic typing and phylogenetic analysis of Salmonella paratyphi B and S. java with IS200. Journal of General Microbiology 139, 2409-2414.
Faruque, S.M., Alim, A.R.M.A., Rahman, M.M., Siddique, A.K., Sack, R.B. & Albert, M.J. (1993). Clonal relationships among classical Vibrio cholerae O1 strains isolated between 1961 and 1992 in Bangladesh. Journal of Clinical Microbiology 31, 2513-2516.
Fox, G.E., Stackebrandt, E., Hespell, R.B., Gibson, J., Maniloff, J., Dyer, T.A., Wolfe, R.S., Balch, W.E., Tanner, R.S., Magrum, L.J., Zablen, L.B., Blakemore, R., Gupta, R., Bonen, L., Lewis, B.J., Stahl, D.A., Luehrsen, K.R., Chen, K.N. & Woese, C.R. (1980). The phylogeny of prokaryotes. Science 209, 457-463.
Göbel, U.B., Geiser, A. & Stanbridge, E.J. (1987). Oligonucleotide probes complementary to variable regions of ribosomal RNA discriminate between Mycoplasma species. Journal of General Microbiology 133, 1969-1974.
Gottlieb, P. & Rudner, R. (1985). Restriction site polymorphism of ribosomal ribonucleic acid gene sets in members of the genus Bacillus. International Journal of Systematic Bacteriology 35, 244-252.
Graves, L.M., Swaminathan, B., Reeves, M.W. & Wenger, J. (1991). Ribosomal DNA fingerprinting of Listeria monocytogenes using a digoxigenin-labeled DNA probe. European Journal of Epidemiology 7, 77-82.
Grimont, P.A.D. (2000). Taxotron users manual. Institut Pasteur, Paris.
Grimont, F. & Grimont, P.A.D. (1986). Ribosomal ribonucleic acid gene restriction patterns as potential taxonomic tools. Annales de l'Institut Pasteur/Microbiology 137B, 165-175.
Grimont, F. & Grimont, P.A.D. (1995). Determination of rDNA gene restriction patterns. In Methods in molecular biology, vol. 46, diagnostic bacteriology protocols, Howard, J. & Whitcombe, D.M., eds, pp. 181-200. Humana Press, Totowa, NJ.
Grimont, P.A.D., Grimont, F., Desplaces, N. & Tchen, P. (1985). DNA probe specific for Legionella pneumophila. Journal of Clinical Microbiology 21, 431-437.
Grimont, F., Chevrier, D., Grimont, P.A.D., Lefevre, M. & Guesdon, J.-L. (1989a). Acetylaminofluorene-labelled ribosomal RNA for use in molecular epidemiology and taxonomy. Research in Microbiology 140, 447-454.
Grimont, F., Lefevre, M., Ageron, E. & Grimont, P.A.D. (1989b). rRNA gene restriction patterns of Legionella species: a molecular identification system. Research in Microbiology 140, 615-626.
Grimont, F., Verger, J.-M., Cornelis, P., Limet, J., Lefevre, M., Grayon, M., Regnault, B., Van Broeck, J. & Grimont, P.A.D. (1992). Molecular typing of Brucella with cloned DNA probes. Research in Microbiology 143, 55-65.
Guiyoule, A., Grimont, F., Iteman, I., Grimont, P.A.D., Lefevre, M. & Carniel, E. (1994). Plague pandemics investigated by ribotyping of Yersinia pestis strains. Journal of Clinical Microbiology 32, 634-641.
Iglesias, A., Ceglowski, P. & Trautner, T.A. (1983). Plasmid transformation in Bacillus subtilis. Effects of the insertion of Bacillus subtilis rRNA genes into plasmids. Molecular and General Genetics 192, 149-155.
Irino, K., Grimont, F., Casin, I., Grimont, P.A.D. & the Brazilian Purpuric Fever Study Group (1988). rRNA gene restriction patterns of Haemophilus influenzae biogroup aegyptius strains associated with Brazilian Purpuric Fever. Journal of Clinical Microbiology 26, 1535-1538.
Janvier, M., Grimont, P.A.D. & Grimont, F. (1999). Characterization of Methylophaga species by rRNA gene restriction patterns (ribotyping). Systematic and Applied Microbiology 22, 372-377.
Jannes, G., Vaneechoutte, M., Lannoo, M., Gillis, M., Vancanneyt, M., Vandamme, P., Verschraegen, G., Van Heuverswyn, H. & Rossau, R. (1993). Polyphasic taxonomy leading to the proposal of Moraxella canis sp. nov. for Moraxella catarrhalis-like strains. International Journal of Systematic Bacteriology 43, 438-449.
Kan, Y.W. & Dozy, A. (1978). Polymorphism of DNA sequence adjacent to human β-globin structural gene: relationship to sickle mutation. Proceedings of the National Academy of Sciences of the United States of America 75, 5631-5635.
Kloos, W.E., Ballard, D.N., Webster, J.A., Hubner, R.J., Tomasz, A., Couto, I., Sloan, G.L., Dehart, H.P., Fiedler, F., Schubert, K., de Lencastre, H., Sanches, I.S., Heath, H.E., Leblanc, P.A. & Ljungh, A. (1997). Ribotype delineation and description of Staphylococcus sciuri subspecies and their potential as reservoirs of methicillin resistance and staphylolytic enzyme genes. International Journal of Systematic Bacteriology 47, 313-323.
Koblavi, S. (1996). Identification et typage moleculaire des Vibrionaceae. PhD Thesis, University Paris VII, France.
Koblavi, S., Grimont, F. & Grimont, P.A.D. (1990). Clonal diversity of Vibrio cholerae O1 evidenced by rRNA gene restriction patterns. Research in Microbiology 141, 645-657.
Machado, J., Grimont, F. & Grimont, P.A.D. (1998). Computer identification of Escherichia coli rRNA gene restriction patterns. Research in Microbiology 149, 119-135.
Meugnier, H., Bes, M., Vernozy-Rozand, C., Mazuy, C., Brun, Y., Freney, J. & Fleurette, J. (1996). Identification and ribotyping of Staphylococcus xylosus and Staphylococcus equorum strains isolated from goat milk and cheese. International Journal of Food Microbiology 31, 325-331.
Meyers, J.A., Sanchez, D., Elwell, L.P. & Falkow, S. (1976). Simple agarose gel electrophoresis method for the identification and characterization of plasmid deoxyribonucleic acid. Journal of Bacteriology 127, 1529-1537.
Moureau, P., Derclaye, I., Gregoire, D., Janssen, M. & Cornelis, G.R. (1989). Campylobacter species identification based on polymorphism of DNA encoding rRNA. Journal of Clinical Microbiology 27, 1514-1517.
Oerter, K.E., Munson, P.J., McBride, W.O. & Rodbard, D. (1990). Computerized estimation of size of nucleic acid fragments using the four-parameter logistic model. Analytical Biochemistry 189, 235-243.
Ostapchuk, P., Anilionis, A. & Riley, M. (1980). Conserved genes in enteric bacteria are not identical. Molecular and General Genetics 180, 475-477.
Pelkonen, S., Romppanen, E.L., Siitonen, A. & Pelkonen, J. (1994). Differentiation of Salmonella serovar infantis isolates from human and animal sources by fingerprinting IS200 and 16S rrn loci. Journal of Clinical Microbiology 32, 2128-2133.
Perolat, P., Grimont, F., Regnault, B., Grimont, P.A.D., Fournié, E., Thevenet, H. & Baranton, G. (1990). rRNA gene restriction patterns of Leptospira: a molecular typing system. Research in Microbiology 141, 159-171.
Picard-Pasquier, N., Ouagued, M., Picard, B., Goullet, P. & Krishnamoorthy, R. (1989). A simple, sensitive method of analysing bacterial ribosomal DNA polymorphism. Electrophoresis 10, 186-189.
Pignato, S., Giammanco, G.M., Grimont, F., Grimont, P.A.D. & Giammanco, G. (1999). Molecular characterization of genera Proteus, Morganella and Providencia by ribotyping. Journal of Clinical Microbiology 37, 2840-2847.
Pitcher, D.G., Owen, R.J., Dyal, P. & Beck, A. (1987). Synthesis of a biotinylated DNA probe to detect ribosomal RNA cistrons in Providencia stuartii. FEMS Microbiology Letters 48, 283-287.
Popovic, T., Bopp, C., Olsvik, O. & Wachsmuth, K. (1993). Epidemic application of a standardized ribotype scheme for Vibrio cholerae O1. Journal of Clinical Microbiology 31, 2474-2482.
Popovic, T., Kombarova, S.Y., Reeves, M.W., Nakao, H., Mazurova, I.K., Wharton, M., Wachsmuth, I.K. & Wenger, J.D. (1996). Molecular epidemiology of diphtheria in Russia, 1985-1994. Journal of Infectious Diseases 174, 1064-1072.
Popovic, T., Mazurova, I.K., Efstratiou, A., Vuopio-Varkila, J., Reeves, M.W., De Zoyza, A., Glushkevich, T. & Grimont, F. (2000). Molecular epidemiology of diphtheria. Journal of Infectious Diseases 181, S168-S177.
Pourshafie, M.R., Grimont, F. & Grimont, P.A.D. (2000). Molecular epidemiological study of Vibrio cholerae isolates from infected patients in Teheran, Iran. Journal of Medical Microbiology 49, 1085-1090.
Press, W.H., Flannery, B.P., Teukolsky, S.A. & Vetterling, W.T. (1986). Numerical recipes. Cambridge University Press, Cambridge.
Regnault, B., Grimont, F. & Grimont, P.A.D. (1997). Universal ribotyping method using a chemically-labelled oligonucleotide probe mixture. Research in Microbiology 148, 649-659.
Romaniuk, P.J. & Trust, T.J. (1987). Identification of Campylobacter species by Southern hybridisation of genomic DNA using an oligonucleotide probe for 16S rRNA genes. FEMS Microbiology Letters 43, 331-335.
Salloum, G., Meugnier, H., Reyrolle, M., Grimont, F., Grimont, P.A.D., Etienne, J. & Freney, J. (2001). Identification of Legionella species by ribotyping and other molecular methods. Research in Microbiology, in press.
Saunders, N., Harrisson, T.G., Kachwalla, N. & Taylor, A. (1988). Identification of species of the genus Legionella using a cloned rRNA gene from Legionella pneumophila. Journal of General Microbiology 134, 2363-2374.
Schaffer, H.E. & Sederoff, R.R. (1981). Improved estimation of DNA fragment length from agarose gels. Analytical Biochemistry 115, 113-122.
133 Schoonmaker, D., Heimberger, T. & Birkhead, G. (1992). Comparison of ribotyping and restriction enzyme analysis using pulsed-field gel electrophoresis of Legionella pneumophila isolates obtained during a nosocomial outbreak. Journal of Clinical Microbiology 30, 1491-1498. Sneath, P.H.A. & Sokal, R.R. (1973). Numerical taxonomy. Freeman, San Francisco. Southern, E.M. (1975). Detection of specific sequences among DNA fragments separated by gel electrophoresis. Journal of Molecular Biology 98, 503-517. Southern, E.M. (1979). Measurement of DNA length by gel electrophoresis. Analytical Biochemistry 100, 319-323. Stanley, J., Baquar, N. & Threlfall, E.J. (1993). Genotypes and phylogenetic relationships of Salmonella typhimurium are defined by molecular fingerprinting of IS200 andl 6S rrn loci. Journal of General Microbiology 139, 1133-1140. Stull, T., LiPuma, J.J. & Edlind, T.D. (1988). A broad-spectrum probe for molecular epidemiology of bacteria: ribosomal RNA. Journal of Infectious Diseases 157, 280-286. Tamayo, M., Koblavi, S., Grimont, E, Castenada, E. & Grimont, P.A.D. (1997). Molecular epidemiology of Vibrio cholerae O 1 isolates from Colombia. Journal of Medical Microbiology 46, 611-616. Tenover, EC., Arbeit, R.D., Goering, R.V., Mickelsen, P.A., Murray, B.E., Persing, D.H. & Swaminathan, B. (1995). Interpreting chromosomal DNA restriction patterns produced by pulsedfield gel electrophoresis: criteria for bacterial strain typing. Journal of Clinical Microbiology 33, 2233-2239. Vandenesch, E, Eykyn, S.J., Bes, M., Meugnier, H., Fleurette, J. & Etienne, J. (1995). Identification and ribotypes of Staphylococcus caprae isolates isolated as human pathogens and from goatmilk. Journal of Clinical Microbiology 33, 888-892. Verger, J.-M., Grimont, E, Grimont, P.A.D. & Grayon, M. (1985). Brucella, a monospecific genus as shown by deoxyribonucleic acid hybridization. International Journal of Systematic Bacteriology 35, 292-295. 
Verger, J.-M., Grayon, M., Cloeckaert, A., Lefevre, M., Ageron, E. & Grimont, E (2000). Classification of Brucella strains from marine mammals using DNA-DNA hybridization and ribotyping. Research in Microbiology 151,797-799. Webster, J.A., Bannerman, T.L., Hubner, R.J., Ballard, D.N., Cole, E.M., Bruce, J.L., Fiedler, E, Schubert, K. & Kloos, W.E. (1994). Identification of the Staphylococcus sciuri species group with EcoRI fragments containing rRNA sequences and description of Staphylococcus vitulus sp. nov. International Journal of Systematic Bacteriology 44, 454-460.
6
Generation and Analysis of RAPD Fingerprinting Profiles
Kevin Towner¹ and Hajo Grundmann²
¹Public Health Laboratory and ²Division of Microbiology, University Hospital, Queen's Medical Centre, Nottingham NG7 2UH, UK
CONTENTS

6.1 INTRODUCTION
6.2 RAPD FINGERPRINTING APPROACHES
    A. Overview of the generation of RAPD fingerprints
       (i) RAPD fingerprinting
       (ii) REP-PCR
    B. Factors affecting the generation of RAPD fingerprints
       (i) Template:primer interactions
       (ii) Reaction conditions
       (iii) Template DNA
       (iv) Primer considerations
       (v) Equipment and reagents
6.3 FRAGMENT SEPARATION AND ANALYSIS
    A. Analysis of RAPD fingerprints with conventional gel-based systems
    B. Analysis of RAPD fingerprints on automated DNA sequencers
    C. Computer analysis of RAPD fingerprint data
6.4 STANDARDISATION OF RAPD FINGERPRINTS
6.5 CURRENT APPLICATIONS
6.6 PROSPECTS FOR THE FUTURE
REFERENCES

6.1 INTRODUCTION
Many of the bacterial typing methods described elsewhere in this book make use of the polymerase chain reaction (PCR) to amplify specific genes or spacer regions as a prelude to the generation of DNA fingerprints for typing purposes. However, although excellent and informative results can be produced, most of these techniques require several days to complete, usually involving several different molecular manipulations, and their very sophistication makes them less amenable to introduction in routine hospital microbiology laboratories. Accordingly, their use has tended to be confined to central research or reference laboratories. Unfortunately, central analysis of an ever-growing number of important microorganisms at a national or international level imposes an unmanageable workload on central reference facilities, with a concomitant delay in obtaining results, and involves the undesirable and increasingly expensive shipping of pathogens by air or surface carriers. Therefore, there is an increasingly urgent need for a simplified DNA fingerprinting method that can be used to generate meaningful and reproducible typing data rapidly, preferably within a single working day, at the local level.

In an attempt to address this problem, an increasing number of laboratories have investigated the use of two functionally interchangeable methods termed 'arbitrarily primed amplification' of chromosomal DNA (Welsh & McClelland, 1990; Williams et al., 1990; Maslow et al., 1993) and 'randomly amplified polymorphic DNA' (Caetano-Anollés et al., 1991; Maslow et al., 1993), otherwise known as AP-PCR and RAPD, respectively. Each of these techniques involves PCR amplification of 'random' fragments of genomic DNA with single primers, but the techniques are 'random' only in so far as a PCR primer(s) can be selected without regard to the sequence of the genome to be fingerprinted (Welsh & McClelland, 1990; Williams et al., 1990). An alternative related approach, referred to as REP-PCR (Woods et al., 1992), amplifies intervening sequences located between highly repetitive DNA motifs. These techniques (described in more detail in section 6.2A) have now been used successfully for localised epidemiological typing of an ever-expanding range of bacteria. However, difficulties arise when attempting to compare the fingerprint banding patterns generated from a large number of isolates over an extended time period.
Other important problems (see section 6.2B) include a reported lack of reproducibility and difficulties with the standardisation of equipment and reagents (MacPherson et al., 1993; Meunier & Grimont, 1993; van Belkum et al., 1995; Power, 1996; Tyler et al., 1997). The aim of this chapter is to review the current methods used for the generation and analysis of bacterial PCR fingerprints with 'random' primers and to consider whether appropriate experimental methods can be defined for particular groups of bacteria that will produce reliable, discriminatory and reproducible results at the practical level in different laboratories.

6.2 RAPD FINGERPRINTING APPROACHES
A. Overview of the generation of RAPD fingerprints

(i) RAPD fingerprinting

While conventional PCR assays can be used to amplify DNA sequences that are characteristic of a particular species or even strain, the method has the drawback that specific oligonucleotide primers are required, thereby making knowledge of the DNA sequence of the organism being studied an essential prerequisite. Analysis of RAPD fingerprints removes this requirement since a primer(s) can be chosen without regard to the sequence of the genome to be fingerprinted (Welsh & McClelland, 1990; Williams et al., 1990). Thus, the method requires no previous knowledge of the molecular biology of the organisms to be investigated and can therefore, in principle, be applied to any species from which DNA can be prepared.

It is worth noting that the nomenclature used to describe this technique can be confusing (Vaneechoutte, 1996). Several different names have been used to describe methods involving the generation of RAPD fingerprints. An umbrella term of MAAP ('multiple arbitrary amplicon profiling') has been proposed (Caetano-Anollés et al., 1992a), and it has also been suggested that individual methods could be classified according to primer length (Table 6.1). However, at the present time, none of this proposed nomenclature has received formal recognition, and most recent publications continue to refer to the general method as AP-PCR or RAPD typing regardless of the length of primer used. The technique will be referred to as RAPD fingerprinting for the remainder of this chapter.

Table 6.1. Suggested classification of arbitrary primer fingerprinting methods according to primer length (Caetano-Anollés et al., 1992a)

Method      Primer length
AP-PCR      c. 18-24 mer
RAPD        c. 10 mer
DAF a       c. 5-8 mer

a 'DNA amplification fingerprinting'.

The basis of the method is the original observation by Welsh & McClelland (1990) that a single arbitrarily-chosen primer, combined with two cycles of PCR at low stringency and many cycles at high stringency, generates a discrete and reproducible set of amplification products characteristic of particular genomes. The rationale for the phenomenon probably rests on the fact that, at a sufficiently low temperature, a primer can be expected to anneal to many sites on the chromosome, with a variety of mismatches, to initiate DNA polymerisation. Some of these sites will be within several hundred base pairs of each other and on opposite strands, such that the intervening sequence will be capable of amplification by PCR (Fig. 6.1). The molecular basis underlying the generation of RAPD fingerprints has been analysed in more detail by Venugopal et al. (1993), and arbitrary primers as short as 5-mer have been shown to generate complex genomic fingerprints following PCR amplification (Caetano-Anollés et al., 1991). A key feature is that the initial amplification reaction in RAPD analysis is usually performed at low stringency, normally achieved by using a low annealing temperature during the PCR.
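The annealing model described above can be illustrated with a small toy simulation. This is only a sketch under simplifying assumptions, not a model taken from the cited studies: the primer sequence, the two artificial 'strains' and the function names are hypothetical, low stringency is represented simply as a maximum number of tolerated mismatches, every suitably oriented pair of sites within the size limits is assumed to give a product, and competition between products is ignored.

```python
# Toy model of single-primer (RAPD-style) amplification; illustrative only.

def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def annealing_sites(strand, pattern, max_mismatches):
    """Start positions where `pattern` matches `strand` with few enough mismatches."""
    n = len(pattern)
    sites = []
    for i in range(len(strand) - n + 1):
        mismatches = sum(a != b for a, b in zip(strand[i:i + n], pattern))
        if mismatches <= max_mismatches:
            sites.append(i)
    return sites


def rapd_fragments(genome, primer, max_mismatches=1, min_len=100, max_len=2000):
    """Predict product sizes (bp) for one primer annealing to both strands.

    A product is assumed whenever the primer sequence occurs on the given
    strand upstream of its reverse complement (i.e. a site on the opposite
    strand) and the two sites span between min_len and max_len bases.
    """
    n = len(primer)
    forward = annealing_sites(genome, primer, max_mismatches)
    reverse = annealing_sites(genome, revcomp(primer), max_mismatches)
    sizes = {r + n - f for f in forward for r in reverse
             if r > f and min_len <= r + n - f <= max_len}
    return sorted(sizes)


PRIMER = "GTTTCGCTCC"        # hypothetical arbitrary 10-mer
SITE = revcomp(PRIMER)       # matching site on the opposite strand
MUTATED = "AAAACGAAAC"       # SITE carrying three substitutions

# Two artificial 'chromosomes' that differ at a single annealing site.
strain_a = "A" * 50 + PRIMER + "A" * 280 + SITE + "A" * 400 + SITE + "A" * 50
strain_b = "A" * 50 + PRIMER + "A" * 280 + MUTATED + "A" * 400 + SITE + "A" * 50

print("Strain A bands (bp):", rapd_fragments(strain_a, PRIMER))
print("Strain B bands (bp):", rapd_fragments(strain_b, PRIMER))
```

With these artificial sequences, strain A gives predicted bands of 300 and 710 bp, whereas strain B, which has lost one annealing site, gives only the 710-bp band. Raising `max_mismatches` mimics a lower annealing temperature and would be expected to increase the number of predicted sites on a real genome.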
Fig. 6.1. Simplified representation of RAPD typing illustrating the binding of a single primer to both strands of two chromosomes (A and B). The resulting amplified DNA fragments are shown by the bold filled-in sections of the chromosomes. A single change in one annealing site (designated *) between the two chromosomes results in a significant difference in the corresponding RAPD fingerprint patterns as a result of the altered size of fragment 2.

(ii) REP-PCR

A related method, termed REP-PCR, is based on amplification from the sites of repetitive extragenic elements (REP) that are found in varying numbers and at different positions in the chromosomes of many different bacteria. As with RAPD, if two REP elements are located close enough to each other, they can act as priming sites to amplify the region of DNA between them. These intervening regions are termed inter-repeat fragments and, depending on the primer used, their number and sites (and consequently the number of fragments amplified) are variable from strain to strain. Some individual techniques within this category have also been given specific names. Thus, a conceptually similar method known as ERIC-PCR is based on a family of conserved 'enterobacterial repetitive intergenic consensus' (ERIC) sequences found in the Enterobacteriaceae (Hulton et al., 1991; Versalovic et al., 1991). ERIC sequences resemble REP sequences, but their specific nucleotide sequences are quite different (Stern et al., 1984; Hulton et al., 1991). Similar techniques targeted against specific eukaryotic repeat sequences have also been described (Breukel et al., 1990; Raich et al., 1993). The main characteristic distinguishing all these techniques from RAPD is that they produce distinct DNA fingerprints at relatively high stringency (high annealing temperatures similar to those employed in a standard PCR), although there is, of course, no reason why REP primers cannot be used in conjunction with lower annealing temperatures in classic RAPD reactions.

B. Factors affecting the generation of RAPD fingerprints

(i) Template:primer interactions

As described above, hybridisation of a single primer can occur to multiple sites on both DNA strands of a chromosome, particularly when the annealing step is carried out at a relatively low temperature (low stringency) to allow for slight mismatches. When two annealing sites are sufficiently close to each other on opposite strands, the intervening sequence is capable of amplification by PCR. Amplification products synthesised in the first round of PCR then become the preferred templates for subsequent amplification cycles (Caetano-Anollés et al., 1992b) and result in the amplification of characteristic RAPD fingerprints of greater or lesser complexity, depending upon the precise primer used. However, it is worth noting that amplification products derived from a single primer will have palindromic ends that are capable of forming hairpin loop structures; these may, in turn, interfere with primer annealing in subsequent PCR rounds. As noted by Power (1996), the extent and stability of hairpin loop formation will vary between each individual amplification product, and therefore only some of the first round products will be amplified efficiently in subsequent cycles. This consideration is not a factor in amplification reactions (e.g., some REP-PCRs) that use two separate primers. In subsequent rounds of PCR, decreasing efficiency of template denaturation tends to favour the formation of product:product duplexes rather than primer:template interactions, and the production of amplified product becomes linear rather than exponential. As different amplification products which act as templates are amplified individually, it might also be expected that templates present at low concentrations will be preferentially amplified during later PCR rounds.
Therefore, on a theoretical basis, since the specificity of the amplification process depends on primer:template interactions, it might be expected that changes in the concentration of template DNA or primer will affect the precise nature of the PCR products amplified, and hence result in different RAPD fingerprints.

(ii) Reaction conditions

As described in section 6.2A, REP-PCR is distinguished from RAPD fingerprinting by the relatively high annealing temperatures used in the PCR, although REP primers can also be used with lower annealing temperatures in classical RAPD reactions. However, although production of adequate RAPD fingerprints generally requires an annealing temperature of 25-40°C, 10- to 15-mer primers have been shown to yield highly reproducible RAPD fingerprints at 45-50°C (Grundmann et al., 1997). The use of a 'hot start' technique may be beneficial in terms of reproducibility at these higher annealing temperatures, but is not thought to influence RAPD fingerprint profiles at lower annealing temperatures (Power, 1996). Buffer components (e.g., Mg2+ concentration) can be expected to influence RAPD fingerprints in much the same way as they influence standard diagnostic PCRs (Bassam et al., 1992), but it seems that the precise number of cycles does not influence the qualitative nature of RAPD fingerprints provided that a minimum number of cycles (at least 30) is performed (Power, 1996).

(iii) Template DNA

This is an area in which the particular group of organisms being studied seems to be critical. For some organisms (e.g., Acinetobacter spp., Klebsiella pneumoniae, K. oxytoca, Pseudomonas aeruginosa) it seems that reproducible RAPD fingerprints can be obtained simply by using heated cell suspensions (95°C for 15 min) as a DNA template (Grundmann et al., 1997; Vogel et al., 1999). This procedure has the major advantage of considerably simplifying the process and reducing the overall time required for RAPD analysis. However, for other organisms (e.g., Candida albicans, Serratia marcescens) it seems that purified DNA is required for reproducible results (Power, 1996; Vogel et al., 1999), perhaps because of the presence of heat-stable nucleases in the initial heated cell suspensions. Observations regarding the influence of the template DNA concentration on RAPD fingerprints are rather divergent.
A number of studies have concluded that the reproducibility of RAPD fingerprints is not strongly affected by the template DNA concentration (Welsh & McClelland, 1990; MacPherson et al., 1993; Muralidharan & Wakeland, 1993; van Belkum, 1994), with only variations in staining intensity being observed at moderately high DNA concentrations (>10 ng per 25-µL PCR). In contrast, other studies (e.g., Ellsworth et al., 1993; Davin-Regli et al., 1995) have reported that variations in the quantities of genomic template DNA alter the efficiency of amplification to such an extent that substantially different fingerprint patterns are generated. Variations seem to be greater at low DNA concentrations (<1 pg/µL) and could be explained by relative changes in the number of amplification events from perfect priming sites compared with rare sites, and the relative frequency of mismatch annealing events. Interestingly, excess template DNA can result in the production of a reduced number of amplification products (Power, 1996), perhaps because limiting concentrations of primer result in little or no amplification from sites at which annealing is less than optimum. In general, for most templates, 2.5 pmol of primer and 10 ng of template DNA in a 25-µL reaction mix seem to be optimal for generating sufficiently complex fingerprint patterns. However, provided that a standardised DNA extraction method is used with cell suspensions of broadly equivalent initial cell densities, experience and published evidence suggest that reproducible fingerprints can be obtained with a range of bacterial genera without precise measurement of the template DNA concentration (Grundmann et al., 1997; Vogel et al., 1999).

An additional consideration is whether RAPD fingerprints should be performed with purified chromosomal DNA or total DNA (i.e., including any plasmid DNA present). Some authors have suggested that RAPD typing should be based strictly on purified chromosomal DNA alone (e.g., Power, 1996). However, total separation of chromosomal DNA from plasmid DNA can never be guaranteed, and the more lengthy extraction and purification procedures required have the effect of neutralising two of the major advantages of the RAPD approach, namely its simplicity and speed. A good compromise is to ensure that total DNA extracts are diluted as much as possible before amplification, thereby ensuring that the presence of plasmid DNA has a minimal effect.

(iv) Primer considerations

As described in section 6.2A, generation of RAPD fingerprints can be achieved with a primer chosen randomly without regard to the sequence of the genome being analysed. Nevertheless, it is clear that some arbitrary primers work better than others and may provide results that are more reproducible (Penner et al., 1993). It also seems that the use of a combination of arbitrary primers in a single reaction can give more detailed and reproducible patterns (Tyler et al., 1997). Consequently, primer selection is probably the single most influential parameter in the successful generation of RAPD fingerprints (van Belkum et al., 1993), with primer sequence, primer length and primer concentration being the three most important considerations.

Primer sequence is generally not a major concern at the low annealing temperatures used in classical RAPD reactions, and many different arbitrary primer sequences have been used successfully. However, the precise number of fingerprint bands obtained will depend on the number and frequency of annealing sites on the template DNA being investigated.
Since this is difficult to predict at low annealing temperatures, it is usually necessary to screen a number of different primers or primer combinations to obtain the optimum results for each individual species being investigated. Laboratories that do not wish to design, synthesise and test RAPD primers on an empirical basis may find that a useful starting point is a commercially available set of six 10-mer primers of arbitrary sequence (Amersham Pharmacia Biotech, Little Chalfont, UK) that have been specifically designed and tested for use in RAPD analysis with a variety of different species.

As with primer sequence, the effects of varying the primer length are also somewhat unpredictable. Clearly the optimum primer length should be related to the particular template DNA being investigated and, in theory, it might be expected that longer primers would provide a greater discriminatory ability than short primers (Power, 1996). Thus, MacPherson et al. (1993) found that 10-mer primers generated far fewer fragments than 20-mer primers in control experiments with Toxoplasma gondii DNA, while RAPD typing of methicillin-resistant Staphylococcus aureus (MRSA) and Enterococcus faecium was also more successful using primers longer than 10-mer (van Belkum et al., 1993; Power, 1996). In contrast, short primers were more discriminatory than longer primers for typing clinical isolates of C. albicans (Power, 1996). Again, it seems that the optimum length of primer is something that needs to be determined on an empirical basis for each organism being investigated.

The most critical of the three considerations seems to be the ratio of the primer concentration to the template DNA concentration (Hadrys et al., 1992). Theoretical considerations suggest that the dynamics of the reaction should tend towards shorter products with higher primer:template ratios, as observed by van Belkum et al. (1993). It might also be expected that more frequent annealing to less specific target sequences will occur at higher primer concentrations, again resulting in the generation of more and smaller fragments. Munthali et al. (1992) noted that the optimal primer concentration varied for different plant species, and it seems that primers that work well at a particular concentration with one species may fail to do so with another. This may be caused by fewer potential annealing sites, so that a higher primer concentration is required to achieve a DNA fingerprint of sufficient complexity. In any event, it is clearly important to consider the primer:template interaction, and this interaction may influence the final outcome to such an extent that Tyler et al. (1997) have suggested that some bacterial genera may be inherently more amenable to RAPD techniques than others.

(v) Equipment and reagents

Several different research groups have suggested that the reproducibility of RAPD fingerprints is seriously affected by the make and model of thermocycler used (e.g., Caetano-Anollés et al., 1992b; MacPherson et al., 1993; Meunier & Grimont, 1993).
It has also been shown in the past that the temperature across the block of some machines may vary by as much as 5°C between the innermost and outermost tube positions (Linz, 1990; He et al., 1994). This problem may not be so acute with modern thermal cyclers that are equipped with the best temperature regulation and heat exchange systems (Meunier & Grimont, 1993), but this is a serious issue that should always be considered. Nevertheless, there is evidence that some primers produce the same RAPD fingerprints with the same bacterial template DNA regardless of the particular equipment used (Power, 1996; Grundmann et al., 1997). In any event, such variability should not be a problem during the local investigation of outbreaks in a single laboratory where the same equipment is used continuously.

A more significant problem seems to be associated with the choice of PCR reagents. Meunier & Grimont (1993) demonstrated drastic variations among RAPD fingerprints according to the particular brand of Taq DNA polymerase used. Although such differences might be partly related to differences in the recommended reaction buffers for each particular version of the enzyme, it is known that commercially available thermostable DNA polymerases can differ markedly in their characteristics. For example, some enzymes possess proofreading ability which may correct mismatches in primer:template interactions at the 3' end, thereby effectively increasing the number of priming sites on a particular template, while others may possess reverse transcriptase activity that could result in anomalous RAPD fingerprints as a result of contaminating RNA in the original template DNA preparations. It therefore seems probable that a major source of variation in RAPD fingerprint profiles is the Taq DNA polymerase preparation itself.

Consequently, a major step forward in eliminating this important source of variation has been achieved following the commercial introduction of specially designed RAPD analysis beads (Amersham Pharmacia Biotech). These beads are supplied individually in a quality controlled, premixed, room temperature-stable format that is already optimised for RAPD reactions. Each bead contains AmpliTaq DNA polymerase and Stoffel fragment, as well as all necessary buffer ingredients and nucleotides. The combination of two different thermostable polymerases produces a more discriminatory RAPD fingerprinting pattern than either of the polymerases used alone. Each bead is supplied predispensed in a 0.5-mL microfuge tube and the only additions required are sterile distilled water, template DNA and a suitable primer. This format significantly reduces the number of pipetting steps required, thereby minimising the risk of contamination and increasing the potential reproducibility of the RAPD technique (Grundmann et al., 1997; Vogel et al., 1999).

6.3 FRAGMENT SEPARATION AND ANALYSIS
In addition to the technical aspects involved in the production of RAPD fingerprints (discussed in section 6.2), the final steps of visualisation and analysis of the fingerprints produced are also critical. While analysis of the fingerprints can be achieved with one of several different computer software packages (see section 6.3C), two significantly different options are available for visualisation of the RAPD fingerprints produced, namely visualisation on conventional gel-based systems or visualisation on automated DNA sequencing equipment by the construction of fluorescence density profiles.
A. Analysis of RAPD fingerprints with conventional gel-based systems

Most readers of this chapter will already be familiar with standard methods (Sambrook et al., 1989) for the separation of DNA bands on agarose gels and their subsequent visualisation by staining with ethidium bromide and illumination with ultraviolet light. These procedures are also suitable for visualising RAPD fingerprints and no significant modifications to the standard methods are required. It is worth noting that most useful RAPD fingerprints obtained with a single primer consist of DNA bands in the size range 100 bp-2 kb, and that good separation over this range is usually achieved with agarose concentrations of 1.5% or 2% w/v and a voltage of c. 120 V. However, it is important to standardise the electrophoresis and staining conditions. In particular, destaining of the gel after staining with ethidium bromide may result in RAPD fingerprints that are different to those visualised on primary stained gels (Power, 1996), simply because faint bands can either be reduced beyond the point of detection in destained gels, or may be swamped by the excessive fluorescence from stronger bands in heavily stained gels. Similarly, the precise length of destaining is also important for the same reasons. This consideration may not be so significant when comparing RAPD fingerprints obtained from a small collection of isolates on the same gel, in which case it is usually quite easy to visually recognise distinct patterns (Fig. 6.2), but it is vitally important that electrophoresis and staining conditions are standardised for any hope of a successful comparison of fingerprints obtained on different gels (see below).

Fig. 6.2. Example of RAPD fingerprints visualised on an agarose gel. Lanes labelled M contain standard size markers (100-bp ladder). Lanes 1-11 show RAPD fingerprints generated from nosocomial isolates of Acinetobacter baumannii with M13 core primer (Grundmann et al., 1997). Isolates with distinct RAPD fingerprint patterns or isolates showing closely similar patterns (e.g., lanes 3, 7 and 10) can be easily recognised by quick visual examination.

As with other DNA fingerprinting methods, interpretation of the results obtained, even on a single gel, can present a problem. Apart from the simple presence or absence of bands, there are no hard and fast rules that define how many band differences are required before two isolates can be considered to be unrelated (i.e., when they are epidemiologically distinct). Some groups of workers favour an approach based on that proposed for analysis of fingerprints generated by pulsed-field gel electrophoresis (Tenover et al., 1996), in which isolates are considered as 'related' if between one and three DNA band differences are observed, and as 'probably related' if differing by between four and six bands. However, although successful in stimulating debate, this approach has been criticised for being too prescriptive and misleading for inexperienced workers.
In particular, apart from practical problems such as the particular restriction enzyme used with a particular species, it takes no account of the important theoretical problem (which also affects RAPD fingerprint profiles) of differing recombination and mutation rates between one bacterial species and another, whereby 'related' isolates of a species with a high recombination or mutation rate observed over a short period of time must be presumed to have a recent epidemiological link, while 'related' isolates of a species with a lower rate may have a more distant epidemiological link (Blanc et al., 1998). So far as the analysis of RAPD fingerprints is concerned, even when being analysed visually on a single agarose gel, it should be emphasised that no hard and fast criteria for interpretation of fingerprint profiles can be proposed, but that working criteria should be defined empirically for each individual species:primer combination before epidemiological conclusions can be reached (see section 6.3C).

In order to attempt any comparison of fingerprints visualised on different gels over an extended time period, it is first necessary to rigorously standardise all the PCR protocols and electrophoresis procedures used (see section 6.4). Small numbers of RAPD fingerprints can then be compared directly by visual examination of gel photographs, but difficulties arise when attempting to visually compare the fingerprint banding patterns from a large number of isolates examined on different gels over extended time intervals. For this purpose it is usually necessary to make use of a computer database of fingerprint patterns. Photographs of RAPD fingerprint patterns visualised on gels can be digitised with a scanner equipped with appropriate software (e.g., Adobe Photoshop; Adobe Systems, NL) and the digital images saved to computer disk in either the PICT or TIFF formats, depending on the computer analysis program to be used subsequently (see section 6.3C). Again, it is important to standardise the scanner settings to ensure that faint bands are imported into the database in a consistent manner. In addition, it is vital for accurate gel normalisation purposes (see section 6.3C) that each individual gel contains size markers (e.g., a 100-bp ladder) in at least the first and last lanes (and preferably after every six lanes), together with internal size markers (e.g., of 100 and 1000 bp) added to each sample being analysed.

B. Analysis of RAPD fingerprints on automated DNA sequencers
Some of the drawbacks associated with the analysis of RAPD fingerprints on conventional gel-based systems can be overcome by combining the generation of RAPD fingerprints with automated laser fluorescence (ALF) analysis of the DNA fragments generated on a DNA sequencing apparatus. The technique achieves a high resolution of the RAPD fingerprinting patterns by using rapid denaturing sequencing gels which are capable of discriminating DNA products that differ in size by only a single base (Stegemann et al., 1991). For this purpose, the primer(s) to be used to generate the fingerprints must be labelled during manufacture with a suitable dye, with the precise label depending on the exact sequencing equipment being used. Thus, for example, primers can be labelled at the 5'-end with 6-carboxyfluorescein (FAM) for analysis on the ABI Prism system (Perkin Elmer Applied Biosystems, Warrington, UK), or with the carbocyanine dye Cy5 for analysis on the ALFexpress system (Amersham Pharmacia Biotech). Although there are a few published examples of work with the ABI Prism system (e.g., Cotter et al., 1998), most published examples of this application for analysis of bacterial RAPD fingerprints have used the ALF or ALFexpress systems (e.g., Grundmann et al., 1995a, b; 1997; 1999; Webster et al., 1996; Webster & Towner, 2000).
Typically, denaturing separating gels are prepared containing 5% (w/v) Hydrolink Long Ranger acrylamide, 7 M urea and 0.6 x Tris-borate buffer. A 1-µl portion of each amplification product is denatured in 5 µl of formamide-containing stop solution at 95°C for 5 min and then applied to the gel contained in the sequencing apparatus. Electrophoresis is in 0.6 x Tris-borate buffer at a constant voltage of 800 V (current limit of 50 mA) and a constant temperature of 45°C. Amplification (fingerprint) patterns are identified by the fluorescence emitted by the DNA fragments passing the fixed laser beam in the sequencing apparatus and are reconstructed as fluorescence density profiles (Fig. 6.3). As described above (section 6.3A) for analysis on conventional agarose gels, in order to allow comparisons to be made between fluorescence density profiles, it is necessary to add internal size markers with fluorescent labels (typically 100 bp and 1064 bp) to each track, with a further set of fluorescently labelled external markers (e.g., a 100-bp ladder) run in about every sixth track. The fluorescence density profiles can then be digitised and downloaded to computer databases directly without the loss of information that may occur during the scanning of gel images generated by conventional visualisation methods.
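The role of the internal size markers can be illustrated with a minimal sketch that converts the migration times of peaks in one track into approximate fragment sizes by interpolating between the two internal markers of that track. The sketch assumes, purely for simplicity, a linear relationship between migration time and fragment size between the markers. The function name and the example times are hypothetical, and commercial analysis packages use their own, more elaborate, normalisation routines that also take the external ladder into account.

```python
# Hypothetical helper for illustration; not part of any sequencer software.
INTERNAL_MARKERS_BP = (100.0, 1064.0)  # internal marker sizes quoted in the text


def times_to_sizes(peak_times, marker_times, marker_sizes=INTERNAL_MARKERS_BP):
    """Convert peak migration times (min) in one track to approximate sizes (bp).

    Assumption: size varies linearly with migration time between the two
    internal markers of the track (a simplification made for this sketch).
    """
    t_low, t_high = marker_times
    s_low, s_high = marker_sizes
    if t_high <= t_low:
        raise ValueError("marker times must be strictly increasing")
    slope = (s_high - s_low) / (t_high - t_low)
    return [s_low + slope * (t - t_low) for t in peak_times]


# Invented example: the same two fragments run in two tracks, the second of
# which migrated more slowly. Raw peak times differ between the tracks, but
# the sizes agree once each track is scaled to its own internal markers.
track_a = times_to_sizes([30.0, 55.0], marker_times=(20.0, 100.0))
track_b = times_to_sizes([32.5, 58.75], marker_times=(22.0, 106.0))
for size_a, size_b in zip(track_a, track_b):
    print(f"{size_a:.2f} bp  vs  {size_b:.2f} bp")
```

Because every track carries its own pair of markers, track-to-track differences in running conditions are corrected before any similarity coefficient is calculated.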
C. Computer analysis of RAPD fingerprint data
Several different programs for computers with either Macintosh or PC Windows-based operating systems are now available commercially for analysing RAPD fingerprints. Although largely automated, these programs still require a limited amount of operator intervention to correct artifacts that the automated system does not handle properly. Such programs are being used increasingly by many clinical microbiology laboratories, and many published studies now attach increasing significance to dendrograms illustrating the results of epidemiological investigations, often without too much thought as to what the computer-generated results actually mean. Depending on the particular computer platform used (i.e., PC or Macintosh), the different software packages are suitable for analysing digitised RAPD fingerprint patterns that have been downloaded in the form of either scanned gel images or fluorescence density profiles. These programs are essentially the same as those used for analysis of the other types of nucleic acid fingerprints described in this book, and examples of their use will be described only in brief in this chapter.

For Macintosh computers, probably the most widely used package to date is the DENDRON program (Solltech Inc., Iowa, U.S.A.) and its derivatives. This program normalises data from separate electrophoresis gels according to internal size standards added to each track and/or sets of molecular size standards run at regular intervals in separate tracks. Alignment by the computer of the size standards allows inter- and intra-gel inconsistencies and variations in electrophoresis conditions to be corrected. The DENDRON program identifies the positions and intensities of the bands in each lane of a gel and then calculates a similarity coefficient (SAB) for every pair of isolates. In general, published examples of the use of this program with scanned gel images (e.g., Webster et al., 1996; Grundmann et al., 1997) have
used SAB values computed with the Dice coefficient solely on the basis of band positions (Struelens et al., 1996) in RAPD profiles. The SAB values are presented in a matrix and then used to generate a dendrogram by the unweighted pair group method using arithmetic averages (UPGMA), in which the two or more isolates with the highest SAB value are grouped into a cluster with a connection (or branch point) corresponding to that SAB value along the horizontal axis of the dendrogram. The process continues in the direction of lower SAB values until the dendrogram is complete (Fig. 6.4). This program is normally not suitable for examining fluorescence density profiles generated by automated DNA sequencing equipment because of the PC-based computer platform that is generally used by such systems.

[Figure: fluorescence density profiles; auto-scaled data plotted against time (minutes), with peaks 1, 2 and 3 marked.]
Fig. 6.3. Example of fluorescence density profiles produced by the ALFexpress system for nosocomial isolates of Acinetobacter baumannii with M13 core primer (Grundmann et al., 1997). Profile 1 contains standard size markers (100-bp ladder). Peak 1 is the labelled M13 core primer used to produce the profiles. Each profile was spiked with internal size markers of 100 bp (peak 2) and 1064 bp (peak 3) before running in the ALFexpress system. The slight discrepancy observed between the positions of peak 3 in profile 1 and the other profiles provides an illustration of the need for accurate normalisation of the profiles before final analysis.

[Figure: dendrogram of the CW-numbered isolates; horizontal axis: SAB, scaled up to 1.]
Fig. 6.4. Example of a dendrogram illustrating the relationships between 30 nosocomial isolates of Acinetobacter baumannii, based on analysis with the Dice coefficient of RAPD profiles generated with M13 core primer (Webster et al., 1996). Within this particular system, a SAB value of 0.7 (70%) has been shown to distinguish between genetically and epidemiologically unrelated isolates. Isolates with a SAB value of >0.7 are considered to be closely related (Grundmann et al., 1995, 1997; Webster et al., 1996).

For PC users, the most widely used package for analysing RAPD fingerprints and digitised fluorescence density profiles is probably the BioNumerics program (Applied Maths, Kortrijk, Belgium) and its progenitor GelCompar. These programs can be used for examining scanned gel images and function in a similar manner to DENDRON. When used for examining fluorescence density profiles, similarity analysis is usually carried out with Pearson's product moment correlation coefficient (as opposed to the Dice coefficient), followed by UPGMA cluster analysis (e.g., Grundmann et al., 1995a, b; 1997; 1999). The Pearson coefficient calculates the correlation between arrays of
values, in this case fluorescence density values for each point in the two profiles being compared. As the curves as a whole are compared, the Pearson correlation coefficient is more sensitive to differences in band intensities than the Dice coefficient, which considers band positions only, and is ideally suited to PCR-based methods where intensity differences between bands matter more than with restriction fragment analyses. Crucially for the comparison of fluorescence density profiles, it is largely insensitive to relative concentrations, but it is sensitive to differences in background.

The question of whether to use the Dice or Pearson correlation coefficients to examine RAPD fingerprint profiles provides an interesting philosophical problem. As mentioned above, the Dice coefficient is based primarily on the presence or absence of bands in particular positions. Although the various programs available will give an automated interpretation of the positions of bands on a scanned gel image or fluorescence density profile, it is important that these positions are checked manually before final analysis, and this checking process is ultimately a question of subjective judgement. In contrast, the Pearson coefficient is more objective in that it depends on the variance between two fluorescence density values at each point in the curve pattern and consequently does not suffer from typical peak/shoulder mismatches. However, it relies totally on the computer's interpretation of the profile, and accurate gel normalisation is therefore mandatory. Seward et al. (1997) made a direct comparison of the overall robustness of the respective algorithms and the reproducibility of the cluster analysis results generated by the Dice and Pearson correlation coefficients by using both methods to examine the same set of RAPD fingerprints generated from isolates of known epidemiological relationships.
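The two similarity measures and the UPGMA clustering step described above can be sketched in a few lines of plain Python. This is a minimal illustration only, not the implementation used by DENDRON, GelCompar or BioNumerics. The function names, the band positions and the curve values are invented for the example, and the band positions are assumed to have been normalised and binned already, so that identical numbers denote matching bands.

```python
from itertools import combinations
from math import sqrt


def dice(bands_a, bands_b):
    """Dice coefficient from band positions only: 2 * shared / (n_a + n_b)."""
    total = len(bands_a) + len(bands_b)
    if total == 0:
        raise ValueError("at least one profile must contain a band")
    return 2 * len(set(bands_a) & set(bands_b)) / total


def pearson(curve_a, curve_b):
    """Pearson product moment correlation between two equal-length curves."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must have the same number of points")
    n = len(curve_a)
    mean_a, mean_b = sum(curve_a) / n, sum(curve_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(curve_a, curve_b))
    var_a = sum((a - mean_a) ** 2 for a in curve_a)
    var_b = sum((b - mean_b) ** 2 for b in curve_b)
    return cov / sqrt(var_a * var_b)


def upgma(names, similarity):
    """Average-linkage clustering on similarities; returns the merge order.

    `similarity` maps frozenset({x, y}) to the similarity of isolates x and y.
    Each merge is reported as (cluster_1, cluster_2, branch_point_similarity).
    """
    clusters = [(name,) for name in names]

    def average(c1, c2):
        pairs = [similarity[frozenset((x, y))] for x in c1 for y in c2]
        return sum(pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the highest average similarity first.
        best, c1, c2 = max(
            ((average(a, b), a, b) for a, b in combinations(clusters, 2)),
            key=lambda item: item[0],
        )
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, best))
    return merges


# Invented band positions (bp) for three isolates.
profiles = {
    "A": {200, 350, 500, 800},
    "B": {200, 350, 500, 900},
    "C": {150, 420, 610},
}
sab = {frozenset((x, y)): dice(profiles[x], profiles[y])
       for x, y in combinations(profiles, 2)}
merges = upgma(list(profiles), sab)
for c1, c2, value in merges:
    print(c1, c2, round(value, 2))

# Invented densitometric curve: uniform scaling leaves Pearson's r unchanged,
# whereas a rising background lowers it.
curve = [0, 1, 5, 1, 0, 2, 8, 2, 0]
scaled = [3 * v for v in curve]
drifting = [v + i for i, v in enumerate(curve)]
print(round(pearson(curve, scaled), 3), round(pearson(curve, drifting), 3))
```

In this toy data set, isolates A and B share three of their four bands and join at the highest branch point, whereas C shares no bands with either and joins last. The curve example mirrors the behaviour noted in the text: the correlation is unaffected by overall signal strength but falls when the background differs.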
In that comparison, both methods were efficient at recognising groups of isolates with closely similar RAPD fingerprints (i.e., isolates which might be expected to have a close epidemiological or evolutionary relationship), but showed considerable variation in terms of the relationships suggested for less closely related isolates with comparatively low SAB values. Overall it was concluded that both analysis methods were useful tools for examining genotypic relationships between individual isolates, but that assessment of the results and occasional intervention by a trained microbiologist is still essential to detect and correct artifacts introduced by the automated systems (Seward et al., 1997).

Whatever the precise algorithm used to analyse the relationships between strains, the outcome is expressed either in terms of the percentage similarity between each individual pair of isolates (Dice coefficient) or as a correlation value (Pearson coefficient). There is a general consensus that several band differences (depending on the total number of bands) in the overall RAPD fingerprint profile are required before two isolates may be considered different (Struelens et al., 1996). However, as outlined in section 6.3A, there are no hard and fast criteria that define the precise number of band differences that is required to separate two isolates, and it is clear that a 'cut-off' value should be defined empirically for each individual species:primer combination before epidemiological conclusions can be reached. To date, the only such studies with RAPD fingerprints have been concerned with strains of Acinetobacter spp., for which it seems that a cut-off similarity value of 70% is suitable for distinguishing between genetically and epidemiologically unrelated isolates with the particular sets of primers and amplification conditions used (Grundmann et al., 1995, 1997; Webster et al., 1996). Such a cut-off value (which depends on the total number of bands in the profile) is quite helpful as it means that the minor variations in fingerprint profiles which seem to occur occasionally with most RAPD reactions have no influence on the final epidemiological conclusion. More studies need to be performed with RAPD fingerprints generated from epidemiologically defined collections of isolates belonging to other bacterial genera. It cannot be overemphasised that the absolute number of band differences is a measure that needs to be interpreted with extreme caution, as its importance will be inversely related to the overall number of resolved RAPD fragments (Struelens et al., 1996).

6.4 STANDARDISATION OF RAPD FINGERPRINTS
Section 6.2B described some of the factors that can affect the generation of PCR fingerprints. These have been discussed in more detail in previous reviews of this subject (Power, 1996; Tyler et al., 1997) and more details can be found in the references listed in Table 6.2. These factors can undoubtedly present major problems in terms of the reproducibility of the technique. Nevertheless, the RAPD technique offers some major advantages over traditional phenotypic typing methods (and the other molecular fingerprinting methods described in this book) by being rapid, relatively inexpensive, theoretically applicable to any organism, and technically feasible for most diagnostic microbiology laboratories. There is a general consensus amongst microbiologists that careful attention to detail and standardisation of techniques should be sufficient for an individual laboratory to use the technique comparatively for such purposes as recognising and tracing strains involved in an outbreak. The availability of specially designed and quality controlled RAPD analysis reagents has provided a considerable advance and, as with other PCR techniques, methods that remove or reduce non-specific annealing before amplification (e.g., the 'hot-start' technique) should help in the production of reproducible RAPD fingerprints. The RAPD approach is being used increasingly in many microbiology laboratories for the epidemiological typing of an ever-expanding range of bacteria. However, variations between laboratories still present considerable difficulties and, to date, there has only been one study that has attempted to address this problem by combining the use of standardised methodology for the generation of RAPD fingerprints with automated laser fluorescence (ALF) analysis of the DNA fragments. 
In this study (Grundmann et al., 1997), an exact measure of the inter-laboratory reproducibility of a given amplification profile was obtained by analysing the degree of similarity (pattern correlation) between RAPD fingerprints generated from DNA extracts prepared centrally or independently in seven European laboratories for the same isolates of Acinetobacter spp. The overall pattern correlation was calculated as the arithmetic mean of Pearson's product moment correlation coefficients of fingerprints from all identical isolates after parallel analysis of the RAPD fingerprints by ALF on sequencing gels. Table 6.3 shows the pattern correlation results obtained with each of the four individual primer sets investigated over the 120-800 bp size range. Overall, the pattern correlation was good (range: 82.8-88.6%), but was slightly better for the centrally prepared DNA extracts (87.1%) than the DNA extracts prepared in each individual laboratory (84.7%), although this difference was not significant. Since good discrimination between genetically and epidemiologically unrelated groups of Acinetobacter spp. could be achieved with the methods used at a similarity level of 70%, the study was successful in demonstrating that the minor variations in pattern correlations observed between the different participating laboratories would not be sufficient to alter the overall epidemiological conclusions. However, it was clear that such inter-laboratory comparisons are heavily dependent on rigorous normalisation of sample lanes on a gel to allow accurate intra- and inter-gel comparisons. This in turn necessitates the addition of internal molecular size markers to each and every lane, together with the inclusion of a set of molecular size markers spanning the entire range of DNA fragment sizes being analysed at regular intervals (at least every six lanes) on the gel.

Table 6.2. Summary of important factors affecting the standardisation of bacterial RAPD fingerprints

Category                                Parameter
Template DNA                            Preparation
                                        Concentration
                                        Secondary structure
                                        RNA contamination
Primer                                  Length
                                        Concentration
                                        G+C content
PCR reagents                            Source / type
                                        Taq concentration
                                        Magnesium concentration
Equipment                               Thermocycler model
Reaction and visualisation conditions   Annealing temperature
                                        Number of cycles
                                        Denaturing time
                                        Extension time
                                        Type of gel

Example references: Elaichouni et al. (1994); McClelland & Welsh (1994); Micheli et al. (1994); Bassam et al. (1992); Caetano-Anollés (1993); Ellsworth et al. (1993); Micheli et al. (1994); Caetano-Anollés et al. (1992b); Caetano-Anollés (1993); Bassam et al. (1992); MacPherson et al. (1993); Caetano-Anollés et al. (1992b); Caetano-Anollés (1993); Bassam et al. (1992); Meunier & Grimont (1993); Schierwater & Ender (1993); Bassam et al. (1992); Brikun et al. (1994); Bassam et al. (1992); Ellsworth et al. (1993); MacPherson et al. (1993); Meunier & Grimont (1993); Penner et al. (1993); Caetano-Anollés et al. (1992b); Caetano-Anollés (1993); Ellsworth et al. (1993); Bassam et al. (1992); Kangfu & Pauls (1992); MacPherson et al. (1993); Kangfu & Pauls (1992); Kangfu & Pauls (1992); Berg et al. (1994); McClelland & Welsh (1994)

Table 6.3. RAPD pattern correlation results obtained following ALF analysis of data obtained in a multicentre trial in seven different European laboratories with isolates of Acinetobacter spp.

                 Percentage pattern correlation (±SD)
Primer(s)        Centrally prepared template DNA    Locally prepared template DNA
M13              88.6 (±7.8)                        82.8 (±10.6)
DAF4             86.6 (±8.5)                        87.9 (±8.5)
ERIC-2           86.3 (±12.5)                       84.7 (±10.7)
REP1 + REP2      86.8 (±17.5)                       83.5 (±8.6)
Mean             87.1                               84.7

Data taken from Grundmann et al., 1997.

6.5 CURRENT APPLICATIONS
A cursory examination of the scientific literature will reveal an ever-increasing number of articles describing the use of RAPD fingerprinting techniques for typing an ever-increasing range of bacterial species. However, rather fewer of these proposed applications have been rigorously assessed in terms of the performance criteria (typeability, reproducibility, stability, discriminatory power, epidemiological concordance and typing system concordance) recommended by the European Study Group on Epidemiological Markers (Struelens et al., 1996). Nevertheless, it seems that the most obvious and valuable applications of RAPD fingerprinting lie in the relatively rapid and potentially discriminatory characterisation of isolates that is achievable on a day-to-day basis in individual diagnostic laboratories during outbreak investigations. Application of standardised RAPD fingerprinting techniques should provide rapid information at the local level and help with the initiation of effective infection control programmes. Apart from outbreak investigations, it has also been suggested (Grundmann et al., 1999; Webster & Towner, 2000) that RAPD fingerprinting techniques may have a role to play in the routine analysis of isolates in a diagnostic laboratory. Such routine typing of isolates would be a desirable component of infection
control services in a setting such as an intensive care unit where high rates of cross-transmission might be expected as a result of the underlying illness of patients, the high frequency of invasive procedures, and the overall high levels of antibiotic selection pressure. At present, typing of bacteria is rarely performed in non-outbreak situations because most typing systems are time-consuming, not applicable to all bacteria, or not cost-effective. However, by using the relatively inexpensive and potentially universal RAPD fingerprinting technique in situations where bacteria of a single species are isolated from patients in conspicuous clusters, informative typing data can be made available to clinical staff in a timely fashion.

6.6 PROSPECTS FOR THE FUTURE
The spread of particular virulent, epidemic or antibiotic-resistant microorganisms poses a major and increasing threat to the health of populations throughout the world, with substantial additional costs associated with lengthened hospital stays, use of more expensive antibiotics, and increased morbidity and mortality. However, studies of the geographical epidemiology of particular microbial pathogens are often restricted mainly to individual research institutes because of the complicated or specialised nature of the typing systems that are currently being used. National Governments expect the scientific community to respond to the problem of cosmopolitan bacterial strains and their spread within the population, but tackling this problem requires more coordination of efforts. WHO has been taking the lead in developing networks for surveillance and identification of emerging and antibiotic-resistant pathogens in close collaboration with the EU and the Centers for Disease Control and Prevention in Atlanta, USA, but a mere report or database of accumulated susceptibility data and other associated information will not suffice for the timely recognition of particular virulent or epidemic organisms. Identification and typing of bacteria is thus extremely important in efforts to monitor the geographical spread of pathogens. However, as outlined at the beginning of this chapter, central analysis of an ever-growing number of important microorganisms at a national or international level requires undesirable and potentially hazardous transport of pathogens between laboratories and imposes an unmanageable workload, with associated delays in obtaining results, on central reference facilities. Meaningful understanding and sensible intervention by local health authorities to prevent the national and international spread of disease can only be achieved by extensive and rapid communication of data between different local laboratories.
Therefore, local isolate analysis combined with central data collection offers a more efficient and safer alternative. The design of new RAPD-based fingerprinting methods for analysis of microbial DNA means that such typing strategies can potentially be applied to many microorganisms of public health significance (Struelens et al., 1996).

The RAPD fingerprinting technique, combined with the use of quality controlled reagents, standardised protocols and analysis on sequencing gels, has been shown in trial experiments to generate reproducible results of epidemiological significance with various groups of bacteria (Grundmann et al., 1995a, b; 1997; Webster et al., 1996; Vogel et al., 1999), but the generation of numerous complex fingerprint profiles, particularly in cases where strains from different geographical locations are being compared over significant time periods, requires that a computer-assisted strategy is used to enable the formation of a database of fingerprint patterns. As described in section 6.3C, a number of software programs are available that successfully perform DNA fingerprint pattern analysis, clustering and comparison, but a consensus still needs to be reached for data generation, entry and retrieval protocols, and cut-off values for determining epidemiological relatedness. Automated laser fluorescence-based analysis of RAPD fingerprints, combined with digital data communication via the Internet, may bring new opportunities for the comparison of typing data generated in different facilities. Such data could be made accessible to other laboratories via an Internet-based database. In places where automated sequencers are not available, DNA extracts could be easily prepared and shipped without accompanying risk to sentinel surveillance laboratories with appropriate equipment for immediate analysis. The increasing public health problem resulting from the national and international spread of pathogenic microorganisms means that it is now timely for further studies on establishing standardised methodologies and systems for use in conjunction with this technology.

REFERENCES

Bassam, B.J., Caetano-Anollés, G. & Gresshoff, P.M. (1992). DNA amplification fingerprinting of bacteria. Applied Microbiology and Biotechnology 38, 70-76.
Berg, D.E., Akopyants, N.S. & Kersulyte, D. (1994). Fingerprinting microbial genomes using the RAPD or AP-PCR method. Methods in Molecular and Cellular Biology 5, 13-24.
Blanc, D.S., Hauser, P.M., Francioli, P. & Bille, J. (1998). Molecular typing methods and their discriminatory power. Clinical Microbiology and Infection 4, 61-63.
Breukel, C., Wijnen, J., Tops, C., van der Klift, H., Dauwerse, H. & Khan, P.M. (1990). Vector-Alu PCR: a rapid step in mapping cosmids and YACs. Nucleic Acids Research 18, 3097.
Brikun, I., Suziedelis, K. & Berg, D.E. (1994). DNA sequence divergence among derivatives of Escherichia coli K-12 detected by arbitrary primer PCR (random amplified polymorphic DNA) fingerprinting. Journal of Bacteriology 176, 1673-1682.
Caetano-Anollés, G. (1993). Amplifying DNA with arbitrary oligonucleotide primers. PCR Methods and Applications 3, 85-94.
Caetano-Anollés, G., Bassam, B.J. & Gresshoff, P.M. (1991). DNA amplification fingerprinting using very short arbitrary oligonucleotide primers. BioTechnology 9, 553-557.
Caetano-Anollés, G., Bassam, B.J. & Gresshoff, P.M. (1992a). DNA fingerprinting: MAAPing out a RAPD definition? BioTechnology 10, 937.
Caetano-Anollés, G., Bassam, B.J. & Gresshoff, P.M. (1992b). Primer-template interactions during DNA amplification fingerprinting with single arbitrary oligonucleotides. Molecular and General Genetics 235, 157-165.
Cotter, L., Daly, M., Greer, P., Cryan, B. & Fanning, S. (1998). Motif-dependent DNA analysis of a methicillin-resistant Staphylococcus aureus (MRSA) collection. British Journal of Biomedical Science 55, 99-106.
Davin-Regli, A., Abed, Y., Charrel, R.N., Bollet, C. & de Micco, P. (1995). Variations in DNA concentrations significantly affect the reproducibility of RAPD fingerprint patterns. Research in Microbiology 146, 561-568.
Elaichouni, A., Van Emmelo, J., Claeys, G., Verschraegen, G., Verelst, R. & Vaneechoutte, M. (1994). Study of the influence of plasmids on the arbitrary primer polymerase chain reaction fingerprint of Escherichia coli strains. FEMS Microbiology Letters 115, 335-339.
Ellsworth, D.L., Rittenhouse, K.D. & Honeycutt, R.L. (1993). Artifactual variation in randomly amplified polymorphic DNA banding patterns. BioTechniques 14, 214-217.
Grundmann, H.J., Schneider, C. & Daschner, F.D. (1995a). Fluorescence-based DNA fingerprinting elucidates nosocomial transmission of phenotypically variable Pseudomonas aeruginosa in intensive care units. European Journal of Clinical Microbiology and Infectious Diseases 14, 1057-1062.
Grundmann, H., Schneider, C., Tichy, H.V., Simon, R., Klare, I., Hartung, D. & Daschner, F.D. (1995b). Automated laser fluorescence analysis of randomly amplified polymorphic DNA: a rapid method for investigating nosocomial transmission of Acinetobacter baumannii. Journal of Medical Microbiology 43, 446-451.
Grundmann, H.J., Towner, K.J., Dijkshoorn, L., Gerner-Smidt, P., Mahar, M., Seifert, H. & Vaneechoutte, M. (1997). Multicenter study using standardized protocols and reagents for evaluation of reproducibility of PCR-based fingerprinting of Acinetobacter spp. Journal of Clinical Microbiology 35, 3071-3077.
Grundmann, H.J., Hahn, A., Ehrenstein, B., Geiger, K., Just, H. & Daschner, F.D. (1999). Detection of cross-transmission of multiresistant Gram-negative bacilli and Staphylococcus aureus in adult intensive care units by routine typing of clinical isolates. Clinical Microbiology and Infection 5, 355-363.
Hadrys, H., Balick, M. & Schierwater, B. (1992). Application of random amplified polymorphic DNA (RAPD) in molecular ecology. Molecular Ecology 1, 55-63.
He, Q., Viljanen, M.K. & Mertsola, J. (1994). Effects of thermocyclers and primers on the reproducibility of banding patterns in randomly amplified polymorphic DNA analysis. Molecular and Cellular Probes 8, 155-159.
Hulton, C.S., Higgins, C.F. & Sharp, P.M. (1991). ERIC sequences: a novel family of repetitive elements in the genomes of Escherichia coli, Salmonella typhimurium and other enterobacteria. Molecular Microbiology 5, 825-834.
Kangfu, Y. & Pauls, K.P. (1992). Optimization of the PCR program for RAPD analysis. Nucleic Acids Research 20, 2606.
Linz, U. (1990). Thermocycler temperature variation invalidates PCR results. BioTechniques 9, 286-294.
MacPherson, J.M., Eckstein, P.E., Scoles, G.J. & Gajadhar, A.A. (1993). Variability of the random amplified polymorphic DNA assay among thermal cyclers, and effects of primer and DNA concentration. Molecular and Cellular Probes 7, 293-299.
Maslow, J.N., Mulligan, M.E. & Arbeit, R.D. (1993). Molecular epidemiology: application of contemporary techniques to the typing of microorganisms. Clinical Infectious Diseases 17, 153-164.
McClelland, M. & Welsh, J. (1994). DNA fingerprinting by arbitrarily primed PCR. PCR Methods and Applications 4, S59-S65.
Meunier, J.R. & Grimont, P.A.D. (1993). Factors affecting reproducibility of random amplified polymorphic DNA fingerprinting. Research in Microbiology 144, 373-379.
Micheli, M.R., Bova, R., Pascale, E. & D'Ambrosio, E. (1994). Reproducible DNA fingerprinting with the random amplified polymorphic DNA (RAPD) method. Nucleic Acids Research 22, 1921-1922.
Munthali, M., Ford-Lloyd, B.V. & Newbury, H.J. (1992). The random amplification of polymorphic DNA for fingerprinting plants. PCR Methods and Applications 1, 274-276.
Muralidharan, K. & Wakeland, E.K. (1993). Concentration of primer and template qualitatively affects products in random-amplified polymorphic DNA PCR. BioTechniques 14, 362-364.
Penner, G.A., Bush, A., Wise, R., Kim, W., Domier, L., Kasha, K., Laroche, A., Scoles, G., Molnar, S.J. & Fedak, G. (1993). Reproducibility of random amplified polymorphic DNA (RAPD) analysis among laboratories. PCR Methods and Applications 2, 341-345.
Power, E.G.M. (1996). RAPD typing in microbiology - a technical review. Journal of Hospital Infection 34, 247-265.
Raich, T.J., Archer, J.L., Robertson, M.A., Tabachnik, W.J. & Beaty, B.J. (1993). Polymerase chain reaction approaches to Culicoides (Diptera: Ceratopogonidae) identification. Journal of Medical Entomology 30, 228-232.
Sambrook, J., Fritsch, E.F. & Maniatis, T. (1989). Agarose gel electrophoresis. In Molecular cloning, a laboratory manual, 2nd edn., pp. 6.3-6.19. Cold Spring Harbor Laboratory Press, New York.
Schierwater, B. & Ender, A. (1993). Different thermostable DNA polymerases may amplify different RAPD products. Nucleic Acids Research 21, 4647-4648.
Seward, R.J., Ehrenstein, B., Grundmann, H.J. & Towner, K.J. (1997). Direct comparison of two commercially available computer programs for analysing DNA fingerprinting gels. Journal of Medical Microbiology 46, 314-320.
Stegemann, J., Schwager, C., Erfle, H., Hewitt, N., Voss, H., Zimmermann, J. & Ansorge, W. (1991). High-speed online DNA sequencing on ultrathin slab gels. Nucleic Acids Research 19, 675-676.
Stern, M.J., Ames, G.F., Smith, N.H., Robinson, E.C. & Higgins, C.F. (1984). Repetitive extragenic palindromic sequences: a major component of the bacterial genome. Cell 37, 1015-1026.
Struelens, M.J., and the members of the European Study Group on Epidemiological Markers (ESGEM) of the European Society for Clinical Microbiology and Infectious Diseases (ESCMID) (1996). Consensus guidelines for appropriate use and evaluation of microbial epidemiologic typing systems. Clinical Microbiology and Infection 2, 1-11.
Tenover, F.C., Arbeit, R.D., Goering, R.V., Mickelson, P.A., Murray, B.E., Persing, D.H. & Swaminathan, B. (1996). Interpreting chromosomal DNA restriction patterns produced by pulsed-field gel electrophoresis: criteria for bacterial strain typing. Journal of Clinical Microbiology 33, 2233-2239.
Tyler, K.D., Wang, G., Tyler, S.D. & Johnson, W.M. (1997). Factors affecting reliability and reproducibility of amplification-based DNA fingerprinting of representative bacterial pathogens. Journal of Clinical Microbiology 35, 339-346.
van Belkum, A. (1994). DNA fingerprinting of medically important microorganisms by use of PCR. Clinical Microbiology Reviews 7, 174-184.
van Belkum, A., Bax, R., Peerbooms, P., Goessens, W., van Leeuwen, N. & Quint, W.G.V. (1993). Comparison of phage typing and DNA fingerprinting by PCR for discrimination of methicillin-resistant Staphylococcus aureus strains. Journal of Clinical Microbiology 31, 798-803.
van Belkum, A., Kluytmans, J., van Leeuwen, W., Bax, R., Quint, W., Peters, E., Fluit, C., Vandenbroucke-Grauls, C., van den Brule, A., Koeleman, H., Melchers, W., Meis, J., Elaichouni, A., Vaneechoutte, M., Moonens, F., Maes, N., Struelens, M., Tenover, F. & Verbrugh, H. (1995). Multicenter evaluation of arbitrarily primed PCR for typing of Staphylococcus aureus strains. Journal of Clinical Microbiology 33, 1537-1547.
Vaneechoutte, M. (1996). DNA fingerprinting techniques for microorganisms. A proposal for classification and nomenclature. Molecular Biotechnology 6, 115-142.
Venugopal, G., Mohapatra, S., Salo, D. & Mohapatra, S. (1993). Multiple mismatch annealing: basis for random amplified polymorphic DNA fingerprinting. Biochemical and Biophysical Research Communications 197, 1382-1387.
Versalovic, J., Thearith, K. & Lupski, J.R. (1991). Distribution of repetitive DNA sequences in eubacteria and applications to fingerprinting of bacteria genomes. Nucleic Acids Research 19, 6823-6831.
Vogel, L., Jones, G., Triep, S., Koek, A. & Dijkshoorn, L. (1999). RAPD typing of Klebsiella pneumoniae, Klebsiella oxytoca, Serratia marcescens and Pseudomonas aeruginosa isolates using standardized reagents. Clinical Microbiology and Infection 5, 270-276.
Webster, C.A. & Towner, K.J. (2000). Use of RAPD-ALF analysis for investigating the frequency of bacterial cross-transmission in an adult intensive care unit. Journal of Hospital Infection 44, 254-260.
Webster, C.A., Towner, K.J., Humphreys, H., Ehrenstein, B., Hartung, D. & Grundmann, H. (1996). Comparison of rapid automated laser fluorescence analysis of DNA fingerprints with four other computer-assisted approaches for studying relationships between Acinetobacter baumannii isolates. Journal of Medical Microbiology 44, 185-194.
Welsh, J. & McClelland, M. (1990). Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Research 18, 7213-7218.
Williams, J.G.K., Kubelik, A.R., Livak, K.J., Rafalski, J.A. & Tingey, S.V. (1990). DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Research 18, 6531-6535.
Woods, C.R., Versalovic, J., Koeuth, T. & Lupski, J.R. (1992). Analysis of relationships among isolates of Citrobacter diversus by using DNA fingerprints generated by repetitive sequence-based primers in the polymerase chain reaction. Journal of Clinical Microbiology 30, 2921-2929.
7
Analysis of Microbial Genomic Macrorestriction Patterns by Pulsed-Field Gel Electrophoresis (PFGE) Typing
Marc J. Struelens 1,2, Raf De Ryck 1 and Ariane Deplano 1
1 Centre for Molecular Diagnostic Microbiology, Department of Microbiology, Hôpital Erasme and 2 Infectious Diseases Epidemiology Unit, School of Public Health, Université Libre de Bruxelles, 808 route de Lennik, 1070-Bruxelles, Belgium
CONTENTS
7.1 INTRODUCTION
7.2 PRINCIPLES OF PFGE
7.3 PRINCIPLES OF GENOMIC MACRORESTRICTION ANALYSIS
7.4 METHODOLOGY
    A. DNA preparation
    B. DNA macrorestriction digestion
    C. PFGE
    D. Molecular size standards
    E. Troubleshooting
7.5 ANALYSIS AND INTERPRETATION OF DATA
    A. Visual analysis
    B. Computer-assisted data analysis
7.6 APPLICATIONS
7.7 CONCLUSION AND PERSPECTIVES
REFERENCES

7.1 INTRODUCTION
Macrorestriction analysis of microbial genomes resolved by pulsed-field gel electrophoresis (PFGE) emerged during the 1990s as the method of choice for studying the molecular epidemiology of bacterial pathogens, at least at the level of outbreak studies and hospital epidemiology (Tenover et al., 1997; Struelens, 1998). Macrorestriction fingerprinting is a high resolution typing technique derived during the late 1980s (Grothues et al., 1988) from the combination of two techniques: chromosomal restriction fragment pattern analysis using low-frequency cutting enzymes (typically with fewer than 30 cleavage sites per genome) and PFGE, a modification of agarose gel electrophoresis initially introduced in 1984 for providing electrophoretic karyotypes of yeast (Schwartz & Cantor, 1984) and other eukaryotic microorganisms. PFGE is based on gel electrophoresis in which the electric field periodically changes direction and/or intensity. In contrast to unidirectional electrophoresis, which cannot separate DNA molecules larger than about 50 kb in size due to molecular trapping in the gel, PFGE can separate DNA molecules as large as 12 Mb.

Fig. 7.1. Geometry of electrodes in commonly used PFGE equipment: A, FIGE; B, RFE; C, CHEF.

7.2 PRINCIPLES OF PFGE
A number of physical models have been proposed to predict the molecular separation of DNA in agarose gels by PFGE (Birren et al., 1988; Kozulik, 1995). In response to changes in the orientation of the electric field, large DNA molecules migrate through the agarose matrix in a zig-zagging motion, which has been visualised experimentally using fluorescence microscopy and labelled phage DNA (Gurrieri et al., 1990). DNA molecules stretch out linearly in the direction of the electric field. When the field changes direction, the DNA molecules initially adopt a partially relaxed conformation, and then form multiple kinks in the direction of the new field until the predominant kink unfolds and the DNA molecule elongates again to migrate in the new direction. The larger the molecule, the longer the time required for relaxation and reorientation of its leading end. Thus, the time interval between changes of field direction, also called the pulse time, will primarily determine the size window for separation of DNA molecules by PFGE (Birren et al., 1988).

During the 1980s, different instruments were developed for PFGE, using a variety of geometries for placing the electrodes around the agarose gel. Fig. 7.1 schematically illustrates examples of systems commonly used for bacterial typing. The simplest and earliest instruments are based on field inversion gel electrophoresis (FIGE), in which a conventional gel chamber is exposed to periodic alternations in polarity of the electrodes. Net forward migration is obtained by increasing the ratio of forward to backward pulse times to 3:1. Zero integrated field electrophoresis is a modification of FIGE in which the electric field intensity is slightly higher forward than backward, thereby producing higher resolution, but at the cost of longer running times.

More recently, FIGE has been adapted to capillary electrophoresis systems (PFCE or pulsed-field capillary electrophoresis; Kim & Morris, 1995). Small amounts of DNA sample are injected into capillary tubing containing a sieving matrix composed of long polymer chains. Under exposure to high frequency modulation of direct current and alternating voltage current, DNA molecules of up to 1.6 Mb in size can be separated within minutes, in contrast to the several hours or days required for PFGE (Kim & Morris, 1995). Another rapid method for separation of macrorestriction fragments has been described recently, based on ultra-sensitive flow cytometry (Kim et al., 1999).

Other PFGE instruments that allow co-linear separation of large DNA molecules include systems in which the electrode sets rotate around the gel chamber (RFE or rotating field electrophoresis; Fig. 7.1). Conversely, other types of apparatus rotate the gel periodically in a stationary field (RGE or rotating gel electrophoresis). However, due to mechanical constraints, these systems are fragile and do not allow very short pulse times.
The most popular PFGE system used for bacterial typing is the contour-clamped homogeneous electric field (CHEF) apparatus, in which a hexagonal array of electrodes periodically alternates uniform fields with a reorientation angle of 120°. CHEF instruments are available commercially from Bio-Rad (CHEF DR-II and III), Pharmacia (Pulsaphor) and BRL (Hexafield). A more sophisticated commercial model of the CHEF electrophoresis system with programmable electrodes (CHEF Mapper; Bio-Rad) allows versatile variation of electric field angle and shape, which enables optimisation of resolution by molecular size range.

Several inter-dependent physical and chemical factors affect the separation of DNA molecules in PFGE. The most important parameter is the pulse time. CHEF electrophoresis performed using a single pulse time will produce a non-linear relationship between fragment size and migration distance. The longer the pulse time, the larger the size of the molecules that can be separated, but at the cost of lower resolution, i.e., separation of DNA molecules or fragments of similar size. Maximal resolution is seen just below the maximum size of non-separated bands, called the compression zone. Thus, use of the shortest pulse time which permits separation of the desired upper size range will provide optimal resolution. The use of progressive increases in pulse time, called pulse time ramping or switch time ramping, can increase the linearity of size-dependent velocity of DNA fragments, particularly by using an exponential time ramp. A typical ramp from 5 s to 40 s enables optimal separation of DNA fragments from 50 kb to 600 kb, while increasing the pulse time to 75 s extends the separation to 1 Mb with standard CHEF conditions (see below; Birren et al., 1988).

The DNA migration rate will increase with voltage. At a constant pulse time, an increase in voltage will mobilise larger molecules, but with a decrease in band sharpness. A voltage of 6 V/cm is used for the separation of molecules from 50 kb to 1.5 Mb in size, but should be reduced to separate larger molecules (Birren et al., 1988). The reorientation angle taken by alternating electric fields can be decreased from the normal 120° to 96-106° to separate molecules of >1 Mb in size, such as yeast chromosomes, more quickly (Birren et al., 1988).

The temperature and ionic strength of the electrophoresis buffer affect DNA velocity and resolution. At increased temperature, DNA migrates faster, but band resolution decreases. PFGE gels are normally run at temperatures between 12 and 15 °C. This temperature is maintained with the use of a buffer cooling and re-circulating system to prevent the generation of temperature gradients within the gel during the run.

The agarose type and concentration, as well as the buffer strength, affect DNA migration in PFGE in a similar way to conventional electrophoresis. Special PFGE-grade agarose with a high physical strength and low electro-endosmosis is recommended. An agarose concentration of 1% is normally used for the separation of DNA fragments ranging from 50 kb to 1 Mb, but this concentration should be decreased for larger size separation. Use of more concentrated agarose (1.2-1.6%) increases band sharpness and resolution, but at the cost of longer running times and a lower range of DNA size separation (Kim et al., 1999). Low ionic strength buffers (e.g., 0.25-0.5 x TAE or TBE) are preferred to reduce heat generation and shorten the run times (Birren et al., 1988).

7.3 PRINCIPLES OF GENOMIC MACRORESTRICTION ANALYSIS
Macrorestriction analysis consists of PFGE separation of a limited number (<30) of large (>50 kb) fragments of a bacterial chromosome digested with infrequently cleaving restriction endonucleases, also called "rare cutters". These enzymes are selected on the basis of the rarity of occurrence of their recognition sequence in the target genome. For example, the tetranucleotide CTAG is counter-selected and is therefore very rare in most bacterial genomes with high (>45%) G+C base composition. Endonucleases that include CTAG in their recognition sequence cleave such genomes less than once per 100 kb: e.g., XbaI (TCTAGA), SpeI (ACTAGT) and NheI (GCTAGC). Hexamer endonucleases with G+C recognition sequences are rare cutters of A+T rich prokaryotic genomes, and vice-versa (McClelland et al., 1987). Likewise, enzymes with eight and 10 nucleotide-long recognition sites are also used (e.g., NotI, SfiI). As a general rule, the following enzymes are useful macrorestriction cutters based on genomic G+C content: (i) <40% G+C: SmaI (CCCGGG), SacII (CCGCGG), RsrII (CGGWCCG), EagI (CGGCCG), NarI (GGCGCC), NaeI (GCCGGC); (ii) 40-50% G+C: AvrII (CCTAGG), NheI (GCTAGC), SfiI (GGCCN5GGCC); (iii) 50-65% G+C: XbaI (TCTAGA), SpeI (ACTAGT); (iv) >65% G+C: XbaI, SpeI, DraI (TTTAAA), SspI (AATATT).
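The dependence of cutting frequency on base composition can be illustrated with a rough calculation. The sketch below is not part of the original protocol: it assumes that bases occur independently with frequencies set only by the G+C content, so it ignores effects such as the counter-selection of CTAG, and the function names and the 2.8 Mb genome size are hypothetical choices for illustration.

```python
def site_probability(site: str, gc: float) -> float:
    """Probability that a random position starts `site`, assuming independent
    bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    freq = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in site:
        prob *= freq[base]
    return prob


def expected_sites(site: str, gc: float, genome_bp: int) -> float:
    """Expected number of recognition sites in a random genome of `genome_bp` bp."""
    return site_probability(site, gc) * genome_bp


def count_sites(sequence: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of `site` in a known sequence."""
    count, start = 0, sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


if __name__ == "__main__":
    genome_bp = 2_800_000  # hypothetical genome size
    for gc in (0.33, 0.50, 0.67):
        n = expected_sites("CCCGGG", gc, genome_bp)  # SmaI recognition sequence
        print(f"G+C {gc:.0%}: about {n:.0f} expected SmaI sites")
```

Under this simplified model, a G+C-rich hexamer such as the SmaI site is expected far less often in an A+T-rich genome than in a G+C-rich one, which is the reasoning behind the enzyme groupings above. Observed fragment numbers can differ considerably from such estimates.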
Examples of restriction endonucleases commonly used for macrorestriction and PFGE typing of important human pathogens are shown in Table 7.1.

Table 7.1. Infrequently cleaving endonucleases suitable for macrorestriction analysis of selected bacteria

Organism                        Enzymes (a)
Staphylococcus aureus           SmaI, SstII, CspI
Staphylococcus epidermidis      SmaI, SstII, KpnI
Enterococcus spp.               SmaI, ApaI
Streptococcus pneumoniae        SmaI, ApaI
Clostridium difficile           SmaI, SstII, NruI
Escherichia coli                XbaI, NotI, SfiI
Klebsiella pneumoniae           XbaI, AsnI
Enterobacter spp.               XbaI, SpeI
Serratia marcescens             XbaI
Citrobacter spp.                XbaI
Haemophilus influenzae          SmaI, RsrII
Bordetella pertussis            XbaI, AsnI, DraI
Neisseria meningitidis          NotI, BglII
Pseudomonas aeruginosa          SpeI, DraI, SspI, XbaI
Stenotrophomonas maltophilia    XbaI
Acinetobacter baumannii         SmaI, ApaI
Burkholderia spp.               SpeI
Campylobacter spp.              SmaI
Bacteroides spp.                NotI
Mycobacterium spp.              AseI, XbaI, DraI, AsnI
Legionella pneumophila          SfiI, NotI

(a) These enzymes generate between five and 50 chromosomal fragments.

7.4 METHODOLOGY

A. DNA preparation
Conventional DNA extraction methods in solution are unsuitable for macrorestriction analysis because mechanical shearing will degrade genomic DNA into fragments of several hundred kb that are inadequate for megabase-size restriction digestions. To obtain intact chromosomal DNA, it is necessary to protect the DNA by incorporating bacteria into ultra-pure, nuclease-free, low-melting temperature agarose blocks, plugs or beads. All DNA preparative steps are then conducted by exposing agarose-embedded cells to lytic solutions and enzymes.

It is very important to standardise the DNA concentration in the agarose plugs used for restriction and PFGE analysis to obtain comparable and informative fragment patterns. Indeed, insufficient DNA will produce faint or incomplete patterns, whereas excess DNA will generate smears and result in retarded mobility of fragments during PFGE. The DNA concentration can be estimated from the bacterial cell density, which should be measured with a densitometer or nephelometer device and adjusted to a standard value for a given species, usually between 1 and 5 x 10^9 cfu/ml.

The cell preparation will depend on the particular organism. Non-fastidious organisms are normally grown overnight in an appropriate broth medium with orbital shaking, whereas fastidious organisms are preferably harvested from colonies grown on an appropriate solid medium. Bacteria are suspended in EET buffer (100 mM EDTA, 10 mM EGTA, 10 mM Tris-HCl, pH 7.8), recovered at 4 °C by centrifugation, and then resuspended in SE buffer (75 mM NaCl, 25 mM EDTA, pH 7.8) to a density of 1-5 x 10^9 cfu/ml, depending on the organism. A simpler method, recommended by the manufacturer of a ready-to-use PFGE typing kit (GenePath; BioRad), is to prepare the bacterial suspension, centrifuge it, and then visually compare the size of the pellet obtained in a micro-tube with a sketch of an optimal pellet.

The bacterial suspension is briefly heated to 50 °C, mixed with an equal volume of 2% low-melting temperature agarose maintained at 56 °C, and then dispensed into plug molds. Molds are provided with PFGE equipment or can be custom-made. As the quality of the macrorestriction PFGE patterns depends on plug thickness and regularity of shape, thin plugs are preferable. A mold chamber of 6 x 10 x 0.5 mm produces a block which can be cut with a scalpel blade after DNA preparation and restriction into smaller 4 x 2 x 0.5 mm blocks that can be inserted into the slots in a PFGE gel. When set, the agarose plugs are removed, placed into tubes with 0.5 ml of lysis solution, and incubated at 56 °C for a minimum period of 2 h (preferably overnight) with gentle shaking.
The lysis solution used for Gram-negative bacteria contains 1% sodium lauroyl sarcosine (or sodium dodecyl sulphate), 1 mg/ml proteinase K (or pronase), and 0.5 M EDTA, pH 9.5. The detergent and proteolytic enzyme act together to remove cellular constituents, while the high EDTA concentration inhibits nuclease activity. Other microorganisms require additional cell-wall digesting enzyme treatments, including lysostaphin for staphylococci, lysozyme and mutanolysin for enterococci and streptococci, and zymolyase for yeasts. To reduce the incubation time, these cell-wall lytic enzymes can be mixed with the cell suspension immediately prior to incorporation in the agarose (Goering & Winters, 1992).

After cellular and protein lysis, the agarose blocks containing high molecular size DNA are extensively washed (at least four times for 30 min each) in TE buffer (10 mM Tris, 1 mM EDTA, pH 7.5) to remove proteolytic activity, detergent and excess EDTA. Inactivation of proteinase K by the addition of phenylmethylsulfonyl fluoride (which is very toxic) is sometimes recommended in the literature, but is not normally required if the washing procedure outlined above is followed. Agarose blocks containing intact DNA can be stored for several months in TE buffer at 4 °C, preferably with monthly changes of buffer.
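The standardisation of cell density described in this section comes down to simple dilution arithmetic. The sketch below is illustrative only: it assumes that the density of the starting suspension has already been converted to cfu/ml (the conversion from a densitometer reading is species-specific and is not covered here), and the function names are hypothetical.

```python
def dilution_volumes(measured_cfu_per_ml: float,
                     target_cfu_per_ml: float,
                     final_volume_ml: float) -> tuple[float, float]:
    """Return (suspension_ml, buffer_ml) needed to reach the target density.

    Uses C1 * V1 = C2 * V2; the suspension must be at least as dense as the target.
    """
    if measured_cfu_per_ml < target_cfu_per_ml:
        raise ValueError("suspension is too dilute; concentrate it by centrifugation")
    suspension_ml = final_volume_ml * target_cfu_per_ml / measured_cfu_per_ml
    return suspension_ml, final_volume_ml - suspension_ml


def plug_composition(cell_density_cfu_per_ml: float,
                     agarose_percent: float = 2.0) -> tuple[float, float]:
    """Density and agarose concentration after mixing equal volumes of
    cell suspension and molten agarose (both are halved)."""
    return cell_density_cfu_per_ml / 2, agarose_percent / 2


if __name__ == "__main__":
    cells_ml, buffer_ml = dilution_volumes(1e10, 2e9, final_volume_ml=1.0)
    print(f"mix {cells_ml:.2f} ml suspension with {buffer_ml:.2f} ml SE buffer")
    print(plug_composition(2e9))
```

Because the suspension is mixed with an equal volume of 2% agarose, the plug itself contains 1% agarose and half the cell density of the adjusted suspension.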
B. DNA macrorestriction digestion

Prior to digestion, plugs with intact DNA are cut into smaller portions (e.g., 2 x 4 x 0.5 mm) in a sterile plastic Petri dish placed over millimetre grid paper. Plugs must be equilibrated in a small volume (120 µl) of the appropriate restriction enzyme reaction buffer for 30 min at the optimal temperature for enzyme activity. A sufficient amount (20 U) of enzyme in buffer is added and mixed, and the plug is incubated for 4 h at the reaction temperature recommended by the manufacturer. Prior to loading in the gel slots, the agarose blocks containing digested DNA are washed in TE buffer and pre-cooled to 4 °C together with the PFGE gel. Plugs are carefully inserted in the gel slots and sealed with 1% low-melting temperature agarose.

C. PFGE
The optimal running parameters used for separation of macrorestriction fragments will depend on the organism, enzyme and PFGE apparatus used. To determine these conditions, it is possible either to adapt previously published protocols or to select parameters empirically, based on the predicted size of the restriction fragments. Some systems (e.g., the CHEF Mapper) provide algorithms that define optimal separation conditions based on the molecular size range of interest. Using CHEF electrophoresis, the following conditions are a good starting point for the analysis of bacterial macrorestriction fragments from 50-800 kb in size: pulse time ramp from 5 s to 50 s, at 6 V/cm, in a 1% gel, run at 13 °C for 24 h. Based on the results obtained, modifications in pulse time and/or run time can be made to adjust the resolution for smaller or larger DNA fragments.

DNA fragments are normally visualised after staining the gel for 30 min in ethidium bromide solution (0.5 µg/ml) and destaining in distilled water, followed by photography or digital image capture by a CCD camera under UV light. Alternatively, Southern blotting of DNA fragments on a nylon membrane and hybridisation of membranes with labelled DNA probes can be performed with PFGE gels (Römling et al., 1994). Because DNA transfer is less efficient than with conventional agarose gels, it is best to depurinate the DNA in 0.25 M HCl or to expose the DNA to UV-nicking before denaturation. Transfer is preferably performed with vacuum-blotting equipment.

D. Molecular size standards
The major limitation of PFGE, like any electrophoretic typing system, is the level of reproducibility achievable between patterns obtained within and between gels. To ensure adequate normalisation of patterns and accurate fragment size estimates, suitable molecular size markers should be included in at least every fifth lane. Suitable size markers should span the entire size range of chromosomal fragments at regular intervals to enable accurate standardisation of migration distance and correction of intra-gel distortions or "smiling". The usual size range required for macrorestriction analysis of bacterial genomes is from 20 kb to 1 Mb. Commercially available ladders for PFGE include polymers of λ phage DNA (monomer size of 48.5 kb) or plasmid pBR328, which can be used to cover the scale up to 200-300 kb, and Saccharomyces cerevisiae chromosomes, which extend from 200 kb to >1.6 Mb. However, these markers tend to produce broader bands than bacterial chromosomal fragments, and are not easily standardised to contain equivalent DNA concentrations. A more advantageous molecular size standard can be prepared in-house by performing a chromosomal digest of reference bacterial strains that have been accurately mapped. As an example, a SmaI digest of Staphylococcus aureus strain NCTC 8325/3 DNA, which yields 16 evenly distributed fragments of 10-673 kb, can be used. Alternatively, Escherichia coli strain MG1655 DNA digested with NotI, or Pseudomonas aeruginosa DNA digested with SpeI (Römling et al., 1994), can be used as size standards.
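As a small illustration of how a concatemer ladder covers a size range, the sketch below lists the rungs of a λ ladder from the 48.5 kb monomer size given above and finds the two rungs that bracket a fragment of interest. The function names are hypothetical, and the number of rungs actually resolved on a gel is an assumption that depends on the ladder preparation and run conditions.

```python
from bisect import bisect_left

LAMBDA_MONOMER_KB = 48.5  # monomer size of phage lambda DNA, as given in the text


def lambda_ladder(n_rungs: int) -> list[float]:
    """Sizes (kb) of the first `n_rungs` concatemers: 48.5, 97.0, 145.5, ..."""
    return [LAMBDA_MONOMER_KB * n for n in range(1, n_rungs + 1)]


def bracketing_markers(fragment_kb: float, ladder: list[float]):
    """Return the (lower, upper) ladder rungs around a fragment size.

    Either element is None when the fragment lies outside the ladder,
    which means the size would have to be extrapolated rather than interpolated.
    """
    i = bisect_left(ladder, fragment_kb)
    lower = ladder[i - 1] if i > 0 else None
    upper = ladder[i] if i < len(ladder) else None
    return lower, upper


if __name__ == "__main__":
    ladder = lambda_ladder(6)
    print(ladder)
    print(bracketing_markers(120.0, ladder))
    print(bracketing_markers(673.0, ladder))  # beyond the top rung of this ladder
```

A fragment with no upper bracketing rung, such as the largest fragment in the last call, falls outside the calibrated range, which is one reason why markers should span the whole size range of the digest.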
E. Troubleshooting
Faint bands or a broad smear may indicate DNA degradation, which can occur during bacterial cell harvesting through the activity of endogenous nucleases. This occurs frequently with strains of Clostridium spp., and occasionally with strains of Campylobacter jejuni or P. aeruginosa. Treatment of the bacterial suspension in 10% formalin for 1 h, followed by three washes in saline, inhibits DNase activity (Gibson et al., 1994). Another recently observed cause of DNA degradation with P. aeruginosa and Streptomyces spp. (Römling & Tümmler, 2000) is cleavage by reactive Tris radicals in the electrophoresis buffer, which can be neutralised by adding 50 µM thiourea to the buffer.

Broad bands may indicate incomplete digestion of DNA (if one heavy broad band is apparent at the top of the lane) or DNA overload in the plugs. Bands can become distorted if the plugs are unevenly cut, only partially inserted into the gel, or damaged during insertion, or if the bacterial suspension is not homogeneous prior to agarose embedding. Care should be taken to avoid bacterial clumps being trapped in the agarose. Lanes can become distorted if the gel is not fastened securely during electrophoresis, if air bubbles are trapped underneath and generate localised heating in the gel, or if one electrode burns out and distorts the geometry of the electric field. To avoid gel displacement, a special adhesive support plate for casting gels is available from some PFGE equipment manufacturers (e.g., BioRad).
7.5 ANALYSIS AND INTERPRETATION OF DATA

A. Visual analysis
When PFGE typing is applied to the comparison of a limited number of isolates from a presumptive outbreak, the analysis can be performed on one gel and patterns are then compared directly with each other. Strains of the same species share a similar genome size, G+C content and codon usage. Therefore, macrorestriction patterns of unrelated isolates exhibit a similar number of chromosomal macrorestriction fragments in the same size range, but only a small minority of fragments will be of the same size. In contrast, clonally-related isolates that arise from the same contamination source, or from the same infection chain, and which share a recent common ancestor, produce either indistinguishable or closely related patterns. The question arises as to how minor pattern differences should be interpreted in terms of genetic relatedness, and this is related to the confidence with which an epidemiological association between isolates can be inferred (Tenover et al., 1995; Struelens et al., 1996).

Minor differences arise naturally in a clonal lineage over the course of infection in individual patients or during epidemic spread. These differences are due to mutations or larger genetic rearrangements, such as insertion or deletion of mobile genetic elements, including insertion sequences, transposons, plasmids and bacteriophages. These changes occur at frequencies that vary according to the genomic plasticity of each bacterial species (or even lineage), and which also depend on the environment and the number of replication cycles undergone over time (Struelens et al., 1996). For example, the following empirical observations were made in two long-term monitoring studies of P. aeruginosa isolates colonising unrelated cystic fibrosis patients (Grothues & Tümmler, 1991; Struelens et al., 1993c).
Genomic macrorestriction analysis revealed that clonally related isolates from individual patients or siblings exhibited up to four XbaI and six DraI fragment differences, equivalent to 80% fragment matching. In contrast, isolates from unrelated patients differed by more than 10 XbaI or 20 DraI fragments, and patterns were less than 70% similar. Likewise, epidemiologically related strains of methicillin-resistant S. aureus were found to exhibit one to three fragment differences in their SacII profiles, at >85% similarity, over the course of a hospital outbreak lasting 3 years (Struelens et al., 1992a).

In keeping with these empirical findings and theoretical biological considerations, Tenover and colleagues (Tenover et al., 1997) have proposed interpretative criteria for macrorestriction patterns from small sets of isolates related to putative outbreaks of bacterial infection (Table 7.2). These criteria, which are widely used, are also consistent with typing guidelines proposed by the European Study Group on Epidemiological Markers (Struelens et al., 1996). Fig. 7.2 illustrates the typical differences arising from a single mutation or rearrangement in one fragment of a "prototype" epidemic strain. This central reference pattern is defined by its modal frequency among the patterns exhibited by a set of outbreak-related isolates. The situation becomes more complicated during a prolonged outbreak, where the "epidemic type" may progressively shift in frequency due to genetic drift and successive waves of spread. These criteria also need to be applied with caution in settings where only a limited number of widespread endemic clones is circulating. In such a setting, even genotypically closely related isolates may not be part of the same outbreak.

Fig. 7.2. Schematic diagram of typical variations in PFGE patterns of clonally-related isolates following the occurrence of single genetic events in the epidemic strain (at position of fragment labelled with asterisk). Lane A, epidemic strain pattern; B, mutation with gain of restriction site (three fragment differences); C, mutation with loss of restriction site (three fragment differences); D, insertion (two fragment differences); E, deletion (two fragment differences). All patterns are considered subtypes (A-E) of PFGE epidemic type 1 and would be labelled 1A to 1E (adapted from Tenover et al., 1995).

Table 7.2. Proposed criteria for interpreting PFGE patterns of small sets of isolates (n<30) related to a putative outbreak (adapted from Tenover et al., 1997)

Fragment differences    Genetic        Category of            Epidemiological
vs. epidemic strain     differences    genetic relatedness    interpretation
None                    0              Indistinguishable      Part of the outbreak
2-3                     1              Closely related        Probably part of the outbreak
4-6                     2              Possibly related       Possibly part of the outbreak
≥7                      ≥3             Different              Not part of the outbreak
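The criteria of Table 7.2 can be written as a simple lookup, sketched below. This is only an illustration of the table, not a substitute for epidemiological judgement. One point is an assumption: the table does not list a single fragment difference, and the sketch groups it with the "closely related" category. The function name is hypothetical.

```python
def classify_pattern(fragment_differences: int) -> tuple[str, str]:
    """Map the number of fragment differences from the epidemic strain pattern
    to (category of genetic relatedness, epidemiological interpretation)."""
    if fragment_differences < 0:
        raise ValueError("number of fragment differences cannot be negative")
    if fragment_differences == 0:
        return "Indistinguishable", "Part of the outbreak"
    if fragment_differences <= 3:  # table lists 2-3; a single difference is assumed to fall here
        return "Closely related", "Probably part of the outbreak"
    if fragment_differences <= 6:
        return "Possibly related", "Possibly part of the outbreak"
    return "Different", "Not part of the outbreak"


if __name__ == "__main__":
    for n in (0, 3, 5, 9):
        print(n, classify_pattern(n))
```

As noted in the text, such thresholds are intended for small sets of isolates from a putative outbreak and should be applied with caution during prolonged outbreaks or where a few endemic clones are widespread.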
The classification of PFGE patterns into types that are closely or possibly related is usually achieved by attributing a letter or numeral to indicate a PFGE type, with a letter or numeral suffix to indicate a subtype, e.g., type A1, A2 etc. (Fig. 7.2; Struelens et al., 1992a; Tenover et al., 1995).

B. Computer-assisted data analysis
Digitisation of PFGE fingerprints, together with computer-assisted normalisation and introduction of data into a relational database, is necessary to analyse patterns from large series of isolates compared on multiple gels. Computer-assisted analysis allows systematic quality control of patterns and rapid identification of identical or closely related patterns in a large database of several hundred patterns. Quantitative analysis of pattern similarity in large databases, assembled following multicentre surveys or long term surveillance programmes, can then be performed.

The first step involves the digital acquisition of the gel image. This can be achieved by direct gel image capture with a CCD (charge coupled device) camera or by scanning a gel photograph with a flatbed document scanner. Although both systems offer similar resolution and dynamic range (100-300 dpi with 256 grey scale levels is sufficient), the CCD camera is more convenient and approximately 10-fold less costly in terms of consumables. For example, our laboratory uses a 30-well PFGE gel with a migration distance of 10 cm for separating S. aureus NCTC 8325/3 SmaI fragments spanning 10-674 kb. A CCD camera with a 587 x 768 pixel resolution will capture the informative band track with a vertical resolution of 380 data points. Digitised images are saved in TIFF (tagged image file format) files, which can then be imported into fingerprint analysis software for personal computers. Several specialised software packages are commercially available, including GelCompar and BioNumerics (Applied Maths, Kortrijk, Belgium), Taxotron (Taxolab, Institut Pasteur, Paris, France), and the BioImage system (BioImage, Ann Arbor, MI, USA). These programs use different approaches for comparing electrophoretic patterns (Gerner-Smidt et al., 1998).
Some programs allow processing of a crude gel image, including removal of artefacts due to non-specific staining by background subtraction algorithms, optimisation of signal contrast over the 256 grey levels, and rectification of distorted bands/tracks by interpolation within each lane and each gel. Pattern comparison can proceed in two different ways: either by normalising patterns between gels and performing pattern comparison, or by identifying each band fragment and performing band matching analysis, based either on their normalised migration distance or on their calculated molecular size. In the GelCompar and BioNumerics programs, a standard reference pattern in one gel is used as a reference system to normalise all subsequent gels by interpolation of the migration distances in tracks with the same reference molecular size standards in every gel. In the BioImage and Taxotron programs, the molecular size of each fragment is calculated from the size standards included in each gel.

As described in detail elsewhere in this book, the comparison of patterns can be based on two types of similarity coefficient, i.e., densitometric curve comparisons based on the Pearson correlation coefficient, or fragment matching coefficients of similarity, such as the Dice coefficient. When using the Pearson coefficient of similarity, a bias is introduced in the comparison of PFGE patterns by factors such as the intensity of band staining (giving a higher weight to large DNA fragments), band shape and gel background, all of which are taken into account in the coefficient calculation. This coefficient is, however, the simplest to use. It is also more forgiving than band matching coefficients when analysing PFGE patterns containing a very large number of similar size fragments, or many faintly staining low molecular size fragments. However, the Dice coefficient, calculated as the number of matching size fragments multiplied by two and divided by the total number of fragments in a pair of patterns, is the preferred coefficient for comparing RFLP or PFGE patterns (Struelens et al., 1996). It is also known as the coefficient of Nei and Li when it is applied to RFLP data. Depending on the software, it will be calculated based on either normalised patterns with band identification, or on a table of fragments of calculated size.

Although all software programs provide automatic band identification, the user must always verify that faint bands and double bands are detected and that artefacts are eliminated. Calculation of the molecular size of fragments by interpolation is preferably performed by using the spline regression method, because the relationship between fragment size and migration distance is neither linear nor log-linear for PFGE gels. The precision of fragment size determination depends on the quality of the gels and on the molecular size standards used for normalisation.
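The Dice coefficient defined above can be sketched in a few lines for two tables of calculated fragment sizes. This is a minimal illustration and not the algorithm of any of the software packages mentioned: the greedy one-to-one matching of sorted fragments and the default 5% size tolerance are assumptions made for the example, and the function names are hypothetical.

```python
def count_matches(sizes_a: list[float], sizes_b: list[float],
                  tolerance: float = 0.05) -> int:
    """Count one-to-one fragment matches between two patterns.

    Two fragments match when their sizes differ by no more than `tolerance`
    (as a fraction) of the larger fragment. Fragments are paired greedily
    after sorting, so each fragment is used at most once.
    """
    a, b = sorted(sizes_a), sorted(sizes_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def dice_coefficient(sizes_a: list[float], sizes_b: list[float],
                     tolerance: float = 0.05) -> float:
    """Dice coefficient: 2 x matching fragments / total fragments in both patterns."""
    total = len(sizes_a) + len(sizes_b)
    if total == 0:
        raise ValueError("both patterns are empty")
    return 2 * count_matches(sizes_a, sizes_b, tolerance) / total


if __name__ == "__main__":
    epidemic = [674, 361, 262, 208, 135, 80]   # hypothetical fragment sizes in kb
    variant = [674, 361, 300, 208, 135, 80]    # one fragment shifted in size
    print(f"Dice similarity: {dice_coefficient(epidemic, variant):.1%}")
```

The result depends directly on the tolerance chosen, which is why the tolerance setting should be reported alongside any similarity values.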
The percent size error can be limited to 5-7% of the fragment size or 0.5-1% of the total track length. These limits can be used for quality control of gels and for setting the tolerance limit in fragment matching comparisons. It is useful to include an internal control strain in every gel in addition to the molecular size standards that should be loaded on each side and in every fifth well of the gel. This internal control strain allows quality control of the reproducibility of the gel. Its pattern should be >96% similar between gels by the Pearson coefficient and 100% similar by the Dice coefficient with a 1% track length tolerance level. Any gel with lower reproducibility values should not be included in the database.

Phenetic clustering of PFGE profiles can be performed by constructing similarity trees from the triangular matrix of pairwise similarity coefficients between isolates of interest in the database (Fig. 7.3). The most neutral and commonly used algorithm is the unweighted pair group method using arithmetic averages (UPGMA; Struelens et al., 1996). Another method, which aggregates similar profiles based on minimising inter-group branch length, is the neighbour-joining method. In these trees, the horizontal branch length measures the distance between isolates or groups of isolates (Fig. 7.4). Although such dendrograms facilitate visualisation of hierarchical groups of isolates based on macrorestriction polymorphisms, they should not be inferred to depict phylogenetic structure. In general, the correlation of PFGE pattern similarity with genomic relatedness decreases rapidly below 70% similarity values although, under certain conditions (e.g., use of multiple rare cutters), PFGE can provide valuable taxonomic depth beyond clonal lineages to the genus and species levels (Grothues & Tümmler, 1991).

Fig. 7.3. CHEF gel and matrix of Dice similarity coefficients (determined with BioNumerics software using a band matching tolerance level set at 0.8% of lane length) for SmaI patterns of methicillin-resistant S. aureus strains. Groups of closely-related subtypes (1a-f; 2a-c) and unrelated types (3 and 4) are shown.

Fig. 7.4. Dendrograms of SmaI pattern similarity calculated with the Dice coefficient for the methicillin-resistant S. aureus strains typed in Fig. 7.3, constructed with the UPGMA (left) and neighbor-joining (right) clustering methods.

7.6 APPLICATIONS
PFGE analysis of microbial genomes has been applied in a variety of studies, including the physical mapping of bacterial chromosomes, the study of genomic evolution of bacterial clones in a specific habitat, and taxonomic studies from the genus to pathovar level. In molecular epidemiology, PFGE has been used to compare macrorestriction patterns of >120 bacterial species, belonging to 45 genera, and >10 species of yeasts. Moreover, PFGE has been used for typing yeasts, filamentous fungi and protozoa based on their electrophoretic karyotype (Voss et al., 1995). A Medline search in mid-2000 yielded 1350 publications on "PFGE and bacterial typing", thereby illustrating the popularity of this technique. In addition to its broad range of applicability with minor technical modifications, PFGE analysis of macrorestriction patterns has been found to perform with optimal discriminatory power (95-100%) for most bacterial species so far evaluated with large collections of unrelated isolates (Tenover et al., 1994; Grundmann et al., 1995; Struelens et al., 1996). This high resolving power is related to the fact that macrorestriction patterns scan >90% of the chromosome for large rearrangements (affecting >5% of fragment length) and approximately 0.05% of the genome for point mutations affecting restriction sites. In comparison with other phenotypic and genotypic markers, PFGE typing shows equal or superior epidemiological concordance and discrimination for many pathogens, including S. aureus (Struelens et al., 1993a; Kumari et al., 1997), Enterobacteriaceae (Arbeit et al., 1990; Gori et al., 1996), P. aeruginosa (Grundmann et al., 1995; Kersulyte et al., 1995), Acinetobacter baumannii (Seifert & Gerner-Smidt, 1995) and Legionella pneumophila (Struelens et al., 1992b; Fry et al., 1999). These characteristics explain why PFGE is one of the current methods of choice for typing most nosocomial and many community-acquired bacterial pathogens (Tenover et al., 1997; Struelens, 1998).
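Discriminatory power in typing studies is commonly expressed as Simpson's index of diversity in the formulation of Hunter and Gaston, i.e., the probability that two unrelated isolates sampled from the test population are assigned to different types. Assuming that this is the measure behind the percentages quoted above (an assumption, since the text does not name the index), it can be sketched as follows; the function name and the example type assignments are hypothetical.

```python
from collections import Counter


def discriminatory_index(types):
    """Simpson's index of diversity (Hunter-Gaston formulation).

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where N is the number of isolates and n_j the number of isolates
    assigned to type j.
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(types).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical PFGE types of ten unrelated isolates: only two share a type.
pfge_types = ["A", "A", "B", "C", "D", "E", "F", "G", "H", "I"]
print(round(discriminatory_index(pfge_types), 3))
```

With ten isolates of which two share a type, the index is 1 - 2/90, i.e., about 98%, which would fall within the range quoted for PFGE.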
The commonest application of PFGE typing is comparative typing in local epidemiological investigations of suspected outbreaks of infection (De Gheldre et al., 1997; Tenover et al., 1997; Struelens, 1998; Struelens et al., 1998). This high resolution method can also provide useful insights into the clonality of acute and chronic infections (Arbeit et al., 1990; 1993; Struelens et al., 1993b, c; Hakim et al., 1998). It can also assist in the differential diagnosis of relapse and re-infection, and in ascertaining the portal of entry of systemic infection by clonal delineation of isolates recovered from multiple samples in individual patients (Arbeit et al., 1993; Wendt et al., 1999). Macrorestriction analysis performed with standardised protocols and computer-assisted normalised databases has been applied successfully to both local surveillance of nosocomial infection (Hartstein et al., 1997; Lemaître et al., 1998) and national and international surveys of the emergence, molecular evolution and geographical spread of epidemic clones of methicillin-resistant S. aureus (Roman et al., 1997; Deplano et al., 2000), Streptococcus pneumoniae (Lefevre et al., 1993; Ip et al., 1999), Salmonella enterica Typhimurium (Tassios et al., 1999), Neisseria meningitidis (Van Looveren et al., 1998) and P. aeruginosa (Tassios et al., 1998). However, efforts aimed at providing an international consensus on standard PFGE protocols for library typing of major epidemic pathogens such as MRSA have been hampered by the difficulties that many laboratories have in obtaining reproducible results, even when using ready-made reagents from a single manufacturer (Van Belkum et al., 1998). Progress has nevertheless been achieved in the USA, where 'PulseNet', coordinated by the Centers for Disease Control (CDC), enables standardised PFGE typing of Shiga-toxin-producing E. coli and salmonellae to be used by a network of public health laboratories (www.cdc.gov/ncidod/dbmd/pulsenet/pulsenet.htm). Data can be compared directly in the central database after transfer over the Internet. Similar procedures are now being piloted in Europe, e.g., by the HARMONY network (www.phls.co.uk/international/harmony.htm).

7.7 CONCLUSION AND PERSPECTIVES
Genomic macrorestriction analysis by PFGE (PFGE typing) has emerged over the past decade as the typing method of choice for most bacteria causing nosocomial infections, and for many community-acquired pathogens as well. Although technical protocols have been simplified over recent years and the duration of some of the incubation periods has been shortened, the technique remains technically demanding and still requires a turnaround time of at least 72 h. Further automation, possibly by means of development of PFCE, would enhance the practicability, throughput and speed of the analysis. Likewise, further international agreement needs to be achieved to develop standard protocols for data generation, quality control procedures, database structures and analytical criteria to enable the full use of PFGE typing for the international surveillance of global pathogens.

REFERENCES
Arbeit, R.D., Arthur, M., Dunn, R., Kim, C., Selander, R.K. & Goldstein, R. (1990). Resolution of recent evolutionary divergence among Escherichia coli from related lineages: the application of pulsed field electrophoresis to molecular epidemiology. Journal of Infectious Diseases 161, 230-235.
Arbeit, R.D., Slutsky, A., Barber, T.W., Maslow, J.N., Niemczyk, S., Falkinham, J.O., O'Connor, G.T. & von Reyn, C.F. (1993). Genetic diversity among strains of Mycobacterium avium causing monoclonal and polyclonal bacteremia in patients with AIDS. Journal of Infectious Diseases 167, 1384-1390.
Birren, B.B., Lai, E., Clark, S.M., Leroy, H. & Simon, M.I. (1988). Optimised conditions for pulsed field gel electrophoretic separations of DNA. Nucleic Acids Research 16, 7563-7583.
De Gheldre, Y., Maes, N., Rost, F., De Ryck, R., Clevenbergh, P., Vincent, J.L. & Struelens, M.J. (1997). Molecular epidemiology of an outbreak of multidrug-resistant Enterobacter aerogenes infections and in vivo emergence of imipenem resistance. Journal of Clinical Microbiology 35, 152-160.
Deplano, A., Witte, W., Van Leeuwen, W.J., Brun, Y. & Struelens, M.J. (2000). Clonal dissemination of epidemic methicillin-resistant Staphylococcus aureus in Belgium and neighbouring countries. Clinical Microbiology and Infection 6, 1-7.
Fry, N.K., Alexiou-Daniel, S., Bangsborg, J.M., Sverker, B., Pastoris, M.C., Etienne, J., Forsblom, B., Gaia, V., Helbig, J.H., Lindsay, D., Lück, C., Pelaz, C., Uldum, S.A. & Harrison, T.G. (1999). A multicenter evaluation of genotypic methods for epidemiologic typing of Legionella pneumophila serogroup 1: results of a pan-European study. Clinical Microbiology and Infection 5, 462-477.
Gerner-Smidt, P., Graves, L.M., Hunter, S. & Swaminathan, B. (1998). Computerized analysis of restriction fragment length polymorphism patterns: comparative evaluation of two commercial software packages. Journal of Clinical Microbiology 36, 1318-1323.
Gibson, J.R., Sutherland, K. & Owen, R.J. (1994). Inhibition of DNAse activity in PFGE analysis of DNA from Campylobacter jejuni. Letters in Applied Microbiology 19, 357-358.
Goering, R.V. & Winters, M.A. (1992). Rapid method for epidemiological evaluation of Gram-positive cocci by field inversion gel electrophoresis. Journal of Clinical Microbiology 30, 577-580.
Gori, A., Espinasse, F., Deplano, A., Nonhoff, C., Nicolas, M.H. & Struelens, M.J. (1996). Comparison of pulsed-field gel electrophoresis and randomly amplified DNA polymorphism analysis for typing extended-spectrum-β-lactamase-producing Klebsiella pneumoniae. Journal of Clinical Microbiology 34, 2448-2453.
Grothues, D. & Tümmler, B. (1991). New approaches in genome analysis by pulsed-field gel electrophoresis: application to the analysis of Pseudomonas species. Molecular Microbiology 5, 2763-2776.
Grothues, D., Koopmann, U., Von der Hardt, H. & Tümmler, B. (1988). Genome fingerprinting of Pseudomonas aeruginosa indicates colonization of cystic fibrosis siblings with closely related strains. Journal of Clinical Microbiology 26, 1973-1977.
Grundmann, H., Schneider, C., Hartung, D., Daschner, F.D. & Pitt, T.L. (1995). Discriminatory power of three DNA-based typing techniques for Pseudomonas aeruginosa. Journal of Clinical Microbiology 33, 528-534.
Gurrieri, S., Rizzarelli, E., Beach, D. & Bustamante, C. (1990). Imaging of kinked configurations of DNA molecules undergoing orthogonal field alternating gel electrophoresis by fluorescence microscopy. Biochemistry 29, 3396-3401.
Hakim, A., Deplano, A., Maes, N., Kentos, A., Rossi, C. & Struelens, M.J. (1998). Polyclonal coagulase-negative staphylococcal catheter-related bacteremia documented by molecular identification and typing. Clinical Microbiology and Infection 5, 224-227.
Hartstein, A.I., LeMonte, A.M. & Iwamoto, P.K.L. (1997). DNA typing and control of methicillin-resistant Staphylococcus aureus at two affiliated hospitals. Infection Control and Hospital Epidemiology 18, 42-48.
Ip, M., Lyon, D.J., Yung, R.W.H., Chan, C. & Cheng, A.F.B. (1999). Evidence of clonal dissemination of multidrug-resistant Streptococcus pneumoniae in Hong Kong. Journal of Clinical Microbiology 37, 2834-2839.
Kersulyte, D., Struelens, M., Deplano, A. & Berg, D.E. (1995). Comparison of arbitrarily primed PCR and macrorestriction (pulsed-field gel electrophoresis) typing of Pseudomonas aeruginosa strains from cystic fibrosis patients. Journal of Clinical Microbiology 33, 2216-2219.
Kim, Y. & Morris, M.D. (1995). Rapid pulsed field capillary electrophoretic separation of megabase nucleic acids. Analytical Chemistry 67, 784-786.
Kim, Y., Jett, J.H., Larson, E.J., Penttila, J.R., Marrone, B.L. & Keller, R.A. (1999). Bacterial fingerprinting by flow cytometry: bacterial species discrimination. Cytometry 36, 324-332.
Kozulik, B. (1995). Models of gel electrophoresis. Analytical Biochemistry 231, 1-12.
Kumari, D.N.P., Keer, V., Hawkey, P.M., Parnell, P., Joseph, N., Richardson, J.F. & Cookson, B. (1997). Comparison and application of ribosome spacer DNA amplicon polymorphisms and pulsed-field gel electrophoresis for differentiation of methicillin-resistant Staphylococcus aureus strains. Journal of Clinical Microbiology 35, 881-885.
Lefevre, J.C., Faucon, G., Sicard, A.M. & Gasc, M. (1993). DNA fingerprinting of Streptococcus pneumoniae strains by pulsed-field gel electrophoresis. Journal of Clinical Microbiology 31, 2724-2728.
Lemaître, N., Sougakoff, W., Masmoudi, A., Fievet, M.H., Bismuth, R. & Jarlier, V. (1998). Characterization of gentamicin-susceptible strains of methicillin-resistant Staphylococcus aureus involved in nosocomial spread. Journal of Clinical Microbiology 36, 81-85.
McClelland, M., Jones, R., Patel, Y. & Nelson, M. (1987). Restriction endonucleases for pulsed field mapping of bacterial genomes. Nucleic Acids Research 15, 5985-6005.
Moissac, Y.R., Sheryl, L.R. & Peppler, M.S. (1994). Use of pulsed-field gel electrophoresis for epidemiological study of Bordetella pertussis in a whooping cough outbreak. Journal of Clinical Microbiology 32, 398-402.
Roman, R.S., Smith, J., Walker, M., Byrne, S., Ramotar, K., Dyck, B., Kabani, A. & Nicolle, L.E. (1997). Rapid geographic spread of a methicillin-resistant Staphylococcus aureus strain. Clinical Infectious Diseases 25, 698-705.
Römling, U. & Tümmler, B. (2000). Achieving 100% typeability of Pseudomonas aeruginosa by pulsed-field gel electrophoresis. Journal of Clinical Microbiology 38, 464-465.
Römling, U., Wingender, J., Müller, H. & Tümmler, B. (1994). A major Pseudomonas aeruginosa clone common to patients and aquatic habitats. Applied and Environmental Microbiology 60, 1734-1738.
Schwartz, D.C. & Cantor, C.R. (1984). Separation of yeast chromosome-sized DNA by pulsed field gradient gel electrophoresis. Cell 37, 67-75.
Seifert, H. & Gerner-Smidt, P. (1995). Comparison of ribotyping and pulsed-field gel electrophoresis for molecular typing of Acinetobacter isolates. Journal of Clinical Microbiology 33, 1402-1407.
Struelens, M.J. (1998). Molecular epidemiologic typing systems of bacterial pathogens: current issues and perspectives. Memórias do Instituto Oswaldo Cruz 93, 581-585.
Struelens, M.J., Deplano, A., Godard, C., Maes, N. & Serruys, E. (1992a). Epidemiologic typing and delineation of genetic relatedness of methicillin-resistant Staphylococcus aureus by macrorestriction analysis of genomic DNA by using pulsed-field gel electrophoresis. Journal of Clinical Microbiology 30, 2599-2605.
Struelens, M.J., Maes, N., Rost, F., Deplano, A., Jacobs, F., Liesnard, C., Bornstein, N., Grimont, F., Lauwers, S., McIntyre, M.P. & Serruys, E. (1992b). Genotypic and phenotypic methods for the investigation of a nosocomial Legionella pneumophila outbreak and efficacy of control measures. Journal of Infectious Diseases 166, 22-30.
Struelens, M.J., Bax, R., Deplano, A., Quint, W.G.V. & Van Belkum, A. (1993a). Concordant clonal delineation of methicillin-resistant Staphylococcus aureus by macrorestriction analysis and polymerase chain reaction genome fingerprinting. Journal of Clinical Microbiology 31, 1964-1970.
Struelens, M.J., Rost, F., Deplano, A., Maas, A., Schwam, V., Serruys, E. & Cremer, M. (1993b). Pseudomonas aeruginosa and Enterobacteriaceae bacteremia after biliary endoscopy: an outbreak investigation using DNA macrorestriction analysis. American Journal of Medicine 95, 489-498.
Struelens, M.J., Schwam, V., Deplano, A. & Baran, D. (1993c). Genome macrorestriction analysis of diversity and variability of Pseudomonas aeruginosa strains infecting cystic fibrosis patients. Journal of Clinical Microbiology 31, 2320-2326.
Struelens, M.J. & Members of the European Study Group on Epidemiological Markers (ESGEM) of the European Society for Clinical Microbiology and Infectious Diseases (ESCMID) (1996). Consensus guidelines for appropriate use and evaluation of microbial epidemiologic typing systems. Clinical Microbiology and Infection 2, 2-11.
Struelens, M.J., De Gheldre, Y. & Deplano, A. (1998). Comparative and library epidemiological typing systems: outbreak investigations versus surveillance systems. Infection Control and Hospital Epidemiology 19, 565-569.
Tassios, P.T., Gennimata, V., Maniatis, A.N., Fock, C., Legakis, N.J. & the Greek Pseudomonas aeruginosa Study Group (1998). Emergence of multidrug resistance in ubiquitous and dominant Pseudomonas aeruginosa serogroup O:11. Journal of Clinical Microbiology 36, 897-901.
Tassios, P.T., Gazouli, M., Tzelepi, E., Milch, H., Kozlova, N., Sidorenko, S., Legakis, N.J. & Tzouvelekis, L.S. (1999). Spread of a Salmonella typhimurium clone resistant to expanded-spectrum cephalosporins in three European countries. Journal of Clinical Microbiology 37, 3774-3777.
Tenover, F.C., Arbeit, R., Archer, G., Biddle, J., Byrne, S., Goering, R., Hancock, G., Hébert, A., Hill, B., Hollis, R., Jarvis, W.R., Kreiswirth, B., Eisner, W., Maslow, J., McDougal, L.K., Miller, M.J., Mulligan, M. & Pfaller, M.A. (1994). Comparison of traditional and molecular methods of typing isolates of Staphylococcus aureus. Journal of Clinical Microbiology 32, 407-415.
Tenover, F.C., Arbeit, R.D., Goering, R.V., Mickelsen, P.A., Murray, B.E., Persing, D.H. & Swaminathan, B. (1995). Interpreting chromosomal DNA restriction patterns produced by pulsed-field gel electrophoresis: criteria for bacterial strain typing. Journal of Clinical Microbiology 33, 2233-2239.
Tenover, F.C., Arbeit, R.D. & Goering, R.V. (1997). How to select and interpret molecular strain typing methods for epidemiological studies of bacterial infections: a review for healthcare epidemiologists. Infection Control and Hospital Epidemiology 18, 426-439.
Van Belkum, A., Van Leeuwen, W., Kaufmann, M.E., Cookson, B., Forey, F., Etienne, J., Goering, R., Tenover, F., Steward, C., O'Brien, F., Grubb, W., Tassios, P., Legakis, N., Morvan, A., El Solh, N., De Ryck, R., Struelens, M.J., Salmenlinna, S., Vuopio-Varkila, J., Kooistra, M., Talens, A., Witte, W. & Verbrugh, H. (1998). Assessment of resolution and intercenter reproducibility of results of genotyping Staphylococcus aureus by pulsed-field gel electrophoresis of SmaI macrorestriction fragments: a multicenter study. Journal of Clinical Microbiology 36, 1653-1659.
Van Looveren, M., Vandamme, P., Hauchecorne, M., Wijdooghe, M., Carion, F., Caugant, D.A. & Goossens, H. (1998). Molecular epidemiology of recent Belgian isolates of Neisseria meningitidis serogroup B. Journal of Clinical Microbiology 36, 2828-2834.
Voss, A., Pfaller, M.A., Hollis, R.J., Rhine-Chalberg, J. & Doebbeling, B.N. (1995). Investigation of Candida albicans transmission in a surgical intensive care unit cluster by using genomic DNA typing methods. Journal of Clinical Microbiology 33, 576-580.
Wendt, C., Messer, S.A., Hollis, R.J., Pfaller, M.A., Wenzel, R.P. & Herwaldt, L.A. (1999). Molecular epidemiology of gram-negative bacteremia. Clinical Infectious Diseases 28, 605-610.
8
Selective Restriction Fragment Amplification by AFLP™

Paul J.D. Janssen
Rega Institute for Medical Research, Catholic University of Leuven, Belgium

CONTENTS
8.1 INTRODUCTION
8.2 METHODOLOGY
A. Digestion of genomic DNA
(i) Choice of restriction enzymes
(ii) DNA quality
(iii) Genome complexity, compositional bias and DNA modification
B. Template preparation
(i) Ligation of adaptors
(ii) Adaptor features
C. Selective PCR
(i) Principles
(ii) Primer structure
(iii) Selection criteria
D. Fragment separation and pattern visualisation
(i) Use of denaturing polyacrylamide gels
(ii) Agarose gel electrophoresis
8.3 DATA ANALYSIS
A. Digitisation of AFLP data
B. Basic requirements for comparative analysis of AFLP patterns
C. Reproducibility
8.4 APPLICATIONS
A. AFLP as a taxonomic tool
(i) The genus Aeromonas
(ii) The genus Acinetobacter
(iii) AFLP genotyping in other taxonomic investigations
B. Epidemiological typing of bacteria
(i) Genotyping in the hospital environment
(ii) AFLP analysis of food-borne pathogens
C. AFLP for studying the molecular evolution of microbes
D. Expression profiling
8.5 COMPARISON WITH OTHER METHODS
8.6 FUTURE PROSPECTS AND CONCLUSIONS
REFERENCES

8.1 INTRODUCTION
AFLP was originally developed at KeyGene International (Wageningen, The Netherlands) as a universal DNA fingerprinting method with applications in a large variety of fields, including the monitoring of traits in plant and animal breeding, the diagnosis of disease in humans, animals and plants, parentage analysis, forensic analytical studies, and the genotypic characterisation of fungi, yeast or bacteria (Zabeau & Vos, 1993). Looking back over the past 7 years during which the technique has been put to use, it is clear that any kind of DNA, regardless of its source, composition or complexity, can be analysed by this method. During this period, AFLP has proven to be an important alternative in genotypic analysis and its concept has formed the basis for new applications, including genetic mapping and expression profiling. The aim of this chapter is to give the reader a thorough understanding of the method's principles, with attention to possible technical pitfalls and advances, and to provide an overview of what has been accomplished so far with AFLP in regard to the typing and characterisation of bacteria.

8.2 METHODOLOGY
The name AFLP, chosen by its inventors, reflects the resemblance of the method to Restriction Fragment Length Polymorphism (RFLP) analysis and has been filed as a trademark by KeyGene International, Wageningen. Unfortunately, there seems to be some confusion on this point in the literature, and AFLP is often used as an acronym for 'Amplified Fragment Length Polymorphism'. It is important to make this distinction as the latter definition has been used on several occasions in reference to genotypic methods that produce banding patterns comprising amplicons of variable length, including randomly amplified polymorphic DNA (RAPD) analysis (Williams et al., 1990), arbitrarily primed PCR (AP-PCR) analysis (Welsh & McClelland, 1990), and DNA amplification fingerprinting (DAF; Caetano-Anollés et al., 1991). Strictly speaking, and based on the technical layout, AFLP should be classified under the category of Selective Restriction Fragment Amplification (SRFA) techniques employing short double-stranded 'adaptors' or 'indexers' that are ligated to genomic restriction fragments, and which serve as primer binding sites for amplification (Vaneechoutte, 1996). The AFLP concept can be broadly divided into three steps (Fig. 8.1): (i) digestion of total cellular DNA with one or more restriction enzymes and ligation of restriction half-site-specific adaptors to all restriction fragments; (ii) selective amplification of some of these fragments with two PCR primers that have corresponding adaptor and restriction site-specific sequences; and (iii) electrophoretic separation of amplicons on a gel matrix, followed by visualisation of the banding pattern.
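The first two steps can be mimicked in silico on a toy sequence. The sketch below assumes an EcoRI / MseI double digest, complete digestion, and that the selective bases of each primer must pair with the nucleotides immediately internal to the restriction site. The function names and the 44-base toy sequence are hypothetical; adaptor sequences, fragments with two identical ends and electrophoresis are deliberately not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sites of the two enzymes used in this sketch (both palindromic).
SITES = {"EcoRI": "GAATTC", "MseI": "TTAA"}


def revcomp(seq):
    """Reverse complement of a DNA string (upper-case A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq):
    """Return sorted (start, enzyme) pairs for every recognition site."""
    hits = []
    for enzyme, site in SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.append((start, enzyme))
            start = seq.find(site, start + 1)
    return sorted(hits)


def selective_amplification(seq, eco_ext="", mse_ext=""):
    """Return (start, end) spans of fragments that would be amplified.

    A fragment is kept when it has one EcoRI end and one MseI end and the
    bases immediately internal to each site match the selective extension
    of the corresponding primer (read 5'->3' on the strand it primes).
    """
    wanted = {"EcoRI": eco_ext, "MseI": mse_ext}
    sites = find_sites(seq)
    kept = []
    for (s1, e1), (s2, e2) in zip(sites, sites[1:]):
        if {e1, e2} != set(SITES):
            continue  # both ends cut by the same enzyme
        inner_start = s1 + len(SITES[e1])
        if inner_start >= s2:
            continue  # sites abut or overlap: no internal bases
        left = seq[inner_start:inner_start + len(wanted[e1])]
        right = revcomp(seq[max(inner_start, s2 - len(wanted[e2])):s2])
        if left == wanted[e1] and right == wanted[e2]:
            kept.append((s1, s2 + len(SITES[e2])))
    return kept


# Hypothetical toy sequence with two EcoRI and two MseI sites.
toy = ("GG" + "GAATTC" + "ACGTGCA" + "TTAA" + "CCTGAG"
       + "GAATTC" + "TTGCAAG" + "TTAA" + "GG")

print(selective_amplification(toy))            # no selective bases (+0/+0)
print(selective_amplification(toy, "A", "T"))  # EcoRI+A / MseI+T
```

Without selective bases all three EcoRI-MseI fragments of the toy sequence are retained, whereas the +A / +T primer pair retains only one of them, which is the reduction in pattern complexity that selective amplification is meant to achieve.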
Fig. 8.1. Main principles of the AFLP method: (1) digestion of total genomic DNA; (2) ligation of adaptors; (3) amplification with adaptor-specific primers and electrophoretic separation of amplicons (see text for details).

A. Digestion of genomic DNA

(i) Choice of restriction enzymes

It is well known that microorganisms display a wide range of G+C content, from 26 to 72 mol%. Because an adequately high number of restriction fragments must be obtained to examine the entire genome, a good choice of restriction enzymes is important. For instance, when choosing EcoRI (G/AATTC) for a high GC genome (>60 mol%), too few representative fragments will be obtained and large segments of the genome will remain unexplored. In the original AFLP procedures (Vos et al., 1995), two restriction enzymes were used: one digesting the DNA infrequently (called the '6-base cutter') and one that digests the DNA frequently (known as the '4-base cutter'). The simultaneous use of two restriction enzymes is beneficial. First, heterologous ends are produced, allowing ligation of adaptors with different sequences so that many primer pairs can be used on the same template. This facilitates the analysis of highly monomorphic genomes to a great extent (Keim et al., 1997; see later). Second, by using a mix of enzymes, only a limited number of fragments (due to the 6-base cutter) are obtained that are relatively small (due to the 4-base cutter) and amplification is thus straightforward. In addition, the resulting amplicons are in an optimal size range (50-1000 bp) for high resolution separation on denaturing 5-6% (w/v) polyacrylamide gels (i.e., sequencing gels).
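The influence of base composition on the number of restriction sites can be illustrated with a deliberately simple model. The sketch below assumes that bases occur independently, with G and C each at half the G+C fraction and A and T each at half the remainder; the function names, the 5 Mbp genome size and the two G+C values are illustrative assumptions, and real genomes deviate from this model.

```python
def site_probability(site, gc_fraction):
    """Probability that a given position starts the recognition site,
    assuming independent bases with the stated G+C fraction."""
    p = {
        "G": gc_fraction / 2,
        "C": gc_fraction / 2,
        "A": (1 - gc_fraction) / 2,
        "T": (1 - gc_fraction) / 2,
    }
    prob = 1.0
    for base in site:
        prob *= p[base]
    return prob


def expected_sites(site, genome_size, gc_fraction):
    """Expected number of recognition sites in a genome of the given size."""
    return genome_size * site_probability(site, gc_fraction)


GENOME_SIZE = 5_000_000  # hypothetical 5 Mbp genome

for name, site in [("EcoRI", "GAATTC"), ("ApaI", "GGGCCC")]:
    low = expected_sites(site, GENOME_SIZE, 0.35)
    high = expected_sites(site, GENOME_SIZE, 0.65)
    print(f"{name}: {low:.0f} sites at 35 mol% G+C, {high:.0f} at 65 mol% G+C")
```

Under these assumptions EcoRI is expected to cut roughly 1,700 times in a 5 Mbp genome of 35 mol% G+C but only about 500 times at 65 mol% G+C, whereas a GC-rich site such as that of ApaI shows the opposite trend.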
In spite of these advantages, several papers describe the use of a single restriction enzyme for AFLP in which amplicons are separated on conventional 1.5-2% (w/v) agarose gels stained with ethidium bromide (Valsangiacomo et al., 1995; Picardeau et al., 1997; Clerc et al., 1998; Gibson et al., 1998). This obviously saves time because the DNA does not require labelling and there is no need to pour ultra-thin slab gels, which involves meticulous plate cleaning. It also saves costs as no special equipment is required. However, it is unclear whether these technical simplifications affect the usefulness of the method with respect to bacterial typing and characterisation. Clerc et al. (1998) have described the use of a single 4-base cutter, MspI (C/CGG), and AFLP primers that contain three selective nucleotides. In this case, the rationale for using +3 primers is that a 4-base cutter will produce many more fragments. Assuming random base distribution, the cutting frequency F of a restriction enzyme can be estimated from the probability p that a defined string of n bases will occur in any given sequence of DNA as follows:

$$F = p(N_1 N_2 N_3 \ldots N_n) = \frac{1}{4} \times \frac{1}{4} \times \frac{1}{4} \times \ldots \times \frac{1}{4} = \frac{1}{4^n} \qquad (1)$$

The number of restriction sites in a genome with size S is then given by S × F. Hence, for a genome size of 5 million bases, roughly 20,000 fragments (5,000,000 / 256 ≈ 19,500) are expected when using a 4-base cutter. Because simplified banding patterns are easier to interpret and analyse, this large number of fragments must be reduced substantially during the selective PCR. This can be cumbersome because the use of +3 primers may affect reproducibility due to mismatches at the first or second position of the selective extension (see later). In the study of Clerc et al. (1998), MspI-generated banding patterns contained up to 37 amplicons. This is seven to eight times more than would be expected theoretically (e.g., about 19,500 divided by 64 × 64 equals 4.77), a good indication that mismatching does indeed occur. Of course, the exact size and base composition of the genomes under study and the composition of the 3'-extensions of the AFLP primers should be taken into consideration. Consequently, Clerc et al. (1998) tested more than 40 +3 primers, 12 of which gave suitable patterns. Unfortunately, data on reproducibility experiments were not included, and no information was given on the average size of the amplicons, so that the quality of the banding patterns obtained remains uncertain. Valsangiacomo et al. (1995) and Picardeau et al. (1997) used PstI (CTGCA/G) as the sole restriction enzyme for AFLP. After selective amplification, banding patterns contained 5-10 and 3-8 amplicons, respectively, in the size range of 0.5-2.5 kb. Whereas Valsangiacomo et al. (1995) used AFLP primers with one or two selective bases, Picardeau et al. (1997) resorted to the combined use of +2 and +5 primers to limit the number of amplicons. In the latter study, deploying a total of 4-10 selective bases casts serious doubts on the specificity of the PCR since, theoretically (e.g., for a PstI digest on bacterial DNA), only a few thousand restriction fragments are expected as starting material. Mismatching in this case is thus very likely. In fact, the banding patterns generated by Picardeau et al. (1997) had a clearly visible background of what appear to be aberrantly amplified fragments. It would be interesting to know which PCR profile for AFLP was used, but this information was not included.

Table 8.1 lists some of the restriction enzymes and selective primers that have been used so far for AFLP with microbial DNA. As a rule of thumb, using the original AFLP procedures (Vos et al., 1995), the combined digests of EcoRI / MseI or ApaI / TaqI seem to be most suited for microbial DNAs with low (<40 mol%) and high (>55 mol%) G+C content, respectively, and the combination HindIII / TaqI appears to be useful for genomes with intermediate G+C content (40-55 mol%) (Janssen et al., 1996). The enzymes ApaI (GGGCC/C), HindIII (A/AGCTT), EcoRI (G/AATTC), TaqI (T/CGA), and MseI (T/TAA) are reliable and, with the exception of MseI, inexpensive.
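The arithmetic behind Eq. (1) and the MspI example discussed above can be checked with a few lines of Python. This is only a sketch under the same assumption of random base distribution; the function names are hypothetical, and the figure of 37 amplicons is the upper value quoted from Clerc et al. (1998).

```python
def cutting_frequency(n):
    """Eq. (1): probability of a defined string of n bases, F = 1 / 4**n."""
    return 1 / 4 ** n


def expected_fragments(genome_size, n):
    """Expected number of restriction sites (and fragments), S x F."""
    return genome_size * cutting_frequency(n)


def expected_amplicons(fragments, selective_left, selective_right):
    """Fragments remaining when each primer carries the given number of
    selective bases (each selective base reduces the number fourfold)."""
    return fragments / (4 ** selective_left * 4 ** selective_right)


genome_size = 5_000_000
fragments = expected_fragments(genome_size, 4)    # 4-base cutter such as MspI
amplicons = expected_amplicons(fragments, 3, 3)   # +3 primers at both ends

print(f"expected fragments: {fragments:.0f}")
print(f"expected amplicons: {amplicons:.2f}")
print(f"observed / expected: {37 / amplicons:.1f}")
```

The same functions give about 1,200 expected sites for a 6-base cutter such as PstI in a genome of this size, which is why a large total number of selective bases quickly reduces the theoretical number of amplicons to well below one.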
(ii) DNA quality

There are numerous methods available for the preparation of microbial genomic DNA. The most commonly used methods are based on cell lysis through addition of lytic enzymes (e.g., lysozyme, lysostaphin, proteinase K, etc.) and ionic detergents such as SDS or sarcosyl, followed by phenol and chloroform extractions. Methods that aim to obtain high molecular size DNA from organisms known to release high levels of DNases often make use of nucleic acid stabilising agents, such as cetyltrimethylammonium bromide (CTAB) or guanidinium thiocyanate (GTC). For AFLP it is strongly recommended to use DNA of adequate purity and high molecular size. The reason is simply that the activity of restriction enzymes may be sensitive to co-purified impurities, leading to non-specific cleavage ('star activity') or incomplete digestion. If this occurs, partial fragments of aberrant size will be amplified, with serious consequences for data interpretation and reproducibility. Nonetheless, Clerc et al. (1998) reported the use of a quick boiling method to produce DNA from Pseudomonas syringae grown on a specific medium. Cell suspensions were boiled for 10 min and the cleared lysate was used directly for combined restriction-ligation reactions and subsequent selective PCR. As no reproducibility experiments were conducted, it remains unclear if there were any significant differences between AFLP patterns obtained with purified DNA compared to those obtained with crude lysate DNA. Presumably, because Clerc et al. (1998) used a frequent cutter (MspI) as the sole restriction enzyme, low level degradation was not detected. Recently, a quick DNA preparation method for AFLP analysis of pneumococci and staphylococci was tested (P. Janssen & J. Van Eldere, unpublished results). Cells were lysed and genomic DNA was prepared with the InstaGene kit (Bio-Rad, Richmond, CA), involving a boiling step and intensive vortexing.
The DNA was digested with HindIII and TaqI without further purification. Fragments were tagged with corresponding adaptors, selectively amplified, and separated on an automated sequencing apparatus. In the range of 0-500 bp, banding patterns obtained with crude lysate preparations were very similar to AFLP patterns obtained with purified
Table 8.1. Restriction enzymes and selective primers used for AFLP analysis of bacterial genomes
Organism | G+C a | Size a | Enzymes b | Primers c | Visualization d | No. bands e | Reference
Clostridium beijerinckii
5.8
EcoRI / MseI
C / C (C / A) f
DP, [7-32P]
Bacillus anthracis
33
4.5
EcoRI / MseI
N/N
DP, [γ-32P]
50-90
Keim et al. (1997)
Campylobacter spp.
30-38
1.7-2.0
HindIII/HhaI
A/A
DP, fAFLP
40-50
Duim et al. (1999)
Campylobacter spp.
30-38
1.7-2.0
BglII / Csp6I
- /A
DP, FAM
50
Kokotovic & On (1999)
DP, FAM DP, Cy5
60-80
Kokotovic et al. (1999)
70-80
Grady et al. (1999) Hookey et al. (1999)
"l- C a
26 (50-60) f Janssen et al. (1996)
Mycoplasma spp.
30-35
- /-
34
0.6-1.0 2.8
BglII / MfeI
Staphylococcus aureus
E-M and A-T
-/C resp.-/G
Staphylococcus aureus
34
2.8
EcoRI / MseI
- / AT
DP, FAM
40-75
Staphylococcus epidermidis
30-37
2.4
EcoRI / MseI
- /A
DP, [γ-32P]
40-70
Sloos et al. (1998)
Acinetobacter spp.
38-47
±2.6
HindIII / TaqI
A / AA
DP, [γ-32P]
20-50
Janssen et al. (1997)
Acinetobacter spp.
38-47
±2.6
EcoRI / MseI
A/C
DP, Tex, [γ-32P]
30-55
Koeleman et al. (1998) Pedersen et al. (1998)
Vibrio spp.
38-51
2.8-3.5
HindIII/TaqI
A/A
DP, Tex, [γ-33P]
50-60
Helicobacter pylori
39
1.7
HindIII
A, G, C, T
Aga, EtBr
15-20
Legionella pneumophila
39
4.1
PstI
G, GC, A, AT
Aga, EtBr
5-10
Paenibacillus larvae
42
NA
EcoRI / MseI
C/A
DP, [qt-32P]
60-70
Heyndrickx et al. (1996)
Chlamydia spp.
45
1.0-1.2
EcoRI / MseI
- /C
DP, Tex
45-50
Meijer et al. (1999)
Escherichia coli
51
4.6 4.4-4.7
EcoRI / MseI
- / AA AA/-
DP, FAM DP, [~_33p]
50
Arnold et al. (1999b)
45-55
Aarts et al. (1998)
6.2
PstI
GC / ATTAG g
Aga, EtBr
2.2
HindIII / TaqI
A/A
DP, Cy5
60-70 100-120
Salmonella spp.
EcoRI / MseI
Gibson et al. (1998) Valsangiacomo et al. (1995)
Mycobacterium kansasii
50-53 NA
Streptococcus pneumoniae
43
Streptococcus pyogenes
NA
2.0
EcoRI / MseI
- /T
DP, FAM
Aeromonas spp.
57-63
4.5
ApaI/TaqI
A/A
DP, TAMRA
25-35
Huys & Swings (1999)
Aeromonas spp.
57-63
4.5
ApaI / TaqI
A/A
DP, [qt-32P]
40-50
Huys et al. (1996b)
Stenotrophomonas spp.
NA
NA
ApaI / TaqI
G/G
DP, [~_32p]
45-60
Hauben et al. (1999)
Pseudomonas syringae
60
5.6
MspI
NNN
Aga, EtBr
24-37
Clerc et al. (1998)
(Continued.)
3-8
Picardeau et al. (1997) Van Eldere et al. (1999) Desai et al. (1998)
Table 8.1. Continued. Organism
G
Pseudomonas aeruginosa Xanthomonas axonopodis
9
9
9
d
No. bands e
Reference
Size a
Enzymes b
Primers c
Vlsuahzatlon
63
5.9
DP, Cy5
50
Speijer et al. (1999)
4.5
EcoRI / MseI E-M-T-P h
- /C
64
N /N
DP, [T-32p]
21-64
Restrepo et al. (1999)
Pseudomonas spp.
58-68
_+6.0
EcoRI / MseI
A/C
DP, AgNO 3 staining
15-25
Geornaras et al. (1999)
Burkholderia spp.
67-69
4.1-8.1
ApaI / TaqI
A / A ( -/-
DP, FAM, ([T-32P])
40 (25)
Coenye et al. (1999)
+
C a
)
a The GC-content is expressed in mol%, approximate genome size is in million bases (Mbp); NA, not available. b Abbreviations: E, EcoRI; P, PstI; A, ApaI; M, MseI; T, TaqI. c Only 3'-selective bases are given. Primers separated by a comma were used separately. N, A or G or C or T. d Abbreviations: DP, denaturing PAGE; Aga, agarose gel electrophoresis; EtBr, ethidium bromide; fAFLP, fluorescent label not specified; Cy5, 5-carboxyfluorescein; TAMRA, carboxytetramethyrhodamine; FAM, 6-carboxyfluorescein; Tex, Texas Red | (sulphonyl chloride derivative of sulphorhodamine 101). e Only for primer pairs that were used in further analyses. f In total, 17 C. beijerinckii strains were investigated (unpublished). g Presumably used as a primer pair.
184 DNA from the same strains, although one- or two-band differences were occasionally observed. In addition, the level of reproducibility was higher when DNA of high integrity and molecular size was used as a template. Nevertheless, such short protocols may become useful for the rapid automated typing of microbial isolates, particularly in preliminary screening schemes.
(iii) Genome complexity, compositional bias and DNA modification According to genomic data available on the Web (i.e., The Institute for Genomic Research (TIGR) Microbial Database; http://www.tigr.org/), the size of prokaryotic genomes falls in a 20-fold range between 0.5 and 10 million base pairs (Mb), with most genomes measuring between 2 and 5 Mb in length. Although the final number of restriction fragments is determined primarily by the size of the genome, it is also very important to consider the base composition and distribution of the genome, and to bear in mind that a DNA modification system may be present. The formula of Nei & Li (1979) takes into account the variation in mol% G+C content and calculates the restriction frequency F as:

$$F = \left(\frac{g}{2}\right)^{m} \left(\frac{1-g}{2}\right)^{n} \tag{2}$$

where g is the fractional G+C content of the genomic DNA, and m and n are the numbers of G+C base pairs and A+T base pairs in the recognition site, respectively. However, when calculated values were compared to the outcome of chromosomal digests, it was found that this formula was actually a poor predictor of restriction frequency (Phillips et al., 1987; Owen, 1989) because genomes often exhibit compositional variance in di- and trinucleotide sequences (Karlin et al., 1997). These variations are thought to be introduced by several factors, including mutation pressure, base stacking stability, DNA repair and codon usage. To account for such variations, Phillips et al. (1987) used Markov chain analysis to determine the mono- through hexanucleotide composition of the Escherichia coli genome. A few years later, Forbes et al. (1991) adopted this idea to calculate the mean fragment length for a given combination of restriction enzyme and genome, and compared the computed values with electrophoretic results for a range of restriction enzymes digesting a number of bacterial genomes. Given that restriction frequency and mean fragment length are reciprocal, the restriction frequency (F) for a 4-base cutter with recognition sequence $N_1N_2N_3N_4$ can be calculated from:

$$F = \frac{f(N_1N_2N_3)\, f(N_2N_3N_4)}{f(N_2N_3)} \tag{3a}$$

with f being the frequency of di- and trinucleotides for a given genome in percentages. For a 6-base cutter with restriction site $N_1N_2N_3N_4N_5N_6$, this would translate into:

$$F = \frac{f(N_1N_2N_3)\, f(N_2N_3N_4)\, f(N_3N_4N_5)\, f(N_4N_5N_6)}{f(N_2N_3)\, f(N_3N_4)\, f(N_4N_5)} \tag{3b}$$
Nowadays, entire microbial genomes are being sequenced at an astonishingly high rate, and it has become a rather simple matter to obtain reliable estimates of di- and trinucleotide frequencies for a genome of interest. In fact, some Web-based applications provide this kind of information for any set of genes, for a given number of input sequences, or even for entire genomes, bringing precise calculation of expected restriction frequencies within easy reach. Still, computations such as those outlined above do not necessarily agree with the real situation. Detailed comparison of the expected and observed oligonucleotide frequencies in various bacterial species actually suggests that palindromic restriction sites (e.g., recognition sites for class II restriction enzymes) are under-represented in bacterial genomes (Rocha et al., 1998). Nonetheless, for a rough estimate of the expected number of fragments, formula (1) is quite useful to remember, while Markov chain analysis, as illustrated by formula (3), appears to be better suited for accurate estimates. An additional factor that may influence the cutting frequency of restriction enzymes is DNA modification, particularly in the form of methyltransferases that 'protect' recognition sites from restriction by base methylation. Methyltransferases have been observed in many bacteria, although detailed information on their occurrence and functional diversity is scarce. As a final remark, it is generally a good idea to gain further knowledge of cleavage frequencies by empirical determination, and to test a whole range of restriction enzymes on a number of strains in order to find out which enzyme combinations (and, therefore, which primer combinations) will ultimately produce the most suitable banding patterns.
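The two kinds of estimate discussed in this section, the G+C-based formula of Nei & Li (1979) and the Markov chain formulas (3a) and (3b), can be illustrated with a minimal Python sketch. The function names are hypothetical, the input is assumed to be a plain string containing only A, C, G and T, and frequencies are handled as fractions rather than percentages; the sketch is meant only to make the arithmetic concrete, not to reproduce any published software.

```python
from collections import Counter


def nei_li_frequency(gc_fraction, site):
    """G+C-based estimate: each G or C has probability g/2, each A or T (1 - g)/2."""
    site = site.upper()
    m = sum(base in "GC" for base in site)  # number of G+C positions in the site
    n = len(site) - m                       # number of A+T positions in the site
    return (gc_fraction / 2) ** m * ((1 - gc_fraction) / 2) ** n


def kmer_frequencies(sequence, k):
    """Fractional frequency of every overlapping k-mer in the sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


def markov_frequency(sequence, site):
    """Markov chain estimate in the style of formulas (3a)/(3b):
    product of overlapping trinucleotide frequencies divided by the
    product of the internal dinucleotide frequencies."""
    site = site.upper()
    tri = kmer_frequencies(sequence, 3)
    di = kmer_frequencies(sequence, 2)
    numerator = 1.0
    for i in range(len(site) - 2):
        numerator *= tri.get(site[i:i + 3], 0.0)
    denominator = 1.0
    for i in range(1, len(site) - 2):
        denominator *= di.get(site[i:i + 2], 0.0)
    return numerator / denominator if denominator else 0.0
```

Multiplying either estimate by the genome length in base pairs gives a rough expected number of recognition sites, for example `nei_li_frequency(0.51, "GAATTC") * 4.6e6` for a hexanucleotide site in a 4.6 Mb genome of 51 mol% G+C.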
B. Template preparation
(i) Ligation of adaptors After digestion of the genomic DNA, fragments are tagged with short double-stranded DNA molecules that bind in a complementary fashion to the restriction half-sites. In the presence of T4 ligase and ATP, a phosphodiester link is created between the 3'-OH and 5'-PO4 groups at the end of each adaptor and fragment (Fig. 8.2, step 1). Note that, because unphosphorylated oligonucleotides are used for the adaptors, there is only one phosphodiester link for each adaptor-to-fragment ligation (Fig. 8.2, step 2), meaning that if DNA was denatured prior to the addition of Taq DNA polymerase and dNTPs, no amplification would occur (Vos et al., 1995). The PCR in AFLP works because Taq polymerase fills in the 3'-recessed ends or displaces the non-ligated strand at ambient temperatures (Vos et al., 1995), thereby ensuring the presence of the originally intended primer binding site (Fig. 8.2, step 3).
Fig. 8.2. Preparation of template DNA for selective amplification (see text for details).
(ii) Adaptor features When two restriction enzymes are used, adaptors have different sequences allowing a combination of PCR primers. Obviously, the core sequences of the adaptors are dictated by proper primer design, i.e., inverted repeats and long nucleotide runs should be avoided. Also, although the sequences must be sufficiently different to ensure specificity, the Tm of both primers should be in the same range. Once the primer sequences are chosen, adaptors can be designed. An essential feature of AFLP adaptors is that they contain a base change so that the original restriction site is not reconstructed during ligation. In the example of Fig. 8.2, a HindIII-adaptor with a 3'-cytosine is used for this purpose. This has the advantage that the ligation can be done in the presence of restriction enzymes, thereby preventing fragment-to-fragment ligation and simultaneously stimulating adaptor-to-fragment ligation. In addition, because adaptors are made of unphosphorylated oligonucleotides, they are unable to ligate to each other. The combination of these features ensures that virtually all restriction fragments are tagged with the appropriate adaptor sequence. In the original AFLP procedure (Zabeau & Vos, 1993), the adaptor corresponding to the 6-base cutter half-site was labelled with a biotin group. Prior to amplification, fragments created by the rare cutter (i.e., HindIII-TaqI templates) were purified with streptavidin-coated magnetic beads. By so doing, essentially all fragments with 4-base cutter half-sites located at both ends, otherwise accounting for up to 90% of the complete template population, were successfully removed. This renders the reaction mixture less complex, while achieving an enrichment for fragments with heterologous ends so that all reaction components are used optimally. This purification step often improves the quality of the banding pattern, especially when using eukaryotic DNAs (Zabeau & Vos, 1993). However, for AFLP analysis of bacterial genomes, template mixtures contain many fewer fragments, and there is no need to eliminate the templates produced by the frequent cutter (Vos et al., 1995; Janssen et al., 1996). In addition, due to careful primer design and extensive optimisation of PCR conditions, purification with coated beads has now become redundant (see next section).
Fig. 8.3. Selective amplification with AFLP primers; only a perfect match at the 3'-end between primer and template results in DNA chain elongation and subsequent PCR amplification.
C. Selective PCR
(i) Principles AFLP primers consist of the adaptor-derived core sequence, including the 3'-part of the restriction half-site, and an extension sequence made up by a number of so-called selective bases. These selective bases are complementary to nucleotides flanking the restriction sites. For a given template, elongation of the DNA chain
will only take place if the primer binds with high specificity, i.e., if the corresponding complementary nucleotide is present in the fragment (Fig. 8.3). In addition, exponential amplification is only achieved when DNA synthesis occurs from both ends. The stringency of primer binding is essential for the success of reproducible AFLP. According to Vos et al. (1995), primers with mispaired 3'-ends do not participate in the amplification process, especially if only one or two selective bases are used per primer. In addition, a typical AFLP amplification profile starts with a high annealing temperature (63-65 °C), which is gradually lowered (usually by 0.7-1.0 °C per cycle) until the optimal annealing temperature is reached. This part of the PCR profile, often referred to as the 'touch down', usually consists of 10-12 cycles and ensures dissociation of mismatched primers, thus improving specificity. The final annealing temperature, usually about 52-56 °C, is subsequently maintained for a further 18-24 cycles, depending on the application. Another characteristic of selective PCR in AFLP with two enzymes is that small fragments with homologous ends (i.e., fragments produced by the 4-base cutter only) appear to be amplified far less efficiently. Possible explanations are a difference in annealing temperatures between the two selective primers and the formation of 'panhandle' stem-loop structures (Vos et al., 1995). The fact that these fragments do not actually participate in the amplification process has made the purification step over magnetic beads (see previous section) superfluous. (ii) Primer structure Obviously, proper primer design is very important. Hairpin structures should be avoided and the formation of self-dimers and cross-dimers should be kept to a minimum to prevent the loss of reaction components.
Also, it is preferred that a primer sequence does not contain long stretches of As or Ts, because this may cause local instabilities in the primer-template hybrid, and the G+C content should range between 40 and 60%, allowing highly specific annealing between primer and template at 56-60 °C. For AFLP primers, another feature is the obligatory presence of a 5'-guanine residue. This prevents the generation of so-called 'double bands' that were observed occasionally in older AFLP gels due to incomplete addition of an extra nucleotide to the synthesised strands. This terminal transferase activity, also known as 'extendase' activity, of the DNA polymerase is 3'-base dependent and has been studied in detail by Hu (1993). Apparently, if the 3'-base is a cytosine, extendase activity is quite strong and mainly adenines are added. Using AFLP primers with a 5'-guanine thus guarantees that most amplicons will be identical and have the same electrophoretic properties, leading to the formation of discrete bands and improving comparative analysis of banding patterns. (iii) Selection criteria Assuming random base distribution, one out of four fragments will be amplified for each selective base used. However, in reality, the final complexity of the AFLP pattern is determined not only by the number of selective bases used, but also
by the choice of the selective bases and the base composition of the genome. Table 8.1 provides a summary of selectivity according to the primers used and the genome being studied. A tendency towards matched base composition between primer extension and genome can be clearly seen from these data. For instance, a C/C combination of AFLP primers to analyse the Clostridium beijerinckii genome (28 mol% G+C) generates an average of 26 amplicons, whereas a C/A combination of primers gives roughly two times that number, simply because more Ts are available for base pairing. In general, when properly executed, AFLP methodology deploying +1 or +2 primers (i.e., primers with one or two selective bases, respectively) ensures a proportional reduction in the number of amplicons for each selective base used, and mismatching - if it occurs at all - does not reach a detectable level because of the high-stringency conditions of the PCR. However, low-level mismatching does occur for +3 primers, and selectivity is lost when +4 primers are used (Vos et al., 1995). To circumvent the problem of mismatching, a two-step amplification strategy was developed for the AFLP analysis of complex genomes (Vos et al., 1995). First, fragments are amplified with +1 primers in a pre-amplification reaction. After completion, the reaction mixture is diluted and used for the main amplification reaction with +3 primers. This reduces background 'smears' to a large extent, and has the additional advantage that large quantities of template DNA can be produced. For most bacterial genomes, it will suffice to use one or two selective bases and tune the final number of amplicons by testing out a few primer pairs.
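The effect of selective bases on pattern complexity can be explored on a sequenced genome before any laboratory work is done. The following Python sketch is one possible way to do this under simplifying assumptions: the sequence is treated as linear and as containing only A, C, G and T, only the EcoRI (G^AATTC) and MseI (T^TAA) combination is modelled, digestion is assumed to be complete, and the reported sizes are those of the restriction fragments measured on the top strand, without the added length of adaptors. The function and variable names are hypothetical.

```python
# Recognition site and top-strand cut offset for each enzyme
# (EcoRI: G^AATTC, MseI: T^TAA).
ENZYMES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_aflp(sequence, selective, enzymes=ENZYMES):
    """Return sizes of fragments with heterologous ends whose flanking
    bases match the selective bases given per enzyme (5'->3' as in the primer)."""
    seq = sequence.upper()
    cuts = []
    for name, (site, offset) in enzymes.items():
        start = seq.find(site)
        while start != -1:
            # (cut position, enzyme, site start, site end)
            cuts.append((start + offset, name, start, start + len(site)))
            start = seq.find(site, start + 1)
    cuts.sort()

    sizes = []
    for left, right in zip(cuts, cuts[1:]):
        cut1, enzyme1, _, site_end1 = left
        cut2, enzyme2, site_start2, _ = right
        # Skip fragments with homologous ends and overlapping sites.
        if enzyme1 == enzyme2 or site_start2 < site_end1:
            continue
        interior = seq[site_end1:site_start2]
        left_sel, right_sel = selective[enzyme1], selective[enzyme2]
        if len(interior) < max(len(left_sel), len(right_sel)):
            continue
        # Left primer reads the top strand; right primer reads the bottom strand.
        left_flank = interior[:len(left_sel)]
        right_flank = reverse_complement(interior[len(interior) - len(right_sel):])
        if left_flank == left_sel and right_flank == right_sel:
            sizes.append(cut2 - cut1)
    return sizes
```

Calling `in_silico_aflp(genome, {"EcoRI": "", "MseI": ""})` and then repeating the call with one or two selective bases per primer shows how quickly the number of predicted amplicons drops, and how strongly that drop depends on the base composition of the genome.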
D. Fragment separation and pattern visualisation
(i) Use of denaturing polyacrylamide gels The original AFLP procedures described by Zabeau & Vos (1993) and Vos et al. (1995) made use of conventional sequencing gels containing 7-8 M urea and 4.5-6% cross-linked polyacrylamide. Typical run conditions for electrophoresis were a constant power of 50 W and 40-50 V/cm, a running buffer of 0.6 x TBE (TBE: 100 mM Tris, 100 mM boric acid, 2 mM EDTA, pH 8.3) and a run time of about 2.5 h. With these conditions, the loading front migrated through approximately 90% of the gel length, with an excellent separation of fragments ranging in size between 50 and 600 nucleotides. Typically, detection of AFLP amplicons required the labelling of one of the primers with either P-32 or P-33, followed by visualisation of the banding patterns by autoradiography or by the use of a phosphorimager (Fig. 8.1). However, many research institutes prefer to limit the use of radioactivity, and methods for non-radioactive detection of nucleic acids have been put forward. Chalhoub et al. (1997) evaluated the use of silver-staining for the visualisation of AFLP products, and concluded that autoradiography and silver-staining displayed a similar resolution with equal sensitivities. Unfortunately, silver-staining detects both strands of the AFLP products, thereby giving rise to double-band patterns, and complicating the interpretation of the AFLP results. A recent paper by Lin et al. (1999) described the chemiluminescent detection of AFLP products. Detection of the amplicons was achieved by blotting the electrophoresed DNA on to a nylon membrane, followed by hybridisation of the transferred DNA with an alkaline phosphatase-labelled probe. Patterns were subsequently visualised by exposure to a light-sensitive film and appeared to be similar to the autoradiographic patterns when primers labelled with P-32 were used. The drawback of this approach is that blotting and hybridisation of large gels is technically very difficult, and Lin et al. (1999) resorted to the use of small 15 x 17 cm gels in which only fragments in the size range 50-330 nucleotides could be examined.
Fig. 8.4. Electropherogram obtained for four strains of S. pneumoniae via the use of fluorescently labelled primers and an automated sequencing apparatus (see text). Horizontal axis: time (min); fragment size range shown: 50-500 bp.
A much better, but also much more expensive, approach involves the use of Cy5-labelled primers in combination with on-line laser detection of fluorescent amplicons while they pass through the gel (Fig. 8.4). This has been done for AFLP analysis of a number of bacterial species (Table 8.1). If an automated apparatus is equipped with a multiple fluorescence detection system (e.g., the ABI 377 system; Applied Biosystems), a separate fluorescent label (e.g., 6-carboxy-x-rhodamine; ROX) can be used to label an internal standard that is added to each sample, thus greatly improving the normalisation of the gels. Improved resolution is also achieved because the fragments are detected at a fixed distance from the origin, leading to a more uniform spacing between fragments. General advantages of 'automated' fluorescent AFLP (fAFLP) include the large simultaneous throughput of samples, a rapid turnaround, and the direct processing of raw data and subsequent error-free storage of results through linkage with a computer. An interesting alternative to on-line detection of fluorescently labelled AFLP products has been reported by Roman et al.
(1999), who describe the scanning of wet or dry AFLP gels with a fluorescence imager in as little as 11 min.
(ii) Agarose gel electrophoresis Several groups have reported on a simplified AFLP procedure in which only one restriction enzyme is used (Valsangiacomo et al., 1995; Picardeau et al., 1997; Clerc et al., 1998; Gibson et al., 1998; Table 8.1). Because the average size range of amplicons appeared to be much higher (0.4-2.5 kb) as compared to conventional AFLP using two restriction enzymes (50-550 bp), separation of AFLP products was routinely done on agarose gels. Depending on the application, agarose concentrations varied from 1.5-3% and nucleic acids were stained with ethidium bromide. In spite of the much lower resolution inherent to agarose gel electrophoresis, AFLP results obtained in these studies generally agreed well with other typing data available. Obvious advantages of agarose-based AFLP are the ease of use and the low cost. On the other hand, large templates (> 2 kb) may be amplified with a lower efficiency, so that PCR conditions must be adjusted accordingly, and ethidium bromide staining is notoriously unreliable for very small DNA fragments. Taken together, AFLP analysis on agarose represents a cost-effective means to produce preliminary typing data, but appears to be less suited for detailed taxonomic studies of homogeneous species or for epidemiological investigations of highly related strains (i.e., in outbreak situations).
8.3 DATA ANALYSIS
A. Digitisation of AFLP data
A common theme in the non-automated processing of AFLP data is the generation of image files from the electrophoresis patterns (e.g., those obtained by autoradiography, by phosphor imaging, etc.). These image files are usually saved in 'tagged image file format' (TIFF), which is a generally accepted computer file format and supported by most image processing tools for both workstations and personal computers. A TIFF file consists of a number of labels (tags) which describe certain properties of the file (such as gray levels, colour table, byte format, compression size, etc.). After the initial tags comes the data, which may be interrupted by more descriptive tags. The size of each TIFF file is predetermined by the resolution in which the digital image was created (i.e., by the specifications of the scanner). A typical resolution of an AFLP image obtained from a 35 x 43 cm polyacrylamide slab gel is 1200 by 2000 datapoints. The image is used for defining 'tracks' for each lane of interest. Within the boundaries of these tracks, data points are resampled and average densitometric values are plotted as a function of the run-time, resulting in electrophoresis 'peak profiles'. These profiles are then used for comparative analysis (cf. the Pearson product-moment correlation coefficient). Images can also be used to define the bands for each lane that will participate in the comparative analysis. Similarities are then calculated from the presence and absence of bands (cf. the band-based similarity coefficient of Dice). However, it may sometimes be difficult to assign bands correctly because of the high complexity of the AFLP banding patterns, in which case the Pearson correlation coefficient is often preferred. With 'automated' AFLP technology, fluorescently labelled amplicons are detected with a laser while they migrate downwards through the gel. The detection signals are collected, digitised and sent to the computer for storage and processing.
These raw data are converted to the TIFF format with software that is normally included in the automatic sequencing apparatus package. The actual size of the image file depends on the conversion settings used, with a typical 7- to 10-fold reduction of the original vertical resolution (as defined by the run-time and signal detection interval). Although TIFF images obtained by classical or fluorescent AFLP are processed in the same way, the actual banding patterns obtained by the two different approaches differ in one important aspect. Classically obtained patterns (i.e., via a scan of an autoradiogram) contain large bands that are compressed together and smaller bands that are relatively diffuse as compared to the larger bands. In contrast, fAFLP patterns contain many small discrete bands while the larger bands are diffuse, simply because the smaller fragments migrate much faster through the gel and are detected very early, with little time in between each detection. A possible improvement to overcome the 'stacking' of small bands in fAFLP patterns would be the use of a signal detection interval that is initially very small (e.g., 500 ms) and gradually increases during the run-time; i.e., more data points are created at the start of the run so that the pattern is somewhat stretched and small bands can be assigned more accurately. Alternatively, Stoffel fragment of DNA polymerase (which has a low rate of processivity) may be added to facilitate the discrimination of small (50-100 bp) PCR products (Hookey et al., 1999).
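The two similarity measures mentioned in this section can be written out in a few lines. The Python sketch below is a minimal illustration, not the implementation of any particular analysis package: the curve-based measure takes two densitometric profiles sampled at the same positions, and the band-based measure assumes that bands have already been assigned to discrete band classes (here represented as a set of positions or sizes per lane). The function names are hypothetical.

```python
import math


def pearson_similarity(profile_a, profile_b):
    """Pearson product-moment correlation between two equally long peak profiles."""
    n = len(profile_a)
    if n == 0 or n != len(profile_b):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    return covariance / math.sqrt(var_a * var_b)


def dice_similarity(bands_a, bands_b):
    """Dice coefficient: twice the number of shared bands over the total number of bands."""
    bands_a, bands_b = set(bands_a), set(bands_b)
    if not bands_a and not bands_b:
        raise ValueError("at least one lane must contain bands")
    return 2 * len(bands_a & bands_b) / (len(bands_a) + len(bands_b))
```

The Pearson coefficient uses the whole curve, including peak heights, and therefore needs no band assignment, whereas the Dice coefficient depends entirely on which bands are scored as present and which bands are considered to match.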
B. Basic requirements for comparative analysis of AFLP patterns
A general prerequisite for a reliable and accurate intra- and inter-gel comparison of banding patterns is the use of reference patterns that are included at regular intervals in each gel. Usually, in the classical approach of AFLP or in fAFLP with only one fluorophore, a reference pattern is included every 4-6 lanes, and one reference pattern is chosen as the standard. Gels are 'normalised' by aligning all reference strains to this standard so that all patterns positioned between these references are spatially adjusted for optimal comparison. However, minute differences in the gel matrix may cause slightly different migration of the amplicons, influencing the analysis to some extent. Nonetheless, the high resolution of large polyacrylamide slab gels in AFLP methodology ensures that external reference patterns on the same gel usually correlate well (95-100%), and inter-gel correlations of 90% or higher are routinely achieved. A slightly higher reliability and consistency of results may be reached by fAFLP that makes use of two or more fluorophores, with the added advantage of a significantly increased throughput, since sample patterns can be run together with an internal standard that is labelled differently. Another point that needs attention is the signal strength. Peaks that are topped off, either by the use of a scanner with an insufficient optical density range, or by software that does not permit proper conversion of the raw data, should be avoided as they will affect analysis quite badly. Similarly, some patterns may contain one or two very heavy bands due to highly efficient amplification and it is advisable to omit these amplicons from the analysis altogether (e.g., by deselecting the corresponding zones in the digitised gel). Finally, the authenticity of banding patterns must also be considered. Random
spots on the autoradiogram, or background noise in fAFLP, should be removed prior to the generation of raw data files. In the latter case, special precautions that are outlined in the apparatus manual (such as never cleaning the glass plates with soap), the use of a standardised gel matrix, and additional features that allow automatic background subtraction, have virtually eliminated all false signals. However, for those that still work with autoradiograms, the random specks and spots may pose a real problem, as they may become registered as actual bands during the subsequent scan. These false signals can be easily removed from the primary TIFF image by graphics software (e.g., PhotoStyler or similar), but the operation can be very tedious and time-consuming. The source of the random signals is not always clear, but regular cleaning of the intensifying screen and the film cassette used for autoradiography may help, and special care should be taken in preparing and pouring the gels. For instance, small dust particles, impurities, or tiny pieces of paper towel may contribute to anomalous signal generation, requiring filtering and/or deionizing of the acrylamide solution and thorough cleaning of the glass plates. As a consequence, many researchers resort to the use of highly purified and standardised acrylamides purchased as a premixed solution, or even buy prefabricated polymerised gels. In addition, the use of UV-crosslinking to ensure uniform polymerisation of the gel matrix has become very popular because it reduces the risk of local differences in gel properties and it is relatively fast.
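The normalisation step described in this section, in which patterns are spatially adjusted so that the reference lanes coincide with a chosen standard, can be sketched as a piecewise-linear mapping of band positions. The Python example below is a deliberately simplified illustration under stated assumptions: a single reference lane is used for the whole gel, the same reference bands have been identified both in that lane and in the standard, and positions outside the range of the reference bands are extrapolated from the nearest segment. Commercial fingerprint analysis software may use different and more elaborate alignment schemes; the function name and arguments are hypothetical.

```python
from bisect import bisect_right


def normalise_positions(sample_positions, reference_observed, reference_standard):
    """Map band positions of a sample lane onto the scale of the standard.

    reference_observed: positions of the reference bands on this gel (increasing)
    reference_standard: positions of the same bands in the standard pattern
    """
    if len(reference_observed) != len(reference_standard) or len(reference_observed) < 2:
        raise ValueError("need at least two matched reference bands")
    if any(b <= a for a, b in zip(reference_observed, reference_observed[1:])):
        raise ValueError("observed reference positions must be strictly increasing")

    normalised = []
    last_segment = len(reference_observed) - 2
    for x in sample_positions:
        # Pick the reference segment that contains x (clamped at both ends).
        i = min(max(bisect_right(reference_observed, x) - 1, 0), last_segment)
        x0, x1 = reference_observed[i], reference_observed[i + 1]
        y0, y1 = reference_standard[i], reference_standard[i + 1]
        normalised.append(y0 + (x - x0) * (y1 - y0) / (x1 - x0))
    return normalised
```

With an internal standard carrying a different fluorophore, the same interpolation can be applied lane by lane, using the internal standard peaks of each lane as `reference_observed`.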
C. Reproducibility
The reproducibility of AFLP has been tested in various studies and involved the inclusion of duplicate strains and/or the generation of multiple sub-cultures on which the same DNA extraction and AFLP procedures were performed (whether or not simultaneously). In general, the intra- and inter-gel similarity levels were at least 95% and 90%, respectively. However, much depends on the quality of the DNA because impurities that are co-purified with the DNA may have a detrimental effect on the activity of restriction nucleases or other enzymes such as the T4 ligase or the DNA polymerase. In addition, the integrity of the DNA obviously plays a role, as degraded DNA would lead to a large decrease or total loss of target fragments. Substantial differences in banding patterns may also be observed when massive amounts of RNA are present in the DNA preparation (consequently, a routine RNAse step is strongly recommended) and some brands of DNA polymerase give slightly different banding patterns for the same template DNA (P. Janssen & R. Coopman, unpublished data). It has been further suggested (Gibson et al., 1998) that repeated thawing and freezing may damage template DNA, resulting in decreased amounts of product, presumably due to the higher fragility of the single-strand linkage between adaptor and fragment (Fig. 8.2). Although variations in signal strength (i.e., peak height) have been noted in almost all of the fAFLP studies (where signal strength is more readily visible), this only poses a problem in extreme instances (local differences in PCR efficiency may be caused by secondary DNA structures, long stretches of As or Ts, or GC-rich regions). Although fAFLP is rapidly becoming the standard (automated) technique in diagnostic laboratories, especially in those that undertake frequent epidemiological studies (making the use of a DNA sequencer cost-effective), multicentre standardisation studies (e.g., for different types of PCR equipment, different operators, minimal variations in the buffer conditions, etc.) have yet to be undertaken for AFLP. Perhaps the ultimate test for the accuracy of the AFLP method is to compare AFLP fragments derived experimentally from a given bacterial strain with those predicted by analysis of its published sequence. This has been done for an E. coli K12 strain, for which 92% of the in silico predicted fragments could be produced experimentally by AFLP (Arnold et al., 1999a). To date, 28 bacterial genomes are fully sequenced, with sequence analysis of another 80 bacteria well underway, many of them health-related. Without doubt, sequence-derived (i.e., 'virtual') AFLP data will be useful to fine-tune experimental AFLP conditions and may form the basis of future standardisation schemes.
8.4 APPLICATIONS
Over the past few years, many new genotypic typing methods have been developed, including PFGE, AP-PCR, ARDRA, RAPD, mixed-linker PCR, restriction fragment end-labelling, and sampled sequencing (Van Belkum, 1994; Vaneechoutte, 1996). These methods, some of which are discussed in this book, are based on the use of restriction enzymes, PCR amplification, or both, and have all been used on bacterial genomes with variable success. With the advent of AFLP in 1993 for the detection of molecular markers in plants (Zabeau & Vos, 1993), and the general publication of the AFLP concept two years later (Vos et al., 1995), microbiologists were quick to apply the AFLP technique to address specific questions in microbial taxonomy and epidemiology. Below are some examples of the use and applications of AFLP in microbial systems. More generalised reviews on AFLP are available in the literature (Blears et al., 1998; Koeleman et al., 1998).
A. AFLP as a taxonomic tool
(i) The genus Aeromonas The first report on the use of AFLP for bacterial typing and characterisation was presented in 1993 at the Fourth International Symposium on Aeromonas and Plesiomonas (Janssen et al., 1993) as part of an ecological study on the presence of aeromonads in drinking water production plants (Huys et al., 1993). A total of 38 reference and type strains belonging to the known hybridisation groups (HGs) of A. hydrophila and A. caviae were subjected to AFLP fingerprinting. All six HGs (three of A. hydrophila and three of A. caviae) were clearly separated and AFLP-based clustering fully corroborated genotypic and chemotaxonomic data. Strains of the same HG were easily distinguished from each other according to their AFLP profile, and newly isolated aeromonads were allocated to the correct HG as confirmed by fatty acid analysis. In early 1996, the potential of AFLP to characterise bacterial strains at the subgeneric level was evaluated (Janssen et al., 1996). In this study, 90 Aeromonas strains were included, representing all 14 HGs known within the genus at that time. The AFLP clusters obtained were in perfect concordance with the grouping of these strains according to hybridisation data, and 85 strains were assigned to the correct HG. Apparently, the five remaining strains belonging to A. sobria (two strains), A. veronii biogroup veronii (one strain) and HG 11 (two strains), could not be assigned due to statistical variance as these species were represented by only two to three strains each. In a follow-up study by Huys et al. (1996b; Table 8.1), the use of AFLP allowed the clarification of some unresolved issues in the taxonomic structure of the genus Aeromonas. This elaborate study included the phenospecies A. allosaccharophila, A. encheleia, A. enteropelogenes and A. ichthiosmia, none of which were previously associated with any of the existing HGs. According to the newly obtained AFLP data, the former two species were related genotypically to subgroups of A. veronii and A. eucrenophila, respectively, and the latter two species were found to be identical to A. trota and another subgroup of A. veronii, respectively. The same AFLP data also showed a significant genotypic heterogeneity among A. eucrenophila, suggesting a subdivision of this species, with an affiliation of some strains with the unnamed species HG11. This was confirmed in a separate paper that investigated the genotypic relationship between eight A. eucrenophila strains, six A. encheleia strains, the two HG11 strains, and 14 mainly aquatic Aeromonas isolates using AFLP, 16S rRNA RFLP analysis, SDS-PAGE of whole-cell proteins, and fatty acid analysis (Huys et al., 1996a), eventually leading to an emended description of the species A. eucrenophila and A. encheleia (Huys et al., 1997a).
In the same period, a library of AFLP profiles for Aeromonas species was established. This database (AERO94), consisting mainly of 90 reference and previously reported type strains (Huys et al., 1996b), supplemented with 17 new strains, was used to determine the genotypic diversity among 168 Aeromonas isolates originating from five drinking water production plants in Flanders, Belgium (Huys et al., 1996c). Of these, 144 strains (86%) could be allocated by AFLP to one of the currently known Aeromonas taxa. The remaining 24 unidentified isolates formed a homogeneous AFLP cluster and were found to be closely related to, but clearly distinct from, A. hydrophila HG2 (now classified as A. bestiarum). A subsequent study, whereby some of these strains were analysed in detail using a polyphasic approach, eventually resulted in the proposal of a new Aeromonas species, A. popoffii (Huys et al., 1997b). A recent publication describes the use of automated AFLP to fingerprint Aeromonas strains (Huys & Swings, 1999).
(ii) The genus Acinetobacter
In an extensive study, 151 classified strains of the genus Acinetobacter (representing 18 genomic species, including type, reference and field strains) and eight previously unclassified acinetobacters were analysed by AFLP (Janssen et al., 1997).
Using a single set of restriction enzymes and one combination of corresponding AFLP primers, all classified strains could be allocated to the correct genomic species, and all groups were separated properly, with minimal intraspecific similarity levels (Dice coefficient, SD) ranging from 29% to 74%. Even the closely related DNA groups 1 (A. calcoaceticus), 2 (A. baumannii), 3 and 13TU (sensu Tjernberg & Ursing, 1989), together forming the so-called A. calcoaceticus-A. baumannii (Acb) complex, were clearly distinguishable by AFLP, with intra-specific linkage levels above 50%. AFLP analysis also successfully highlighted some incongruities within the 13BJ-14TU group and placed four unclassified strains, obtained from diverse sources and origins, firmly together in one group at a minimal similarity level of 50%.
(iii) AFLP genotyping in other taxonomic investigations
AFLP fingerprinting was used in a polyphasic taxonomic study to support reclassification and emended descriptions of Paenibacillus larvae, the causative agent of disease in honeybee larvae (Heyndrickx et al., 1996). Two subgroups, P. larvae subsp. larvae and P. larvae subsp. pulvifaciens, could be resolved by AFLP into separate clusters in spite of the high linkage level between them (>80%) and the respective internal similarities of 88% and 95% (using the Dice coefficient). These results were confirmed by the other taxonomic data obtained in this study by fatty acid analysis, pyrolysis mass spectrometry, DNA-DNA binding studies and whole-cell protein profiling.
Recently, 94 Vibrio strains that are closely related to V. harveyi, a ubiquitous marine bacterium, were analysed with AFLP (Pedersen et al., 1998). One single enzyme combination, HindIII/TaqI, and one particular pair of primers were used to group 77 strains in nine AFLP clusters with a cut-off similarity level of 50% (using the Dice similarity coefficient; SD).
Twelve of the remaining 17 strains were type strains of separate Vibrio species, and five strains could not be grouped. The AFLP data obtained were supported in the same study by DNA-DNA hybridisation, ribotyping and plasmid profile results.
AFLP methodology has also helped to unravel some of the taxonomic fine-structure in Xanthomonas, a genus that mainly comprises phytopathogenic and plant-associated bacteria. In one study, 68 X. translucens strains were investigated by AFLP (Bragard et al., 1997). This species encompasses the so-called 'translucens group' of closely related pathovars within the former species X. campestris, and also includes five other pathovars isolated from various Poaceae. The classification of the X. translucens pathovars and their strains is very intricate. Nonetheless, AFLP-based clustering was consistent with the current pathogenicity grouping, and recent reclassifications within these taxa were fully supported by the AFLP data obtained.
In another study, a particular pathovar of X. axonopodis, pv. manihotis (Xam), the causative agent of cassava bacterial blight, was studied in detail by AFLP (Restrepo et al., 1999). To identify the best enzyme combination and selective primers, 64 different primer pairs, comprising all 16 possible +1/+1 primer combinations for each enzyme combination (Table 8.1), were tested on six strains. Eight enzyme/primer pairs that generated informative banding patterns were selected for all further analyses. From the literature, it appears that this is one of the most comprehensive studies on the effect of enzyme and primer use on AFLP banding pattern composition for a given bacterial species, and the results obtained confirmed the bias described earlier for restriction sites and selective bases towards the G+C content of the genome analysed (Janssen et al., 1996). In total, 47 X. axonopodis strains, originating from three different edaphoclimatic zones (ECZs), were subjected to AFLP with eight pre-selected primer combinations, giving rise to a total of 173 polymorphic bands that were considered for multiple component analysis (MCA) and cluster analyses. For nine of the 10 AFLP clusters that were discerned at a 70% similarity cut-off point (using Jaccard's coefficient), an agreement with the ECZ allocation (i.e., the region from which the strains originated) was observed. In addition, the AFLP data closely matched the RFLP results obtained for the pathogenicity gene pthB. In fact, AFLP analysis appeared to have a higher discriminative power in differentiating highly related strains, as it revealed the existence of sub-groups in the ECZ5 Xam population and strengthened the hypothesis that ECZ5 ("from high-altitude tropic regions") actually forms a genetically and evolutionarily separate group.
Another example of efficient assessment of intra-pathovar diversity by AFLP was presented by Clerc et al. (1998), who studied the genetic relatedness of 23 strains belonging to various pathovars of Pseudomonas syringae with AFLP and RAPD. Both techniques were equally able to discriminate the otherwise indistinguishable pathovars P. syringae pv. tomato and P. syringae pv. maculicola, but in addition, AFLP was more efficient in addressing intra- and inter-specific distances.
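The similarity cut-offs quoted in these studies rest on simple band-matching coefficients. As a minimal sketch (not the software used in any of the cited studies), the following Python snippet enumerates the 16 possible +1/+1 primer combinations and computes the Dice and Jaccard coefficients for two strains. It assumes that each AFLP pattern has already been reduced to a set of band sizes; the band sizes and the function names are hypothetical and serve only as illustration.

```python
from itertools import product

BASES = "ACGT"


def plus_one_primer_pairs():
    """Return all combinations of one selective base on each of the two primers."""
    return list(product(BASES, repeat=2))


def dice(bands_a, bands_b):
    """Dice coefficient: twice the shared bands over the total number of bands."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty patterns
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(bands_a, bands_b):
    """Jaccard coefficient: shared bands over all distinct bands."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty patterns
    return len(a & b) / len(a | b)


# Hypothetical band sizes (bp) for two strains; not taken from any study.
strain_x = {52, 87, 110, 143, 201, 260}
strain_y = {52, 87, 110, 150, 201, 275}

print(len(plus_one_primer_pairs()), "primer combinations")
print("Dice:", round(dice(strain_x, strain_y), 3))
print("Jaccard:", round(jaccard(strain_x, strain_y), 3))
```

For the two illustrative patterns, four of the eight distinct bands are shared, so Jaccard's coefficient is 0.50 while the Dice coefficient, which counts shared bands twice, is about 0.67. The same pair of strains can therefore fall on different sides of a given cut-off depending on the coefficient, which is why the coefficient should be stated alongside each cut-off.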
B.
Epidemiological typing of bacteria
(i) Genotyping in the hospital environment
Valsangiacomo et al. (1995) reported on the application of AFLP to type Legionella pneumophila strains from different regions of Switzerland. A simplified procedure was used, employing a single restriction enzyme, PstI, and agarose gel electrophoresis (Table 8.1). AFLP analysis of 10 clinical isolates (from three cases of legionellosis) and 18 environmental strains (mainly obtained from the local water systems) enabled the authors to assess the origin of infection and the likely route of transmission of the pathogen. These results were corroborated by ribotyping and RFLP analysis.
An epidemiological relationship between environmental and clinical isolates of Mycobacterium kansasii was established in a polyphasic study using RFLP and PFGE analyses, PCR restriction analysis of the hsp65 gene, and AFLP (Picardeau et al., 1997). Thirty-eight strains of clinical origin (including 14 isolates from AIDS patients) and 24 isolates from local tapwater were investigated. For AFLP analysis, only one restriction enzyme, PstI, was used, and separation of the PCR
products was done in 2% agarose gels (Table 8.1). Although the AFLP patterns consisted of only 3-8 bands, they were found to be as informative as the PFGE patterns, since both methodologies displayed polymorphisms within each of the five clusters defined by this study.
A single enzyme approach was also developed for AFLP-based typing of Helicobacter pylori (Gibson et al., 1998). In this study, the genomic DNA of 46 isolates obtained from 28 patients (some of whom belonged to the same family group) was digested by HindIII, tagged with appropriate adaptors, and selectively amplified with +1 primers (Table 8.1). Amplicons were separated on a 1.5% agarose gel and the resulting AFLP patterns were used for pairwise analysis of strains. All strains were characterised by at least one other genotypic method (ribotyping and/or urease genotyping). AFLP profiles differentiated strains of unrelated individuals and confirmed the common origin of strains in some of the family groups.
To investigate whether certain pathogenic Aeromonas strains were circulating among patients with diarrhoea in Bangladesh and to gain a better understanding of the virulence of Aeromonas spp., Kühn et al. (1997) set out to investigate 120 isolates, including 80 faecal isolates from patients (69 isolates from patients with diarrhoea, and 11 from healthy controls) and 40 environmental isolates from surface water. All isolates were phenotyped with the Phene-Plate (PhP) method, a high-resolution biochemical fingerprinting system. Most (106) were assigned to hybridisation groups (HGs) by fatty acid analysis, while some (29) required further characterisation using the AERO94 database of Aeromonas AFLP profiles (Huys et al., 1996b, c). The predominant PhP type in the human isolates group was type BD-2 (all of which were identified as A. hydrophila HG1), which produced relatively high levels of virulence factors, including haemolysin and cytotoxin.
This PhP type occurred only once in the environmental group, suggesting that the HG1/BD-2 type represents a true human pathogen, and confirming other reports associating A. hydrophila HG1 with human intestinal disease.
AFLP has also been used in conjunction with other typing methods for the epidemiological typing of Acinetobacter baumannii strains (Dijkshoorn et al., 1996). The AFLP analysis in this study followed the conventional approach of applying two restriction enzymes, HindIII and TaqI, and using +1 and +2 selective primers (Table 8.1). Thirty-one strains, comprising 14 strains from 14 outbreaks in different European cities and 17 sporadic strains from different origins and sources, were investigated. Twelve of the 14 outbreak strains grouped in one AFLP cluster (at 92.8% using the Pearson correlation coefficient), while a second cluster comprised three outbreak strains and one sporadic strain that linked at 89.9%. Strains in these clusters had patterns with differences in one or two band positions only. At a delineation level of 89.0%, only one other cluster containing three sporadic strains could be discerned. All other strains, including the two remaining outbreak strains, were left ungrouped. The other methods used in this study (biotyping, antibiogram analysis, cell envelope protein profiling and ribotyping) fully corroborated the AFLP data, indicating that AFLP was a reliable and accurate typing method. A subsequent study on 25 Acinetobacter strains isolated from five hospital outbreaks in three countries was undertaken (Janssen & Dijkshoorn, 1996). Isolates from the same outbreak displayed identical banding patterns and each set of outbreak strains was found in one particular AFLP cluster with a minimum of 94% similarity (using the Pearson correlation coefficient), confirming previously published typing data on these strains. In a recent study, automated AFLP with fluorescently labelled primers was used to type A. baumannii isolates, including type and reference strains and clinical isolates (Koeleman et al., 1998).
Conventional AFLP with two restriction enzymes and one radioactively labelled primer has also been used for typing Staphylococcus epidermidis isolates in multiple blood cultures (Sloos et al., 1998). Fifteen unrelated strains from various cities across The Netherlands and 44 strains from 11 patients were investigated (Table 8.1). These patients were from four different departments of the Leiden University Medical Center. With AFLP, the former group was heterogeneous, with similarity levels ranging between 78 and 93% (using the Pearson correlation coefficient). Isolates from nine patients were found in nine separate clusters, each containing the four representative blood culture strains originally isolated from each patient. Within each set of four strains, the AFLP patterns were highly similar (>94%), indicating identity (the identity level was set at 94% by reproducibility tests). For the sets of strains isolated from the two remaining patients, dissimilar AFLP patterns were found, with relatively low linkage levels of 83 and 73%. This may be explained by the fact that contamination with skin inhabitants during blood-taking procedures is not uncommon.
A number of research groups have used fluorescent AFLP (fAFLP) for typing clinical isolates (Table 8.1). Desai et al.
(1998) investigated two possibly related outbreaks of group A streptococcal (GAS) invasive disease that took place in a nursing home and a district general hospital in North London, and 16 other GAS isolates collected from various hospitals across England. All but four strains were of serotype M77. The two outbreaks contained two and eight strains, respectively, and were all of serotype T13 M77. Within each outbreak, only one AFLP genotype was observed, with virtually no pattern variance, whereas the nonoutbreak M77 serotype strains displayed unique AFLP profiles. In addition, AFLP readily distinguished the two clones from each other. Through comparison of the AFLP data with macro-restriction analysis of the same strains, it was concluded that the discriminatory power of fAFLP for typing GAS was much higher, and that AFLP methodology is suited for subtyping within a single serotype. In a follow-up study, Desai et al. (1999) analysed 35 S. pyogenes M1 isolates by fAFLP. These isolates were from a different clinical background and formed a clonal group displaying identical ribotyping and macro restriction profiles. Nonetheless, fAFLP readily subtyped them, grouping 25 isolates in seven (multi-isolate) profiles and assigning further individual profiles to the remaining ten isolates. Fluorescent AFLP analysis and macro-restriction analysis were also compared by Van Eldere et al. (1999) who genotyped 48 pneumococci isolated from blood and cerebrospinal fluid. All isolates originated from hospitals in the northern part of France or southern regions of Belgium bordering France and represented five
serotypes (types 6, 9, 14, 19 and 23F). Of these strains, 42% showed intermediate or full penicillin resistance, with the majority of penicillin-resistant isolates (78%) located in serotypes 9V and 23F. These groups could be easily differentiated with both DNA typing methods. This was also the case for the serotype 14 and 19 penicillin-susceptible strains, which formed well-separated clusters following PFGE and AFLP analysis. However, the susceptible type 23F strains were only found as a distinct cluster when AFLP was used. Overall, data analysis showed that AFLP and macro-restriction analysis were equally efficient in assessing intra-serotype diversity.
A recent paper by Hookey et al. (1999) has described the use of fAFLP for the genotypic analysis of methicillin-resistant Staphylococcus aureus (MRSA). A collection of 34 isolates from 22 hospitals in the south of England, together with a single isolate of each of the current predominant UK phage types, and one reference strain of MRSA, were subjected to AFLP using EcoRI and MseI enzymes and a single primer set (Table 8.1). The resulting data were compared with data obtained by standard phenotypic methods (including phage-typing, protein-A production and antibiogram analysis) and data generated by genotypic methods such as RFLP analysis of the coagulase (coa) gene and macro-restriction analysis. Based on replicate studies, the level of identity for AFLP was set at 93.7% (using the Dice coefficient). Using this criterion, all but two strains could be distinguished from each other. Thirty-one of the 34 strains fell into four major fAFLP clusters with an internal linkage level of at least 80%. The two largest clusters, containing 10 and 13 strains, could be further subdivided into two subgroups, roughly confirming the grouping of these strains according to the phenotypic and other genotypic characteristics determined in this study. An extension of this work was reported by Grady et al.
(1999), who took 24 MRSA isolates of phage-type 15 (EMRSA-15) and subjected them to separate fAFLP analyses using the restriction enzymes ApaI + TaqI and EcoRI + MseI. Both template DNAs were selectively amplified with a combination of +0 (for ApaI- and EcoRI-adaptors, respectively) and +1 (for TaqI- and MseI-adaptors, respectively) primers, resulting in AFLP patterns containing c. 75 bands in the size range of 50-800 and 50-300 bp, respectively. In each of the two separate analyses, EMRSA-15 isolates could be differentiated from other MRSA isolates included in the study. By combining both data sets, fAFLP divided the 24 EMRSA-15 isolates into 11 profiles. In contrast, RFLP analysis of the coagulase gene failed to discriminate between any of these isolates, and macro-restriction of the 24 isolates, although discriminative, was not as reproducible as fAFLP.
(ii) AFLP analysis of food-borne pathogens
To investigate the possible source and route of food contamination, sensitive and reliable typing methods are needed. Although classical methods such as biotyping, phage-typing and serotyping may still be used, they usually give incomplete information, and diagnostic laboratories nowadays prefer to integrate one or more DNA-based typing methods in their identification and typing schemes. The usefulness of AFLP for the typing and identification of food pathogens is well documented. In the study by Aarts et al. (1998), 78 different Salmonella strains, comprising 62 serotypes, were analysed by AFLP. Choosing EcoRI and MseI as enzymes and one particular primer combination, reproducible and informative AFLP profiles with up to 50 bands were obtained. All serotypes displayed a unique profile, and AFLP appeared to group all strains with identical bacteriophage specificity in the same cluster. However, a phylogenetic analysis based on AFLP data was not performed.
In another study, 50 type, reference and field strains of Campylobacter jejuni and C. coli were subjected to fAFLP analysis (Kokotovic & On, 1999; Table 8.1). Of the 27 C. jejuni and 23 C. coli strains studied, 19 and 18 different fAFLP profiles, respectively, were recognised. In general, outbreak strains could be readily discriminated from the sporadic isolates within the same Campylobacter species. However, a numerical analysis of AFLP data was not included.
Campylobacters were also subjected to fAFLP by Duim et al. (1999), who investigated 45 strains of C. jejuni and C. coli, including 31 isolates from poultry, 10 human isolates and four reference strains. Some of these strains had been genotypically characterised previously. Informative patterns were obtained for all strains using one enzyme combination and one set of +1 selective primers (Table 8.1). Isogenic mutants of Campylobacter, or highly related strains that produced identical PFGE patterns, showed highly similar or identical AFLP patterns. Twenty-five randomly chosen poultry isolates grouped in two AFLP clusters representing the two species C. jejuni and C. coli (as confirmed by species-specific multiplex PCR) with average linkage levels of 32% and 58%, respectively. The 10 C. jejuni strains of human origin displayed heterogeneous AFLP patterns. When the results of all the C.
jejuni strains were combined, human isolates were scattered throughout the dendrogram and an epidemiological link between the strains was not apparent. Nonetheless, the AFLP data showed clearly that some human strains were highly related to poultry strains (with a Pearson correlation coefficient of 90% or higher), supporting the thesis that poultry products are a source of human infection.
In a recent study by Arnold et al. (1999b), 87 strains of E. coli, including 72 strains from the EcoR reference collection and 15 strains of the clinically important serogroup O157, were subjected to fAFLP analysis (Table 8.1). The composition of the EcoR group had been defined previously by multilocus enzyme electrophoresis (MLEE). Sixty-three of these EcoR strains were grouped by AFLP in the correct MLEE subdivision, while 11 of the O157 serotype strains were found in a separate AFLP cluster.
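Several of the studies above express pattern similarity as a Pearson correlation coefficient and treat isolates as identical above a level fixed by reproducibility tests (e.g., 94% in Sloos et al., 1998). The chapter does not describe how the coefficient was computed in those studies. The sketch below therefore rests on an assumption: each lane has been converted to a densitometric trace sampled at the same positions. The traces, the function names and the default threshold argument are invented for illustration.

```python
from math import sqrt


def pearson_similarity(trace_a, trace_b):
    """Pearson correlation of two equally sampled traces, as a percentage."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must be sampled at the same positions")
    n = len(trace_a)
    mean_a = sum(trace_a) / n
    mean_b = sum(trace_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(trace_a, trace_b))
    var_a = sum((x - mean_a) ** 2 for x in trace_a)
    var_b = sum((y - mean_b) ** 2 for y in trace_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat trace has no defined correlation")
    return 100 * cov / sqrt(var_a * var_b)


def same_type(trace_a, trace_b, identity_level=94.0):
    """Apply an identity level such as one derived from reproducibility tests."""
    return pearson_similarity(trace_a, trace_b) >= identity_level


# Hypothetical traces: lane_2 is lane_1 at twice the signal intensity,
# lane_3 has its peaks at different positions.
lane_1 = [0, 5, 1, 0, 8, 2, 0, 6]
lane_2 = [2 * v for v in lane_1]
lane_3 = [8, 0, 0, 6, 0, 0, 5, 0]

print(same_type(lane_1, lane_2))  # same peaks, different intensity
print(same_type(lane_1, lane_3))  # peaks at different positions
```

Because the correlation is insensitive to overall signal intensity, two lanes with the same peaks but different loading are still scored as identical, which is one reason a curve-based coefficient may be preferred for densitometric data. The identity level itself has to be established empirically for each laboratory set-up.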
C.
AFLP for studying the molecular evolution of microbes
In general terms, nucleotide substitutions that are introduced inadvertently, i.e., due to replication infidelities, or are brought forward under selective pressure, constitute a major source of naturally occurring DNA polymorphisms. However, genetic diversity in prokaryotes appears to be driven largely by a number of dynamic processes that enable them to react swiftly to changes in their environment. To accomplish this 'adapt-to-survive' strategy, microbes have a plethora of routes at their disposal to acquire beneficial, or eliminate superfluous, genetic material, and to 're-shuffle' genes that need to be expressed at short notice. This structural plasticity of microbial genomes has been the subject of numerous investigations, particularly in the light of the recent spread of antibiotic resistance genes and the intra- and inter-species transfer of virulence determinants. However, such investigations are focused mainly on one particular gene or set of genes, and reports on whole genome analysis in the context of evolutionary studies on microbes are very scarce (Brikun et al., 1994; Naas et al., 1995).
The AFLP method has great flexibility in that many different primer pairs (i.e., up to 16 for +1 primers) may be used on the same template. This means that large numbers of nucleotides distributed over the entire genome can be surveyed simultaneously. For instance, with an average of 60 bands for each of the 16 patterns, and given that 12 nucleotides (6 + 4 in the two restriction sites and two +1 selective bases) are associated with each band, 16 x 60 x 12 = 11,520 nucleotides are examined for point mutations and, assuming an average fragment length of 250 bp, 16 x 60 x 250 = 240,000 nucleotides are surveyed for length mutation. This approach allows the detection of rare polymorphisms and is particularly interesting for the analysis of highly related genomes. It has been used for the differentiation of a Tn5-marked strain from its wild-type (Kersters et al., 1996) and for the study of a genetically stable clone of Aeromonas hydrophila in a drinking water well (Kühn et al., 1997).
Recently, Bacillus anthracis, one of the most highly monomorphic species known to date, was subjected to molecular marker analysis by AFLP using EcoRI-MseI templates and +1 primers (Keim et al., 1997). Seventy-nine B.
anthracis strains, collected world-wide, and seven strains of six closely related taxa, were analysed with all 16 possible +1/+1 primer combinations. A total of 1,221 fragments were observed, of which 1,184 fragments were monomorphic (in contrast, B. anthracis and its nearest relatives, B. cereus and B. thuringiensis, differed in their AFLP patterns by nearly 60% of their fragments). In spite of this, AFLP-based cluster analysis outlined two very distinct genetic lineages, possibly representing two independent epidemic foci. In addition, AFLP marker similarity levels indicated that the ongoing anthrax epidemic in Canada and the northern United States is due to the introduction of a single strain that has remained stable over a 25-year period.
D.
Expression profiling
For many years, analytical studies of gene expression have relied on transcript imaging by northern blotting, S1-mapping, or differential plaque hybridisation. Although northern-blot analysis is still regarded as the 'gold standard', especially in a confirmatory context, these methods are time-consuming and are impractical for high-throughput screening.
More recently, differential display methods have been developed that allow the rapid identification of differentially expressed genes in a multi-sample format. The general concept of these methods is the synthesis of cDNA, followed by a restriction digest and/or PCR, and visualisation of amplicons on a denaturing polyacrylamide gel (Fig. 8.5). A number of refinements have been introduced, including the enrichment of the mRNA pool by subtractive hybridisation, the use of biotinylated primers for batch purification on streptavidin-coated beads, and the use of special restriction enzymes and adaptors to select and preferentially amplify particular cDNAs (reviewed by Matz & Lukyanov, 1998; Kozian & Kirschbaum, 1999).
Fig. 8.5. Schematic representation of expression profiling using a cDNA-AFLP approach. The asterisk indicates the registration of a differentially expressed gene.
Among the first reports of the use of AFLP for mRNA fingerprinting were those of Bachem et al. (1996) and Money et al. (1996), who also showed that differentially expressed genes can be isolated and characterised. In the past three years, cDNA-AFLP has been used extensively for transcript analysis in a large variety of eukaryotic systems, but the method has yet to be applied to bacteria. The main reason for this is that cDNA can be readily manufactured and purified in eukaryotic systems by the use of a poly(T) primer that binds to the 3'-poly(A) tract of the mRNA. This is not possible in prokaryotic systems, where polyadenylation of mRNA is very limited (Sarkar, 1997). In addition, up to 98% of the total RNA in bacteria is of ribosomal origin, making detection of differentially expressed genes even more problematic. Nonetheless, there is a great need for high-throughput screening methods for the analysis of virulence genes in bacterial pathogens and for the identification of new targets for drug design (Knowles et al., 1997; Quinn et al., 1997). It would therefore be worthwhile to develop a suitable cDNA-AFLP protocol for bacteria. Possible routes to overcome the problems of poly(T)-directed cDNA production
and the abundant presence of rRNA species do exist. For instance, Amara & Vijaya (1997) reported on the specific addition of a poly(A) tail to E. coli transcripts with a yeast poly(A) polymerase. Apparently, in the presence of manganese and magnesium, ribosomal RNAs remain part of the polysome and, as such, are not available for polyadenylation, while adenines are added to the free 3'-ends of the transcripts. Another report has described how rRNA can be removed successfully from total RNA by subtractive hybridisation with antisense rRNA produced in vitro (Robinson et al., 1994). This involved cloning the 16S-23S operon (in reverse direction) in a transcription vector and incorporation of biotin in the antisense rRNA. With these and similar approaches, the AFLP method should become a valuable tool for analysis of differential expression in bacteria. Recently, a commercial kit from Display Systems Biotech (Vista, CA) has become available that allows differential display of prokaryotic genes without the need for poly(A)-primed PCR by making use of random octamers for cDNA production.
8.5
COMPARISON WITH OTHER METHODS
This chapter (sections 8.3 and 8.4) provides ample evidence for the versatility and reliable performance of AFLP in microbial taxonomy and epidemiology. However, there is a lack of congruity in the methodology used, i.e., the choice of restriction enzymes, gel systems and visualisation technologies may all differ from laboratory to laboratory, and a standardised scheme for AFLP does not yet exist. Nonetheless, the technique is robust and reproducible when executed within basic guidelines, much more so than other PCR-based methods such as RAPD or REP-PCR, which are intrinsically prone to variations in amplification efficiency (Towner & Cockayne, 1993; van Belkum, 1994). These basic guidelines are inspired by common sense. For instance, high resolution AFLP with two restriction enzymes, high quality DNA and automated laser detection can be expected to give better and more reproducible results than, say, AFLP on crude DNA with one restriction enzyme and separation of amplicons through agarose. In addition, possible sources of variation in AFLP analyses are minimised by using preset electrophoresis conditions and a standardised PCR profile at high stringency.
AFLP does not require knowledge of genomic sequences (unlike REP-PCR) and covers the entire genome (unlike locus-specific RFLP, e.g., ribotyping or ARDRA). The technique can also be used with any DNA, regardless of its origin or complexity. Even the smallest bacterial genome (that of Mycoplasma) produces complex AFLP patterns (Table 8.1), while AFLP can also be used to analyse larger genomes (e.g., from eukaryotes). However, the variable base composition of bacterial genomes requires the use of suitable restriction enzymes and the applicability of AFLP remains limited to the sub-genetic level (Janssen et al., 1996).
The turn-around time of AFLP is relatively good, with less than three days needed for template preparation, selective amplification, electrophoresis, data acquisition and analysis (34 strains, starting from chromosomal DNA). For fAFLP, even two days suffice and more strains can be processed simultaneously (depending on the apparatus). The downside is that the cost of a DNA sequencer (US $40,000-125,000) may be prohibitive for some laboratories. Recently, Olive & Bean (1999) have compared the characteristics of various molecular typing methods, including PFGE and RAPD.
The concordance of AFLP-generated data with existing taxonomic data or data simultaneously generated by other taxonomic or epidemiological methods is very good (Dijkshoorn et al., 1996; Huys et al., 1996b; Janssen et al., 1996; 1997; Koeleman et al., 1998; Desai et al., 1999; Kokotovic et al., 1999; Speijer et al., 1999; Van Eldere et al., 1999). In particular, some of these reports provide evidence that AFLP data are in very good agreement with the data obtained by DNA-DNA hybridisation, which is still regarded as the 'gold standard' method in microbial systematics. Recently, Hauben et al. (1999) and Rademaker et al. (2000) established correlation plots of DNA homology data versus AFLP correlation values for large sets of bacterial strains, and found an overall high correlation between AFLP fingerprinting and DNA-DNA pairing data. However, AFLP is unlikely to replace this technique for bacterial species delineation, as outlined in a cautionary note by Esteve (1997). Rather, because high resolution AFLP data can be easily stored and exchanged by computer, AFLP should be considered as an ideal preliminary screening method, while any significant taxonomic findings should be confirmed by DNA hybridisation studies.
8.6
FUTURE PROSPECTS AND CONCLUSIONS
The range of microbial genomes that have been sequenced to completion has grown rapidly over the past few years and will certainly expand steadily over the next year or two, with more than 100 microbial genome sequences expected to be finished by the year 2002. The resulting biological information has enabled scientists to predict possible protein functions and to identify potential target genes for antimicrobial drug discovery. Databases are now accessible to the average researcher in the laboratory via Web browser interfaces (reviewed by Moir et al., 1999) and many of these permit downloading of sequence data for customised analysis. For some human pathogens, such as Streptococcus pneumoniae and Helicobacter pylori, multiple strains have already been completely sequenced, allowing complete genome comparison with specially developed software (MUMmer; Delcher et al., 1999). However, such an approach of 'armchair genomics' needs to link up with reality, since it is an impossible task to sequence all microorganisms in the environment or to analyse the entire genome of every clinical isolate of interest. High resolution DNA fingerprinting techniques such as AFLP should thus remain very useful, especially for the analysis of microbial communities or for detailed epidemiological studies. In addition, genetic markers that can be located to a particular amplicon can be easily isolated by excision of the AFLP fragment, reamplified and directly sequenced. In this context, AFLP-based probe development and the use of microarray technology should be seen as logical continuations in microbial genome analysis.
REFERENCES
Aarts, H.J., van Lith, L.A. & Keijer, J. (1998). High-resolution genotyping of Salmonella strains by AFLP-fingerprinting. Letters in Applied Microbiology 26, 131-135.
Amara, R.R. & Vijaya, S. (1997). Specific polyadenylation and purification of total messenger RNA from Escherichia coli. Nucleic Acids Research 25, 3465-3470.
Arnold, C., Metherell, L., Clewley, J. & Stanley, J. (1999a). Predictive modelling of fluorescent AFLP: a new approach to the molecular epidemiology of E. coli. Research in Microbiology 150, 33-44.
Arnold, C., Metherell, L., Willshaw, G., Maggs, A. & Stanley, J. (1999b). Predictive fluorescent amplified-fragment length polymorphism analysis of Escherichia coli: high-resolution typing method with phylogenetic significance. Journal of Clinical Microbiology 37, 1274-1279.
Bachem, C.W., van der Hoeven, R., de Bruijn, S., Vreugdenhil, D., Zabeau, M. & Visser, R.G. (1996). Visualization of differential gene expression using a novel method of RNA fingerprinting based on AFLP: analysis of gene expression during potato tuber development. Plant Journal 9, 745-753.
Blears, M.J., De Grandis, S.A., Lee, H. & Trevors, J.T. (1998). Amplified fragment length polymorphism (AFLP): a review of the procedure and its applications. Journal of Industrial Microbiology and Biotechnology 21, 99-114.
Bragard, C., Singer, E., Alizadeh, A., Vauterin, L., Maraite, H. & Swings, J. (1997). Xanthomonas translucens from small grains: diversity and phytopathological relevance. Phytopathology 87, 1111-1117.
Brikun, I., Suziedelis, K. & Berg, D.E. (1994). DNA sequence divergence among derivatives of Escherichia coli K-12 detected by arbitrary PCR (random amplified polymorphic DNA) fingerprinting. Journal of Bacteriology 176, 1673-1682.
Caetano-Anollés, G., Bassam, B. & Gresshoff, P.M. (1991). DNA amplification using very short arbitrary primers. Bio/Technology 9, 553-557.
Chalhoub, B.A., Thibault, S., Laucou, V., Rameau, C., Höfte, H. & Cousin, R. (1997). Silver staining and recovery of AFLP amplification products on large denaturing polyacrylamide gels. BioTechniques 22, 216-220.
Clerc, A., Manceau, C. & Nesme, X. (1998). Comparison of randomly amplified polymorphic DNA with amplified fragment length polymorphism to assess genetic diversity and genetic relatedness within genospecies III of Pseudomonas syringae. Applied and Environmental Microbiology 64, 1180-1187.
Coenye, T., Schouls, M., Govan, J.R., Kersters, K. & Vandamme, P. (1999). Identification of Burkholderia species and genomovars from cystic fibrosis patients by AFLP fingerprinting. International Journal of Systematic Bacteriology 49, 1657-1666.
Delcher, A.L., Kasif, S., Fleischmann, R.D., Peterson, J., White, O. & Salzberg, S.L. (1999). Alignment of whole genomes. Nucleic Acids Research 27, 2369-2376.
Desai, M., Tanna, A., Wall, R., Efstratiou, A., George, R. & Stanley, J. (1998). Fluorescent amplified-fragment length polymorphism analysis of an outbreak of group A streptococcal invasive disease. Journal of Clinical Microbiology 36, 3133-3137.
Desai, M., Efstratiou, A., George, R. & Stanley, J. (1999). High-resolution genotyping of Streptococcus pyogenes serotype M1 isolates by fluorescent amplified-fragment length polymorphism analysis. Journal of Clinical Microbiology 37, 1948-1952.
Dijkshoorn, L., Aucken, H., Gerner-Smidt, P., Janssen, P., Kaufmann, M.E., Garaizar, J., Ursing, J. & Pitt, T.L. (1996). Comparison of outbreak and non-outbreak Acinetobacter baumannii strains by genotypic and phenotypic methods. Journal of Clinical Microbiology 34, 1519-1525.
Duim, B., Wassenaar, T.M., Tigter, A. & Wagenaar, J. (1999). High-resolution genotyping of Campylobacter strains isolated from poultry and humans with amplified fragment length polymorphism fingerprinting. Applied and Environmental Microbiology 65, 2369-2375.
Esteve, C. (1997). Is AFLP fingerprinting a true alternative to the DNA-DNA pairing method to assess genospecies in the genus Aeromonas? International Journal of Systematic Bacteriology 47, 245-246.
Forbes, K.J., Bruce, K.D., Jordens, J.Z., Ball, A. & Pennington, T.H. (1991). Rapid methods in bacterial fingerprinting. Journal of General Microbiology 137, 2051-2058.
Geornaras, I., Kunene, N.F., von Holy, A. & Hastings, J.W. (1999). Amplified fragment length polymorphism fingerprinting of Pseudomonas strains from a poultry processing plant. Applied and Environmental Microbiology 65, 3828-3833.
Gibson, J.R., Slater, E., Xerry, J., Tompkins, D.S. & Owen, R.J. (1998). Use of an amplified-fragment length polymorphism technique to fingerprint and differentiate isolates of Helicobacter. Journal of Clinical Microbiology 36, 2580-2585.
Grady, R., Desai, M., O'Neill, G., Cookson, B. & Stanley, J. (1999). Genotyping epidemic methicillin-resistant Staphylococcus aureus phage-type 15 by fluorescent amplified-fragment length polymorphism. Journal of Clinical Microbiology 37, 3189-3203.
Hauben, L., Vauterin, L., Moore, E.R., Hoste, B. & Swings, J. (1999). Genomic diversity of the genus Stenotrophomonas. International Journal of Systematic Bacteriology 49, 1749-1760.
Heyndrickx, M., Vandemeulebroucke, K., Hoste, B., Janssen, P., Kersters, K., De Vos, P., Logan, N.A. & Berkeley, C.W. (1996). Reclassification of Paenibacillus (formerly Bacillus) pulvifaciens (Nakamura 1984) Ash et al. 1994, a later subjective synonym of Paenibacillus (formerly Bacillus) larvae (White 1960) Ash et al. 1994, as a subspecies of P. larvae. Emended description of P. larvae with P. larvae subsp. larvae and P. larvae subsp. pulvifaciens. International Journal of Systematic Bacteriology 46, 270-279.
Hookey, J.V., Edwards, V., Patel, S., Richardson, J.F. & Cookson, B.D. (1999). Use of fluorescent amplified fragment length polymorphism (fAFLP) to characterise methicillin-resistant Staphylococcus aureus. Journal of Microbiological Methods 37, 7-15.
Hu, G. (1993). DNA polymerase-catalyzed addition of nontemplated extra nucleotides to the 3' end of a DNA fragment. DNA and Cell Biology 12, 763-770.
Huys, G. & Swings, J. (1999). Evaluation of a fluorescent amplified fragment length polymorphism (FAFLP) methodology for the genotypic discrimination of Aeromonas taxa. FEMS Microbiology Letters 177, 83-92.
Huys, G., Coopman, R., Vancanneyt, M., Kersters, I., Verstraete, W., Kersters, K. & Janssen, P. (1993). High resolution differentiation of aeromonads. Medical Microbiology Letters 2, 248-255.
Huys, G., Altwegg, M., Hänninen, M.-L., Vancanneyt, M., Vauterin, L., Coopman, R., Torck, U., Lüthy-Hottenstein, J., Janssen, P. & Kersters, K. (1996a). Genotypic and chemotaxonomic description of two subgroups in the species Aeromonas eucrenophila and their affiliation to A. encheleia and Aeromonas DNA hybridization group 11. Systematic and Applied Microbiology 19, 616-623.
Huys, G., Coopman, R., Janssen, P. & Kersters, K. (1996b). High-resolution genotypic analysis of the genus Aeromonas by AFLP fingerprinting. International Journal of Systematic Bacteriology 46, 572-580.
Huys, G., Kersters, I., Coopman, R., Janssen, P. & Kersters, K. (1996c). Genotypic diversity among Aeromonas isolates recovered from drinking water production plants as revealed by AFLP™ analysis. Systematic and Applied Microbiology 19, 428-435.
Huys, G., Kämpfer, P., Altwegg, M., Coopman, R., Janssen, P., Gillis, M. & Kersters, K. (1997a). Inclusion of Aeromonas DNA hybridization group 11 in Aeromonas encheleia and extended descriptions of the species Aeromonas eucrenophila and A. encheleia. International Journal of Systematic Bacteriology 47, 1157-1164.
Huys, G., Kämpfer, P., Altwegg, M., Kersters, I., Lamb, A., Coopman, R., Lüthy-Hottenstein, J., Vancanneyt, M., Janssen, P. & Kersters, K. (1997b). Aeromonas popoffii sp. nov., a mesophilic bacterium isolated from drinking water production plants and reservoirs. International Journal of Systematic Bacteriology 47, 1165-1171.
Janssen, P. (1993). The application of a novel PCR-based genomic fingerprinting method for the high-resolution differentiation of aeromonads. In Abstracts of the fourth international symposium on Aeromonas and Plesiomonas, p. 17. ASM Press, Herndon, VA.
Janssen, P. & Dijkshoorn, L. (1996). High resolution fingerprinting of Acinetobacter outbreak strains. FEMS Microbiology Letters 142, 191-194.
Janssen, P., Coopman, R., Huys, G., Swings, J., Bleeker, M., Vos, P., Zabeau, M. & Kersters, K. (1996). Evaluation of the DNA fingerprinting method AFLP as a new tool in bacterial taxonomy. Microbiology 142, 1881-1893.
Janssen, P., Maquelin, K., Coopman, R., Tjernberg, I., Bouvet, P., Kersters, K. & Dijkshoorn, L. (1997). Discrimination of Acinetobacter genomic species by AFLP fingerprinting. International Journal of Systematic Bacteriology 47, 1179-1187.
Karlin, S., Mrazek, J. & Campbell, A.M. (1997). Compositional biases of bacterial genomes and evolutionary implications. Journal of Bacteriology 179, 3899-3913.
Keim, P., Kalif, A., Schupp, J., Hill, K., Travis, S.E., Richmond, K., Adair, M., Hugh-Jones, M., Kuske, C.R. & Jackson, P. (1997). Molecular evolution and diversity in Bacillus anthracis as detected by amplified fragment length polymorphism markers. Journal of Bacteriology 179, 818-824.
Kersters, I., Huys, G., Van Duffel, H., Vancanneyt, M., Kersters, K. & Verstraete, W. (1996). Survival potential of Aeromonas hydrophila in freshwaters and nutrient-poor waters in comparison with other bacteria. Journal of Applied Bacteriology 80, 266-276.
Knowles, D.J.C. (1997). New strategies for antibacterial drug design. Trends in Microbiology 5, 379-383.
Koeleman, J.G., Stoof, J., Biesmans, D.J., Savelkoul, P.H. & Vandenbroucke-Grauls, C.M. (1998). Comparison of amplified ribosomal DNA restriction analysis, random polymorphic DNA analysis, and amplified fragment length polymorphism fingerprinting for identification of Acinetobacter genomic species and typing of Acinetobacter baumannii. Journal of Clinical Microbiology 36, 2522-2529.
Kokotovic, B. & On, S.L.W. (1999). High-resolution genomic fingerprinting of Campylobacter jejuni and Campylobacter coli by analysis of amplified fragment length polymorphisms. FEMS Microbiology Letters 173, 77-84.
Kokotovic, B., Friis, N.F., Jensen, J.S. & Ahrens, P. (1999). Amplified-fragment length polymorphism fingerprinting of Mycoplasma species. Journal of Clinical Microbiology 37, 3300-3307.
Kozian, D.H. & Kirschbaum, B.J. (1999). Comparative gene-expression analysis. Trends in Biotechnology 17, 73-78.
Kühn, I., Albert, M.J., Ansaruzzaman, M., Bhuiyan, N.A., Alabi, S.A., Islam, M.S., Neogi, P.K., Huys, G., Janssen, P., Kersters, K. & Möllby, R. (1997). Characterization of Aeromonas spp. isolated from humans with diarrhea, from healthy controls, and from surface water in Bangladesh. Journal of Clinical Microbiology 35, 369-373.
Lin, J.-J., Ma, J. & Kuo, J. (1999). Chemiluminescent detection of AFLP markers. BioTechniques 26, 344-348.
Matz, M.V. & Lukyanov, S.A. (1998). Different strategies of differential display: areas of application. Nucleic Acids Research 26, 5537-5543.
Meijer, A., Morré, S., van den Brule, A., Savelkoul, P. & Ossewaarde, J. (1999). Genomic relatedness of Chlamydia isolates determined by amplified fragment length polymorphism analysis. Journal of Bacteriology 181, 4469-4475.
Moir, D.T., Shaw, K.J., Hare, R.S. & Vovis, G.F. (1999). Genomics and antimicrobial drug discovery. Antimicrobial Agents and Chemotherapy 43, 439-446.
Money, T., Reader, S., Qu, L.J., Dunford, R.P. & Moore, G. (1996). AFLP-based mRNA fingerprinting. Nucleic Acids Research 24, 2616-2617.
Naas, T., Blot, M., Fitch, W.M. & Arber, W. (1995). Dynamics of IS-related genetic rearrangements in resting Escherichia coli K-12. Molecular Biology and Evolution 12, 198-207.
Nei, M. & Li, W.-H. (1979). Mathematical model for studying genetic variations in terms of restriction endonucleases. Proceedings of the National Academy of Sciences of the United States of America 76, 5269-5273.
Olive, D.M. & Bean, P. (1999). Principles and applications of methods for DNA-based typing of microbial organisms. Journal of Clinical Microbiology 37, 1661-1669.
Owen, R.J. (1989). Chromosomal DNA fingerprinting - a new method of species and strain identification applicable to microbial pathogens. Journal of Medical Microbiology 30, 89-99.
Pedersen, K., Verdonck, L., Austin, B., Austin, D.A., Blanch, A.R., Grimont, P.A.D., Jofre, J., Koblavi, S., Larsen, J.L., Tiainen, T., Vigneulle, M. & Swings, J. (1998). Taxonomic evidence that Vibrio carchariae Grimes et al. 1985 is a junior synonym of Vibrio harveyi (Johnson and Shunk 1936) Baumann et al. 1981. International Journal of Systematic Bacteriology 48, 749-758.
Phillips, G.J., Arnold, J. & Ivarie, R. (1987). Mono- through hexanucleotide composition of the Escherichia coli genome: a Markov chain analysis. Nucleic Acids Research 15, 2611-2626.
Picardeau, M., Prod'hom, G., Raskine, L., LePennec, M.P. & Vincent, V. (1997). Genotypic characterization of five subspecies of Mycobacterium kansasii. Journal of Clinical Microbiology 35, 25-32.
Quinn, F.D., Newman, G.W. & King, C.H. (1997). In search of virulence factors of human bacterial disease. Trends in Microbiology 5, 20-26.
Rademaker, J.L.W., Hoste, B., Louws, F.J., Kersters, K., Swings, J., Vauterin, L., Vauterin, P. & de Bruijn, F.J. (2000). Comparison of AFLP and rep-PCR genomic fingerprinting with DNA-DNA homology studies: Xanthomonas as a model system. International Journal of Systematic and Evolutionary Microbiology 50, 665-677.
Restrepo, S., Duque, M., Tohme, J. & Verdier, V. (1999). AFLP fingerprinting: an efficient technique for detecting variation of Xanthomonas axonopodis pv. manihotis. Microbiology 145, 107-114.
Robinson, K.A., Robb, F.T. & Schreier, H.J. (1994). Isolation of maltose-regulated genes from the hyperthermophilic archaeum, Pyrococcus furiosus, by subtractive hybridization. Gene 148, 137-141.
Rocha, E.P., Viari, A. & Danchin, A. (1998). Oligonucleotide bias in Bacillus subtilis: general trends and taxonomic comparisons. Nucleic Acids Research 26, 2971-2980.
Roman, B.L., Pham, V.N., Bennett, EE. & Weinstein, B.M. (1999). Non-radioisotopic AFLP method using PCR primers fluorescently labeled with Cy5™. BioTechniques 26, 236-238.
Sarkar, N. (1997). Polyadenylation of mRNA in prokaryotes. Annual Review of Biochemistry 66, 173-197.
Sloos, J.H., Janssen, P., van Boven, C.P.A. & Dijkshoorn, L. (1998). AFLP™ typing of Staphylococcus epidermidis in multiple sequential blood cultures. Research in Microbiology 149, 221-228.
Speijer, H., Savelkoul, P., Bonten, M., Stobberingh, E. & Tjhie, J. (1999). Application of different genotyping methods for Pseudomonas aeruginosa in a setting of endemicity in an intensive care unit. Journal of Clinical Microbiology 37, 3654-3661.
Tjernberg, I. & Ursing, J. (1989). Clinical strains of Acinetobacter classified by DNA-DNA hybridization. Acta Pathologica Microbiologica Scandinavica 97, 595-605.
Towner, K.J. & Cockayne, A. (1993). Molecular methods for microbial identification and typing. Chapman & Hall, London.
Valsangiacomo, C., Baggi, F., Gaia, V., Balmelli, T., Peduzzi, R. & Pifferetti, J.-C. (1995). Use of amplified fragment length polymorphism in molecular typing of Legionella pneumophila and application to epidemiological studies. Journal of Clinical Microbiology 33, 1716-1719.
Van Belkum, A. (1994). DNA fingerprinting of medically important microorganisms by use of PCR. Clinical Microbiology Reviews 7, 174-184.
Van Eldere, J., Janssen, P., Hoefnagels-Schuermans, A., van Lierde, S. & Peetermans, W. (1999). Amplified-fragment length polymorphism analysis versus macro-restriction fragment analysis for molecular typing of Streptococcus pneumoniae isolates. Journal of Clinical Microbiology 37, 2053-2057.
Vaneechoutte, M. (1996). DNA fingerprinting techniques for microorganisms. A proposal for classification and nomenclature. Molecular Biotechnology 6, 115-142.
Vos, P., Hogers, R., Bleeker, M., Reijans, M., van de Lee, T., Hornes, M., Frijters, A., Pot, J., Peleman, J., Kuiper, M. & Zabeau, M. (1995). AFLP: a new technique for DNA fingerprinting. Nucleic Acids Research 23, 4407-4414.
Welsh, J. & McClelland, M. (1990). Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Research 18, 7213-7218.
Williams, J.G.K., Kubelik, A.R., Livak, K.J., Rafalski, J.A. & Tingey, S.V. (1990). DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Research 18, 6531-6535.
Zabeau, M. & Vos, P. (1993). Selective restriction fragment amplification: a general method for DNA fingerprinting. European Patent Office 0 534 858 A1.
9
Application and Analysis of ARDRA Patterns in Bacterial Identification, Taxonomy and Phylogeny
Mario Vaneechoutte¹ and Marc Heyndrickx²
¹Department of Clinical Chemistry, Microbiology & Immunology, Ghent University Hospital, Belgium; ²Department for Animal Product Quality, Center for Agricultural Research, Melle, Belgium
CONTENTS
9.1 INTRODUCTION
    A. General remarks on PCR-RFLP analysis
        (i) Principle
        (ii) Monitoring the discriminatory power of PCR-RFLP analysis
        (iii) PCR-RFLP analysis for the differentiation of species
    B. ARDRA: Amplified ribosomal DNA restriction analysis; what's in a name?
9.2 GENERAL REMARKS ON THE APPLICABILITY OF ARDRA
    A. Technical ease and speed
    B. Discriminatory power
    C. Implementation on automated electrophoresis equipment
9.3 APPLICATION OF ARDRA TO SPECIES DIFFERENTIATION
    A. Identification of cultured organisms
    B. Direct detection in clinical samples
9.4 ARDRA AS A SCREENING METHOD FOR THE STUDY OF MICROBIAL ECOLOGY, EPIDEMIOLOGY AND BIODIVERSITY
    A. Screening collections of cultured isolates
    B. Screening population structure and diversity, starting from cloned amplified rRNA genes
    C. Studying profiles of whole communities
9.5 ARDRA AS A TOOL IN BACTERIAL PHYLOGENY AND TAXONOMY
    A. Overview
    B. Theoretical considerations
        (i) Limitations of rRNA genes as tools for studying phylogenetic relationships and bacterial taxonomy
        (ii) Problems with the cluster analysis of restriction patterns
    C. Practical considerations
        (i) Selection of restriction enzymes
        (ii) Gel electrophoresis
        (iii) Digitisation and computer analysis of the patterns
        (iv) Calculation of similarity coefficients and clustering
    D. Application of ARDRA for phylogenetic and taxonomic research
        (i) Use as a rapid taxonomic classification tool
        (ii) Application in the clarification of the phylogeny and taxonomy of the genus Bacillus sensu lato
    E. The use of ARDRA in phylogenetic studies: conclusions
9.6 OVERALL CONCLUSIONS
REFERENCES
9.1
INTRODUCTION
A.
General remarks on PCR-RFLP analysis
(i) Principle
PCR-RFLP analysis consists of polymerase chain reaction (PCR)-based amplification of a stretch of DNA - usually a gene or a part of a gene - combined with subsequent restriction digestion of the PCR product and electrophoretic analysis of restriction fragment length polymorphism (RFLP analysis). It can therefore be used to differentiate between species and strains of living organisms as a shortcut to sequence determination. The PCR step enables enrichment and purification of a certain part of the genome, followed by restriction digestion of the amplified DNA to reveal sequence polymorphism in a rapid, technologically simple and highly reproducible manner. PCR-RFLP analysis compares well with other techniques which combine amplification with single strand conformation polymorphism (SSCP) analysis (Widjojoatmodjo et al., 1995) or cleavase fragment length polymorphism (CFLP) analysis (Brow et al., 1996; Lyamicheva et al., 1996; Sreevatsan et al., 1998), or with denaturing gradient (DGGE) (Muyzer et al., 1993; Marsh et al., 1998) or temperature gradient gel electrophoresis (TGGE) (Nübel et al., 1996; Smit et al., 1999) analysis, all of which also reveal sequence polymorphism among strains without the need for full sequence determination (Vaneechoutte, 1996).
(ii) Monitoring the discriminatory power of PCR-RFLP analysis
The discriminatory power of PCR-RFLP analysis can be monitored by the choice of more or less polymorphic (i.e., variable) regions to be amplified, by amplifying stretches of different length, and by performing restriction with enzymes which digest more or less frequently. This flexibility enables differences to be studied between strains or between species.
(iii) PCR-RFLP analysis for the differentiation of species
This chapter focuses on the application of PCR-RFLP analysis to differentiate bacterial (and eukaryotic) species. The genes which can be used for this purpose have to be well-conserved in order to minimise intra-specific variability which would obscure the species differentiation possibilities. Protein-encoding genes have been used successfully to differentiate bacterial species: e.g., recA for Acinetobacter spp. (Nowak & Kur, 1996; Jawad et al., 1998), hsp65 for Mycobacterium and Nocardia spp. (Telenti et al., 1993; Steingrube et al., 1995a, b) and the histidine operon for Azospirillum spp. (Grifoni et al., 1995). Intra-specific polymorphism in protein-encoding genes is generally high, and amplification of short regions is often necessary to avoid too much intra-specific variability in the RFLP patterns. For example, Plikaytis et al. (1992) found 10 different restriction types for 31 Mycobacterium gordonae strains when amplifying a 1380-bp fragment of the hsp65 gene, while Telenti et al. (1993) found only five different types among 24 strains when amplifying only 439 bp.
Most applications for bacterial species differentiation have been based on the rRNA cistron, which contains the 16S (also named small subunit rRNA or ssu rRNA), 23S and 5S rRNA genes, ordered in the same manner in most bacteria and with the genes separated by spacer regions of variable length. For eukaryotes, mitochondrial DNA (Boudry et al., 1998; Orui, 1998) or rRNA genes (Vilgalys & Hester, 1990; Clark, 1997) have been used. Several regions of rRNA genes are highly conserved, such that primers complementary to these regions will enable amplification of these genes for most or all bacteria or eukaryotes (Greisen et al., 1994; Clark & Diamond, 1997; Liu et al., 1997; Smit et al., 1997). Besides highly conserved regions, more variable regions are present within the rRNA operon, which allow for differentiation between most species. The intra-specific variability of these regions is limited, leading to relatively homogeneous restriction fingerprints for organisms of the same species.
For example, only one 16S rRNA gene restriction fingerprint is observed with M. gordonae (Vaneechoutte et al., 1993; personal unpublished data). Different regions of the rRNA cistron have been used for PCR-RFLP analysis, including the 16S rRNA gene promoter region (Dobner et al., 1996), the 16S rRNA gene (Gürtler et al., 1991; Jayarao et al., 1992; Ralph et al., 1993; Vaneechoutte et al., 1993; Carlotti & Funke, 1994), the 16S-23S spacer region (Harasawa et al., 1993; Dolzani et al., 1995; Nowak et al., 1995; Liveris et al., 1999), the 16S + spacer + part of the 23S rRNA gene (Vaneechoutte et al., 1992; Salzano et al., 1994), the 23S-5S spacer region (Wittenbrink et al., 1998), and the (almost) complete rRNA cistron (Smith-Vaughan et al., 1995; Ibrahim et al., 1996). A variation that has been described involves combined amplification of the 16S rRNA and recA genes, followed by simultaneous electrophoretic analysis of restriction digests of both products (Nowak & Kur, 1996). This chapter focuses on ARDRA - i.e., amplified rDNA restriction analysis, a short name for PCR-RFLP analysis of the rRNA gene(s) - aimed at differentiating between bacterial and eukaryotic species.
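The principle of PCR-RFLP analysis described above can be sketched in silico: locate the primer sites, excise the amplicon, cut it at every recognition site and compare the resulting fragment lengths. The example below is only a toy illustration under stated assumptions. The sequences and primers are invented, the function names are hypothetical, only one strand is searched, and the reverse primer is supplied as it reads on that same strand. The recognition sites and cut positions of HhaI (GCG^C) and AluI (AG^CT) are the standard ones.

```python
from typing import List

# Recognition site and number of site bases lying 5' of the cut.
ENZYMES = {
    "HhaI": ("GCGC", 3),  # GCG^C
    "AluI": ("AGCT", 2),  # AG^CT
}


def amplify(template: str, fwd: str, rev_on_top: str) -> str:
    """Return the stretch from the forward primer through the reverse primer site."""
    start = template.find(fwd)
    if start == -1:
        raise ValueError("forward primer site not found")
    end = template.find(rev_on_top, start + len(fwd))
    if end == -1:
        raise ValueError("reverse primer site not found")
    return template[start:end + len(rev_on_top)]


def digest(amplicon: str, enzyme: str) -> List[int]:
    """Return fragment lengths (largest first) after complete digestion."""
    site, offset = ENZYMES[enzyme]
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = amplicon.find(site, pos + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Invented toy data: strain B differs from strain A by a single base
# that destroys the HhaI site (GCGC -> GCAC).
FWD, REV = "AAGG", "TTCC"
STRAIN_A = "CC" + "AAGG" + "TTTTT" + "GCGC" + "AAAAAAAA" + "TTCC" + "GG"
STRAIN_B = "CC" + "AAGG" + "TTTTT" + "GCAC" + "AAAAAAAA" + "TTCC" + "GG"

if __name__ == "__main__":
    for name, seq in (("A", STRAIN_A), ("B", STRAIN_B)):
        print(name, digest(amplify(seq, FWD, REV), "HhaI"))
```

The two toy strains yield amplicons of identical length, so they would be indistinguishable before digestion; the single substitution only becomes visible as a change in the fragment pattern.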
B.
ARDRA: Amplified ribosomal DNA restriction analysis; what's in a name?
Amplified ribosomal DNA restriction analysis (ARDRA) or PCR-RFLP analysis of the rRNA gene(s) could be designated more briefly as 'restriction analysis of the rRNA gene', but this name has been used previously for ribotyping (e.g., Grimont & Grimont, 1986; Martinetti-Lucchini & Altwegg, 1992; see Chapter 5), which actually consists of selective restriction fragment hybridisation with the rRNA cistron as the probe, and which has technically and theoretically nothing in common with ARDRA (Vaneechoutte, 1996). To avoid confusion it has been suggested that ARDRA be used to designate PCR-RFLP analysis of the rRNA genes (Vaneechoutte et al., 1992). Adding to the confusion is the fact that 'PCR-ribotyping' has been used as a name for amplification of the spacer region between the 16S and 23S rRNA genes without subsequent restriction digestion (Kostman et al., 1992), and also as a name for ARDRA of the complete rRNA cistron (Smith-Vaughan et al., 1995). Further confusion is caused by the fact that ribotyping has been commercialised under the name of 'riboprinting' (Riboprinter; QualiCon Europe, Birmingham, UK), a term which was already in use for ARDRA of eukaryotes (Clark, 1993; 1997; Stothard et al., 1998) and also bacteria (Mas-Castella et al., 1996). As a result of this nomenclatural confusion, ARDRA appears to be at present the only unambiguous name in use to denote PCR-RFLP analysis of rRNA genes, while it also provides the best description of the different technical aspects involved. Names like PCR-ARDRA (Giraffa et al., 1998) are tautological. ARDREA is a variant name that has also been used for ARDRA (Selenska-Pobell et al., 1998).
9.2
GENERAL REMARKS ON THE APPLICABILITY OF ARDRA
A.
Technical ease and speed
ARDRA combines simple DNA extraction methods - boiling and/or alkaline digestion of cultured bacteria are usually sufficient to obtain amplifiable DNA - with PCR and the simple techniques of restriction digestion and agarose gel electrophoresis. When a limited number of restriction enzymes is used, the method is significantly less laborious than sequencing, so that large collections of isolates can be screened rapidly for sequence polymorphism. Universal primers enable different species and strains to be studied with a single pair of primers and a single approach. In theory, this could lead to the construction of databases (libraries), although the fact that different restriction enzymes and different regions of the rRNA cistron have been used has hampered the development of such a commonly accessible database. For instance, identification of Acinetobacter spp. by means of ARDRA has been described using the 16S rRNA gene (Vaneechoutte et al., 1995a; Dijkshoorn et al., 1998), the spacer region between the 16S and 23S rRNA genes (Dolzani et al., 1995; Nowak et al., 1995), or the complete 16S and 23S rRNA genes, including the spacer (Ibrahim et al., 1996).
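Why differing regions and enzymes stand in the way of a shared library can be made concrete with a small sketch. It assumes a hypothetical library in which reference patterns are stored per combination of amplified region and restriction enzyme, and a deliberately naive matching rule (same number of bands, each within a relative size tolerance). The taxon names and band sizes are invented placeholders, not measured data, and real pattern comparison software is considerably more elaborate.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (amplified region, restriction enzyme)

# Hypothetical library; names and fragment sizes (bp) are placeholders.
LIBRARY: Dict[Key, Dict[str, List[int]]] = {
    ("16S rRNA gene", "HhaI"): {
        "taxon 1": [600, 400, 200],
        "taxon 2": [800, 400],
    },
}


def same_pattern(ref: List[int], obs: List[int], tol: float = 0.05) -> bool:
    """Naive rule: equal band count and each band within a relative tolerance."""
    if len(ref) != len(obs):
        return False
    return all(abs(r - o) <= tol * max(r, o)
               for r, o in zip(sorted(ref), sorted(obs)))


def identify(library: Dict[Key, Dict[str, List[int]]], region: str,
             enzyme: str, bands: List[int], tol: float = 0.05) -> List[str]:
    """Return the library entries whose pattern matches the observed bands."""
    entries = library.get((region, enzyme))
    if entries is None:
        # Patterns from another region or enzyme cannot be compared at all.
        raise KeyError(f"no reference patterns for {region} / {enzyme}")
    return [name for name, ref in entries.items()
            if same_pattern(ref, bands, tol)]


if __name__ == "__main__":
    print(identify(LIBRARY, "16S rRNA gene", "HhaI", [610, 395, 205]))
```

A query made with a different amplified region or enzyme finds no comparable entries, however many patterns the library holds for other combinations.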
B.
Discriminatory power
The efficacy of ARDRA for species differentiation compares well with other methods such as PCR-RFLP analysis of the recA gene (Jawad et al., 1998), the hipl interspersed gene (Smith et al., 1998), or the histidine gene (Grifoni et al., 1995), as well as various other species differentiation techniques (Woo et al., 1997; Vaneechoutte et al., 1998a). Lee et al. (1997) found good correlation between amplified mitochondrial (mt) DNA-RFLP analysis, isoenzyme analysis and ARDRA for differentiation of Acanthamoeba spp., while Conville et al. (2000) reported that ARDRA performed slightly better than hsp65 restriction analysis for differentiating Nocardia spp. Sreevatsan et al. (1998) reported that PCR-CFLP analysis was more sensitive for detecting point mutations than PCR-SSCP and PCR-RFLP analysis. The report by Koeleman et al. (1998) that ARDRA is less well suited for species identification than RAPD analysis and selective restriction fragment amplification (AFLP™; see Chapter 8) should be considered with great caution, since the interpretation of the data was highly problematical (Vaneechoutte et al., 1999b; see also section 5.C.iii). However, in order to approach the discriminatory power of sequencing, multiple restriction enzymes need to be used (Laguerre et al., 1994a, b; Nesme et al., 1995; Lee et al., 1998).
C.
Implementation on automated electrophoresis equipment
Several formats are available that allow ARDRA to be converted from an agarose gel-based technique to a technique capable of analysis on automated electrophoresis systems using fluorescence-based detection of the restriction fragments. The advantages of this approach are automation, a high resolution of the restriction fragments compared to agarose gel electrophoresis, and immediate digitisation of the fingerprints for subsequent computer analysis. Fluorescence can be introduced during rRNA gene amplification by use of a fluorescent primer or by the incorporation of fluorescent nucleotides during amplification. The latter method enables all the restriction fragments to be visualised, since a sufficient number of each of the fragments will be fluorescently labelled to yield a detectable signal for each possible restriction fragment. However, this approach has not been used frequently (Pukall et al., 1998), probably because it is too expensive. Most approaches have involved the incorporation of a single fluorescently labelled primer. Restriction digestion then results in only the fragment attached to the labelled primer being observed. This technique has also been named T(erminal)-RFLP analysis (Liu et al., 1997; Marsh et al., 1998). However, study of the polymorphism only in the 5' restriction fragment reduces the discriminatory power. Thus, Liu et al. (1997) found that restriction of terminally-labelled 16S rRNA gene fragments with HhaI yielded 102-bp fragments with 23 species belonging to four genera of the Bacteroides group and the genus Campylobacter, while 374-bp fragments were produced with 23 species belonging to four genera of the Neisseria group and eight genera of the Vibrio group. Nevertheless, the use of different restriction enzymes has been reported to produce a sufficiently specific signal in most cases (Martin et al., 1993; Avaniss-Aghajani et al., 1994; 1996; Liu et al., 1997). A solution to possible loss of discriminatory power is partial digestion, whereby even the more distal restriction sites can be revealed (Tötsch et al., 1996). This has the additional advantage that the location of the restriction sites can be read directly from the length of the restriction fragments.
9.3
APPLICATION OF ARDRA TO SPECIES DIFFERENTIATION
A.
Identification of cultured organisms
ARDRA has been applied to the differentiation of species of most bacterial genera and groups, as well as several eukaryotes (Table 9.1). As described elsewhere in this book, species differentiation by a variety of genotypic methods may circumvent many phenotypic identification problems. This is also well established for ARDRA. For example, Manachini et al. (1998) found that 21 of 161 Bacillus licheniformis strains were phenotypically atypical, although assignment to this species was possible by means of ARDRA. Whereas many strains of lactobacilli were unidentifiable by biochemical methods, identification with ARDRA was unambiguous (Andrighetto et al., 1998). Similarly, atypical isolates of Listeria are readily identified by different genotypic techniques, including ARDRA (Vaneechoutte et al., 1998a). Apart from atypical isolates, the species comprising some genera are very difficult to identify phenotypically. For example, Veillonella spp. cannot be distinguished reliably by conventional phenotypic tests, but can be differentiated by ARDRA (Sato et al., 1997). Similarly, the four described species of the Acinetobacter calcoaceticus-A. baumannii complex are reliably distinguishable only by genotypic techniques, including ARDRA (Dijkshoorn et al., 1998). Organisms showing atypical growth (e.g., 'small-colony' Staphylococcus aureus, which are often coagulase negative) can also be identified genotypically. However, it should be noted that some closely related species, including members of the Mycobacterium tuberculosis complex (Vaneechoutte et al., 1993), Neisseria meningitidis and N. polysaccharea, and most species of the genus Staphylococcus (personal unpublished data) are difficult or impossible to differentiate by means of ARDRA because of limited intra-generic rRNA sequence divergence. In addition, ARDRA has been used to assess diversity below the species level.
When long stretches of the rRNA operon are used to study species known for their high intra-specific genomic variability (e.g., non-typeable Haemophilus influenzae), ARDRA can be used for typing individual isolates (Smith-Vaughan et al., 1995), although the high variability of ARDRA patterns established in this study has been questioned (Ibrahim, 1997). Also, the spacer regions are more variable than the rRNA genes themselves. Restriction digestion of the amplified 16S-23S rRNA spacer region of uropathogenic Escherichia coli strains indicated the presence of two groups which corresponded with differences in sucrose and raffinose utilisation and in G-adhesin production (Garcia-Martinez et al., 1996). Similarly, variability among Borrelia burgdorferi strains was observed when ARDRA was used to analyse the spacer region (Liveris et al., 1999).
In clinical microbiology, ARDRA has been used to identify cultured clinical pathogens for which identification would otherwise have been difficult or impossible (Vaneechoutte et al., 1995a; Claeys et al., 1996; Bernards et al., 1997). For example, it has recently become clear that Corynebacterium amycolatum is gaining importance as a multi-resistant human pathogen. Its importance has previously been obscured because of its misidentification as either C. xerosis, C. minutissimum or C. striatum as a result of poorly established taxonomy and difficulties in phenotypic identification. However, ARDRA enables rapid and unambiguous identification of C. amycolatum (Vaneechoutte et al., 1995b; 1998b).

Table 9.1. Species differentiation studies by means of ARDRA

Prokaryotes                 Reference(s)
Abiotrophia                 Ohara-Nemoto et al. (1997)
Acholeplasma                Deng et al. (1992)
Acinetobacter               Dolzani et al. (1995); Vaneechoutte et al. (1995a); Nowak & Kur (1996); Ibrahim et al. (1996); Dijkshoorn et al. (1998); Jawad et al. (1998); Chu et al. (1999)
Actinomyces                 Sato et al. (1998a)
Agrobacterium               Khbaya et al. (1998); Terefework et al. (1998)
Alcaligenes                 Vandamme et al. (1996)
Aeromonas                   Graf (1999)
hyperthermophilic Archaea   DiRuggiero et al. (1995)
Arcobacter                  Marshall et al. (1999)
Azospirillum                Grifoni et al. (1995); Han & New (1998)
Bacillus sensu lato         Heyndrickx et al. (1995; 1996a, b; 1997; 1998); Manachini et al. (1998)
Bacteroides                 Wood et al. (1998)
Bartonella                  Matar et al. (1999)
Bordetella                  Vandamme et al. (1997)
Bradyrhizobium              Nuswantara et al. (1997)
Brevibacterium              Carlotti et al. (1994)
Campylobacter               Cardarelli-Leite et al. (1996); Marshall et al. (1999)
Capnocytophaga              Wilson et al. (1995)
Chlamydia                   Meijer et al. (1997)
Clostridium                 Gürtler et al. (1991); Vaneechoutte et al. (1996)
Comamonadaceae              Vaneechoutte et al. (1992)
Corynebacterium             Vaneechoutte et al. (1995b)
Cyanobacteria               Lyra et al. (1997); Smith et al. (1998)
Enterococcus                Jayarao et al. (1992)
Eubacterium                 Sato et al. (1998b)
Gardnerella vaginalis       Ingianni et al. (1997)
Helicobacter                Marshall et al. (1999)
Lactobacillus               Andrighetto et al. (1998); Giraffa et al. (1998)
Leptospira                  Ralph et al. (1993); Woo et al. (1997)
Listeria                    Vaneechoutte et al. (1998a)
Mycobacterium               Hughes et al. (1993); Vaneechoutte et al. (1993); Dobner et al. (1996); Tötsch et al. (1996); Roth et al. (2000)
Mycoplasma                  Deng et al. (1992); Harasawa et al. (1993); Fan et al. (1995)
Neisseria meningitidis      McLaughlin et al. (1993)
Nocardia                    Conville et al. (2000)
Nitrobacter                 Navarro et al. (1992)
Photorhabdus                Fischer-Le Saux et al. (1998)
Phytoplasma                 Gundersen et al. (1996); Lee et al. (1998)
Prevotella                  Milsom et al. (1996); Wood et al. (1998)
Propionibacterium           Riedel et al. (1998)
Pseudomonas                 Laguerre et al. (1994b); Keel et al. (1996); Manceau & Horvais (1997)
Ralstonia                   Brim et al. (1999); Vandamme et al. (1999)
Rhizobium                   Laguerre et al. (1994a); Khbaya et al. (1998); Terefework et al. (1998)
Rochalimaea                 Matar et al. (1993)
Saccharomonospora           Yoon et al. (1997)
Spiroplasma                 Deng et al. (1992)
Streptococcus               Jayarao et al. (1992); Salzano et al. (1994)
Thiobacillus                Selenska-Pobell et al. (1998)
Ureaplasma                  Deng et al. (1992)
Veillonella                 Sato et al. (1997)
Vibrionaceae                Urakawa et al. (1997; 1998)
Xanthomonas                 Nesme et al. (1995)
Xenorhabdus                 Fischer-Le Saux et al. (1998)

Eukaryotes                  Reference(s)
Acanthamoeba                Vodkin et al. (1992); Lee et al. (1997); Chung et al. (1998)
Armillaria                  Frontz et al. (1998)
Biomphalaria snails         Vidigal et al. (1998)
Cryptococcus                Vilgalys & Hester (1990)
Entamoeba                   Clark & Diamond (1997)
Ectomycorrhizal fungi       Henrion et al. (1992)
Microsporidia               Fedorko et al. (1995); Sironi et al. (1997)
Meloidogynidae nematodes    Orui (1998)
Trypanosoma cruzi           Clark & Pung (1994); Stothard et al. (1998)
Wine yeast spp.             Guillamon et al. (1998)

B. Direct detection in clinical samples
When using universal 16S rRNA gene primers, direct identification of bacteria in clinical samples by means of ARDRA is only possible for normally sterile samples such as blood, tissue or cerebrospinal fluid. Infections involving such samples usually consist of only a single pathogen, and restriction analysis of the amplified rRNA gene will therefore be easy to interpret. In addition, instead of using universal bacterial primers, it is also possible to use genus-specific primers. This enables ARDRA to be performed directly on clinical samples that would normally be expected to be non-sterile. Meijer et al. (1997) showed how amplification of an 803-bp fragment of the 16S-23S rRNA spacer region with Chlamydia-specific primers allowed direct detection of these organisms in clinical specimens and immediate differentiation of the four species by means of RFLP analysis. Amplification with primers specific for the ssu rRNA gene of microsporidia, followed by restriction with HphI, enabled human microsporidioses to be diagnosed rapidly (Sironi et al., 1997). Harasawa et al. (1993) used ARDRA to check cell cultures for Mycoplasma contamination, while Matar et al. (1999) used ARDRA directly on clinical specimens to differentiate between Bartonella spp. A technical disadvantage of ARDRA is that the amplification efficiency must be high to obtain sufficiently visible restriction patterns. This efficiency can be achieved by the use of nested PCR, as has been demonstrated with Tropheryma whippelii (Dauga et al., 1997). Nested PCR (first round with Gram-positive 16S rRNA gene-specific primers, followed by a second round with internal primers specific for the 16S rRNA gene of T. whippelii) was used in combination with restriction digestion for the direct detection and identification of this pathogen in clinical samples.
In this case, restriction digestion was used to confirm amplification specificity instead of the more usual (but more laborious) hybridisation of the amplicon with a specific probe. Resistance to antibiotics interfering with ribosomal activity can also be revealed by ARDRA. Thus, species-specific amplification of a 629-bp fragment of the 23S rRNA gene of Helicobacter pylori, followed by restriction digestion with BsaI and BbsI, has been used to recognise clarithromycin-resistant H. pylori strains (Sevin et al., 1998). It is obvious that the application range of ARDRA with specific primers is more limited than when universal primers are used.
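The comparison step that underlies species identification by ARDRA throughout this section, namely matching the restriction patterns of an isolate against those of reference strains, can be sketched as follows. This is only a minimal illustration in Python and not a published protocol: the fragment sizes, the species labels, the 5% size tolerance and the function names are hypothetical placeholders, and two patterns are assumed to be identical when they contain the same number of bands and every band agrees in size within the tolerance.

```python
# Hypothetical reference library: species -> {enzyme: fragment sizes in bp}.
# The enzyme names are real, but all sizes are invented for illustration.
REFERENCE_LIBRARY = {
    "species A": {"AluI": [620, 410, 280, 190], "CfoI": [850, 400, 250]},
    "species B": {"AluI": [620, 410, 280, 190], "CfoI": [700, 550, 250]},
    "species C": {"AluI": [900, 410, 190], "CfoI": [850, 400, 250]},
}


def same_pattern(observed, reference, tolerance=0.05):
    """True if both patterns have the same number of bands and each pair of
    size-ranked bands differs by at most `tolerance` (relative to the reference)."""
    if len(observed) != len(reference):
        return False
    pairs = zip(sorted(observed), sorted(reference))
    return all(abs(obs - ref) <= tolerance * ref for obs, ref in pairs)


def identify(isolate, library=REFERENCE_LIBRARY, tolerance=0.05):
    """Return the species whose patterns match the isolate for every enzyme tested."""
    matches = []
    for species, reference in library.items():
        if all(
            enzyme in reference and same_pattern(sizes, reference[enzyme], tolerance)
            for enzyme, sizes in isolate.items()
        ):
            matches.append(species)
    return matches


if __name__ == "__main__":
    one_enzyme = {"AluI": [615, 405, 285, 188]}
    two_enzymes = {"AluI": [615, 405, 285, 188], "CfoI": [690, 560, 245]}
    print(identify(one_enzyme))   # a single enzyme may leave several candidates
    print(identify(two_enzymes))  # a second enzyme narrows the candidates down
```

In this toy library a single enzyme cannot separate species A from species B, which mirrors the remark above that some closely related species are difficult to differentiate unless additional enzymes are used.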
9.4 ARDRA AS A SCREENING METHOD FOR THE STUDY OF MICROBIAL ECOLOGY, EPIDEMIOLOGY AND BIODIVERSITY
A. Screening collections of cultured isolates
Rapid genotypic techniques such as ARDRA enable large collections of cultured isolates to be screened for the presence of different species, and are therefore well-suited for the screening of bacterial (and eukaryotic) communities and for epidemiological studies. Large-scale screening provides an impression of the biological diversity present in a population, and representatives of the different ARDRA types observed can then be further characterised, e.g., by full sequence determination. The following few examples serve to illustrate use of the technique. Using ARDRA, it was shown that the genotypic diversity among the bacterial genera Xenorhabdus and Photorhabdus, which are symbionts of entomopathogenic nematodes, reflected the genotypic diversity of their hosts (Fischer-Le Saux et al., 1998). In another study combining phenotypic identification and ARDRA to examine the commensal occurrence of Acinetobacter spp. on human skin, it was demonstrated that A. baumannii and Acinetobacter sp. 13TU, the clinically most important nosocomial Acinetobacter species, occur only rarely on skin, so that their natural habitat remains unknown (Seifert et al., 1997). Other examples include the use of ARDRA by Andrighetto et al. (1998) to study the occurrence of homofermentative thermophilic lactobacilli in dairy products, by Navarro et al. (1992) to study Nitrobacter populations, and by Becker et al. (1998) to study lignite carbonisation wastewater populations. ARDRA of cultured bacteria from different communities can also be used to estimate bacterial diversity. A study by Fulthorpe et al. (1998) showed that most (91%) of the 3-chlorobenzoate mineralising soil organisms isolated from six regions on five continents were endemic (see also Staley, 1999).
Barberio & Fani (1998) applied ARDRA to study that part of the microbial community of a sewage treatment plant which could be cultured on a selective medium containing two nonylphenol ethoxylates as the sole carbon source. In this way, the cultivable bacteria with bioremediating capacities could be readily isolated and characterised. Similarly, the cultivable part of an oil-degrading bacterial community from the Venice lagoon was first studied by means of ARDRA (Di Cello et al., 1997). Characterisation of large collections of cultured strains by means of ARDRA has also been carried out for agricultural soils (Ovre & Torsvik, 1998).
B. Screening population structure and diversity, starting from cloned amplified rRNA genes
One possible way to study bacterial communities is by direct amplification of the 16S rRNA genes present in a sample, using universal primers, whereafter the amplified mixture of 16S rRNA genes is cloned and the cloned genes are differentiated by ARDRA. Using DNA reassociation analysis, Torsvik et al. (1998) demonstrated that bacterial communities in pristine soil and sediments may contain more than 10,000 different bacterial types. The diversity of the total soil community was at least 200-fold higher than the diversity estimated on the basis of cultivable bacterial isolates. This indicated that the culture conditions selected for only a distinct sub-population of all the bacterial species present in the environment. LaMontagne et al. (1998) showed that the more abundant species are not amplified any more efficiently than the less abundant species, and thus that direct amplification combined with cloning offers a reliable picture of the true composition of such a community. Other studies have also reported little amplification bias with mixed cultures (Smit et al., 1997; Dojka et al., 1998; Wood et al., 1998). However, Nüsslein & Tiedje (1998) found that preceding fractionation of DNA on the basis of G+C content enabled the detection of less dominant organisms which would have been overlooked by direct amplification of eubacterial rRNA genes. Dojka et al. (1998) cloned 812 amplified ssu rRNA genes from hydrocarbon- and chlorinated solvent-contaminated aquifers under bioremediation. More than 50% of the clones had a unique ARDRA fingerprint. Sequencing of the amplified gene of each ARDRA type which occurred more than once revealed 94 bacterial sequence types, of which 10 were found to have no phylogenetic association with known taxonomic divisions. Similarly, Chandler et al. (1998) amplified and cloned bacterial and archaeal 16S rRNA genes from a low biomass paleosol community. Of 746 bacterial and 190 archaeal clones that were characterised by ARDRA, 242 bacterial and 16 archaeal clones were partially sequenced and compared against the ssu rRNA gene database (RDP) and GenBank.
Six novel eubacterial sequences, clustering with or near the Chloroflexaceae, and 16 unique archaeal ARDRA groups were recognised. Characterisation by ARDRA of more than 300 clones from the amplified rRNA gene mixture of a hot spring at Yellowstone National Park, followed by sequence determination of 122 clones with representative ARDRA patterns, indicated that 30% of the sequence types were unaffiliated with 14 previously recognised bacterial divisions, and that they comprised 12 novel candidate divisions (Hugenholtz et al., 1998). Similarly, the epibiotic bacterial flora of the hydrothermal vent polychaete Alvinella pompejana has been characterised in this manner (Haddad et al., 1995). Other studies include the characterisation of bacterial communities in marine sediments (Gray & Herwig, 1996; Rath et al., 1998), in tundra soil (Zhou et al., 1997), in PCB-dechlorinating anaerobic enrichments (LaMontagne et al., 1998), and the microorganisms associated with the seagrass Halophila stipulacea (Weidner et al., 1996). Instead of starting from cloned amplified rRNA genes, microbial diversity can also be assessed by means of sequence-dependent electrophoretic separation of the mixture of amplified rRNA genes (Muyzer et al., 1993; Nübel et al., 1996; Smit et al., 1999), followed by reamplification of excised bands with the same primers and sequence analysis, or by restriction digestion of the amplified products (see Chapter 11).
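The screening workflow described in this section, in which cloned rRNA genes are grouped by ARDRA pattern and representatives of each type are then selected for sequencing, can be sketched as follows. This is a hedged illustration only: the clone names and fragment sizes are invented, the function names are hypothetical, and the sketch assumes that fragment sizes have already been normalised so that identical patterns carry identical size values.

```python
from collections import defaultdict

# Hypothetical clone library: clone -> {enzyme: fragment sizes in bp}.
# All values are invented; sizes are assumed to be normalised already.
CLONES = {
    "clone01": {"HhaI": [600, 400, 300], "RsaI": [900, 500]},
    "clone02": {"HhaI": [600, 400, 300], "RsaI": [900, 500]},
    "clone03": {"HhaI": [700, 350, 250], "RsaI": [900, 500]},
    "clone04": {"HhaI": [300, 400, 600], "RsaI": [500, 900]},  # clone01 pattern, other order
    "clone05": {"HhaI": [600, 400, 300], "RsaI": [1000, 400]},
}


def pattern_key(patterns):
    """Order-independent key built from the patterns of all enzymes of one clone."""
    return tuple(sorted((enzyme, tuple(sorted(sizes))) for enzyme, sizes in patterns.items()))


def group_by_ardra_type(clones):
    """Group clone names that share the same combined ARDRA pattern."""
    groups = defaultdict(list)
    for clone_id, patterns in clones.items():
        groups[pattern_key(patterns)].append(clone_id)
    return list(groups.values())


def summarise(clones):
    """Count ARDRA types and pick the first clone of each type for sequencing."""
    groups = group_by_ardra_type(clones)
    return {
        "clones": len(clones),
        "ardra_types": len(groups),
        "unique_types": sum(1 for group in groups if len(group) == 1),
        "representatives": [group[0] for group in groups],
    }


if __name__ == "__main__":
    for key, value in summarise(CLONES).items():
        print(f"{key}: {value}")
```

Sequencing only the representatives keeps the sequencing effort proportional to the number of ARDRA types rather than to the number of clones, which is the rationale given for the studies cited above.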
C. Studying profiles of whole communities
Instead of performing ARDRA on cultured isolates or on individually cloned rRNA genes, it is possible to create profiles of total communities in a single step by performing restriction digestion on the mixture of amplified rRNA genes. This approach enables a quick estimate of bacterial diversity in a community and investigations of the population dynamics (Liu et al., 1997; Princic et al., 1998). Thus, ARDRA (with rRNA gene primers specific for ammonium-oxidising bacteria) has been used to study the influence of varying ammonium levels on the composition of a bacterial community (Princic et al., 1998). Reversal towards the original composition was observed when the initial ammonium levels were restored. Marsh et al. (1998) found that ARDRA of fluorescently labelled end restriction fragments was more sensitive than DGGE for studying changes in the eukaryotic community of activated sludge. Smit et al. (1997) studied shifts in microbial community structure and diversity caused by copper contamination of soil, while ARDRA of reactor community DNA has demonstrated how fixed-film reactor communities with different starting compositions can converge to a community of the same composition which is then stable for several months (Massol-Deya et al., 1997). Alternative community fingerprinting methods make use of sequence-dependent electrophoresis techniques (Muyzer et al., 1993; Nübel et al., 1996; Smit et al., 1999).
9.5 ARDRA AS A TOOL IN BACTERIAL PHYLOGENY AND TAXONOMY
A. Overview
Bacterial taxonomy consists of classification, nomenclature and identification of microorganisms. It is now generally accepted that the phylogenetic relationships between microorganisms should be used as a framework for modern bacterial taxonomy. The most generally applied method for determining phylogenetic relationships between microorganisms is based on comparative analysis of the 16S rRNA gene sequences (Neefs et al., 1990). PCR-based amplification of the rRNA gene, combined with cycle sequencing, has led to an explosion of available sequences deposited in several publicly accessible international libraries. More than 41,000 entries were listed in July 2000 when searching for "16S" in the GenBank nucleotide sequence database (http://www.ncbi.nlm.nih.gov/htbin_post/Entrez/Query?db=n). The 16S rRNA gene is normally chosen for sequencing because it contains conserved as well as variable regions, as dictated by the structural and functional constraints of the rRNA molecule, thereby enabling study of phylogenetic relationships between all bacterial taxa (Woese, 1987). Instead of full sequence determination, cluster analysis of the combination of ARDRA patterns obtained with different restriction enzymes has been used successfully for phylogenetic analysis and/or taxonomic classification of microorganisms (Table 9.2). However, before focusing on the application of ARDRA for phylogeny and taxonomy, several remarks should be made with regard to the general applicability of rRNA genes and of restriction digestion for these purposes.

Table 9.2. Examples of the application of ARDRA for phylogenetic and/or taxonomic studies

Taxon or group                              Reference(s)
Alcaligenes                                 Vandamme et al. (1996)
Bacillus sensu lato                         Heyndrickx et al. (1995; 1996a, b; 1997; 1998)
Thermophilic soil bacilli                   Mora et al. (1998)
Bordetella                                  Vandamme et al. (1996)
Borrelia                                    Wang et al. (1997a); Liveris et al. (1999)
Capnocytophaga                              Wilson et al. (1995)
Dairy lactobacilli                          Andrighetto et al. (1998)
Facultative hydrogenotrophs                 Brim et al. (1999)
Phytoplasma                                 Gundersen et al. (1996); Davis et al. (1998); Lee et al. (1998)
Photorhabdus                                Fischer-Le Saux et al. (1998)
Prevotella                                  Haraldson & Holbrook (1998)
Pseudomonas                                 Laguerre et al. (1994a); Keel et al. (1996); Manceau & Horvais (1997)
Rhizobia from root nodules                  Rome et al. (1996); Khbaya et al. (1998)
Uncultured microorganisms from a seagrass   Weidner et al. (1996)
Xenorhabdus                                 Fischer-Le Saux et al. (1998)
Xanthomonas                                 Nesme et al. (1995)
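The cluster analysis of combined ARDRA patterns mentioned in this overview can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any particular software package: the strain names and fragment sizes are invented, bands are assumed to be normalised so that matching bands have identical sizes, similarity is measured with the Dice coefficient, and the tree is built by average-linkage (UPGMA) clustering written out in plain Python.

```python
from itertools import combinations

# Hypothetical strains: name -> {enzyme: fragment sizes in bp}; all sizes invented.
STRAINS = {
    "s1": {"HaeIII": [600, 400, 300, 200], "RsaI": [800, 500, 200]},
    "s2": {"HaeIII": [600, 400, 300, 200], "RsaI": [800, 400, 300]},
    "s3": {"HaeIII": [900, 400, 200], "RsaI": [1000, 500]},
    "s4": {"HaeIII": [900, 400, 200], "RsaI": [1000, 300, 200]},
}


def combined_bands(patterns):
    """Merge the patterns of all enzymes into one set of (enzyme, size) band labels,
    so that equal-sized bands produced by different enzymes are never matched."""
    return {(enzyme, size) for enzyme, sizes in patterns.items() for size in sizes}


def dice(bands_a, bands_b):
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B)."""
    return 2 * len(bands_a & bands_b) / (len(bands_a) + len(bands_b))


def upgma(strains):
    """Average-linkage clustering; returns nested tuples (left, right, similarity)."""
    bands = {name: combined_bands(patterns) for name, patterns in strains.items()}
    clusters = [(name, [name]) for name in strains]  # (subtree, member names)
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i][1] for b in clusters[j][1]]
            average = sum(dice(bands[a], bands[b]) for a, b in pairs) / len(pairs)
            if best is None or average > best[0]:
                best = (average, i, j)
        average, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0], round(average, 2)),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    print(upgma(STRAINS))
```

Labelling each band with its enzyme before merging is one simple way to imitate the side-by-side combination of patterns illustrated in Fig. 9.1b; other coding schemes are equally possible.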
B. Theoretical considerations
(i) Limitations of rRNA genes as tools for studying phylogenetic relationships and bacterial taxonomy
(a) Discriminatory power. It is generally accepted that organisms sharing >97% rRNA gene similarity may belong to a single species, but that the resolution of 16S rRNA gene sequence analysis between closely related species is usually low, so that DNA-DNA hybridisation or other techniques remain necessary for precise species delineation (Stackebrandt & Goebel, 1994). Furthermore, it is well known that some species which are clearly different and valid, as shown by DNA-DNA hybridisation, may have identical or nearly identical (>99% similarity) 16S rRNA gene sequences (Fox et al., 1992). For example, the species Acinetobacter haemolyticus and A. johnsonii are only related distantly according to DNA-DNA homology, but have almost identical ARDRA patterns (Vaneechoutte et al., 1995a). It should also be mentioned that there are some specific cases in which microorganisms have a separate species status for other than purely taxonomic reasons, e.g., because of different levels of clinical importance or pathogenicity. Thus, Bordetella pertussis, B. parapertussis and B. bronchiseptica share >80% DNA-DNA homology and 99.7% 16S rRNA gene similarity, and would normally be regarded as a single species on purely taxonomic grounds (Vandamme et al., 1997).
(b) Mosaic rRNA genes. The existence of mosaic rRNA genes (Sneath, 1993; Gürtler, 1999) indicates the occurrence of horizontal transfer and recombination of (parts of) rRNA cistrons between strains and species, possibly limiting the usefulness of the rRNA genes for phylogenetic purposes in some bacterial groups. Horizontal exchange might be an explanation for the observation with some genera that there is an apparent incongruence between 16S rRNA gene similarity and total DNA homology values. For example, the four described genomic species of the Acinetobacter calcoaceticus-A. baumannii complex have high DNA homology and are also biochemically closely related. However, restriction digestion of the 16S rRNA gene readily reveals several differences between these species. These findings for Acinetobacter are largely confirmed by 16S rRNA gene sequencing (Ibrahim et al., 1997).
(c) Introns or intervening sequences. Introns or intervening sequences (IVSs) have been identified in some species. A 235-bp IVS was found at the same location in the ssu rRNA gene of different strains of Helicobacter canis (Linton et al., 1994), and multiple introns of different length were detected at different positions in the ssu rRNA gene of Thermoproteus strains (Itoh et al., 1998). The archaeon Pyrobaculum aerophilum contains a 713-bp intron, while the closely related species P. islandicum contains no intron (Burggraf et al., 1993). In most of the multiple 16S rRNA genes of Clostridium paradoxum, heterogeneous IVSs were found (Rainey et al., 1996). It is clear that the occurrence of introns will influence estimates of phylogenetic relatedness.
(d) Intra-specific rRNA gene variability.
A high level of intra-specific variability of the 16S rRNA gene has been observed in some species (Clayton et al., 1995; Graf, 1999). As mentioned in section 9.1.A.iii, this variability is usually much more limited than for protein-encoding genes, and may sometimes indicate the existence of overlooked species or subspecies.
(e) Micro-heterogeneity. Another problem encountered when using rRNA gene sequences for phylogeny and taxonomy is the occurrence of inter-operon sequence variability within an individual genome (also called micro-heterogeneity). In most species, the genome of individual cells contains multiple rRNA operons or alleles (Gürtler et al., 1991; Cole & St. Girons, 1994; Gürtler & Stanisich, 1996; Schmidt, 1998). Apparently, the different alleles of the 16S rRNA gene on the bacterial genome can also show allelic sequence differences, at least in some species, with different degrees of micro-heterogeneity between strains within the species. Inter-operon variability has been well-documented for Bacillus (Stewart et al., 1982), Clostridium (Gürtler et al., 1991; Gürtler & Stanisich, 1996; Rainey et al., 1996), Paenibacillus polymyxa, with 10 variant nucleotide positions in the 16S rRNA genes (Nübel et al., 1996), Bacillus sporothermodurans, with three different copies (Pettersson et al., 1996), and in Haloarcula marismortui (Mylvaganam & Dennis, 1992) and Thermobispora bispora (Wang et al., 1997b), each harbouring two 16S rRNA genes with sequence variation of >5%. Bascunana et al. (1994) showed micro-heterogeneity for the two 16S rRNA operons of Mycoplasma sp. strain F38. Sequence differences between alleles of multi-copy genes (like the rRNA cistron in most bacteria) will result in sequencing ambiguities at these positions when direct sequencing (sequencing starting from amplified rRNA genes without prior cloning) is performed, since the amplification mixture will contain alleles with different sequences. These ambiguities will interfere with the phylogenetic analysis. Micro-heterogeneity can also cause interpretation problems for ARDRA, since it may give rise to additional bands with lower intensity that result from low copy number alleles having specific sequence differences which influence the number of restriction sites. This phenomenon is observable in the HaeIII restriction patterns of the type strain of Paenibacillus polymyxa (Fig. 9.1a), which has two additional bands (685 bp and 225 bp long, respectively) compared to other P. polymyxa strains, indicating that an initial 910-bp fragment has an additional HaeIII restriction site in some 16S rRNA alleles of the type strain. Both the variable intensity and presence of these fragments in the HaeIII restriction patterns of different strains (Fig. 9.1a) indicate that the number of alleles carrying this additional restriction site is strain-dependent, which corresponds well with the results obtained by sequence-dependent separation with TGGE of the 16S rRNA gene amplicon (Nübel et al., 1996).
In a similarity calculation based on coding the presence or absence of bands, these additional bands will be given the same weight as the other major bands, although they only represent a fraction of the 16S rRNA alleles carrying the restriction site difference. A solution could be to ignore the (very) weak additional bands, but then the problem arises of how to define the cut-off level of band intensity for scoring. In spite of these problems, ARDRA is more informative than sequencing in cases of micro-heterogeneity because it actually shows the nature and degree of micro-heterogeneity within a strain or species.
(f) A high level of intra-specific rRNA variability. This may also be due to the existence of poorly delineated species which need taxonomic revision. A good example of this is Bacillus circulans, which actually consists of several DNA homology groups (Nakamura & Swezey, 1983), with ARDRA confirming that this 'species' consists of very diverse phylogenetic groups (De Vos et al., 1997).
From the above remarks it follows that phylogenetic studies based on ribosomal genes have to be considered with caution, since normally one or only a few strains per species have been sequenced. Again, rapid techniques like ARDRA can allow a larger number of strains per species to be studied, thus providing a better estimate of possible intra-specific heterogeneity.
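The scoring problem described under (e) for weak additional bands can be made concrete with a small calculation. In the sketch below, only the 910-bp, 685-bp and 225-bp fragments are taken from the Paenibacillus polymyxa example in the text; the other fragment sizes, all relative intensities, the two cut-off values and the function names are hypothetical and chosen purely for illustration. Bands are coded as present or absent according to an intensity cut-off, and similarity is expressed with the Dice coefficient.

```python
# Band profiles as {fragment size in bp: relative intensity}.
# 910, 685 and 225 bp follow the text; 460 and 130 bp and all intensities are invented.
TYPE_STRAIN = {910: 1.0, 460: 0.9, 130: 0.8, 685: 0.25, 225: 0.2}
OTHER_STRAIN = {910: 1.0, 460: 0.9, 130: 0.8}


def score_bands(profile, cutoff):
    """Binary coding: keep only bands whose relative intensity reaches the cut-off."""
    return {size for size, intensity in profile.items() if intensity >= cutoff}


def dice(bands_a, bands_b):
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B)."""
    return 2 * len(bands_a & bands_b) / (len(bands_a) + len(bands_b))


def similarity_at(cutoff):
    """Dice similarity of the two strains after scoring both at the same cut-off."""
    return dice(score_bands(TYPE_STRAIN, cutoff), score_bands(OTHER_STRAIN, cutoff))


if __name__ == "__main__":
    for cutoff in (0.1, 0.3):  # hypothetical cut-off levels
        print(f"cut-off {cutoff}: Dice similarity {similarity_at(cutoff):.2f}")
```

With these invented intensities the two strains appear 75% similar when the weak bands are scored and identical when they are ignored, which illustrates why the choice of cut-off matters and why it is difficult to justify a single value.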
Fig. 9.1. (a) Normalised computer profiles of the ARDRA patterns of some representatives of the genera Paenibacillus (P) and Bacillus (B) obtained with different restriction enzymes. Lane M represents the molecular size marker; (b) Example of the combination by GelCompar of five ARDRA patterns obtained with the enzymes HaeIII, DpnII, RsaI, BfaI and Tru9I for the B. subtilis subsp. subtilis type strain ATCC 6051. The resulting combined pattern shown to the right of the arrow is used for the numerical analysis and clustering (reprinted from Heyndrickx et al., 1996c).
Fig. 9.2. Schematic representation of the influence of the number and relative position of restriction sites on the theoretical ARDRA patterns of taxa A and B, illustrating the influence of the relative physical distance between restriction sites introduced by independent genetic events on the number of different restriction fragments generated between the two taxa. The top middle part of the figure presents the original situation in which there is no difference between the two taxa. A first new restriction site in taxon A, indicated by arrow 1, results in a three fragment difference between taxa A and B (lower part of the figure). A second new restriction site in taxon A results in either a four fragment difference between taxa A and B (left part of the figure) when this happens close to the first restriction site, i.e., on the same fragment indicated by arrow 2, or a six fragment difference between the two taxa (right part of the figure) when this happens further away from the first restriction site, i.e., on another fragment indicated by arrow 3. In each case, the effect is shown on the Dice coefficient $D$ ($D = 2 n_{AB}/(n_A + n_B)$, in which $n_{AB}$ is the number of fragments common to patterns A and B, $n_A$ is the total number of fragments in pattern A, and $n_B$ is the total number of fragments in pattern B).
(ii) Problems with the cluster analysis of restriction patterns
Counting genetic events as differences in the number of restriction fragments poses several problems. For phylogenetic purposes, restriction analysis of genes is an inferential technique compared to sequence determination, and some limitations must be taken into account. The most important pitfall with regard to restriction digestion is that the presence or absence of every band will contribute equally to the homology or similarity calculation between any two fingerprints. However, this is not always a correct assessment of the number of genetic transformations that have occurred between the two corresponding taxa, as is illustrated below. A mutation which introduces an additional restriction site in the 16S rRNA gene of taxon A represents only one genetic event or transformation, but it will cause a difference of three restriction fragments compared with the corresponding fingerprint obtained for taxon B in which this mutation did not occur. This follows from the fact that one fragment present in the fingerprint of taxon B disappears in the fingerprint of taxon A, to be replaced by two additional smaller fragments. In
the similarity calculation, this single event is counted three times using the coding strategy of presence or absence of bands. This would not present a problem if all independent genetic events were counted in a similar manner. However, this is not always the case (Fig. 9.2). If two independent genetic events or transformations introduce two restriction sites which are located distantly from each other in the 16S rRNA gene of taxon A, this will result in a difference of six fragments between the two corresponding fingerprints, because a second larger fragment of taxon B is replaced by two additional smaller fragments in taxon A. In the similarity calculation, this double event is counted six times, which is still in accordance with the similarity calculation based on triple counting of a single event. In contrast, if two independent genetic events introduce two closely located restriction sites in the 16S rRNA gene of taxon A, this will cause a difference of four fragments with the fingerprint of taxon B, because a single large fragment of taxon B is replaced by three smaller fragments in taxon A. In the similarity calculation, this double event is now counted only four times using the coding strategy of presence or absence of bands. Similarity calculation by means of the Dice coefficient for taxa A and B is thus influenced by the physical distance between the different restriction sites on the 16S rRNA gene, as shown in Fig. 9.2. Besides this theoretical consideration, it is possible that the two smaller fragments produced by a genetic event leading to an additional restriction site are of (almost) equal size and thus are not well separated after agarose gel electrophoresis. As a consequence, a single event might be counted only twice. Of course, a combination of these different problems may also occur. Such problems can be solved if the actual restriction sites themselves are counted instead of the restriction fragments. This can be achieved by fluorescent end-labelling during PCR, combined with partial digestion and sequencing grade electrophoresis (Tötsch et al., 1996), with the latter also resulting in better separation of fragments of almost equal length.
C. Practical considerations
(i) Selection of restriction enzymes
An important aspect in the experimental set-up of ARDRA is the choice of restriction enzymes. Since the advantage of ARDRA compared to direct sequencing is highly dependent on the speed of the analysis and on the taxonomic resolution and phylogenetic validity of the numerical analysis of the patterns, a minimal number of well-chosen enzymes should be used. Because of the limited length of the amplified region (1500 bp), the choice is restricted to frequently cutting enzymes such as tetrameric restriction enzymes (i.e., with a four-base recognition sequence) which will produce, on average, five to six restriction fragments. It is a matter of dispute whether restriction enzymes with a longer recognition sequence which contains only four defined bases and a certain number of undefined nucleotides (N) can be used. An enzyme such as DdeI (recognition sequence CTNAG) will give identical restriction patterns for the sequences CTAAG, CTTAG, CTCAG and CTGAG, which seems ambiguous. However, it may be argued that a position of N within the recognition sequence has no more taxonomic implications than a position just outside the recognition sequence. The most discriminative enzymes can be selected empirically from a larger collection of enzymes using a known set of clearly different strains (Wilson et al., 1995; Vaneechoutte et al., 1998a), or theoretically by a computer simulation of restriction digests on the 16S rRNA gene sequences available for the species involved (Heyndrickx et al., 1996b; Moyer et al., 1996). Heyndrickx et al. (1996b) successfully selected five restriction enzymes on the basis of a computer simulation (see D.ii, below) of the restriction sites generated by all known tetrameric enzymes (available at the daily updated "Rebase" website at http://rebase.neb.com/rebase/rebase.html) for the complete 16S rRNA gene sequences of Bacillus sensu lato (available from the Sequence Retrieval System at http://srs.ebi.ac.uk or the Ribosomal Database Project at http://www.cme.msu.edu/RDP/html/index.html). Software packages which allow this simulation include HIBIO DNASIS software (Hitachi Software Engineering America, Brisbane, CA), GeneCompar (Applied Maths, Kortrijk, Belgium), and the Genetics Computer Group (GCG) sequence analysis software (University of Wisconsin, Madison, WI). In accordance with the findings of Heyndrickx et al. (1996b), a computer simulation study testing random combinations of tetra-cutter restriction enzymes (Moyer et al., 1996) showed that combinations of three or more tetra-cutter restriction enzymes detected >99% of the 'operational taxonomic units' (OTUs; defined in the latter study as known bacterial taxa spanning the entire Bacteria domain within a model sequence data set), and that OTUs remaining undetected had a median sequence similarity of 96.1%. Of the 10 restriction enzymes tested, the enzyme combination HhaI-RsaI-BstUI was, overall, the most efficacious at differentiating bacterial 16S rRNA sequences and at predicting correct phylogenetic affiliations, while the enzyme combination MboI-HinfI-TaqI revealed the greatest percentage of successful affiliations to the Gram-positive phylum and the β-subdivision of the Proteobacteria. On the other hand, the more precise use of published sequences to search for restriction enzymes that can differentiate between closely related species has been found to be problematic in several cases, since computer-guided digestion of the published sequences often results in erroneous prediction of restriction sites.
The explanation that this is caused by the presence of sequencing errors in the databanks could be confirmed in a study of Listeria, in which re-sequencing of the 16S rRNA genes showed that the findings with ARDRA were completely reliable, whereas previously published sequences contained errors (Vaneechoutte et al., 1998a). Bacillus lautus showed ARDRA patterns which did not correspond with the expected pattern according to the deposited sequence (Heyndrickx et al., 1996a). This was also observed for the obligate insect pathogens B. popilliae and B. lentimorbus (M. Heyndrickx and P. De Vos, unpublished results). Therefore, it is sometimes necessary to screen a large number of restriction enzymes in order to establish empirically which enzymes can differentiate between the species under study. Once this is established, a more limited set of restriction enzymes can be used in the future. For example, a total of 22 enzymes had to be screened initially to enable differentiation between the six closely related Listeria spp., but the eventual results indicated that only five specified enzymes were sufficient to discriminate the six species (Vaneechoutte et al., 1998a).
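The computer simulation of restriction digests discussed in this subsection can be illustrated with a short Python sketch. It is a minimal illustration only and does not reproduce the software packages named above: the function names, the toy sequence and the small enzyme table are assumptions made for the example, and a real analysis would use complete 16S rRNA gene sequences and the enzyme list from REBASE. Because N is treated as any base, the sketch also shows why DdeI cuts CTAAG and CTGAG alike.

```python
import re

# Bases allowed in a recognition sequence; N stands for any nucleotide.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# Recognition sequence and cut offset within the site (illustrative subset).
# All four sites are palindromic, so scanning one strand is sufficient.
ENZYMES = {
    "HaeIII": ("GGCC", 2),   # GG^CC
    "RsaI": ("GTAC", 2),     # GT^AC
    "AluI": ("AGCT", 2),     # AG^CT
    "DdeI": ("CTNAG", 1),    # C^TNAG
}


def cut_positions(seq, site, offset):
    """Return the cut positions of one enzyme in a linear sequence."""
    pattern = "".join(IUPAC[base] for base in site)
    # A lookahead is used so that overlapping sites are all reported.
    return [m.start() + offset
            for m in re.finditer(f"(?={pattern})", seq.upper())]


def digest(seq, enzyme):
    """Return the fragment lengths (in bp) of a complete digest."""
    site, offset = ENZYMES[enzyme]
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical 26-bp toy sequence containing GGCC, CTAAG and CTGAG.
toy = "AAGGCCTTTTCTAAGAAAACTGAGTT"
for name in ("HaeIII", "DdeI"):
    print(name, digest(toy, name))
```

In this toy case the fragment lengths of each digest add up to the length of the input sequence, which is the same consistency check that is applied to real patterns.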
(ii) Gel electrophoresis
To construct databases of normalised patterns, it is important to use an agarose type and gel concentration with the appropriate resolving power. For example, 2% (w/v) Metaphor agarose (FMC Bioproducts, Rockland, ME) or MP agarose (Life Science International, Zellik, Belgium) can be used for the high resolution separation of restriction fragments with a length between 100 and 1000 bp. In addition, a molecular size marker should be run at regular intervals on the gel for normalisation purposes in the computer analysis. It is important that this marker spans the whole molecular size range expected. Markers can be obtained from commercial sources or a self-made marker can be used, such as an AluI digest of pBR322 supplemented with a pGEM-11Zf 128-bp fragment containing two additional AluI restriction sites (Heyndrickx et al., 1996c). A self-made marker will not change in the future, whereas this cannot be guaranteed for commercial preparations. Pattern analysis software then enables normalisation of the patterns by using the external molecular size markers to compensate for electrophoretic variations within and between gels. Normalisation using a combination of external and internal markers is also possible and will give an even better normalisation result. DNA fragments added to all samples can be used as internal markers. Some automated electrophoresis equipment allows the use of four-colour fluorescent labelling technology, so that an internal standard, labelled differently to the sample, can be added to each lane. Software such as GeneScan Analysis (Perkin Elmer) then normalises the runs automatically with great accuracy.
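The role of the size marker can be illustrated by a simple calibration sketch in Python. It assumes, as a common approximation, that log10(fragment size) varies roughly linearly with migration distance between neighbouring marker bands; this is not necessarily how GelCompar or GeneScan Analysis perform normalisation, and the function name and the marker values below are hypothetical. The sketch refuses to estimate sizes outside the marker range, which reflects the requirement that the marker should span the whole size range expected.

```python
import math


def size_from_distance(distance, marker):
    """Estimate fragment size (bp) from migration distance.

    marker: list of (migration distance, size in bp) pairs for the marker
    bands. The size is interpolated linearly on a log10 scale between the
    two marker bands flanking the unknown band (an approximation).
    """
    points = sorted(marker)  # order the marker bands by migration distance
    if not points[0][0] <= distance <= points[-1][0]:
        raise ValueError("band lies outside the range spanned by the marker")
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance <= d2:
            fraction = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + fraction * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size


# Hypothetical marker: distances in mm, sizes in bp.
marker = [(10.0, 1000), (25.0, 300), (40.0, 100)]
print(round(size_from_distance(32.5, marker)))
```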
(iii) Digitisation and computer analysis of the patterns
For numerical analysis of ARDRA patterns, correct scoring and combination of the restriction fragments obtained with the different restriction enzymes is a crucial factor. Although this step can be done manually and visually, it is preferable to use appropriate software for this purpose (e.g., GelCompar or BioNumerics; Applied Maths). For computer-assisted analysis of the ARDRA patterns, digitisation of the gel photographs as TIFF files is necessary. As described in more detail in Chapter 3, this can be achieved by taking Polaroid photographs of the gels using conventional cameras and scanning these photographs with flatbed scanners or laser densitometers. Alternatively, the gel image can be captured directly by a digital or a video camera using charge coupled device (CCD) photography. Laser densitometers and video cameras are more expensive than flatbed scanners, but video cameras allow direct image acquisition at a high camera resolution (usually 800 x 600 pixels). The gel images captured as 8-bit (or 16-bit) TIFF files can then be imported into the pattern analysis software. GelCompar also makes it possible to combine the individual patterns obtained with each restriction enzyme in a certain fixed order for each strain, as shown in Fig. 9.1b for the Bacillus subtilis type strain, which was analyzed with the enzyme combination HaeIII-DpnII-RsaI-BfaI-Tru9I. The bands in these combined patterns can be scored automatically (using a band search filter), but automatic band assignations should be checked visually on the digitised patterns and, if possible, on the original gel photographs. This is especially necessary for very dense bands, which may sometimes consist of two restriction fragments, and for the low molecular size bands, which have a lower intensity and which are therefore easily missed by the automatic band search option. It is possible to increase the contrast and to decrease the brightness of the normalised patterns, which often helps to visualise and assign low molecular size bands on the screen. Care also has to be taken that spots on the gel or in the background staining are not interpreted by the software as genuine bands. This may introduce artefactual bands in the digitised gels, and this phenomenon has been suggested by Vaneechoutte et al. (1999b) as a possible explanation for the aberrant results reported by Koeleman et al. (1998). In general, it is advisable not to score bands smaller than 50 bp in order to avoid primer and primer-dimer band interference. If the database is calibrated for the molecular size in base pairs, the sum of all scored bands can be calculated and should match the length of the amplified DNA fragment. Lower or higher sums may be explained by (i) micro-heterogeneity, resulting in additional weaker bands (see section 9.5.B.i); (ii) problems in pattern generation caused by star activity of the restriction enzymes, i.e., the capability of restriction enzymes to cleave sequences which are similar but not identical to their defined recognition sequence; or (iii) inadequacies in the band assignments (e.g., missed scoring of weak bands or scoring double bands once only).
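The band-sum check described above lends itself to a simple automated test. The following Python sketch is an illustration under stated assumptions rather than a feature of any particular software: the function name is hypothetical, the 10% tolerance is an arbitrary value chosen for the example, and the default amplicon length of 1500 bp and the 50-bp scoring threshold are taken from the text.

```python
def check_band_sum(band_sizes, amplicon_length=1500, min_size=50, tolerance=0.10):
    """Compare the sum of the scored bands with the amplicon length.

    Bands below min_size are not scored. The verdict is 'ok' when the
    sum lies within +/- tolerance of the amplicon length, otherwise
    'low' or 'high'. The tolerance value is an assumption, not a
    published recommendation.
    """
    scored = [size for size in band_sizes if size >= min_size]
    total = sum(scored)
    deviation = (total - amplicon_length) / amplicon_length
    if deviation < -tolerance:
        verdict = "low"    # e.g. weak bands missed, double bands scored once
    elif deviation > tolerance:
        verdict = "high"   # e.g. micro-heterogeneity, star activity, artefacts
    else:
        verdict = "ok"
    return total, verdict


# Hypothetical patterns obtained with a single enzyme.
print(check_band_sum([600, 450, 300, 150, 40]))
print(check_band_sum([600, 450, 300, 150, 250, 200]))
```

A 'low' or 'high' verdict does not identify the cause; it only flags the pattern for the visual inspection recommended in the text.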
(iv) Calculation of similarity coefficients and clustering
It is recommended to use the Dice coefficient for the estimation of genetic divergence between organisms on the basis of restriction fragment patterns (Nei & Li, 1979). When the Dice coefficient is used for calculating the similarity coefficients between each pair of combined patterns, the band position tolerance is an important parameter. This parameter defines the tolerance limits within which the software will consider bands in different lanes to be identical. For example, a band position tolerance of 0.2% on a total resolution of 2500 points in a combined digitised pattern (resulting from the combination of five digitised patterns with a resolution of 500 points each) means that bands from different lanes which deviate by not more than five points are regarded as identical. The value assigned to this parameter influences the similarity matrix calculation and consequently the clustering result, which is normally expressed as a dendrogram derived from the similarity matrix by means of the UPGMA clustering algorithm. It is advisable to perform a final quality control by checking that visually similar patterns actually cluster in the dendrogram at a high similarity level (at least 90% with the Dice coefficient). If this is not the case, this usually indicates faulty band assignments and/or an inappropriate setting of the band position tolerance. Some important settings in GelCompar, which are relevant for numerical analysis of ARDRA patterns, are summarised in Table 9.3.
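The influence of the band position tolerance on the Dice coefficient, S = 2n(AB) / (n(A) + n(B)), can be made concrete with a small Python sketch. It assumes a simple one-to-one matching of sorted band positions within the tolerance; the matching procedure actually used by GelCompar (including its optimisation option) may differ, and the function names and band positions below are hypothetical.

```python
def position_tolerance_points(tolerance_percent, resolution_points):
    """Convert a position tolerance in % into points of the pattern."""
    return tolerance_percent * resolution_points / 100


def dice(bands_a, bands_b, tol):
    """Dice coefficient 2*n_AB / (n_A + n_B) for two lists of band positions.

    Bands are matched one-to-one, in order of position, when they differ
    by not more than tol points (a simple greedy matching).
    """
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return 2 * shared / (len(a) + len(b))


# 0.2% of a combined pattern of 2500 points corresponds to about 5 points.
tol = position_tolerance_points(0.2, 2500)

# Hypothetical band positions (in points) of two combined patterns.
pattern_a = [120, 480, 900, 1500, 2100]
pattern_b = [123, 480, 908, 1500]
print(round(dice(pattern_a, pattern_b, tol), 3))   # 3 shared bands
print(round(dice(pattern_a, pattern_b, 10), 3))    # 4 shared bands
```

In this example, widening the tolerance from 5 to 10 points turns the pair 900/908 into a shared band and raises the coefficient, which illustrates why an inappropriate tolerance setting alters the clustering.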
Table 9.3. Some suggested settings in GelCompar for numerical analysis of ARDRA patterns run on agarose gels
Module                 Parameter (a)                   Settings
Conversion module      Track resolution (b)            400-500
Normalisation module   Normalised gel resolution (b)   400-500
                       Background subtraction          rolling disk, intensity 14
Analysis module        Comparison coefficient          Dice (band-based comparison)
                       Position tolerance              0.2%, with optimisation 'on' (0.5%)
(a) Other parameters in GelCompar will not influence the clustering of the ARDRA patterns.
(b) Based on a gel running distance of 10 cm in a 2% w/v agarose gel, which is sufficient for ARDRA patterns.
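The UPGMA clustering step referred to in this subsection can be sketched as follows. This is a minimal Python illustration, assuming a complete matrix of pairwise Dice similarities (in %) and using the defining property of UPGMA that the similarity between two clusters is the unweighted mean of all pairwise similarities between their members. The function name and the three patterns are hypothetical, and the sketch returns only the order and level of the merges, not a drawn dendrogram.

```python
from itertools import combinations


def upgma(sim):
    """Cluster patterns by UPGMA from pairwise similarities.

    sim: dict mapping a pair of labels (a, b) to their similarity.
    Returns a list of merges (cluster_1, cluster_2, similarity level),
    in the order in which they are joined.
    """
    def pair_similarity(a, b):
        return sim[(a, b)] if (a, b) in sim else sim[(b, a)]

    def average(c1, c2):
        # Unweighted mean over all member pairs of the two clusters.
        total = sum(pair_similarity(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    labels = sorted({label for pair in sim for label in pair})
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the highest average similarity.
        c1, c2 = max(combinations(clusters, 2), key=lambda p: average(*p))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical Dice similarities (%) between three combined patterns.
similarities = {("A", "B"): 90, ("A", "C"): 60, ("B", "C"): 50}
for first, second, level in upgma(similarities):
    print(first, second, level)
```

Here patterns A and B are joined first at 90%, and C joins the A-B cluster at the mean of 60% and 50%, i.e. 55%.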
D. Application of ARDRA for phylogenetic and taxonomic research
(i) Use as a rapid taxonomic classification tool
The most logical use of ARDRA is as a rapid taxonomic screening method to classify a large set of strains into OTUs. This screening serves to select one or a few representatives from each ARDRA OTU for polyphasic taxonomy, which might include SDS-PAGE of cellular proteins, fatty acid analysis, the determination of DNA-DNA homology, and sequence determination of the 16S rRNA gene. Related to this application is the confirmation of observed (phenotypic, ecological) similarities between different strains before starting polyphasic studies. For example, it was shown that the ARDRA patterns of oil-degrading Acinetobacter strains, isolated independently from three different marine environments (Reisfeld et al., 1972; Yamamoto & Harayama, 1996; Di Cello et al., 1997) were all identical and different from other Acinetobacter spp. Once this was established, the species identity of these strains could be confirmed by a polyphasic approach (Vaneechoutte et al., 1999a). Similarly, the synonymy of the hyphomycete Scytalidium hyalinum and the coelomycete Nattrassia mangiferae was first established by ARDRA and later confirmed by chromatographic techniques. This finding explained why both 'species' are regularly found in the same patient (Roeijmans et al., 1997).
(ii) Application in the clarification of the phylogeny and taxonomy of the genus Bacillus sensu lato
ARDRA has been used extensively for phylogenetic and taxonomic analysis of the genus Bacillus sensu lato. The synonymy of the species Paenibacillus gordonae and P. validus, and of P. pulvifaciens and P. larvae (the former species being the later subjective synonym in both cases), was first indicated by ARDRA and confirmed in a further polyphasic approach (Heyndrickx et al., 1995; 1996a). Fig. 9.3 shows a UPGMA clustering, based on numerical analysis using the Dice coefficient, of a combination of five ARDRA patterns obtained with the enzyme combination HaeIII-DpnII-RsaI-BfaI-Tru9I. Several species of the genus Bacillus and the related genera Amphibacillus, Aneurinibacillus, Brevibacillus, Halobacillus, Paenibacillus and Virgibacillus, as well as Sporosarcina ureae and Marinococcus spp., are included.
[Fig. 9.3: dendrogram image. Horizontal axis: similarity, 30 to 100%. Labelled groups: Virgibacillus, Halobacillus, Bacillus rRNA groups 1 and 2, Paenibacillus, Brevibacillus, Aneurinibacillus.]
Fig. 9.3. Dendrogram based on the UPGMA clustering of the Dice similarity coefficients of normalised combined ARDRA patterns of several representatives of the allied genera Aneurinibacillus, Bacillus (rRNA groups 1 and 2), Brevibacillus, Halobacillus, Paenibacillus, Marinococcus, Sporosarcina ureae and Virgibacillus. Pseudomonas fluorescens was used as the outgroup. The restriction enzyme combination used was HaeIII-DpnII-RsaI-BfaI-Tru9I. T indicates the position of a type strain (reprinted from Heyndrickx et al., 1998).
The dendrogram enables large groups to be distinguished at the 50-60% similarity level, and these groups correspond perfectly with genera which were all recently split off from the main genus Bacillus. The genus Virgibacillus was split off from Bacillus on the basis of ARDRA and phenotypic data (Heyndrickx et al., 1998), and this transfer was recently supported by additional data (Wainø et al., 1999). Several inter-specific phylogenetic relationships within these large ARDRA groups, as well as inter-group phylogenetic relationships, are in accordance with comparative 16S rRNA gene sequence analysis; i.e., the very close relationship between B. subtilis, B. amyloliquefaciens, B. licheniformis and B. pumilus, the close relationship between B. circulans, B. firmus and B. lentus in the Bacillus rRNA group 1 (Ash et al., 1991), and the close relationship of Sporosarcina ureae with B. psychrophilus, a member of the Bacillus rRNA group 2 (Ash et al., 1991). Also, the transfer of B. lautus from Bacillus rRNA group 1 (Ash et al., 1991) to the genus Paenibacillus, which was decided on the basis of ARDRA and phenotypic data (Heyndrickx et al., 1996b), was subsequently confirmed by sequencing the type strain (Shida et al., 1997), as the previously deposited 16S rRNA sequence for this species (Ash et al., 1991) turned out to be erroneous. Furthermore, the more remote relationship of Brevibacillus laterosporus with other members of the genus Brevibacillus (Shida et al., 1996), the specific relationship between the genera Aneurinibacillus and Brevibacillus (Shida et al., 1996), the specific relationship between the genera Halobacillus and Virgibacillus, and the species Marinococcus albus and Bacillus dipsosauri, now reclassified as Gracilibacillus dipsosauri (Wainø et al., 1999), are all convincingly represented in Fig. 9.3 based on ARDRA data.
Conversely, some apparent phylogenetic positions or relationships indicated by ARDRA (Fig. 9.3) are not supported by comparative 16S rRNA sequence analysis. For example, ARDRA indicates a close relationship between B. lentus and B. smithii in the Bacillus rRNA group 1, which is not supported by the data of Ash et al. (1991), and B. insolitus is placed amongst Bacillus rRNA group 1 species in the ARDRA dendrogram, although it has been shown to be a member of the Bacillus rRNA group 2 (Ash et al., 1991). In general, the separation between Bacillus rRNA groups 1 and 2 on the basis of 16S rRNA sequence comparison (Ash et al., 1991) is not evident in the ARDRA dendrogram. This can be explained by the smaller phylogenetic distance between these rRNA groups compared to the distance between other Bacillus rRNA groups which now represent allied genera, such as Brevibacillus (rRNA group 4) and Paenibacillus (rRNA group 3) (Ash et al., 1991). A possible explanation for the less appropriate reflection of intermediary phylogenetic relationships indicated by ARDRA is given in the following section.
E. The use of ARDRA in phylogenetic studies: conclusions
The above comparison, between phylogenetic clustering based on the scoring of bands in ARDRA patterns and that based on comparative 16S rRNA gene sequence analysis, gives a good indication of the validity of ARDRA for phylogenetic studies because it is applied to a bacterial lineage (Bacillus) which spans a wide phylogenetic spectrum. This is apparent from the large % G+C heterogeneity of Bacillus sensu lato and from the recent transfer of Bacillus rRNA groups to new allied genera. Studies on the Gram-negative Alcaligenes-Bordetella lineage (Vandamme et al., 1996; Brim et al., 1999) yielded comparable results. Despite some pitfalls introduced by numerical analysis of banding patterns (see earlier), it seems that ARDRA can be used as a rapid technique to study phylogenetic relationships between closely related species, and to study phylogenetic relationships which correspond with the genus level or with rRNA groups within a phylogenetically diverse genus. On the other hand, phylogenetic relationships situated between these two boundary levels are often not reflected appropriately by ARDRA. This follows, firstly, from the fact that phylogenetic analysis by means of ARDRA (using five tetra-cutter restriction enzymes) gives only a partial (about 10%) sequence analysis compared to sequence determination, and secondly, from the influence of the physical distance between individual restriction sites on the number of fragments generated and hence on the calculated Dice coefficient (see 9.5.B.ii and Fig. 9.2). In the case of closely related organisms which show only small sequence divergence, a first mutation, detected by ARDRA as the gain or loss of a restriction site by one or several of the five restriction enzymes used, has a large impact on the Dice coefficient (see restriction site 1 in Fig. 9.2), and is thus convincingly represented in the dendrogram.
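The dependence of the Dice coefficient on the spacing of restriction sites (Fig. 9.2) can be made concrete with a small calculation. The Python sketch below is an illustration only: the site positions are hypothetical and are not those of Fig. 9.2, fragments are matched by exact size, and all fragment sizes within a pattern are assumed to be distinct. It reproduces the counts given in section 9.5.B.ii, namely three differing bands for one additional site, six for two distantly located sites and four for two closely located sites.

```python
def fragments(sites, length=1500):
    """Fragment lengths of a linear amplicon cut at the given positions."""
    bounds = [0] + sorted(sites) + [length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def compare(frags_a, frags_b):
    """Return (number of differing bands, Dice coefficient).

    Bands are matched by exact size; sizes within one pattern are
    assumed to be distinct (no co-migrating fragments).
    """
    a, b = set(frags_a), set(frags_b)
    shared = len(a & b)
    differing = len(a ^ b)
    return differing, 2 * shared / (len(a) + len(b))


# Hypothetical restriction sites (bp positions) for one enzyme.
taxon_b = [200, 500, 900]
cases = {
    "one additional site": taxon_b + [650],
    "two distant sites": taxon_b + [650, 1000],
    "two close sites": taxon_b + [620, 800],
}
for label, taxon_a in cases.items():
    differing, coefficient = compare(fragments(taxon_a), fragments(taxon_b))
    print(label, differing, round(coefficient, 2))
```

With these positions the Dice coefficient falls from about 0.67 for one additional site to 0.40 for two distant sites, but only to 0.60 for two close sites, although both double events represent the same number of mutations.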
On the other hand, any additional mutation which has occurred between somewhat more distantly related microorganisms (i.e., at the intra-generic level) may not always be equally represented in the numerical analysis (compare Dice coefficients generated by restriction sites 1 + 2 and 1 + 3, respectively, in Fig. 9.2). On the more remote phylogenetic level (i.e., from the inter-generic level onwards), this phenomenon is probably compensated by the more pronounced sequence divergence that is detected as multiple restriction site differences.
9.6 OVERALL CONCLUSIONS
Numerous studies indicate that ARDRA can be applied at the present time to differentiate between bacterial and eukaryotic species, and that it can be used as a tool in the study of complex microbial communities, as a rapid classification method preceding more profound taxonomic studies, and also as a preliminary phylogenetic tool. The continued use and future of PCR-RFLP and similar approaches (e.g., PCR-SSCP analysis), which can be used as a short cut to sequencing, depend on whether new sequencing technologies, such as DNA sequencing arrays, become widely and cheaply available. Indeed, rapid, simple and cheap sequencing technology, if it becomes available, would obviate the need for techniques like ARDRA.
REFERENCES
Andrighetto, C., De Dea, P., Lombardi, A., Neviani, E., Rossetti, L. & Giraffa, G. (1998). Molecular identification and cluster analysis of homofermentative thermophilic lactobacilli isolated from dairy products. Research in Microbiology 149, 631-643.
Ash, C., Farrow, J.A.E., Wallbanks, S. & Collins, M.D. (1991). Phylogenetic heterogeneity of the genus Bacillus revealed by comparative analysis of small-subunit-ribosomal RNA sequences. Letters in Applied Microbiology 13, 202-206.
Avaniss-Aghajani, E., Jones, K., Chapman, D. & Brunk, C. (1994). A molecular technique for identification of bacteria using small subunit ribosomal RNA sequences. BioTechniques 17, 144-149.
Avaniss-Aghajani, E., Jones, K., Holtzman, A., Aronson, T., Glover, N., Boian, M., Froman, S. & Brunk, C.F. (1996). Molecular technique for rapid identification of Mycobacteria. Journal of Clinical Microbiology 34, 98-102.
Barbiero, C. & Fani, R. (1998). Biodiversity of an Acinetobacter population isolated from activated sludge. Research in Microbiology 149, 665-673.
Bascunana, C.R., Mattsson, J.G., Bölske, G. & Johansson, K.-E. (1994). Characterization of the 16S rRNA genes from Mycoplasma sp. strain F38 and development of an identification system based on PCR. Journal of Bacteriology 176, 2577-2586.
Becker, P.M., Wand, H., Martius, G.G.S., Weissbrodt, E. & Stottmeister, U. (1998). Functional and structural successions in arbitrary samples of heterotrophic bacteria during aerobic treatments of lignite-carbonization wastewater in in situ enclosures. Canadian Journal of Microbiology 44, 211-220.
Bernards, A.T., de Beaufort, A.J., Dijkshoorn, L. & van Boven, C.P. (1997). Outbreak of septicaemia in neonates caused by Acinetobacter junii investigated by amplified ribosomal DNA restriction analysis (ARDRA) and four typing methods. Journal of Hospital Infection 35, 129-140.
Boudry, P., Heurtebise, S., Collet, B., Cornette, F. & Gerard, A. (1998). Differentiation between populations of the Portuguese oyster, Crassostrea angulata (Lamarck) and the Pacific oyster, Crassostrea gigas (Thunberg), revealed by mtDNA RFLP analysis. Journal of Experimental Marine Biology and Ecology 226, 279-291.
Brim, H., Heyndrickx, M., De Vos, P., Wilmotte, A., Springael, D., Schlegel, H.G. & Mergeay, M. (1999). Amplified rDNA restriction analysis and further genotypic characterisation of metal-resistant soil bacteria and related facultative hydrogenotrophs. Systematic and Applied Microbiology 22, 258-268.
Brow, M.A.D., Oldenburg, M.C., Lyamichev, V., Heisler, L.M., Lyamicheva, N., Hall, J.G., Eagan, N.J., Olive, D.M., Smith, L.M., Fors, L. & Dahlberg, J.E. (1996). Differentiation of bacterial 16S rRNA genes and intergenic regions and Mycobacterium tuberculosis katG genes by structure-specific endonuclease cleavage. Journal of Clinical Microbiology 34, 3129-3137.
Burggraf, S., Larsen, N., Woese, C.R. & Stetter, K.O. (1993). An intron within the 16S ribosomal RNA gene of the archaeon Pyrobaculum aerophilum. Proceedings of the National Academy of Sciences of the United States of America 90, 2547-2550.
Cardarelli-Leite, P., Blom, K., Patton, C.M., Nicholson, M.A., Steigerwalt, A.G., Hunter, S.B., Brenner, D.J., Barrett, T.J. & Swaminathan, B. (1996). Rapid identification of Campylobacter species by restriction fragment length polymorphism analysis of a PCR-amplified fragment of the gene coding for 16S rRNA. Journal of Clinical Microbiology 34, 62-67.
Carlotti, A. & Funke, G. (1994). Rapid distinction of Brevibacterium species by restriction analysis of rDNA generated by polymerase chain reaction. Systematic and Applied Microbiology 17, 380-386.
Chandler, D.P., Brockman, F.J., Bailey, T.J. & Fredrickson, J.K. (1998). Phylogenetic diversity of archaea and bacteria in a deep subsurface paleosol. Microbial Ecology 36, 37-50.
Chu, Y.W., Leung, C.M., Houang, E.T.S., Ng, K.C., Leung, C.B., Leung, H.Y. & Cheng, A.F.B. (1999). Skin carriage of acinetobacters in Hong Kong. Journal of Clinical Microbiology 37, 2962-2967.
Chung, D.I., Yu, H.S., Hwang, M.Y., Kim, T.H., Kim, T.O., Yun, H.C. & Kong, H.H. (1998). Subgenus classification of Acanthamoeba by riboprinting. Korean Journal of Parasitology 36, 69-80.
Claeys, G., Vanhouteghem, H., Riegel, P., Wauters, G., Hamerlynck, R., Dierick, J., De Witte, J., Verschraegen, G. & Vaneechoutte, M. (1996). Endocarditis of native aortic and mitral valves due to Corynebacterium accolens: report of a case and application of phenotypic and genotypic techniques for identification. Journal of Clinical Microbiology 34, 1290-1292.
Clark, C.G. (1993). PCR detection of pathogenic Entamoeba histolytica and differentiation from other intestinal protozoa by riboprinting. In Diagnostic molecular microbiology. Principles and applications, Persing, D.H., Smith, T.F., Tenover, F.C. & White, T.J., eds., pp. 468-474. ASM Press, Washington, D.C.
Clark, C.G. (1997). Riboprinting: a tool for the study of genetic diversity in microorganisms. Journal of Eukaryotic Microbiology 44, 277-283.
Clark, C.G. & Diamond, L.S. (1997). Intraspecific variation and phylogenetic relationships in the genus Entamoeba as revealed by riboprinting. Journal of Eukaryotic Microbiology 44, 142-154.
Clark, C.G. & Pung, O.J. (1994). Host specificity of ribosomal DNA variation in sylvatic Trypanosoma cruzi from North America. Molecular and Biochemical Parasitology 66, 175-179.
Clayton, R.A., Sutton, G., Hinkle, P.S., Bult, C. & Fields, C. (1995). Intraspecific variation in small-subunit rRNA sequences in GenBank: why single sequences may not adequately represent prokaryotic taxa. International Journal of Systematic Bacteriology 45, 595-599.
Cole, S.T. & Saint Girons, I. (1994). Bacterial genomics. FEMS Microbiology Reviews 14, 139-160.
Conville, P.S., Fischer, S.H., Cartwright, C.P. & Witebsky, F.G. (2000). Identification of Nocardia species by restriction endonuclease analysis of an amplified portion of the 16S rRNA gene. Journal of Clinical Microbiology 38, 158-164.
Dauga, C., Miras, I. & Grimont, P.A. (1997). Strategy for detection and identification of bacteria based on 16S rRNA genes in suspected cases of Whipple's disease. Journal of Medical Microbiology 46, 340-347.
Davis, R.E., Jomantiene, R., Dally, E.L. & Wolf, T.K. (1998). Phytoplasmas associated with grapevine yellows in Virginia belong to group 16SrI, subgroup A (tomato big bud phytoplasma subgroup), and group 16SrIII, new subgroup I. Vitis 37, 131-137.
Deng, S., Hiruki, C., Robertson, J.A. & Stemke, G.W. (1992). Detection by PCR and differentiation by restriction fragment length polymorphism of Acholeplasma, Spiroplasma, Mycoplasma, and Ureaplasma, based upon 16S rRNA genes. PCR Methods and Applications 1, 202-204.
De Vos, P., Lebbe, L., Heyndrickx, M., Meert, E & Kersters, K. (1997). Phylogenetic localisation of Bacillus circulans strains. In Abstracts of the Belgian Society for Microbiology symposium on evolution and gene transfer in microorganisms, p. 44. Leuven, Belgium.
Di Cello, F., Pepi, M., Baldi, F. & Fani, R. (1997). Molecular characterization of an n-alkane-degrading bacterial community and identification of a new species, Acinetobacter venetianus. Research in Microbiology 148, 237-249.
Dijkshoorn, L., van Harsselaar, B., Tjernberg, I., Bouvet, P.J.M. & Vaneechoutte, M. (1998). Evaluation of amplified ribosomal DNA restriction analysis for identification of Acinetobacter genomic species. Systematic and Applied Microbiology 21, 33-39.
DiRuggiero, J., Tuttle, J.H. & Robb, F.T. (1995). Rapid differentiation of hyperthermophilic Archaea by restriction mapping of the intergenic spacer regions of the ribosomal RNA operons. Molecular Marine Biology and Biotechnology 4, 123-127.
Dobner, E, Feldmann, K., Rifai, M., Loscher, T. & Rinder, H. (1996). Rapid identification of mycobacterial species by PCR amplification of hypervariable 16S rRNA gene promotor region. Jour-
238
nal of Clinical Microbiology 34, 866-869. Dojka, M.A., Hugenholtz, E, Haack, S.K. & Pace, N.R. (1998). Microbial diversity in a hydrocarbonand chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Applied and Environmental Microbiology 64, 3869-3877. Dolzani, L., Tonin, E., Lagatolla, C., Prandin, L. & Monti-Bragadin, C. (1995). Identification of Acinetobacter isolates in the A. calcoaceticus-A, baumannii complex by restriction analysis of the 16S-23S rRNA intergenic spacer sequences. Journal of Clinical Microbiology 33, 1108-1113. Fan, H.H., Kleven, S.H., Jackwood, M.W., Johansson, K.E., Pettersson, B. & Levisohn, S. (1995). Species identification of avian mycoplasmas by polymerase chain reaction and restriction fragment length polymorphism analysis. Avian Diseases 39, 398-407. Fedorko, D.E, Nelson, N.A. & Cartwright, C.E (1995). Identification of microsporidia in stool specimens by using PCR and restriction endonucleases. Journal of Clinical Microbiology 33, 1739-1741. Fischer-Le Saux, M., Mauleon, H., Constant, E, Brunel, B. & Boemaere, N. (1998). PCR-ribotyping of Xenorhabdus and Photorhabdus isolates from the Caribbean region in relation to the taxonomy and geographic distribution of their nematode hosts. Applied and Environmental Microbiology 64, 4246-4254. Fox, G.E., Wisotzkey, J.D. & Jurtshuk, E (1992). How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. International Journal of Systematic Bacteriology 42, 166-170. Frontz, T.M., Davis, D.D., Bunyard, B.A. & Royse, D.J. (1998). Identification of Armillaria species isolated from bigtooth aspen based on rDNA RFLP analysis. Canadian Journal of Forest Research 28, 141-149. Fulthorpe, R.R., Rhodes, A.N. & Tiedje, J.M. (1998). High levels of endemicity of 3-chlorobenzoatedegrading soil bacteria. Applied and Environmental Microbiology 64, 1620-1627. Garcia-Martinez, J., Martinez-Murcia, A.J., Rodriguez-Valera, E & Zorraquino, A. (1996). 
Molecular evidence supporting the existence of two major groups in uropathogenic Escherichia coli. FEMS Immunology and Medical Microbiology 14, 231-244. Giraffa, G., De Vecchi, P. & Rossetti, L. (1998). Identification of Lactobacillus delbrueckii subspecies bulgaricus and subspecies lactis dairy isolates by amplified rDNA restriction analysis. Journal of Applied Microbiology 85, 918-924. Graf, J. (1999). Diverse restriction fragment length polymorphism patterns of the PCR-amplified 16S rRNA genes in Aeromonas veronii strains and possible misidentification of Aeromonas species. Journal of Clinical Microbiology 37, 3194-3197. Gray, J.P. & Herwig, R.P. (1996). Phylogenetic analysis of the bacterial communities in marine sediments. Applied and Environmental Microbiology 62, 4049-4059. Greisen, K., Loeffelholz, M., Purohit, A. & Leong, D. (1994). PCR primers and probes for the 16S rRNA gene of most species of pathogenic bacteria, including bacteria found in cerebrospinal fluid. Journal of Clinical Microbiology 32, 335-351. Grifoni, A., Bazzicalupo, M., Di Serio, C., Fancelli, S. & Fani, R. (1995). Identification of Azospirillum strains by restriction fragment length polymorphism of the 16S rDNA and of the histidine operon. FEMS Microbiology Letters 127, 85-91. Grimont, F. & Grimont, P.A.D. (1986). Ribosomal ribonucleic acid gene restriction patterns as potential taxonomic tools. Annales de l'Institut Pasteur/Microbiologie 137B, 165-175. Guillamon, J.M., Sabaté, J., Barrio, E., Cano, J. & Querol, A. (1998). Rapid identification of wine yeast species based on RFLP analysis of the ribosomal internal transcribed spacer (ITS) region. Archives of Microbiology 169, 387-392. Gundersen, D.E., Lee, I.M., Schaff, D.A., Harrison, N.A., Chang, C.J., Davis, R.E. & Kingsbury, D.T. (1996). Genomic diversity and differentiation among phytoplasma strains in 16S rRNA groups I (aster yellows and related phytoplasmas) and III (X-disease and related phytoplasmas).
International Journal of Systematic Bacteriology 46, 64-75. Gürtler, V. (1999). The role of recombination and mutation in 16S-23S rDNA spacer rearrangements.
Gene 238, 241-252. Gürtler, V. & Stanisch, V.A. (1996). New approaches to typing and identification of bacteria using the 16S-23S rDNA spacer region. Microbiology 142, 3-16. Gürtler, V., Wilson, V.A. & Mayall, B.C. (1991). Classification of medically important clostridia using restriction endonuclease site differences of PCR-amplified 16S rDNA. Journal of General Microbiology 137, 2673-2679. Haddad, A., Camacho, F., Durand, P. & Cary, S.C. (1995). Phylogenetic characterization of the epibiotic bacteria associated with the hydrothermal vent polychaete Alvinella pompejana. Applied and Environmental Microbiology 61, 1679-1687. Han, S.O. & New, P.B. (1998). Variation in nitrogen fixing ability among natural isolates of Azospirillum. Microbial Ecology 36, 193-201. Haraldsson, G. & Holbrook, W.P. (1998). A hemagglutinating variant of Prevotella melaninogenica isolated from the oral cavity. Oral Microbiology and Immunology 13, 362-367. Harasawa, R., Mizusawa, H., Nozawa, K., Nakagawa, T., Asada, K. & Kato, I. (1993). Detection and tentative identification of dominant Mycoplasma species in cell cultures by restriction analysis of the 16S-23S rRNA intergenic spacer regions. Research in Microbiology 144, 489-493. Henrion, B., Le Tacon, F. & Martin, F. (1992). Rapid identification of genetic variation of ectomycorrhizal fungi by amplification of ribosomal RNA genes. New Phytologist 122, 289-298. Heyndrickx, M., Vandemeulebroecke, K., Scheldeman, P., Hoste, B., Kersters, K., De Vos, P., Logan, N.A., Aziz, A.M., Ali, N. & Berkeley, R.C.W. (1995). Paenibacillus (formerly Bacillus) gordonae (Pichinoty et al. 1986) Ash et al. 1994 is a later subjective synonym of Paenibacillus (formerly Bacillus) validus (Nakamura 1984) Ash et al. 1994: emended description of P. validus. International Journal of Systematic Bacteriology 45, 661-669. Heyndrickx, M., Vandemeulebroecke, K., Hoste, B., Janssen, P., Kersters, K., De Vos, P., Logan, N.A., Ali, N. & Berkeley, R.C.W. (1996a).
Reclassification of Paenibacillus (formerly Bacillus) pulvifaciens (Nakamura 1984) Ash et al. 1994, a later subjective synonym of Paenibacillus (formerly Bacillus) larvae (White 1906) Ash et al. 1994, as a subspecies of P. larvae, with emended descriptions of P. larvae as P. larvae subsp. larvae and P. larvae subsp. pulvifaciens. International Journal of Systematic Bacteriology 46, 270-279. Heyndrickx, M., Vandemeulebroecke, K., Scheldeman, P., Kersters, K., De Vos, P., Logan, N.A., Aziz, A.M., Ali, N. & Berkeley, R.C.W. (1996b). A polyphasic reassessment of the genus Paenibacillus, reclassification of Bacillus lautus (Nakamura 1984) as Paenibacillus lautus comb. nov. and of Bacillus peoriae (Montefusco et al. 1993) as Paenibacillus peoriae comb. nov., and emended descriptions of P. lautus and of P. peoriae. International Journal of Systematic Bacteriology 46, 988-1003. Heyndrickx, M., Vauterin, L., Vandamme, P., Kersters, K. & De Vos, P. (1996c). Applicability of combined amplified ribosomal DNA restriction analysis (ARDRA) patterns in bacterial phylogeny and taxonomy. Journal of Microbiological Methods 26, 247-259. Heyndrickx, M., Lebbe, L., Vancanneyt, M., Kersters, K., De Vos, P., Logan, N.A., Forsyth, G., Nazli, S., Ali, A. & Berkeley, R.C. (1997). A polyphasic reassessment of the genus Aneurinibacillus, reclassification of Bacillus thermoaerophilus (Meier-Stauffer et al., 1996) as Aneurinibacillus thermoaerophilus comb. nov., and emended descriptions of A. aneurinilyticus corrig., A. migulanus, and A. thermoaerophilus. International Journal of Systematic Bacteriology 47, 808-817. Heyndrickx, M., Lebbe, L., Kersters, K., De Vos, P., Forsyth, G. & Logan, N.A. (1998). Virgibacillus: a new genus to accommodate Bacillus pantothenticus (Proom and Knight 1950). Emended description of Virgibacillus pantothenticus. International Journal of Systematic Bacteriology 48, 99-106. Hugenholtz, P., Pitulle, C., Hershberger, K.L. & Pace, N.R. (1998).
Novel division level bacterial diversity in a Yellowstone hot spring. Journal of Bacteriology 180, 366-376. Hughes, M.S., Skuce, R.A., Beck, L.-A. & Neill, S.D. (1993). Identification of mycobacteria from animals by restriction enzyme analysis and direct DNA cycle sequencing of polymerase chain reaction-amplified 16S rRNA gene sequences. Journal of Clinical Microbiology 31,
3216-3222. Ibrahim, A. (1997). Amplification and restriction endonuclease digestion of a large fragment of genes coding for rRNA as a rapid method for discrimination of closely related pathogenic bacteria - Reply. Journal of Clinical Microbiology 35, 1646-1647. Ibrahim, A., Gerner-Smidt, P. & Sjöstedt, A. (1996). Amplification and restriction endonuclease digestion of a large fragment of genes coding for rRNA as a rapid method for discrimination of closely related pathogenic bacteria. Journal of Clinical Microbiology 34, 2894-2896. Ibrahim, A., Gerner-Smidt, P. & Liesack, W. (1997). Phylogenetic relationship of the twenty-one DNA groups of the genus Acinetobacter as revealed by 16S ribosomal DNA sequence analysis. International Journal of Systematic Bacteriology 47, 837-841. Ingianni, A., Petruzzelli, S., Morandotti, G. & Pompei, R. (1997). Genotypic differentiation of Gardnerella vaginalis by amplified ribosomal DNA restriction analysis (ARDRA). FEMS Immunology and Medical Microbiology 18, 61-66. Itoh, T., Suzuki, K. & Nakase, T. (1998). Occurrence of introns in the 16S rRNA genes of members of the genus Thermoproteus. Archives of Microbiology 170, 155-161. Jawad, A., Snelling, A.M., Heritage, J. & Hawkey, P.M. (1998). Comparison of ARDRA and recA-RFLP analysis for genomic species identification of Acinetobacter spp. FEMS Microbiology Letters 165, 357-362. Jayarao, B.M., Doré, J.J. & Oliver, S.P. (1992). Restriction fragment length polymorphism analysis of 16S ribosomal DNA of Streptococcus and Enterococcus species of bovine origin. Journal of Clinical Microbiology 30, 2235-2240. Keel, C., Weller, D.M., Natsch, A., Defago, G., Cook, R.J. & Thomashow, L.S. (1996). Conservation of the 2,4-diacetylphloroglucinol biosynthesis locus among fluorescent Pseudomonas strains from diverse geographic locations. Applied and Environmental Microbiology 62, 552-563. Khbaya, B., Neyra, M., Normand, P., Zerhari, K. & Filali-Maltouf, A. (1998).
Genetic diversity and phylogeny of rhizobia that nodulate Acacia spp. in Morocco assessed by analysis of rRNA genes. Applied and Environmental Microbiology 64, 4912-4917. Koeleman, J.G.M., Stoof, J., Biesmans, D.J., Savelkoul, P.H.M. & Vandenbroucke-Grauls, C.M.J.E. (1998). Comparison of amplified rDNA restriction analysis, random amplified polymorphic DNA analysis, and amplified fragment length polymorphism fingerprinting for identification of Acinetobacter genomic species and typing of Acinetobacter baumannii. Journal of Clinical Microbiology 36, 2522-2529. Kostman, J.R., Edlind, T.D., Lipuma, J.J. & Stull, T.L. (1992). Molecular epidemiology of Pseudomonas cepacia determined by polymerase chain reaction ribotyping. Journal of Clinical Microbiology 30, 2084-2087. Laguerre, G., Allard, M.-R., Revoy, F. & Amarger, N. (1994a). Rapid identification of rhizobia by restriction fragment length polymorphism analysis of PCR-amplified 16S rRNA genes. Applied and Environmental Microbiology 60, 56-63. Laguerre, G., Rigottier-Gois, L. & Lemanceau, P. (1994b). Fluorescent Pseudomonas species categorized by using polymerase chain reaction (PCR)/restriction fragment analysis of 16S rDNA. Molecular Ecology 3, 479-487. LaMontagne, M.G., Davenport, G.J., Hou, L.H. & Dutta, S.K. (1998). Identification and analysis of PCB dechlorinating anaerobic enrichments by amplification: accuracy of community structure based on restriction analysis and partial sequencing of 16S rRNA genes. Journal of Applied Microbiology 84, 1156-1162. Lee, S.M., Choi, Y.J., Ryu, H.W., Kong, H.H. & Chung, D.I. (1997). Species identification and molecular characterization of Acanthamoeba isolated from contact lens paraphernalia. Korean Journal of Ophthalmology 11, 39-50. Lee, I.M., Gundersen-Rindal, D.E., Davis, R.E. & Bartoszyk, I.M. (1998). Revised classification scheme of phytoplasmas based on RFLP analysis of 16S rRNA and ribosomal protein gene sequences.
International Journal of Systematic Bacteriology 48, 1153-1169. Linton, D., Clewley, J.P., Burnens, A., Owen, R.J. & Stanley, J. (1994). An intervening sequence
(IVS) in the 16S rRNA gene of the eubacterium Helicobacter canis. Nucleic Acids Research 22, 1954-1958. Liveris, D., Varde, S., Iyer, R., Koenig, S., Bittker, S., Cooper, D., McKenna, D., Nowakowski, J., Nadelman, R.B., Wormser, G.P. & Schwartz, I. (1999). Genetic diversity of Borrelia burgdorferi in Lyme disease patients as determined by culture versus direct PCR with clinical specimens. Journal of Clinical Microbiology 37, 565-569. Liu, W.T., Marsh, T.L., Cheng, H. & Forney, L.J. (1997). Characterization of microbial diversity by determining terminal restriction fragment length polymorphisms of genes encoding 16S rRNA. Applied and Environmental Microbiology 63, 4516-4522. Lyamicheva, N., Heisler, L., Brow, M.A. & Olive, D.M. (1996). Analysis of bacterial genotypes, drug resistance loci, and p53 genes using cleavase fragment length polymorphism analysis. Biochemica 3, 33-34. Lyra, C., Hantula, J., Vainio, E., Rapala, J., Rouhiainen, L. & Sivonen, K. (1997). Characterization of cyanobacteria by SDS-PAGE of whole-cell proteins and PCR/RFLP of the 16S rRNA gene. Archives of Microbiology 168, 176-184. Manachini, P.L., Fortina, M.G., Levati, L. & Parini, C. (1998). Contribution to phenotypic and genotypic characterization of Bacillus licheniformis and description of new genomovars. Systematic and Applied Microbiology 21, 520-529. Manceau, C. & Horvais, A. (1997). Assessment of genetic diversity among strains of Pseudomonas syringae by PCR-restriction fragment length polymorphism analysis of rRNA operons with special emphasis on P. syringae pv. tomato. Applied and Environmental Microbiology 63, 498-505. Marsh, T.L., Liu, W.T., Forney, L.J. & Cheng, H. (1998). Beginning a molecular analysis of the eukaryal community in activated sludge. Water Science and Technology 37, 455-460. Marshall, S.M., Melito, P.L., Woodward, D.L., Johnson, W.M., Rodgers, F.G. & Mulvey, M.R. (1999).
Rapid identification of Campylobacter, Arcobacter, and Helicobacter isolates by PCR-restriction fragment length polymorphism analysis of the 16S rRNA gene. Journal of Clinical Microbiology 37, 4158-4160. Martin, F., Vairelles, D. & Henrion, B. (1993). Automated ribosomal DNA-fingerprinting by capillary electrophoresis of PCR products. Analytical Biochemistry 214, 182-189. Martinetti Lucchini, G. & Altwegg, M. (1992). rRNA gene restriction patterns as taxonomic tools for the genus Aeromonas. International Journal of Systematic Bacteriology 42, 384-389. Mas-Castella, J., Guerrero, R. & De Jonckheere, J.F. (1996). High degree of similarity between Chromatium vinosum and Chromatium minutissimum as revealed by riboprinting. International Journal of Systematic Bacteriology 46, 922-925. Massol-Deya, A., Weller, R., Rios-Hernandez, L., Zhou, J.Z., Hickey, R.F. & Tiedje, J.M. (1997). Succession and convergence of biofilm communities in fixed-film reactors treating aromatic hydrocarbons in groundwater. Applied and Environmental Microbiology 63, 270-276. Matar, G.M., Swaminathan, B., Hunter, S.B., Slater, L.N. & Welch, D.F. (1993). Polymerase chain reaction-based restriction fragment length polymorphism analysis of a fragment of the ribosomal operon from Rochalimaea species for subtyping. Journal of Clinical Microbiology 31, 1730-1734. Matar, G.M., Koehler, J.E., Malcolm, G., Lambert-Fair, M.A., Tappero, J., Hunter, S.B. & Swaminathan, B. (1999). Identification of Bartonella species directly in clinical specimens by PCR-restriction fragment length polymorphism analysis of a 16S rRNA gene fragment. Journal of Clinical Microbiology 37, 4045-4047. McLaughlin, G.L., Howe, D.K., Biggs, D.R., Smith, A.R., Ludwinski, P., Fox, B.C., Tripathy, D.N., Frasch, C.E., Wenger, J.D., Carey, R.B., Hassan-King, M. & Vodkin, M.H. (1993). Amplification of rDNA loci to detect and type Neisseria meningitidis and other eubacteria. Molecular and Cellular Probes 7, 7-17.
Meijer, A., Kwakkel, G.J., de Vries, A., Schouls, L.M. & Ossewaarde, J.M. (1997). Species identification of Chlamydia isolates by analyzing restriction fragment length polymorphism of the
16S-23S rRNA spacer region. Journal of Clinical Microbiology 35, 1179-1183. Milsom, S.E., Sprague, S.V., Dymock, D., Weightman, A.J. & Wade, W.G. (1996). Rapid differentiation of Prevotella intermedia and P. nigrescens by 16S rDNA PCR-RFLP. Journal of Medical Microbiology 44, 41-43. Mora, D., Fortina, M.G., Nicastro, G., Parini, C. & Manachini, P.L. (1998). Genotypic characterization of thermophilic bacilli: a study on new soil isolates and several reference strains. Research in Microbiology 149, 711-722. Moyer, C.L., Tiedje, J.M., Dobbs, F.C. & Karl, D.M. (1996). A computer-simulated restriction fragment length polymorphism analysis of bacterial small-subunit rRNA genes: efficacy of selected tetrameric restriction enzymes for studies of microbial diversity in nature. Applied and Environmental Microbiology 62, 2501-2507. Muyzer, G., de Waal, E.C. & Uitterlinden, A.G. (1993). Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Applied and Environmental Microbiology 59, 695-700. Mylvaganam, S. & Dennis, P.P. (1992). Sequence heterogeneity between the two genes encoding 16S rRNA from the halophilic archaebacterium Haloarcula marismortui. Genetics 130, 399-410. Nakamura, L.K. & Swezey, J. (1983). Deoxyribonucleic acid relatedness of Bacillus circulans Jordan 1890 strains. International Journal of Systematic Bacteriology 33, 703-708. Navarro, E., Simonet, P., Normand, P. & Bardin, R. (1992). Characterization of natural populations of Nitrobacter spp. using PCR/RFLP analysis of the ribosomal intergenic spacer. Archives of Microbiology 157, 107-115. Neefs, J.-M., Van de Peer, Y., Hendriks, L. & De Wachter, R. (1990). Compilation of small ribosomal sub-unit RNA sequences. Nucleic Acids Research 18, r2237-r2317. Nei, M. & Li, W.-H. (1979). Mathematical model for studying genetic variation in terms of restriction endonucleases.
Proceedings of the National Academy of Sciences of the United States of America 76, 5269-5273. Nesme, X., Vaneechoutte, M., Orso, S., Hoste, B. & Swings, J. (1995). Diversity and genetic relatedness within genera Xanthomonas and Stenotrophomonas using restriction endonuclease site differences of PCR-amplified 16S rRNA gene. Systematic and Applied Microbiology 18, 127-135. Nowak, A. & Kur, J. (1995). Genomic species typing of acinetobacters by polymerase chain reaction amplification of the recA gene. FEMS Microbiology Letters 130, 327-332. Nowak, A. & Kur, J. (1996). Differentiation of seventeen genospecies of Acinetobacter by multiplex polymerase chain reaction and restriction fragment length polymorphism analysis. Molecular and Cellular Probes 10, 405-411. Nowak, A., Burkiewicz, A. & Kur, J. (1995). PCR differentiation of seventeen genospecies of Acinetobacter. FEMS Microbiology Letters 126, 181-188. Nübel, U., Engelen, B., Felske, A., Snaidr, J., Wieshuber, A., Amann, R.I., Ludwig, W. & Backhaus, H. (1996). Sequence heterogeneities of genes encoding 16S rRNAs in Paenibacillus polymyxa detected by temperature gradient gel electrophoresis. Journal of Bacteriology 178, 5636-5643. Nusslein, K. & Tiedje, J.M. (1998). Characterization of the dominant and rare members of a young Hawaiian soil bacterial community with small-subunit ribosomal DNA amplified from DNA fractionated on the basis of its guanine and cytosine composition. Applied and Environmental Microbiology 64, 1283-1289. Nuswantara, S., Fujie, M., Sukiman, H.I., Yamashita, M., Yamada, T. & Murooka, Y. (1997). Phylogeny of bacterial symbionts of the leguminous tree Acacia mangium. Journal of Fermentation and Bioengineering 84, 511-518. Ohara-Nemoto, Y., Tajika, S., Sasaki, M. & Kaneko, M. (1997). Identification of Abiotrophia adiacens and Abiotrophia defectiva by 16S rRNA gene PCR and restriction fragment length polymorphism analysis. Journal of Clinical Microbiology 35, 2458-2463. Orui, Y. (1998).
Identification of Japanese species of the genus Meloidogyne (Nematoda: Meloidogynidae) by PCR-RFLP analysis. Applied Entomology and Zoology 33, 43-51.
Ovreas, L. & Torsvik, V. (1998). Microbial diversity and community structure in two different agricultural soil communities. Microbial Ecology 36, 303-315. Pettersson, B., Lembke, F., Hammer, P., Stackebrandt, E. & Priest, F.G. (1996). Bacillus sporothermodurans, a new species producing highly heat-resistant endospores. International Journal of Systematic Bacteriology 46, 759-764. Pettersson, B., Rippere, K.E., Yousten, A.A. & Priest, F.G. (1999). Transfer of Bacillus lentimorbus and Bacillus popilliae to the genus Paenibacillus with emended descriptions of Paenibacillus lentimorbus comb. nov. and Paenibacillus popilliae comb. nov. International Journal of Systematic Bacteriology 49, 531-540. Plikaytis, B.B., Plikaytis, B.D., Yakrus, M.A., Butler, W.R., Woodley, C.L., Silcox, V.A. & Shinnick, T.M. (1992). Differentiation of slowly growing Mycobacterium species, including Mycobacterium tuberculosis, by gene amplification and restriction fragment length polymorphism analysis. Journal of Clinical Microbiology 30, 1815-1822. Princic, A., Mahne, I., Megusar, F., Paul, E.A. & Tiedje, J.M. (1998). Effects of pH and oxygen and ammonium concentrations on the community structure of nitrifying bacteria from wastewater. Applied and Environmental Microbiology 64, 3584-3590. Pukall, R., Brambilla, E. & Stackebrandt, E. (1998). Automated fragment length analysis of fluorescently-labeled 16S rDNA after digestion with 4-base cutting restriction enzymes. Journal of Medical Microbiology 32, 55-63. Rainey, F.A., Ward-Rainey, N.L., Janssen, P.H., Hippe, H. & Stackebrandt, E. (1996). Clostridium paradoxum DSM 7308T contains multiple 16S rRNA genes with heterogeneous intervening sequences. Microbiology 142, 2087-2095. Ralph, D., McClelland, M., Welsh, J., Baranton, G. & Perolat, P. (1993). Leptospira species categorized by arbitrarily primed polymerase chain reaction (PCR) and by mapped restriction polymorphisms in PCR-amplified rRNA genes. Journal of Bacteriology 175, 973-981.
Rath, J., Wu, K.Y., Herndl, G.J. & DeLong, E.F. (1998). High phylogenetic diversity in a marine-snow-associated bacterial assemblage. Aquatic Microbial Ecology 14, 261-269. Reisfeld, A., Rosenberg, E. & Gutnick, D. (1972). Microbial degradation of crude oil: factors affecting the dispersion in sea water by mixed and pure cultures. Applied Microbiology 24, 363-368. Riedel, K.H.J., Wingfield, B.D. & Britz, T.J. (1998). Identification of classical Propionibacterium species using 16S rDNA-restriction fragment length polymorphisms. Systematic and Applied Microbiology 21, 419-428. Roeijmans, H.J., De Hoog, G.S., Tan, C.S. & Figge, M.J. (1997). Molecular taxonomy and GC/MS of metabolites of Scytalidium hyalinum and Nattrassia mangiferae (Hendersonula toruloidea). Journal of Medical and Veterinary Mycology 35, 181-188. Rome, S., Brunel, B., Normand, P., Fernandez, M. & Cleyet-Marel, J.C. (1996). Evidence that two genomic species of Rhizobium are associated with Medicago truncatula. Archives of Microbiology 165, 285-288. Roth, A., Reischl, U., Streubel, A., Naumann, L., Kroppenstedt, R.M., Habicht, M., Fischer, M. & Mauch, H. (2000). Novel diagnostic algorithm for identification of mycobacteria using genus-specific amplification of the 16S-23S rRNA gene spacer and restriction endonucleases. Journal of Clinical Microbiology 38, 1094-1104. Salzano, G., Moschetti, G., Villani, F., Pepe, O., Mauriello, G. & Coppola, S. (1994). Genotyping of Streptococcus thermophilus evidenced by restriction analysis of ribosomal DNA. Research in Microbiology 145, 651-658. Sato, T., Matsuyama, J., Sato, M. & Hoshino, E. (1997). Differentiation of Veillonella atypica, Veillonella dispar and Veillonella parvula using restricted fragment-length polymorphism analysis of 16S rDNA amplified by polymerase chain reaction. Oral Microbiology and Immunology 12, 350-353. Sato, T., Matsuyama, J., Takahashi, N., Sato, M., Johnson, J., Schachtele, C. & Hoshino, E. (1998a).
Differentiation of oral Actinomyces species by 16S ribosomal DNA polymerase chain reaction-restriction fragment length polymorphism. Archives of Oral Biology 43, 247-252.
Sato, T., Sato, M., Matsuyama, J., Kalfas, S., Sundqvist, G. & Hoshino, E. (1998b). Restriction fragment-length polymorphism analysis of 16S rDNA from oral asaccharolytic Eubacterium species amplified by polymerase chain reaction. Oral Microbiology and Immunology 13, 23-29. Schmidt, T.M. (1998). Multiplicity of ribosomal RNA operons in prokaryotic genomes. In Bacterial genomes: physical structure and analysis, de Bruijn, F.J., Lupski, J.R. & Weinstock, G.M., eds, pp. 221-229. Chapman & Hall, New York. Seifert, H., Dijkshoorn, L., Gerner-Smidt, P., Pelzer, N., Tjernberg, I. & Vaneechoutte, M. (1997). Distribution of Acinetobacter species on human skin: comparison of phenotypic and genotypic identification methods. Journal of Clinical Microbiology 35, 2819-2825. Selenska-Pobell, S., Otto, A. & Kutschke, S. (1998). Identification and discrimination of thiobacilli using ARDREA, RAPD and rep-APD. Journal of Applied Microbiology 84, 1085-1091. Sevin, E., Lamarque, D., Delchier, J.C., Soussy, C.J. & Tankovic, J. (1998). Co-detection of Helicobacter pylori and of its resistance to clarithromycin by PCR. FEMS Microbiology Letters 165, 369-372. Shida, O., Takagi, H., Kadowaki, K. & Komagata, K. (1996). Proposal for two new genera, Brevibacillus gen. nov. and Aneurinibacillus gen. nov. International Journal of Systematic Bacteriology 46, 939-946. Shida, O., Takagi, H., Kadowaki, K., Nakamura, L.K. & Komagata, K. (1997). Transfer of Bacillus alginolyticus, Bacillus chondroitinus, Bacillus curdlanolyticus, Bacillus glucanolyticus, Bacillus kobensis, and Bacillus thiaminolyticus to the genus Paenibacillus and emended description of the genus Paenibacillus. International Journal of Systematic Bacteriology 47, 289-298. Sironi, M., Bandi, C., Novati, S. & Scaglia, M. (1997). A PCR-RFLP method for the detection and species identification of human microsporidia. Parassitologia 39, 437-439. Smit, E., Leeflang, P. & Wernars, K. (1997).
Detection of shifts in microbial community structure and diversity in soil caused by copper contamination using amplified ribosomal DNA restriction analysis. FEMS Microbiology Ecology 23, 249-261. Smit, E., Leeflang, P., Glansdorf, B., van Elsas, J.D. & Wernars, K. (1999). Analysis of fungal diversity in the wheat rhizosphere by sequencing of cloned PCR-amplified genes encoding 18S rRNA and temperature gradient gel electrophoresis. Applied and Environmental Microbiology 65, 2614-2621. Smith, J.K., Parry, J.D., Day, J.G. & Smith, R.J. (1998). A PCR technique based on the HIP1 interspersed repetitive sequence distinguishes cyanobacterial species and strains. Microbiology 144, 2791-2801. Smith-Vaughan, H.C., Sriprakash, K.S., Mathews, J.D. & Kemp, D.J. (1995). Long PCR-ribotyping of nontypeable Haemophilus influenzae. Journal of Clinical Microbiology 33, 1192-1195. Sneath, P.H.A. (1993). Evidence from Aeromonas for genetic crossing-over in ribosomal sequences. International Journal of Systematic Bacteriology 43, 626-629. Sreevatsan, S., Bookout, J.B., Ringpis, F.M., Mogazeh, S.L., Kreiswirth, B.N., Pottathil, R.R. & Raj, R. (1998). Comparative evaluation of cleavase fragment length polymorphism with PCR-SSCP and PCR-RFLP to detect antimicrobial agent resistance in Mycobacterium tuberculosis. Molecular Diagnosis 3, 81-91. Stackebrandt, E. & Goebel, B.M. (1994). Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. International Journal of Systematic Bacteriology 44, 846-849. Staley, J.T. (1999). Bacterial biodiversity: a time for place. ASM News 10, 681-687. Steingrube, V.A., Brown, B.A., Gibson, J.L., Wilson, R.W., Brown, J., Blacklock, Z., Jost, K., Locke, S., Ulrich, R.F. & Wallace, R.J. (1995a). DNA amplification and restriction endonuclease analysis for differentiation of 12 species and taxa of Nocardia, including recognition of four new taxa within the Nocardia asteroides complex.
Journal of Clinical Microbiology 33, 3096-3101. Steingrube, V.A., Gibson, J.L., Brown, B.A., Zhang, Y., Wilson, R.W., Rajagopalan, M. & Wallace, R.J. (1995b). PCR amplification and restriction endonuclease analysis of a 65-Kilodalton heat shock protein gene sequence for taxonomic separation of rapidly growing mycobacteria. Journal of Clinical Microbiology 33, 149-153. Stewart, G.C., Wilson, F.E. & Bott, K.F. (1982). Detailed physical mapping of the ribosomal RNA genes of Bacillus subtilis. Gene 19, 153-162. Stothard, J.R., Frame, I.A., Carrasco, H.J. & Miles, M.A. (1998). On the molecular taxonomy of Trypanosoma cruzi using riboprinting. Parasitology 117, 243-247. Telenti, A., Marchesi, F., Balz, M., Bally, F., Böttger, E.C. & Bodmer, T. (1993). Rapid identification of mycobacteria to the species level by polymerase chain reaction and restriction enzyme analysis. Journal of Clinical Microbiology 31, 175-178. Terefework, Z., Nick, G., Suomalainen, S., Paulin, L. & Lindstrom, K. (1998). Phylogeny of Rhizobium galegae with respect to other rhizobia and agrobacteria. International Journal of Systematic Bacteriology 48, 349-356. Torsvik, V., Daae, F.L., Sandaa, R.A. & Ovreas, L. (1998). Novel techniques for analysing microbial diversity in natural and perturbed environments. Journal of Biotechnology 64, 53-62. Tötsch, M., Brömmelkamp, E., Stücker, A., Fille, M., Gross, R., Wiesner, P., Werner Schmid, K., Böcker, W. & Dockhorn-Dworniczak, B. (1996). Identification of mycobacteria to the species level by automated restriction enzyme fragment length polymorphism analysis. Virchows Archives 298, 1-5. Urakawa, H., Kita-Tsukamoto, K. & Ohwada, K. (1997). 16S rDNA genotyping using PCR/RFLP (restriction fragment length polymorphism) analysis among the family Vibrionaceae. FEMS Microbiology Letters 152, 125-132. Urakawa, H., Kita-Tsukamoto, K. & Ohwada, K. (1998). A new approach to separate the genus Photobacterium from Vibrio with RFLP patterns by HhaI digestion of PCR-amplified 16S rDNA. Current Microbiology 36, 171-174. Vandamme, P., Heyndrickx, M., Vancanneyt, M., Hoste, B., De Vos, P., Falsen, E., Kersters, K. & Hinz, K.H. (1996). Bordetella trematum sp. nov., isolated from wounds and ear infections in humans, and reassessment of Alcaligenes denitrificans Rüger and Tan 1983.
International Journal of Systematic Bacteriology 46, 849-858. Vandamme, P., Heyndrickx, M., De Roose, I., Lammens, C., De Vos, P. & Kersters, K. (1997). Characterization of Bordetella strains and related bacteria by amplified ribosomal DNA restriction analysis and randomly and repetitive element-primed PCR. International Journal of Systematic Bacteriology 47, 802-807. Vandamme, P., Goris, J., Coenye, T., Hoste, B., Janssens, D., Kersters, K., De Vos, P. & Falsen, E. (1999). Assignment of Centers of Disease Control group IVc-2 to the genus Ralstonia as Ralstonia paucula sp. nov. International Journal of Systematic Bacteriology 49, 663-669. Vaneechoutte, M. (1996). DNA fingerprinting techniques for microorganisms. A proposal for classification and nomenclature. Molecular Biotechnology 6, 115-142. Vaneechoutte, M., Rossau, R., De Vos, P., Gillis, M., Janssens, D., Paepe, N., De Rouck, A., Fiers, T., Claeys, G. & Kersters, K. (1992). Rapid identification of bacteria of the Comamonadaceae with amplified ribosomal DNA-restriction analysis (ARDRA). FEMS Microbiology Letters 93, 227-234. Vaneechoutte, M., de Beenhouwer, H., Claeys, G., Verschraegen, G., De Rouck, A., Paepe, N., Elaichouni, A. & Portaels, F. (1993). Identification of Mycobacterium species by using amplified rDNA-restriction analysis. Journal of Clinical Microbiology 31, 2061-2065. Vaneechoutte, M., Dijkshoorn, L., Tjernberg, I., Elaichouni, A., De Vos, P., Claeys, G. & Verschraegen, G. (1995a). Identification of Acinetobacter genomic species by amplified ribosomal DNA restriction analysis. Journal of Clinical Microbiology 33, 11-15. Vaneechoutte, M., Riegel, P., de Briel, D., Monteil, H., Verschraegen, G., De Rouck, A. & Claeys, G. (1995b). Evaluation of the applicability of amplified rDNA-restriction analysis to identification of species of the genus Corynebacterium. Research in Microbiology 146, 633-641. Vaneechoutte, M., Cartwright, C.P., Williams, E.C., Jäger, B., Tichy, H.-V., De Baere, T., De Rouck, A.
& Verschraegen, G. (1996). Evaluation of 16S rRNA gene restriction analysis for the identification of cultured organisms of clinically important Clostridium species. Anaerobe 2, 249-256.
Vaneechoutte, M., Boerlin, P., Tichy, H.-V., Bannerman, E., Jäger, B. & Bille, J. (1998a). Comparison of the value of DNA-fingerprinting techniques for the identification and taxonomical classification of Listeria species. International Journal of Systematic Bacteriology 48, 127-139. Vaneechoutte, M., De Bleser, D., Claeys, G., Verschraegen, G., De Baere, T., Hommez, J., Devriese, L.A. & Riegel, P. (1998b). Cardioverter-lead electrode infection due to Corynebacterium amycolatum. Clinical Infectious Diseases 21, 1553-1554. Vaneechoutte, M., Tjernberg, I., Baldi, F., Pepi, M., Fani, R., Sullivan, E.R., van der Toorn, J. & Dijkshoorn, L. (1999a). The oil-degrading Acinetobacter strain RAG-1 and the strains described as 'Acinetobacter venetianus sp. nov.' belong to the same genomic species. Research in Microbiology 150, 69-73. Vaneechoutte, M., Vauterin, L., van Harsselaar, B., Dijkshoorn, L. & De Vos, P. (1999b). Considerations in evaluation of the applicability of DNA fingerprinting techniques for species differentiation. Journal of Clinical Microbiology 37, 3428-3429. Vidigal, T.H.D.A., Spatz, L., Nunes, D.N., Simpson, A.J.G., Carvalho, O.S. & Neto, E.D. (1998). Biomphalaria spp: Identification of the intermediate snail hosts of Schistosoma mansoni by polymerase chain reaction amplification and restriction enzyme digestion of the ribosomal RNA gene intergenic spacer. Journal of Experimental Parasitology 89, 180-187. Vilgalys, R. & Hester, M. (1990). Rapid genetic identification and mapping of enzymatically amplified ribosomal DNA from several Cryptococcus species. Journal of Bacteriology 172, 4238-4246. Vodkin, M.H., Howe, D.K., Visvesvara, G.S. & McLaughlin, G.L. (1992). Identification of Acanthamoeba at the generic and specific levels using the polymerase chain reaction. Journal of Protozoology 39, 378-385. Wainø, M., Tindall, B.J., Schumann, P. & Ingvorsen, K. (1999). Gracilibacillus gen. nov., with description of Gracilibacillus halotolerans gen. nov., sp.
nov.; transfer of Bacillus dipsosauri to Gracilibacillus dipsosauri comb. nov., and Bacillus salexigens to the genus Salibacillus gen. nov., as Salibacillus salexigens comb. nov. International Journal of Systematic Bacteriology 49, 821-831. Wang, G., van Dam, A.P., Le Fleche, A., Postic, D., Peter, O., Baranton, G., de Boer, R., Spanjaard, L. & Dankert, J. (1997a). Genetic and phenotypic analysis of Borrelia valaisiana sp. nov. (Borrelia genomic groups VSll6 and M19). International Journal of Systematic Bacteriology 47, 926-932. Wang, Y., Zhang, Z. & Narendrakumar, R. (1997b). The actinomycete Thermobispora bispora contains two distinct types of transcriptionally active 16S rRNA genes. Journal of Bacteriology 179, 3270-3276. Weidner, S., Arnold, W. & Puhler, A. (1996). Diversity of uncultured microorganisms associated with the seagrass Halophila stipulacea estimated by restriction fragment length polymorphism analysis of PCR-amplified 16S rRNA genes. Applied and Environmental Microbiology 62, 766-771. Widjojoatmodjo, M.N., Fluit, A.C. & Verhoef, J. (1995). Molecular identification of bacteria by fluorescence-based PCR-single-strand conformation polymorphism analysis of the 16S rRNA gene. Journal of Clinical Microbiology 33, 2601-2606. Wilson, M.J., Wade, W.G. & Weightman, A.J. (1995). Restriction fragment length polymorphism analysis of PCR-amplified 16S ribosomal DNA of human Capnocytophaga. Journal of Applied Bacteriology 78, 394-401. Wittenbrink, M.M., Reuter, C., Baumeister, K., Schutze, H. & Krauss, H. (1998). Identification of group VS 116 strains among Borrelia burgdorferi sensu lato grown from the hard tick, lxodes ricinus (Linnaeus, 1758) by PCR-coupled restriction fragment length polymorphism analysis. Zentralblatt fur Bakteriologie 288, 45-57. Woese, C. (1987). Bacterial evolution. Microbiological Reviews 51, 221-271. Woo, T.H.S., Patel, B.K.C., Smythe, L.D., Symonds, M.L., Norris, M.A. & Dohnt, M.E (1997). 
Comparison of two PCR methods for rapid identification of Leptospira genospecies interro-
247
gans. FEMS Microbiology Letters 155, 169-177. Wood, J., Scott, K.E, Avgustin, G., Newbold, C.J. & Flint, H.J. (1998). Estimation of the relative abundance of different Bacteroides and Prevotella ribotypes in gut samples by restriction enzyme profiling of PCR-amplified 16S rRNA gene sequences. Applied and Environmental Microbiology 64, 3683-3689. Yamamoto, S. & Harayama, S. (1996). Phylogenetic analysis of Acinetobacter strains based on the nucleotide sequence of gyrB genes and on the amino acid sequences of their products. International Journal of Systematic Bacteriology 46, 506-511. Yoon, J.-H., Lee, S.T., Kim, S.-B., Kim, W.Y., Goodfellow, M. & Park, Y.-H. (1997). Restriction fragment length polymorphism analysis of PCR-amplified 16S ribosomal DNA for rapid identification of Saccharomonospora strains. Journal of Clinical Microbiology 47, 111-114. Zhou, J., Davey, M.E., Figueras, J.B., Rivkina, E., Gilichinsky, D. & Tiedje, J.M. (1997). Phylogenetic diversity of a bacterial community determined from Siberian tundra soil DNA. Microbiology 143, 3913-3919.
10 Insertion Sequence (IS) Typing and Oligotyping
Nicholas A. Saunders
Molecular Biology Unit, Hepatitis and Retrovirus Laboratory, Central Public Health Laboratory, London, UK
CONTENTS
10.1 GENERAL INTRODUCTION
10.2 IS TYPING
  A. Introduction
  B. Methodological approaches to IS typing
    (i) Southern blotting
    (ii) PCR-based methods
  C. Examples of IS typing methods
    (i) IS200 typing of Salmonella enterica by Southern blotting
    (ii) IS6110 typing of Mycobacterium tuberculosis by Southern blotting
    (iii) IS6110 typing of M. tuberculosis by linker-mediated PCR
    (iv) IS6110 typing of M. tuberculosis by inverse PCR
  D. Analysis of IS typing patterns
  E. IS typing: future perspectives and conclusions
10.3 OLIGOTYPING
  A. Introduction
  B. Approaches to oligotyping
  C. Examples of oligotyping systems
    (i) Spoligotyping for M. tuberculosis
    (ii) Streptococcus pyogenes emm gene typing
  D. Oligotyping: future perspectives and conclusions
REFERENCES
10.1 GENERAL INTRODUCTION
Insertion sequence typing (IS typing) and oligotyping are two distinct genotyping approaches, each capable of detecting variations in the chromosomal DNA sequences of different strains of a bacterial species. IS typing relies upon analysis of the chromosomal milieu of different copies of a recurring sequence. Its discriminatory ability depends upon the properties of the insertion sequences targeted. Oligotyping uses hybridisation with arrays of oligonucleotide probes to determine whether particular nucleotide sequences are present within the chromosome. These two methods employ quite different formats, but recent developments in both IS typing and oligotyping rely heavily upon one technique, the polymerase chain reaction. Some of these developments are reviewed below.
10.2 IS TYPING
A. Introduction
Insertion sequences (IS) are mobile DNA elements that are capable of transposition between different sites within the bacterial genome, where they are maintained and replicated. They vary in size (between approximately 0.7 and 2.5 kb), organisation and behaviour, and are present at between one and several hundred copies per genome. Typically they have at least one open reading frame (ORF) encoding a protein that can be identified as a transposase. Other genes may be present and in some cases these are known to be involved in the regulation of transposition. Generally, IS carry inverted repeat sequences at their termini. IS from a wide range of bacterial species have been characterised (Stanley & Saunders, 1996) and data from genome sequencing suggest that they are common features of the bacterial chromosome. At least in the Enterobacteriaceae, IS are also associated with plasmids, where they generally occur at a higher frequency, per unit length of DNA, than on the chromosome. This means that plasmids may play a role in the dissemination of IS elements between strains and species. IS elements are also sometimes found as integral parts of composite transposons, in which two IS, in either orientation, flank a sequence that is in itself non-transposable. Many of these interstitial sequences carry antibiotic resistance determinants. However, analysis of transposons carried by plasmids isolated in the pre-antibiotic era failed to reveal antibiotic resistance genes (Datta & Hughes, 1983).
Insertion elements occasionally transpose to new loci by a replicative process. The rate of transposition varies depending upon the properties of the particular IS and the genetic background of the strain carrying it. Excision of IS (i.e., loss from a site) also occurs, but relatively infrequently (Egner & Berg, 1981). The number and chromosomal loci of IS are therefore relatively stable features of a strain and can be used for epidemiological typing.
In addition, IS profile data can be used to reconstruct the evolutionary history of a strain over a timescale appropriate to the particular IS and its host organism. The fitness of any IS for epidemiological typing depends upon the rate of transposition. Isolates of a species derived from the same source (i.e., the index case or a single environmental locus) should share a common IS profile, although minor variations do occur and can often be tolerated. Ideally though, isolates from separate sources should have distinguishable profiles resulting in an index of discriminatory ability (Hunter & Gaston, 1988) of as close to one as possible. The discriminatory power of IS typing lies primarily in differences between strains in the number and sites of chromosomal integration of these elements. Additional discrimination is contributed by differences in the nucleotide sequences of the DNA flanking a site of insertion. Thus, IS typing of two strains carrying identical numbers of IS elements at identical chromosomal loci may still discriminate between them on the basis of single base mutations affecting key restriction sites.
Fig. 10.1. Three different schemes for IS typing are illustrated in A-C. The shaded block represents the IS and the unshaded areas the flanking sequences. The dark area within the IS corresponds to the probe sequence and restriction sites are indicated by arrows. In A, a restriction endonuclease with no sites in the IS gives a single fragment hybridising to the probe. If the enzyme has a site within the IS (B & C), then the probe may either hybridise to both fragments, as shown in B, to give two bands per IS, or to just one of the fragments producing a single band of hybridisation (C).
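The index of discriminatory ability of Hunter & Gaston (1988) referred to above is Simpson's index of diversity applied to typing data: D = 1 - [1/(N(N-1))] Σ n_j(n_j - 1), where N is the number of unrelated strains tested and n_j is the number of strains assigned to the j-th type. The following Python sketch computes D for a strain collection; the function name and the example profile labels are illustrative assumptions and do not come from any of the studies cited.

```python
from collections import Counter

def discriminatory_index(type_assignments):
    """Hunter-Gaston index: probability that two strains drawn at random
    (without replacement) from the collection belong to different types."""
    n_total = len(type_assignments)
    if n_total < 2:
        raise ValueError("at least two strains are needed")
    counts = Counter(type_assignments)
    # Number of ordered pairs of strains that share a type.
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))

# Hypothetical data: ten unrelated isolates, two of which share IS profile P1.
profiles = ["P1", "P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9"]
print(round(discriminatory_index(profiles), 3))  # 0.978
```

A value of 1.0 is obtained only when every unrelated strain gives a distinct profile, which is the ideal described in the text.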
B. Methodological approaches to IS typing
(i) Southern blotting
The classical method of IS typing is to use a probe complementary to all or part of the IS to detect restriction fragments carrying the sequence on Southern blots. The restriction enzyme is chosen so that the average size of the fragments produced can be readily resolved on an agarose gel, since the process of blotting and hybridisation/detection results in some loss of resolution. One approach (Fig. 10.1) is to select a nuclease that does not cleave the IS. The molecular size of each fragment is then determined by the length of the sequences flanking each side of the IS. Using this approach nearby IS copies may be found on a single restriction fragment. If the sequence of the IS is available, it is then possible to simplify interpretation of the results by using a restriction endonuclease with a site within the IS, and a probe that hybridises to only one of the resulting fragments carrying the IS termini (Fig. 10.1). In this way each band on the blots results from a single IS copy (van Embden et al., 1993). The difference between the molecular sizes of the bands is therefore due solely to differences in the chromosomal sequence flanking only one end of each IS copy.
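The consequences of the two probing strategies can be explored with a simple in silico digest. The sketch below is an idealised model only: cut positions are treated as single coordinates on a linear map, every IS copy is assumed to have flanking restriction sites on both sides, and all coordinates, the element length and the internal cut position are hypothetical values chosen for illustration.

```python
from bisect import bisect_left, bisect_right

def bands_no_internal_site(flank_sites, is_copies, is_length):
    """Enzyme with no site in the IS: each band runs from the nearest site
    upstream of a copy to the nearest site downstream of it."""
    sites = sorted(flank_sites)
    fragments = set()
    for start, _strand in is_copies:
        left = sites[bisect_right(sites, start) - 1]
        right = sites[bisect_left(sites, start + is_length)]
        fragments.add((left, right))  # nearby copies can share one fragment
    return sorted(right - left for left, right in fragments)

def bands_single_end(flank_sites, is_copies, is_length, internal_cut):
    """Enzyme cutting once in the IS (internal_cut bases from its 5' end),
    with a probe lying 3' of that cut: one band per IS copy."""
    cuts = [start + internal_cut if strand == "+"
            else start + is_length - internal_cut
            for start, strand in is_copies]
    sites = sorted(set(flank_sites) | set(cuts))
    sizes = []
    for cut, (_start, strand) in zip(cuts, is_copies):
        i = bisect_left(sites, cut)  # index of this copy's own cut site
        if strand == "+":
            sizes.append(sites[i + 1] - cut)  # probed fragment lies to the right
        else:
            sizes.append(cut - sites[i - 1])  # probed fragment lies to the left
    return sorted(sizes)

# Hypothetical map: flanking sites and three IS copies given as (start, strand).
flank_sites = [0, 3000, 9000, 20000]
copies = [(4000, "+"), (6000, "+"), (12000, "-")]
print(bands_no_internal_site(flank_sites, copies, is_length=1355))
print(bands_single_end(flank_sites, copies, is_length=1355, internal_cut=500))
```

In this example the first two copies lie on the same restriction fragment, so the enzyme without an internal site yields only two bands for three copies, whereas the single-end strategy yields three bands, one per copy.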
(ii) PCR-based methods
Although Southern blotting has proved to be a reliable and accurate method of IS typing, it does have disadvantages. Microgram quantities of relatively pure and undegraded DNA are required, and the multi-step procedure is time-consuming with a significant potential for errors. In contrast, the PCR-based IS typing methods require less DNA and the quality of the DNA is less critical. Furthermore, the PCR procedure is usually simple and completed rapidly.
Several uses of PCR for IS typing have been described. The first uses one primer 'facing' outward toward one IS terminus. The second primer can be complementary to any sequence repeated in the genome of interest, including the IS itself (Otal et al., 1997). In this way a PCR product is generated only when the IS is integrated close to one of the repeat sequences, which must also be in the 'correct' orientation. An alternative method is to again use a single outward facing IS primer, but to provide the second priming site by cutting the DNA sample with a restriction endonuclease, and to then ligate oligonucleotide linkers to the digested fragments (Haas et al., 1993; Palittapongarnpim et al., 1993). In this 'linker-mediated' PCR, only DNA fragments carrying both types of priming site (i.e., the linker and the IS) are amplified efficiently, and a series of PCR products is produced in which each type of PCR amplicon is derived from a different IS copy. The third approach, inverse PCR, also uses an outward facing primer within the IS. The second primer is also complementary to the IS and primes in the opposite direction. PCR amplification is only possible once the DNA has been cleaved, using an appropriate restriction endonuclease, and ligated at low concentration to form covalently closed DNA circles (Patel et al., 1996; Otal et al., 1997) in which the priming sites are in the correct orientation for PCR, separated by parts of the IS and some of the IS flanking sequence. The linker-mediated and inverse methods are discussed in more detail below.
C. Examples of IS typing methods
(i) IS200 typing of Salmonella enterica by Southern blotting
IS200 was first described in S. enterica serovar Typhimurium strain LT2, and was originally reported to be limited to Salmonella (Lam & Roth, 1983). Now, IS200-like elements with divergent sequences have been found in Escherichia coli (Bisercic & Ochman, 1993), Yersinia pestis, Y. pseudotuberculosis and Y. enterocolitica (Odaert et al., 1996; Simonet et al., 1996). The degree of sequence divergence between the IS200-like elements found in the different genera suggests that they may have co-evolved in situ within the chromosomes of their hosts. The presence of IS200 in different species cannot, therefore, be taken as evidence of lateral transfer of the element. It has been suggested that the lack of evidence for lateral transfer, and the infrequency of its association with plasmids, indicates that IS200 has little affinity for extrachromosomal elements (Stanley et al., 1993), and that it may therefore act as a marker to fingerprint the vertically inherited chromosome, independent of lateral transfer (Stanley & Saunders, 1996).
Analyses of IS200 sites of insertion in S. enterica serovars have been performed with restriction endonucleases that either do not have sites within the element (e.g., PstI, BanI and PvuII) or which cut it at a single site (e.g., EcoRI and EcoRV). An IS200 detection probe of 692 bp can be generated conveniently by PCR (Baquar et al., 1993). The probe hybridises to both fragments generated by EcoRI and by EcoRV so that the potential number of bands is doubled when these enzymes are used. This can result in a small increase in the number of different patterns obtained from a given strain collection, but also reduces the clarity of the results and somewhat complicates their analysis. The extra degree of discrimination seen after separate analysis of the sequences flanking both ends of each IS copy derives from the reduced possibility that an IS copy will be masked. When only one restriction fragment is generated for each IS copy, there is a possibility that band co-migration will result in masking of some elements, but if two fragments are analysed for each IS copy, this possibility is greatly reduced.
IS200 typing has usually been applied in a hierarchical manner to groups of strains that have already been grouped by serovar or serovar/phagetype. In this context the technique has given useful discrimination in epidemiological studies (Pelkonen et al., 1994). Generally, the level of discrimination achieved by IS200 typing is dependent upon the number of copies present in the chromosome. Serovars with five or more copies of the element are more likely to give useful epidemiological typing data (Stanley et al., 1994). IS200 typing patterns have generally been assessed and compared by eye since the number of different patterns is relatively limited.
Some attempts have been made to infer evolutionary relationships between Salmonella serovars from IS200 typing data (reviewed by Stanley & Saunders, 1996), but with limited success. It has been possible to reconstruct plausible lineages for closely related clones.
However, it is apparent that inferences about the relationships of strains belonging to different serovars or serogroups would require direct identification of the sites of insertion by sequencing, and would also need to take account of the possibility of IS200 deletions as well as insertion events.
(ii) IS6110 typing of Mycobacterium tuberculosis by Southern blotting
IS6110 is a 1355-bp element with classical terminal inverted repeats. Copies from three different strains belonging to the M. tuberculosis complex, one from M. bovis (IS987) and two from M. tuberculosis (IS6110 and IS986), have been sequenced and found to differ at only a few nucleotides (McAdam et al., 1990; Thierry et al., 1990; Hermans et al., 1991). IS6110 is present in the other species of the MTB-complex, which also includes M. microti and M. africanum. It is not present in more distantly related mycobacteria (Cave et al., 1991). The sequence has two ORFs, the larger of which encodes a putative transposase with pronounced (51%) amino acid sequence similarity to the corresponding ORF of IS3411, one of the enterobacterial IS3 family of elements (McAdam et al., 1990). The mechanism of transposition of IS6110 was shown to be orthodox by experiments in M. smegmatis using artificial composite transposons comprising two copies of IS986 flanking a kanamycin resistance cassette (Fomukong & Dale, 1993). Studies on the sites of insertion of IS6110 have now demonstrated that the element integrates at preferred chromosomal loci (Fang & Forbes, 1997; Patel, 1999), although not at specific points in the nucleotide sequence. Thus, six alternative sites of integration are reported for IS6110 at the ipl (IS preferential locus) sequence (Fang & Forbes, 1997). More work is required to describe the precise signals that allow IS6110 integration at preferred loci.
To ensure that the results of IS6110 typing in different laboratories are comparable, an international group has recommended a standardised technique (van Embden et al., 1993). PvuII, which has a single cleavage site in the element, is used to prepare fragments that are subjected to Southern blotting and hybridised to a 245-bp probe (Fig. 10.2). The probe hybridises only to fragments carrying the 3' end of IS6110. Thus, a single band on the IS6110 profile is derived from each copy of the IS, and each band is expected to be of equal intensity since the sequence probed is identical in each case. An example is shown in Fig. 10.3. This method has proved to be highly reproducible and the band profiles are stable when strains are passaged in guinea-pigs (Hermans et al., 1990), subcultivated on laboratory media (van Soolingen et al., 1991), or isolated from the same patient at intervals of up to 4.5 years (Cave et al., 1994). However, minor differences in the band profiles of strains isolated from epidemiologically related cases are often encountered. This indicates that transpositions of IS6110 occur frequently and accounts for the high diversity of the typing patterns observed.
Fig. 10.2. IS6110 typing using the method recommended by van Embden et al. (1993). Scheme C in Fig. 10.1 is used to produce a single band for each copy of the IS via digestion with PvuII. The IS (medium shading) component of the fragments is constant, but the flanking sequences (unshaded) vary in length. The gel track diagram (right) shows the expected profile for a strain carrying four copies of the element in the positions shown relative to flanking restriction sites.
The IS6110 copy number of M. tuberculosis strains follows a bimodal distribution, with peaks of either one or 10-12 elements. IS6110 typing of the majority of strains with five or more copies of the element is highly discriminatory, and consequently the method has been used in population-based studies.
All available strains isolated within a population are analysed and the patterns compared (e.g., Chevrel-Dellagi et al., 1993; Alland et al., 1994; Small et al., 1994). Strains with indistinguishable patterns of five or more bands are generally assumed to be epidemiologically linked. As patterns differing by a single band are often found in groups of strains with proven epidemiological associations, these are also considered as being linked. Analysis of IS6110 typing data on the basis of these assumptions has been used to estimate the rate of active transmission of tuberculosis within different populations (Alland et al., 1994; Small et al., 1994). The number of strains with unique patterns is used as a measure of the rate of reactivation of old infections, while clusters are assumed to have resulted from new infections. Recently, clusters identified by IS6110 typing have also been used to assess the transmission of M. tuberculosis from patients smear-negative for acid-fast bacilli (Behr et al., 1999).
Fig. 10.3. A Southern blot of IS6110 restriction fragments prepared using the method recommended by van Embden et al. (1993). The end tracks (1 and 21) and tracks 6, 11 and 16 are standards with bands from 21 kb (top) to 0.8 kb (bottom). The remaining 16 tracks show banding profiles of 16 distinct isolates. The profiles in tracks 9 and 20, which differ by a single band, were from epidemiologically related cases.
The discriminatory power of IS6110 typing for strains carrying four or fewer copies of the element is not sufficient to allow linkage of strains in the absence of other data. However, for outbreak investigation, the method can give strong confirmatory evidence that strains are from a common source even when they only have one copy of the element.
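The linkage rule used in the population-based studies described above (patterns of five or more bands that are indistinguishable, or that differ by a single band, are treated as linked) can be written down as a simple clustering procedure. The Python sketch below is one possible reading of that rule, not the procedure of any cited study: each profile is represented as a set of normalised band sizes, "differing by a single band" is taken to mean one band present in one profile and absent from the other, and the isolate names and band sizes are hypothetical.

```python
from itertools import combinations

def linked(bands_a, bands_b, min_bands=5, max_diff=1):
    """Apply the linkage rule to two profiles given as sets of band sizes."""
    if min(len(bands_a), len(bands_b)) < min_bands:
        return False  # low copy number profiles are not linked on pattern alone
    return len(bands_a ^ bands_b) <= max_diff

def cluster(profiles):
    """Group isolates into clusters of directly or indirectly linked profiles."""
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if linked(profiles[a], profiles[b]):
            parent[find(a)] = find(b)
    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical profiles (band sizes in bp after normalisation).
profiles = {
    "A": frozenset({1200, 1900, 2400, 3100, 4000, 5500}),
    "B": frozenset({1200, 1900, 2400, 3100, 4000, 5500}),
    "C": frozenset({1200, 1900, 2400, 3100, 4000}),
    "D": frozenset({1000, 1700, 2900, 3300, 4800, 6100}),
    "E": frozenset({1500}),
    "F": frozenset({1500}),
}
print(cluster(profiles))
```

Isolates A, B and C form one cluster, D is unique, and the single-copy isolates E and F are left unlinked even though their profiles are identical, which reflects the limited discriminatory power at low copy number.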
(iii) IS6110 typing of M. tuberculosis by linker-mediated PCR
Two variants of the linker-mediated PCR method have been applied to IS6110 typing of M. tuberculosis (Haas et al., 1993; Palittapongarnpim et al., 1993). The mixed-linker PCR method (Haas et al., 1993) has been the more widely used. In this technique, the adapter strand of the linker oligonucleotide, which is also the strand that does not become covalently bound to the target restriction fragments, has its thymine bases replaced by uracil. Following ligation, during which the adapter is essential, the now redundant oligonucleotide is destroyed by treatment with uracil N-glycosylase (Fig. 10.4). This modification of the method was reported to give more reproducible results than when the adapter was left intact during the subsequent PCR. Even when mixed-linkers are used, it seems that some non-specific amplicons are produced (Butler et al., 1996). Further reproducibility studies are required to establish fully the degree of accuracy and reproducibility that can be expected from these methods.
Fig. 10.4. For mixed-linker PCR, restriction fragments of the genomic DNA are ligated to the linker comprised of the uracil-modified oligonucleotide (dark shading) and the unmodified oligonucleotide (light shading). The unmodified oligonucleotide becomes covalently bound to the 5' ends of the restriction fragments, while the uracil-containing material is hydrolysed. In the first round of the subsequent PCR, the only priming sites available (indicated by arrows) are within the insertion sequence (shaded black). The product of this synthesis has both priming sites and is amplified in subsequent rounds of the PCR.
A development of IS6110 mixed-linker PCR, that improves the reproducibility and comparability of the method, involves analysis of the fragments on a DNA sequencer using fluorescence detection (Butler et al., 1996). For this purpose, fluors are first introduced into the PCR products via end-labelled primers.
Mixed-linker PCR requires a minimal quantity of DNA, and can therefore be applied to early cultures of M. tuberculosis. In addition, since the length of the target sequences to be amplified is in the range 200-1500 bp, rapid methods of sample preparation resulting in significant shearing of the DNA can be employed. The method has also been used for IS6110 fingerprinting of heat-killed cells stored on filter paper for long periods (Burger et al., 1998).
(iv) IS6110 typing of M. tuberculosis by inverse PCR
Inverse PCR has been applied to IS6110 typing in two studies (Patel et al., 1996; Otal et al., 1997). The method is illustrated in Fig. 10.5. When applied to one terminus of the IS with self-ligation and amplification of short restriction fragments of up to c. 1.5 kb (Patel et al., 1996), inverse PCR gave good results. Sequencing of several amplicons was used to show that they were each derived from the 5' end of IS6110 and sequences flanking the element. For strains shown to be carrying one or two copies of the IS by Southern blotting, the same number of amplicons was produced in the inverse PCR. The discriminatory power of the inverse PCR was shown to be similar to that of the standard IS6110 typing method (van Embden et al., 1993).
Fig. 10.5. Inverse PCR for IS6110 is illustrated. The M. tuberculosis genome is cut with BsrFI and then self-ligated at a low concentration of ligatable ends. Circular DNA molecules, derived from the 5'-end of IS6110 (light shading) and its flanking sequence (dark shading), can act as a template for the PCR primers based on IS6110 in the orientation shown. The linear PCR products, one for each copy of the IS, vary in length and can be analysed by agarose gel electrophoresis.
The other study in which inverse PCR was applied to IS6110 used primers at either terminus of the IS (Otal et al., 1997). The DNA was cleaved by a restriction endonuclease with no site in the element, followed by self-ligation. In this study, production of IS6110-derived amplicons through inverse PCR could not be demonstrated. This might have been anticipated from the result of the earlier study, which had shown the low efficiency of self-ligation and subsequent PCR of fragments of DNA of > 1500 bp.
Inverse PCR, like mixed-linker PCR, can be applied to picogram quantities of DNA isolated by methods that result in shearing. The self-ligation step in inverse PCR is very simple to optimise and perform, since success depends primarily upon the concentration of fragments being low. Compared with the standardised method based on Southern blotting (van Embden et al., 1993) and the mixed-linker PCR (Haas et al., 1993), the inverse PCR technique requires few manipulations.
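The relationship between the position of the flanking restriction site and the size of the inverse PCR product can be illustrated with simple arithmetic. In the sketch below, which is a rough model rather than a description of either published protocol, the circle formed from one IS copy consists of the flanking DNA up to the nearest restriction site plus the 5' portion of the IS up to its internal site, and the product is the circle minus the short region between the 5' ends of the two divergent primers. The primer and site coordinates are hypothetical, and the 1500-bp limit is taken from the fragment sizes quoted in the text.

```python
def inverse_pcr_products(flank_lengths, internal_site, primer_out, primer_in,
                         max_circle=1500):
    """Predict one inverse PCR product size per IS copy.

    flank_lengths: distance (bp) from the IS 5' terminus to the nearest
                   flanking restriction site, one value per IS copy.
    internal_site: distance (bp) from the IS 5' terminus to the site in the IS.
    primer_out:    IS coordinate of the 5' end of the primer extending
                   towards the IS 5' terminus and into the flank.
    primer_in:     IS coordinate of the 5' end of the primer extending
                   towards the internal restriction site.
    Returns None for circles assumed too long to ligate and amplify well.
    """
    if not 0 <= primer_out < primer_in <= internal_site:
        raise ValueError("primers must lie between the IS terminus and the internal site")
    gap = primer_in - primer_out  # region of the circle that is not amplified
    products = []
    for flank in flank_lengths:
        circle = flank + internal_site
        products.append(circle - gap if circle <= max_circle else None)
    return products

# Hypothetical strain with three IS copies.
print(inverse_pcr_products([300, 850, 2600], internal_site=400,
                           primer_out=50, primer_in=150))
```

For these values the first two copies give products of 600 and 1150 bp, while the third copy, whose circle would be 3000 bp, is reported as None to mirror the poor recovery of long fragments noted above.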
D. Analysis of IS typing patterns
To obtain the best IS typing results, it is essential that appropriate standards are run in parallel on the electrophoretic gels used to separate the DNA fragments. To eliminate intra-gel variation, the standard DNA mixture is ideally added to each well and detected by any method that can discriminate between sample and standard bands (van Embden et al., 1993; Butler et al., 1996). However, it is generally sufficient to include standards in tracks adjacent to the samples. The standard chosen should cover the complete size-range of bands derived from the samples. In addition, the standard bands should be frequent and well-spaced. Provided a good standard is used, gel profiles obtained by any of the IS-based methods discussed above can easily be compared visually when present on the same gel. Comparison of profiles produced in different gel runs is greatly facilitated by use of a computer program that can normalise the patterns and provide tables of similarities between them. GelCompar (Applied Maths, Kortrijk, Belgium) is an example of the software packages available for this purpose and has been widely used in studies involving IS6110.
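Two of the steps described above, sizing sample bands against a standard and tabulating similarities between patterns, can be sketched in a few lines of Python. This is not the algorithm of GelCompar or of any other package: it assumes log-linear interpolation between the two nearest standard bands, a band-matching tolerance of 3%, and the Dice coefficient (twice the number of matched bands divided by the total number of bands in the two patterns) as the similarity measure. All numerical values are hypothetical.

```python
import math

def estimate_size(distance, standards):
    """Estimate fragment size (bp) from migration distance.

    standards: (migration_distance, size_bp) pairs for the marker bands,
    with distinct distances. log10(size) is interpolated linearly between
    the two marker bands that flank the sample band.
    """
    points = sorted(standards)
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance <= d2:
            frac = (distance - d1) / (d2 - d1)
            return 10 ** (math.log10(s1) + frac * (math.log10(s2) - math.log10(s1)))
    raise ValueError("band lies outside the size range covered by the standard")

def dice(bands_a, bands_b, tolerance=0.03):
    """Dice similarity of two patterns, matching each band at most once."""
    unmatched = sorted(bands_b)
    matches = 0
    for a in sorted(bands_a):
        for b in unmatched:
            if abs(a - b) <= tolerance * max(a, b):
                matches += 1
                unmatched.remove(b)
                break
    total = len(bands_a) + len(bands_b)
    return 2 * matches / total if total else 1.0

standards = [(10.0, 10000), (30.0, 1000)]  # hypothetical marker bands
print(round(estimate_size(20.0, standards)))

pattern_1 = [1000, 2000, 3000, 5000]
pattern_2 = [1010, 2000, 4000]
print(round(dice(pattern_1, pattern_2), 3))
```

The ValueError raised for bands outside the marker range corresponds to the requirement that the standard should cover the complete size range of the sample bands.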
E. IS typing: future perspectives and conclusions
Depending upon their rates of transposition, different IS can be employed in genotyping systems to give different levels of resolution between strains. Certain IS have characteristics that make them ideal genotypic markers for epidemiological typing studies. As further IS elements are discovered as a result of sequencing projects, it seems likely that more of them will be exploited for genotyping purposes. The development of PCR-based methods for IS typing should increase the accessibility of this approach by removing the need to perform time-consuming Southern blotting. The increasing availability of automated DNA sequencers should encourage their greater use for the analysis of the PCR amplicons produced by these methods. This should result in higher levels of reproducibility, comparability and accuracy.
10.3 OLIGOTYPING
A. Introduction
Oligotyping is a generally applicable technique in which the hybridisation of oligonucleotide probes is used to determine whether specific DNA or RNA sequences are present within a target specimen. The method is very powerful since, with the right conditions, the probes can be used to distinguish between sequences differing by only a single base. Advances in oligonucleotide synthesis mean that it is now feasible to create large arrays of probes that can yield a correspondingly large number of data points. For efficient hybridisation, oligonucleotide probes must have minimal secondary structure. Generally, 20-mers with 50% G+C content make good probes that can be used to distinguish between sequences differing by one or more base pairs. Shorter probes give greater resolution of single base changes and are less expensive to produce. However, the interaction with the target sequence is less stable over short sequences, and may be affected by secondary structures that are only significant under the conditions required for hybridisation of short probes. Longer probes give good results if the target sequences are more divergent, but it may be difficult to detect single base differences.
B. Approaches to oligotyping
The most convenient approach to oligotyping is to hybridise the target nucleic acid to a series of probes bound to a solid phase. Various supports have been exploited as the solid phase, including microtitre trays (Borrow et al., 1997; Saunders et al., 1997), nitrocellulose or nylon filters (Kaufhold et al., 1994; Kamerbeek et al., 1997) and glass (Kozal et al., 1996). The method used to fix oligonucleotides to the solid phase depends upon the chemical and physical structure of the solid phase, and may result in either covalent or non-covalent binding. The advantage of establishing a covalent link is that a wider range of stringent washing conditions can be employed to remove mismatched target sequences. For the new high-density oligonucleotide probe arrays, the oligonucleotides are synthesised in situ by a process that relies on photolithography (Pease et al., 1994). The high density probe arrays also allow all possible mutations at each base position to be tested with a specific probe so that it is possible to obtain the partial sequence of any gene target. Oligotyping is made sensitive and convenient by PCR amplification and sometimes labelling of the target sequence prior to hybridisation. Although the specificity of the result does not depend primarily on the PCR, the use of specific amplification does allow high signal to noise ratios to be achieved with simple protocols. Oligotyping approaches include line-probes, microtitre plate arrays and high-density probe arrays.
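The logic of probe-based detection can be mimicked in silico. The Python sketch below checks the G+C content of a candidate probe and tests whether it would pair with a target strand, treating stringent washing as a requirement for a perfect match and relaxed conditions as tolerance of a stated number of mismatches. This is a deliberately crude model that ignores melting temperature and secondary structure, and the sequences are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def hybridises(probe, target, max_mismatches=0):
    """True if the probe can pair with some site on the given target strand
    with no more than max_mismatches mispaired bases."""
    site = reverse_complement(probe)  # sequence on the target that the probe pairs with
    n = len(site)
    return any(mismatches(site, target[i:i + n]) <= max_mismatches
               for i in range(len(target) - n + 1))

# Invented 20-base target site with 50% G+C, and a single-base variant of it.
site = "GATTACAGGCTCAAGTCCGA"
probe = reverse_complement(site)
target = "CCTA" + site + "TTGC"
variant = "CCTA" + "GATTACAGGTTCAAGTCCGA" + "TTGC"

print(gc_content(probe))                               # 0.5
print(hybridises(probe, target))                       # True
print(hybridises(probe, variant))                      # False
print(hybridises(probe, variant, max_mismatches=1))    # True
```

The last two calls illustrate why the choice of washing stringency matters: the single-base variant is rejected only when no mismatch is tolerated.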
Fig. 10.6. Biotinylated PCR primers based on the direct repeat sequence (black boxes) are used to amplify the variable spacer sequences (single lines) within the DR locus. The amplicons generated carry all of the spacers for a specific strain. An array of vertical strips of different spacer sequence probes is prepared using a blotter with long slots. The PCR products for the test strains are hybridised to the filter, separately, in horizontal strips using the same apparatus but with the slots turned through 90°. The black squares represent areas of hybridisation between probe and amplicon, while light squares indicate that no hybridisation occurred due to the absence of the particular spacer sequence. A specific horizontal 'bar-code' of areas of hybridisation/non-hybridisation is generated for each strain.
C. Examples of oligotyping systems
(i) Spoligotyping for M. tuberculosis
The spoligotyping (spacer oligotyping) method for M. tuberculosis (Kamerbeek et al., 1997) relies for its discriminatory ability upon variation at the direct repeat (DR) locus. This locus consists of a variable number of perfect tandem repeats of a 36-bp sequence interspersed with variable spacers of between 34 and 41 bp. There is some conservation of these sequences between strains, but comparison of pairs of strains usually reveals many differences in the inventory of spacer sequences carried.
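Because each strain is characterised by the presence or absence of individual spacers, a spoligotyping result can be stored as a string of 1s and 0s, one character per spacer probe, in the manner of the 'bar-code' of Fig. 10.6. The Python sketch below assumes an array of 43 spacer probes, as in the method of Kamerbeek et al. (1997); the function names and the two example patterns are hypothetical.

```python
N_SPACERS = 43  # number of spacer probes assumed on the membrane

def barcode(spacers_present, n_spacers=N_SPACERS):
    """Encode a hybridisation result as '1' (signal) or '0' (no signal)
    for spacer probes numbered 1..n_spacers."""
    return "".join("1" if i in spacers_present else "0"
                   for i in range(1, n_spacers + 1))

def differing_spacers(code_a, code_b):
    """Return the probe numbers at which two bar-codes disagree."""
    return [i + 1 for i, (a, b) in enumerate(zip(code_a, code_b)) if a != b]

# Hypothetical strains: X lacks spacers 33-36, Y additionally lacks spacer 40.
strain_x = barcode(set(range(1, 44)) - {33, 34, 35, 36})
strain_y = barcode(set(range(1, 44)) - {33, 34, 35, 36, 40})

print(strain_x)
print(strain_y)
print(differing_spacers(strain_x, strain_y))  # [40]
```

Storing results in this form makes comparison between membranes and between laboratories a matter of simple string comparison.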
For spoligotyping (Fig. 10.6), a series of 43 oligonucleotides, each corresponding to 25 bases of a different spacer sequence, found in either M. tuberculosis strain H37Rv or M. bovis strain P3, are covalently coupled to an activated Biodyne C membrane via a 5' aminolink. The target DNA is prepared and biotinylated by a PCR targeted at the DR region of the strains to be typed, and is then applied to the membrane. In order to allow analysis of multiple samples on a single membrane, the oligonucleotides and the target are applied to the membrane in parallel strips, at right angles, as illustrated in Fig. 10.6 (Kamerbeek et al., 1997). Hybridisation of the biotinylated probes derived from the test strains is detected by chemiluminescence via a streptavidin-peroxidase conjugate. The pattern of negative and positive hybridisation signals depends upon the specific complement of spacer sequences present within the DR locus. The hybridised DNA can be stripped from the membrane following detection, and the recovered probe arrays can be re-used many times. Spoligotyping is reproducible (Goguet de la Salmoniere et al., 1997; Goyal et al., 1997), convenient and rapid. It can be applied directly to extracts of clinical material without prior culture (Kamerbeek et al., 1997).
(ii) Streptococcus pyogenes emm gene typing
Traditionally, typing of S. pyogenes has depended upon serological detection of cell wall antigens by a combined system of T- and M-antigen typing and, for opacity factor-positive cultures, the inhibition of the opacity reaction with specific antisera. M-antigen typing provides the greatest discrimination between strains, with at least 74 types recognised (Johnson & Kaplan, 1993). However, M-typing reagents are not available commercially, and maintenance of a comprehensive system is expensive and consequently restricted to a few reference centres. The M proteins are encoded by the emm gene family, and specificity is conferred by the variable N-terminal domain. Many of these sequences are available in databases (GenBank, EMBL) and can be used to design type-specific oligonucleotides. PCR primers that correspond to conserved parts of the emm gene and amplify through the N-terminal-encoding sequence of strains of all M-types (Podbielski et al., 1991) can be used to prepare and label large quantities of the type-specific sequence. The S. pyogenes emm oligotyping methods rely on arrays of these M-type specific probes, bound either to filters, as described above for spoligotyping (Kaufhold et al., 1994), or to 96-well microtitre trays (Saunders et al., 1997). In the microtitre tray method, the biotinylated oligonucleotides are immobilised by interaction with streptavidin coating the wells. In either system, the amplified and labelled emm gene fragment derived from the test strain is hybridised to each of the probe sequences in the array. Hybridised probe is then detected either by chemiluminescence or colorimetrically, as appropriate. The emm type of test strains is revealed by positive hybridisation to one of the probes. If no positive reaction is obtained, the strain is considered non-typable by this system. Probes that hybridise to conserved sequences within the amplicon can be included in the array (Saunders et al., 1997). These probes act as controls indicating that emm gene amplification has occurred in samples that do not hybridise to any of the type-specific probes. Emm gene oligotyping has considerable advantages over either M-antigen serotyping, which is difficult to maintain due to the large number of sera needed, or emm gene sequencing (Beall et al., 1996), which requires more hands-on time. The microtitre tray-based arrays are easy to set up and convenient to use for assays based on up to 12 probes. However, with this number of probes, a significant proportion of strains will remain non-typable. For systems including larger numbers of probes, which are needed for a more comprehensive assay leaving few non-typable strains, the filter-bound arrays are the more promising.
D. Oligotyping: future perspectives and conclusions
The currently available oligotyping schemes have already shown the power and potential of this approach to bacterial typing. The data provided is of high quality and simple to interpret. As high density probe arrays consisting of many thousands of oligonucleotides become more widely available and accessible, it will be possible to design rapid systems for typing strains of any bacterial taxon. It is possible that arrays will be used to perform multilocus sequence typing (MLST; Maiden et al., 1998). MLST relies on comparison of the sequence of parts of housekeeping genes and is therefore highly reproducible, with results being easy to compare between laboratories. The discriminatory ability of MLST depends only on the degree of divergence between the particular genes analysed in the strain collection examined. Greater discrimination can be achieved by selecting more variable genes or by adding genes to the panel. High density probe arrays are generally less accurate for sequencing than conventional methods (Kozal et al., 1996). This is mostly due to the poor ability of probes to identify the sequence when two differences from the consensus occur at the same locus, especially at adjacent nucleotides. However, this limitation should not prove to be a significant drawback for MLST based on conserved bacterial housekeeping genes. The MLST approach is described in more detail in Chapter 12 of this book.
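The effect of adding genes to the panel can be illustrated numerically. The sketch below gives each strain an allelic profile (one allele number per locus) and computes the discriminatory index of Hunter & Gaston (1988), D = 1 - [1/(N(N-1))] x sum of n_j(n_j-1), where N is the number of strains and n_j is the number of strains sharing the j-th type. The three-locus panel and the six profiles are invented for illustration and are not taken from any real strain collection.

```python
from collections import Counter

def discriminatory_index(types):
    """Hunter-Gaston index: probability that two strains drawn at random differ."""
    n = len(types)
    if n < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(types)
    # Number of ordered pairs of strains that share the same type.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_pairs / (n * (n - 1))

# Hypothetical allelic profiles: one allele number per locus, six strains.
profiles = [
    (1, 1, 1),
    (1, 1, 2),
    (1, 2, 1),
    (1, 2, 1),
    (2, 1, 3),
    (2, 1, 3),
]

# Recompute the index using the first one, two and three loci of the panel.
for n_loci in (1, 2, 3):
    types = [p[:n_loci] for p in profiles]
    print(n_loci, round(discriminatory_index(types), 3))
```

With these made-up profiles the index rises as loci are added, because strains that share alleles at the first locus are separated by the later ones.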
REFERENCES
Alland, D., Kalkut, G.E. & Moss, A.R. (1994). Transmission of tuberculosis in New York City. New England Journal of Medicine 330, 1710-1716.
Baquar, N., Threlfall, E.J., Rowe, B. & Stanley, J. (1993). Molecular subtyping within a single Salmonella typhimurium phage type, DT204c, with a PCR-generated probe for IS200. FEMS Microbiology Letters 112, 217-222.
Beall, B., Facklam, R. & Thompson, T. (1996). Sequencing emm-specific PCR products for routine and accurate typing of group A streptococci. Journal of Clinical Microbiology 34, 953-958.
Behr, M.A., Warren, S.A., Salamon, H., Hopewell, P.C., Ponce de Leon, A., Daley, C.L. & Small, P.M. (1999). Transmission of Mycobacterium tuberculosis from patients smear-negative for acid-fast bacilli. Lancet 353, 444-448.
Bisercic, M. & Ochman, H. (1993). Natural populations of Escherichia coli and Salmonella typhimurium harbor the same classes of insertion sequences. Genetics 133, 449-454.
Borrow, R., Claus, H., Guiver, M., Smart, L., Jones, D.M., Kaczmarski, E.B., Frosch, M. & Fox, A.J. (1997). Non-culture diagnosis and serogroup determination of meningococcal B and C infection by a sialyltransferase (siaD) PCR ELISA. Epidemiology and Infection 118, 111-117.
Burger, M., Raskin, S., Brockelt, S.R., Amthor, B., Geiss, H.K. & Haas, W.H. (1998). DNA fingerprinting of Mycobacterium tuberculosis complex culture isolates collected in Brazil and spotted onto filter paper. Journal of Clinical Microbiology 36, 573-576.
Butler, W.R., Haas, W.H. & Crawford, J.T. (1996). Automated DNA fingerprinting analysis of Mycobacterium tuberculosis using fluorescent detection of PCR products. Journal of Clinical Microbiology 34, 1801-1803.
Cave, M.D., Eisenach, K.D., McDermott, P.F., Bates, J.H. & Crawford, J.T. (1991). IS6110: conservation of sequence in the Mycobacterium tuberculosis complex and its utilization in DNA fingerprinting. Molecular and Cellular Probes 5, 73-80.
Cave, M.D., Eisenach, K.D., Templeton, G., Salfinger, M., Mazurek, G., Bates, J.H. & Crawford, J.T. (1994). Stability of DNA fingerprint pattern produced with IS6110 in strains of Mycobacterium tuberculosis. Journal of Clinical Microbiology 32, 262-266.
Chevrel-Dellagi, D., Abderrahman, A., Haltiti, R., Koubaji, H., Gicquel, B. & Dellagi, K. (1993). Large-scale DNA fingerprinting of Mycobacterium tuberculosis strains as a tool for epidemiological studies of tuberculosis. Journal of Clinical Microbiology 31, 2446-2450.
Datta, N. & Hughes, V.M. (1983). Plasmids of the same Inc groups in Enterobacteria before and after the medical use of antibiotics. Nature 306, 616-617.
Egner, C. & Berg, D.E. (1981). Excision of transposon Tn5 is dependent on the inverted repeats but not on the transposase function of Tn5. Proceedings of the National Academy of Sciences of the United States of America 78, 459-463.
Fang, Z. & Forbes, K.J. (1997). A Mycobacterium tuberculosis IS6110 preferential locus (ipl) for insertion into the genome. Journal of Clinical Microbiology 35, 479-481.
Fomukong, N.G. & Dale, J.W. (1993). Transpositional activity of IS986 in Mycobacterium smegmatis. Gene 130, 99-105.
Goguet de la Salmoniere, Y.O., Li, H.M., Torrea, G., Bunschoten, A., van Embden, J. & Gicquel, B. (1997). Evaluation of spoligotyping in a study of the transmission of Mycobacterium tuberculosis. Journal of Clinical Microbiology 35, 2210-2214.
Goyal, M., Saunders, N.A., van Embden, J.D.A., Young, D.B. & Shaw, R.J. (1997). Differentiation of Mycobacterium tuberculosis isolates by spoligotyping and IS6110 restriction fragment length polymorphism. Journal of Clinical Microbiology 35, 647-651.
Haas, W.H., Butler, W.R., Woodley, C.L. & Crawford, J.T. (1993). Mixed-linker polymerase chain reaction: a new method for rapid fingerprinting of isolates of the Mycobacterium tuberculosis complex. Journal of Clinical Microbiology 31, 1293-1298.
Hermans, P.W.M., van Soolingen, D., Dale, J.W., Schuitema, A.R.J., McAdam, R.A., Catty, D. & van Embden, J.D.A. (1990). Insertion element IS986 from Mycobacterium tuberculosis: a useful tool for diagnosis and epidemiology of tuberculosis. Journal of Clinical Microbiology 28, 2051-2058.
Hermans, P.W.M., van Soolingen, D., Bik, E.M., de Haas, P.E.W., Dale, J.W. & van Embden, J.D.A. (1991). Insertion element IS987 from Mycobacterium bovis BCG is located in a hot-spot integration region for insertion elements in Mycobacterium tuberculosis complex strains. Infection and Immunity 59, 2695-2705.
Hunter, P.R. & Gaston, M.A. (1988). Numerical index of the discriminatory ability of typing systems: an application of Simpson's index of diversity. Journal of Clinical Microbiology 26, 2465-2466.
Johnson, D.R. & Kaplan, E.L. (1993). A review of the correlation of T-agglutination patterns and M-protein typing and opacity factor production in the identification of group A streptococci. Journal of Medical Microbiology 38, 311-315.
Kamerbeek, J., Schouls, L., Kolk, A., van Agterveld, M., van Soolingen, D., Kuijper, S., Bunschoten, A., Molhuizen, H., Shaw, R., Goyal, M. & van Embden, J. (1997). Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. Journal of Clinical Microbiology 35, 907-914.
Kaufhold, A., Podbielski, A., Baumgarten, G., Blakpoel, M., Top, J. & Schouls, L. (1994). Rapid typing of group A streptococci by the use of DNA amplification and non-radioactive allele-specific oligonucleotide probes. FEMS Microbiology Letters 119, 19-26.
Kozal, M.J., Shah, N., Shen, N., Yang, R., Fucini, R., Merigan, T.C., Richman, D.D., Hubbell, E., Chee, M. & Gingeras, T.R. (1996). Extensive polymorphisms observed in HIV-1 clade B protease gene using high-density oligonucleotide arrays. Nature Medicine 2, 753-758.
Lam, S. & Roth, J.R. (1983). IS200: a Salmonella-specific insertion sequence. Cell 34, 951-960.
Maiden, M.C., Bygraves, J.A., Feil, E., Morelli, G., Russell, J.E., Urwin, R., Zhang, Q., Zhou, J., Zurth, K., Caugant, D.A., Feavers, I.M., Achtman, M. & Spratt, B.G. (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proceedings of the National Academy of Sciences of the United States of America 95, 3140-3145.
McAdam, R.A., Hermans, P.W.M., van Soolingen, D., Zainuddin, Z.F., Catty, D., van Embden, J.D.A. & Dale, J.W. (1990). Characterization of a Mycobacterium tuberculosis insertion sequence belonging to the IS3 family. Molecular Microbiology 4, 1607-1613.
Odaert, M., Berche, P. & Simonet, M. (1996). Molecular typing of Yersinia pseudotuberculosis by using an IS200-like element. Journal of Clinical Microbiology 34, 2231-2235.
Otal, I., Samper, S., Asensio, M.P., Vitoria, M.A., Rubio, M.C., Gómez-Lus, R. & Martin, C. (1997). Use of a PCR method based on IS6110 polymorphism for typing Mycobacterium tuberculosis strains from BACTEC cultures. Journal of Clinical Microbiology 35, 273-277.
Palittapongarnpim, P., Chomyc, S., Fanning, A. & Kunimoto, D. (1993). DNA fingerprinting of Mycobacterium tuberculosis isolates by ligation-mediated polymerase chain reaction. Nucleic Acids Research 21, 761-762.
Patel, S. (1999). Molecular typing and identification of Mycobacteria. PhD thesis, University of London.
Patel, S., Wall, S. & Saunders, N.A. (1996). A hemi-nested inverse PCR for typing of sequences flanking the 5'-end of IS6110 from Mycobacterium tuberculosis strains. Journal of Clinical Microbiology 34, 1686-1690.
Pease, A.C., Solas, D., Sullivan, E.J., Cronin, M.T., Holmes, C.P. & Fodor, S.A. (1994). Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proceedings of the National Academy of Sciences of the United States of America 91, 5022-5026.
Pelkonen, S., Romppanen, E-L., Siitonen, A. & Pelkonen, J. (1994). Differentiation of Salmonella serovar infantis isolates from human and animal sources by fingerprinting IS200 and 16S rrn loci. Journal of Clinical Microbiology 32, 2128-2133.
Podbielski, A., Melzer, B. & Lütticken, R. (1991). Application of the polymerase chain reaction to study of the M protein(-like) gene family in beta-hemolytic streptococci. Medical Microbiology and Immunology 180, 213-227.
Saunders, N.A., Hallas, G., Gaworzewska, E.T., Metherell, L., Efstratiou, A., Hookey, J.V. & George, R.C. (1997). PCR-enzyme-linked immunosorbent assay and sequencing: an alternative to serology for M typing of Streptococcus pyogenes. Journal of Clinical Microbiology 35, 2689-2691.
Simonet, M., Riot, N., Fortineau, N. & Berche, P. (1996). Invasin production by Yersinia pestis is abolished by insertion of an IS200-like element within the inv gene. Infection and Immunity 64, 375-379.
Small, P.M., Hopewell, P.C., Singh, S.P., Paz, A., Parsonnet, J., Ruston, D.C., Schecter, G.F., Daley, C.L. & Schoolnik, G.K. (1994). The epidemiology of tuberculosis in San Francisco: a population-based study using conventional and molecular methods. New England Journal of Medicine 330, 1703-1709.
Stanley, J. & Saunders, N.A. (1996). DNA insertion sequences and the molecular epidemiology of Salmonella and Mycobacterium. Journal of Medical Microbiology 45, 236-251.
Stanley, J., Baquar, N. & Threlfall, E.J. (1993). Genotypes and phylogenetic relationships of Salmonella typhimurium are defined by molecular fingerprinting of IS200 and 16S rrn loci. Journal of General Microbiology 139, 1133-1140.
Stanley, J., Powell, N., Jones, C. & Burnens, A.P. (1994). A framework for IS200, 16S rRNA gene and plasmid-profile analysis in Salmonella serogroup D1. Journal of Medical Microbiology 41, 112-119.
Thierry, D., Brisson-Noel, A., Vincent-Levy-Frebault, V., Nguyen, S., Guesdon, J. & Gicquel, B. (1990). Characterization of a Mycobacterium tuberculosis insertion sequence, IS6110, and its application in diagnosis. Journal of Clinical Microbiology 28, 2668-2673.
van Embden, J.D.A., Cave, M.D., Crawford, J.T., Dale, J.W., Eisenach, K.D., Gicquel, B., Hermans, P., Martin, C., McAdam, R.A., Shinnick, T.M. & Small, P.M. (1993). Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. Journal of Clinical Microbiology 31, 406-409.
van Soolingen, D., Hermans, P.W.M., de Haas, P.E.W., Soll, D.R. & van Embden, J.D.A. (1991). Occurrence and stability of insertion sequences in Mycobacterium tuberculosis complex strains: evaluation of an insertion sequence-dependent DNA polymorphism as a tool in the epidemiology of tuberculosis. Journal of Clinical Microbiology 29, 2578-2586.
11 Molecular Characterisation of Microbial Communities Based on 16S rRNA Sequence Diversity
Erwin G Zoetendal1,2, Antoon D L Akkermans1 and Willem M de Vos1,2
1Laboratory of Microbiology, Wageningen University, Hesselink van Suchtelenweg 4, 6703 CT Wageningen, The Netherlands; 2Wageningen Centre for Food Sciences, PO Box 557, 6700 AN Wageningen, The Netherlands
CONTENTS
11.1 INTRODUCTION
11.2 METHODOLOGY
A. Extraction of RNA and DNA
(i) Cell lysis and purification of nucleic acids
(ii) Quantification of nucleic acids
B. RT-PCR/PCR of 16S rRNA/rDNA
(i) Preferential amplification
(ii) Quantitative (RT-)PCR
C. Cloning, sequencing and phylogenetic analysis
(i) Cloning
(ii) Sequence analysis
(iii) Tree construction
(iv) Phylogeny of 16S rRNA genes
D. Fingerprinting
(i) SSCP
(ii) DGGE and TGGE
(iii) Quantitative fingerprint analysis
11.3 OLIGONUCLEOTIDE CHIP TECHNOLOGY
11.4 CONCLUSIONS AND PERSPECTIVES
ACKNOWLEDGEMENTS
REFERENCES
11.1 INTRODUCTION
This decade has shown an impressive development in the application of molecular techniques based on 16S and 23S rRNA genes to study the microbial diversity in ecosystems. Several overviews highlight the possibilities and drawbacks of these molecular approaches in ecology (Amann et al., 1995; Pace, 1997; Wintzingrode et al., 1997; Head et al., 1998).
Before the rRNA approach, the composition of an ecosystem was investigated by the isolation and physiological characterisation of many microorganisms living in an ecosystem. The microbial composition of mammalian intestines, for example, has been studied extensively by plate count analysis of faecal samples, which usually contain 10^11 cfu/g (Moore & Holdeman, 1974; Holdeman et al., 1976; Savage, 1977; Finegold et al., 1983). One of the limitations in using these conventional microbiological methods is that easily cultivable microorganisms are detected, but not those that only grow on specific media, require unknown growth conditions, or have obligate interactions with the host or other microorganisms. Other limitations of cultivation include the selectivity of the media used, the stress imposed by cultivation procedures, and the necessity of strictly anoxic conditions. Estimates of the cultivability of GI tract microorganisms range from 10% to 50%, but may vary considerably between species or genera (McFarlene & Gibson, 1994; Langendijk et al., 1995; Wilson & Blitchington, 1996). As a consequence, insight into the function of the microbial community, its interactions with the host, and the influence of environmental factors on the microbial composition is very limited. During the past decade, the rRNA approach has been used to study the microbial ecology of several ecosystems, and its application in ecological studies is still increasing. The first application of this approach in studying GI tract ecology was focused on the detection of Bacteroides vulgatus in faecal samples using a species-specific 16S rRNA-targeted probe (Kuritza & Salyers, 1985). Recently, several populations in the GI tract have been monitored, resulting in quantification of Bacteroides populations by dot blot hybridisation (Doré et al., 1998), and analysis of the genetic diversity of cultivable Lactobacillus and Bifidobacterium spp. (McCartney et al., 1996).
From the latter study, it was concluded that the microbial composition of lactic acid bacteria in the intestine varies according to each individual. Fluorescent in-situ hybridisation (FISH) has also been used to quantify different phylogenetic groups in human faecal samples (Langendijk et al., 1995; Franks et al., 1998). About two-thirds of the total bacterial community could be counted with the probes used. The polymerase chain reaction (PCR) has been used to quantify specific groups of bacteria in human faeces (Wang et al., 1996a), and random cloning approaches have been used to analyse the microbial diversity of faeces from single individuals (Wilson & Blitchington, 1996; Zoetendal et al., 1998; Suau et al., 1999). In one case, this analysis was combined with another powerful approach based on temperature gradient gel electrophoresis (TGGE) analysis of 16S rRNA and rDNA amplicons, resulting in identification of the most prominent and expressed sequences (Zoetendal et al., 1998). In addition, individual differences and temporal changes in the predominant microbial GI tract community could easily be monitored using this approach. TGGE and other fingerprinting techniques, including denaturing gradient gel electrophoresis (DGGE) and single strand conformation polymorphism (SSCP) analysis have been used in different ecosystems to rapidly analyse microbial communities based on sequence-specific separation of 16S rDNA amplicons (reviewed by Muyzer & Smalla, 1998; Muyzer, 1999). This chapter describes the use, benefits and drawbacks associated with the application of these genetic fingerprinting approaches, which are based on the sequence variability of different 16S rRNA and 16S rDNA molecules, to study the microbial composition of different environments, such as the GI tract.
Fig. 11.1. Schematic outline of molecular approaches used to analyse microbial communities.
11.2 METHODOLOGY
To describe the bacterial diversity in communities, molecular approaches based on the sequence variability of the 16S rRNA gene can be used (Fig. 11.1). First, RNA and DNA have to be isolated simultaneously from environmental samples, and used as templates for amplification of fragments of the 16S rRNA gene by reverse transcriptase (RT)-PCR and regular PCR, respectively (see sections A and B, below). Subsequently, the genetic diversity of the amplicons can be analysed using different fingerprinting techniques. Additionally, a clone library of complete 16S rDNA and rRNA sequences can be made and divided into groups of different ribotypes using the same fingerprinting techniques. Cloned fragments of the different ribotypes can be sequenced and analysed phylogenetically (see section C). Comparison of the fingerprinting techniques and the cloning approaches may result in a reliable picture of the relative composition of numerically dominating microbes in a community (see section D). However, the results cannot simply be converted to total numbers of cells and the fingerprints only reflect the actual number of rRNA genes when each product is amplified equally.
A. Extraction of RNA and DNA
When genetic fingerprinting techniques are used to characterise a microbial community, reliable extraction of DNA and RNA is the most critical step in the whole procedure because all further analyses are based on the extracted nucleic acids. Various nucleic acid extraction methods have been developed that can be applied to all kinds of ecosystems (Akkermans et al., 1998). While most of the reported isolation procedures are promoted as rapid, accurate, simple, or universal methods, a general protocol does not exist because all environments have their own characteristics and, as a consequence, require dedicated purification procedures. In general, procedures for the isolation of nucleic acids from microorganisms or environmental samples consist of three steps that will be discussed below: cell lysis, purification of nucleic acids, and isolation of nucleic acids (Fig. 11.2).
Fig. 11.2. Schematic representation of the procedures used to isolate DNA and RNA from a mixture of bacterial cells.
(i) Cell lysis and purification of nucleic acids
One of the important steps in the extraction of nucleic acids from an environmental sample is the lysis of microbial cells. Equally efficient lysis of all cells in an ecosystem is necessary to obtain a reliable picture of the microbial community. Efficient cell lysis may be hampered by the different cell envelope composition of various microorganisms. Hence, a protocol which is suitable for one species may not necessarily be suitable for another species. Microbial cells can be lysed chemically, enzymatically or mechanically. Various Gram-negative bacteria can be lysed chemically by treatment with detergents, such as sodium dodecyl sulphate (SDS). Disruption of the cell envelope of Gram-positive bacteria by detergents needs prior treatment with enzymes such as lysozyme, N-acetylmuramidase or other muramidases. Most of these lytic enzymes are restricted to a certain range of microorganisms, because the cell-envelope composition differs for each species (Johnson, 1991). As a consequence, it is very difficult to develop chemical or enzymatic based lysis methods for complex communities. Therefore, procedures which include mechanical cell lysis by French press disruption, freeze/thaw incubations, sonication, or bead beating are preferred (Maniatis et al., 1989; Johnson, 1991; Akkermans et al., 1998).
Bead beating is a widely used method to lyse bacterial communities, sometimes in combination with different chemical, enzymatic, or other mechanical lysis procedures. Bead beating has been shown to be successful in all kinds of samples, varying from soil systems to the mammalian GI tract (Stahl et al., 1988; Harmsen et al., 1995; Felske et al., 1996; Ramirez-Saad et al., 1996; Wilson & Blitchington, 1996; Doré et al., 1998; Zoetendal et al., 1998). In this procedure, glass or zirconium beads are added to an environmental sample in a buffered solution and shaken vigorously (3000 to 5000 rpm). The beads collide during this treatment, thereby facilitating the disruption of cells between the beads. Phenol can be added to the sample to prevent enzymatic degradation of the nucleic acids during the bead beating procedure. A disadvantage of this mechanical cell lysis is that nucleic acids are partly sheared, especially when fragile bacteria such as some Gram-negative species are involved. Sheared nucleic acids cannot be used for genetic fingerprinting methods based on intact genomic DNA, such as RAPD and RFLP. Shearing of nucleic acids may also result in a reduced recovery of bacterial DNA or RNA, or increase the formation of chimeric structures during amplification of certain genes (Kopczynski et al., 1994). Determination of the optimal conditions for efficient cell disruption is therefore very important. The easiest way is to examine the sample under the microscope before and after the lysis procedure (Apajalathi et al., 1998). However, it may be difficult to distinguish between pieces of lysed cells and intact cells. Plate counting analysis of easily cultivable bacteria may give some indication of lysis efficiency when non-detrimental treatments are used during the lysis procedure (Felske et al., 1996). However, this may be difficult or cumbersome, notably when anoxic environments such as the GI tract are analysed. Another way to check for lysis efficiency is to determine the nucleic acid concentration before and after treatment.
A disadvantage of this calculation method is that the genome size and ribosome number may vary according to cell and species, making the calculation less reliable. TGGE of 16S rDNA amplicons has been used to check the lysis efficiency of GI tract samples following different periods of bead beating (Zoetendal et al., 1998; see section 11.2.D(ii)). In this way, shearing of nucleic acids could be minimised by determining the minimal time required for maximal disruption of the cells. It was shown that at least 3 min of bead beating was necessary to lyse a Ruminococcus-like species in a faecal sample.
Following cell lysis, nucleic acids have to be purified, because most analytical procedures involve enzymes and require relatively pure DNA or RNA. Most proteins, carbohydrates and lipids can be removed using phenol and chloroform extractions (Maniatis et al., 1989). This purification procedure can be enhanced by addition of cetyltrimethylammonium bromide (CTAB), which forms complexes with the nucleic acids. It has been shown that CTAB extraction following bead beating of actinorhizal nodules facilitated the recovery of DNA (Ramirez-Saad et al., 1996). High molecular size DNA can also be separated from contaminants by CsCl centrifugation. Additional steps for purification can be added to the protocol, although it should be kept in mind that each additional step results in a decreased yield of extracted nucleic acids. An alternative step is the addition of specific proteins, such as bovine serum albumin or protein gp32, to the DNA sample. These proteins have been shown to enhance the amplification efficiency of template DNA containing PCR-inhibiting compounds (Kreader, 1996). Recently, improved DNA recovery from ancient faecal samples has been reported in which cross-links between reducing sugars and amino groups could be cleaved by adding N-phenacylthiazolium bromide, thereby allowing for the amplification of DNA sequences (Poinar et al., 1998). Following purification, nucleic acids can be concentrated by polyethyleneglycol, isoamylalcohol or ethanol. Addition of sodium acetate facilitates the precipitation of nucleic acid fragments in ethanol. Instead of precipitation, nucleic acids can also be concentrated by binding to glass fibres or silica materials. Most commercial nucleic acid isolation kits rely on this procedure.
When RNA is isolated, all procedures should be performed with special care. RNA is more sensitive to degradation than DNA because its ribose contains a 2' hydroxyl group which makes RNA chemically less stable, especially under alkaline conditions. In addition, the double helix B structure found in DNA cannot be formed by RNA. Besides the chemical differences, RNase is much more stable than DNase, making removal of RNase more difficult and contamination with RNase easier. If possible, every step should be performed at 4°C or on ice, and equipment should be RNase free. Most procedures to isolate DNA and RNA are based on the principle of lysing the cells followed by direct purification of the nucleic acids. Another approach to obtain rRNA is based on the isolation of ribosomes (Felske et al., 1996, 1998c; Fig. 11.2). In this procedure, cells from soil samples are mechanically lysed, followed by the isolation of intact ribosomes by ultracentrifugation. RNA is then subsequently isolated from the ribosome collection and purified.
This method, which may also be applicable to other ecosystems such as the GI tract, resulted in a high yield of purified 16S rRNA and 23S rRNA which could be used directly for RT-PCR.
A number of studies have reported the extraction of nucleic acids from faecal and rumen samples (Stahl et al., 1988; Wilson & Blitchington, 1996; Apajalathi et al., 1998; Doré et al., 1998; Zoetendal et al., 1998). Most protocols are based on bead beating cell lysis in phenol, followed by phenol/chloroform extraction. A different approach has also been described in which bacterial cells from faeces are lysed by boiling in a phosphate-buffered saline (PBS) solution containing 1% Triton X-100 (Wang et al., 1996a). Following cell lysis, the solution was used directly as a template for PCR amplification. Amplicons derived from a variety of Gram-positive and Gram-negative species known to be present in faecal samples could be detected following amplification with specific primers. Although the number of strains tested is limited, this fast PCR approach seems to be accurate and may be useful for analysing samples containing a low number of microorganisms.
(ii) Quantification of nucleic acids
Several methods to visualise and quantify nucleic acids have been developed. DNA fragments separated on agarose or polyacrylamide gels are usually visualised with fluorescent dyes, such as ethidium bromide (Maniatis et al., 1989), or by silver staining (Bassam et al., 1991; Sanguinetti et al., 1994). Staining is used to quantify the yield of nucleic acids following addition of a concentration standard. Quantification works well when RNA and DNA are not sheared. The correlation between the concentration of nucleic acids and the signal of the stain is normally not linear and extrapolation is not possible. Instead of quantification by gel electrophoresis, DNA and RNA can also be quantified by Southern, northern or dot blot hybridisation using a universal probe for a gene of interest. A third way to measure the nucleic acid concentration is to determine the UV light absorption at 260 nm and 280 nm. Additionally, the ratio A260/A280 gives an indication of purity. DNA is relatively pure when this ratio is between 1.80 and 2.00. Another less well-known procedure for measuring the DNA and RNA concentration is the use of High-Performance Liquid Chromatography (HPLC). HPLC has been used to determine the copy number of plasmids in recombinant yeast or Escherichia coli cells. Chromosomal DNA, plasmid DNA, rRNA and tRNA could be separated using HPLC (Coppella et al., 1987a, b). When the various methods were compared for nucleic acids isolated from marine sediments (Dell'Anno et al., 1998), it was found that the yield of DNA appeared to be similar with spectrometric and HPLC measurements, but was significantly lower when the yield was determined by the fluorescent method. This might be due to the fact that the fluorescent stain is dependent on the structure, the size and the composition of the nucleic acids. Another finding was that RNA and DNA could be separated by HPLC, so that RNA measurement was not biased by DNA and vice versa, which might not be the case for the other two techniques.
B. RT-PCR/PCR of 16S rRNA/rDNA
In order to gain an insight into the microbial structure in different ecosystems, various methods have been developed based on the nucleic acid sequences of small subunit (ssu) rRNA or rDNA because these molecules are ideal phylogenetic and taxonomic markers (Woese, 1987; Amann et al., 1995). There are various reasons to use rRNA and rDNA genes as markers, including (1) their presence in all cells; (2) their high degree of sequence conservation which facilitates their detection; (3) the presence of highly variable regions in their sequences which makes them useful to discriminate at (sub)species to higher phylogenetic levels; and (4) the presence of databases containing up to 20,000 ssu rRNA sequences (M. Wagner, personal communication) from different taxa, which facilitates the phylogenetic characterisation of cultured and uncultured microbes. Moreover, rDNA can be amplified by PCR in vitro (Saiki et al., 1988). The principle of PCR is that cycles of DNA melting, primer annealing and elongation using a thermostable polymerase are repeated, resulting in an exponential increase of amplified genes. In addition, rRNA can also be amplified, but it has first to be converted into DNA by reverse transcription. This can be done by reverse transcriptase using an oligonucleotide primer which targets the RNA template (a procedure termed RT-PCR). Although DNA and RNA can be amplified with other techniques (Carrino & Lee, 1995), this section will only focus on RT- and regular PCR.

Some important factors which may influence the amplification procedure, notably when mixtures of DNA or RNA from different organisms are amplified, include the purity of DNA, the G+C content of the target, the secondary structure of the target, preferential amplification, and formation of chimeric structures. Several methods minimising these factors have been reported (Table 11.1; Baskaran et al., 1996; Kreader, 1996; Wang & Wang, 1996; Polz & Cavanough, 1998).

Table 11.1. Overview of some general artifacts concerned with (RT-)PCR and some solutions to minimise these artifacts

Factors causing PCR artifacts      Some bias-preventing solutions     Reference
Nucleic acid purity                Additional purification steps      Kreader (1996)
                                   Use of BSA or protein gp32
High G+C content of template       Increase denaturing time
Secondary structure of template    Increase denaturing time           Baskaran et al. (1996)
                                   Use DMSO
Preferential amplification         Decrease PCR cycles                Polz & Cavanough (1998)
                                   Mix replicate reactions            Polz & Cavanough (1998)
                                   Exclude degenerate primers         Polz & Cavanough (1998)
                                   Use high template concentration    Polz & Cavanough (1998)
Formation of chimeric constructs   Longer elongation time             Wang & Wang (1996)

(i) Preferential amplification

Some sequences may be preferentially amplified in a mixture of different sequences from comparable genes. For 16S rDNA it has regularly been reported that variations in primer pairs result in biased amplification when using mixtures of DNA as a template (Reysenbach et al., 1992; Suzuki & Giovannoni, 1996; Wilson & Blitchington, 1996; Polz & Cavanough, 1998). Equal amplification efficiency of 16S rDNA is necessary to get an insight into the microbial composition of an ecosystem. It was suggested that the bias in amplification observed with the canonical universal primers 27F and 1492 (Lane, 1991) can be decreased by (1) decreasing the number of amplification cycles; (2) mixing several replicate PCR amplifications; (3) using high template concentrations; and (4) excluding degenerate primers (Polz & Cavanough, 1998). A disadvantage of high template concentrations might be a high risk of the formation of chimeras, which consist of PCR fragments originating from more than one target gene. Chimera formation during the amplification of 16S rDNA from an environmental sample results in an overestimation of the biodiversity. Since the homology between different 16S rRNA genes is relatively high, chimeras are thought to arise from reannealing of different 16S rDNA genes during PCR (Liesack et al., 1991).
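The recommendation to decrease the number of amplification cycles can be illustrated with a toy calculation. The Python sketch below assumes ideal exponential amplification, N = N0 (1 + E)^n, with a constant per-cycle efficiency E for each template; the function name, the starting copy numbers and the efficiencies (0.95 and 0.90) are hypothetical values chosen for illustration, and plateau effects and template reannealing are deliberately ignored.

```python
def amplicon_ratio(initial_a, initial_b, eff_a, eff_b, cycles):
    """Ratio (A/B) of amplicon copies after a number of PCR cycles.

    Assumes ideal exponential growth, copies = initial * (1 + efficiency) ** cycles,
    with efficiencies between 0 (no amplification) and 1 (perfect doubling).
    """
    copies_a = initial_a * (1.0 + eff_a) ** cycles
    copies_b = initial_b * (1.0 + eff_b) ** cycles
    return copies_a / copies_b


if __name__ == "__main__":
    # Two templates present in equal amounts, with slightly different
    # (hypothetical) amplification efficiencies.
    for cycles in (10, 20, 35):
        ratio = amplicon_ratio(1000, 1000, 0.95, 0.90, cycles)
        print(f"{cycles} cycles: A/B = {ratio:.2f}")
```

Under these assumptions a small difference in efficiency compounds with every cycle, so the distortion of the original 1:1 ratio grows with the cycle number. This is consistent with, but not a model of, the observations cited above.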
Multiple competitive PCR and quantitative RT-PCR have been used to test the universal bacterial primers U968-GC and L1401 when used to amplify 16S rRNA from soil (Nübel et al., 1996; Felske et al., 1998b). It was found that 16S rDNA clones and bacterial 16S rRNA sequences from different phylogenetic groups were not preferentially amplified, although some target sequences have minor sequence differences at the annealing sites of the primers (Felske et al., 1998b). It has been shown that there are no differences in the TGGE patterns of DNA amplified for different numbers of cycles with the same primers (Zoetendal et al., 1998). However, it was observed that the primers which were used to amplify complete 16S rDNA preferentially amplified Prevotella-like sequences. This was specifically noted when amplified 16S rDNA was reamplified using the primer pair which amplified the V6 to V8 regions. Reamplification of amplicons using another primer pair is called nested PCR. Although primer pairs may show limited preferential amplification, this undesired bias can never be excluded. For example, if target DNA or rRNA from an unknown, uncultured microbe is not amplified during the first PCR cycles, it will remain undetected.

(ii) Quantitative (RT-)PCR

Several studies have described the quantification of microbial 16S rDNA or rRNA amplicons by PCR and RT-PCR, respectively. It should be remembered that quantification of amplicons only reflects the relative number of ribosomes or corresponding genes in a community and not the relative frequency of a species. The number of ribosomes per cell depends on the type and activity of a species. A positive correlation between the activity of a cell and the amount of rRNA has been described (Wagner, 1994). However, it has also been shown for two food-associated pathogens that this correlation was only found under extreme heat conditions (McKillip et al., 1998).
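The difference between the relative number of genes and the relative frequency of a species can be made concrete with a small calculation. The Python sketch below converts cell counts into the expected fraction of 16S rRNA gene copies contributed by each taxon. The taxon names, cell counts and copy numbers are hypothetical, and equal lysis and amplification efficiency are assumed.

```python
def gene_copy_fractions(cell_counts, copies_per_genome):
    """Fraction of all 16S rRNA gene copies contributed by each taxon.

    cell_counts and copies_per_genome are dicts keyed by taxon name.
    One genome per cell is assumed.
    """
    copies = {
        taxon: cell_counts[taxon] * copies_per_genome[taxon]
        for taxon in cell_counts
    }
    total = sum(copies.values())
    if total == 0:
        raise ValueError("no gene copies present")
    return {taxon: n / total for taxon, n in copies.items()}


if __name__ == "__main__":
    # Hypothetical community: equal cell numbers, unequal gene copy numbers.
    cells = {"taxon_x": 100, "taxon_y": 100}
    copies = {"taxon_x": 7, "taxon_y": 1}
    for taxon, fraction in gene_copy_fractions(cells, copies).items():
        print(f"{taxon}: {fraction:.3f} of 16S rRNA gene copies")
```

In this example both taxa make up half of the cells, yet the taxon with seven gene copies per genome accounts for seven eighths of the gene copies, which is why amplicon ratios cannot be read directly as cell ratios.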
Moreover, the number of 16S rRNA genes per genome varies between species. For example, seven 16S rRNA genes were found in E. coli (Bachmann, 1990), five to six genes in Streptococcus spp. (Bentley & Leigh, 1995), and four genes in S. pneumoniae (Bacot & Reeves, 1991). Besides differing copy numbers of 16S rRNA genes per genome, the genome sizes of bacteria are also different. It has been shown that differences in genome size and 16S rDNA copy number influence the ratio of amplicons when target DNA from E. coli, Pseudomonas aeruginosa, Bacillus subtilis and Thermus thermophilus is mixed in equal molarities (Farrelly et al., 1995). From this study it was concluded that the number of bacterial cells could not be calculated exactly when both parameters are unknown.

Despite the fact that relative cell numbers cannot be extrapolated from (RT-)PCR data, changes in the structure and activity of a microbial community can be analysed when 16S rRNA or rDNA amplicons are quantified. E. coli and P. aeruginosa 16S rRNA genes from mixtures of these bacteria with ratios of 1:100 could be quantified using the Perkin Elmer QPCR™ System 5000 (Blok et al., 1997). Biotinylated PCR products were captured on streptavidin-coated paramagnetic beads after different PCR cycles, and specific PCR products were quantified by measuring the electrochemiluminescent signals from the specific reporter probes directed against the different amplicons.

Another way of quantifying 16S rRNA genes is a so-called most probable number (MPN)-PCR (Sykes et al., 1992). In this method the target DNA for PCR is diluted to extinction, followed by analysis of PCR products by agarose gel electrophoresis. This approach was used to quantify the relative amounts of 16S rDNA derived from different groups of bacteria in faecal samples using different primer combinations (Wang et al., 1996a). Although this form of multiplex PCR has the potential to be useful, the data have to be analysed carefully, since the PCR conditions may not be quantitative as the primer pairs and product sizes are different.

Another approach involves the use of competitive PCR for the quantification of mRNA (Wang et al., 1989). In this method a specific standard of known concentration is added in different amounts to the target. The different sizes of the standard and target allow for differentiation and subsequent quantification following agarose or polyacrylamide gel electrophoresis. The combination of TGGE and competitive RT-PCR resulted in the development of a new quantification method called multiple competitive RT-PCR (Felske et al., 1998b). In this approach, changes in amplification conditions were minimised because the products were amplified with the same primer pairs and had the same size as the added standard. TGGE was used to separate and quantify the different products. It was found that the 20 most abundant sequences, which derived mainly from Gram-positive species of low G+C content, represented about 50% of the total microbial community in the Drentse A grassland soils.

C. Cloning, sequencing and phylogenetic analysis
To get an overview of the complexity of ecosystems, it is essential to classify the individuals from a population. Classification is used to clarify the relationships between different organisms. It has to be emphasised that there is no single unifying classification of organisms. This section will focus on classification based on phylogenetic relationships between organisms. Phylogeny is determined by scoring for the presence or absence of homologous morphological or physiological characteristics across operational taxonomic units (OTUs), which can be populations, species or strains. Both physiological and genetic characteristics can be used for phylogenetic analysis. The principle of phylogenetic analysis is the assumption that all life forms have evolved from a common origin. The common ancestor of two closely related organisms disappeared more recently than that of two more distantly related organisms. It is believed that evolution follows a pattern of successive branchings into populations in which further evolutionary changes subsequently proceed independently. Phylogeny involves the determination and analysis of these branching patterns. Cloning, sequencing and phylogenetic analysis of 16S rDNA sequences have become powerful tools in microbial ecology, particularly since it was discovered that the majority of microorganisms in environmental samples are unknown
(Amann et al., 1995). The highly conserved, but discriminative 16S rDNA molecule makes it possible to identify a species in an ecosystem without the use of unreliable culturing methods. Cloning and sequencing of 16S rDNA amplicons have become a standard procedure in molecular ecology and provide information about the genetic diversity and phylogenetic relationships between microorganisms in an ecosystem. Since up to 20,000 different ssu rRNA sequences are available in different databases, the comparison of new sequences is reliable. However, a clone library (a collection of clones from a DNA sample) has to be very large to give a reliable picture of the genetic diversity in complex ecosystems. This makes the approach expensive and time consuming. Therefore, a rapid approach for screening 16S rDNA clone libraries has recently been developed (Mau & Timmis, 1998). Habitat-based probes were designed using subtractive hybridisation. These habitat-based probes were used to screen a 16S rDNA library generated from the same habitat. It was shown that this screening method avoids the sequencing of many similar or identical clones of the dominant members in sediments.
(i) Cloning
To construct a clone library, a mixture of 16S rDNA amplicons is first generated by PCR using bacterial or universal primers to amplify 16S rDNA from an environmental sample. A cloning strategy is necessary to sequence individual amplicons derived from the DNA of a complex microbial community. There are different strategies to create a clone library. DNA fragments can be cloned into a plasmid vector or a bacteriophage. Detailed principles, possibilities and procedures for cloning DNA fragments have been described elsewhere (Maniatis et al., 1989; Old & Primrose, 1989). Amplicons are usually cloned into a sequencing vector, which is then transformed into an E. coli strain. Although a great variety of different cloning vectors are available, they all show some common characteristics. One common feature is that amplicons are inserted into a gene with many restriction sites (e.g., the polylinker in that part of the lacZ gene coding for the α-peptide). The amplicons and the vectors have to be restricted with the same restriction enzyme(s) for cloning. Some vectors contain a 3'T-overhang at the insertion sites, and these are particularly useful for amplicons produced by certain polymerases (e.g., Taq polymerase) which sometimes make 3'A-overhangs (Clark, 1988). Amplicons of 16S rDNA are cloned into the vector with a ligase. After ligation, the vectors containing a single 16S rDNA insert are transformed into a competent E. coli strain. The cells are grown on selective plates and single transformants are screened for the presence of vectors containing a 16S rDNA insert by means of PCR or colony hybridisation with 16S rDNA-specific probes. The vectors containing a 16S rDNA insert can be isolated after regrowing the positive transformants and can then be subjected to sequence analysis or fingerprinting.
Table 11.2. Overview of possibilities for calculating phylogenetic relationships and making a phylogenetic tree

Tree construction         Basis for calculation                                 Refs.
Distance matrix           Each nucleotide change equal                          Jukes & Cantor (1969)
                          Differing substitution rates                          Tajima & Nei (1984)
                          Differing substitution rates and                      Kimura (1980)
                          transition/transversion correction
Discrete character data   Maximum parsimony                                     Edwards & Cavilla-Sforza (1963)
                          Maximum likelihood                                    Felsenstein (1981)

(ii) Sequence analysis

Sequence analysis is used to provide information about the nucleotide sequence of a cloned amplicon. There are different methods to determine the sequence of a DNA fragment, but these will not be discussed here. Several programs to determine the closest relative of the DNA sequence are available on internet sites. Mostly, these programs use homology searches provided by BLAST (Altschul et al., 1990; http://www.ncbi.nlm.nih.gov/BLAST/) or FASTA (Pearson & Lipman, 1988; http://biogate.mlg.co.jp/tssfree/Fasta.html). The benefit of these programs is that the search for homology is fast and reliable, and several DNA databases can be used for comparison. When DNA sequences are compared, the alignment of sequences for highest homology is a crucial step. The alignment of sequences is performed by giving homologous or conserved parts the same nucleotide position. The variable regions in between are compared in such a way that the highest homology is found. Gaps in a nucleotide sequence are also included in the alignment, but the number of gaps should be minimised. Each position in a sequence can be one of the four nucleotide bases or a gap. This alignment of nucleotide sequences is necessary in order to construct phylogenetic trees and to develop oligonucleotide probes (Lane, 1991; Stahl & Amann, 1991; see below).
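Once two sequences have been aligned so that each position holds one of the four bases or a gap, a pairwise identity value can be computed position by position. The Python sketch below is a minimal illustration under stated assumptions: the gap symbol is "-", positions where both sequences carry a gap are ignored, and a gap opposite a base is counted as a mismatch. Other conventions exist, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical positions between two aligned sequences.

    Columns in which both sequences have a gap are skipped; a gap opposite
    a base counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments; "-" marks a gap introduced by alignment.
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))
```

Because gaps are penalised here, an alignment with many unnecessary gaps yields a lower identity value, which is one practical reason to minimise their number.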
(iii) Tree construction

It is difficult to visualise phylogenetic relationships between species from numerical values based on multiple pairwise comparisons, particularly when many different sequences are compared. An alternative way to visualise phylogenetic relationships is by the construction of a phylogenetic tree based on the identity values. The calculations for construction of phylogenetic trees can be handled in two ways: by distance matrix or from discrete character data (Table 11.2). In the first calculation, data based on evolutionary distances are set in a distance matrix. Most calculation methods do not weight each nucleotide mutation equally. The DNA structure plays an important role in the calculation procedures. It has been postulated that transversions are more easily recognised by the DNA repair system than are transitions because of the steric distortions of the DNA helix (Kimura, 1980). These changes are therefore considered to be less frequent and result in a lower substitution rate, which can be taken into account when calculating distance values. Another example of differences in substitution rates is postulated for protein-coding genes. Substitution rates of the third nucleotide position in a triplet coding
for an amino acid are usually higher than in the other two nucleotide positions (Shoemaker & Fitch, 1989). A more complex feature is the formation of gaps. The cost of introducing a gap in an alignment is generally higher than the introduction of a base substitution. Although the introduction of gaps is necessary to align sequences, it is often omitted from distance calculations because it is difficult to verify how the gap has originated. The most frequently used distance calculation models are those developed by Jukes & Cantor (1969), Kimura (1980), and Tajima & Nei (1984). The Jukes and Cantor model does not discriminate between different nucleotide substitutions, in contrast to that of Tajima and Nei which, however, does not correct for nucleotide transitions or transversions, as does the model of Kimura. The model of Jukes and Cantor has probably been applied most frequently in evolutionary studies because it performed well in most studies simulating the evolution of nucleic acid sequences (van de Peer & de Wachter, 1995).

Phylogenetic trees can be plotted from distance matrices. Commonly used models which calculate distance trees are the unweighted pair group method using arithmetic averages (UPGMA; Sokal & Michener, 1958) and the neighbour joining method (Saitou & Nei, 1987). UPGMA is a clustering method which pairs the least distant sequences into a node, and subsequently pairs two nodes into a new node. The neighbour joining method uses a simplified algorithm to calculate branch lengths and tree topologies.

Discrete character data calculations are not based on evolutionary distances, but consider each character state of the nucleotide position in the sequence separately. Trees can be constructed from each nucleotide position. The data can be handled in two ways. The first way is based on the maximum parsimony principle in which the true tree is the one which requires the fewest mutational changes to explain the differences observed between the gene sequences (Edwards & Cavilla-Sforza, 1963). Only so-called 'informative nucleotides' (a common nucleotide position in a set of sequence positions which favour only some of all possible trees) are used. In general, this means that a constant base in all sequences and a variable base which does not favour one tree over all others are not informative. The second way to handle discrete character data is called the maximum likelihood phylogeny. This calculation uses statistical models to calculate the probability that one sequence is converted into another sequence by mutation over time (Felsenstein, 1981). More detailed explanations and comparison of the methods have been described extensively elsewhere (Sneath, 1989; Priest & Austin, 1993; Wolters, 1998).

It has to be realised that evolutionary events cannot be checked for and that phylogenetic trees therefore only represent a systematic ordering of genes. Furthermore, calculations based on different DNA sequences or different genes may result in completely different trees. As a consequence, it is difficult to choose which tree-constructing approach is optimal. For 16S rDNA sequences from cultured and uncultured Frankia strains, it was found that trees constructed by methods based on discrete character data or distance matrices were roughly the same (Wolters, 1998). The choice of the program might therefore depend on the speed, ease and possibilities of the different programs and on the applications of the user. A comparison of different methods can be used to demonstrate the robustness of the phylogenetic tree generated.
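The difference between the distance models named above can be shown with a short calculation. The Python sketch below uses the standard closed-form corrections: the Jukes and Cantor distance d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites, and the Kimura two-parameter distance d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. Function names and example sequences are hypothetical, and positions with gaps or ambiguous bases are simply skipped, which is one possible convention among several.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (compared sites, transitions, transversions) for aligned DNA."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable positions")
    return sites, transitions, transversions


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance: every substitution is treated alike."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    p = (ts + tv) / sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for this correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance: transitions and transversions differ."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    p, q = ts / sites, tv / sites
    term_a, term_b = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if term_a <= 0 or term_b <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -0.5 * math.log(term_a) - 0.25 * math.log(term_b)


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "ACGTACGTAT"  # one C-to-T transition in ten sites
    print(jukes_cantor_distance(a, b), kimura_2p_distance(a, b))
```

A full distance matrix is obtained by applying one of these functions to every pair of aligned sequences; the matrix can then be passed to a clustering method such as UPGMA or neighbour joining.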
(iv) Phylogeny of 16S rRNA genes

Phylogeny based on 16S and 23S rRNA analysis has led to the construction of phylogenetic trees which illustrate the evolutionary relationship between different organisms. This has resulted in a division of all life into three main domains: Archaea, Bacteria and Eucarya (Woese, 1987; Woese et al., 1990). The increasing number of 16S rDNA sequences of bacterial isolates has allowed phylogenetic analysis of 16S rDNA to be applied to bacterial taxonomy. The threshold for species determination is set at 70% DNA-DNA hybridisation between the genomes of different strains (Wayne et al., 1987). Strains showing values above this threshold are considered to be the same species and this threshold has been translated into a 16S rRNA value (Stackebrandt & Goebel, 1994). It was estimated that strains with less than 97% 16S rDNA homology have less than 70% DNA-DNA hybridisation values. This threshold can be used to determine whether two strains do not belong to the same species, but cannot be used as the only characteristic for species determination. Indeed, some Bacillus spp. have less than 70% DNA-DNA hybridisation, but more than 99.5% 16S rRNA homology (Fox et al., 1992). Sometimes traditional taxonomic methods can be compared to the 16S rRNA phylogeny. Some genera in the GI tract that have been characterised physiologically (e.g., bifidobacteria) form a monophyletic group in the 16S rRNA tree. However, many bacterial genera in the GI tract do not form monophyletic clusters in the 16S rRNA phylogenetic tree. In particular, the genera Clostridium and Eubacterium are mixed and divide into different distinguishable clusters (Collins et al., 1994). Other genera like Ruminococcus and Butyrivibrio are mixed in these clusters, making identification quite difficult.
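The one-directional use of the 97% threshold described above can be written as a small decision rule. The Python sketch below is only an illustration of that logic; the function name and the wording of the returned messages are hypothetical, and the threshold is the value cited from Stackebrandt & Goebel (1994).

```python
def interpret_16s_identity(percent_identity, threshold=97.0):
    """Interpret a pairwise 16S rDNA identity value for two strains.

    Below the threshold the strains are taken to belong to different species.
    At or above it, no conclusion is drawn, because strains of different
    species can still share very similar 16S rRNA sequences.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if percent_identity < threshold:
        return "different species"
    return "inconclusive: further data (e.g. DNA-DNA hybridisation) needed"


if __name__ == "__main__":
    for value in (94.2, 99.6):
        print(value, "->", interpret_16s_identity(value))
```

The asymmetry is deliberate: a value of 99.6% returns "inconclusive" rather than "same species", in line with the Bacillus example cited above.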
Additionally, it was found that the 16S rRNA sequences of two strains, identified as Fusobacterium prausnitzii by physiological characteristics, were not phylogenetically related to other Fusobacterium strains, but grouped in one of the Clostridium clusters (Wang et al., 1996b). In such cases, physiological characteristics cannot be translated from phylogenetic characters when the species are not closely related. The use of 16S rRNA sequence analysis in taxonomy has also resulted in proposals for renaming bacterial species. Based on their 16S rRNA sequences and their physiological characteristics, it was proposed to redesignate Peptostreptococcus productus and Streptococcus hansenii as Ruminococcus species (Ezaki et al., 1994). In conclusion, it seems that phylogenetic and physiological analysis are both necessary for a reliable identification. Following the increase in the number of DNA sequences, databases such as EMBL and GenBank have become available on internet sites. These databases contain up to 20,000 ssu rRNA sequences and are suitable for comparing, downloading and the deposition of sequences. The most commonly used database for
rRNA sequence analysis is that of the Ribosomal Database Project (RDP; Maidak et al., 1997; http://www.cme.msu.edu/RDP/analysis.html). The RDP contains an aligned database of 16S rRNA sequences, which are present in a phylogenetic tree. Another software package containing a rRNA database is the ARB software package (Strunk & Ludwig, 1995). This program is comparable to the RDP, but the package needs a powerful computer for calculations. The secondary structure of the rRNA molecule is used for the similarity calculation and is visualised in the alignment program of the ARB package. This facilitates sequence analysis and is ideal when checking for sequencing errors. The phylogenetic trees in both programs are comparable.

16S rRNA databases are not only used for strain identification, but are also used to study the bacterial diversity in an ecosystem. The development of molecular methods in microbial ecology has resulted in an increasing number of sequences from cloned amplicons derived from different types of environments. Several papers on intestinal samples from different types of animals have shown that many of the cloned 16S rDNA sequences show identity below the 97% threshold to their closest culturable relative in the DNA databases (Ohkuma & Kudo, 1996; Wilson & Blitchington, 1996; Whitford et al., 1998; Zoetendal et al., 1998; Suau et al., 1999). This means that the species from which the sequences have been derived have not yet been cultured or, alternatively, are present in a culture collection but their 16S rDNA has not yet been sequenced. Fig. 11.3 shows a phylogenetic tree built from 16S rDNA sequences of bacterial clones from human faeces and their closest cultured relatives found in the 16S rDNA database of the ARB software package. These results reinforce the concept that most bacteria in existence have not yet been cultured (Amann et al., 1995).
For the GI tract, this means that our knowledge about the role of the microbial community in the intestine is limited. Therefore, cloning and sequencing of faecal or intestinal clones is needed to determine the microbial diversity and to study the structure of the community in the GI tract.

D. Fingerprinting
Sequence analysis of 16S rDNA/rRNA clone libraries gives reliable information about the genetic diversity of an ecosystem. However, this approach is expensive, time-consuming and not suitable for monitoring complex ecosystems. It will be necessary to study complex ecosystems using alternative methods that are better suited for studying the composition and temporal variation in ecosystems, probably based on sequence differences of the nucleic acids. Fingerprinting techniques are suitable to describe the genetic diversity at different levels of a microbial community. There are many types of fingerprinting techniques which can be useful at the community, species and even strain level. The next section describes SSCP, DGGE and TGGE fingerprinting techniques which are commonly used to study the microbial diversity of different ecosystems. These techniques are based on differences in the nucleotide sequence of amplicons of similar size and are suitable
for describing ecosystems at the species level. The sequence-specific separation of PCR amplicons is an essential element, but differs between the techniques. SSCP relies on the secondary structures of the single strands, while the other techniques rely on the melting behaviour of the double-stranded amplicon.

[Fig. 11.3 appears here in the original: a phylogenetic tree of faecal 16S rDNA clones (Fecal clone A03-A71, Adhufec 8-420) and their closest cultured relatives, including species of Prevotella, Bacteroides, Erysipelothrix, Butyrivibrio, Eubacterium, Roseburia, Clostridium, Ruminococcus, Coprococcus, Fusobacterium and Phascolarctobacterium.]

(i) SSCP
SSCP is an electrophoretic technique which has been developed for the detection of mutations in genes and has been used widely in the field of human genetics (Orita et al., 1989; Hayashi, 1991). The principle of SSCP is that the mobility of a single-stranded DNA fragment is dependent on the secondary structure of the fragment. The secondary structure is determined by the nucleotide sequence and the physiological environment (e.g., temperature, pH and ionic strength). SSCP has been shown to be able to detect single base differences in 99% of amplicons which are up to 300 bases in size. The detection rate drops with longer fragments (Hayashi, 1991; Hayashi & Yandell, 1993). A typical SSCP profile consists of two single-stranded DNA fragments and one double-stranded fragment, although different conformations from one strand are also possible. This technique has only been used occasionally in ecological studies. SSCP of the 16S-23S rRNA spacer has been used to analyse mixtures of bacteria (Scheinert et al., 1996), and SSCP of different regions in the rRNA operon has been used to differentiate between root-associated fungi (Clapp, 1999). SSCP has been used to analyse microbial communities in a few studies. For example, the V3 region of the 16S rRNA gene was used for SSCP fingerprinting of bacterial strains and environmental samples (Lee et al., 1996). The problem of bands caused by heteroduplex formation in mixed DNA samples could be solved by removing glycerol from the gel, but this removal resulted in a lower separation efficiency of the single strands. It has been reported that the bands in the profiles of the environmental samples did not correspond to bands in the profiles of those bacteria that could be cultivated. The 16S rDNA sequence of a bacterium making up about 1.5% of a community could be visualised with this technique. Recently, a new approach for SSCP analysis was reported (Schwieger & Tebbe, 1998).
In this study, amplicons containing the V4 to V5 region were used. One of the primers was phosphorylated at the 5' end. After amplification, the phosphorylated strand could be digested selectively with lambda exonuclease. Using this technique, the number of bands per species could be minimised and heteroduplex formation in mixed DNA samples could be prevented. Clear banding patterns could be obtained from environmental samples.
Fig. 11.3. Phylogenetic tree showing the phylogenetic relationships between cloned 16S rDNA from faeces and the closest culturable relatives found in 16S rDNA databases. Complete and partial 16S rDNA sequences from faecal samples were added to the phylogenetic tree of the ARB software. Sequences called faecal clone A03-A71 and adhufec 8-420 were retrieved from Zoetendal et al. (1998) and Suau et al. (1999), respectively. These sequences and the sequences from the closest cultivable relatives were marked, and the remaining sequences were removed from the tree. Bold-marked sequences represent the closest cultivable relative found in the database; the bar indicates the calculated genetic distance between the sequences.
(ii) DGGE and TGGE
The separation of 16S rDNA amplicons is based either on a linear gradient of denaturants at constant temperature in the case of DGGE (Fischer & Lerman, 1979), on a linear temperature gradient parallel to the running direction in the case of TGGE (Rosenbaum & Riesner, 1987), or on increasing temperature with time in the case of TTGE (also called TSGE; Yoshino et al., 1991). These fingerprinting techniques are frequently used to study the microbial diversity of different ecosystems. At the time of writing, TTGE has been used occasionally in ecological studies, but there are no published articles describing its use. In DGGE, TGGE and TTGE, amplicons of the same length with different nucleotide sequences can be separated on polyacrylamide gels containing denaturants (urea and formamide). During electrophoresis the amplicons start to melt in so-called melting domains, i.e., stretches with identical melting behaviour. The size of these domains may vary between 50 and 300 bp (Myers et al., 1987). As a result, the electrophoretic mobility of the partially melted amplicons, which contain both double-helical and disordered single-stranded regions, drops. Sequence variations within such domains cause the amplicons to melt differently. Attachment of a GC-clamp makes it possible to separate sequence variations in the most stable domains as well (Myers et al., 1985; Sheffield et al., 1985). This GC-clamp is a G + C rich domain which is attached to the amplicons by adding it to the 5' end of one of the primers, and which prevents complete melting of the amplicons. In principle, single base differences at any position can be separated for amplicons of up to 500 bp (Sheffield et al., 1985). The final position of the amplicons in the gel depends on their melting behaviour (and, therefore, their nucleotide sequence) and the running time. A simplified representation of DGGE and TGGE analysis of amplicons is shown in Fig. 11.4.
DGGE was originally introduced into ecological studies to separate amplified V3 regions of 16S rDNA from marine ecosystems. Amplicons derived from sulphate-reducing bacteria could be detected after blotting the DGGE profiles and hybridising them with a specific probe (Muyzer et al., 1993). Following this study, the application of these techniques to ecological studies increased enormously. Different ecosystems have been analysed by separation of different amplified regions from 16S rDNA and 16S rRNA using these fingerprinting techniques. These techniques have been used not only for analysing the composition and stability of different ecosystems, but also for comparing DNA extraction protocols, screening clone libraries, determining 16S rRNA sequence heterogeneities, monitoring enrichment and isolation procedures, and determining biases introduced by PCR and cloning. Recent overviews of the use of these and other methods for studying different ecosystems are available (Muyzer & Smalla, 1998; Muyzer, 1999). To increase the separation efficiency, an optimal gradient has to be chosen.
This can be done by applying the gradient perpendicular to the running direction (Muyzer et al., 1993). For TTGE, the optimal temperature gradient has to be calculated from known sequences. Amplicons with only one nucleotide difference can be separated when an optimal gradient is applied (Myers et al., 1985; Nübel et al., 1996). Additionally, it has been shown that a wobble base (either C or T) in the reverse primer may result in two distinct bands (Kowalchuk et al., 1997). The opposite also occurs: sometimes 16S rDNA amplicons cannot be separated although they differ in nucleotide sequence (Buchholz-Cleven et al., 1997; Vallaeys et al., 1997).
Fig. 11.4. Schematic representation of a polyacrylamide gel which explains the principles of DGGE and TGGE.
TGGE and DGGE of 16S rDNA and rRNA amplicons have been used to describe the microbial composition of several ecosystems. In these studies, different universal primer pairs have been used to describe dominant populations. An MPN (RT-)PCR can be used to check whether the dominant community is visualised on TGGE (Fig. 11.5). In general, 16S rDNA, but also 16S rRNA, is used as a target for analysing microbial diversity (Fig. 11.5). Profiles derived from 16S rRNA represent the relative number of different ribosomes in an environmental sample, which reflects the active fraction of a community. Comparing rRNA- and rDNA-derived amplicons may give information about the activity of a certain group in the microbial community (Felske et al., 1996; Teske et al., 1996; Zoetendal et al., 1998), but it has to be realised that the number of ribosomes and rRNA genes may differ per species. Several studies have shown that sequences derived from a bacterium which makes up about 1% of a microbial community can still be visualised using TGGE and DGGE (Muyzer et al., 1993; Murray et al., 1996; Zoetendal et al., 1998), which is similar to the sensitivity of SSCP analysis.
Fig. 11.5. TGGE profiles of amplified V6 to V8 regions of 16S rRNA from a faecal sample using the MPN RT-PCR approach. 10^-x represents the dilution of template RNA used for amplification.
Instead of using universal primers, group- or species-specific primers can be used to focus on particular groups. The genetic diversity of uncultured ammonia-oxidising bacteria (Kowalchuk et al., 1997) and cyanobacteria (Nübel et al., 1997) has been studied using specific 16S rDNA primers for both groups. DGGE and TGGE have also been used to describe the expression of functional genes such as the [NiFe] hydrogenases from Desulfovibrio populations (Wawer et al., 1997). The combination of D/TGGE analysis of 16S rDNA and a functional gene may be used to study relationships between the structures and functions of ecosystems.
TGGE and DGGE analysis of amplicons is semi-quantitative, i.e., an intense band represents a more abundant amplicon than a weak band in the same profile. When an appropriate standard template of known concentration is added to the nucleic acids extracted from a microbial community, different genes or ribosome fractions can be quantified. Bands for which the intensity is identical to the intensity of the standard can be quantified. This approach is called multiple competitive (RT-)PCR (Felske et al., 1998b). Different ribosome fractions from Drentse A soils could be quantified this way. Equal amplification of different ribosomes is necessary for quantification. In the case of the Drentse A soils, the primers used did not preferentially amplify specific cloned 16S rDNA amplicons or ribosomes from strains of different phylogenetic clusters, although the primers did not match 100% to the target of any of the strains and clones tested. However, preferential amplification cannot be ruled out completely, because species can always be missed during amplification and will therefore not be analysed.
TGGE and DGGE analysis of amplicons is a quick and reliable method for studying the dynamics of ecosystems, but the identification of single bands in the profiles is very time-consuming. Identification can be done by cutting out the bands in a profile, followed by reamplification and sequence analysis. This approach has been applied successfully to ethidium bromide-stained gels (Muyzer & de Waal, 1994) and silver-stained gels (Ramírez-Saad, 1999; F. Schut, personal communication), but its disadvantage is that a maximum of 500 bp can be used for sequence analysis. Identification can also be done by screening a clone library for dominant band positions, followed by sequence analysis. In this way, complete sequences can be retrieved, thereby making the phylogenetic analysis more reliable. This approach was introduced by Felske et al. (1997; 1998a).
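The quantification principle of multiple competitive (RT-)PCR described above (a band is quantified when its intensity equals that of the added standard) can be illustrated with a small sketch. It is not the procedure of Felske et al. (1998b). It assumes a dilution series of the standard, assumes that band intensities have already been measured by densitometry, and interpolates linearly on a log-log scale to find the point where target and standard bands are equally intense. The function name and all numbers are hypothetical.

```python
from math import log10

def equivalence_amount(standard_amounts, target_intensities, standard_intensities):
    """Estimate the standard amount at which a target band and the
    standard band are equally intense (assumed equivalence point).

    Each position in the three sequences describes one reaction of a
    dilution series: the amount of standard added, and the measured
    intensities of the target band and of the standard band.
    """
    points = []
    for amount, target, standard in zip(
        standard_amounts, target_intensities, standard_intensities
    ):
        if amount <= 0 or target <= 0 or standard <= 0:
            raise ValueError("amounts and intensities must be positive")
        # x: log amount of standard; y: log intensity ratio (0 means equal bands)
        points.append((log10(amount), log10(target / standard)))
    points.sort()

    # Find two neighbouring dilutions between which the log ratio crosses zero.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 * y1 <= 0:
            if y0 == y1:  # both ratios are exactly zero
                return 10 ** x0
            x = x0 - y0 * (x1 - x0) / (y1 - y0)  # linear interpolation
            return 10 ** x
    raise ValueError("no equivalence point within the dilution series")

# Hypothetical dilution series (arbitrary units, illustrative values only).
estimate = equivalence_amount(
    standard_amounts=[1.0, 10.0, 100.0],
    target_intensities=[80.0, 50.0, 20.0],
    standard_intensities=[20.0, 50.0, 80.0],
)
```

As the text notes, such an estimate is only meaningful if target and standard amplify with equal efficiency.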
The identity of the bands can be checked by sequencing more clones with identical mobility or by blotting the profiles and using specific probes (Muyzer et al., 1993; Felske et al., 1997; Kowalchuk et al., 1997).
Despite the increasing number of applications in molecular ecology, only a few studies have been performed with GI tract ecosystems. DGGE analysis of the V3 regions of 16S rDNA was used to study the role of uncultured bacteria in pre-term infants with and without necrotising enterocolitis (NEC) (Millar et al., 1996). It was found that uncultured bacteria were not more frequent in faecal samples from children with NEC than in faecal samples from children without NEC. TGGE based on the V6 to V8 regions of amplified 16S rDNA and 16S rRNA has been used to study the bacterial composition of different faecal samples (Zoetendal et al., 1998). This study showed that each adult has a unique faecal microbiota, which is relatively stable over time. Only a few amplicons were shared by all faecal samples. It was found that the faecal community in one person remained stable for almost 2 years (Fig. 11.6). A band corresponding to a cloned Fusobacterium prausnitzii-like ribotype increased slightly over this period. Furthermore, it was found that most dominant amplicons in an individual's profile derived from species that have not been cultured. Recently, the microbial community in the porcine GI tract has been studied using DGGE analysis of the V3 region of 16S rDNA. It was observed that unique bands were found in the fingerprints of faecal samples from pigs differing in age, and that the profiles were most similar within a single GI tract compartment and between adjacent ones (Simpson et al., 1999).
Fig. 11.6. DGGE profiles of the V6 to V8 regions of 16S rDNA from different faecal samples of one individual taken over a period of 23 months. A band originating from a Fusobacterium prausnitzii-like ribotype which increased over time is indicated.
(iii) Quantitative fingerprint analysis
The use of TGGE and DGGE to study complex ecosystems can be enhanced by quantifying profile similarities. Computer analysis of scanned fingerprints can be used to calculate similarity indices between fingerprints. These indices can be used to determine the stability of microbial communities or to monitor the effect of certain conditions which may change the composition of a microbial community. The calculation of similarity indices of DGGE profiles has been used to monitor the spatial and seasonal diversity of Antarctic picoplankton assemblages (Murray et al., 1998). A similar approach was used for samples from the porcine GI tract (Simpson et al., 1999). The highest similarity indices were found within a single compartment and between adjacent compartments, indicating that the microbial communities were quite similar in these compartments.
Fig. 11.7. DGGE profiles (A) of the V6-V8 regions of 16S rDNA, and (B) a similarity matrix expressed in Pearson correlations (x 100) of the DGGE profiles from faecal samples of one individual taken before (0), during (A), and after (2 and 4 months) treatment with doxycycline for 1 week. Lane M contains a marker consisting of cloned V6-V8 amplicons.
(B)        0       A       2       4
    0  100.0
    A   50.4   100.0
    2   33.5    42.7   100.0
    4   62.1    37.5    73.3   100.0
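The similarity indices discussed here, such as the Pearson correlations (x 100) of Fig. 11.7, can be computed directly from densitometric curves. The following is a minimal sketch, not the algorithm of any particular gel-analysis package. It assumes that every lane has been scanned into an intensity trace sampled at the same gel positions, and the lane data shown are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long traces."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("traces must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a flat trace has no defined correlation")
    return sxy / sqrt(sxx * syy)

def similarity_matrix(curves):
    """Pairwise Pearson correlations (x 100) between densitometric curves."""
    return [[100.0 * pearson(a, b) for b in curves] for a in curves]

# Hypothetical intensity traces for three lanes (illustrative values only).
lanes = [
    [0, 5, 1, 0, 8, 2, 0, 3],
    [0, 5, 1, 0, 8, 2, 0, 3],  # identical to the first lane
    [4, 0, 0, 7, 0, 0, 6, 0],  # bands at different positions
]
sim = similarity_matrix(lanes)
```

Because the correlation is calculated over the whole trace, it reflects both band positions and relative band intensities. Lanes therefore need to be aligned (for example with a marker lane) before profiles from different gels are compared.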
Fig. 11.7 illustrates how densitometric curves of DGGE profiles can be used with the Molecular Analyst software (BioRad) to quantify the effect of treatment for 1 week with the antibiotic doxycycline on the dominant microbial community in faeces. A relatively stable microbial community was recovered 2 months after the treatment. The matrix illustrates these changes in a quantitative way and shows the high similarity between the faecal samples taken after the antibiotic treatment, thereby indicating the recovery of a stable microbial community. However, the community structures before and after treatment were not identical. These examples illustrate that quantitative DGGE analysis is a reliable method to monitor changes in microbial communities and should be preferred over subjective comparison by eye.
Another method for quantifying DGGE profiles has recently been published (Nübel et al., 1999). Shannon-Weaver indices (the most common diversity indices) and richness estimates derived from DGGE profiles, together with two other cultivation-independent methods, were used to quantify the microbial diversity and richness in different hypersaline microbial mats. A similar approach was used to study the effect of chlorobenzoates on the microbial community in soil (Ramírez-Saad, 1999). It was clearly demonstrated that the genetic diversity in the contaminated soils decreased significantly. In the case of the GI tract, the influence of the host, food and antibiotics on the bacterial composition can be quantified by the methods described above. This fingerprinting analysis, in combination with multiple competitive PCR, quantitative profile comparison, or the calculation of diversity indices, can be used to determine and quantify changes in microbial composition caused by exposure to antibiotics in intestinal samples.
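The diversity and richness measures mentioned above can be sketched as follows. The Shannon-Weaver index is H' = -sum(p_i ln p_i). This example assumes that p_i is taken as the relative intensity of band i in a profile and that richness is simply the number of detectable bands. Both are simplifying assumptions for illustration rather than the exact procedure of the cited studies, and the band intensities are invented.

```python
from math import log

def shannon_index(band_intensities):
    """Shannon-Weaver index H' = -sum(p_i * ln(p_i)) of one profile,
    with p_i taken as the relative intensity of band i (an assumption)."""
    positive = [v for v in band_intensities if v > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("profile contains no detectable bands")
    return -sum((v / total) * log(v / total) for v in positive)

def richness(band_intensities):
    """Number of detectable bands in the profile."""
    return sum(1 for v in band_intensities if v > 0)

# Hypothetical band intensities for one lane (illustrative values only).
profile = [120.0, 60.0, 60.0, 0.0]
h = shannon_index(profile)  # equals 1.5 * ln(2) for these values
s = richness(profile)       # three detectable bands
```

A profile with a single band gives H' = 0, and for a fixed number of bands H' is highest when all bands are equally intense. A drop in H' can therefore indicate lost bands, a shift towards a few dominant bands, or both.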
11.3 OLIGONUCLEOTIDE CHIP TECHNOLOGY
One of the new approaches that can be used to analyse environmental samples is the application of oligonucleotide microchips or microarrays (reviewed by Lemieux et al., 1998; Schena et al., 1998). Microchips consist of oligonucleotides which are immobilised in a polyacrylamide gel matrix bound to a glass slide. Labelled target DNA or RNA can be added to the microchip, and the subsequent hybridisation signal can be detected and quantified using a computer-regulated camera connected to a microscope. Microarrays consist of numerous cloned or amplified DNA fragments rather than synthesised oligonucleotide probes, but the principle of microarrays is the same as that of microchips. Oligonucleotide microchips and microarrays have already been used several times for nucleotide sequencing. This sequencing by hybridisation (SBH) has been proposed as an alternative technique for genome sequencing (Drmanac et al., 1993; Broude et al., 1994; Yershov et al., 1996). Microarrays have also been used to study the expression of certain genes, and also nucleotide variability between genes (Schena et al., 1995; 1996; Chee et al., 1996). Recently, microarrays have been used to identify unique DNA regions present in a pathogenic strain of Pseudomonas aeruginosa which appeared to be missing in a control strain (Bangera et al., 1999). The use of microchips could eventually be relatively cheap, because the concentration of probes per chip is very low, the information per analysis is high, and the chips can be used 20 to 30 times (Guschin et al., 1997). The use of microarrays could be even cheaper, because cloned amplicons are produced more economically than oligonucleotide probes.
The application of these microchips or microarrays to answer microbial ecological questions looks promising. Environmental samples can be screened on microchips containing hundreds or thousands of 16S rRNA-targeted probes, or on microarrays containing many cloned 16S rDNA fragments. However, one of the difficulties in using these approaches is the optimisation of the hybridisation conditions for the different immobilised DNA fragments (Kelly et al., 1999). Recently, it has been reported that nitrifying bacteria and Bacillus spp. could be detected and identified at the rRNA level using oligonucleotide microchips (Guschin et al., 1997; Kelly et al., 1999). In the near future it might be possible to monitor expression of ribosomal and functional genes of an ecosystem with a single microchip or microarray.
11.4 CONCLUSIONS AND PERSPECTIVES
The application of TGGE, DGGE, and SSCP to studies in microbial ecology is growing and the future perspectives are promising. The combination of (RT-)PCR, cloning and fingerprinting of environmental samples may give an accurate description of a community. Despite some pitfalls concerned with biases in nucleic acid extraction and amplification methods, these techniques have been shown to be useful for describing the microbial composition of different ecosystems. The power of these techniques is their reliability, speed and ease of use.
TGGE, DGGE and SSCP are ideal for studying the temporal and spatial variation in microbial communities, both qualitatively and quantitatively. The only time-consuming aspect is the identification of specific amplicons in the profiles. The approach of combining 16S rDNA profiles and profiles of functional genes may enable the structure to be related to the function of an ecosystem. Another variant of this approach is the use of oligonucleotide microchips. Extensive data can be obtained from a single analysis and may be quantitative, although the different Tm values of the probes might be a problem in quantification.
In studying the role of the microbial community in the intestine of man and other animals, the approaches described in this chapter should be applied instead of unreliable plate counting analysis to describe the microbial composition. The fingerprinting approach has already demonstrated that the dominant microbial community in adults is quite stable with time and differs for each individual. This approach can also be used to monitor the fate of certain bacteria, such as probiotic strains in the intestine (de Vos et al., 1999). The impact of these strains on the microbial community in the GI tract can be analysed by quantifying similarities between the profiles, or by calculating microbial diversity and richness indices from profiles. Changes in band positions can be identified using cloning and sequencing analysis. The introduction of internal standards to the profiles may help the changes to be monitored quantitatively. These approaches will definitely help in gaining an understanding of some aspects of the microbial community in the intestine.
It is evident that PCR-based fingerprinting techniques are useful in answering ecological questions. Although these techniques are still in development, their application has already been shown to be a powerful tool for determining the structure of microbial communities in different environments and for monitoring changes in microbial communities without unreliable cultivation procedures.
ACKNOWLEDGEMENTS
Part of this work was supported by The Wageningen Centre for Food Sciences, the EU FAIR CT-97-3035, the PROBDEMO-Project Fair CT96-1028, and the VTT Biotechnology and Food Research.
REFERENCES
Akkermans, A.D.L., van Elsas, J.D. & de Bruijn, F.J., eds. (1999). Molecular microbial ecology manual. Kluwer, Dordrecht.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990). Basic local alignment search tool. Journal of Molecular Biology 215, 403-410.
Amann, R.I., Ludwig, W. & Schleifer, K-H. (1995). Phylogenetic identification and in situ detection of individual cells without cultivation. Microbiological Reviews 59, 143-169.
Apajalahti, J.H., Särkilahti, L.K., Mäki, B.R.E., Heikkinen, J.P., Nurminen, P.H. & Holben, W.E. (1998). Effective recovery of bacterial DNA and percent-guanine-plus-cytosine based analysis of community structure in the gastrointestinal tract of broiler chickens. Applied and Environmental Microbiology 64, 4084-4088.
Bachmann, B.J. (1990). Linkage map of Escherichia coli K-12, edition 8. Microbiological Reviews 54, 130-197.
Bacot, C.M. & Reeves, R.H. (1991). Novel tRNA gene organization in the 16S-23S intergenic spacer of the Streptococcus pneumoniae rRNA gene cluster. Journal of Bacteriology 173, 4234-4236.
Bangera, M.G., Norris, A., Lorry, S. & Olsen, M. (1999). Pathogenicity determinants in Pseudomonas aeruginosa associated with cystic fibrosis. In Abstracts of the 99th general meeting of the American Society for Microbiology, p. 32. American Society for Microbiology, Washington DC.
Bassam, B.J., Caetano-Anollés, G. & Gresshoff, P.M. (1991). Fast and rapid silver staining of DNA in polyacrylamide gels. Analytical Biochemistry 196, 80-83.
Baskaran, N., Kandpal, R.P., Bhargava, A.K., Glynn, M.W., Bale, A. & Weissman, S.M. (1996). Uniform amplification of a mixture of deoxyribonucleic acids with varying GC content. Genome Research 6, 663-668.
Bentley, R.W. & Leigh, J.A. (1995). Determination of 16S ribosomal RNA gene copy number in Streptococcus uberis, S. agalactiae, S. dysgalactiae and S. parauberis. FEMS Immunology and Medical Microbiology 12, 1-8.
Blok, H.J., Gohlke, A.M. & Akkermans, A.D.L. (1997). Quantitative analysis of 16S rDNA using competitive PCR and the QPCR™ system 5000. BioTechniques 22, 700-704.
Broude, N.E., Sano, T., Smith, C.L. & Cantor, C.R. (1994). Enhanced sequencing by hybridization. Proceedings of the National Academy of Sciences of the United States of America 91, 3072-3076.
Buchholz-Cleven, B.E.E., Rattunde, B. & Straub, K.L. (1997). Screening for genetic diversity of isolates of anaerobic Fe(II)-oxidizing bacteria using DGGE and whole cell hybridization. Systematic and Applied Microbiology 20, 301-309.
Carrino, J.J. & Lee, H.H. (1995). Nucleic acid amplification methods. Journal of Microbiological Methods 23, 3-20.
Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X.C., Stern, D., Winkler, J., Lockhart, D.J., Morris, M.S. & Fodor, S.P.A. (1996). Accessing genetic information with high-density DNA arrays. Science 274, 610-614.
Clapp, J.P. (1999). The identification of root-associated fungi by polymerase chain reaction-single strand conformational polymorphism (PCR-SSCP). In Molecular microbial ecology manual, Akkermans, A.D.L., van Elsas, J.D. & de Bruijn, F.J., eds, 3.4.7, pp. 1-18. Kluwer, Dordrecht.
Clark, J.M. (1988). Novel non-templated nucleotide addition reactions catalyzed by procaryotic and eucaryotic DNA polymerases. Nucleic Acids Research 16, 9677-9686.
Collins, M.D., Willems, A., Cordoba, J.J., Fernandez-Garayzabal, J., Garcia, P., Cai, J., Hippe, H. & Farrow, J.A.E. (1994). The phylogeny of the genus Clostridium: proposal of five new genera and eleven new species combinations. International Journal of Systematic Bacteriology 44, 812-826.
Coppella, S.J., Acheson, C.M. & Dhurjati, P. (1987a). Measurement of copy number using HPLC. Biotechnology and Bioengineering 29, 646-647.
Coppella, S.J., Acheson, C.M. & Dhurjati, P. (1987b). Isolation of high-molecular weight nucleic acids for copy number analysis using high-performance liquid chromatography. Journal of Chromatography 402, 189-199.
de Vos, W.M., Zoetendal, E.G., Poelwijk, E., Heilig, H. & Akkermans, A.D.L. (1999). Molecular tools for analyzing the functionality of probiotic properties of microorganisms. In Proceedings of the 25th international dairy congress, pp. 323-328. Danish National Committee of the International Dairy Federation, Aarhus.
Dell'Anno, A., Fabiano, M., Duineveld, G.C.A., Kok, A. & Danovaro, R. (1998). Nucleic acid (DNA, RNA) quantification and RNA/DNA ratio determination in marine sediments: comparison of spectrophotometric, fluorometric, and high-performance liquid chromatography methods and estimation of detrital DNA. Applied and Environmental Microbiology 64, 3238-3245.
Doré, J., Sghir, A., Hennequart-Gramet, G., Corthier, G. & Pochart, P. (1998). Design and evaluation of a 16S rRNA-targeted oligonucleotide probe for specific detection and quantitation of human faecal Bacteroides populations. Systematic and Applied Microbiology 21, 65-71.
Drmanac, R., Drmanac, S., Strezosca, Z., Paunesku, T., Labat, I., Zeremski, M., Snoddy, J., Funkhouser, W.K., Koop, B., Hood, L. & Crkvanjakov, R. (1993). DNA sequence determination by hybridization: a strategy for efficient large-scale sequencing. Science 260, 1649-1652.
Edwards, A.W.F. & Cavalli-Sforza, L.L. (1963). The reconstruction of evolution. Heredity 18, 553.
Ezaki, T., Li, N., Hashimoto, Y., Miura, H. & Yamamoto, H. (1994). 16S ribosomal DNA sequences of anaerobic cocci and proposal of Ruminococcus hansenii comb. nov. and Ruminococcus productus comb. nov. International Journal of Systematic Bacteriology 44, 130-136.
Farrelly, V., Rainey, F.A. & Stackebrandt, E. (1995). Effect of genome sizes and rrn gene copy number on PCR amplification of 16S rRNA genes from a mixture of bacterial species. Applied and Environmental Microbiology 61, 2798-2801.
Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17, 368-376.
Felske, A., Engelen, B., Nübel, U. & Backhaus, H. (1996). Direct ribosome isolation from soil to extract bacterial rRNA for community analysis. Applied and Environmental Microbiology 62, 4162-4167.
Felske, A., Rheims, H., Wolterink, A., Stackebrandt, E. & Akkermans, A.D.L. (1997). Ribosome analysis reveals prominent activity of an uncultured member of the class Actinobacteria in grassland soils. Microbiology 143, 2983-2989.
Felske, A., Wolterink, A., van Lis, R. & Akkermans, A.D.L. (1998a). Phylogeny of the main bacterial 16S rRNA sequences in Drentse A Grassland soils. Applied and Environmental Microbiology 64, 871-879.
Felske, A., Akkermans, A.D.L. & de Vos, W.M. (1998b). Quantification of 16S rRNAs in complex bacterial communities by multiple competitive reverse transcription-PCR in temperature gradient gel electrophoresis. Applied and Environmental Microbiology 64, 4581-4587.
Felske, A., Backhaus, H. & Akkermans, A.D.L. (1998c). Direct ribosome isolation from soil. In Molecular microbial ecology manual, Akkermans, A.D.L., van Elsas, J.D. & de Bruijn, F.J., eds, 1.2.4, pp. 1-10. Kluwer, Dordrecht.
Finegold, S.M., Sutter, V.L. & Mathisen, G.E. (1983). Normal indigenous intestinal flora. In Human intestinal microflora in health and disease, Hentges, D.J., ed., pp. 3-31. Academic Press, New York.
Fischer, S.G. & Lerman, L.S. (1979). Length-independent separation of DNA restriction fragments in two-dimensional gel electrophoresis. Cell 16, 191-200.
Fox, G.E., Wisotzkey, J.D. & Jurtshuk, P. (1992). How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. International Journal of Systematic Bacteriology 42, 166-170.
Franks, A.H., Harmsen, H.J.M., Raangs, G.C., Jansen, G.J., Schut, F. & Welling, G.J. (1998). Variations of bacterial populations in human feces measured by fluorescent in situ hybridization with group-specific 16S rRNA-targeted oligonucleotide probes. Applied and Environmental Microbiology 64, 3336-3345.
Guschin, D.Y., Mobarry, B.K., Proudnikov, D., Stahl, D.A., Rittmann, B.E. & Mirzabekov, A.D. (1997). Oligonucleotide microchips as genosensors for determinative and environmental studies in microbiology. Applied and Environmental Microbiology 63, 2397-2402.
Harmsen, H.J.M., Stams, A.J.M., Akkermans, A.D.L. & de Vos, W.M. (1995). Phylogenetic analysis of two syntrophic propionate-oxidizing bacteria in enrichment cultures. Systematic and Applied Microbiology 18, 67-73.
Hayashi, K. (1991). PCR-SSCP: a simple and sensitive method for detection of mutations in the genomic DNA. PCR Methods and Applications 1, 34-38.
Hayashi, K. & Yandell, D.W. (1993). How sensitive is PCR-SSCP? Human Mutation 2, 338-346.
Head, I.M., Saunders, J.R. & Pickup, R.W. (1998). Microbial evolution, diversity, and ecology: a decade of ribosomal RNA analysis of uncultivated microorganisms. Microbial Ecology 35, 1-21.
Holdeman, L.V., Good, I.J. & Moore, W.E.C. (1976). Human fecal flora: Variation in bacterial composition within individuals and a possible effect of emotional stress. Applied and Environmental Microbiology 31, 359-375.
Johnson, J.L. (1991). Isolation and purification of nucleic acids. In Nucleic acid techniques in bacterial systematics, Stackebrandt, E. & Goodfellow, M., eds, pp. 1-19. Wiley, Chichester.
Jukes, T.H. & Cantor, C.R. (1969). Evolution of protein molecules. In Mammalian protein metabolism, Munro, H.N., ed., pp. 21-132. Academic Press, New York.
Kelly, J.J., Sappelsa, L., Bavykin, S.G., Mirzabekov, A.D. & Stahl, D.A. (1999). Optimization of DNA microarrays for the rapid characterization of microbial community population structure. In Abstracts of the 99th general meeting of the American Society for Microbiology, pp. 470-471. American Society for Microbiology, Washington DC.
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitution through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16, 111-120.
Kopczynski, E.D., Bateson, M.M. & Ward, D.M. (1994). Recognition of chimeric small-subunit ribosomal DNAs composed of genes from uncultivated micro-organisms. Applied and Environmental Microbiology 60, 746-748.
Kowalchuk, G.A., Stephen, J.R., de Boer, W., Prosser, J.I., Embley, T.M. & Woldendorp, J.W. (1997). Analysis of ammonia-oxidizing bacteria of the β subdivision of the class Proteobacteria in coastal sand dunes by denaturing gel electrophoresis and sequencing of PCR-amplified 16S ribosomal DNA fragments. Applied and Environmental Microbiology 63, 1489-1497.
Kreader, C.A. (1996). Relief of amplification inhibition in PCR with bovine serum albumin or T4 gene 32 protein. Applied and Environmental Microbiology 62, 1102-1106.
Kuritza, A.P. & Salyers, A.A. (1985). Use of a species-specific DNA hybridization probe for enumeration of Bacteroides vulgatus in human feces. Applied and Environmental Microbiology 50, 958-964.
Lane, D.J. (1991). 16S/23S rRNA sequencing. In Nucleic acid techniques in bacterial systematics, Stackebrandt, E. & Goodfellow, M., eds, pp. 115-175. Wiley, Chichester.
Langendijk, P.S., Schut, F., Jansen, G.J., Raangs, G.C., Camphuis, G.R., Wilkinson, M.H.F. & Welling, G.J. (1995). Quantitative fluorescence in situ hybridization of Bifidobacterium spp. with genus-specific 16S rRNA-targeted probes and its application in fecal samples. Applied and Environmental Microbiology 61, 3069-3075.
Lee, D-H., Zo, Y-G. & Kim, S-J. (1996). Nonradioactive method to study genetic profiles of natural bacterial communities by PCR-single-strand-conformation polymorphism. Applied and Environmental Microbiology 62, 3112-3120.
Lemieux, B., Aharoni, A. & Schena, M. (1998). Overview of DNA chip technology. Molecular Breeding 4, 277-289.
Liesack, W., Weyland, H. & Stackebrandt, E. (1991). Potential risks of gene amplification by PCR as determined by 16S rDNA analysis of a mixed-culture of strict barophilic bacteria. Microbial Ecology 21, 191-198.
Maidak, B.L., Olsen, G.J., Larsen, N., Overbeek, R., McCaughey, M.J. & Woese, C. (1997). The RDP (Ribosomal Database Project). Nucleic Acids Research 25, 109-111.
Maniatis, T., Fritsch, E.F. & Sambrook, J. (1989). Molecular cloning: a laboratory manual, 2nd edn. Cold Spring Harbor Laboratory, New York.
Mau, M. & Timmis, K.N. (1998). Use of subtractive hybridization to design habitat-based oligonucleotide probes for investigation of natural bacterial communities. Applied and Environmental Microbiology 64, 185-191.
McCartney, A.L., Wenzhi, W. & Tannock, G.W. (1996). Molecular analysis of the composition of the Bifidobacterial and Lactobacillus microflora of humans. Applied and Environmental Microbiology 62, 4608-4613.
McFarlene, G.T. & Gibson, G.R. (1994). Metabolic activities of the normal colonic microflora. In Human health: contribution of microorganisms, Gibson, S.A.W., ed., pp. 17-38. Springer, Frankfurt.
McKillip, J.L., Jaykus, L-A. & Drake, M. (1998). rRNA stability in heat-killed and UV-irradiated enterotoxigenic Staphylococcus aureus and Escherichia coli O157:H7. Applied and Environmental Microbiology 64, 4264-4268.
Millar, M.R., Linton, C.J., Cade, A., Glancy, D., Hall, M. & Jalal, H. (1996). Application of 16S rRNA gene PCR to study bowel flora of preterm infants with and without necrotizing enterocolitis. Journal of Clinical Microbiology 34, 2506-2510.
Moore, W.E.C. & Holdeman, L.V. (1974). Human fecal flora: The normal flora of 20 Japanese-Hawaiians. Applied and Environmental Microbiology 27, 961-979.
Murray, A.E., Hollibaugh, J.T. & Orrego, C. (1996). Phylogenetic compositions of bacterioplankton from two California estuaries compared by denaturing gradient gel electrophoresis of 16S rDNA fragments. Applied and Environmental Microbiology 62, 2676-2680.
Murray, A.E., Preston, C.M., Massana, R., Taylor, L.T., Blakis, A., Wu, K. & de Long, E.F. (1998). Seasonal and spatial variability of bacterial and archaeal assemblages in the coastal waters near Anvers Island, Antarctica. Applied and Environmental Microbiology 64, 2585-2595.
Muyzer, G. (1999). DGGE/TGGE a method for identifying genes from natural ecosystems. Current Opinion in Microbiology 2, 317-322.
Muyzer, G. & de Waal, E.C. (1994). Determination of the genetic diversity of microbial communities using DGGE analysis of PCR-amplified 16S rRNA. NATO ASI Series G35, 207-214.
Muyzer, G. & Smalla, K. (1998). Application of denaturing gradient gel electrophoresis (DGGE) and temperature gradient gel electrophoresis (TGGE) in microbial ecology. Antonie van Leeuwenhoek 73, 127-141.
Muyzer, G., de Waal, E.C. & Uitterlinden, G.A. (1993). Profiling of complex populations by denaturating gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Applied and Environmental Microbiology 59, 695-700.
Myers, R.M., Fischer, S.G., Lerman, L.S. & Maniatis, T. (1985). Nearly all single base substitutions in DNA fragments joined to a GC-clamp can be detected by denaturing gradient gel electrophoresis. Nucleic Acids Research 13, 3131-3145.
Myers, R.M., Maniatis, T. & Lerman, L.S. (1987). Detection and localization of single base changes by denaturing gradient gel electrophoresis. Methods in Enzymology 155, 501-527.
Nübel, U., Engelen, B., Felske, A., Snaidr, J., Wieshuber, A., Amann, R.I., Ludwig, W. & Backhaus, H. (1996). Sequence heterogeneities of genes encoding 16S rRNAs in Paenibacillus polymyxa detected by temperature gradient gel electrophoresis. Journal of Bacteriology 178, 5636-5643.
Nübel, U., Garcia-Pichel, F. & Muyzer, G. (1997). PCR primers to amplify 16S rRNA genes from Cyanobacteria. Applied and Environmental Microbiology 63, 3327-3332.
Nübel, U., Garcia-Pichel, F., Kühl, M. & Muyzer, G. (1999). Quantifying microbial diversity: morphotypes, 16S rRNA genes, and carotenoids of oxygenic phototrophs in microbial mats. Applied and Environmental Microbiology 65, 422-430.
Ohkuma, M. & Kudo, T. (1996). Phylogenetic diversity of the intestinal bacterial community in the termite Reticulitermis speratus. Applied and Environmental Microbiology 62, 461-468.
Old, R.W. & Primrose, S.B. (1989). Principles of gene manipulation: an introduction to genetic engineering, 4th edn. Blackwell, Oxford.
Orita, M., Iwahana, H., Kanazawa, H., Hayashi, K. & Sekiya, T. (1989). Detection of polymorphisms of human DNA by gel electrophoresis as single-strand conformation polymorphism. Proceedings of the National Academy of Sciences of the United States of America 86, 2766-2770.
Pace, N.R. (1997). A molecular view of microbial diversity and the biosphere. Science 276, 734-740.
Pearson, W.R. & Lipman, D.J. (1988). Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America 85, 2444-2448.
Poinar, H.N., Hofreiter, M., Spaulding, W.G., Martin, P.S., Stankiewitz, B.A., Bland, H., Evershed, R.P., Possnert, G. & Pääbo, S. (1998). Molecular coproscopy: dung and diet of the extinct ground sloth Nothrotheriops shastensis. Science 281, 402-406.
Polz, M.F. & Cavanaugh, C.M. (1998). Bias in template-to-product ratios in multitemplate PCR. Applied and Environmental Microbiology 64, 3724-3730.
Priest, F. & Austin, B. (1993). Modern bacterial taxonomy, 2nd edn. Chapman and Hall, London.
Ramírez-Saad, H.C. (1999). Molecular ecology of Frankia and other soil bacteria under natural and chlorobenzoate-stressed conditions. PhD thesis, Wageningen University.
Ramírez-Saad, H.C., Akkermans, W.M. & Akkermans, A.D.L. (1996). DNA extraction from actinorhizal nodules. In Molecular microbial ecology manual, Akkermans, A.D.L., van Elsas, J.D. & de Bruijn, F.J., eds, 1.4.4, pp. 1-11. Kluwer, Dordrecht.
Reysenbach, A., Giver, L.J., Wickham, G.S. & Pace, N.R. (1992). Differential amplification of rRNA genes by polymerase chain reaction. Applied and Environmental Microbiology 58, 3417-3418.
Rosenbaum, V. & Riesner, D. (1987). Temperature-gradient gel electrophoresis - thermodynamic analysis of nucleic acids and proteins in purified form and in cellular extracts. Biophysical Chemistry 26, 235-246.
Saiki, R.K., Gelfand, D.H., Stoffel, S.J., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B. & Erlich, H.A. (1988). Primer-directed enzymatic amplification of DNA with thermostable DNA polymerase. Science 239, 487-491.
Saitou, R.R. & Nei, M. (1987).
A neighbour-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 44, 406-425. Sanguinetti,C.J., Dias Neto, E. & Simpson, A.J.G. (1994). Rapid silver staining and recovery of PCR products separated on polyacrylamide gels. BioTechniques 17, 915-919.
297 Savage, D.C. (1977). Microbial ecology of the gastrointestinal tract. Annual Review of Microbiology 31, 107-133. Scheinert, P., Krausse, R., Ullmann, U., S611er, R. & Krupp, G. (1996). Molecular differentiation of bacteria by PCR amplification of the 16S-23S rRNA spacer. Journal of Microbiological Methods 26, 103-117. Schena, M., Shalon, D., Davis, R.W. & Brown, P.O. (1995). Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470. Schena M., Shalon, D., Heller, R., Chai, A., Brown, P.O. & Davis, R.W. (1996). Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proceedings of the National Academy of Sciences of the United States of America 93, 10614-10619. Schena, M., Heller, R.A., Theriault, T.P., Konrad, K., Lachenmeier, E. & Davis, R.W. (1998). Microarrays: biotechnology's discovery platform for functional genomics. Trends in Biotechnology 16, 301-306. Schwieger, E & Tebbe, C.C. (1998). A new approach to utilize PCR-single-strand-conformation polymorphism for 16S rRNA gene-based microbial community analysis. Applied and Environmental Microbiology 64, 4870-4876. Sheffield, V.C., Cox, D.R., Lerman, L.S. & Myers, R.M. (1985). Attachment of a 40-base-pair G+C rich sequence (GC-clamp) to genomic DNA fragments by polymerase chain reaction results in improved detection of single-base changes. Proceedings of the National Academy of Sciences of the United States of America 86, 232-236. Shoemaker, J.S. & Fitch, W.M. (1989). Evidence from nuclear sequences that invariable sites should be considered when sequence divergence is calculated. Molecular Biology and Evolution 6, 270-289. Simpson, J.M., McCracken, V.J., White, B.A., Gaskins, H.R. & Mackie, R.I. (1999). Application of denaturing gradient gel electrophoresis for the analysis of the porcine gastrointestinal microbiota. Journal of Microbiological Methods 36, 167-179. Sneath, P.H.A. (1989). 
Analysis and interpretation of sequence data for bacterial systematics: the view of a numerical taxonomist. Systematic and Applied Microbiology 12, 15-31. Sokal, R.R., & Michener, C.D. (1958). A statistical method for evaluating systematic relationships. University of Kansas Scientific Bulletin 28, 1409-1438. Stackebrandt, E. & Goebel, B.M. (1994). Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. International Journal of Systematic Bacteriology 44, 846-849. Stahl, D.A. & Amann, R. (1991). Development and application of nucleic acid probes. In Nucleic acid techniques in bacterial systematics, Stackebrandt, E. & Goodfellow, M., eds, pp. 115-175. Wiley, Chichester. Stahl, D., Flesher, B., Mansfield, H.M. & Montgomery, L. (1988). Use of phylogenetically based hybridization probes for studies of ruminal microbial ecology. Applied and Environmental Microbiology 54, 1079-1084. Strunk, O. & Ludwig, W. (1995). ARB - a software environment for sequence data. Department of Microbiology, Technical University of Munich, Munich, Germany. (http://www.mikro.biologie.tumue nchen, de/p ub/ARB/doc umen tatio rdarb.ps ). Suau, A., Bonnet, R., Sutren, M., Godon, J-J., Gibson, G.R., Collins, M.D. & Dor6, J. (1999). Direct analysis of genes encoding 16S rRNA from complex communities reveals many novel molecular species within the human gut. Applied and Environmental Microbiology 65, 4799-4807. Suzuki, M.T. & Giovannoni, S.J. (1996). Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR. Applied and Environmental Microbiology 61,625-630. Sykes, P.J., Neoh, S.H., Brisco, M.J., Hughes, E., Condon, J. & Morley, A.A. (1992). Quantitation of targets for PCR by use of limiting dilution. BioTechniques 13, 444-449. Tajima, E & Nei, M. (1984). Estimation of evolutionary distance between nucleotide sequences. Molecular Biology and Evolution 41, 269-285. 
Teske, A., Wawer, C., Muyzer, G. & Ramsing, N.B. (1996). Distribution of sulfate-reducing bac-
298 teria in a stratified fjord (Mariager Fjord, Denmark) as evaluated by most-probable-number counts and denaturating gradient gel electrophoresis of PCR-amplified ribosomal DNA fragments. Applied and Environmental Microbiology 62, 1405-1415. Vallaeys, T., Topp, E., Muyzer, G., Macheret, V., Laguerre, G. & Soulas, G. (1997). Evaluation of denaturing gradient gel electrophoresis in the detection of 16S rDNA sequence variation in Rhizobia and Methanotrophs. FEMS Microbiology Ecology 24, 279-285. van de Peer, Y. & de Wachter, R.(1995). Investigation of fungal phylogeny on the basis of small ribosomal subunit RNA sequences. In Molecular microbial ecology manual Akkermans, A.D.L., van Elsas, J.D. & de Bruijn, EJ., eds, 3.3.4, pp.l-12. Kluwer, Dordrecht, Wagner, R. (1994). The regulation of ribosomal RNA synthesis and bacterial cell growth. Archives of Microbiology 161, 100-106. Wang, G.C.Y. & Wang Y. (1996). The frequency of chimeric molecules as a consequence of PCR co-amplification of 16S rRNA genes from different bacterial species. Microbiology 142, 1107-1114. Wang, A.M., Doyle, M.V. & Mark, D.E (1989). Quantification of mRNA by the polymerase chain reaction. Proceedings of the National Academy of Sciences of the United States of America 86, 9717-9721. Wang, R-E, Cao, W-W. & Cerniglia, C.E. (1996a). PCR detection of predominant anaerobic bacteria in human and animal fecal samples. Applied and Environmental Microbiology 62, 1242-1247. Wang, R-F., Cao, W-W. & Cerniglia, C. E. (1996b). Phylogenetic analysis of Fusobacterium prausnitzii based upon the 16S rRNA gene sequence and PCR confirmation. International Journal of Systematic Bacteriology 6, 341-343. Wawer, C., Jetten, M.S.M. & Muyzer, G. (1997). Genetic diversity and expression of the [NiFe] hydrogenase large subunit gene of Desulfovibrio spp. in environmental samples. Applied and Environmental Microbiology 63, 4360-4369. 
Wayne, L.G., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevski, M.I., Moore, L.H., Moore W.E.C., Murray, R.G.E., Stackebrandt, E., Starr, M.P. & Trtiper, H.G. (1987). Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. International Journal of Systematic Bacteriology 37, 463-464. Whitford, M.E, Forster, R.J., Beard, C.E., Gong, J. & Teather, R.M. (1998). Phylogenetic analysis of rumen bacteria by comparative sequence analysis of cloned 16S rRNA genes. Anaerobe 4, 153-163. Wilson, K.H. & Blitchington, R.H. (1996) Human colonic biota studied by ribosomal DNA sequence analysis. Applied and Environmental Microbiology 62, 2273-2278. Wintzingrode, E v., Grbel, U. B. & Stackebrandt, E. (1997). Determination of microbial diversity in environmental samples: pitfalls of PCR-based rRNA analysis. FEMS Microbiology Reviews 21, 213-229. Woese, C.R. (1987). Bacterial evolution. Microbiological Reviews 51, 221-271. Woese, C.R., Kandler, O. & Wheelis, M.L. (1990). A definition of the domains Archaea, Bacteria, and Eucarya in terms of small subunit ribosomal characteristics. Systematic and Applied Microbiology 14, 305-310. Wolters D.J. (1998). Ineffective Frankia in wet alder soils. PhD Thesis, University of Groningen. Yershov, G., Barsky, V., Belgovskiy, A., Kirillov, E., Kreidlin, E., Ivanov, I., Parinov, S., Guschin, D., Drobishev, A., Dubiley, S. & Mirzabekov, A. (1996). DNA analysis and diagnostics on oligonucleotide microchips. Proceedings of the National Academy of Sciences of the United States of America 93, 4913-4918. Yoshino, K., Nishigaki, K. & Husimi,Y. (1991). Temperature sweep gel electrophoresis: a simple method to detect point mutations. Nucleic Acids Research 19, 3153. Zoetendal, E.G., Akkermans, A.D.L. & de Vos, W.M. (1998). Temperature gradient gel electrophoresis analysis from human fecal samples reveals stable and host-specific communities of active bacteria. 
Applied and Environmental Microbiology 64, 3854-3859.
299
12 From Multilocus Enzyme Electrophoresis to Multilocus Sequence Typing

Dominique A. Caugant 1,2

1 Department of Bacteriology, WHO Collaborating Centre for Reference and Research on Meningococci, National Institute of Public Health, PO Box 4404 Nydalen, N-0403 Oslo, Norway; 2 Institute for Oral Biology, PO Box 1052 Blindern, N-0316 Oslo, Norway
CONTENTS

12.1 INTRODUCTION
12.2 MEE METHODOLOGY
  A. Preparation of the bacterial extracts
  B. Gel preparation
  C. Electrophoresis
  D. Staining procedures
  E. Interpretation of the gels
  F. Analysis of the data
12.3 APPLICATIONS OF MEE
  A. Bacterial systematics
    (i) Borrelia species associated with Lyme disease
    (ii) Identification of cryptic Legionella spp.
    (iii) Differentiation of a newly recognised Prevotella spp.
    (iv) Listeria spp.
  B. Population genetics
  C. Molecular epidemiology
    (i) Neisseria meningitidis
    (ii) Streptococcus pyogenes
    (iii) Streptococcus pneumoniae
    (iv) Listeria monocytogenes
12.4 MLST METHODOLOGY
  A. Preparation of PCR products
  B. Nucleotide sequencing of PCR products
  C. Analysis of the sequence data
  D. Use of the database
  E. Applications of MLST
12.5 PERSPECTIVES
ACKNOWLEDGEMENTS
REFERENCES
12.1 INTRODUCTION

Differentiation and classification of bacterial strains at the sub-species level require methods that are highly reproducible, little affected by the experimental conditions, and that give effective discrimination of epidemiologically unrelated strains. Bacteria derived from a single precursor through clonal expansion are genetically identical or closely related. To provide significant biological information, a typing method should reflect the amount of genetic variation that has accumulated over time since the divergence of the microbial organisms from a common ancestor. The two main factors that determine the extent to which different microbial organisms have diverged from a common cell line are the time elapsed since their divergence and their potential for evolutionary changes. The potential for evolution of bacteria depends upon the genetic mechanisms that they have available (mutation, recombination, inversion, DNA repair, etc.) and their respective rates, as well as upon the genetic structure and the ecology of their population. Bacteria that are able to incorporate foreign DNA into their genome (e.g., via transformation) are less stable than species relying mainly on mutation. Thus, differentiation from the parental type in transformable organisms will generally occur much more rapidly than in non-transformable ones.

Depending on the type of questions to be addressed, different methods may be more or less suitable for molecular typing. The level of discrimination provided by a given technique varies with the degree of genetic diversity of the species, and whether that level of discrimination is appropriate depends on the epidemiological questions that are posed (local versus global epidemiology). It is essential, however, that any typing technique, regardless of the narrowness of the problem to be addressed, should be validated in comparison to a genotyping method using a multilocus approach.
Multilocus enzyme electrophoresis (abbreviated MEE or MLEE) has been used extensively in studies of population biology and phylogenetics of both eukaryotes and prokaryotes, and has proved to be the gold standard for studying the population genetics of bacteria (Boerlin, 1997). The impact of the method on our knowledge of bacterial genetics, systematics and epidemiology has been considerable since the beginning of the 1980s, essentially as a result of the pioneering work of Selander's group (Selander & Levin, 1980; Caugant et al., 1981; Ochman et al., 1983). In the following years, MEE was successfully applied to numerous bacterial species, and some of the most significant studies for medical and veterinary bacteriology are listed in Table 12.1. The application of the method has contributed to the knowledge of the natural history of infectious diseases and has practical consequences for disease control strategies (Caugant, 1998). The technique is based on the analysis of electrophoretic variation in a set of chromosomally encoded cytoplasmic enzymes. Although the allelic variation is measured indirectly, it is a fully validated genotyping method (Struelens et al., 1996). For several species of bacteria, the genetic relationships obtained by MEE have been shown to be strongly correlated with those obtained from DNA-DNA
Table 12.1. Bacterial species analysed using MEE, together with selected references. References marked in bold are the source of the listed data.

Species | No. isolates | No. loci | Mean no. alleles/locus | No. ETs(a) | H_ETs(b) | References
Acinetobacter baumannii | 65 | 13 | - | 14 | - | Thurm & Ritter, 1993
Actinobacillus actinomycetemcomitans | 97 | 14 | 3.1 | 50 | 0.34 | Caugant et al., 1990; Poulsen et al., 1994; Haubek et al., 1995; 1997
Actinobacillus pleuropneumoniae | 250 | 18 | 2.9 | 37 | 0.247 | Musser et al., 1987; Møller et al., 1992
Aeromonas spp. | 153 | 11 | 6.6 | 122 | 0.645 | Altwegg et al., 1991; Kokka et al., 1991; Boyd et al., 1994
Bacillus cereus/thuringiensis | 36 | 15 | 5.3 | 27 | 0.553 | Carlson et al., 1994; Helgason et al., 1998; Zahner et al., 1989
Bacillus subtilis | 60 | 13 | 5.9 | 55 | 0.93 | Istock et al., 1992
Bordetella spp. | 188 | 15 | 2.7 | 38 | 0.178 | Musser et al., 1996; 1987; Van der Zee et al., 1997
Borrelia spp. | 50 | 11 | 5.9 | 35 | 0.673 | Boerlin et al., 1992; Balmelli & Piffaretti, 1995; 1996
Burkholderia cepacia | 213 | 8 | 4.4 | 164 | 0.574 | Wise et al., 1995
Burkholderia pseudomallei | 18 | 10 | - | 9 | - | Norton et al., 1998
Campylobacter spp. | 125 | 9 | 6.9 | 64 | 0.634 | Aeschbacher & Piffaretti, 1989
Citrobacter diversus | 42 | 20 | 2.2 | 16 | 0.244 | Li et al., 1990
Corynebacterium diphtheriae | 156 | 23 | - | 76 | - | Popovic et al., 1996
Enterobacter cloacae | 62 | 12 | 3.7 | - | - | Gaston & Warner, 1989
Enterococcus faecalis | 65 | 15 | 2.7 | 26 | 0.264 | Tomayko & Murray, 1995
Escherichia coli | 1608 | 12 | 9.3 | 301 | 0.517 | Selander & Levin, 1980; Caugant et al., 1981; 1985; Ochman et al., 1983; Ochman & Selander, 1983; Achtman et al., 1989; Selander et al., 1986; 1987; Maslow, 1985; Beutin et al., 1990; Tzabar & Pennington, 1991; Whittam et al., 1983a, b; 1993; Ngeleka et al., 1996; Rodrigues et al., 1996; Gordon, 1997; Pupo et al., 1997; Feng et al., 1998; Martinez et al., 1999
Haemophilus influenzae | 2209 | 17 | 6.4 | 280 | 0.467 | Musser et al., 1985; 1986; 1988a; 1988b; 1990; Porras et al., 1986a, b; Weinberg et al., 1989; Musser & Selander, 1990; Quentin et al., 1990; 1993; Lagos et al., 1991; Fusté et al., 1996; van Alphen et al., 1997
Haemophilus parasuis | 40 | 17 | 3.1 | 34 | 0.405 | Blackall et al., 1997
Helicobacter pylori | 74 | 6 | 11.2 | 73 | 0.735 | Go et al., 1996
Klebsiella pneumoniae | 19 | 9 | 2.3 | 11 | 0.378 | Nouvellon et al., 1994
Legionella spp. | 292 | 22 | 4.3 | 62 | 0.380 | Selander et al., 1985; Woods et al., 1988; Lanser et al., 1990; Marques et al., 1995
Listeria monocytogenes | 175 | 16 | 3.6 | 45 | 0.424 | Piffaretti et al., 1989; Bibb et al., 1989; 1990; Boerlin & Piffaretti, 1991; Boerlin et al., 1991; 1996; 1997; Farber et al., 1991; Kolstad et al., 1992; Baxter et al., 1993; Nørrung & Gerner-Schmidt, 1993; Nørrung & Skovgaard, 1993; Trott et al., 1993; Graves et al., 1994; Harvey & Gilmour, 1994; Lawrence & Gilmour, 1995; Rørvik et al., 1995; Caugant et al., 1996; Flint et al., 1996; Nesbakken et al., 1996
Mycobacterium tuberculosis complex | 76 | 26 | 1.1 | 5 | 0.1 | Feizabadi et al., 1996
Mycobacterium avium | 115 | 17 | 3.8 | 58 | 0.29 | Wasem et al., 1991; Arbeit et al., 1993; Feizabadi et al., 1997
Mycobacterium spp. | 97 | 17 | 4.2 | 74 | 0.38 | Wallace et al., 1989; Feizabadi et al., 1996
Neisseria meningitidis | 688 | 15 | 7.2 | 331 | 0.547 | Caugant et al., 1986; 1987; 1988; 1994; 1998; Crowe et al., 1987; Olyhoek et al., 1987; Moore et al., 1989; Achtman 1990; 1994; 1995; 1997; Ashton et al., 1991; Sacchi et al., 1992a, b; Wang et al., 1992; 1993; Woods et al., 1992
Neisseria gonorrhoeae | 227 | 9 | - | 89 | 0.41 | Gutjar et al., 1997; O'Rourke & Stevens, 1993; 1994; Vasquez et al., 1993
Ornithobacterium rhinotracheale | 55 | 18 | - | 6 | - | Amonsin et al., 1997
Pasteurella haemolytica | 178 | 18 | 2.5 | 22 | 0.297 | Angen et al., 1997; Davies et al., 1997a
Pasteurella multocida | 100 | 18 | 3.4 | 71 | 0.302 | Blackall et al., 1998
Pasteurella trehalosi | 60 | 19 | 2.6 | 20 | 0.289 | Davies et al., 1997b
Porphyromonas gingivalis | 100 | 17 | 4.5 | 78 | 0.384 | Loos et al., 1993
Prevotella spp. | 51 | 14 | 3.9 | 46 | 0.52 | Frandsen et al., 1995
Pseudomonas aeruginosa | 314 | 18 | - | 17 | 0.138 | Levin et al., 1984; Martin et al., 1999
Pseudomonas syringae | 23 | 26 | 3.5 | 10 | 0.683 | Denny et al., 1988
Rhizobium spp. | 147 | 15 | 7.7 | 75 | 0.669 | Young 1985; Young & Wexler, 1988; Pinero et al., 1988; Harrison et al., 1989; Segovia et al., 1991; Eardly et al., 1990; 1995
Salmonella enteritidis | 96 | 24 | - | 80 | 0.627 | Beltran et al., 1988; 1991; Reeves et al., 1989; Selander et al., 1990a, b; 1992; Boyd et al., 1993; 1996; Cox et al., 1996
Serpulina hyodysenteriae | 231 | 15 | 2.7 | 50 | 0.29 | Lymbery et al., 1990; Trott et al., 1997
Serpulina pilosicoli | 164 | 15 | 1.5 | 33 | 0.18 | Trott et al., 1997; 1998; Oxberry et al., 1998
Serratia marcescens | 99 | 9 | 3.0 | 33 | 0.376 | Gargallo-Viola, 1989
Serpulina spp. | 56 | 15 | - | 40 | 0.587 | Stanton et al., 1996; Trott et al., 1996; McLaren et al., 1997
Staphylococcus aureus | 315 | 20 | 3.2 | 49 | 0.267 | Musser et al., 1990; Musser & Kapur, 1992; Kapur et al., 1995b; Fitzgerald et al., 1997
Streptococcus agalactiae | 128 | 11 | 2.5 | 19 | 0.304 | Musser et al., 1989; Helmig et al., 1993; Quentin et al., 1995; Hauge et al., 1996
Streptococcus pyogenes | 108 | 12 | 4.4 | 33 | 0.420 | Musser et al., 1991; 1992; 1993a, b; 1995; Martin & Single, 1993; Haase et al., 1994; Reda et al., 1994; Whatmore et al., 1994
Streptococcus pneumoniae | 66 | 17 | 2.2 | 28 | 0.247 | Munoz et al., 1991; 1992; Martin et al., 1992; Sibold et al., 1992; Soares et al., 1993; Versalovic et al., 1993; Klugman et al., 1994; Lomholt, 1995; Coffey et al., 1995; McDougal et al., 1995; Hall et al., 1996; Pons et al., 1996; Takala et al., 1996
Streptococcus suis | 124 | - | - | 17 | - | Hampson et al., 1993; Mwaniki et al., 1994
Streptococcus spp. | 50 | 16 | 12.2 | 40 | 0.857 | Gilmour et al., 1987; Bert et al., 1995
Treponema spp. | 34 | 10 | 7.8 | 34 | 0.751 | Dahle et al., 1995
Vibrio cholerae | 397 | 17 | 9.5 | 279 | 0.441 | Wachsmuth et al., 1993; Evins et al., 1995; Popovic et al., 1995; Beltran et al., 1999
Yersinia spp. | 244 | 18 | 6.7 | 137 | 0.531 | Schill et al., 1984; Caugant et al., 1989; Dolina & Peduzzi, 1993

(a) ETs, electrophoretic types. (b) H_ETs, mean genetic diversity among ETs at the loci studied.
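The summary columns of Table 12.1 can be illustrated with a short sketch. The Python below is not taken from the chapter: the isolate names and allele numbers are invented, and the per-locus diversity uses the estimator h = n/(n-1) × (1 - Σx_i²) that is conventional in the MEE literature but is not spelled out in this section, so it should be read as an assumption. Isolates with identical allele combinations are grouped into one ET, and the mean number of alleles per locus and H are then computed over ETs rather than over isolates.

```python
from collections import Counter


def assign_ets(profiles):
    """Group isolates that share an identical allele profile into one ET.

    profiles: dict mapping isolate name -> sequence of allele numbers,
    one per enzyme locus. Returns dict: profile tuple -> list of isolates.
    """
    ets = {}
    for isolate, profile in profiles.items():
        ets.setdefault(tuple(profile), []).append(isolate)
    return ets


def locus_diversity(alleles):
    """Genetic diversity at one locus: h = n/(n-1) * (1 - sum(x_i^2)).

    Assumed estimator (not stated in the text); x_i is the frequency of
    the i-th allele among the n profiles supplied.
    """
    n = len(alleles)
    if n < 2:
        return 0.0
    freqs = [count / n for count in Counter(alleles).values()]
    return (1.0 - sum(x * x for x in freqs)) * n / (n - 1)


def summarise(profiles):
    """Compute the kind of summary values listed per species in Table 12.1."""
    if not profiles:
        raise ValueError("at least one isolate is required")
    ets = assign_ets(profiles)
    et_profiles = list(ets)
    n_loci = len(et_profiles[0])
    # One column of alleles per locus, counted once per ET.
    columns = [[p[i] for p in et_profiles] for i in range(n_loci)]
    return {
        "isolates": len(profiles),
        "loci": n_loci,
        "mean_alleles_per_locus": sum(len(set(c)) for c in columns) / n_loci,
        "ets": len(ets),
        "h_ets": sum(locus_diversity(c) for c in columns) / n_loci,
    }


# Hypothetical data: four isolates typed at three loci.
example_profiles = {
    "isolate_1": (1, 2, 1),
    "isolate_2": (1, 2, 1),  # same profile as isolate_1, hence the same ET
    "isolate_3": (2, 2, 1),
    "isolate_4": (2, 1, 1),
}

summary = summarise(example_profiles)
print(summary)
```

Computing diversity over ETs, as the table footnote specifies, prevents a frequently sampled clone from dominating the estimate; passing one profile per isolate to `locus_diversity` instead would give the diversity among isolates.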
hybridisation analysis (Ochman et al., 1983; Selander et al., 1985; Gilmour et al., 1987; Boerlin et al., 1991). This demonstrates that the analysis of 10-20 enzymes can adequately index the variation in the whole bacterial genome and give a representative measure of genetic relatedness among isolates. This also provides evidence for the relevance of the method to taxonomic applications.

The cytoplasmic enzymes that are analysed in MEE are essential for bacterial metabolism and are encoded by so-called housekeeping genes. Allelic variation in such genes is usually neutral or very nearly so, which means that strains possessing one allele or another will have essentially the same fitness. Convergence to the same allele through adaptive evolution is thus unlikely to happen. Differences between strains in a set of housekeeping genes reflect the genetic events that have occurred overall in their genome since their divergence from a common ancestor, making phylogenetic interpretations of the data possible (Musser, 1996). Bacteria with identical multilocus enzyme genotypes are assumed to descend from a single ancestral cell line and to be members of the same clone. As a result, MEE analyses have often been designated as clonal analyses.

12.2 MEE METHODOLOGY
The technique identifies naturally occurring allelic variation in chromosomal housekeeping genes by indexing differential electrophoretic migration of the gene products (the enzymes) in a support, usually a starch gel. The migration of a protein in such gels is a function of its molecular mass, its electrophoretic charge and the conformation of the molecule. Mutations, especially those affecting the net electrostatic charge of the protein, are reflected by differential migration of the individual enzymes in the electrophoretic field. Evidently, only a fraction of all the base substitutions occurring in the nucleotide sequence of genes can be detected in this manner; silent substitutions at the nucleotide level, for example, will not be revealed, as they do not affect the structure of the protein. Up to 50% of the mutations resulting in an amino acid change can be detected by ordinary electrophoresis, but by using sequential electrophoresis, in which the pH, buffer system and pore size of the gel are varied, nearly all amino acid sequence variation can be revealed (Selander et al., 1986a).

The level of expression of certain enzymes may vary with the growth temperature or the composition of the culture medium (e.g., detection of β-galactosidase activity in Escherichia coli requires induction of the enzyme). The electrophoretic mobilities (electromorphs) of individual enzymes are, however, unaffected by the culture conditions, methods of storage, number of passages in the laboratory and other environmental factors (Selander et al., 1986a; Musser et al., 1987a, b). Consequently, all bacterial isolates can be unequivocally characterised by their combination of electromorphs over the loci studied, and all strains are typable.

To provide a reliable index of the overall genome and enable appropriate interpretation of the data for epidemiological studies, an analysis of at least 10 enzyme loci is necessary. For taxonomic and population genetics analyses, somewhat
larger numbers of chromosomal loci are required. In some studies, to establish the strength of the technique, more than 30 loci have been assayed for limited numbers of isolates (Selander et al., 1987b). Because of its relative rapidity, the method employed by most investigators has been starch gel electrophoresis (Selander et al., 1986a; Boerlin & Piffaretti, 1995).

A. Preparation of the bacterial extracts
A pure culture of each isolate is grown under appropriate conditions to obtain a sufficient number of cells (about 10¹¹) for the preparation of the enzyme lysate. The cells can be scraped directly from agar plates or slants (Feizabadi et al., 1997), but an overnight culture in broth is usually preferred, as a large quantity of cells may be obtained at lower cost in that way. For most organisms that are easy to grow, a 100-ml broth culture will give a suitable cell quantity. For difficult organisms, such as Borrelia burgdorferi, 2-L cultures have been used to prepare the protein extracts (Boerlin et al., 1992). If a broth culture is used, the cells are collected by centrifugation. After harvest, the cells are suspended in a small volume (1-2 ml) of buffer solution (e.g., 10 mM Tris-HCl, 1 mM EDTA, pH 6.8) and lysed by a suitable method. Sonication, vortexing with glass beads, repeated freezing and thawing, or a combination of these methods can be used. The essential requirement is that the proteins must not be denatured during the lysis of the cells. Thus, the bacterial suspension must be kept cool at all times during the lysis process and thereafter. When beginning the study of a new organism, it is recommended to test several methods of protein extraction on a limited set of isolates, using enzymes commonly found in most species, in order to identify the most convenient method that provides sufficient enzyme activity. After lysis, the cellular debris is pelleted by centrifugation at 20,000 g for 20 min at 4°C. For pathogenic bacteria, filtration of the supernatant through a 0.45 µm membrane is recommended, followed by storage of the lysate at -70°C until required for electrophoresis.

B. Gel preparation
Starch gels are convenient because they can be sliced through their thickness and a different enzyme may be stained on each slice of a gel. However, other types of gels, such as acrylamide (Gaston & Warner, 1989; Fusté et al., 1996), cellulose acetate (Wise et al., 1995; Souza et al., 1999) or agarose, may be employed. The resolution obtained using different supports has not been evaluated in detail, but improved resolution and reproducibility have been reported using polyacrylamide instead of starch (John & Hussain, 1994; Flint et al., 1996). However, a significant advantage of starch over polyacrylamide is its absence of toxicity. The quality of the starch is critical for even slicing of the gel through its thickness and for good resolution of the electromorphs. Hydrolysed starch from Connaught Laboratories (Ontario, Canada) has proved reliable through many
Table 12.2. Six buffer systems commonly used for MEE

System | Electrode buffer | Gel buffer | Voltage (V)
A | Tris-citrate, pH 8.0: 83.20 g Tris, 33.09 g citric acid monohydrate, 1 L water | Tris-citrate, pH 8.0: electrode buffer diluted 1:29 | 130
B | Tris-citrate, pH 6.3: 27.00 g Tris, 18.07 g citric acid monohydrate, 1 L water; pH adjusted with NaOH | Tris-citrate, pH 6.7: 0.97 g Tris, 0.63 g citric acid monohydrate, 1 L water; pH adjusted with NaOH | 150
C | Borate, pH 8.2: 18.50 g boric acid, 2.40 g NaOH, 1 L water | Tris-citrate, pH 8.7: 9.21 g Tris, 1.05 g citric acid monohydrate, 1 L water | 250
D | Lithium hydroxide, pH 8.1: 1.20 g LiOH monohydrate, 11.89 g boric acid, 1 L water | Lithium hydroxide, pH 8.3: electrode buffer diluted 1:9 in 6.20 g Tris, 1.60 g citric acid monohydrate, 1 L water | 325
F | Tris-maleate, pH 8.2: 12.10 g Tris, 11.60 g maleic acid, 3.72 g Na2EDTA, 2.03 g MgCl2·6H2O, 1 L water; pH adjusted with 5.15 g NaOH | Tris-maleate, pH 8.2: electrode buffer diluted 1:9 | 100
G | Potassium phosphate, pH 6.7: 18.14 g KH2PO4, 2.39 g NaOH, 1 L water | Potassium phosphate, pH 7.0: 1.06 g KH2PO4, 0.25 g citric acid monohydrate, 1 L water | 100
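Several gel buffers in Table 12.2 are defined as dilutions of the corresponding electrode buffer. The sketch below works through that arithmetic in Python. It assumes that a ratio such as 1:29 means one volume of electrode buffer plus 29 volumes of diluent, a convention the table does not state, and it uses a 420-ml batch, the gel buffer volume quoted in this section for one gel. The function name and dictionary are illustrative only.

```python
def dilution_volumes(final_ml, parts_stock, parts_diluent):
    """Split a final volume into stock and diluent volumes.

    Assumption: a ratio written 'a:b' means a parts of stock solution
    plus b parts of diluent, so the stock makes up a/(a+b) of the total.
    """
    if final_ml <= 0 or parts_stock <= 0 or parts_diluent < 0:
        raise ValueError("volume and parts must be positive")
    stock_ml = final_ml * parts_stock / (parts_stock + parts_diluent)
    return stock_ml, final_ml - stock_ml


# Gel buffers of Table 12.2 that are prepared by diluting the electrode buffer.
# For system D the diluent is the Tris-citrate solution given in the table;
# for systems A and F the diluent is taken to be water.
DILUTED_GEL_BUFFERS = {"A": (1, 29), "D": (1, 9), "F": (1, 9)}

for system, (stock_parts, diluent_parts) in DILUTED_GEL_BUFFERS.items():
    stock_ml, diluent_ml = dilution_volumes(420.0, stock_parts, diluent_parts)
    print(f"System {system}: {stock_ml:.1f} ml electrode buffer "
          f"+ {diluent_ml:.1f} ml diluent")
```

If the laboratory instead reads 1:29 as one volume in a total of 29, the stock volume changes slightly, so the convention should be checked against the original protocol before use.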
years of use, with little quality variation from batch to batch. Starch at a concentration of 11-12% is suspended in an appropriate volume of gel buffer (e.g., 48 g starch for 420 ml buffer) in a 1-L Erlenmeyer flask with thick walls, tolerating heat and vacuum. Some commonly used gel buffers and the corresponding electrode buffers are listed in Table 12.2. The mixture is heated over a Bunsen burner with continuous and vigorous hand swirling until the suspension starts boiling and develops large air bubbles. The boiling time may need to be adjusted depending on the gel buffer and the batch of the starch. Constant swirling is necessary to avoid burning of the starch at the bottom of the flask. After boiling, the gel is degassed for 1 min and immediately poured into a gel mould, positioned on a perfectly level surface. A gel mould with a size of 18 × 20 × 1 cm is suitable for a volume of 420 ml of gel buffer and can be used for the electrophoresis of 20 samples. If air bubbles are visible in the gel after pouring, they should be quickly removed by aspiration with a Pasteur pipette before the starch begins to solidify. The surface of the gel should be level and even. A gel that sticks to the flask during pouring and has an uneven surface is undercooked. The gel is left to solidify either at room temperature for 2 h or at room temperature for 30 min, followed by 30 min at 4°C, before wrapping in plastic film to prevent desiccation. To keep the surface of the gel smooth, no air bubbles should be trapped under the plastic film. For optimal results, a gel must be used within 24 h of its preparation. Normally, gels are stored overnight at room temperature before electrophoresis; the gels may be used on the day they are made, but they will be more difficult to slice.
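The starch quantities given above can be checked with a short calculation. The following is a minimal sketch, assuming that the 11-12% concentration is expressed as weight per volume (grams of starch per 100 ml of gel buffer), which is consistent with the quoted example of 48 g in 420 ml; the function names are illustrative only.

```python
def starch_mass_g(buffer_volume_ml: float, concentration_percent: float) -> float:
    """Starch mass (g) needed for a buffer volume at a given % (w/v).

    Assumption: the percentage is grams of starch per 100 ml of gel buffer.
    """
    if buffer_volume_ml <= 0 or concentration_percent <= 0:
        raise ValueError("volume and concentration must be positive")
    return buffer_volume_ml * concentration_percent / 100.0


def starch_concentration_percent(starch_g: float, buffer_volume_ml: float) -> float:
    """Starch concentration in % (w/v) for a given mass and buffer volume."""
    if starch_g <= 0 or buffer_volume_ml <= 0:
        raise ValueError("mass and volume must be positive")
    return 100.0 * starch_g / buffer_volume_ml


# The example from the text: 48 g starch in 420 ml of gel buffer.
example_percent = starch_concentration_percent(48, 420)
```

With these assumptions, the example in the text corresponds to a concentration of about 11.4%, within the stated range. The same function can be used to scale the recipe to a different gel mould volume.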
C. Electrophoresis
The protein extracts of the bacteria to be analysed are thawed and immediately put on ice. The gel is unwrapped and a slit is cut through it with a scalpel at a distance of 5 cm from the shorter side of the gel mould. Pieces of Whatman filter paper no. 3 (6 × 9 mm) are used to load the lysate into the gel. Using forceps, a piece of filter paper is dipped in the cell lysate, blotted on a filter paper to eliminate excess liquid, then inserted into the slit in the gel, leaving a 1 cm space from the left end of the gel. The paper must be placed perfectly straight, with its short side in contact with the bottom of the gel form. The same procedure is repeated with the next lysate, leaving a 2-3 mm space between the filter papers. Up to 20 samples can be loaded on an 18-cm wide gel. Pieces of filter paper, dipped in amaranth dye (100 mg amaranth dissolved in 1 ml ethanol, plus 19 ml water), are placed in the spaces remaining at each end of the slit to mark the migration of the buffer front during electrophoresis. The two pieces of the gel are then carefully pressed together to eliminate the air between the filter papers and avoid denaturation of the proteins by drying. The gel slit containing the filter papers is then covered again with the plastic film, which is then folded back in a straight line at 2.5 cm from the end of the gel. The electrophoresis tanks are filled with about 250 ml of appropriate buffer (see Table 12.2). Two thin sponge wicks, 10 cm apart on each side of the gel, provide contact between the gel and the buffer. One wick is lined up with the row of samples, separated by the plastic film, and the other wick is placed directly on the gel. Both are covered again with the plastic film. During electrophoresis, the gel is cooled by a pan of ice supported by a glass plate. A constant voltage is maintained during electrophoresis (Table 12.2). The duration of electrophoresis in these conditions varies from 4 to 8 h, depending on the buffer.
An alternative is to perform the electrophoresis at a lower voltage, running the gels overnight. In this case, the electrophoresis should be carried out in a cool room. Standardisation of the migration between gels is assured by measuring the migration of the amaranth dye. After electrophoresis, three to four slices (1-2 mm thick) may be cut from each gel using a thin wire drawn through the gel placed on a slicing tray. Each slice is carefully placed in a plastic box, labelled with the gel number and the name of the enzyme to be stained. The quality of the slices decreases from the bottom to the top of the gel. It is thus recommended to always stain the same enzymes on the same slice level to assure reproducibility of the reading.
D. Staining procedures
The gels are immersed in a freshly prepared solution containing the substrate for the enzyme, coenzyme, intermediary catalyst and dye, and possibly other ingredients such as intermediary enzymes. Staining methods commonly used with bacteria for 31 enzymes are given in Table 12.3. The enzymatic reactions involved are illustrated in the handbook of Harris & Hopkinson (1976), where staining methods for additional enzymes can also be found. When intermediary enzymes are needed to reveal a specific enzyme activity (e.g., for an isomerase, such as phosphoglucose isomerase, or a transferase, such as hexokinase), it is usually necessary to mix the staining ingredients in an agar overlay to ensure the sharpness of the bands of enzymatic activity. The gels are incubated at 37°C, usually in the dark, until appearance of the enzyme reaction (dark bands). Indophenol oxidase and catalase are stained in the light, at room temperature. Enzyme activity for these two enzymes shows as white bands on a dark background. Depending on the enzyme and the bacterial species being analysed, the staining reaction may take from a few minutes to several hours. In some cases, the gel may be incubated overnight. After staining, the solution is poured off and the gel slice rinsed in water if no agar overlay was used. The gels are then fixed in a 1:5:5 solution of acetic acid, methanol and water. Gels stained for indophenol oxidase and catalase should not be fixed, but can be kept in water. For each bacterial species, the optimal electrophoretic conditions for each individual enzyme need to be determined. An enzyme that may appear monomorphic in one buffer system may present numerous electromorphs in another buffer. Therefore, electrophoresis of a small number of strains, using the various buffer systems given in Table 12.2, possibly with different times of migration or different gel concentrations, needs to be performed for each enzyme.
The bands of enzyme activity must be narrow and clearly visible to assure good resolution. Fig. 12.1 shows gel slices in which the same 20 isolates of Neisseria meningitidis have been electrophoresed and stained for glucose 6-phosphate dehydrogenase, isocitrate dehydrogenase and alkaline phosphatase, respectively.
E. Interpretation of the gels
Relative mobilities of each enzyme from different isolates must be compared visually against one another on the same gel. Distinctive electromorphs are numbered in order of decreasing anodal mobility, i.e., the electromorph that has migrated the farthest is assigned the number 1. In Fig. 12.1, the gel slice stained for glucose 6-phosphate dehydrogenase revealed two electromorphs, while the gel slice stained for isocitrate dehydrogenase from the same 20 bacteria revealed four electromorphs. In some bacterial species, more than 20 electromorphs of an individual enzyme may be detected by ordinary starch gel electrophoresis, illustrating the discriminatory power of the technique. In bacteria, genes coding for an enzyme activity usually exist as a single copy.
Table 12.3. Staining solutions for enzymes commonly used in MEE
Enzyme | EC no. | Agar overlay | Staining solution
Aconitase (ACO) | 4.2.1.3 | Yes | 15 ml Tris-HCl (a), 10 ml MgCl2 (b), 25 mg cis-aconitic acid, 10 units isocitrate dehydrogenase, 1 ml NADP (c), 0.5 ml PMS (d), 1 ml MTT (e)
Acid phosphatase (ACP) | 3.1.3.2 | No | 50 ml 0.05 M sodium acetate, pH 5.0, 50 mg α-naphthyl acid phosphate, 50 mg β-naphthyl acid phosphate, 20 mg black K salt
Adenylate kinase (ADK) | 2.7.4.3 | Yes | 25 ml Tris-HCl (a), 100 mg glucose, 25 mg ADP, 1 mg hexokinase, 1 ml MgCl2 (b), 15 units glucose 6-phosphate dehydrogenase, 1 ml NADP (c), 0.5 ml PMS (d), 0.5 ml MTT (e)
Alanine dehydrogenase (ALD) | 1.4.1.1 | No | 50 ml sodium phosphate buffer (f), 50 mg L-alanine, 2 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Alcohol dehydrogenase (ADH) | 1.1.1.1 | No | 3 ml 96% ethanol, 2 ml isopropanol, 2 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Alkaline phosphatase (ALP) | 3.1.3.1 | No | 50 ml Tris-HCl, pH 8.5, 1 g NaCl, 2 ml MgCl2 (b), 2 ml 0.25 M MnCl2, 50 mg β-naphthyl acid phosphate, 100 mg polyvinylpyrrolidone, 50 mg fast blue BB salt
Catalase (CAT) | 1.11.1.6 | No | 31.5 ml water with 250 mg sodium sulfite, 3 ml hydrogen peroxide for 1 min; rinse with water; 50 ml water with 750 mg potassium iodide until appearance of white bands
Carbamate kinase (CAK) | 2.7.2.2 | Yes | 25 ml Tris-HCl (a), 100 mg carbamyl-phosphate, 100 mg glucose, 25 mg ADP, 1 mg hexokinase, 1 ml MgCl2 (b), 15 units glucose 6-phosphate dehydrogenase, 1 ml NADP (c), 0.5 ml PMS (d), 0.5 ml MTT (e)
Esterase (EST) | 3.1.1.1 | No | 40 ml sodium phosphate buffer (f), 1.5 ml of 1% α-/β-naphthyl acetate (or propionate) in acetone, 25 mg fast blue RR salt
Fumarase (FUM) | 4.2.1.2 | No | 50 ml Tris-HCl (a), 50 mg fumaric acid, 50 units malic dehydrogenase, 2 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Glutamate dehydrogenase (NAD-dependent) (GD1) | 1.4.1.2 | No | 50 ml Tris-HCl (a), 2.1 g glutamic acid, 2 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Glutamate dehydrogenase (NADP-dependent) (GD2) | 1.4.1.4 | No | 50 ml Tris-HCl (a), 2.1 g glutamic acid, 1 ml NADP (c), 0.5 ml PMS (d), 1 ml MTT (e)
Glucose 6-phosphate dehydrogenase (G6P) | 1.1.1.49 | No | 50 ml Tris-HCl (a), 100 mg glucose 6-phosphate, 1 ml MgCl2 (b), 1 ml NADP (c), 0.5 ml PMS (d), 1 ml MTT (e)
Glutamic oxaloacetic transaminase (GOT) | 2.6.1.1 | No | 50 ml Tris-HCl (a), 100 mg α-ketoglutaric acid, 50 mg aspartic acid, 1 mg pyridoxal-5-phosphate, 100 mg fast blue BB salt
Glyceraldehyde 3-phosphate dehydrogenase (NAD-dependent) (GP1) | 1.2.1.12 | No | 10 ml Tris-HCl (a), 100 ml fructose 1,6-diphosphate, 10 units aldolase; incubate 15 min, then add 30 ml Tris-HCl (a), 50 mg sodium arsenate, 2 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Glyceraldehyde 3-phosphate dehydrogenase (NADP-dependent) (GP2) | 1.2.1.13 | No | 10 ml Tris-HCl (a), 100 ml fructose 1,6-diphosphate, 10 units aldolase; incubate 15 min, then add 30 ml Tris-HCl (a), 50 mg sodium arsenate, 1 ml NADP (c), 0.5 ml PMS (d), 1 ml MTT (e)
Hexokinase (HEX) | 2.7.1.1 | No | 50 ml Tris-HCl (a), 200 mg glucose, 50 mg ATP, 2 ml MgCl2 (b), 10 units glucose 6-phosphate dehydrogenase, 1 ml NADP (c), 0.5 ml PMS (d), 1 ml MTT (e)
Indophenol oxidase (IPO) | 1.15.1.1 | No | 40 ml Tris-HCl (a), 1 ml MgCl2 (b), 0.5 ml PMS (d), 1 ml MTT (e); expose to light
Isocitrate dehydrogenase (IDH) | 1.1.1.42 | No | 50 ml Tris-HCl (a), 2 ml 0.1 M isocitric acid, 2 ml MgCl2 (b), 1 ml NADP (c), 0.5 ml PMS (d), 1 ml MTT (e)
Lactate dehydrogenase (LAD) | 1.1.1.27 | No | 40 ml Tris-HCl (a), 0.5 M lithium lactate, 2 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Leucine aminopeptidase (LAP) | 3.4.1.1 | No | 50 ml 0.1 M potassium phosphate, pH 5.5, 1 ml MgCl2 (b), 30 mg leucyl-β-naphthylamide HCl, 30 mg black K salt
Malate dehydrogenase (MDH) | 1.1.1.37 | No | 40 ml Tris-HCl (a), 6 ml 2 M malic acid, 2 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Malic enzyme (ME) | 1.1.1.40 | No | 40 ml Tris-HCl (a), 6 ml 2 M malic acid, 2 ml MgCl2 (b), 1 ml NADP (c), 0.5 ml PMS (d), 1 ml MTT (e)
Mannitol 1-phosphate dehydrogenase (M1P) | 1.1.1.17 | No | 50 ml Tris-HCl (a), 5 mg mannitol 1-phosphate, 2 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Mannose phosphate isomerase (MPI) | 5.3.1.8 | Yes | 25 ml Tris-HCl (a), 10 mg mannose 6-phosphate, 2 ml MgCl2 (b), 10 units glucose 6-phosphate dehydrogenase, 50 units phosphoglucose isomerase, 1 ml NADP (c), 1 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Nucleoside phosphorylase (NSP) | 2.4.2.1 | Yes | 40 ml sodium phosphate buffer (f), 20 mg inosine, 2 units xanthine oxidase, 0.5 ml PMS (d), 1 ml MTT (e)
Peptidases (PE1, PE2, ...) | 3.4.-.- | Yes | 25 ml Tris-HCl (a), 2 ml 0.25 M MnCl2, 10 mg peroxidase, 10 mg o-dianisidine di-HCl, 10 mg venom from Crotalus atrox, 20 mg peptides (h)
6-phosphogluconate dehydrogenase (6PG) | 1.1.1.44 | No | 20 ml Tris-HCl (a), 10 mg 6-phosphogluconic acid, 10 ml MgCl2 (b), 0.5 ml NADP (c), 0.3 ml PMS (d), 0.6 ml MTT (e)
Phosphoglucose isomerase (PGI) | 5.3.1.9 | Yes | 25 ml Tris-HCl (a), 10 mg fructose 6-phosphate, 0.3 ml MgCl2 (b), 3 units glucose 6-phosphate dehydrogenase, 0.6 ml NADP (c), 1 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
Phosphoglucomutase (PGM) | 2.7.5.1 | Yes | 5 ml Tris-HCl (a), 25 ml water, 5 mg glucose 1-phosphate, 5 ml MgCl2 (b), 50 units glucose 6-phosphate dehydrogenase, 0.5 ml NADP (c), 0.5 ml PMS (d), 1 ml MTT (e)
Shikimate dehydrogenase (SKD) | 1.1.1.25 | No | 50 ml Tris-HCl (a), 30 mg shikimic acid, 2 ml MgCl2 (b), 1 ml NADP (c), 1 ml NAD (g), 0.5 ml PMS (d), 1 ml MTT (e)
(a) 0.2 M Tris-HCl, pH 8.0. (b) 0.1 M MgCl2. (c) NADP solution: 1% (w/v) in water. (d) PMS solution: 1% (w/v) in water. (e) MTT solution: 0.8% (w/v) in water. (f) 10 mM sodium phosphate buffer, pH 7.0. (g) NAD solution: 1% (w/v) in water. (h) Phenylalanyl-leucine; leucyl-glycyl-glycine; leucyl-alanine, etc.
Fig. 12.1. Electrophoretic patterns for glucose 6-phosphate dehydrogenase (G6P), isocitrate dehydrogenase (IDH) and alkaline phosphatase (ALK) in 20 isolates of Neisseria meningitidis. Anodal direction of migration from the origin is indicated by the arrow. The number under each band indicates the electromorph assignment.
Thus, for each individual bacterium, a single band of activity is expected and electrophoretic variation can be related directly to allelic variation at a single genetic locus. Several events, however, may lead to the presence of several bands of enzyme activity in individual bacteria. Contamination of the bacterial preparation during cultivation can be one of these, but this is easy to detect in that a single protein extract will usually present two (or three) bands of activity for several enzymes, while extracts from organisms of the same species have a single band. On rare occasions, additional bands of activity for one enzyme in a single bacterium may result from the occurrence of an additional copy of the gene carried on a plasmid (Caugant et al., 1981). For some enzymes, several bands of activity may appear in all isolates of a given species. This may be due to the existence of several conformational forms of certain enzymes (i.e., different multimeric structures of the same protein showing enzymatic activity) or to a low specificity of the staining method. The latter possibility is exemplified by the esterases, where several genetic loci code for esterases differing in their affinity for various substrates. To ensure that the recorded polymorphism originates from a single genetic locus, substrates that can be utilised only by more specific enzymes should be chosen (e.g., naphthyl-propionate instead of naphthyl-acetate). The use of esterase inhibitors may also eliminate some bands of activity and permit the identification of electromorphs coded by the same locus. Enzyme activities that are not being specifically stained for may sometimes be detected. In N. meningitidis, an additional band of activity appears at a similar location on most gels stained for a dehydrogenase. This unknown dehydrogenase is clearly polymorphic and has been incorporated as one of the enzymes routinely scored for epidemiological analysis of the meningococcus (Caugant et al., 1990a). However, it is important that allelic variation in such enzymes is recorded only once, although they may be revealed on gels stained for different enzymes. An analogous situation sometimes occurs with the peptidases. The detection of peptidases encoded by different genes can be achieved by using different peptides (see Table 12.3). However, in some species, a peptidase may have broad specificity and hydrolyse several peptides, e.g., phenylalanyl-leucine and leucyl-glycyl-glycine, although, in most bacteria, this is achieved by enzymes encoded by two distinct loci.
One or a few strains of a bacterial species may occasionally lack activity for an enzyme. Although housekeeping genes code for the stained metabolic enzymes, they are not always needed for survival, especially when the strain has been subcultured in the laboratory for a long time. Null alleles are recorded as such. It is essential, however, to assess that they do not result from a poor preparation or storage of the protein lysate. Because the electromorphs of an enzyme detected within a species are often very close to each other, measuring the distance of migration from the origin is not sufficient. All electromorphs must be identified by side by side comparison on the same gel slice. In practice, this means that the protein extract from a single bacterium will be electrophoresed and stained for the same enzyme many times before its electromorph can be unambiguously identified. When analysing a new set of strains, it is recommended to include on the gel at least the extracts from two reference strains of the same species, if possible presenting distinct electromorphs for most enzymes assayed. For a gel allowing electrophoresis of 20 samples, the reference extracts should be put in positions 7 and 14 in the gel. The first migration permits a rough evaluation of the electromorphs of the unknown strains in comparison to the two standards. Accurate electromorph assignment for each individual enzyme should be obtained by repeatedly running the strains side by side
with different reference electromorphs, until identity is ascertained. For species with only a few alleles at individual loci, identification of the electromorphs is an easy task. For highly polymorphic species, numerous successive runs are necessary to identify the electromorphs. Consequently, the amount of work involved in the analysis of 100 strains by MEE increases significantly in relation to the degree of polymorphism of the species.
F. Analysis of the data
For each isolate, an electromorph is assigned to each of the 10-20 enzyme loci. This combination of electromorphs represents the allelic profile of the bacterium or its multilocus genotype (Table 12.4). Distinct allelic profiles are also designated as electrophoretic types (ETs). Thus, each isolate can be characterised by its ET, and bacteria can readily be classified as having identical or different ETs. Further analyses of the data are performed using computers. Programs for analysing MEE data for bacterial strains have been developed and made available for general use by Dr T.S. Whittam (Selander et al., 1986a). These programs have been further improved in the past decade and are now available through the Internet at the home page of Dr Whittam's laboratory (http://www.bio.psu.edu/People/Faculty/Whittam/Lab/programs). The statistical package includes five programs: ETDIV, ETCLUS, ETMEGA, ETLINK and ETBOOT. To analyse the electrophoretic data for a set of strains, each electromorph must be given as an integer. Null alleles are coded as "0" and are treated as missing information. Consequently, strains with null alleles are assigned to the same ET as strains with similar enzyme profiles, but which express enzyme activity at the locus for which a null allele is found. If it is wished to distinguish the strains with null alleles, the electromorph should be assigned an integer different from the electromorphs represented in the collection of strains. The input data files need to be saved as text files. ETDIV identifies ETs within the group of bacterial isolates analysed. It provides a list of the distinct allelic profiles, indicates the number of isolates with each ET, and lists the isolates belonging to each ET represented by more than one strain.
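The assignment of isolates to ETs described above can be illustrated with a short sketch. This is not the ETDIV program itself; it is a minimal illustration that assumes allelic profiles are tuples of integers, that "0" marks a null allele treated as missing information, and that each isolate is placed in the first ET whose representative profile agrees with it at all loci scored in both (a simple greedy rule chosen here for illustration). The isolate names and profiles are hypothetical.

```python
def compatible(profile_a, profile_b):
    """True if two profiles agree at every locus scored in both.

    Assumption: allele 0 denotes a null allele and is treated as missing.
    """
    return all(a == b or a == 0 or b == 0 for a, b in zip(profile_a, profile_b))


def assign_ets(profiles):
    """Map each isolate to an ET number (1, 2, ...) in order of first appearance.

    profiles: dict of isolate name -> tuple of integer electromorphs.
    Greedy first-match rule; an illustrative choice, not the ETDIV algorithm.
    """
    representatives = []  # one representative profile per ET
    assignment = {}
    for isolate, profile in profiles.items():
        for et_number, representative in enumerate(representatives, start=1):
            if compatible(profile, representative):
                assignment[isolate] = et_number
                break
        else:
            # No existing ET matches: this profile founds a new ET.
            representatives.append(profile)
            assignment[isolate] = len(representatives)
    return assignment


# Hypothetical isolates scored at three loci; isolate "C" has a null allele.
example = assign_ets({
    "A": (4, 2, 3),
    "B": (4, 2, 3),
    "C": (4, 0, 3),
    "D": (5, 2, 3),
})
```

In this example, isolate "C" falls into the same ET as "A" and "B" because its null allele is ignored. Recoding the null allele with an integer not otherwise used at that locus would place it in a separate ET, as noted in the text.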
For each enzyme locus, the alleles identified are indicated together with their frequency, and from the allele frequencies the genetic diversity is calculated as $h = \left(1 - \sum_i x_i^2\right)\,\dfrac{n}{n-1}$, where $x_i$ is the frequency of the $i$th allele and $n$ is the number of ETs (Nei, 1978). Mean diversity per locus (H) is the arithmetic average of h values over the loci studied. The strains to be analysed can be grouped beforehand in defined populations, according to specific parameters, such as their geographical origin, clinical source, serotype, biotype, etc. ETDIV will analyse the allele frequencies and genetic diversity concomitantly for isolates in each population and for the whole sample. A table will then be generated indicating the ET diversity within the population and in the whole sample, together with the coefficient of genetic differentiation for each locus and averaged over all loci (Nei, 1977).
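The diversity calculation above can be sketched in a few lines. The example below is an illustration of the formula for h and of the mean diversity H, not a reproduction of ETDIV; it assumes that each ET is a tuple of integer electromorphs, that null alleles coded as 0 are excluded from the locus at which they occur, and that n is then the number of ETs scored at that locus. The function names are hypothetical.

```python
from collections import Counter


def locus_diversity(alleles):
    """Genetic diversity h = (1 - sum(x_i^2)) * n / (n - 1) at one locus.

    alleles: one integer electromorph per ET.
    Assumption: 0 is a null allele and is excluded as missing data.
    """
    scored = [a for a in alleles if a != 0]
    n = len(scored)
    if n < 2:
        return 0.0  # diversity is undefined for fewer than two ETs; report 0
    counts = Counter(scored)
    sum_squared_frequencies = sum((c / n) ** 2 for c in counts.values())
    return (1.0 - sum_squared_frequencies) * n / (n - 1)


def mean_diversity(et_profiles):
    """Mean diversity per locus (H): arithmetic average of h over all loci."""
    loci = list(zip(*et_profiles))  # transpose: one tuple of alleles per locus
    return sum(locus_diversity(locus) for locus in loci) / len(loci)


# Hypothetical data: four ETs scored at two loci.
ets = [(1, 1), (1, 1), (2, 1), (2, 1)]
h_first_locus = locus_diversity([et[0] for et in ets])
H = mean_diversity(ets)
```

For the first locus, two alleles at frequency 0.5 give h = (1 - 0.5) × 4/3, about 0.67; the second locus is monomorphic, so h = 0 and H is about 0.33.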
Table 12.4. Electrophoretic types of 15 Streptococcus pyogenes strains isolated in Gloucestershire, 1994. Strains from patients are indicated in bold. ET
Strain
Source
M-type
Allele at indicated enzyme loci" NSP
CAK
PE3
EST
IPO
ADK
HEX
MPI
PGI
PM1
PM2
Nose
4
2
3
4
1
2
2
2
4
3
2
325 327 389 1239 1240
Wound Nose Throat Blood Throat
4 4 4 4 4
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
1 1 1 1 1
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
3 3 3 3 3
1 1 1 1 1
470
Wound
4
2
4
3
1
2
2
2
3
388 446 475
Tissue Throat Throat
4 4 4
2 2 2
3 3 3
3 3 3
1 1 1
2 2 2
2 2 2
2 2 2
5
328 444
Throat Throat
4 4
3 3
2 2
3 3
1 1
1 1
2 2
6
972 973
Aspirate Blood
5 5
2 2
3 3
3 3
1 1
2 2
7
443
Skin
3
2
7
5
1
2
1
471
PE2
G3P
LAP
3
2
3
12
2 2 2 2 2
12 12 12 12 12
2 2 2 2 2
3 3 3 3 3
1 1 1 1 1
3
2
2
2
3
9
2 2 2
3 3 3
1 1 1
6 6 6
2 2 2
3 3 3
3 3 3
2 2
3 3
2 2
2 2
8 8
2 2
3 3
6 6
2 2
4 4
3 3
1 1
2 2
4 4
1 1
3 3
NT NT
2
3
3
3
1
6
1
3
NT
(a) Enzyme abbreviations: NSP, nucleoside phosphorylase; CAK, carbamate kinase; PE3, leucyl-glycyl-glycine peptidase; EST, α-naphthyl-propionate esterase; IPO, indophenol oxidase; ADK, adenylate kinase; HEX, hexokinase; MPI, mannose 6-phosphate isomerase; PGI, phosphoglucose isomerase; PM1 and PM2, two phosphoglucomutases; PE2, phenylalanyl-leucine peptidase; G3P, glyceraldehyde 3-phosphate dehydrogenase; and LAP, leucine aminopeptidase.
Finally, the last table of results provides the proportion of polymorphic loci and mean number of alleles per locus within each population, as well as the genetic diversity and standard deviation among both ETs and isolates within each population. ETDIV also generates a file named ETLIST.DAT which is used as the input of the ETCLUS program. ETCLUS provides a dendrogram of genetic relationships between ETs based on the average-linkage algorithm (UPGMA), as described by Sneath & Sokal (1973). Distances are measured as the proportion of loci at which mismatches occur between pairs of ETs. Null alleles that have been scored as '0' are not used in the calculation of pairwise distances. The program generates the distances between the ETs and their nearest relative and provides a simple drawing of the dendrogram. For publication, the dendrogram needs to be drawn again, either manually or using other programs. An alternative is to use ETMEGA and the MEGA program. ETMEGA uses the same input file format and has the same default parameter values as ETCLUS, and creates a matrix of genetic distances for input into the MEGA program (Kumar et al., 1994). MEGA is available upon request from these authors, using an order form linked into the homepage of Dr Whittam's laboratory. ETLINK calculates several measures of linkage disequilibrium, including the distribution of the standardised coefficient between all pairs of alleles, the two-locus coefficient for multiple alleles per locus, and the index of multilocus association I_A based on the properties of the mismatch distribution (Maynard Smith et al., 1993). ETBOOT is a bootstrap program that randomly selects loci, obtains a distance matrix, finds a tree based on the average-linkage or the neighbour-joining algorithm, and records the nodes of the tree. The process is repeated for a number of bootstrapped trees, as determined by the user.
ETBOOT then tabulates the number and frequency of each observed node recovered among the randomly generated trees. Alternative procedures used in analysing MEE data include principal components and principal co-ordinates analyses.
12.3 APPLICATIONS OF MEE
The applications of MEE are numerous and have led to important discoveries regarding the population biology of microorganisms, with practical consequences for medical microbiology. MEE data have been used to assess the taxonomic relationships between organisms, the genetics of bacterial populations and the molecular epidemiology of infectious diseases. In all these applications, but especially for systematics and population genetics, it is essential that the selection of genes is random, without a priori consideration of their degree of polymorphism. The examples cited in this review are taken exclusively from studies of medically important organisms and represent only a very small fraction of the contribution of MEE to bacterial molecular population genetics, epidemiology and taxonomy.
A. Bacterial systematics
The application of MEE to microbial systematics is limited to the study of closely related species, i.e., normally within the same genus. For phylogenetically more remote microorganisms, sharing of alleles at individual loci is unlikely. Thus, the genetic distances, calculated as the proportion of allelic mismatches, will be close to 1 and, consequently, the relationships between the groups of organisms cannot be established. Nevertheless, as will be seen from the examples below, because of the strong correlation between chromosomal divergence indexed by MEE data and by DNA-DNA hybridisation studies, MEE has contributed significantly to taxonomic questions, and several bacterial species have been identified solely after their existence was evidenced by population genetic analyses employing MEE.
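The genetic distance referred to above, the proportion of loci at which two ETs carry different alleles, can be sketched as follows. This is an illustrative implementation rather than the ETCLUS code; it assumes integer-coded profiles in which 0 marks a null allele, and it leaves out of the comparison any locus at which either ET has a null allele. The function names and data are hypothetical.

```python
def genetic_distance(profile_a, profile_b):
    """Proportion of compared loci at which two allelic profiles mismatch.

    Assumption: loci where either profile has a null allele (0) are skipped.
    """
    compared = [(a, b) for a, b in zip(profile_a, profile_b) if a != 0 and b != 0]
    if not compared:
        raise ValueError("no locus is scored in both profiles")
    mismatches = sum(1 for a, b in compared if a != b)
    return mismatches / len(compared)


def distance_matrix(profiles):
    """Symmetric matrix (list of lists) of pairwise genetic distances."""
    size = len(profiles)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d = genetic_distance(profiles[i], profiles[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Hypothetical ETs: two close relatives and one sharing no allele with them.
ets = [(1, 2, 3, 4), (1, 2, 4, 0), (5, 6, 7, 8)]
distances = distance_matrix(ets)
```

The third ET shares no allele with the other two, so its distance to both is 1; this is the situation described above for phylogenetically remote organisms, in which the distances saturate and carry no further information on relationships.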
(i) Borrelia species associated with Lyme disease
Boerlin et al. (1992) analysed 50 isolates from humans and ticks classified as B. burgdorferi and identified three main genetic clusters that were differentiated from one another at a genetic distance of >0.75. They suggested that each genomic division represented a different genospecies, and defined division I, which was found exclusively in the United States, as B. burgdorferi sensu stricto. The existence of these three genospecies was subsequently confirmed by DNA-DNA reassociation experiments, and the two new species, B. garinii and B. afzelii, were formally recognised. Later, Balmelli & Pifaretti (1995; 1996) identified additional genomic groups, some of which have been associated with distinct clinical manifestations of Lyme borreliosis.
(ii) Identification of cryptic Legionella spp.
Since Legionella pneumophila was first recognised as a causative agent of pneumonia in 1977, more than 30 species of Legionella have been described. One of the first pieces of evidence for the existence of several cryptic species came from the MEE study of Selander et al. (1985) who, in an analysis of 292 isolates, identified two groups of strains diverging from the L. pneumophila isolates at a genetic distance of >0.50. These two species were later confirmed by DNA-DNA hybridisation experiments and numerous other methods (Brenner et al., 1985). The usefulness of MEE for characterising other genomic species of Legionella has been shown by the studies of Woods et al. (1988), Lanser et al. (1990) and Marques et al. (1995).
(iii) Differentiation of newly recognised Prevotella spp.
Two genotypes of Prevotella intermedia were elevated to the rank of species on the basis of the level of homology between whole-cell DNA (Shah & Gharbia, 1992), and were designated as P. intermedia and P. nigrescens. Because of the possible association of P. intermedia and periodontal disease, it was important to find additional, easy-to-use methods to distinguish the two genotypes. Analyses of strains assigned to the two species by MEE with 14 enzyme loci revealed two genetically distinct populations separated at a genetic distance of 0.77, thus providing an adequate tool to assign strains to these two species (Frandsen et al., 1995).
(iv) Listeria spp.
Boerlin et al. (1991) performed an extensive study of the Listeria genus using MEE. Seventy-three strains of seven Listeria species were characterised at 18 enzyme loci. The analysis revealed six main clusters diverging at genetic distances of >0.8, thus confirming the delineation of the species L. monocytogenes, L. innocua, L. welshimeri, L. seeligeri and L. ivanovii. However, L. grayi and L. murrayi formed a single cluster (Fig. 12.2). It was concluded that differentiation of these two species was not justified by the genetic data (Boerlin et al., 1991) and that L. grayi and L. murrayi should be considered as two biovars of the same species.
B. Population genetics
Bacteria are an attractive group of experimental organisms for population geneticists because of their large degree of diversity, short generation times, haploid chromosomal genomes, and numerous accessory genetic elements (Musser, 1996). Research in bacterial population genetics had its origin mainly in the work of population geneticists interested in bacteria (Milkman, 1973; Selander & Levin, 1980); only in recent years have microbiologists become interested in population genetics. The general thinking of bacterial population genetics is very different from that employed in the molecular study of infectious disease outbreaks, where the main question is to determine whether isolates are the same or not. Still, even when dealing with short term or local epidemiology, some knowledge of the population structure of the microorganism is necessary to ensure a correct interpretation of the data. Population genetics studies of bacteria using MEE have been concerned with determining the relative contributions of mutation and recombination in generating genetic diversity. For public health decisions, the rate of horizontal transfer of genes is important to consider when evaluating the risks associated with genetically engineered microorganisms and with transmission of antibiotic resistance and virulence genes between pathogens. Early studies, using E. coli as a model, showed a very limited transfer of genetic material between individual strains, with most of the genetic diversity resulting from the mutational process (Caugant et al., 1981; Selander et al., 1987a). These observations led to the confirmation of the clonal concept of Orskov & Orskov (1983), which states that much of the transmission of genetic material in E. coli occurs vertically from parents to offspring, with mutation leading to the generation of occasional variants. Recombination occurs at such a low frequency that it is considered insignificant as a source of genetic diversification. 
This view was then extended to numerous other species and resulted in the clonal paradigm for the genetic structure of bacterial populations.
[Fig. 12.2: dendrogram of Listeria ETs. Columns: serovar, species, ET; clusters I-VI; horizontal axis: genetic distance, from 1.0 to 0.]
Fig. 12.2. Genetic relationships among ETs of seven Listeria species. The dendrogram was generated by the average-linkage method of clustering from a matrix of pairwise coefficients of genetic distances, based on electrophoretically demonstrable allelic variation at 18 enzyme loci. MONO, L. monocytogenes; INNO, L. innocua; WELS, L. welshimeri; SEEL, L. seeligeri; IVAN, L. ivanovii; GRAY, L. grayi; MURR, L. murrayi. From Boerlin et al., 1991.
In a clonal population, alleles at different loci are strongly associated, i.e., they are in linkage disequilibrium. As a result, different methods based on different marker systems will lead to largely similar interpretations. In a freely recombining or panmictic population, no linkage disequilibrium is expected. Then, the use of different markers may result in very different conclusions. Linkage disequilibrium, however, may arise in a bacterial population in many ways, sometimes even as a result of recombination. To unequivocally determine the extent of clonality within bacterial populations, Maynard Smith et al. (1993) developed a new test, termed the index of association. Analysing MEE data for a variety of species, these authors showed that the actual population structure of bacteria ranged from being strictly clonal, as exemplified by the species E. coli and Salmonella, to being effectively sexual or panmictic in species naturally competent for transformation, such as Neisseria gonorrhoeae (O'Rourke & Spratt, 1994) and Streptococcus pneumoniae (Lomholt, 1995; Hall et al., 1996). Other species, e.g., N. meningitidis (Maynard Smith et al., 1993) and Serpulina hyodysenteriae (Trott et al., 1997b), present an epidemic population structure; i.e., although horizontal genetic exchange occurs at high frequency, a few clonal groups with selective advantage occasionally emerge and dominate the overall panmictic population. Fig. 12.3 shows a comparison of indices of association, calculated from ETs reported in the literature, for different bacterial species.
Fig. 12.3. Index of association in selected bacterial populations. I_A values were those calculated from ETs as reported in the literature: S. pneumoniae (Hall et al., 1996); S. pilosicoli (Trott et al., 1998); H. pylori (Go et al., 1996); B. cepacia (Wise et al., 1995); S. hyodysenteriae (Trott et al., 1997b); S. aureus (Kapur et al., 1995); B. pertussis (Van der Zee et al., 1997); H. influenzae (Fusté et al., 1996); P. multocida (Blackall et al., 1998); other values are from Maynard Smith et al. (1993).
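The index of association can be sketched from its usual definition, I_A = V_O / V_E - 1, where V_O is the observed variance of the number of loci at which pairs of profiles differ and V_E is the variance expected under linkage equilibrium, computed as the sum over loci of h_j(1 - h_j) with h_j = 1 - Σ p_ij². The code below is a minimal illustration under stated assumptions and is not the ETLINK program: the observed variance is taken as the population variance over all pairs, every allele code (including 0) is treated as an ordinary allele, and the function name and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def index_of_association(profiles):
    """I_A = V_O / V_E - 1 for a list of integer allelic profiles.

    V_O: variance of the number of mismatching loci over all pairs of profiles
         (population variance; an assumption of this sketch).
    V_E: sum over loci of h_j * (1 - h_j), with h_j = 1 - sum(p_ij ** 2).
    """
    n = len(profiles)
    if n < 2:
        raise ValueError("at least two profiles are required")

    # Mismatch count K for every pair of profiles.
    k_values = [
        sum(1 for a, b in zip(p, q) if a != b)
        for p, q in combinations(profiles, 2)
    ]
    mean_k = sum(k_values) / len(k_values)
    v_observed = sum((k - mean_k) ** 2 for k in k_values) / len(k_values)

    # Expected variance if alleles at different loci were independent.
    v_expected = 0.0
    for locus in zip(*profiles):
        counts = Counter(locus)
        h = 1.0 - sum((c / n) ** 2 for c in counts.values())
        v_expected += h * (1.0 - h)
    if v_expected == 0.0:
        raise ValueError("all loci are monomorphic; I_A is undefined")

    return v_observed / v_expected - 1.0


# Hypothetical strictly clonal sample: two clones with fully associated alleles.
clonal = [(1, 1, 1), (1, 1, 1), (2, 2, 2), (2, 2, 2)]
ia_clonal = index_of_association(clonal)
```

In this artificial two-clone sample the alleles at the three loci are completely associated, and the index is clearly positive (5/3). In a population at linkage equilibrium the expected value is zero, although small samples can depart from it by chance.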
C.
Molecular epidemiology
Distinct clones within a species are frequently characterised by unique combinations of virulence genes or alleles of virulence genes. Thus, the identification of pathogenic clones has important implications for our understanding and control of infectious diseases. MEE measures variation that is accumulating very slowly in the population. Thus, for short-term or local molecular epidemiology of the highly clonal species, the method presents relatively moderate discriminatory power. For some salmonellae, for example, MEE may not be sufficient to ascertain whether or not strains belong to the same outbreak, or whether a particular food source may be linked to an outbreak. In contrast, for species with significant rates of recombination in nature, such as the transformable species N. meningitidis and S. pneumoniae, MEE has proved very effective for short-term or local epidemiology. For long-term or global epidemiology of nearly all bacterial species, MEE has been, until recently, the most appropriate method. As will be seen from the examples below, although MEE has been applied to the study of bacteria for less than two decades, it has already provided considerable insight into the molecular mechanisms of temporal and geographical variation in disease frequency, the adaptation of clonal lineages to the environment, and the relationship between disease severity and specific naturally occurring bacterial clones. However, because it is somewhat cumbersome, the method has only been used by a limited number of specialised laboratories.
(i) Neisseria meningitidis
Extensive work has been devoted in the last 15 years to analysing the molecular epidemiology of N. meningitidis, predominantly using MEE.
Thousands of meningococcal strains from patients and healthy carriers in all continents have been analysed and many hundreds of clones have been identified (Caugant et al., 1986; 1990a; Olyhoek et al., 1987; Achtman et al., 1990; 1995; Ashton et al., 1991; Woods et al., 1992; Caugant, 1998; Maiden et al., 1998). In spite of tremendous clonal diversity, most meningococcal disease in the world is caused by a handful of lineages of very closely related clones, reflecting the epidemic population structure of the organism. One such lineage (the ET-5 complex), which has no close genetic relationship to other lineages of the species, was responsible for an epidemic of serogroup B meningococcal disease that began in Norway in the early 1970s, and subsequently spread through much of Europe. Clones of the ET-5 complex have been traced to all continents, where they have caused outbreaks and epidemics of invasive disease (Caugant, 1998). MEE analysis has provided evidence of the dynamics of meningococcal clones causing disease. In a study of serogroup B isolates that caused invasive disease in The Netherlands between 1958 and 1986, significant temporal variation in the clonal composition of meningococcal populations was found (Caugant et al., 1990a). Starting in 1980, a new clone complex (called lineage III) was identified
and, in the subsequent years, its prevalence increased to reach 20% of the disease isolates, becoming the most prevalent clone causing disease in the population (Scholten et al., 1994). Thereafter, lineage III spread to many European countries, then reached New Zealand in the mid-1990s where a severe epidemic is ongoing, affecting especially the Maoris and Pacific Islanders (Martin et al., 1998a, b). An increase of invasive disease due to serogroup C N. meningitidis strains has been recently reported in several countries, including Canada (Ashton et al., 1991), various regions of the United States (Jackson et al., 1995), the Czech Republic (Kriz et al., 1999), the United Kingdom (Kaczmarski, 1997) and Australia (Jelfs et al., 1998). The organism responsible for these outbreaks is a new variant of an "old" clone-complex of N. meningitidis that can be differentiated from the ancestral clone at a single enzyme locus, the fumarase gene, and was designated ET-15 (Ashton et al., 1991). ET-15 organisms have a significantly higher case-fatality ratio than other invasive meningococcal disease isolates, which may be due to lower herd immunity to the newly emerged clone (Whalen et al., 1995). Clonal analysis of N. meningitidis strains expressing the serogroup A capsular polysaccharide has shown that they are a restricted phylogenetic subpopulation of the species (Caugant et al., 1987). Serogroup A meningococcal strains are unusual in that they may cause large epidemics, with an incidence of >500/100,000, sometimes encompassing several countries or continents. Since the Second World War, such epidemics have been restricted to China and the Sahel region of sub-Saharan Africa. Achtman and co-workers have assembled and characterised serogroup A meningococcal isolates representing the organisms responsible for most epidemics or outbreaks since the 1960s (Olyhoek et al., 1987; Wang et al., 1992). 
Most epidemics were due to a single clone, and the same clone was often responsible for epidemics in contiguous countries. One such clone, designated clone III-1, has been responsible for two pandemics that both started in China, 15 years apart (Fig. 12.4). The first pandemic moved from China in the 1960s to Romania, Russia and Scandinavia in 1969, then to Brazil at the beginning of the 1970s. The second clone III-1 pandemic started in the early 1980s, again in China. In August 1987, 7000 cases of meningococcal disease occurred during the annual Haj pilgrimage to Mecca, Saudi Arabia, probably carried by pilgrims from South Asia (Moore et al., 1989). The clone was then spread worldwide by returning pilgrims. Cases were reported in the United States, England and France among pilgrims and their close contacts, but the strain did not spread further and no epidemics developed in these countries (Achtman, 1995). Clone III-1 strains were then introduced for the first time to the African continent. Major epidemics followed in Chad, Ethiopia, Sudan and Kenya in 1988 and 1989 (Moore et al., 1989; Tekle Haimanot et al., 1990; Salih et al., 1990; Pinner et al., 1992), followed by Niger, the Central African Republic, Burundi, Guinea, Mali, Zambia, Cameroon, Uganda and Rwanda (Caugant, 1998). In 1996, the sub-Saharan region of Africa was again affected by a clone III-1 epidemic of unprecedented scale with over 150,000 reported cases and 16,000 deaths, of which nearly 80% occurred in two countries, Burkina Faso and Nigeria. Meningitis outbreaks caused by clone III-1 are still ongoing in sub-Saharan Africa. In 1999, Sudan experienced its third clone III-1 epidemic in 10 years, and Senegal was reached for the first time by a clone III-1 epidemic. Thus, the introduction of clone III-1 in Africa after the Haj pilgrimage in 1987 had extremely severe consequences in a population never previously exposed to the organism, leading to epidemics and outbreaks encompassing basically the whole continent.
Fig. 12.4. Geographic spread of serogroup A N. meningitidis belonging to clone III-1.
(ii) Streptococcus pyogenes
Severe invasive infections and episodes of acute rheumatic fever caused by S. pyogenes have been reported with increased frequency in recent years in the United States (Schwartz et al., 1990), Europe (Martin & Høiby, 1990; Holm et al., 1992) and elsewhere (Martin & Single, 1993; Carapetis et al., 1995). Many infections have occurred in previously healthy subjects, and many infected patients have presented with clinical symptoms similar to the toxic shock syndrome caused by Staphylococcus aureus, leading to the characterisation of a streptococcal toxic-shock-like syndrome (TSLS). Musser et al. (1991) characterised the clonal relationships among 108 isolates of S. pyogenes recovered from patients with TSLS or other invasive diseases in the United States by using MEE at 12 loci and an analysis of exotoxin A, B and C production. Thirty-three clones were identified, but nearly half of the disease episodes, including more than two-thirds of the cases of TSLS, were caused by strains of two related clones. These were designated ET-1 and ET-2, and were associated with the M1 and M3 protein serotypes, respectively. The production of pyrogenic exotoxin A (SPE-A), either alone or in combination with other pyrogenic exotoxins, was associated with recovery of the strains in TSLS patients. Analysis of ET-1 and ET-2 strains from disease episodes in the 1920s and 1930s revealed a different speA allele (speA1) than that seen in recent isolates. The speA1 allele was also found in various other clones of the species and was probably the ancestral type. 
This change in the SPE-A exotoxin may, in part, explain the temporal and geographical variation in disease frequency and severity due to these clones (Musser et al., 1993a, b). Further analyses of the genetic diversity of M1 organisms were performed to encompass strains associated with severe invasive disease on an intercontinental basis (Musser et al., 1995). Limited diversity was revealed by MEE and pulsed-field gel electrophoresis (PFGE), with only six ETs and 16 PFGE-types distinguished. The study showed that the M1 serotype did not represent a distinct lineage of S. pyogenes. One subclone of ET-1, of PFGE-type 1a, was recovered worldwide. Virtually all isolates of this subclone had identical speA, emm1, speB and ska alleles, showing that these organisms shared a common ancestor and that global dispersion of this M1 subclone has occurred very recently (Musser et al., 1995). The lack of congruency between variation in the emm, ssa and ska sequences, and estimates of overall chromosomal relationships determined by MEE, demonstrated that horizontal transfer and recombination play a fundamental role in diversifying natural populations of S. pyogenes (Whatmore et al., 1994; 1995; Kapur et al., 1995a; Reda et al., 1996). Most M1 strains associated with the recent increases in invasive disease worldwide were extremely similar, probably as a result of recent
descent from a common ancestor. However, the sic gene, which codes for an extracellular protein that inhibits complement (streptococcal inhibitor of complement), unexpectedly showed a high level of polymorphism (Perea Mejia et al., 1997). A total of 62 alleles were revealed by sequence analysis of the sic gene in 165 M1 isolates. The variation was produced by in-frame insertions and deletions, and basically all nucleotide substitutions resulted in an amino acid change in the Sic protein, indicating that natural selection is mediating structural changes in the protein (Stockbauer et al., 1998). Thus, when applied to S. pyogenes, MEE analyses have provided a population genetic framework against which allelic variation in putative virulence genes can be studied.
(iii) Streptococcus pneumoniae
S. pneumoniae is a major cause of illness and death in children and adults worldwide. Capsular polysaccharide is an essential virulence determinant providing protection from phagocytosis, and 84 capsule serotypes have been described. Relatively recently, antibiotic resistance in S. pneumoniae became a global public health problem, and most population genetic studies using MEE have been concerned with the emergence and spread of antibiotic resistance in this bacterium. Resistance to penicillin in S. pneumoniae is due to the expression of altered high molecular mass penicillin-binding proteins (PBPs) that have reduced affinity to β-lactams (Dowson et al., 1989). Using MEE, Munoz et al. (1991) showed that a multiresistant clone of serotype 23F had spread intercontinentally from Spain to the United States. The same clone was also identified later in South Africa (Sibold et al., 1992). The dramatic consequences of the introduction of a virulent multiresistant S. pneumoniae clone into a population can be illustrated by such an event in Iceland. 
The first penicillin-resistant strain was recovered in December 1988; thereafter, the frequency of penicillin-resistant organisms rose sharply over the next 3 years to reach 17% of all isolates in the first quarter of 1992 (Kristinsson et al., 1992). Almost 70% of the resistant isolates expressed serogroup 6 capsule polysaccharide, and were also resistant to tetracycline, chloramphenicol, erythromycin and trimethoprim-sulphamethoxazole. Soares et al. (1993) examined 57 such organisms for serotype, PBP pattern, PFGE pattern and ET, and found that all isolates were of serotype 6B and had closely similar or identical patterns for each of the molecular markers examined. The Icelandic organisms were indistinguishable from a subgroup of multiresistant serotype 6B pneumococci that had been present with high incidence in Spain for the past two decades. Thus, the authors concluded that the multiresistant Icelandic clone was probably imported from Spain. Lomholt (1995) used MEE to examine pneumococcal isolates of the serotypes associated with severe childhood disease in Northern Europe. In contrast to the apparent clonality of strains harbouring resistance genes, the study revealed a high degree of genotypic diversity, with 70 ETs represented among the 114 strains analysed, and linkage equilibrium between loci calculated for 66 two-locus comparisons and six four-locus comparisons. Further evidence was provided by the study of Hall et al. (1996) which confirmed the lack of linkage disequilibrium in ETs of S. pneumoniae. It was suggested that, as with the meningococci, pneumococci have a freely recombining population structure with occasional epidemic spread of rare successful clones. Horizontal transfer and recombinational processes have also been shown to generate variation in capsule type, PBP and the immunoglobulin A1 protease gene (Dowson et al., 1989; Coffey et al., 1995; Lomholt, 1995).
(iv) Listeria monocytogenes
L. monocytogenes is widely spread in the environment and also occurs in the intestinal tract of healthy animals and man. In immunologically-compromised individuals, the neonate and foetus, this organism may cause serious invasive disease with a mortality rate of approximately 30%. While the incidence of sporadic disease is low (2-10 cases per million per year), several outbreaks of listeriosis have been traced to contaminated foods, causing great concern in the medical community and the food industry. Numerous epidemiological studies have since been undertaken to elucidate the routes of transmission of the bacteria from raw food products, through the food chain, to the consumers. Piffaretti et al. (1989) distinguished 45 ETs among 175 isolates recovered from man, animals, food and the environment in several countries. Common to all MEE studies of L. monocytogenes, two well-defined phylogenetic divisions of ETs were identified, designated cluster I and cluster II (Piffaretti et al., 1989; Bibb et al., 1990; Trott et al., 1993). While many multilocus genotypes in cluster I were associated with disease in man or animals, strains of two closely related clones were responsible for two-thirds of the cases of disease, including four epidemics occurring in widely separated geographic regions. Thus, a few clones of L. 
monocytogenes may present characteristics associated with specific ecological and epidemiological adaptation. It has been hypothesised that strains in cluster I might be more virulent (Nørrung & Skovgaard, 1993; Trott et al., 1993), as major outbreaks have been linked to strains of this lineage. Sequence analyses of parts of putative virulence genes, such as the listeriolysin gene (hly), the flagellin gene (flaA), the invasion-associated protein gene (iap) (Rasmussen et al., 1991; 1995), and the actin nucleating protein gene (actA), a key determinant of L. monocytogenes virulence (D. Caugant, unpublished data), have confirmed the existence of at least two very divergent evolutionary lineages in the species, and support the idea of differences in the pathogenic potential of these lineages. Several laboratories have used MEE to identify or confirm the routes of transmission of the bacteria in outbreaks and sporadic cases of human listeriosis (Bibb et al., 1989; 1990; Schwartz et al., 1989; Boerlin & Piffaretti, 1991; Farber et al., 1991; Pinner et al., 1992; Nørrung & Skovgaard, 1993; Trott et al., 1993; Boerlin et al., 1996), as well as in outbreaks of animal listeriosis (Baxter et al., 1993). MEE has enabled the links between contaminated foodstuffs and cases to be ascertained, and was able to rule out a single common source as a cause of an outbreak involving 36 cases in the Philadelphia area over a 4-month period (Schwartz et al.,
1989). In the food industry, it was generally believed that contamination of products with L. monocytogenes resulted from the spread of bacteria that originated from animal or fish sources. However, the studies of Boerlin & Piffaretti (1991), Harvey & Gilmour (1994), Rørvik et al. (1995), Nesbakken et al. (1996) and Boerlin et al. (1997) have shown that contamination of meat and fish products with L. monocytogenes originates mainly from the processing environment rather than from animal or fish sources.
12.4 MLST METHODOLOGY
Recent developments in automation and nucleic acid sequencing chemistries have made possible the use of large-scale DNA sequencing techniques. These can be used to rapidly and unambiguously identify a causative infectious agent, and confirm or refute the identity of isolates recovered from temporally-linked patients thought to be involved in a disease outbreak. Accurate strain identification using a newly developed multilocus typing scheme based on DNA sequencing, designated multilocus sequence typing (MLST), has improved epidemiological surveillance of major bacterial pathogens, such as N. meningitidis and S. pneumoniae. MLST and MEE are based on the same principles, i.e., the analysis of allelic variation in multiple, randomly selected, housekeeping genes that diversify through random accumulation of neutral variation. While the technique of MEE identifies naturally occurring allelic variation in chromosomal housekeeping genes indirectly by indexing the variation at the protein level, MLST uses variations in fragments of the nucleic acid sequence (about 500 bases) that make up these housekeeping genes. The principles and an evaluation of MLST were recently reviewed by Spratt (1999). MLST clearly presents a number of advantages over MEE. The technique is highly automated and the sequence data are unambiguous. Thus, data can be readily compared between laboratories. 
No reference strains are needed to standardise the results within or between laboratories. This is a significant advantage because, for a very polymorphic organism such as N. meningitidis, more than 50 reference strains are necessary to include all the alleles at the 14 loci studied. Potentially, the MLST method can be utilised even without prior cultivation of the organism. Due to its electronic portability via the Internet and the possibilities for creation of a global database, the technique has considerable importance for international epidemiological surveillance. Currently, MLST schemes have been developed for three organisms: N. meningitidis (Maiden et al., 1998), S. pneumoniae (Enright & Spratt, 1998) and Helicobacter pylori (Achtman et al., 1999). In each case, seven housekeeping gene loci have been used, which appears to be the minimal set required to obtain a good representation of the overall genome. The genes sequenced should be widely separated on the chromosome and should not be adjacent to genes that might be under selective pressure. While working with a more limited number of genes than with
MEE, the degree of resolution at individual loci is much higher; thus, the discriminatory potential of MLST is extremely high. For meningococci, it has been estimated that the typing scheme could resolve over 24 million sequence types (STs) (Maiden et al., 1998). Fig. 12.5 compares the cumulative number of types obtained in a collection of 70 N. meningitidis strains from patients by MEE and MLST, respectively. The individual genes are ordered according to the number of discriminated alleles, starting with the more polymorphic. While MLST provided a higher degree of discrimination with the first loci, the number of types discriminated by each method was very similar after reaching seven loci, and was increased only slightly by the use of seven additional loci in MEE. Consequently, MLST provides very comparable information to that obtained with MEE, and it can accordingly be used as a general tool for studies of population genetics and the molecular epidemiology of bacteria. The one inconvenient aspect of MLST compared with MEE is that some a priori knowledge of the nucleotide sequence of the organism to be studied is necessary to determine which genes will be suitable for the typing scheme and to enable the design of appropriate primers.
Fig. 12.5. Comparison of the discriminatory power of MEE and MLST in a collection of N. meningitidis isolates from patients. The cumulative number of types distinguished by addition of genes is indicated by circles for the MLST scheme and triangles for the MEE scheme. [Plot not reproduced; x-axis: No. genes, 1 to 14.]
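The cumulative count of types described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used to produce Fig. 12.5: the function name and the toy allelic profiles are hypothetical, and loci with equal numbers of alleles are simply kept in their original order.

```python
def cumulative_types(profiles):
    """Count distinct types as loci are added, most polymorphic locus first.

    profiles: list of equal-length tuples, one allele label per locus.
    Returns (locus order, cumulative number of distinct types).
    """
    n_loci = len(profiles[0])

    # Order loci by the number of alleles observed, most polymorphic first.
    order = sorted(
        range(n_loci),
        key=lambda j: len({p[j] for p in profiles}),
        reverse=True,
    )

    counts = []
    for k in range(1, n_loci + 1):
        chosen = order[:k]
        # A "type" is the combination of alleles at the loci chosen so far.
        counts.append(len({tuple(p[j] for j in chosen) for p in profiles}))
    return order, counts


# Toy allelic profiles for four isolates at three loci (hypothetical data).
toy_profiles = [(1, 1, 1), (1, 2, 1), (2, 3, 2), (1, 3, 2)]
order, counts = cumulative_types(toy_profiles)
print(order)   # [1, 0, 2]
print(counts)  # [3, 4, 4]
```

The curve flattens once every isolate pair that can be separated has been separated, which is the behaviour reported for both methods after about seven loci.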
A.
Preparation of PCR products
A pure culture of each isolate is grown on solid medium under appropriate conditions. One loopful of cells is scraped from the agar plate or slant and suspended in 100 µl of Tris-EDTA buffer and boiled for 5 min. After centrifugation, the supernatant is used as the DNA source. For H. pylori (Achtman et al., 1999), DNA was extracted using the CTAB method of Ausubel et al. (1994).
Table 12.5. PCR and sequencing primers for meningococcal genes used in MLST
PCR primer   PCR primer sequence            Sequencing primer   Sequencing primer sequence
abcZ-P1      5'-AATCGTTTATGTACCGCAGG-3'     abcZ-S1             same as abcZ-P1
abcZ-P2      5'-GTTGATTTCTGCCTGTTCGG-3'     abcZ-S2             5'-GAGAACGAGCCGGGATAGGA-3'
adk-P1       5'-ATGGCAGTTTGTGCAGTTGG-3'     adk-S1              5'-AGGCTGGCACGCCCTTGG-3'
adk-P2       5'-GATTTAAACAGCGATTGCCC-3'     adk-S2              5'-CAATACTTCGGCTTTCACGG-3'
aroE-P1      5'-ACGCATTTGCGCCGACATC-3'      aroE-S1             5'-GCGGTCAAC/TACGCTGATT-3'
aroE-P2      5'-ATCAGGGCTTTTTTCAGGTT-3'     aroE-S2             5'-ATGATGTTGCCGTACACATA-3'
fumC-A1      5'-CACCGAACACGACACGATGG-3'     fumC-S1             5'-TCGGCACGGGTTTGAACAGC-3'
fumC-A2      5'-ACGACCAGTTCGTCAAACTC-3'     fumC-S2             5'-CAACGGCGGTTTCGCGCAAC-3'
gdh-P1       5'-ATCAATACCGATGTGGCGCGT-3'    gdh-S1              5'-CCTTGGCAAAGAAAGCCTGC-3'
gdh-P2       5'-GGTTTTCATCTGCGTATAGAG-3'    gdh-S2              5'-GCGCACGGATTCATATGG-3'
pdhC-P1      5'-GGTTTCCAACGTATCGGCGAC-3'    pdhC-S1             5'-TCTACTACATCACCCTGATG-3'
pdhC-P2      5'-ATCGGCTTTGATGCCGTATTT-3'    pdhC-S2             same as pdhC-P2
pgm-P1       5'-CTTCAAAGCCTACGACATCCG-3'    pgm-S1              5'-CGGCGATGCCGACCGCTTGG-3'
pgm-P2       5'-CGGATTGCTTTCGATGACGGC-3'    pgm-S2              5'-GGTGATGATTTCGGTTGCGCC-3'
Table 12.6. PCR and sequencing primers for pneumococcal genes used in MLST
Primer     PCR/sequencing primer sequence
AroE-up    5'-GCCTTTGAGGCGACAGC-3'
AroE-dn    5'-TGCAGTTCA(G/A)AAACAT(A/T)TTCTAA-3'
Gdh-up     5'-ATGGACAAACCAGC(G/A/T/C)AG(C/T)TT-3'
Gdh-dn     5'-GCTTGAGGTCCCAT(G/A)CT(G/A/T/C)CC-3'
Gki-up     5'-GGCATTGGAATGGGATCACC-3'
Gki-dn     5'-TCTCCCGCAGCTGACAC-3'
RecP-up    5'-GCCAACTCAGGTCATCCAGG-3'
RecP-dn    5'-TGCAACCGTAGCATTGTAAC-3'
Spi-up     5'-TTATTCCTCCTGATTCTGTC-3'
Spi-dn     5'-GTGATTGGCCAGAAGCGGAA-3'
Xpt-up     5'-TTATTAGAAGAGCGCATCCT-3'
Xpt-dn     5'-AGATCTGCCTCCTTAAATAC-3'
Ddl-up     5'-TGC(C/T)CAAGTTCCTTATGTGG-3'
Ddl-dn     5'-CACTGGGT(G/A)AAACC(A/T)GGCAT-3'
The meningococcal MLST scheme uses internal fragments of the following seven housekeeping genes: putative ABC transporter (abcZ), adenylate kinase (adk), shikimate dehydrogenase (aroE), fumarate hydratase (fumC), glucose-6-phosphate dehydrogenase (gdh), pyruvate dehydrogenase subunit (pdhC) and phosphoglucomutase (pgm). The pneumococcal MLST scheme uses internal fragments of shikimate dehydrogenase (aroE), glucose-6-phosphate dehydrogenase (gdh), glucose kinase (gki), transketolase (recP), signal peptidase I (spi), xanthine phosphoribosyltransferase (xpt) and D-alanine-D-alanine ligase (ddl). Fragments of the following seven genes are used for H. pylori: urease accessory protein (ureI), A/G-specific adenine glycosylase (mutY), elongation factor EF-P (efp), inorganic pyrophosphatase (ppa), GTPase (yphC), ATP synthase F1 α subunit (atpA) and anthranilate isomerase (trpC). The primer pairs used for PCR amplification of internal fragments of the genes and sequencing of the PCR products are listed in Tables 12.5-12.7 for N. meningitidis, S. pneumoniae and H. pylori, respectively. PCR amplification is carried out with 1 µl of the chromosomal DNA preparation, 5 µl of 10× PCR buffer, 0.2 µM of each PCR primer, 200 µM of each dNTP, 0.5 U Taq polymerase and H2O to a total volume of 50 µl. The annealing temperature should be determined for each different set of primers and may depend on the PCR equipment; primers are generally designed to have optimal annealing temperatures of 52-60°C. After completion of the PCR, 5 µl portions of the end-products are electrophoresed on an agarose gel, together with size standards, to check for a successful reaction.
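The pipetting volumes for a reaction of this kind follow from the usual dilution relation (stock concentration × stock volume = final concentration × final volume). The Python sketch below works this out for one 50 µl reaction, taking the primer and dNTP concentrations as micromolar. The stock concentrations (10 µM primers, 10 mM of each dNTP, 5 U/µl Taq polymerase) are assumptions chosen for illustration and are not given in the text, and the function name is hypothetical.

```python
def pcr_mix_volumes(total_ul=50.0, template_ul=1.0,
                    primer_stock_um=10.0,      # assumed primer stock, µM
                    dntp_stock_um=10000.0,     # assumed dNTP stock, µM of each dNTP
                    taq_stock_u_per_ul=5.0):   # assumed enzyme stock, U/µl
    """Return the volume (µl) of each component for one reaction."""
    volumes = {
        "template DNA": template_ul,
        "10x PCR buffer": total_ul / 10.0,                  # 1x final
        "forward primer": total_ul * 0.2 / primer_stock_um,  # 0.2 µM final
        "reverse primer": total_ul * 0.2 / primer_stock_um,  # 0.2 µM final
        "dNTP mix": total_ul * 200.0 / dntp_stock_um,        # 200 µM each, final
        "Taq polymerase": 0.5 / taq_stock_u_per_ul,          # 0.5 U per reaction
    }
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    volumes["water"] = water
    return volumes


for component, volume in pcr_mix_volumes().items():
    print(f"{component:>15}: {volume:5.1f} µl")
```

With these assumed stocks the reaction needs 1 µl of each primer, 1 µl of dNTP mix, 0.1 µl of enzyme and about 40.9 µl of water; different stock concentrations change only the arguments.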
Table 12.7. PCR and sequencing primers for H. pylori genes used in MLST
PCR primer       PCR primer sequence
atp-A1           5'-GCTTAAATGGTGTGATGTCG-3'
atp-A6           5'-AATGGGCAAGGGCGAATAAG-3'
efp-F01          5'-GGCAATTGGGATGAGCGAGCTC-3'
efp-R02          5'-CTTCACCTTTTCAAGATACTC-3'
mutY-101         5'-AGCGAAGTGATGAGCCAACAAAC-3'
mutY-102         5'-AAAGGGCAAATCGCACATTTGGG-3'
ppa1 (+)         5'-GTGAGCCATGACGCTGATTCTTTGT-3'
ppa2 (-)         5'-GCCTTGATAGGCTTTTATCGCTTTCT-3'
ppa3 (-)         5'-GCCATTTCACACCAACACCCAAT-3'
HptrpC-1 (+)     5'-CAAGCTCCTAGAAGTCTCTG-3'
HptrpC-5 (-)     5'-CCCAGCTAGCATGAAAGG-3'
HptrpC-6 (+)     5'-TAGAATGCAAAAAAGCATCGCCCTC-3'
Hp71S1 (+)       5'-CAATAAAGTCAGCTTGGCGCAACT-3'
Hp71S2 (+)       5'-GTTATTCGTAAGGTGCGTTTGTTG-3'
Hp71S3 (+)       5'-GGCAATGCTAGGACTTGT-3'
Hp71AS1 (-)      5'-TCCCTTAGATTGCCAACTAAACGC-3'
yphC-F1          5'-CACTATTACCACGCCTATTTTTTTGAC-3'
yphC-F3          5'-CTTATGCGTTTTCTTCTTTTGG-3'
yphC-R4          5'-AAGCAGCTGTTGTGATCACGGGGGC-3'
yphC-R5          5'-TTTCTARGCTTTCTAAAATATC-3'
Sequencing primer   Sequencing primer sequence
atp-A7           5'-CGCTTTGGGTGAGCCTATTG-3'
atp-A4           5'-TGCCCGTCTGTAATAGAAATG-3'
efp-F02          5'-GGGCTTGAAAATTGAATTGGGCGG-3'
efp-R01          5'-GTATTGACTTTAATGATCTCACCC-3'
mutY-101         5'-AGCGAAGTGATGAGCCAACAAAC-3'
mutY-102         5'-AAAGGGCAAATCGCACATTTGGG-3'
ppa1 (+)         5'-GTGAGCCATGACGCTGATTCTTTGT-3'
ppa2 (-)         5'-GCCTTGATAGGCTTTTATCGCTTTCT-3'
ppa3 (-)         5'-GCCATTTCACACCAACACCCAAT-3'
HptrpC-9 (+)     5'-CGCTTGCTCAA(AG)CTCCAATACGAC-3'
HptrpC-7 (-)     5'-TAAGCCCGCACACTTTATTTTCGCC-3'
HptrpC-6 (+)     5'-TAGAATGCAAAAAAGCATCGCCCTC-3'
HptrpC-8 (-)     5'-GTCGTATTGGCG(CT)TTGAGCAAGCG-3'
Hp71S1 (+)       5'-CAATAAAGTCAGCTTGGCGCAACT-3'
Hp71S2 (+)       5'-GTTATTCGTAAGGTGCGTTTGTTG-3'
Hp71S3 (+)       5'-GGCAATGCTAGGACTTGT-3'
Hp71AS1 (-)      5'-TCCCTTAGATTGCCAACTAAACGC-3'
yphC-F1          5'-CACTATTACCACGCCTATTTTTTTGAC-3'
yphC-F3          5'-CTTATGCGTTTTCTTCTTTTGG-3'
yphC-R4          5'-AAGCAGCTGTTGTGATCACGGGGGC-3'
yphC-R5          5'-TTTCTARGCTTTCTAAAATATC-3'
B.
Nucleotide sequencing of PCR products
The DNA fragments are purified either by column filtration through (e.g.) QIAquick (Qiagen, Crawley, UK) or ChromaSpin+TE-100 (Clontech, Cambridge, UK) columns, or in an enzymatic reaction (7 µl of PCR product + 10 U exonuclease + 2 U shrimp alkaline phosphatase; Amersham Pharmacia Biotech, Little Chalfont, UK), as recommended by the respective manufacturers. Sequencing reactions should be carried out on both strands, using the primers that were used for the initial PCR amplification (S. pneumoniae) or internal primers (N. meningitidis, H. pylori). Protocols recommended for the BigDye Ready Reaction Termination Mix (PE Applied Biosystems, Warrington, UK) are used, reduced to quarter-volumes. Samples are then electrophoresed on a Prism 377 Automated Sequencer (PE Applied Biosystems).
C.
Analysis of the sequence data
After electrophoresis, the complementary strands must be aligned and the sequences edited so that they correspond exactly to the regions that are used to define the alleles. Various packages are suitable, including GCG, DNASTAR, AutoAssembler and Sequence Navigator Software (PE Applied Biosystems). For each strain, the consensus sequences for each gene fragment must be determined. The combination of alleles at the seven gene fragments identifies the allelic profile or ST of the strains, and the relatedness between isolates can be analysed in the same manner as for MEE data (i.e., genetic diversity and construction of a dendrogram from the matrix of pairwise differences between STs).
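The step from consensus sequences to allelic profiles, STs and a matrix of pairwise differences can be sketched in Python as follows. The function names and the toy sequences are hypothetical, and the allele and ST numbers are assigned locally in order of first appearance, which is an assumption of this sketch; numbers used in a shared database are assigned by its curator.

```python
from itertools import combinations


def assign_alleles(sequences_by_isolate):
    """Map {isolate: [sequence at locus 1, ..., locus N]} to allelic profiles."""
    n_loci = len(next(iter(sequences_by_isolate.values())))
    allele_numbers = [{} for _ in range(n_loci)]  # one lookup table per locus
    profiles = {}
    for isolate, seqs in sequences_by_isolate.items():
        profile = []
        for j, seq in enumerate(seqs):
            table = allele_numbers[j]
            if seq not in table:
                table[seq] = len(table) + 1  # new sequence: next free allele number
            profile.append(table[seq])
        profiles[isolate] = tuple(profile)
    return profiles


def assign_sequence_types(profiles):
    """Give each distinct allelic profile a local ST number."""
    st_numbers = {}
    result = {}
    for isolate, profile in profiles.items():
        if profile not in st_numbers:
            st_numbers[profile] = len(st_numbers) + 1
        result[isolate] = st_numbers[profile]
    return result


def pairwise_differences(profiles):
    """Proportion of loci at which each pair of isolates differs."""
    return {
        (a, b): sum(x != y for x, y in zip(profiles[a], profiles[b])) / len(profiles[a])
        for a, b in combinations(list(profiles), 2)
    }


# Toy data: three isolates, two very short "gene fragments" (hypothetical).
toy = {
    "iso1": ["ACGT", "GGCA"],
    "iso2": ["ACGT", "GGCA"],
    "iso3": ["ACGA", "GGCA"],
}
profiles = assign_alleles(toy)
print(profiles)                        # {'iso1': (1, 1), 'iso2': (1, 1), 'iso3': (2, 1)}
print(assign_sequence_types(profiles))  # {'iso1': 1, 'iso2': 1, 'iso3': 2}
print(pairwise_differences(profiles))
```

The resulting distance matrix is the kind of input from which a dendrogram can be built by a clustering method; the clustering itself is not shown here.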
D.
Use of the database
Meningococcal and pneumococcal MLST databases have been set up and are continuously expanding (http://mlst.zoo.ox.ac.uk). The meningococcal database currently contains 415 STs from isolates recovered from patients and healthy carriers, while the pneumococcal database contains 270 STs of isolates, obtained mostly from cases of serious invasive disease, together with various penicillin-resistant and multiply antibiotic-resistant isolates. Consensus sequences of each gene fragment can be compared with those in the databases. The software checks that the sequences are the correct length and that they do not contain any undetermined characters. A check is also made to verify that the submitted sequence is at least 70% similar to another allele at that locus. Options are available to identify the allele at a single locus, to enter an allele profile, to find isolates in the database that match or nearly match an allele profile, or to browse the database. Consensus sequences not represented in the database can be submitted as a new allele. The database curator evaluates the traces of the sequence before assigning a number to the new allele and including it in the database.
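The checks and queries described above can be illustrated with a short Python sketch. It is not the database software itself: the function names, the return strings and the toy alleles and profiles are hypothetical, similarity is computed as simple position-by-position identity between sequences of equal length, and a near match is taken to mean at most one differing locus.

```python
def check_submission(seq, known_alleles, expected_length, min_identity=0.70):
    """Apply the three checks to a consensus sequence for one locus.

    known_alleles: {allele number: sequence}, all of length expected_length.
    """
    seq = seq.upper()
    if len(seq) != expected_length:
        return "wrong length"
    if set(seq) - set("ACGT"):
        return "undetermined characters"
    for number, allele in known_alleles.items():
        if allele == seq:
            return f"allele {number}"
    # Highest fraction of identical positions against any known allele.
    best = max(
        (sum(a == b for a, b in zip(seq, allele)) / expected_length
         for allele in known_alleles.values()),
        default=0.0,
    )
    if best < min_identity:
        return "too dissimilar"
    return "candidate new allele"


def find_matches(query_profile, database, max_mismatches=1):
    """Return (ST, number of differing loci) for exact and near matches."""
    hits = []
    for st, profile in database.items():
        mismatches = sum(q != p for q, p in zip(query_profile, profile))
        if mismatches <= max_mismatches:
            hits.append((st, mismatches))
    return sorted(hits, key=lambda hit: hit[1])


# Toy alleles of length 10 and toy three-locus profiles (hypothetical data).
known = {1: "ACGTACGTAC", 2: "ACGTACGTAA"}
print(check_submission("ACGTACGTAC", known, 10))  # allele 1
print(check_submission("ACGTACGTAN", known, 10))  # undetermined characters
print(check_submission("ACGTACGTTT", known, 10))  # candidate new allele
print(find_matches((1, 2, 3), {10: (1, 2, 3), 11: (1, 2, 4), 12: (2, 3, 4)}))
```

A sequence flagged as a candidate new allele would still need its traces evaluated by the curator before a number is assigned.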
E.
Applications of MLST
MLST was first evaluated for N. meningitidis using sequences of c. 470-bp fragments from 11 housekeeping genes in a reference set of 107 isolates from invasive disease and healthy carriers, selected on the basis of their multilocus genotypes as determined by MEE (Maiden et al., 1998). The strain associations obtained by MLST were consistent with the clonal groupings determined by MEE. Most isolates from hyper-virulent lineages of meningococci belonging to serogroups A, B and C were identical for all loci or differed from the majority type at only a single locus. Six loci were selected that reliably identified the major lineages associated with epidemic disease. A seventh gene (fumC) was then added in an attempt to discriminate the new clone variant (ET-15) that was recently associated with serogroup C outbreaks in North America, Europe and Australia. A similar study has been performed for S. pneumoniae. Among 274 isolates from recent cases of invasive pneumococcal disease in eight countries, 143 STs were resolved. Isolates of the same ST were recovered from cases of invasive disease in different countries, implying that strains with specific genotypes had an increased capacity to cause invasive disease. The relationship between STs and serotypes suggested that horizontal exchange of capsular genes was uncommon in the pneumococcal population associated with invasive disease (Enright & Spratt, 1998). A study of 74 penicillin-resistant isolates from Taiwan using MLST showed that 86% of the isolates belonged to one of three clusters, two of which were previously undescribed (Shi et al., 1998). Coffey et al. (1999) provided new evidence by MLST that penicillin-resistant serotype 14 pneumococcal isolates were identical to the Spanish penicillin-resistant serotype 9V clone, and that they arose by recombinational replacement of the capsular locus and flanking regions. A collection of 20 strains of H. pylori analysed by MLST for fragments of seven housekeeping genes revealed the existence of two weak clonal groups, in spite of extensive inter-strain recombination (Achtman et al., 1999).
12.5 PERSPECTIVES
MLST is a powerful new approach for the characterisation of microorganisms, since it provides unambiguous molecular typing data that are electronically portable between laboratories, and which can be used in studies of global epidemiology. MLST schemes for Staph. aureus, Strep. pyogenes, H. influenzae, Yersinia spp. and other major pathogens are under development. MLST is a simple technique that requires only the ability to amplify specific DNA fragments by PCR and to sequence those fragments. While the method is still quite expensive, further technological developments in automated sequencing should soon render MLST accessible to all major public health laboratories, including those in the developing world. In line with the impact of MEE, MLST will further improve our understanding of the population and evolutionary biology of microbial pathogens, as well as other microorganisms. For the more stable microbial pathogens, sequence
analysis of one or a few hypervariable loci, in addition to the MLST scheme based on housekeeping genes, will provide a versatile tool for studies of both global and local epidemiology.

ACKNOWLEDGEMENTS

This paper is dedicated to Prof. Robert K. Selander.
REFERENCES

Achtman, M. (1990). Molecular epidemiology of epidemic bacterial meningitis. Reviews in Medical Microbiology 1, 29-38.
Achtman, M. (1994). Clonal spread of serogroup A meningococci. A paradigm for the analysis of microevolution in bacteria. Molecular Microbiology 11, 15-22.
Achtman, M. (1995). Global epidemiology of meningococcal disease. In Meningococcal disease, Cartwright K., ed., pp. 159-175. Wiley, New York.
Achtman, M. (1997). Microevolution and epidemic spread of serogroup A Neisseria meningitidis - a review. Gene 192, 135-140.
Achtman, M., Heuzenroeder, M., Kusecek, B., Ochman, H., Caugant, D.A., Selander, R.K., Väisänen-Rhen, V., Korhonen, T.K., Stuart, S., Ørskov, F. & Ørskov, I. (1986). Clonal analysis of Escherichia coli O2:K1 isolated from diseased humans and animals. Infection and Immunity 51, 268-276.
Achtman, M., Azuma, T., Berg, D.E., Ito, Y., Morelli, G., Pan, Z.J., Suerbaum, S., Thompson, S.A., van der Ende, A. & van Doorn, L.J. (1999). Recombination and clonal groupings within Helicobacter pylori from different geographical regions. Molecular Microbiology 32, 459-470.
Aeschbacher, M. & Piffaretti, J.-C. (1989). Population genetics of human and animal enteric Campylobacter strains. Infection and Immunity 57, 1432-1437.
Altwegg, M., Reeves, M.W., Altwegg-Bissig, R. & Brenner, D.J. (1991). Multilocus enzyme analysis of the genus Aeromonas and its use for species identification. Zentralblatt für Bakteriologie 275, 28-45.
Amonsin, A., Wellehan, J.F., Li, L.L., Vandamme, P., Lindeman, C., Edman, M., Robinson, R.A. & Kapur, V. (1997). Molecular epidemiology of Ornithobacterium rhinotracheale. Journal of Clinical Microbiology 35, 2894-2898.
Angen, O., Caugant, D.A., Olsen, J.E. & Bisgaard, M. (1997). Genotypic relationships among strains classified under the (Pasteurella) haemolytica-complex as indicated by ribotyping and multilocus enzyme electrophoresis. Zentralblatt für Bakteriologie 286, 333-354.
Arbeit, R.D., Slutsky, A., Barber, T.W., Maslow, J.N., Niemczyk, S., Falkinham, J.O., O'Connor, G.T. & Von Reyn, C.F. (1993). Genetic diversity among strains of Mycobacterium avium causing monoclonal and polyclonal bacteremia in patients with AIDS. Journal of Infectious Diseases 167, 1384-1390.
Ashton, F.E., Ryan, J.A., Borczyk, A., Caugant, D.A., Mancino, L. & Huang, D. (1991). Emergence of a virulent clone of Neisseria meningitidis serotype 2a that is associated with meningococcal group C disease in Canada. Journal of Clinical Microbiology 29, 2489-2493.
Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A. & Struhl, K. (1994). Current protocols in molecular biology. Wiley, New York.
Balmelli, T. & Piffaretti, J.C. (1995). Association between different clinical manifestations of Lyme disease and different species of Borrelia burgdorferi sensu lato. Research in Microbiology 146, 329-340.
Balmelli, T. & Piffaretti, J.C. (1996). Analysis of the genetic polymorphism of Borrelia burgdorferi sensu lato by multilocus enzyme electrophoresis. International Journal of Systematic Bacteriology 46, 167-172.
Baxter, F., Wright, F., Chalmers, R.M., Low, J.C. & Donachie, W. (1993). Characterization by multilocus enzyme electrophoresis of Listeria monocytogenes isolates involved in ovine listeriosis outbreaks in Scotland from 1989 to 1991. Applied and Environmental Microbiology 59, 3126-3129.
Beltran, P., Musser, J.M., Helmuth, R., Farmer, J.J., Frerichs, W.M., Wachsmuth, I.K., Ferris, K., McWhorter, A.C., Wells, J.G., Cravioto, A. & Selander, R.K. (1988). Toward a population genetic analysis of Salmonella: genetic diversity and relationships among strains of serotypes S. choleraesuis, S. derby, S. dublin, S. enteritidis, S. heidelberg, S. infantis, S. newport, and S. typhimurium. Proceedings of the National Academy of Sciences of the United States of America 85, 7753-7757.
Beltran, P., Plock, S.A., Smith, N.H., Whittam, T.S., Old, D.C. & Selander, R.K. (1991). Reference collection of strains of the Salmonella typhimurium complex from natural populations. Journal of General Microbiology 137, 601-606.
Beltran, P., Delgado, G., Navarro, A., Trujillo, F., Selander, R.K. & Cravioto, A. (1999). Genetic diversity and population structure of Vibrio cholerae. Journal of Clinical Microbiology 37, 581-590.
Bert, F., Picard, B., Lambert Zechovsky, N. & Goullet, P. (1995). Identification and typing of pyogenic streptococci by enzyme electrophoretic polymorphism. Journal of Medical Microbiology 42, 442-451.
Beutin, L., Ørskov, I., Ørskov, F., Zimmermann, S., Prada, J., Gelderblom, H., Stephan, R. & Whittam, T.S. (1990). Clonal diversity and virulence factors in strains of Escherichia coli of the classic enteropathogenic serogroup O114. Journal of Infectious Diseases 162, 1329-1334.
Bibb, W.F., Schwartz, B., Gellin, B.G., Plikaytis, B.D. & Weaver, R.E. (1989). Analysis of Listeria monocytogenes by multilocus enzyme electrophoresis and application of the method to epidemiologic investigations. International Journal of Food Microbiology 8, 233-239.
Bibb, W.F., Gellin, B.G., Weaver, R., Schwartz, B., Plikaytis, B.D., Reeves, M.W., Pinner, R.W. & Broome, C.V. (1990). Analysis of clinical and food-borne isolates of Listeria monocytogenes in the United States by multilocus enzyme electrophoresis and application of the method to epidemiologic investigations. Applied and Environmental Microbiology 56, 2133-2141.
Blackall, P.J., Trott, D.J., Rapp-Gabrielson, V. & Hampson, D.J. (1997). Analysis of Haemophilus parasuis by multilocus enzyme electrophoresis. Veterinary Microbiology 56, 125-134.
Blackall, P.J., Fegan, N., Chew, G.T. & Hampson, D.J. (1998). Population structure and diversity of avian isolates of Pasteurella multocida from Australia. Microbiology 144, 279-289.
Boerlin, P. (1997). Applications of multilocus enzyme electrophoresis in medical microbiology. Journal of Microbiological Methods 28, 221-231.
Boerlin, P. & Piffaretti, J.C. (1991). Typing of human, animal, food, and environmental isolates of Listeria monocytogenes by multilocus enzyme electrophoresis. Applied and Environmental Microbiology 57, 1624-1629.
Boerlin, P. & Piffaretti, J.C. (1995). Multilocus enzyme electrophoresis. In Methods in molecular biology, Howard, J. & Whitcombe, D.M., eds, vol. 46, pp. 63-78. Humana Press, Totowa, NJ.
Boerlin, P., Rocourt, J. & Piffaretti, J.C. (1991). Taxonomy of the genus Listeria by using multilocus enzyme electrophoresis. International Journal of Systematic Bacteriology 41, 59-64.
Boerlin, P., Peter, O., Bretz, A.-G., Postic, D., Baranton, G. & Piffaretti, J.-C. (1992). Population genetic analysis of Borrelia burgdorferi isolates by multilocus enzyme electrophoresis. Infection and Immunity 60, 1677-1683.
Boerlin, P., Bannerman, E., Jemmi, T. & Bille, J. (1996). Subtyping Listeria monocytogenes isolates genetically related to the Swiss epidemic clone. Journal of Clinical Microbiology 34, 2148-2153.
Boerlin, P., Boerlin-Petzold, F., Bannerman, E., Bille, J. & Jemmi, T. (1997).
Typing Listeria monocytogenes isolates from fish products and human listeriosis cases. Applied and Environmental Microbiology 63, 1338-1343.
Boyd, E.F., Hiney, M.P., Peden, J.F., Smith, P.R. & Caugant, D.A. (1994). Assessment of genetic diversity among Aeromonas salmonicida isolates by multilocus enzyme electrophoresis. Journal of Fish Diseases 17, 97-98.
Boyd, E.F., Wang, F.S., Whittam, T.S. & Selander, R.K. (1996). Molecular genetic relationships of the salmonellae. Applied and Environmental Microbiology 62, 804-808.
Brenner, D.J., Steigerwalt, A.G., Gorman, G.W., Wilkinson, H.W., Bibb, W.F., Hackel, M., Tyndall, R.L., Campbell, J., Feeley, J.C., Thacker, W.L., Skaliy, P., Martin, W.T., Brake, B.J., Fields, B.S., McEachern, H.V. & Corcoran, L.K. (1985). Ten new species of Legionella. International Journal of Systematic Bacteriology 35, 50-59.
Carapetis, J., Robins-Browne, R., Martin, D., Shelby-James, T. & Hogg, G. (1995). Increasing severity of invasive group A streptococcal disease in Australia: clinical and molecular epidemiological features and identification of a new virulent M-nontypeable clone. Clinical Infectious Diseases 21, 1220-1227.
Carlson, C.R., Caugant, D.A. & Kolsto, A.B. (1994). Genotypic diversity among Bacillus cereus and Bacillus thuringiensis strains. Applied and Environmental Microbiology 60, 1719-1725.
Caugant, D.A. (1998). Population genetics and molecular epidemiology of Neisseria meningitidis. Acta Pathologica Microbiologica Scandinavica 106, 505-525.
Caugant, D.A., Levin, B.R. & Selander, R.K. (1981). Genetic diversity and temporal variation in the E. coli population of a human host. Genetics 98, 467-490.
Caugant, D.A., Levin, B.R., Ørskov, I., Ørskov, F., Svanborg, E.C. & Selander, R.K. (1985). Genetic diversity in relation to serotype in Escherichia coli. Infection and Immunity 49, 407-413.
Caugant, D.A., Frøholm, L.O., Bøvre, K., Holten, E., Frasch, C.E., Mocca, L.F., Zollinger, W.D. & Selander, R.K. (1986). Intercontinental spread of a genetically distinctive complex of clones of Neisseria meningitidis causing epidemic disease.
Proceedings of the National Academy of Sciences of the United States of America 83, 4927-4931.
Caugant, D.A., Mocca, L.F., Frasch, C.E., Frøholm, L.O., Zollinger, W.D. & Selander, R.K. (1987). Genetic structure of Neisseria meningitidis populations in relation to serogroup, serotype, and outer membrane protein pattern. Journal of Bacteriology 169, 2781-2792.
Caugant, D.A., Kristiansen, B.-E., Frøholm, L.O., Bøvre, K. & Selander, R.K. (1988). Clonal diversity of Neisseria meningitidis from a population of asymptomatic carriers. Infection and Immunity 56, 2060-2068.
Caugant, D.A., Aleksic, S., Mollaret, H.H., Selander, R.K. & Kapperud, G. (1989). Clonal diversity and relationships among strains of Yersinia enterocolitica. Journal of Clinical Microbiology 27, 2678-2683.
Caugant, D.A., Bol, P., Høiby, E.A., Zanen, H.C. & Frøholm, L.O. (1990a). Clones of serogroup B Neisseria meningitidis causing systemic disease in the Netherlands, 1958-1986. Journal of Infectious Diseases 162, 867-874.
Caugant, D.A., Selander, R.K. & Olsen, I. (1990b). Differentiation between Actinobacillus (Haemophilus) actinomycetemcomitans, Haemophilus aphrophilus and Haemophilus paraphrophilus by multilocus enzyme electrophoresis. Journal of General Microbiology 136, 2135-2141.
Caugant, D.A., Høiby, E.A., Magnus, P., Scheel, O., Hoel, T., Bjune, G., Wedege, E., Eng, J. & Frøholm, L.O. (1994). Asymptomatic carriage of Neisseria meningitidis in a randomly sampled population. Journal of Clinical Microbiology 32, 323-330.
Caugant, D.A., Ashton, F.E., Bibb, W.F., Boerlin, P., Donachie, W., Low, C., Gilmour, A., Harvey, J. & Norrung, B. (1996). Multilocus enzyme electrophoresis for characterization of Listeria monocytogenes isolates: results of an international comparative study. International Journal of Food Microbiology 32, 301-311.
Coffey, T.J., Daniels, M., McDougal, L.K., Dowson, C.G., Tenover, F.C. & Spratt, B.G. (1995).
Genetic analysis of clinical isolates of Streptococcus pneumoniae with high-level resistance to expanded-spectrum cephalosporins. Antimicrobial Agents and Chemotherapy 39, 1306-1313.
Coffey, T.J., Daniels, M., Enright, M.C. & Spratt, B.G. (1999). Serotype 14 variants of the Spanish penicillin-resistant serotype 9V clone of Streptococcus pneumoniae arose by large recombinational replacements of the cpsA-pbp1a region. Microbiology 145, 2023-2031.
Cox, J.M., Story, L., Bowles, R. & Woolcock, J.B. (1996). Multilocus enzyme electrophoretic (MEE) analysis of Australian isolates of Salmonella enteritidis. International Journal of Food Microbiology 31, 273-282.
Crowe, B.A., Olyhoek, T., Neumann, B., Wall, B., Hassan-King, M., Greenwood, B. & Achtman, M. (1987). A clonal analysis of Neisseria meningitidis serogroup A. Antonie Van Leeuwenhoek Journal 53, 381-388.
Dahle, U.R., Olsen, I., Tronstad, L. & Caugant, D.A. (1995). Population genetic analysis of oral treponemes by multilocus enzyme electrophoresis. Oral Microbiology and Immunology 10, 265-270.
Davies, R.L., Arkinsaw, S. & Selander, R.K. (1997a). Evolutionary genetics of Pasteurella haemolytica isolates recovered from cattle and sheep. Infection and Immunity 65, 3585-3593.
Davies, R.L., Arkinsaw, S. & Selander, R.K. (1997b). Genetic relationships among Pasteurella trehalosi isolates based on multilocus enzyme electrophoresis. Microbiology 143, 2841-2849.
Denny, T.P., Gilmour, M.N. & Selander, R.K. (1988). Genetic diversity and relationships of two pathovars of Pseudomonas syringae. Journal of General Microbiology 134, 1949-1960.
Dolina, M. & Peduzzi, R. (1993). Population genetics of human, animal, and environmental Yersinia strains. Applied and Environmental Microbiology 59, 442-450.
Dowson, C.G., Hutchison, A., Brannigan, J.A., George, R.C., Hansman, D., Liñares, J., Tomasz, A., Maynard Smith, J. & Spratt, B.G. (1989). Horizontal transfer of penicillin-binding protein genes in penicillin-resistant clinical isolates of Streptococcus pneumoniae. Proceedings of the National Academy of Sciences of the United States of America 86, 8842-8846.
Eardly, B.D., Materon, L.A., Smith, N.H., Johnson, D.A., Rumbaugh, M.D. & Selander, R.K. (1990). Genetic structure of natural populations of the nitrogen-fixing bacterium Rhizobium meliloti. Applied and Environmental Microbiology 56, 187-194.
Eardly, B.D., Wang, F.S., Whittam, T.S. & Selander, R.K. (1995). Species limits in Rhizobium populations that nodulate the common bean (Phaseolus vulgaris). Applied and Environmental Microbiology 61, 507-512.
Enright, M.C. & Spratt, B.G. (1998). A multilocus sequence typing scheme for Streptococcus pneumoniae: identification of clones associated with serious invasive disease. Microbiology 144, 3049-3060.
Evins, G.M., Cameron, D.N., Wells, J.G., Greene, K.D., Popovic, T., Giono Cerezo, S., Wachsmuth, I.K. & Tauxe, R.V. (1995). The emerging diversity of the electrophoretic types of Vibrio cholerae in the Western Hemisphere. Journal of Infectious Diseases 172, 173-179.
Farber, J.M., Peterkin, P.I., Carter, A.O., Varughese, P.V., Ashton, F.E. & Ewan, E.P. (1991). Neonatal listeriosis due to cross-infection confirmed by isoenzyme typing and DNA fingerprinting. Journal of Infectious Diseases 163, 927-928.
Feizabadi, M.M., Robertson, I.D., Cousins, D.V., Dawson, D., Chew, W., Gilbert, G.L. & Hampson, D.J. (1996a). Genetic characterization of Mycobacterium avium isolates recovered from humans and animals in Australia. Epidemiology and Infection 116, 41-49.
Feizabadi, M.M., Robertson, I.D., Cousins, D.V. & Hampson, D.J. (1996b). Genomic analysis of Mycobacterium bovis and other members of the Mycobacterium tuberculosis complex by isoenzyme analysis and pulsed-field gel electrophoresis. Journal of Clinical Microbiology 34, 1136-1142.
Feizabadi, M.M., Robertson, I.D., Cousins, D.V., Dawson, D.J. & Hampson, D.J. (1997). Use of multilocus enzyme electrophoresis to examine genetic relationships amongst isolates of Mycobacterium intracellulare and related species. Microbiology 143, 1461-1469.
Feng, P., Lampel, K.A., Karch, H. & Whittam, T.S. (1998). Genotypic and phenotypic changes in the emergence of Escherichia coli O157:H7. Journal of Infectious Diseases 177, 1750-1753.
Fitzgerald, J.R., Meaney, W.J., Hartigan, P.J., Smyth, C.J. & Kapur, V. (1997).
Fine-structure molecular epidemiological analysis of Staphylococcus aureus recovered from cows. Epidemiology and Infection 119, 261-269.
Flint, S.H. & Kells, N.J. (1996). The sub-typing of Listeria monocytogenes isolates from food, environments surrounding food manufacturing sites, and clinical samples in New Zealand using multilocus enzyme electrophoresis. International Journal of Food Microbiology 31, 349-355.
Flint, S.H., Hartley, N.J., Avery, S.M. & Hudson, J.A. (1996). A comparison between starch and polyacrylamide gels for the analysis of Listeria monocytogenes using multilocus enzyme electrophoresis. Letters in Applied Microbiology 22, 16-17.
Frandsen, E.V., Poulsen, K. & Kilian, M. (1995). Confirmation of the species Prevotella intermedia and Prevotella nigrescens. International Journal of Systematic Bacteriology 45, 429-435.
Fusté, M.C., Pineda, M.A., Palomar, J., Vinas, M. & Loren, J.G. (1996). Clonality of multidrug-resistant nontypeable strains of Haemophilus influenzae. Journal of Clinical Microbiology 34, 2760-2765.
Gargallo-Viola, D. (1989). Enzyme polymorphism, prodigiosin production, and plasmid fingerprints in clinical and naturally occurring isolates of Serratia marcescens. Journal of Clinical Microbiology 27, 860-868.
Gaston, M.A. & Warner, M. (1989). Electrophoretic typing of Enterobacter cloacae with a limited set of enzyme stains. Epidemiology and Infection 103, 255-264.
Gilmour, M.N., Whittam, T.S., Kilian, M. & Selander, R.K. (1987). Genetic relationships among the oral streptococci. Journal of Bacteriology 169, 5247-5257.
Go, M.F., Kapur, V., Graham, D.Y. & Musser, J.M. (1996). Population genetic analysis of Helicobacter pylori by multilocus enzyme electrophoresis: extensive allelic diversity and recombinational population structure. Journal of Bacteriology 178, 3934-3938.
Gordon, D.M. (1997). The genetic structure of Escherichia coli populations in feral house mice. Microbiology 143, 2039-2046.
Graves, L.M., Swaminathan, B., Reeves, M.W., Hunter, S.B., Weaver, R.E., Plikaytis, B.D. & Schuchat, A. (1994).
Comparison of ribotyping and multilocus enzyme electrophoresis for subtyping of Listeria monocytogenes isolates. Journal of Clinical Microbiology 32, 2936-2943.
Griffith, S.J., Nathan, C., Selander, R.K., Chamberlin, W., Gordon, S., Kabins, S. & Weinstein, R.A. (1989). The epidemiology of Pseudomonas aeruginosa in oncology patients in a general hospital. Journal of Infectious Diseases 160, 1030-1036.
Gutjahr, T.S., O'Rourke, M., Ison, C.A. & Spratt, B.G. (1997). Arginine-, hypoxanthine-, uracil-requiring isolates of Neisseria gonorrhoeae are a clonal lineage within a non-clonal population. Microbiology 143, 633-640.
Haase, A.M., Melder, A., Mathews, J.D., Kemp, D.J. & Adams, M. (1994). Clonal diversity of Streptococcus pyogenes within some M-types revealed by multilocus enzyme electrophoresis. Epidemiology and Infection 113, 455-462.
Hall, L.M., Whiley, R.A., Duke, B., George, R.C. & Efstratiou, A. (1996). Genetic relatedness within and between serotypes of Streptococcus pneumoniae from the United Kingdom: analysis of multilocus enzyme electrophoresis, pulsed-field gel electrophoresis, and antimicrobial resistance patterns. Journal of Clinical Microbiology 34, 853-859.
Hampson, D.J., Trott, D.J., Clarke, I.L., Mwaniki, C.G. & Robertson, I.D. (1993). Population structure of Australian isolates of Streptococcus suis. Journal of Clinical Microbiology 31, 2895-2900.
Harris, H. & Hopkinson, D.A. (1976). Handbook of enzyme electrophoresis in human genetics. North-Holland, Amsterdam.
Harrison, S.P., Jones, D.G. & Young, J.P.W. (1989). Rhizobium population genetics: genetic variation within and between populations from diverse locations. Journal of General Microbiology 135, 1061-1069.
Harvey, J. & Gilmour, A. (1994). Application of multilocus enzyme electrophoresis and restriction fragment length polymorphism analysis to the typing of Listeria monocytogenes strains isolated from raw milk, nondairy foods, and clinical and veterinary sources.
Applied and Environmental Microbiology 60, 1547-1553.
Haubek, D., Poulsen, K., Asikainen, S. & Kilian, M. (1995). Evidence for absence in Northern Europe of especially virulent clonal types of Actinobacillus actinomycetemcomitans. Journal of Clinical Microbiology 33, 395-401.
Haubek, D., DiRienzo, J.M., Tinoco, E.M.B., Westergaard, J., Lopez, N.J., Chung, C.-P., Poulsen, K. & Kilian, M. (1997). Geographic dissemination of a highly toxic clone of Actinobacillus actinomycetemcomitans associated with juvenile periodontitis. Journal of Clinical Microbiology 35, 3037-3042.
Hauge, M., Jespersgaard, C., Poulsen, K. & Kilian, M. (1996). Population structure of Streptococcus agalactiae reveals an association between specific evolutionary lineages and putative virulence factors but not disease. Infection and Immunity 64, 919-925.
Helgason, E., Caugant, D.A., Lecadet, M.M., Chen, Y., Mahillon, J., Lovgren, A., Hegna, I., Kvaloy, K. & Kolsto, A.B. (1998). Genetic diversity of Bacillus cereus/B. thuringiensis isolates from natural sources. Current Microbiology 37, 80-87.
Helmig, R., Uldbjerg, N., Boris, J. & Kilian, M. (1993). Clonal analysis of Streptococcus agalactiae isolated from infants with neonatal sepsis or meningitis and their mothers and from healthy pregnant women. Journal of Infectious Diseases 168, 904-909.
Holm, S.E., Norrby, A., Bergholm, A.M. & Norgren, M. (1992). Aspects of pathogenesis of serious group A streptococcal infections in Sweden, 1988-1989. Journal of Infectious Diseases 166, 31-37.
Istock, C.A., Duncan, K.E., Ferguson, N. & Zhou, X. (1992). Sexuality in a natural population of bacteria - Bacillus subtilis challenges the clonal paradigm. Molecular Ecology 1, 95-103.
Jackson, L.A., Schuchat, A., Reeves, M.W. & Wenger, J.D. (1995). Serogroup C meningococcal outbreaks in the United States. An emerging threat. Journal of the American Medical Association 273, 383-389.
Jelfs, J., Jalaludin, B., Munro, R., Patel, M., Kerr, M., Daley, D., Neville, S. & Capon, A. (1998). A cluster of meningococcal disease in western Sydney, Australia initially associated with a nightclub.
Epidemiology and Infection 120, 263-270.
John, M.A. & Hussain, Z. (1994). Multilocus enzyme electrophoresis using ultrathin polyacrylamide gels. Journal of Microbiological Methods 19, 307-313.
Kaczmarski, E.B. (1997). Meningococcal disease in England and Wales: 1995. Communicable Diseases Report Review 7, R55-R59.
Kapur, V., Kanjilal, S., Hamrick, M.R., Li, L.L., Whittam, T.S., Sawyer, S.A. & Musser, J.M. (1995a). Molecular population genetic analysis of the streptokinase gene of Streptococcus pyogenes: mosaic alleles generated by recombination. Molecular Microbiology 16, 509-519.
Kapur, V., Sischo, W.M., Greer, R.S., Whittam, T.S. & Musser, J.M. (1995b). Molecular population genetic analysis of Staphylococcus aureus recovered from cows. Journal of Clinical Microbiology 33, 376-380.
Klugman, K.P., Coffey, T.J., Smith, A., Wasas, A., Meyers, M. & Spratt, B.G. (1994). Cluster of an erythromycin-resistant variant of the Spanish multiply resistant 23F clone of Streptococcus pneumoniae in South Africa. European Journal of Clinical Microbiology and Infectious Diseases 13, 171-174.
Kokka, R.P., Janda, J.M., Oshiro, L.S., Altwegg, M., Shimada, T., Sakazaki, R. & Brenner, D.J. (1991). Biochemical and genetic characterization of autoagglutinating phenotypes of Aeromonas species associated with invasive and noninvasive disease. Journal of Infectious Diseases 163, 890-894.
Kolstad, J., Caugant, D.A. & Rørvik, L.M. (1992). Differentiation of Listeria monocytogenes isolates by using plasmid profiling and multilocus enzyme electrophoresis. International Journal of Food Microbiology 16, 247-260.
Kristinsson, K.G., Hjalmarsdottir, M.A. & Steingrimsson, O. (1992). Increasing penicillin resistance in pneumococci in Iceland. Lancet 339, 1606-1607.
Kriz, P., Giorgini, D., Musilek, M., Larribe, M. & Taha, M.K. (1999). Microevolution through DNA exchange among strains of Neisseria meningitidis isolated during an outbreak in the Czech Republic. Research in Microbiology 150, 273-280.
Kumar, S., Tamura, K. & Nei, M. (1994). MEGA: molecular evolutionary genetics analysis software for microcomputers. Computer Applications in the Biosciences 10, 189-191.
Lagos, R., Avendano, A., Horwitz, I., Musser, J.M., Hoiseth, S.K., Maneval, D.R., Jones, M.J., Levine, M.M., Dattas, J.P., Prenzel, I., Enriquez, N., Topelberg, S., Olivari, E & Morris, J.G. (1991). Molecular epidemiology of Haemophilus influenzae within families in Santiago, Chile. Journal of Infectious Diseases 164, 1149-1153.
Lanser, J.A., Adams, M., Doyle, R., Sangster, N. & Steele, T.W. (1990). Genetic relatedness of Legionella longbeachae isolates from human and environmental sources in Australia. Applied and Environmental Microbiology 56, 2784-2790.
Lawrence, L.M. & Gilmour, A. (1995). Characterization of Listeria monocytogenes isolated from poultry products and from the poultry-processing environment by random amplification of polymorphic DNA and multilocus enzyme electrophoresis. Applied and Environmental Microbiology 61, 2139-2144.
Levin, M.H., Weinstein, R.A., Nathan, C., Selander, R.K., Ochman, H. & Kabins, S.A. (1984). Association of infection caused by Pseudomonas aeruginosa serotype O11 with intravenous abuse of pentazocine mixed with tripelennamine. Journal of Clinical Microbiology 20, 758-762.
Li, J., Musser, J.M., Beltran, P., Kline, M.W. & Selander, R.K. (1990). Genotypic heterogeneity of strains of Citrobacter diversus expressing a 32-kilodalton outer membrane protein associated with neonatal meningitis. Journal of Clinical Microbiology 28, 1760-1765.
Lomholt, H. (1995). Evidence of recombination and an antigenically diverse immunoglobulin A1 protease among strains of Streptococcus pneumoniae. Infection and Immunity 63, 4238-4243.
Loos, B.G., Dyer, D.W., Whittam, T.S. & Selander, R.K. (1993). Genetic structure of populations of Porphyromonas gingivalis associated with periodontitis and other oral infections. Infection and Immunity 61, 204-212.
Lymbery, A.J., Hampson, D.J., Hopkins, R.M., Combs, B. & Mhoma, J.R. (1990). Multilocus enzyme electrophoresis for identification and typing of Treponema hyodysenteriae and related spirochaetes. Veterinary Microbiology 22, 89-99.
Maiden, M.C., Bygraves, J.A., Feil, E., Morelli, G., Russell, J.E., Urwin, R., Zhang, Q., Zhou, J., Zurth, K., Caugant, D.A., Feavers, I.M., Achtman, M. & Spratt, B.G. (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proceedings of the National Academy of Sciences of the United States of America 95, 3140-3145.
Marques, M.T., Bornstein, N. & Fleurette, J. (1995). Combined monoclonal antibody typing, multilocus enzyme electrophoresis, soluble protein profiles and plasmid analysis of clinical and environmental Legionella pneumophila serogroup 1 isolated in a Portuguese hospital. Journal of Hospital Infection 30, 103-110.
Martin, P.R. & Høiby, E.A. (1990). Streptococcal serogroup A epidemic in Norway 1987-1988. Scandinavian Journal of Infectious Diseases 22, 421-429.
Martin, D.R. & Single, L.A. (1993). Molecular epidemiology of group A streptococcus M type 1 infections. Journal of Infectious Diseases 167, 1112-1117.
Martin, C., Sibold, C. & Hakenbeck, R. (1992). Relatedness of penicillin-binding protein 1a genes from different clones of penicillin-resistant Streptococcus pneumoniae isolated in South Africa and Spain. EMBO Journal 11, 3831-3836.
Martin, D.R., Walker, S.J., Baker, M.G. & Lennon, D.R. (1998a). New Zealand epidemic of meningococcal disease identified by a strain with phenotype B:4:P1.4. Journal of Infectious Diseases 177, 497-500.
Martin, D.R., Walker, S.J., Glennie, A.C., Baker, M.G., Eyles, R.E, Lennon, D.R. & Roberts, A.P. (1998b). Continuation of meningococcal disease epidemic in New Zealand. In Abstracts of the eleventh international pathogenic neisseria conference, Nassif, X., Quentin-Millet, M.-J. & Taha, M.-K., eds, p. 8. Paris.
Martin, C., Boyd, E.F., Quentin, R., Massicot, P. & Selander, R.K. (1999). Enzyme polymorphism in Pseudomonas aeruginosa strains recovered from cystic fibrosis patients in France. Microbiology 145, 2587-2594.
Martinez, M.B., Whittam, T.S., McGraw, E.A., Rodrigues, J. & Trabulsi, L.R. (1999). Clonal relationship among invasive and non-invasive strains of enteroinvasive Escherichia coli serogroups. FEMS Microbiology Letters 172, 145-151.
Maslow, J.N., Whittam, T.S., Gilks, C.F., Wilson, R.A., Mulligan, M.E., Adams, K.S. & Arbeit, R.D. (1995). Clonal relationships among bloodstream isolates of Escherichia coli. Infection and Immunity 63, 2409-2417.
Maynard Smith, J., Smith, N.H., O'Rourke, M. & Spratt, B.G. (1993). How clonal are bacteria? Proceedings of the National Academy of Sciences of the United States of America 90, 4384-4388.
McDougal, L.K., Rasheed, J.K., Biddle, J.W. & Tenover, F.C. (1995). Identification of multiple clones of extended-spectrum cephalosporin-resistant Streptococcus pneumoniae isolates in the United States. Antimicrobial Agents and Chemotherapy 39, 2282-2288.
McLaren, A.J., Trott, D.J., Swayne, D.E., Oxberry, S.L. & Hampson, D.J. (1997). Genetic and phenotypic characterization of intestinal spirochetes colonizing chickens and allocation of known pathogenic isolates to three distinct genetic groups. Journal of Clinical Microbiology 35, 412-417.
Milkman, R. (1973). Electrophoretic variation in Escherichia coli from natural sources. Science 182, 1024-1026.
Moore, P.S., Reeves, M.W., Schwartz, B., Gellin, B.G. & Broome, C.V. (1989). Intercontinental spread of an epidemic group A Neisseria meningitidis strain. Lancet ii, 260-263.
Munoz, R., Coffey, T.J., Daniels, M., Dowson, C.G., Laible, G., Casal, J., Hakenbeck, R., Jacobs, M., Musser, J.M., Spratt, B.G. & Tomasz, A. (1991). Intercontinental spread of a multiresistant clone of serotype 23F Streptococcus pneumoniae. Journal of Infectious Diseases 164, 302-306.
Munoz, R., Musser, J.M., Crain, M., Briles, D.E., Marton, A., Parkinson, A.J., Sorensen, U. & Tomasz, A. (1992).
Geographic distribution of penicillin-resistant clones of Streptococcus pneumoniae: characterization by penicillin-binding protein profile, surface protein A typing, and multilocus enzyme analysis. Clinical Infectious Diseases 15, 112-118.
Musser, J.M. (1996). Molecular population genetic analysis of emerged bacterial pathogens: selected insights. Emerging Infectious Diseases 2, 1-17.
Musser, J.M. & Kapur, V. (1992). Clonal analysis of methicillin-resistant Staphylococcus aureus strains from intercontinental sources: association of the mec gene with divergent phylogenetic lineages implies dissemination by horizontal transfer and recombination. Journal of Clinical Microbiology 30, 2058-2063.
Musser, J.M. & Selander, R.K. (1990). Brazilian purpuric fever: evolutionary genetic relationships of the case clone of Haemophilus influenzae biogroup aegyptius to encapsulated strains of Haemophilus influenzae. Journal of Infectious Diseases 161, 130-133.
Musser, J.M., Granoff, D.M., Pattison, P.E. & Selander, R.K. (1985). A population genetic framework for the study of invasive diseases caused by serotype b strains of Haemophilus influenzae. Proceedings of the National Academy of Sciences of the United States of America 82, 5078-5082.
Musser, J.M., Barenkamp, S.J., Granoff, D.M. & Selander, R.K. (1986). Genetic relationships of serologically nontypable and serotype b strains of Haemophilus influenzae. Infection and Immunity 52, 183-191.
Musser, J.M., Hewlett, E.L., Peppler, M.S. & Selander, R.K. (1986). Genetic diversity and relationships in populations of Bordetella spp. Journal of Bacteriology 166, 230-237.
Musser, J.M., Bemis, D.A., Ishikawa, H. & Selander, R.K. (1987a). Clonal diversity and host distribution in Bordetella bronchiseptica. Journal of Bacteriology 169, 2793-2803.
Musser, J.M., Rapp, V.J. & Selander, R.K. (1987b). Clonal diversity in Haemophilus pleuropneumoniae. Infection and Immunity 55, 1207-1215.
Musser, J.M., Kroll, J.S., Moxon, E.R. & Selander, R.K.
(1988a). Clonal population structure of encapsulated Haemophilus influenzae. Infection and Immunity 56, 1837-1845.
Musser, J.M., Kroll, J.S., Moxon, E.R. & Selander, R.K. (1988b). Evolutionary genetics of the encapsulated strains of Haemophilus influenzae. Proceedings of the National Academy of Sciences of the United States of America 85, 7758-7762.
Musser, J.M., Mattingly, S.J., Quentin, R., Goudeau, A. & Selander, R.K. (1989). Identification of a high-virulence clone of type III Streptococcus agalactiae (group B Streptococcus) causing invasive neonatal disease. Proceedings of the National Academy of Sciences of the United States of America 86, 4731-4735.
Musser, J.M., Kroll, J.S., Granoff, D.M., Moxon, E.R., Brodeur, B.R., Campos, J., Dabernat, H., Frederiksen, W., Hamel, J. & Hammond, G. (1990a). Global genetic structure and molecular epidemiology of encapsulated Haemophilus influenzae. Reviews of Infectious Diseases 12, 75-111.
Musser, J.M., Schlievert, P.M., Chow, A.W., Ewan, P., Kreiswirth, B.N., Rosdahl, V.T., Naidu, A.S., Witte, W. & Selander, R.K. (1990b). A single clone of Staphylococcus aureus causes the majority of cases of toxic shock syndrome. Proceedings of the National Academy of Sciences of the United States of America 87, 225-229.
Musser, J.M., Hauser, A.R., Kim, M.H., Schlievert, P.M., Nelson, K. & Selander, R.K. (1991). Streptococcus pyogenes causing toxic-shock-like syndrome and other invasive diseases: clonal diversity and pyrogenic exotoxin expression. Proceedings of the National Academy of Sciences of the United States of America 88, 2668-2672.
Musser, J.M., Gray, B.M., Schlievert, P.M. & Pichichero, M.E. (1992). Streptococcus pyogenes pharyngitis: characterization of strains by multilocus enzyme genotype, M and T protein serotype, and pyrogenic exotoxin gene probing. Journal of Clinical Microbiology 30, 600-603.
Musser, J.M., Kapur, V., Kanjilal, S., Shah, U., Musher, D.M., Barg, N.L., Johnston, K.H., Schlievert, P.M., Henrichsen, J., Gerlach, D., Rakita, R.M., Tanna, A., Cookson, B.D. & Huang, J.C. (1993a).
Geographic and temporal distribution and molecular characterization of two highly pathogenic clones of Streptococcus pyogenes expressing allelic variants of pyrogenic exotoxin A (scarlet fever toxin). Journal of Infectious Diseases 167, 337-346. Musser, J.M., Nelson, K., Selander, R.K., Gerlach, D., Huang, J.C., Kapur, V. & Kanjilal, S. (1993b). Temporal variation in bacterial disease frequency: molecular population genetic analysis of scarlet fever epidemics in Ottawa and in eastern Germany. Journal of Infectious Diseases 167, 759-762. Musser, J.M., Kapur, V., Szeto, J., Pan, X., Swanson, D.S. & Martin, D.R. (1995). Genetic diversity and relationships among Streptococcus pyogenes strains expressing serotype M1 protein: recent intercontinental spread of a subclone causing episodes of invasive disease. Infection and Immunity 63, 994-1003. Mwaniki, C.G., Robertson, I.D., Trott, D.J., Atyeo, R.F., Lee, B.J. & Hampson, D.J. (1994). Clonal analysis and virulence of Australian isolates of Streptococcus suis type 2. Epidemiology and Infection 113, 321-334. Møller, K., Nielsen, R., Andersen, L.V. & Kilian, M. (1992). Clonal analysis of the Actinobacillus pleuropneumoniae population in a geographically restricted area by multilocus enzyme electrophoresis. Journal of Clinical Microbiology 30, 623-627. Nei, M. (1977). F-statistics and analysis of gene diversity in subdivided populations. Annals of Human Genetics 41, 225-233. Nei, M. (1978). Estimation of average heterozygosity and genetic distance from a small sample of individuals. Genetics 89, 583-590. Nesbakken, T., Kapperud, G. & Caugant, D.A. (1996). Pathways of Listeria monocytogenes contamination in the meat processing industry. International Journal of Food Microbiology 31, 161-171. Ngeleka, M., Kwaga, J.K., White, D.G., Whittam, T.S., Riddell, C., Goodhope, R., Potter, A.A. & Allan, B. (1996).
Escherichia coli cellulitis in broiler chickens: clonal relationships among strains and analysis of virulence-associated factors of isolates from diseased birds. Infection and Immunity 64, 3118-3126. Norton, R., Roberts, B., Freeman, M., Wilson, M., Ashhurst-Smith, C., Lock, W., Brookes, D. & La Brooy, J. (1998). Characterisation and molecular typing of Burkholderia pseudomallei: are disease presentations of melioidosis clonally related? FEMS Immunology and Medical Microbiology 20, 37-44. Nouvellon, M., Pons, J.L., Sirot, D., Combe, M.L. & Lemeland, J.F. (1994). Clonal outbreaks of extended-spectrum beta-lactamase-producing strains of Klebsiella pneumoniae demonstrated by antibiotic susceptibility testing, beta-lactamase typing, and multilocus enzyme electrophoresis. Journal of Clinical Microbiology 32, 2625-2627. Nørrung, B. & Gerner-Smidt, P. (1993). Comparison of multilocus enzyme electrophoresis (MEE), ribotyping, restriction enzyme analysis (REA) and phage typing for typing of Listeria monocytogenes. Epidemiology and Infection 111, 71-79. Nørrung, B. & Skovgaard, N. (1993). Application of multilocus enzyme electrophoresis in studies of the epidemiology of Listeria monocytogenes in Denmark. Applied and Environmental Microbiology 59, 2817-2822. Ochman, H. & Selander, R.K. (1984a). Evidence for clonal population structure in Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America 81, 198-201. Ochman, H. & Selander, R.K. (1984b). Standard reference strains of Escherichia coli from natural populations. Journal of Bacteriology 157, 690-693. Ochman, H., Whittam, T.S., Caugant, D.A. & Selander, R.K. (1983). Enzyme polymorphism and genetic population structure in Escherichia coli and Shigella. Journal of General Microbiology 129, 2715-2726. Olyhoek, T., Crowe, B.A. & Achtman, M. (1987). Clonal population structure of Neisseria meningitidis serogroup A isolated from epidemics and pandemics between 1915 and 1983. Reviews of Infectious Diseases 9, 665-692. O'Rourke, M. & Spratt, B.G. (1994). Further evidence for the non-clonal population structure of Neisseria gonorrhoeae: extensive genetic diversity within isolates of the same electrophoretic type. Microbiology 140, 1285-1290. O'Rourke, M. & Stevens, E. (1993).
Genetic structure of Neisseria gonorrhoeae: a non-clonal pathogen. Journal of General Microbiology 139, 2603-2611. Ørskov, F. & Ørskov, I. (1983). Summary of a workshop on the clone concept in the epidemiology, taxonomy, and evolution of the Enterobacteriaceae and other bacteria. Journal of Infectious Diseases 148, 346-357. Oxberry, S.L., Trott, D.J. & Hampson, D.J. (1998). Serpulina pilosicoli, waterbirds and water: potential sources of infection for humans and other animals. Epidemiology and Infection 121, 219-225. Perea Mejia, L.M., Stockbauer, K.E., Pan, X., Cravioto, A. & Musser, J.M. (1997). Characterization of group A Streptococcus strains recovered from Mexican children with pharyngitis by automated DNA sequencing of virulence-related genes: unexpectedly large variation in the gene (sic) encoding a complement-inhibiting protein. Journal of Clinical Microbiology 35, 3220-3224. Piffaretti, J.-C., Kressebuch, H., Aeschbacher, M., Bille, J., Bannerman, E., Musser, J.M., Selander, R.K. & Rocourt, J. (1989). Genetic characterization of clones of the bacterium Listeria monocytogenes causing epidemic disease. Proceedings of the National Academy of Sciences of the United States of America 86, 3818-3822. Pinero, D., Martinez, E. & Selander, R.K. (1988). Genetic diversity and relationships among isolates of Rhizobium leguminosarum biovar phaseoli. Applied and Environmental Microbiology 54, 2825-2832. Pinner, R.W., Schuchat, A., Swaminathan, B., Hayes, P.S., Deaver, K.A., Weaver, R.E., Plikaytis, B.D., Reeves, M., Broome, C.V. & Wenger, J.D. (1992). Role of foods in sporadic listeriosis. II. Microbiologic and epidemiologic investigation. Journal of the American Medical Association 267, 2046-2050. Pons, J.L., Mandement, M.N., Martin, E., Lemort, C., Nouvellon, M., Mallet, E. & Lemeland, J.F. (1996). Clonal and temporal patterns of nasopharyngeal penicillin-susceptible and penicillin-resistant Streptococcus pneumoniae strains in children attending a day care center. Journal of
Clinical Microbiology 34, 3218-3222. Popovic, T., Fields, P.I., Olsvik, O., Wells, J.G., Evins, G.M., Cameron, D.N., Farmer, J.J., Bopp, C.A., Wachsmuth, K. & Sack, R.B. (1995). Molecular subtyping of toxigenic Vibrio cholerae O139 causing epidemic cholera in India and Bangladesh, 1992-1993. Journal of Infectious Diseases 171, 122-127. Popovic, T., Kombarova, S.Y., Reeves, M.W., Nakao, H., Mazurova, I.K., Wharton, M., Wachsmuth, I.K. & Wenger, J.D. (1996). Molecular epidemiology of diphtheria in Russia, 1985-1994. Journal of Infectious Diseases 174, 1064-1072. Porras, O., Caugant, D.A., Gray, B., Lagergård, T., Levin, B.R. & Svanborg-Edén, C. (1986a). Difference in structure between type b and nontypable Haemophilus influenzae populations. Infection and Immunity 53, 79-89. Porras, O., Caugant, D.A., Lagergård, T. & Svanborg-Edén, C. (1986b). Application of multilocus enzyme gel electrophoresis to Haemophilus influenzae. Infection and Immunity 53, 71-78. Poulsen, K., Theilade, E., Lally, E.T., Demuth, D.R. & Kilian, M. (1994). Population structure of Actinobacillus actinomycetemcomitans: a framework for studies of disease-associated properties. Microbiology 140, 2049-2060. Pupo, G.M., Karaolis, D.K., Lan, R. & Reeves, P.R. (1997). Evolutionary relationships among pathogenic and nonpathogenic Escherichia coli strains inferred from multilocus enzyme electrophoresis and mdh sequence studies. Infection and Immunity 65, 2685-2692. Quentin, R., Goudeau, A., Wallace, R.J., Smith, A.L., Selander, R.K. & Musser, J.M. (1990). Urogenital, maternal and neonatal isolates of Haemophilus influenzae: identification of unusually virulent serologically non-typable clone families and evidence for a new Haemophilus species. Journal of General Microbiology 136, 1203-1209. Quentin, R., Martin, C., Musser, J.M., Pasquier Picard, N. & Goudeau, A. (1993). Genetic characterization of a cryptic genospecies of Haemophilus causing urogenital and neonatal infections.
Journal of Clinical Microbiology 31, 1111-1116. Quentin, R., Huet, H., Wang, F.S., Geslin, P., Goudeau, A. & Selander, R.K. (1995). Characterization of Streptococcus agalactiae strains by multilocus enzyme genotype and serotype: Identification of multiple virulent clone families that cause invasive neonatal disease. Journal of Clinical Microbiology 33, 2576-2581. Rasmussen, O.F., Beck, T., Olsen, J.E., Dons, L. & Rossen, L. (1991). Listeria monocytogenes isolates can be classified into two major types according to the sequence of the listeriolysin gene. Infection and Immunity 59, 3945-3951. Rasmussen, O.F., Skouboe, P., Dons, L., Rossen, L. & Olsen, J.E. (1995). Listeria monocytogenes exists in at least three evolutionary lines: evidence from flagellin, invasive associated protein and listeriolysin O genes. Microbiology 141, 2053-2061. Reda, K.B., Kapur, V., Goela, D., Lamphear, J.G., Musser, J.M. & Rich, R.R. (1996). Phylogenetic distribution of streptococcal superantigen SSA allelic variants provides evidence for horizontal transfer of ssa within Streptococcus pyogenes. Infection and Immunity 64, 1161-1165. Reeves, M.W., Evins, G.M., Heiba, A.A., Plikaytis, B.D. & Farmer, J.J. (1989). Clonal nature of Salmonella typhi and its genetic relatedness to other salmonellae as shown by multilocus enzyme electrophoresis, and proposal of Salmonella bongori comb. nov. Journal of Clinical Microbiology 27, 313-320. Rodrigues, J., Scaletsky, I.C., Campos, L.C., Gomes, T.A., Whittam, T.S. & Trabulsi, L.R. (1996). Clonal structure and virulence factors in strains of Escherichia coli of the classic serogroup O55. Infection and Immunity 64, 2680-2686. Rørvik, L.M., Caugant, D.A. & Yndestad, M. (1995). Contamination pattern of Listeria monocytogenes and other Listeria spp. in a salmon slaughterhouse and smoked salmon processing plant. International Journal of Food Microbiology 25, 19-27.
Sacchi, C.T., Pessoa, L.L., Ramos, S.R., Milagres, L.G., Camargo, M.C.C., Hidalgo, N.T.R., Melles, C.E.A., Caugant, D.A. & Frasch, C.E. (1992a). Ongoing group B Neisseria meningitidis epidemic in Sao Paulo, Brazil, due to increased prevalence of a single clone of the ET-5 complex.
Journal of Clinical Microbiology 30, 1734-1738. Sacchi, C.T., Zanella, R.C., Caugant, D.A., Frasch, C.E., Hidalgo, N.T., Milagres, L.G., Pessoa, L.L., Ramos, S.R., Camargo, M.C.C. & Melles, C.E.A. (1992b). Emergence of a new clone of serogroup C Neisseria meningitidis in Sao Paulo, Brazil. Journal of Clinical Microbiology 30, 1282-1286. Salih, M.A.M., Danielsson, D., Bäckman, A., Caugant, D.A., Achtman, M. & Olcén, P. (1990). Characterization of epidemic and non-epidemic Neisseria meningitidis serogroup A strains from Sudan and Sweden. Journal of Clinical Microbiology 28, 1711-1719. Schill, W.B., Phelps, S.R. & Pyle, S.W. (1984). Multilocus electrophoretic assessment of the genetic structure and diversity of Yersinia ruckeri. Applied and Environmental Microbiology 48, 975-979. Scholten, R.J.P.M., Poolman, J.T., Valkenburg, H.A., Bijlmer, H.A., Dankert, J. & Caugant, D.A. (1994). Phenotypic and genotypic changes in a new clone complex of Neisseria meningitidis causing disease in the Netherlands, 1958-1990. Journal of Infectious Diseases 169, 673-676. Schwartz, B., Hexter, D., Broome, C.V., Hightower, A.W., Hirschhorn, R.B., Porter, J.D., Hayes, P.S., Bibb, W.F., Lorber, B. & Faris, D.G. (1989). Investigation of an outbreak of listeriosis: new hypotheses for the etiology of epidemic Listeria monocytogenes infections. Journal of Infectious Diseases 159, 680-685. Schwartz, B., Facklam, R.R. & Breiman, R.F. (1990). Changing epidemiology of group A streptococcal infection in the USA. Lancet 336, 1167-1171. Segovia, L., Pinero, D., Palacios, R. & Martinez Romero, E. (1991). Genetic structure of a soil population of nonsymbiotic Rhizobium leguminosarum. Applied and Environmental Microbiology 57, 426-433. Selander, R.K. & Levin, B.R. (1980). Genetic diversity and structure in Escherichia coli populations. Science 210, 545-547. Selander, R.K., McKinney, R.M., Whittam, T.S., Bibb, W.F., Brenner, D.J., Nolte, F.S. & Pattison, P.E. (1985).
Genetic structure of populations of Legionella pneumophila. Journal of Bacteriology 163, 1021-1037. Selander, R.K., Caugant, D.A., Ochman, H., Musser, J.M., Gilmour, M.N. & Whittam, T.S. (1986a). Methods of multilocus enzyme electrophoresis for bacterial population genetics and systematics. Applied and Environmental Microbiology 51, 873-884. Selander, R.K., Korhonen, T.K., Väisänen-Rhen, V., Williams, P.H., Pattison, P.E. & Caugant, D.A. (1986b). Genetic relationships and clonal structure of strains of Escherichia coli causing neonatal septicemia and meningitis. Infection and Immunity 52, 213-222. Selander, R.K., Caugant, D.A. & Whittam, T.S. (1987a). Genetic structure and variation in natural populations of Escherichia coli. In Escherichia coli and Salmonella typhimurium cellular and molecular biology, Neidhardt, F.C., Ingraham, J.L., Low, K.B., Magasanik, B., Schaechter, M. & Umbarger, H.E., eds, vol. II, pp. 1625-1648. American Society for Microbiology, Washington, D.C. Selander, R.K., Musser, J.M., Caugant, D.A., Gilmour, M.N. & Whittam, T.S. (1987b). Population genetics of pathogenic bacteria. Microbial Pathogenesis 3, 1-7. Selander, R.K., Beltran, P., Smith, N.H., Barker, R.M., Crichton, P.B., Old, D.C., Musser, J.M. & Whittam, T.S. (1990a). Genetic population structure, clonal phylogeny, and pathogenicity of Salmonella paratyphi B. Infection and Immunity 58, 1891-1901. Selander, R.K., Beltran, P., Smith, N.H., Helmuth, R., Rubin, F.A., Kopecko, D.J., Ferris, K., Tall, B.D., Cravioto, A. & Musser, J.M. (1990b). Evolutionary genetic relationships of clones of Salmonella serovars that cause human typhoid and other enteric fevers. Infection and Immunity 58, 2262-2275. Selander, R.K., Smith, N.H., Li, J., Beltran, P., Ferris, K.E., Kopecko, D.J. & Rubin, F.A. (1992). Molecular evolutionary genetics of the cattle-adapted serovar Salmonella dublin. Journal of Bacteriology 174, 3587-3592. Shah, H.N. & Gharbia, S.E. (1992).
Biochemical and chemical studies on strains designated Prevotella intermedia and proposal of a new pigmented species, Prevotella nigrescens sp. nov. International Journal of Systematic Bacteriology 42, 542-546. Shi, Z.Y., Enright, M.C., Wilkinson, P., Griffiths, D. & Spratt, B.G. (1998). Identification of three major clones of multiply antibiotic-resistant Streptococcus pneumoniae in Taiwanese hospitals by multilocus sequence typing. Journal of Clinical Microbiology 36, 3514-3519. Sibold, C., Wang, J., Henrichsen, J. & Hakenbeck, R. (1992). Genetic relationships of penicillin-susceptible and -resistant Streptococcus pneumoniae strains isolated on different continents. Infection and Immunity 60, 4119-4126. Sneath, P.H.A. & Sokal, R.R. (1973). Numerical taxonomy. Freeman, San Francisco. Soares, S., Kristinsson, K.G., Musser, J.M. & Tomasz, A. (1993). Evidence for the introduction of a multiresistant clone of serotype 6B Streptococcus pneumoniae from Spain to Iceland in the late 1980s. Journal of Infectious Diseases 168, 158-163. Souza, V., Rocha, M., Valera, A. & Eguiarte, L.E. (1999). Genetic structure of natural populations of Escherichia coli in wild hosts on different continents. Applied and Environmental Microbiology 65, 3373-3385. Spratt, B.G. (1999). Multilocus sequence typing: molecular typing of bacterial pathogens in an era of rapid DNA sequencing and the internet. Current Opinion in Microbiology 2, 312-316. Stanton, T.B., Trott, D.J., Lee, J.I., McLaren, A.J., Hampson, D.J., Paster, B.J. & Jensen, N.S. (1996). Differentiation of intestinal spirochaetes by multilocus enzyme electrophoresis analysis and 16S rRNA sequence comparisons. FEMS Microbiology Letters 136, 181-186. Stockbauer, K.E., Grigsby, D., Pan, X., Fu, Y.X., Mejia, L.M., Cravioto, A. & Musser, J.M. (1998). Hypervariability generated by natural selection in an extracellular complement-inhibiting protein of serotype M1 strains of group A Streptococcus. Proceedings of the National Academy of Sciences of the United States of America 95, 3128-3133.
Struelens, M.J. & the Members of the European Study Group on Epidemiological Markers. (1996). Consensus guidelines for appropriate use and evaluation of microbial epidemiologic typing systems. Clinical Microbiology and Infection 2, 2-11. Takala, A.K., Vuopio-Varkila, J., Tarkka, E., Leinonen, M. & Musser, J.M. (1996). Subtyping of common pediatric pneumococcal serotypes from invasive disease and pharyngeal carriage in Finland. Journal of Infectious Diseases 173, 128-135. Tekle Haimanot, R., Caugant, D.A., Fekadu, D., Bjune, G., Belete, B., Frøholm, L.O., Høiby, E.A., Rosenqvist, E., Selander, R.K. & Bjorvatn, B. (1990). Characteristics of serogroup A Neisseria meningitidis responsible for an epidemic in Ethiopia, 1988-89. Scandinavian Journal of Infectious Diseases 22, 171-174. Thurm, V. & Ritter, E. (1993). Genetic diversity and clonal relationships of Acinetobacter baumannii strains isolated in a neonatal ward: epidemiological investigations by allozyme, whole-cell protein and antibiotic resistance analysis. Epidemiology and Infection 111, 491-498. Tomayko, J.F. & Murray, B.E. (1995). Analysis of Enterococcus faecalis isolates from intercontinental sources by multilocus enzyme electrophoresis and pulsed-field gel electrophoresis. Journal of Clinical Microbiology 33, 2903-2907. Trott, D.J., Robertson, I.D. & Hampson, D.J. (1993). Genetic characterisation of isolates of Listeria monocytogenes from man, animals and food. Journal of Medical Microbiology 38, 122-128. Trott, D.J., Atyeo, R.F., Lee, J.I., Swayne, D.A., Stoutenburg, J.W. & Hampson, D.J. (1996). Genetic relatedness amongst intestinal spirochaetes isolated from rats and birds. Letters in Applied Microbiology 23, 431-436. Trott, D.J., Jensen, N.S., Saint, G.I., Oxberry, S.L., Stanton, T.B., Lindquist, D. & Hampson, D.J. (1997a). Identification and characterization of Serpulina pilosicoli isolates recovered from the blood of critically ill patients. Journal of Clinical Microbiology 35, 482-485.
Trott, D.J., Oxberry, S.L. & Hampson, D.J. (1997b). Evidence for Serpulina hyodysenteriae being recombinant, with an epidemic population structure. Microbiology 143, 3357-3365. Trott, D.J., Mikosza, A.S., Combs, B.G., Oxberry, S.L. & Hampson, D.J. (1998). Population genetic analysis of Serpulina pilosicoli and its molecular epidemiology in villages in the eastern Highlands of Papua New Guinea. International Journal of Systematic Bacteriology 48, 659-668. Tzabar, Y. & Pennington, T.H. (1991). The population structure and transmission of Escherichia coli in an isolated human community; studies on an Antarctic base. Epidemiology and Infection 107, 537-542. van Alphen, L., Caugant, D.A., Duim, B., O'Rourke, M. & Bowler, L.D. (1997). Differences in genetic diversity of nonencapsulated Haemophilus influenzae from various diseases. Microbiology 143, 1423-1431. van der Zee, A., Mooi, F., Van Embden, J. & Musser, J. (1997). Molecular evolution and host adaptation of Bordetella spp.: phylogenetic analysis using multilocus enzyme electrophoresis and typing with three insertion sequences. Journal of Bacteriology 179, 6609-6617. Vázquez, J.A., De La Fuente, L., Berron, S., O'Rourke, M., Smith, N.H., Zhou, J. & Spratt, B.G. (1993). Ecological separation and genetic isolation of Neisseria gonorrhoeae and Neisseria meningitidis. Current Biology 9, 567-572. Versalovic, J., Kapur, V., Mason, E.O., Shah, U., Koeuth, T., Lupski, J.R. & Musser, J.M. (1993). Penicillin-resistant Streptococcus pneumoniae strains recovered in Houston: Identification and molecular characterization of multiple clones. Journal of Infectious Diseases 167, 850-856. Wachsmuth, I.K., Evins, G.M., Fields, P.I., Olsvik, O., Popovic, T., Bopp, C.A., Wells, J.G., Carrillo, C. & Blake, P.A. (1993). The molecular epidemiology of cholera in Latin America. Journal of Infectious Diseases 167, 621-626. Wallace, R.J.J., Musser, J.M., Hull, S.I., Silcox, V.A., Steele, L.C., Forrester, G.D., Labidi, A. & Selander, R.K. (1989). Diversity and sources of rapidly growing mycobacteria associated with infections following cardiac surgery. Journal of Infectious Diseases 159, 708-716. Wang, J.-F., Caugant, D.A., Li, X., Hu, X., Poolman, J.T., Crowe, B.A. & Achtman, M. (1992).
Clonal and antigenic analysis of serogroup A Neisseria meningitidis with particular reference to epidemiological features of epidemic meningitis in China. Infection and Immunity 60, 5267-5282. Wang, J.-F., Caugant, D.A., Morelli, G., Koumaré, B. & Achtman, M. (1993). Antigenic and epidemiological properties of the ET-37 complex of Neisseria meningitidis. Journal of Infectious Diseases 167, 1320-1329. Wasem, C.F., McCarthy, C.M. & Murray, L.W. (1991). Multilocus enzyme electrophoresis of the Mycobacterium avium complex and other mycobacteria. Journal of Clinical Microbiology 29, 264-271. Weinberg, G.A., Ghafoor, A., Ishaq, Z., Nomani, N.K., Kabeer, M., Anwar, F., Burney, M.I., Qureshi, A.W., Musser, J.M., Selander, R.K. & Granoff, D.M. (1989). Clonal analysis of Haemophilus influenzae isolated from children from Pakistan with lower respiratory tract infections. Journal of Infectious Diseases 160, 634-643. Whalen, C.M., Hockin, J.C., Ryan, A. & Ashton, F. (1995). The changing epidemiology of invasive meningococcal disease in Canada, 1985 through 1992. Emergence of a virulent clone of Neisseria meningitidis. Journal of the American Medical Association 273, 390-394. Whatmore, A.M., Kapur, V., Sullivan, D.J., Musser, J.M. & Kehoe, M.A. (1994). Non-congruent relationships between variation in emm gene sequences and the population genetic structure of group A streptococci. Molecular Microbiology 14, 619-631. Whatmore, A.M., Kapur, V., Musser, J.M. & Kehoe, M.A. (1995). Molecular population genetic analysis of the enn subdivision of group A streptococcal emm-like genes: horizontal gene transfer and restricted variation among enn genes. Molecular Microbiology 15, 1039-1048. Whittam, T.S., Ochman, H. & Selander, R.K. (1983a). Geographic components of linkage disequilibrium in natural populations of Escherichia coli. Molecular Biology and Evolution 1, 67-83. Whittam, T.S., Ochman, H. & Selander, R.K. (1983b). Multilocus genetic structure in natural populations of Escherichia coli.
Proceedings of the National Academy of Sciences of the United States of America 80, 1751-1755. Whittam, T.S., Wolfe, M.L., Wachsmuth, I.K., Ørskov, F., Ørskov, I. & Wilson, R.A. (1993). Clonal relationships among Escherichia coli strains that cause hemorrhagic colitis and infantile diarrhea. Infection and Immunity 61, 1619-1629. Wise, M.G., Shimkets, L.J. & McArthur, J.V. (1995). Genetic structure of a lotic population of Burkholderia (Pseudomonas) cepacia. Applied and Environmental Microbiology 61, 1791-1798. Woods, T.C., McKinney, R.M., Plikaytis, B.D., Steigerwalt, A.G., Bibb, W.F. & Brenner, D.J. (1988). Multilocus enzyme analysis of Legionella dumoffii. Journal of Clinical Microbiology 26, 799-803. Woods, T.C., Helsel, L.O., Swaminathan, B., Bibb, W.F., Pinner, R.W., Gellin, B.G., Collin, S.F., Waterman, S.H., Reeves, M.W., Brenner, D.J. & Broome, C.V. (1992). Characterization of Neisseria meningitidis serogroup C by multilocus enzyme electrophoresis and ribosomal DNA restriction profiles (ribotyping). Journal of Clinical Microbiology 30, 132-137. Young, J.P.W. (1985). Rhizobium population genetics: enzyme polymorphism in isolates from peas, clover, beans and lucerne grown at the same site. Journal of General Microbiology 131, 2399-2408. Young, J.P.W. & Wexler, M. (1988). Sym plasmid and chromosomal genotypes are correlated in field populations of Rhizobium leguminosarum. Journal of General Microbiology 134, 2731-2739. Zahner, V., Momen, H., Salles, C.A. & Rabinovitch, L. (1989). A comparative study of enzyme variation in Bacillus cereus and Bacillus thuringiensis. Journal of Applied Bacteriology 67, 275-282.
Author index

Akkermans, A.D.L. 267
Caugant, D.A. 299
De Ryck, R. 159
de Vos, W.M. 267
Deplano, A. 159
Dijkshoorn, L. 1, 77
Grimont, F. 107
Grimont, P.A.D. 107
Grundmann, H. 135
Hauman, J. 47
Heersma, H.F. 47
Heyndrickx, M. 211
Janssen, P.J.D. 177
Kremer, K. 47
Saunders, N.A. 249
Struelens, M.J. 159
Towner, K. 1, 135
van Ooyen, A. 31
van Soolingen, D. 47
Vaneechoutte, M. 211
Zoetendal, E.G. 267
Keyword index Abiotrophia 217 Acanthamoeba 215, 218 Acholeplasma 217 Acinetobacter 4, 14, 20-21, 85-86, 96-97, 101-102, 125, 140, 149-150, 163, 172, 182, 195-196, 198-199, 213-214, 216-217, 220, 223-224, 232, 301 Actinobacillus 301 Actinomyces 217 Aeromonas 96, 182, 194-195, 198, 202, 217, 301 AFLP analysis 15-16, 42, 102, 177-205 Agrobacterium 217 Alcaligenes 217, 223,235 ALF analysis, see Fluorescence density profiles Alvinella 221 Amplified ribosomal DNA restriction analysis, see ARDRA Aneurinibacillus 233 Antibiogram typing 12-13, 42 AP-PCR, see RAPD analysis Arcobacteria 217 ARDRA 211-236 Area-sensitive coefficient 39 Armillaria 218 Automated laser fluorescence analysis, see Fluorescence density profiles Azospirillum 213, 217 Bacillus 114, 182, 202, 216-217, 223-226, 229, 232-235, 275, 280, 290, 301 Background subtraction 59-60 Bacteriocin typing 14-15 Bacteriodes 163, 215, 217, 268 Bartonella 217-218 Bifidobacterium 268 Biomphalaria 218
Biotyping 12 Blotting 112-113, 251 Bordetella 80, 163, 217, 223-224, 235,301,321 Borellia 218, 223, 301, 318 Bradyrhizobium 217 Branhamella 80,125 Brazilian purpuric fever 127 Brevibacillus 233-234 Brevibacterium 217 Brucella 126 Buffer compositions 82, 85,307 Bundle concept 69-70 Burkholderia 41-42, 125, 163, 183, 301,321 Butyrivibrio 280 Campylobacter 96, 163, 182, 201, 216-217, 301 Candida 140, 142 Capnocytophaga 217, 223 Cell envelopes, preparation of 86-87, 99-100 Chlamydia 182, 217, 219 CHEF 161,166 see also PFGE Chips, see DNA chips Citrobacter 163, 301 City-block distance 38 Cleavage, see Restriction endonuclease digestion Clone, definition of 7 Clostridium 96, 163, 182, 189, 217, 224, 280 Cluster analysis 32-44, 54-58, 226-228 Coagulase-negative staphylococci 96, 125 see also Staphylococcus spp. Comamonadaceae 217 Computer-assisted analysis of data
354 17-18, 49-74, 90-92, 116-124, 146-150, 169-172, 232, 315-317, 333 Corynebacterium 110, 128, 217-219, 301 Cryptococcus 218 Culture collections 23 Cyanobacteria 217 DAE see RAPD fingerprinting Database construction 47-74, 333 Dendrograms 33-34, 37, 55-58, 70-72, 122, 147-148, 170-171,233, 278-281 Desulfovibrio 286 DGGE 268, 284-291 Dice coefficient 40, 56-58, 68-69, 73, 121,146-149, 231-233 Digitisation, see Fluorescence density profiles, Scanning gel images Discriminatory capacity 19-20 DNA blotting, see Blotting DNA chips, see DNA microarrays DNA extraction, see Nucleic acid extraction DNA hybridisation 15-16, 25 DNA labelling 113-114, 145 DNA microarrays 25,262, 290-291 DNA probes 113-115 DNA sequencing 15-16, 25 see also MLST Electrophoresis 87-90, 100, 111-112, 143-145, 165, 189-191,230, 308 Emm gene typing 261-262 Entamoaeba 218 Enterobacter 12, 96, 125, 163,301 Enterococcus 125, 141,163,217, 301 Epidemiologic concordance 20 ERIC-PCR 138-139 see also RAPD fingerprinting ERS, see External reference strains and standards Escherichia coli 4, 11, 79-80, 85-86,
109-110, 113, 128, 163, 182, 201, 216, 252, 275, 301,305, 319 Eubacterium 217,280 Euclidian distance 38 Exchanging DNA patterns, see Bundle concept External reference strains and standards 50-51, 53, 61-67, 72-73, 90-91, 94-95, 165-166, 258 Fatty acid analysis 16, 18 FIGE 160-161 Fingerprint analysis 17-18, 47-74, 143-150 Fluorescence density profiles 143, 145-149, 189-190, 215-216 Four-parameter logistic model 120 Frankia 279-280 Fusobacterium 280, 287 Gardnerella 217 Gel compositions and preparation 82, 88 see also Starch gels Gel staining, see Staining gels Gene expression profiling 202-204 Genomovar 5 Genotypic typing methods 8-11, 15-17 Haloarcula 225 Halobacillus 232-233 Haemophilus 80, 127, 163, 216, 302, 321 Helicobacter 96, 182, 198, 205,217, 219, 224, 302, 321,328-335 HPLC 273 Hybridisation 114-115
Ideal typing scheme, properties of 16 Image conversion 52,118 Insertion sequence typing, see IS typing Internal size markers 50-51, 53,
355 61-67, 72-73 Introns 224 Inverse PCR 257-258 Inverse relationship 119 IS200 RFLP patterns 252-253 IS6110 RFLP patterns 48-74, 253-258 ISM, see Internal size markers Isolate, definition of 8 IS typing 249-258 Jaccard coefficient 39, 56-58, 73 Klebsiella 4, 21,125, 140, 163, 302 Labels, see DNA labelling Lactobacillus 3, 96, 216-217, 223, 268 Legionella 96, 108, 114, 125-126, 163, 172, 182, 197, 302, 318, 321 Leptospira 128, 217 Library typing methods 9-10 Linker-mediated PCR 255-257 Lipopolysaccharide analysis, see LPS analysis Listeria 11, 216-217, 229, 302, 319-320, 327-328 LPS analysis 78-80, 98-102 MEE, see MLEE Meloidogynidae 218 Methylophaga 127 Microarrays, see DNA microarrays Micro-heterogeneity 224-225 Microsporidia 218 MLEE 7, 16, 78-79, 299-328 MLST 7, 15-16, 78, 262, 328-335 Mosaic rRNA genes 224 MRSA 96, 141,167, 173,200 see also Staphylococcus aureus Multilocus enzyme electrophoresis, see MLEE Multilocus sequence typing, see MLST Mycobacterium spp. 163, 182, 197,
213, 217, 253-258, 302 Mycobacterium tuberculosis 21, 48-74, 216, 253-258, 260-261,302 Mycoplasma 182, 204, 217-218, 225 Nattrassia 232 Neighbour-joining 171,279 Neisseria 80, 163, 173, 215-217, 302, 313-314, 321-325,328-334 Nitrobacter 217, 220 Nocardia 213, 215, 217 Nomenspecies 5 Normalisation 53, 118, 145, 169-170, 192-193 Nucleic acid extraction 109-110, 140-141,163-165, 181-184, 269-273 Numerical taxonomy 17 Oligotyping 249-250, 259-262 see also Spoligotyping Optimisation 56-60 Ornithobacterium 303 Outer membranes, preparation of 86-87 Paenibacillus 182,196,225-226, 232-233 Pandoraea 42 Pasteurella 303 Pearson's correlation coefficient 38-39, 42, 56-60, 68-69, 118, 120-122, 148-152, 170-171 Peptostreptococcus 280 PFCE 161,173 PFGE 15-16, 19, 21, 42, 129, 159-173 Phage-typing 14, 20, 108 Phenotypic typing methods 8-13 Photorhabdus 218, 220, 223 Phytoplasma 218, 223 Plasmid typing 15-16 Porphyromonas 303 Position tolerance 59
Prevotella 218, 223, 275, 303, 318-319
Primers 137-142, 188-189
Principal component analysis 32-33
Product-moment correlation coefficient, see Pearson's correlation coefficient
Propionibacterium 127, 218
Protein analysis 16-17, 42, 78-98
Proteus 126-127
Providencia 96, 125-126
Pseudomonas 4, 14, 21, 125-126, 140, 163, 167, 172-173, 181-183, 197, 217, 223, 275, 290, 303
Pulsed-field gel electrophoresis, see PFGE
Pyrobaculum 224
Quality control 92-95, 123-125
    see also Standardisation
Ralstonia 218
Random amplified polymorphic DNA, see RAPD analysis
RAPD analysis 15-16, 135-154, 206
REP-PCR 15, 136, 138-139
    see also RAPD analysis
Reproducibility 20-21, 193-194
    see also Normalisation, Standardisation
Resolution of scanned images 58-59
    see also Scanning gel images
Restriction endonuclease digestion 110, 162-163, 165, 179-181, 228-229
Restriction fragment length polymorphism, see RFLP analysis
Reverse transcriptase PCR 273-276
RFE 160-161
RFLP analysis 15-16, 21, 48-74, 162-163
    see also ARDRA, RAPD analysis, Ribotyping
Rhizobium 218, 223, 303, 321
Rhodococcus 125
Ribotyping 15-16, 42, 107-129
    see also ARDRA, Riboprinting
Riboprinting 21, 115-116, 123-124, 129
Rochalimea 218
RNA extraction, see Nucleic acid extraction
rRNA analysis, see ARDRA, Ribotyping, Riboprinting, rRNA sequence diversity
rRNA sequence diversity 5-6, 267-292
RT-PCR, see Reverse transcriptase PCR
Ruminococcus 280
Saccharomonospora 218
Salmonella 14, 80, 108, 173, 182, 200-201, 252-253, 303, 321
Scanning gel images 51-52, 58-59, 92, 117, 145, 169, 191-193, 230-231
Scytalidium 232
SDS-PAGE 16, 42, 78-98
Serotyping 13, 108
Serpulina 303, 321
Serratia 4, 21, 140, 163, 303
Shigella 14, 128
Silver-staining 100-101, 189
Similarity coefficients 54-58
Similarity matrix 35-36, 68, 171
Simpson's index of diversity 19
Species, definition of 5
Sphingomonas 3
Spiroplasma 218
Spline 119
Spoligotyping 260-261
    see also Oligotyping
SSCP analysis 268, 283-284, 291
Stability 21-22, 93
Staining gels 83, 89-90, 100-101, 143-144, 189, 309
Standardisation 22, 94-95, 111, 118-119, 150-153, 192-193
Staphylococcus aureus 14, 79, 125, 163, 172, 182, 216, 321
    see also MRSA
Staphylococcus spp. 4, 42-43, 126, 163, 182, 199, 216, 303
Starch gels 306-308
Stenotrophomonas 125, 163, 182
Strain, definition of 7
    storage of 23
Streptococcus 96, 125, 163, 173, 182, 199-200, 205, 218, 261-262, 275, 280, 303-304, 321, 325-338
Streptomyces 3
Sub-species, definition of 6
TGGE 268, 271, 284-291
Thermobispora 225
Thermoproteus 224
Thermus 275
Thiobacillus 218
Toxoplasma 141
Treponema 304
Tropheryma 219
Trypanosoma 218
Tsukamurella 125
Typability 22
Type, definition of 19
Typing system concordance 20
Unweighted pair group method using arithmetic averages, see UPGMA
UPGMA 40, 42, 55-56, 69, 147-148, 170-171, 279
Ureaplasma 218
Vacuum transfer, see Blotting
Var(iety), definition of 6
Veillonella 216, 218
Vibrio 110, 127-128, 182, 196, 215, 218, 304
Virgibacillus 233-234
Ward's averaging 41
Weissella 97
Whole-cell lysates, preparation of 85-86, 99-100
Xanthomonas 183, 196-197, 218, 223
Xenorhabdus 218, 220, 223
Yeast 218
Yersinia 218