METHODS IN MOLECULAR BIOLOGY™
Series Editor John M. Walker School of Life Sciences University of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK
For further volumes: http://www.springer.com/series/7651
DNA Barcodes Methods and Protocols Edited by
W. John Kress and David L. Erickson Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
Editors W. John Kress, Ph.D. Department of Botany National Museum of Natural History Smithsonian Institution Washington, DC, USA
David L. Erickson, Ph.D. Department of Botany National Museum of Natural History Smithsonian Institution Washington, DC, USA
ISSN 1064-3745 ISSN 1940-6029 (electronic) ISBN 978-1-61779-590-9 ISBN 978-1-61779-591-6 (eBook) DOI 10.1007/978-1-61779-591-6 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2012931933 © Springer Science+Business Media, LLC 2012 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, c/o Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper Humana Press is part of Springer Science+Business Media (www.springer.com)
Foreword The diversity of life in a hectare of reef, a county of grassland, or a shipload of imports challenges biologists called to identify the species comprising biodiversity, functioning as ecosystems, or invading ports. The sequences of black-and-white barcodes that empower a newly hired clerk to wave a wand over a cart full of goods swiftly, print an itemized receipt infallibly, and order replacements invisibly call forth a vision of an analog for identifying species. The resemblance of barcodes on commercial products to sequences of DNA shown as black-and-white bars on electrophoretic gels reinforced the vision back in 2003 in the founding meetings of the barcode of life movement. This book edited by early adopters of DNA barcodes, John Kress and David Erickson, proves the barcode of life has arrived in environmental science. In less than a decade, they and the other authors in this volume have realized the vision of a short DNA sequence on a uniform locality of the genome to identify species rapidly and accurately. Because the currency in biology is species, their identification is no academic diversion. Biologists count the rise and fall of biodiversity in species. Regulators designate endangered species by their identified populations and reserve land where they identify the endangered. Governments appraise the success of preservation in the currency of species. Inspectors define quarantines in identified species. Biologists carry the weight of these consequences as they select an exact name from almost two million known species names or conjecture that a specimen may belong to one of millions more unknown species. Written as a sequence of four discrete nucleotides—CATG—along a uniform locality on a genome, a barcode of life provides a “digital” identifying feature, supplementing the more analog gradations of shapes, colors, and behaviors. 
A library of digital barcodes will provide an unambiguous reference that will facilitate identifying species invading and retreating across the globe and through centuries. Making a difficult task harder, many species metamorphose into different forms as they cycle through stages in their lives. Eggs may become caterpillars and caterpillars become butterflies but, of course, all remain the same species carrying the same genes. Different species may resemble one another or be too small to distinguish easily but each carries different alleles and thus barcodes, which can unmask their identity. Furthermore, an inspector of unloaded cargo on a dock or an analyst of the remains of diets in a stomach may be called to identify species from only a snippet, a hair, or a fin. The fragment may be unrecognizable, but it will faithfully carry the identifying barcode of the source. Since Carl Linnaeus (1707–1778) developed systematic naming, ranking, and classifying of organisms, biologists have produced master keys to all knowledge about a species in the form of binomial names. Biologists use distinguishing features, such as shape, color, or number of legs in taxonomic keys first to assign binomials, like Homo sapiens, and then to associate the names of the organisms with biological knowledge about the species and its relatives. Of course, the bank of names suffers some problems, such as when several names are applied to one species [1]. And, biologists continuously debate criteria for species. The diversity of life from bacteria to whales renders any single rule inadequate for defining all species. Nevertheless a few basic criteria, such as that distinct species do not interbreed and meld their genetic sequences, serve for many groups.
Since Charles Darwin (1809–1882) proposed a branching pattern of evolution in On the Origin of Species, biologists have sought to arrange a phylogenetic system of species on an evolutionary tree of life. A tree of life illustrates every introductory biology text. Barcoding will reveal whether a newly collected specimen belongs to a species already on the tree. Or if a specimen is a truly new species, barcoding will help place it as a new leaf among known species on the proper branch of the tree of life. Whatever the criteria for defining and recognizing species, their inheritance and their genes must differ to maintain species distinctions generation after generation. Since the molecular discoveries of the mid-twentieth century, genes intimate a code comprising sequences of the four nucleotides that constitute DNA. Even before the barcoding movement now embodied in the 200 member organizations from 50 countries of the Consortium for the Barcode of Life (CBOL), scientific revisions of species boundaries included DNA analysis, and the ability to distinguish new species included DNA divergences. The product barcode analogy lent impetus to the continuing matching of species and genetic differences. Commercial barcodes must be uniform across shelves and warehouses. For animals, concentration on the single segment of the mitochondrial COI gene across the far wider shelves of life imparted the necessary uniformity to avoid a Tower of Babel. Conceiving the series of nucleotides CATG as bars and their presence and order as digital bars opened the door to rapid and unambiguous connection of specimens. Instead of connecting biological specimens to shelves and suppliers, the DNA barcode of life connects them to curated collections in museums and herbaria, lifting their utility. It would also connect specimens to the biological literature of binomial names. 
DNA barcodes offer a globally consistent way to propose provisional or candidate species that experts have not yet honored with a full description and binomial name. Worries at first evoked by DNA barcoding have not been realized. It has heightened the nuance of the species concept, not diminished it. It has widened humanity’s view of diversity, not reduced diversity to ciphers. It has excited wonder at the knowledge hard-won through earlier techniques and accessible through the master key of binomial names. It has enhanced the need for systematists to match the flood of barcodes with a sound array of binomial names. Barcoding is not a mere slogan and an inadequate analogy. It is a now proven tool for understanding biodiversity. Recurring to the need for uniformity to avoid a Tower of Babel, the choice of a segment of the mitochondrial COI gene has excelled for almost all animal taxa. This barcode region meets four basic specifications: the locality must be present in all barcoded species; it must be shaved as short as possible; the locality must have sequences stable within a species through many generations; and it must nevertheless have sequences variable enough to distinguish species. As this book reports, botanists have now also found barcode regions that are proving successful from carrots and chamomile to oats and pines. Fungal barcodes are not far behind. Some observers do ask a single, searching question about the barcode of life arriving in environmental science: When will it be small, cheap, and convenient enough for nonexperts, even children? In particular, when will the needed equipment shrink to the size of a laptop or a handheld barcoder? In fact, even today the key machines have shrunk until they fit comfortably on a desk or tabletop. The analogy of the newly hired clerk faultlessly pricing the cart of goods suggests the ability to make taxonomic expertise go further, and very far if a handheld barcoder were present. 
Clues that such a goal will be achieved lie in reports of students detecting endangered marine species on sale in supermarkets, identifying insect traces in their homes, and
analyzing tea leaves with inexpensive equipment. As well as enabling specialized scientists to do more and lift the value of specimen collections, barcoding promises to enable laymen to appreciate the diversity of life. The array of opportunities offered by DNA barcodes must rest on a sound foundation of binomial names with associated, vouchered, and identified specimens—housed in readily accessible museum collections. A sound foundation of binomials based on new and existing natural history collections stands as the first priority for the success of DNA barcodes. Fortunately, the Global Names Architecture project associated with the Encyclopedia of Life has already amassed 19 million common and scientific names and is reconciling them for the two million or so species estimated to be known already. Within 5 years, we could celebrate the achievement of the international Barcode of Life (iBOL) project: access to the barcodes of an array of five million specimens sequenced from 500,000 species. Voucher specimens, which are prepared, curated, databased, often digitally imaged, and stored in natural history collections, will support this effort. Already, in just a handful of years, the DNA barcode of life database (www.boldsystems.org) has soared above 1.2 million specimens from about 150,000 species. Already, as the chapters in this book show, the library of barcodes linked to names and curated specimens is multiplying the knowledge of a marine ecologist about a reef, the quality of surveillance for invasive species, and the accuracy of labeling of food products. Such successes will motivate and sustain the further building of the reference library of barcodes and the removal of obstacles for its quick, frugal realization and use. 
Our vision, first inspired by a barcode wand in the hand of a supermarket clerk, is comparable magic for an ichthyologist on a research vessel with featureless fish larvae, a child on a woodland trail, or an inspector at a port infallibly identifying a species. Reading this book, we learn that science can make magic. New York, NY, USA
Jesse H. Ausubel Alfred P. Sloan Foundation
Reference 1. Patterson DJ, Cooper J, Kirk PM, Pyle RL, Remsen DP (2010) Names are key to the big new biology. Trends Ecol Evol 25:686–691.
Preface The use of a universally accepted short DNA sequence for identification of species has been proposed for application across all forms of life. Such a “DNA barcode,” a term first coined less than a decade before the publication of the present book, in its simplest definition is one or more short gene sequences (<700 base pairs) taken from a standardized portion of the genome that are used to identify species through reference to DNA sequence libraries or databases. DNA Barcodes: Methods and Protocols is a compendium of the latest information on generating, applying, and analyzing DNA barcodes across the Tree of Life, including animals, fungi, protists, algae, and plants. The volume is divided into five major sections: I. Introduction; II. DNA Barcodes for the Tree of Life; III. Generating DNA Barcode Data; IV. Applications of DNA Barcode Data; and V. Case Studies on DNA Barcodes. In preparing the volume, we recognized that DNA barcoding is much more than the sequencing of one or two genes from an organism. The endeavor has come to encompass many elements, from campaigns that provide a deterministic framework for how to build specimen libraries, to the bioinformatic systems needed to track the many samples, sequences, and meta-data that are linked to each individual. Our ability to apply DNA barcode data in diverse contexts is also critical, for just as a library of books is not useful without those who would apply the knowledge contained therein, so it is true that in applying DNA barcode data we fulfill its promise. To that end, this volume is intended as a roadmap, linking methods ranging from standard wet-lab methods, to methods of bioinformatics, statistical and ecological analysis and methods to guide future, large-scale collections campaigns. 
In the Introductory Section, background material is provided on the rationale for the use of DNA barcodes as well as a short history of the development of the concept of DNA barcoding to the different domains of life. In Section II, detailed protocols and methodologies for barcoding various types of organisms are presented. Although the field of barcoding is still in its infancy, specific methodologies have now been developed for organisms across the Tree of Life, and these chapters describe the most successful methods employed thus far. Although some of these protocols are still evolving, some have become more or less standard for particular taxonomic groups. Section III covers more broadly applicable topics that apply to barcoding any type of organism, such as sample acquisition and archiving, laboratory tracking of tissues and sequences, DNA extraction and amplification, using “mini-barcodes” for samples with degraded DNA, and generating barcodes with next-generation sequencing technology. In addition to these chapters on specific laboratory methodologies, Section IV is devoted to the applications of DNA barcodes in the fields of systematics, phylogeny, and community ecology. These chapters focus on analytic methods that in many cases are still in their infancy of development, but will be critical for those biologists who want to go beyond generating sequences for particular taxa and actually apply DNA barcodes to answer specific
questions in ecology and evolution. Topics include evaluating the efficacy of DNA barcodes, species discovery using DNA barcodes, constructing phylogenetic trees using DNA barcode sequence data, and applying such phylogenies to test hypotheses concerning the assembly of species into communities. In order to better understand how barcodes can be applied across specific taxonomic groups or to specific ecological situations, Section V provides two case studies of ongoing, large-scale campaigns. The first case is a worldwide initiative to barcode the fishes of the world and the second example applies DNA barcoding to all tree species represented in a worldwide network of forest dynamics plots (CTFS) as a tool for understanding community evolution and ecological forensics. DNA barcoding is a new and powerful basic research tool with exceptional potential for the incorporation of new technologies and for future applications. The volume closes with a vision by the Editors on the future of DNA barcoding. This book should be of benefit and interest to all biologists and technicians interested in the relevance and application of molecular biology and DNA sequencing to identification, taxonomy, evolution, and ecology. We would like to thank all of the authors of the chapters included in this book for opening up their laboratories for all readers to see how their protocols were developed and how they work to generate and analyze DNA barcodes. All of the contributors to this volume recognize that DNA barcoding is a rapidly changing field and that new methods are being proposed almost on a monthly basis and we thank them for sharing their most up-to-date information. We all hope that the methods and protocols contained herein will be helpful in advancing these efforts. Washington, DC, USA
W. John Kress David L. Erickson
Contents

Foreword
Preface
Contributors

PART I INTRODUCTION

1 DNA Barcodes: Methods and Protocols
W. John Kress and David L. Erickson

PART II DNA BARCODES FOR THE TREE OF LIFE

2 Introduction to Animal DNA Barcoding Protocols
Lee A. Weigt, Amy C. Driskell, Andrea Ormos, Christopher Meyer, and Allen Collins
3 DNA Barcodes for Insects
John James Wilson
4 DNA Barcoding Methods for Invertebrates
Nathaniel Evans and Gustav Paulay
5 DNA Barcoding Amphibians and Reptiles
Miguel Vences, Zoltán T. Nagy, Gontran Sonet, and Erik Verheyen
6 DNA Barcoding Fishes
Lee A. Weigt, Amy C. Driskell, Carole C. Baldwin, and Andrea Ormos
7 DNA Barcoding Birds: From Field Collection to Data Analysis
Darío A. Lijtmaer, Kevin C.R. Kerr, Mark Y. Stoeckle, and Pablo L. Tubaro
8 DNA Barcoding in Mammals
Natalia V. Ivanova, Elizabeth L. Clare, and Alex V. Borisenko
9 Methods for DNA Barcoding of Fungi
Ursula Eberhardt
10 Methods for DNA Barcoding Photosynthetic Protists Emphasizing the Macroalgae and Diatoms
Gary W. Saunders and Daniel C. McDevit
11 DNA Barcoding Methods for Land Plants
Aron J. Fazekas, Maria L. Kuzmina, Steven G. Newmaster, and Peter M. Hollingsworth

PART III GENERATING DNA BARCODE DATA

12 Field Information Management Systems for DNA Barcoding
John Deck, Joyce Gross, Steven Stones-Havas, Neil Davies, Rebecca Shapley, and Christopher Meyer
13 Laboratory Information Management Systems for DNA Barcoding
Meaghan Parker, Steven Stones-Havas, Craig Starger, and Christopher Meyer
14 DNA Extraction, Preservation, and Amplification
Thomas Knebelsberger and Isabella Stöger
15 DNA Mini-barcodes
Mehrdad Hajibabaei and Charly McKenna
16 Ways to Mix Multiple PCR Amplicons into Single 454 Run for DNA Barcoding
Ryuji J. Machida and Nancy Knowlton

PART IV APPLICATIONS OF DNA BARCODE DATA

17 The Practical Evaluation of DNA Barcode Efficacy
John L. Spouge and Leonardo Mariño-Ramírez
18 Plant DNA Barcodes, Taxonomic Management, and Species Discovery in Tropical Forests
Christopher W. Dick and Campbell O. Webb
19 Construction and Analysis of Phylogenetic Trees Using DNA Barcode Data
David L. Erickson and Amy C. Driskell
20 Phylogenetic Analyses of Ecological Communities Using DNA Barcode Data
Nathan G. Swenson

PART V CASE STUDIES USING DNA BARCODES

21 FISH-BOL, A Case Study for DNA Barcodes
Robert D. Ward
22 Generating Plant DNA Barcodes for Trees in Long-Term Forest Dynamics Plots
W. John Kress, Ida C. Lopez, and David L. Erickson
23 Future Directions
David L. Erickson and W. John Kress

Index
Contributors

JESSE H. AUSUBEL • Program for the Human Environment, The Rockefeller University, New York, NY, USA
CAROLE C. BALDWIN • Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
ALEX V. BORISENKO • Biodiversity Institute of Ontario & Integrative Biology, University of Guelph, Guelph, ON, Canada
ALLEN COLLINS • Department of Invertebrate Zoology, Smithsonian Institution, NMNH, Washington, DC, USA
ELIZABETH L. CLARE • Biodiversity Institute of Ontario & Integrative Biology, University of Guelph, Guelph, ON, Canada
NEIL DAVIES • Berkeley Natural History Museums, University of California at Berkeley, Berkeley, CA, USA
JOHN DECK • Berkeley Natural History Museums, University of California at Berkeley, Berkeley, CA, USA
CHRISTOPHER W. DICK • Department of Ecology and Evolutionary Biology and Herbarium, University of Michigan, Ann Arbor, MI, USA
AMY C. DRISKELL • Laboratories of Analytical Biology, Smithsonian Institution, NMNH, Suitland, MD, USA
URSULA EBERHARDT • CBS-KNAW Fungal Biodiversity Centre, Centraalbureau voor Schimmelcultures, Utrecht, The Netherlands
DAVID L. ERICKSON • Department of Botany, Smithsonian Institution, National Museum of Natural History, Washington, DC, USA
NATHANIEL EVANS • Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
ARON J. FAZEKAS • Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
JOYCE GROSS • Berkeley Natural History Museums, University of California at Berkeley, Berkeley, CA, USA
MEHRDAD HAJIBABAEI • Biodiversity Institute of Ontario & Integrative Biology, University of Guelph, Guelph, ON, Canada
PETER M. HOLLINGSWORTH • Royal Botanic Garden Edinburgh, Edinburgh, UK
NATALIA V. IVANOVA • Biodiversity Institute of Ontario & Integrative Biology, University of Guelph, Guelph, ON, Canada
KEVIN C.R. KERR • Department of Natural History, Royal Ontario Museum, Toronto, ON, Canada
THOMAS KNEBELSBERGER • Senckenberg Research Institute, German Centre for Marine Biodiversity Research (DZMB), Wilhelmshaven, Germany
NANCY KNOWLTON • Department of Invertebrate Zoology, Smithsonian Institution, National Museum of Natural History, Washington, DC, USA
W. JOHN KRESS • Department of Botany, Smithsonian Institution, National Museum of Natural History, Washington, DC, USA
MARIA L. KUZMINA • Biodiversity Institute of Ontario & Integrative Biology, University of Guelph, Guelph, ON, Canada
DARÍO A. LIJTMAER • Ornithology, “Bernardino Rivadavia”, Buenos Aires, Argentina
IDA C. LOPEZ • Department of Botany, Smithsonian Institution, National Museum of Natural History, Washington, DC, USA
RYUJI J. MACHIDA • Department of Invertebrate Zoology, Smithsonian Institution, National Museum of Natural History, Washington, DC, USA
LEONARDO MARIÑO-RAMÍREZ • National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
DANIEL C. MCDEVIT • Department of Biology, Centre for Environmental & Molecular Algal Research, University of New Brunswick, Fredericton, NB, Canada
CHARLY MCKENNA • Department of Integrative Biology and Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, Canada
CHRISTOPHER MEYER • Department of Invertebrate Zoology, MRC-163, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
ZOLTÁN T. NAGY • Royal Belgian Institute of Natural Sciences, Joint Experimental Molecular Unit, Brussels, Belgium
STEVEN G. NEWMASTER • Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
ANDREA ORMOS • Laboratories of Analytical Biology, Smithsonian Institution, NMNH, Suitland, MD, USA
MEAGHAN PARKER • Department of Invertebrate Zoology, MRC-163, Smithsonian Institution, National Museum of Natural History, Washington, DC, USA
GUSTAV PAULAY • Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
GARY W. SAUNDERS • Department of Biology, Centre for Environmental & Molecular Algal Research, University of New Brunswick, Fredericton, NB, Canada
REBECCA SHAPLEY • Structured Data Group, Google Research, Google Inc., Mountain View, CA, USA
GONTRAN SONET • Royal Belgian Institute of Natural Sciences, Joint Experimental Molecular Unit, Brussels, Belgium
JOHN L. SPOUGE • National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
CRAIG STARGER • Department of Invertebrate Zoology, MRC-163, Smithsonian Institution, National Museum of Natural History, Washington, DC, USA
MARK Y. STOECKLE • The Rockefeller University, New York, NY, USA
ISABELLA STÖGER • Bavarian State Collection of Zoology (ZSM), Munich, Germany
STEVEN STONES-HAVAS • BioMatters Ltd., Auckland, New Zealand
NATHAN G. SWENSON • Department of Plant Biology, Michigan State University, East Lansing, MI, USA
PABLO L. TUBARO • Ornithology, “Bernardino Rivadavia”, Buenos Aires, Argentina
MIGUEL VENCES • Division of Evolutionary Biology Zoological Institute, Technical University of Braunschweig, Braunschweig, Germany
ERIK VERHEYEN • Royal Belgian Institute of Natural Sciences, Vertebrate department, Brussels, Belgium
ROBERT D. WARD • Wealth from Oceans Flagship, CSIRO Marine and Atmospheric Research, Tasmania, Australia
CAMPBELL O. WEBB • Arnold Arboretum of Harvard University, Cambridge, MA, USA
LEE A. WEIGT • Laboratories of Analytical Biology, Smithsonian Institution, NMNH, Suitland, MD, USA
JOHN JAMES WILSON • Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, Canada
Part I Introduction
Chapter 1
DNA Barcodes: Methods and Protocols
W. John Kress and David L. Erickson

Abstract
DNA barcoding, a new method for the quick identification of any species based on extracting a DNA sequence from a tiny tissue sample of any organism, is now being applied to taxa across the tree of life. As a research tool for taxonomists, DNA barcoding assists in identification by expanding the ability to diagnose species by including all life history stages of an organism. As a biodiversity discovery tool, DNA barcoding helps to flag species that are potentially new to science. As a biological tool, DNA barcoding is being used to address fundamental ecological and evolutionary questions, such as how species in plant communities are assembled. The process of DNA barcoding entails two basic steps: (1) building the DNA barcode library of known species and (2) matching the barcode sequence of the unknown sample against the barcode library for identification. Although DNA barcoding as a methodology has been in use for less than a decade, it has grown exponentially in terms of the number of sequences generated as barcodes as well as its applications. This volume provides the latest information on generating, applying, and analyzing DNA barcodes across the Tree of Life from animals and fungi to protists, algae, and plants.

Key words: DNA barcode, Identification, Taxonomy, Discovery, Ecology, Evolution
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_1, © Springer Science+Business Media, LLC 2012

1. What Is DNA Barcoding?
The taxonomic impediment that exists today for many systematists, field ecologists, and evolutionary biologists, i.e., determining the correct identification for any plant or animal sample in a rapid, repeatable, and reliable fashion, is a reality we all must accept (1). This taxonomic problem was a major reason for the development of a new method for the quick identification of any species based on extracting a DNA sequence from a tiny tissue sample of any organism. Appropriately called “DNA barcoding,” referring to the UPC labels one finds on commercial products, DNA barcodes consist of a standardized short sequence of DNA between 400 and 800 bp long that, in theory, can be easily isolated and characterized for all species on the planet (2, 3). By harnessing advances in molecular genetics, sequencing technology, and bioinformatics, DNA barcoding is allowing users to quickly and accurately recognize known species and retrieve information about them. It also has the potential to speed the discovery of the thousands of species yet to be named. DNA barcoding has become a vital new tool for taxonomists who are charged with the inventory and management of the Earth’s immense and changing biodiversity (4).
A DNA barcode, in its simplest definition, is one or more short gene sequences taken from a standardized portion of the genome used to identify species. The use of such short DNA sequences for biological identifications was first proposed by Paul Hebert and colleagues (2, 5) with the ultimate goal of quick and reliable species-level identifications across all forms of life, including animals, plants, and microorganisms. The concept of a universally recoverable segment of DNA that can be applied as an identification marker across species was initially applied to animals (5). However, a standard DNA barcode locus for plants was not accepted by the botanical community until 6 years after Hebert published his first paper on barcoding animals. After several broad screenings of gene regions in the plant genome (e.g., refs. 6–10), three plastid (rbcL, matK, and trnH-psbA) and one nuclear (ITS) gene regions have become the standard barcode of choice in most applications for plants and fungi (11–13).
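As a rough illustration of how a short, standardized sequence might be "used to identify species," the sketch below compares a query sequence against a small reference library and reports the best-matching species. It is a toy example under stated assumptions and not a method taken from this volume: the species names, the 20-base sequences, the `percent_identity` and `identify` helpers, and the 97% cutoff are all hypothetical. Real barcodes are several hundred base pairs long and would normally be compared with a proper alignment or sequence-search tool rather than position by position.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical reference library: species name -> one or more reference barcodes.
# The sequences are invented for illustration and are far shorter than real barcodes.
LIBRARY: Dict[str, List[str]] = {
    "Species A": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"],
    "Species B": ["TTGCATGCATGGATCCATGC"],
}


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions over the shorter sequence.

    Simplifying assumption: sequences are compared position by position,
    with no alignment and no gaps.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / n


def identify(
    query: str,
    library: Dict[str, List[str]],
    min_identity: float = 97.0,  # arbitrary cutoff chosen for this sketch
) -> Tuple[Optional[str], float]:
    """Return (best-matching species, identity); species is None below the cutoff."""
    best_species: Optional[str] = None
    best_score = 0.0
    for species, references in library.items():
        for reference in references:
            score = percent_identity(query, reference)
            if score > best_score:
                best_species, best_score = species, score
    if best_score < min_identity:
        return None, best_score
    return best_species, best_score


if __name__ == "__main__":
    print(identify("ACGTACGTACGTACGTACGT", LIBRARY))  # exact match to a Species A reference
    print(identify("GGGGGGGGGGGGGGGGGGGG", LIBRARY))  # no close match in the library
```

Returning no species when the best match falls below the cutoff mirrors the idea that a specimen without a close reference may belong to a species not yet in the library; how such a cutoff should be set depends on the taxonomic group and is not specified here.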
2. The Uses of DNA Barcoding
From its inception the primary use of DNA barcodes has been for species identification. As a research tool for taxonomists, barcoding assists in identification by expanding the ability to diagnose species by including all life history stages of an organism (e.g., seeds, seedlings, eggs, larvae, mature individuals both fertile and sterile), unisexual species, damaged specimens, gut contents, scats, and fecal samples. In addition systematists have the potential to quantify the consistency of their species definitions with a universal measure of genetic variability based on the barcode sequence data. For the applied users of taxonomy, barcoding is a tool to identify regulated species, including invasive and endangered species, as well as to test the identity and purity of biological products, such as seafood, herbal medicines, and dietary supplements. As a biodiversity discovery tool, barcoding helps to flag species that are potentially new to science, especially undescribed and cryptic species (see ref. 14). DNA barcodes are now also being used to address fundamental ecological and evolutionary questions, such as how species in plant communities are assembled (12, 15) and the degree of specialization in tropical versus temperate zone herbivores (e.g., ref. 16; see below).
It was not a coincidence that DNA barcoding developed in concert with genomics-based investigations in the first decade of the twenty-first century (17). DNA barcoding (a rapid tool for species identification based on DNA sequences) and genomics (a broad-based comparative approach to entire genome structure and expression) share an emphasis on large scale genetic data acquisition that offers new answers to questions previously beyond the reach of traditional disciplines. DNA barcodes, which in principle will eventually be generated and characterized for all species on the planet, are intended to be stored in an online digital library of sequences for matching and recognizing unidentified biological samples. Genomics has accelerated the process of recognizing novel genes and gene function through the comparisons of vast amounts of sequence data of the entire genomes of a limited number of taxa. In other words, the aim of DNA barcoding is to utilize the information of ONE OR A FEW gene regions to identify ALL species of life whereas genomics, the inverse of barcoding, describes in ONE OR A FEW (but eventually many) selected species the function and interactions across ALL genes. All other types of DNA sequence-based investigations of organisms, including population genetics and phylogenetics, fall between these two ends of the DNA spectrum.
3. DNA Barcoding Methods in Brief
The process of DNA barcoding entails two basic steps: (1) building the barcode library of known species and (2) matching, or assigning, the barcode sequence of the unknown sample against the barcode library for identification. The first step requires taxonomic expertise in selecting one or preferably several individuals per species to serve as reference samples in the barcode library. All taxonomists should generate DNA barcodes for the taxa in their monographs or at the least they should deposit verified DNA samples with their associated voucher specimens in core DNA barcode institutions. Tissue samples that yield high-quality DNA extractions in some cases can be obtained from specimens already housed in museum collections and herbaria. However, in most cases new tissues will be taken directly from live specimens in the field before they are prepared, labeled, and stored as voucher specimens in museum collections. These vouchers then serve as the permanent record that connects the DNA barcode to a particular species of plant, fungus, or animal. Once the reference barcode library is complete for the organisms under study, whether they comprise a geographic region, a taxonomic group, or a target assemblage (e.g., medicinal plants, timber trees, etc.), then the DNA barcodes generated for the
unidentified samples are compared to the known barcodes using some type of matching algorithm. Most practical algorithms for species assignment start by comparing two DNA sequences to produce a distance measure between the sequences. In DNA barcoding, a sequence alignment algorithm is usually employed to assign an unknown sample to a known species by finding the closest database sequence to the sample sequence (18). Basic local alignment search tool (BLAST) is a matching tool that is provided through GenBank to search for correspondence between a query sequence and a sequence library. Two additional commonly used distance measures are the Kimura-2-Parameter Distance and the Smith–Waterman Algorithm (similar to BLAST) for Local Alignment Similarity (see ref. 19). For many users of DNA barcodes, the process ends after the unknown sample is correctly identified. However, barcodes can also be applied as tools for answering fundamental biological questions, such as how species are assembled into communities. This aspect of DNA barcoding has only recently been considered, but offers some of the most exciting prospects for using this new taxonomic tool.
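The matching step described above can be illustrated with a minimal sketch. It assumes that the query and reference sequences have already been aligned to the same length, computes the Kimura 2-parameter distance from the proportions of transitions (P) and transversions (Q) as d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)], and assigns the query to the closest library entry. The function names, the species labels, and the ten-base sequences are invented for illustration; production tools such as BLAST work quite differently and handle alignment themselves.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if (1 - 2 * p - q) <= 0 or (1 - 2 * q) <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def closest_reference(query, library):
    """Return (name, distance) for the library sequence nearest to the query."""
    scored = ((name, k2p_distance(query, seq)) for name, seq in library.items())
    return min(scored, key=lambda pair: pair[1])


# Toy reference library (hypothetical names and sequences, not real barcodes)
toy_library = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGTAA",
}
name, distance = closest_reference("ACGTACGTAT", toy_library)
print(name, round(distance, 4))
```

In practice a nearest match is usually accepted only below some distance threshold, and that threshold is a project-specific choice that this sketch does not attempt to set.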
4. A Short History of DNA Barcoding
To be practical as a DNA barcode, a gene region must satisfy three criteria: (1) contain significant species-level genetic variability and divergence, (2) possess conserved flanking sites for developing universal PCR primers for the widest taxonomic application, and (3) be of appropriate sequence length so as to facilitate current capabilities of DNA extraction and sequencing. A fourth criterion for a successful DNA barcode relates to sequence quality and has been proposed by some as an important consideration (see CBOL Plant Working Group 2009). A short DNA sequence of 600 bp in the mitochondrial gene for cytochrome c oxidase subunit 1 (CO1; 2) generally fits these criteria and was accepted early on as a practical, standardized species-level barcode for many animals (see http://www.barcoding.si.edu/). The inability of CO1 to work as a barcode in plants and fungi (6, 8) required that botanists find a more appropriate marker. A number of candidate gene regions were immediately suggested as possible barcodes for plants (e.g., refs. 6, 7, 9, 10, 20, 21), but until 2009 none were universally accepted by the plant taxonomic community. This lack of consensus was for the most part due to the limitations inherent in a plastid marker (i.e., low sequence variability) relative to CO1. In 2008, the Consortium for the Barcode of Life Plant Working Group convened a lengthy discussion on selecting an appropriate plant barcode and eventually published a paper (11) in which the largest
number of candidate barcode markers with the largest set of data were evaluated. Their results identified three markers that have become the most widely used barcode loci today: rbcL, matK, and trnH-psbA. Two, rbcL and matK, were identified as the core barcode loci and the third, trnH-psbA, was designated as an important supplementary marker to be further tested and used in appropriate cases. Some research groups continue to advocate additional markers for plants, such as ITS, for specific purposes or specific taxa.
5. What This Book Is About
DNA Barcodes: Methods and Protocols provides the latest information on generating, applying, and analyzing DNA barcodes across the Tree of Life from animals and fungi to protists, algae, and plants. Background material is provided on the rationale for the use of DNA barcodes as well as detailed protocols and methodologies for barcoding various types of organisms. Topics include sample acquisition and archiving, laboratory tracking of tissues and sequences, sequencing protocols, data analyses, and informatics as well as case studies of particular taxonomic groups and DNA barcoding campaigns. In addition to these chapters on specific laboratory methodologies, information on the applications of DNA barcodes in the fields of systematics, phylogeny, and community ecology is provided for those who want to go beyond generating sequences for particular taxonomic groups. DNA barcoding is a new and powerful basic research tool with exceptional potential for the incorporation of new technologies and for future applications. This book should be of benefit and interest to all biologists and technicians interested in the relevance and application of molecular biology and DNA sequencing to identification, taxonomy, evolution, and ecology.
References
1. Janzen DH (2005) Foreword: how to conserve wild plants? Give the world the power to read them. In: Krupnick GA, Kress WJ (eds) Plant conservation: a natural history approach. University of Chicago Press, Chicago, pp ix–xiii
2. Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proc R Soc B 270:313–321. doi:10.1098/rspb.2002.2218
3. Savolainen V, Cowan RS, Vogler AP et al (2005) Towards writing the encyclopedia of life: an introduction to DNA barcoding. Philos Trans Ser B 360:1805–1811. doi:10.1098/rstb.2005.1730
4. Cowan RS, Chase MW, Kress WJ, Savolainen V (2006) 300,000 species to identify: problems, progress, and prospects in DNA barcoding of land plants. Taxon 55:611–616
5. Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004) Identification of birds through DNA barcodes. PLoS Biol 2:e312. doi:10.1371/journal.pbio.0020312
6. Kress WJ, Wurdack KJ, Zimmer EA et al (2005) Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci USA 102:8369–8374
7. Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One 2:e508. doi:10.1371/journal.pone.0000508
8. Chase MW, Salamin N, Wilkinson M et al (2005) Land plants and DNA barcodes: short-term and long-term goals. Philos Trans R Soc Lond B Biol Sci 360:1889–1895. doi:10.1098/rstb.2005.1720
9. Newmaster SG, Fazekas AJ, Steeves RAD, Janovec J (2008) Testing candidate plant barcode regions in the Myristicaceae. Mol Ecol Resour 8:480–490
10. Lahaye R, Van der Bank M, Bogarin D, Warner J et al (2008) DNA barcoding the floras of biodiversity hotspots. Proc Natl Acad Sci USA 105:2923–2928
11. CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci 106:12794–12797
12. Kress WJ, Erickson DL, Jones FA, Swenson NG, Perez R, Sanjur O, Bermingham E (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci USA 106:18621–18626. doi:10.1073/pnas.0909820106
13. Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JA et al (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proc Natl Acad Sci USA 109:6241–6246
14. Hebert PDN, Penton EH, Burns JM et al (2004) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proc Natl Acad Sci USA 101:14812–14817. doi:10.1073/pnas.0406166101
15. Kress WJ, Erickson DL, Swenson NG et al (2010) Advances in the use of DNA barcodes to build a community phylogeny for tropical trees in a Puerto Rican forest dynamics plot. PLoS One 5:e15409. doi:10.1371/journal.pone.0015409
16. Jurado-Rivera JA et al (2009) DNA barcoding insect-host plant associations. Proc R Soc Lond B Biol Sci 276:639–648
17. Kress WJ, Erickson DL (2008) Commentary DNA barcoding: genes, genomics, and bioinformatics. Proc Natl Acad Sci USA 105:2761–2762
18. Ratnasingham S, Hebert PDN (2007) BOLD: The barcode of life data system (www.barcodinglife.org). Mol Ecol Notes 7:355–364
19. Erickson DL, Spouge J, Resch A et al (2008) DNA barcoding in land plants: developing standards to quantify and maximize success. Taxon 57:1304–1316
20. Taberlet P, Coissac E, Pompanon F et al (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Res 35:e14. doi:10.1093/nar/gkl938
21. Chase MW, Cowan RS, Hollingsworth PM et al (2007) A proposal for a standardised protocol to barcode all land plants. Taxon 56:295–299
Part II DNA Barcodes for the Tree of Life
Chapter 2
Introduction to Animal DNA Barcoding Protocols
Lee A. Weigt, Amy C. Driskell, Andrea Ormos, Christopher Meyer, and Allen Collins
Abstract
Procedures and protocols common to many DNA barcoding projects are summarized. Planning for any project should emphasize front-end procedures, especially the “genetic lockdown” of collected materials for downstream genetic procedures. Steps further into the DNA barcoding process chain, such as sequencing, data processing, and other back-end functions vary slightly, if at all, among projects and are presented elsewhere in the volume. Point-of-collection sample and tissue handling and data/metadata handling are stressed. Specific predictions of the future workflows and mechanics of DNA barcoding are difficult, so focus is on that which most or all future methods and technologies will surely share.
Key words: DNA barcoding, Animals, ATBI, Genetic preservation
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_2, © Springer Science+Business Media, LLC 2012
1. Introduction
The Smithsonian’s LAB, part of the National Museum of Natural History, in partnership with scientists from our own museum and around the world, has generated DNA barcodes from 22 of the 32 recognized phyla of animals (1–8). We have been actively heading the Leading Labs Network of the Consortium for the Barcode of Life (CBOL) and are participating in multiple DNA barcoding campaigns. These efforts have afforded us the opportunity to be heavily involved in all aspects of DNA barcoding for many different groups of organisms, from sample collection in the field, through lab processes and data analysis, to publication. We work on projects that range from a few to thousands of specimens and on projects taking environmental sampling approaches. When we started this work, we were frustrated by the numerous inefficiencies in all steps of the process, from field collections to database submission. We have invested much effort in an attempt to make things less cumbersome: simplifying or eliminating renaming photographs, minimizing the
number of times a specimen is handled, developing field notebook to lab notebook to final database interconnectivity, and improving lab processes and monitoring. Redundant efforts take time and money and introduce opportunities for human error. We have worked to avoid or eliminate many of these from our barcoding pipeline. After all—how many times do you want to type and retype (accurately) the scientific name of the scrawled cowfish (Acanthostracion quadricornis) or repeatedly try to find the correct well in a 384-well plate with a single pipet tip? Now that many DNA barcoding campaigns and large-scale projects are well underway, where are we, what have we learned, where are we going, and what advice do we need to heed going forward? Who could have predicted the technological fallout resulting from the genomic revolution? Where will we be in terms of biotechnology and the application of new instrumentation to biodiversity documentation in 10 or 20 years? Even attempting to answer some of these questions promises to be a near-sighted or short-term effort at best. Our goal here is to present an overview of approaches with a view to the long-term. We want to emphasize aspects not only specific to the current methods, but that reflect our current best knowledge on how to enable future research, procedures, and techniques. In this light, the overall guiding principle of all efforts to acquire tissues should stress the critical nature of what we call “genetic lockdown”— stabilizing and securing the specimens, tissues, and DNA extracts for future genetic work as early as possible in the process chain and keeping them stable, secure, and safe from that point on. This will require different procedures in different collecting circumstances: preserving an entire specimen (or environmental sample) in such a way as to enable downstream DNA applications; or rough sorting and tissue subsampling in the field; or taking along an automated DNA extractor for on-site DNA extractions. 
Although new techniques for recovery of DNA from formalin-fixed and/or ethanol-preserved specimens continue to be evaluated (9), most of these are limited in terms of amplicon size, success rates, and parts of the genome that are accessible, and they are not amenable to high-throughput methods. Therefore, they are to be avoided if better quality specimens can be obtained. Better quality specimens will also have greater future utility beyond present DNA barcoding methods. DNA extractions can be performed with many specific protocols highlighted in other chapters in this volume, but we indicate what to strive for in your method of choice, and point out strengths and weaknesses of various approaches. Once a high-quality, high molecular weight, archival-quality DNA extract free of secondary compounds and other PCR inhibitors has been obtained, PCR for DNA barcoding for most animal groups follows very similar procedures. The primary difference is primer selection, and group-specific primers which enhance success for the
barcoding of many animal groups are already available. We present generalities, describe what we use as controls if primers are not working for your group, and provide some common primers and PCR optimization strategies. Once you have a successful PCR product (a clean, single band of the target size), most animal processes converge onto a similar path: (1) purification of the PCR product; (2) cycle sequencing and subsequent reaction purification; (3) running the cleaned reaction product on an “automated sequencer” or genetic analyzer; (4) processing and quality control of the raw sequence data; (5) submission to databases and repositories. Therefore, we do not spend any time or space on these post-PCR steps. Finally, two nice resources. The first is the two-part manual put out in 2010 by ABC Taxa on protocols for All Taxa Biodiversity Inventories (10). It contains many chapters relevant to animal barcoding and includes chapters on all vertebrate groups, insects and canopy arthropods, soil and litter sampling, and marine and continental freshwater habitats. The second is the Consortium for the Barcode of Life’s social network portal (http://connect.barcodeoflife.net/), which is a fantastic option to get assistance and information prior to setting out on a new project, or if you run into problems. This is your barcoding community resource, and should be a first avenue to seek guidance or answers. An abbreviated general Materials and Methods is presented; specifics should be found in the taxon-specific chapters.
2. Materials
2.1. Sample collection
(a) Proper disposable or easily sterilized tools.
(b) Proper individual storage containers for the organisms and tissues.
(c) Data collection tools to handle specimens, tissues.
(d) Photodocumentation materials (digital camera with appropriate lens(es), memory cards, backup hard drives).
2.2. Storage buffers
(a) VPLN dewar or dry ice and cooler (see Note 1).
(b) Salt solution (11).
(c) EtOH, 95% (nondenatured) (see Note 2).
(d) Formalin or other voucher specimen preservation solution(s) (see Chapter 4 for specimen handling solutions and ref. 12).
2.3. Extraction components (see Note 3)
(a) Lysis buffer for extraction method (see Note 4).
(b) Proper plates, tubes or storage vessels (see Note 5).
(c) When possible, on-site portable DNA extractor (see Note 6).
2.4. PCR components
(a) PCR reaction ingredients and primers (see Note 7).
(b) Positive control 16S or 18S primers (13).
2.5. Sequencing, data QC, and analysis: see other chapters in this volume.
3. Methods
3.1. Sample collection
Methods will vary by taxonomic group and habitat. The EDIT ATBI volumes (10), as well as the chapters that follow, are a great source of taxon- and/or habitat-specific methods. An excellent summary for marine invertebrates is provided in ref. 12, covering relaxation, fixation, and preservation as well as specific procedures by taxon; similar resources exist for many groups and can be found via networking with active research groups and labs. The primary goal should always be preserving the integrity of the DNA while maintaining a high-quality voucher specimen. One without the other loses significant value. Photo documentation: for groups where it is necessary, living color patterns or morphology should be captured prior to tissue subsampling, if subsampling will decrease the value of the image. However, some methods (e.g., fin painting with formalin) can degrade the DNA, so care should be taken to preserve the integrity of the genetic material.
3.2. Tissue subsampling
As soon as possible after collection (and potentially the death of the organism) the tissue subsampling needs to occur in order to stop the degradation process. There are many ways of accomplishing this, and many options for storage, transport, and DNA extraction, but the emphasis should always be to quickly stop degradation, then to create a high-quality, high molecular weight, archival DNA extract that will have maximal utility going forward.
3.3. DNA extraction methods
There are several alternatives for acceptable DNA extraction methods that yield a quality product from multiple sources and taxa that can be useful for decades to come. It is advisable to ensure, via preliminary experimentation on a few samples, that the methods work prior to destroying the tissues of new taxa for all the specimens on hand.
3.4. PCR methods: see Chapter 4 for a table of primers by taxon.
3.5. Sequencing, data QC, and analysis: see other chapters in this volume.
4. Notes
1. Freezing tissues is frequently optimal, sometimes difficult; vapor phase liquid nitrogen requires proper tanks and materials (−20°C storage for short term can be adequate).
2. 70% ethanol, denatured ethanol, and isopropanol should be avoided if possible because preservation and/or utility might be compromised.
3. It is advisable to pretest different extraction protocols for new taxonomic groups; different methods have different strengths and weaknesses when it comes to yield and to secondary compound and PCR inhibitor carryover.
4. We have found that putting tissues or minced tissues from many phyla directly into the M2 buffer of the Autogen prep protocol for transport back to the lab works well.
5. It is important to use proper seals to avoid well-to-well contamination during transport. One should be cognizant of contamination issues when transporting different vessel types with less than perfect sealing mechanisms. For example, sticky foil and sealing tapes are disastrous, and most of the plate sealers are insufficient; each well of the plate should be sealed independently.
6. When logistics or permits require, it is possible to extract DNA in 96-well plates at many field sites (as long as they have electricity) with instruments, such as the Qiagen BioSprint or Thermo KingFisher and a magnetic bead protocol.
7. Several taxonomic groups have benefited from PCR primer optimization or redesign. Some homework in NCBI’s BarStool, on BOLD and literature searches might yield primers that will increase PCR amplification success.
Acknowledgments
All of us work with teams of taxonomists, field assistants, lab technicians, and colleagues in our own institution and departments as well as at numerous other institutions around the globe. This chapter represents what those people have taught us and learned with us over the years; they share in the effort and credit but are too many to name.
References
1. Dove CJ, Rotzel NC, Heacker M, Weigt LA (2008) Using DNA barcodes to identify bird species involved in birdstrikes. J Wildlife Manag 72:1231–1236
2. Kerr KCR, Stoeckle MY, Dove CJ, Weigt LA, Francis CM, Hebert PD (2007) Comprehensive DNA barcode coverage of North American birds. Mol Ecol Notes 7:535–543
3. Baldwin CC, Mounts JH, Smith DG, Weigt LA (2008) Genetic identification and color descriptions of early life-history stages of Belizean Phaeoptyx and Astrapogon (Teleostei: Apogonidae) with comments on identification of adult Phaeoptyx. Zootaxa 2008:1–22
4. Baldwin CC, Weigt LA, Smith DG, Mounts JH (2009) Reconciling genetic lineages with species in Western Atlantic Coryphopterus (Teleostei: Gobiidae). Smithson Contrib Mar Sci 38:111–138
5. Baldwin CC, Castillo CI, Weigt LA, Victor BC (2011) Seven new species within western Atlantic Starksia atlantica, S. lepicoelia, and S. sluiteri (Teleostei: Labrisomidae), with comments on congruence of DNA barcodes and species. Zookeys 79:21–72
6. Tornabene L, Baldwin C, Weigt LA, Pezold F (2010) Exploring the diversity of Western Atlantic Bathygobius (Teleostei: Gobiidae) with cytochrome c oxidase-I, with descriptions of two new species. Aqua 16:141–170
7. Meyer CP, Paulay G (2005) DNA barcoding: error rates based on comprehensive sampling. PLoS Biol 3:e422
8. Plaisance L, Knowlton N, Paulay G, Meyer C (2009) Reef-associated crustacean fauna: biodiversity estimates using semi-quantitative sampling and DNA barcoding. Coral Reefs 28:977–986
9. Palero F, Hall S, Clark PF, Johnston D, Mackenzie-Dodds J, Thatje S (2010) DNA extraction from formalin-fixed tissue: new light from the deep sea. Scientia Marina 74:465–470
10. Eymann J, Degreef J, Hauser C, Monje JC, Samyn Y, VandenSpiegel D (2010) Manual on field recording techniques and protocols for all taxa biodiversity inventories. ABC Taxa 8:653
11. Seutin G, White BN, Boag PT (1990) Preservation of avian blood and tissue samples for DNA analysis. Can J Zoo 69:82–90
12. Templado J, Paulay G, Gittenberger A, Meyer C (2010) Sampling the marine realm. Chapter 11 In: Eymann J, Degreef J, Hauser C et al (eds) Manual on field recording techniques and protocols for all taxa biodiversity inventories. vol 8 ABC Taxa (Belgium), pp 653
13. Simon C, Frati F, Beckenbach A et al (1994) Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Ann Ento Soc Am 87:651–701
Chapter 3
DNA Barcodes for Insects
John James Wilson
Abstract
DNA barcoding refers to the technique of sequencing a short fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene, the “DNA barcode,” from a taxonomically unknown specimen and performing comparisons with a reference library of barcodes of known species origin to establish a species-level identification. The library barcodes gain their value due to an intimate association, through the vouchered specimens from which they came, with other data, particularly Linnaean names, collection localities, and morphology in the form of digital images. Consequently, this chapter details means of efficiently obtaining barcodes along two general streams, rapid barcode assembly to populate the library and retrieval of barcodes from highly prized specimens, and also emphasizes organization and collection of the barcode collaterals.
Key words: BOLD, Databasing, Tissue subsampling, DNA extraction, High-throughput, DNA amplification, Sequencing, Sequence editing, Sequence aligning, Cytochrome oxidase, DNA barcoding
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_3, © Springer Science+Business Media, LLC 2012
1. Introduction
The first task of DNA barcoding is the association of sequences with species names (1). These sequences, from “correctly” identified individuals, delineated with external information and classical morphological methods, are then incorporated into the reference barcode library, the Barcode of Life Datasystem (BOLD) (2). Campaigns charged with the goal of populating BOLD have embraced different approaches: large-scale regional inventories using freshly caught specimens (3, 4) or barcode “blitzes” of national parks (5) or museum collections (6). Sequence acquisition is the primary driver and consequently specimens may have “interim” taxonomic names (4). Alternative campaigns focus on
providing “accurate” taxonomic names for sequences, e.g., Lepidoptera Barcode of Life: Sphingidae (7). These approaches do not truly represent a dichotomy; all have the same goal and often are combined (e.g., refs. 8–10). However, the diversity demonstrates a requirement for rapid, efficient, high-throughput laboratory methods to get from “field to fasta,” together with methods for important specimens such as types which may deserve more individualized analysis. Although the molecular methods in this chapter focus on generating library sequences, they can also be used by those wishing to use the library as an identification key. Through sequence comparisons, the reference library can be employed to identify unknown newly collected individuals to a species-level taxon. While much deliberation has surrounded the choice of particular criteria used during sequence comparison (11), DNA barcoding for identification to species is now relatively uncontroversial. This vision of DNA barcoding still requires the protracted process of traditional species description (12, 13), and growth of the library through generating barcodes for “known” species is ultimately restricted by what is already “known.” Another option is for barcoding to involve curation of the reference library through reciprocal illumination. Many studies document cases where species taxonomy was considerably and rapidly improved through the combination of DNA barcodes, morphological and ecological analyses and then swiftly incorporated back into an improved barcode reference library (e.g., ref. 9). We see the incorporation of “collateral” information in the library as vital to the success of the barcoding initiative. In fact, the connection of sequences (digital data easily transferred and analyzed) with other data (images, collection data, and historical taxonomy) is exactly what makes barcodes valuable communicators of biodiversity. Therefore, our chapter provides protocols for the generation of the collaterals alongside generation of the sequences.
2. Materials
2.1. Specimen Collection
1. 99.9% Ethyl alcohol (Commercial Alcohols). Store in a flammable liquids cabinet.
2.2. Tissue Subsampling
1. ELIMINase® (Decon Labs Inc.™).
2. KimWipes (Kimberly–Clark Corporation).
3. Forceps (Fine Science Tools).
4. Microplate (Eppendorf).
5. Cap-strips (ABgene).
2.3. DNA Extraction and Lysis Buffers
1. 0.5 M EDTA pH 8.0: 186.1 g EDTA (Fisher Scientific®), ~20.0 g NaOH (Fisher Scientific®), made up to 1,000 ml with ddH2O. Vigorously mix on magnetic stirrer with heater. The disodium salt of EDTA will not go into solution until the pH of the solution is adjusted to ~8.0 by the addition of NaOH. Give a brief rinse to NaOH granules with ddH2O in a separate glass before dissolving them.
2. 1 M Tris–HCl pH 8.0: 26.5 g Trizma® base (Sigma®), 44.4 g Trizma® HCl (Sigma®), made up to 500 ml with ddH2O.
3. Proteinase K 20 mg/ml: Add 5 ml of ddH2O to a 100 mg package of Proteinase K (Promega®). Store in 0.5 ml aliquots at –20°C.
4. 0.1 M Tris–HCl pH 6.4: 6.06 g Trizma® base made up to 500 ml with ddH2O. Adjust pH with HCl to 6.4–6.5.
5. 1 M NaCl: 29.22 g of NaCl (Fisher Scientific®) made up to 500 ml with ddH2O.
6. 1 M Tris–HCl pH 7.4: 9.7 g Trizma® base, 66.1 g Trizma® HCl, made up to 500 ml with ddH2O.
7. Insect Lysis Mix: 16.5 g of GuSCN (Sigma®), 12 ml of 0.5 M EDTA pH 8.0, 6 ml of 1 M Tris–HCl pH 8.0, 1 ml Triton X-100 (Sigma®), 10 ml Tween-20 (Fluka®), made up to a final volume of 200 ml with ddH2O.
8. Cap-strips.
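When scaling these recipes to a different batch size, the masses can be rechecked with mass = molarity × volume × molar mass. The sketch below is a convenience only, not part of the protocol. The molar masses are standard values that we supply as assumptions (disodium EDTA dihydrate, 372.24 g/mol; NaCl, 58.44 g/mol); they reproduce the 186.1 g and 29.22 g quoted in items 1 and 5. The function names are ours.

```python
def grams_needed(molarity_mol_per_l, final_volume_ml, molar_mass_g_per_mol):
    """Mass of solute (g) required for a solution of the given molarity and volume."""
    return molarity_mol_per_l * (final_volume_ml / 1000.0) * molar_mass_g_per_mol


def stock_mg_per_ml(mass_mg, volume_ml):
    """Concentration (mg/ml) obtained by dissolving mass_mg in volume_ml."""
    return mass_mg / volume_ml


# Assumed molar masses (standard values, not given in the text)
EDTA_DISODIUM_DIHYDRATE = 372.24  # g/mol
SODIUM_CHLORIDE = 58.44           # g/mol

print(round(grams_needed(0.5, 1000, EDTA_DISODIUM_DIHYDRATE), 1))  # 0.5 M EDTA, 1,000 ml
print(round(grams_needed(1.0, 500, SODIUM_CHLORIDE), 2))           # 1 M NaCl, 500 ml
print(stock_mg_per_ml(100, 5))                                     # Proteinase K stock
```

The Tris buffers combine Trizma base and Trizma HCl to reach a target pH, so their masses cannot be checked with a single molar mass in this way.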
2.4. High-Throughput DNA Extraction
1. AcroPrep™ 96 1 ml filter plate with 3.0 μm Glass Fiber media over 0.2 μm Bio-Inert membrane, natural housing (PALL®).
2. Axyseal™ sealing film (Axygen Scientific®).
3. PP MASTERBLOCK®, 96-well, 2 ml (Greiner Bio-One®).
4. Binding Buffer: 354.6 g of GuSCN, 20 ml of 0.5 M EDTA pH 8.0, 50 ml of 0.1 M Tris–HCl pH 6.4, 20 ml of Triton X-100, made up to final volume of 500 ml with ddH2O. Vigorously mix on magnetic stirrer with heater. If any recrystallization occurs, pre-warm at 56°C to dissolve before use.
5. Protein Wash Buffer: 26 ml of Binding Buffer, 70 ml of EtOH 96%, made up to 100 ml with ddH2O. Stable at room temperature for ~1 week, discard if any crystallization occurs.
6. Wash Buffer: 300 ml of EtOH 96%, 23.75 ml of 1 M NaCl, 4.75 ml of 1 M Tris–HCl pH 7.4, 0.475 ml 0.5 M EDTA pH 8.0, made up to 475 ml with ddH2O. Mix well, store at −20°C.
7. Microplate.
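The working concentrations implied by a recipe such as the Wash Buffer follow from the dilution relation C_final = C_stock × V_stock / V_total. The short sketch below applies it to item 6 as a sanity check; the function name and dictionary layout are ours, and the computed values are derived from the listed volumes rather than quoted from the protocol.

```python
def final_concentration(stock_concentration, stock_volume_ml, total_volume_ml):
    """Concentration after diluting stock_volume_ml of stock to total_volume_ml."""
    return stock_concentration * stock_volume_ml / total_volume_ml


TOTAL_ML = 475.0

# (stock concentration, volume added in ml) taken from the Wash Buffer recipe
wash_buffer = {
    "NaCl (M)": (1.0, 23.75),
    "Tris-HCl pH 7.4 (M)": (1.0, 4.75),
    "EDTA pH 8.0 (M)": (0.5, 0.475),
    "EtOH (%)": (96.0, 300.0),
}

for component, (stock, volume_ml) in wash_buffer.items():
    print(component, round(final_concentration(stock, volume_ml, TOTAL_ML), 4))
```

Under these volumes the buffer works out to 50 mM NaCl, 10 mM Tris–HCl, 0.5 mM EDTA, and roughly 60% ethanol.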
2.5. Archival Specimen DNA Extraction
1. DNeasy 96 Blood & Tissue Kit single columns (Qiagen): Buffers AL, AW1, AW2, and AE are included in the kit. 2. EtOH 96%. 3. Fisherbrand Premium Flat Top Microcentrifuge Tubes 1.5 ml (Fisher Scientific).
20
J.J. Wilson
2.6. PCR Amplification
1. ELIMINase®. 2. KimWipes. 3. D-(+)-trehalose dihydrate (Sigma-Aldrich). 4. 10× PCR Buffer supplied with enzyme (Invitrogen). 5. 50 mM MgCl2 (Invitrogen). 6. 10 mM dNTP mix (New England Biolabs). 7. 100 μM Primer Stock: Dissolve desiccated primer (Integrated DNA Technologies) in ($ × 10) μl ultrapure ddH2O, where $ is the amount of primer in nmol printed on the tube in which the desiccated primer arrives; $ is different for every primer. Store at –20°C. 8. Taq Polymerase (Invitrogen). 9. Microplate. 10. Cap-strips.
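The rule in item 7 (nmol × 10 μl) follows from 1 μM = 1 pmol/μl. The sketch below makes that arithmetic explicit and extends it to other target concentrations; the function name `resuspension_volume_ul` is a hypothetical helper, not something supplied with the primers.

```python
def resuspension_volume_ul(yield_nmol, target_um=100.0):
    """Volume of ddH2O (in microlitres) that brings a dried primer to target_um.

    1 uM equals 1 pmol/ul, so volume = (nmol * 1000 pmol/nmol) / target_um.
    For a 100 uM stock this reduces to nmol * 10, as in item 7.
    """
    if yield_nmol <= 0 or target_um <= 0:
        raise ValueError("yield and target concentration must be positive")
    return yield_nmol * 1000.0 / target_um


if __name__ == "__main__":
    # Example: a tube labelled 30 nmol needs 300 ul for a 100 uM stock.
    print(resuspension_volume_ul(30))
```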
2.7. High-Throughput PCR Check
1. 2% Agarose E-gel® 96 gel (Invitrogen). 2. Mother E-Base™ (Invitrogen).
2.8. Single Specimen PCR Check
1. 50× TAE buffer: Dissolve 242 g Tris(hydroxymethyl)aminomethane in 500 ml ddH2O, add 57.1 ml glacial acetic acid and 100 ml 0.5 M EDTA, and make up to 1,000 ml with more ddH2O. 2. Ultrapure Agarose (Fisher Scientific). 3. Parafilm (Fisher Scientific). 4. 6× Loading dye (Fermentas, Thermo Fisher Scientific). 5. 100-bp DNA ladder (Invitrogen). 6. GelRed 10,000× in water (Biotium).
2.9. Cycle Sequencing
1. ELIMINase®. 2. KimWipes. 3. Microplate. 4. Dye terminator mix v3.1 (Applied Biosystems). 5. 5× ABI Sequencing buffer (Applied Biosystems). 6. D-(+)-trehalose dihydrate. 7. Primer.
2.10. Sequencing Clean-Up
1. Sephadex® G-50 (Sigma-Aldrich). 2. MultiScreen® Column Loader (Millipore). 3. Acroprep™ 96 Filter plate with 0.45-μm GHP membrane (PALL).
DNA Barcodes for Insects
4. MicroAmp® Optical 96-well Reaction Plate (Applied Biosystems®, Cat. No. N801-0560). 5. 0.1 mM EDTA pH 8.0 (Fisher Scientific). 6. Septa (Applied Biosystems).
3. Methods
3.1. Barcode of Life Datasystem
1. BOLD (2) is the recommended place to manage your barcoding efforts. We advise initiating your online “project” as the first step in any new study. 2. To create a user account on BOLD, navigate to the homepage http://www.boldsystems.org. 3. Under MANAGEMENT & ANALYSIS click on *Request a new user account. 4. Fill in the required personal details, choose and confirm a password, and click Submit Request. You will receive an e-mail with your username and password. 5. To create a new project, log in and under Project Options in the left-hand column click Create New Project (see Note 1). 6. The Project Title should be meaningful to facilitate easy recall later (e.g., “Moths of the Olympic Peninsula”). 7. The Project Code is a short form of the title (3–5 letters, e.g., “LOP”) and forms the basis of Process IDs (see Note 2). 8. Select COI-5P - Cytochrome Oxidase subunit 1 5′ Region as Primary Marker (see Note 3). 9. At the bottom of the form you can Assign Users to the project and select the type of access they will have. 10. Save the new project (see Note 4). 11. Each project can initially hold 999 records. If you will be submitting more than 999 specimens, repeat steps 5–10 above to create additional projects. Later the data can be temporarily merged online for analysis (see Note 5).
3.2. Specimens
1. For beginners to insect collecting, Basic Techniques for Observing and Studying Moths and Butterflies (14) is a good resource regardless of taxon focus. 2. For those eager to expand to large-scale regional inventories, Janzen et al. (4) provide details of a mammoth “guinea pig” initiative in Area de Conservacion Guanacaste, Costa Rica. 3. When collecting insects for DNA barcoding, collectors should consider the data required by BOLD and GenBank to make sequence records “BARCODE” compliant (15).
Table 1 Specimen collection information
Field ID: QUI001
Collectors: Billy Black
Collection date: 13/9/2010
Location (GPS): 47.9053, -124.6261
Notes: Light trap
Storage conditions: Collected into an individual tube of 99.9% EtOH which was transferred to the −20°C freezer on return to lab
4. The collector should note the way in which the insect was collected, e.g., light trap (see Table 1). 5. The storage conditions (see Table 1) should be recorded: was the specimen stored in EtOH? Was it oven-dried? This information is crucial for understanding incidents of low DNA yield (see Note 6). 6. Care should be taken to prevent DNA degradation in your collected specimens. We advise that specimens be collected into an individual tube of 99.9% EtOH (see Note 7) which is later deposited into a freezer (−20°C). If this is not possible, longer-term storage under ethanol at room temperature should suffice (16). 7. The ethanol should be changed frequently to ensure it remains at high concentration (as water diffuses out of the specimen, the ethanol concentration decreases) (16). 8. Museum collections can also be mined for barcoding (6). The most recently collected specimens and those preserved in ways which minimize DNA degradation are preferred, as are those with a good “record” (i.e., with information available to complete Table 1, and especially types).
3.3. Submitting Specimen Data to BOLD
1. Data can be submitted by typing directly on the webpage, but using a spreadsheet will most likely save time. Most data can be copied directly from the field records (e.g., Table 1). 2. Download a blank data template from BOLD (from http://www.boldsystems.org click Documentation > Data management > Data submission protocol > download blank data submission template). 3. Open the template in Microsoft Excel. 4. On the Voucher Info sheet, enter Sample IDs (see Note 8). 5. Museum ID and Collection Code are optional (see Note 9). 6. Institution Storing must be completed for each sample. This is the location of the voucher specimen, not of subsampled pieces of tissue; it can be a museum, a university, or a private collection (see Table 2).
Table 2 Minimum data required for submission of specimen records to BOLD
Sheet                   Column                Example
Voucher info sheet      Sample ID             QUI001
Voucher info sheet      Field ID              QUI001
Voucher info sheet      Institution Storing   Research collection of Carlisle Cullen
Taxonomy sheet          Phylum                Arthropoda
Collection data sheet   Country               USA
7. Proceed to the Taxonomy sheet and complete as fully as possible. 8. Proceed to the Specimen Details sheet and enter all known information (see Note 10). 9. Proceed to the Collection Data sheet and enter all known information (see Note 11). 10. Save the file (File > Save). 11. Spreadsheets can be uploaded by sending them by e-mail to [email protected] and must contain data in the Sample ID, Field ID, Institution Storing, Phylum, and Country columns (Table 2).
3.4. Specimen Imaging
1. Make a pedestal by mounting a 15 cm piece of drinking straw vertically in a wooden base. Plug the top of the straw with modeling clay to allow single specimens to be pinned in the top (14). 2. Take pictures using the high-quality mode on your camera. If a fairly wide aperture (for shallow depth of field) is employed, background shadows will be negligible (14). 3. The specimen should be centered in the image frame. 4. Photos should be taken as close to the specimen as possible, leaving very little gap around the edges. 5. Use Landscape orientation. 6. Use a 2 × 3 aspect ratio if possible. This will ensure that the images are not skewed when viewed in the BOLD image library. 7. If desired, a measurement scale may be included in the image to provide a size reference. 8. Use a standardized orientation (Table 3) as this makes it much easier to compare specimens within a project.
3.5. Submitting Images to BOLD
1. Create a folder on your desktop called Images and place in it all the image files (in .jpg format) you would like to upload. 2. To create a list of the files in the Images folder, open a terminal window (in Windows, click Start > Run and type “cmd” to bring up the black command window), navigate to the Images folder (see Note 12), and then run one of the following commands: Windows: dir > list.txt; MacOS: ls > list.txt.
Table 3 Common standardized animal orientations for specimen imaging
Orientation: Explanation
Dorsal: The anterior of the specimen should be facing the top of the image frame. The specimen should be face-down, with the dorsal aspect of the head visible.
Lateral: The anterior of the specimen should be facing the left side of the image frame. The specimen should be oriented with the feet towards the bottom of the image.
Ventral: The anterior of the specimen should be facing the top of the image frame. The specimen should be face-up, with the ventral aspect of the head visible.
3. Download a blank image submission template from BOLD (from www.boldsystems.org click Documentation > Data management > Image submission protocol > please click here to download a blank image submission template). Save the file (ImageData.xls) in the Images folder on the desktop. 4. Open ImageData.xls in Microsoft Excel. 5. Next open list.txt (see Note 13) and move the data into the Image File column in ImageData.xls. Each cell in this column should contain the name of an image file including the extension (.jpg). 6. In the Original Specimen column type yes for original or no for not original. 7. In the View Metadata column choose one of the standard options from Table 3. 8. In the Caption column type any information you wish to appear by the image on BOLD. 9. Obtain the Sample IDs and Process IDs from BOLD by clicking on Data Spreadsheets under the Downloads menu on the left side of your Project Console (see Note 4). Choose to download the Progress Report, open the file bold.xls, and copy the data from the Sample ID and Process ID columns into the appropriate columns in ImageData.xls. 10. Once you have filled in all the mandatory columns (see Note 14), save the file (File > Save). 11. The folder Images needs to be zipped before submission to BOLD. Most modern operating systems have built-in functionality for zipping (see Note 15), so this simply requires right-clicking on the folder and selecting Compress “Images” or something similar. 12. Navigate to your BOLD project’s Project Console and under the Uploads menu on the left click Specimen Images. Browse through to Images.zip and click Submit (see Note 16).
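The file listing in step 2 and the zipping in step 11 can also be scripted, which avoids differences between the Windows and MacOS commands. The following is only a sketch using the Python standard library; the function names are illustrative, and it assumes the Images folder layout described in steps 1–3.

```python
import shutil
from pathlib import Path


def list_image_files(folder):
    """Return the sorted names of the .jpg files in folder (any letter case)."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".jpg"
    )


def write_file_list(folder, list_name="list.txt"):
    """Write one image file name per line to list.txt inside folder."""
    names = list_image_files(folder)
    (Path(folder) / list_name).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names


def zip_folder(folder):
    """Create <folder>.zip next to the folder and return the archive path."""
    folder = Path(folder)
    return shutil.make_archive(
        str(folder), "zip", root_dir=folder.parent, base_dir=folder.name
    )


if __name__ == "__main__":
    images = Path.home() / "Desktop" / "Images"  # assumed location from step 1
    print(write_file_list(images))
    print(zip_folder(images))
```

Unlike a raw directory listing, `list.txt` produced this way contains only the image names, so it can be pasted straight into the Image File column.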
3.6. Tissue Subsampling
1. Make sure that your specimens are correctly organized. This protocol is appropriate for pinned specimens stored in a drawer, specimens stored in ethanol tubes, and specimens stored in glassine envelopes. Print a list of specimens and Specimen IDs so you can double-check that each one is going into the correct well as you go (e.g., QUI001 goes in A1, QUI002 goes in B1, …). 2. Use clean gloves. 3. Clean the workspace and wipe the bench with Eliminase. 4. Work on top of a KimWipe. 5. Get a microplate. These instructions are for a 96-well microplate; however, the procedure is essentially the same when working with single tubes. This is the microplate in which the lysis will take place. Make sure the microplate is in the correct orientation, e.g., A1 is in the top right corner of the plate (Fig. 1). 6. It is worth assigning two wells as control wells (Fig. 1) at this point. 7. Put cap-strips on all rows. 8. Turn on the gas slowly and light the Bunsen burner, so that there is a small blue flame. When using a Bunsen or gas burner is not possible or would be dangerous, an alternative Eliminase dip protocol can be just as effective (see Note 17) and can be easily adapted for use in the field.
Fig. 1. Diagram detailing how to organize and fill a microplate with tissue, including designation of control wells.
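The printed checklist of specimens and wells mentioned in step 1 can be generated rather than typed. The sketch below fills the plate column by column (A1, B1, …, H1, A2, …), matching the example in step 1, and skips two control wells. Fig. 1 is not reproduced here, so the control positions G12 and H12 are purely an assumption and should be changed to match your own plate map.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]   # rows A to H
COLUMNS = range(1, 13)       # columns 1 to 12


def assign_wells(sample_ids, control_wells=("G12", "H12")):
    """Map well names to sample IDs, filling each column from row A to row H.

    control_wells are left empty; their positions here are an assumption.
    """
    wells = [f"{row}{col}" for col in COLUMNS for row in ROWS]
    free = [well for well in wells if well not in control_wells]
    if len(sample_ids) > len(free):
        raise ValueError(f"{len(sample_ids)} samples but only {len(free)} free wells")
    return dict(zip(free, sample_ids))


if __name__ == "__main__":
    samples = [f"QUI{i:03d}" for i in range(1, 95)]  # QUI001 to QUI094
    for well, sample in assign_wells(samples).items():
        print(well, sample)
```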
9. Remove the first cap-strip (row A) from the microplate, and place it on a KimWipe. Take out the first specimen. 10. Take the forceps and dip them in ethanol (carefully shaking off any excess, not near the flame) and put them in the flame for a few seconds to burn off the ethanol. 11. Remove a small piece of tissue from the specimen (about a 2–3-mm-long piece of insect leg) and place it in the first well (A1) in the lysis plate. 12. It is also possible to use whole specimens in the lysis (17) or whole abdomens, in the case of a combined lysis/genitalia dissection (18) (see Note 18). 13. Continue on to the next specimen and well, making sure to sterilize the forceps between samples with ethanol and flame. Place each specimen back in its drawer/tube/envelope before moving on to the next one. 14. Put the cap-strip back on as you finish a row and then carefully take the cap-strip off the next row. 15. Return your specimens to the freezer or cabinet. 16. For specimens in ethanol, the subsampled tissue needs to be completely dried before moving on to the next stage. Incubate at 56°C for 30 min, with the cap-strips slightly loosened, to evaporate residual ethanol.
3.7. Tissue Lysis
1. For one plate mix 5 ml of Insect Lysis Buffer and 0.5 ml of Proteinase K, 20 mg/ml in a sterile container (see Note 19 for single tubes). 2. Carefully remove all the cap-strips from the microplate (prepared as above). These instructions are for a microplate of tissue containing 96 wells (Fig. 1) but the procedure is the same for single tubes. 3. Add 50 μl of Lysis Mix to each well using a multichannel pipette. If you are careful not to touch the microplate with the tips, you can use the same tips right across the microplate. 4. Cover microplate with cap-strips. 5. Incubate at 56°C for a minimum of 6 h or overnight to allow digestion. It is not necessary to shake the microplate during incubation. 6. Centrifuge at 1,500 × g for 15 s to remove any condensate from the cap-strips (see Note 20).
3.8. High-Throughput DNA Extraction (Ivanova et al. (19))
1. Retrieve your microplate from the lysis stage above and remove cap-strips. Add 100 μl of Binding Mix to each sample using multichannel pipette. Cover plate with new cap-strips. Shake vigorously for 10–15 s and centrifuge at 1,000 × g for 20 s to remove any sample from the cap-strips.
2. Remove cap-strips and transfer the lysate (about 150 μl) from the wells into the wells of a GF plate placed on top of a square-well block using a multichannel pipette. Seal the plate with a self-adhering cover. 3. Centrifuge at 5,000 × g for 5 min to bind DNA to the GF membrane. 4. First wash step: Add 180 μl of Protein Wash Buffer to each well of the GF plate (see Note 21). Seal with a new cover and centrifuge at 5,000 × g for 2 min. 5. Second wash step: Add 750 μl of Wash Buffer to each well of the GF plate (see Note 22). Seal with a new self-adhering cover and centrifuge at 5,000 × g for 5 min. 6. To avoid incomplete Wash Buffer removal, open the sealing cover, close it, and centrifuge the GF plate again for 5 min at 6,000 × g. 7. Remove the self-adhering cover. Place the GF plate on the lid of a tip box (see Note 23). Incubate at 56°C for 30 min to evaporate residual ethanol. 8. Position a PALL collar on a new microplate and place the GF plate on top. Dispense 30–60 μl of ddH2O (pre-warmed to 56°C) directly onto the membrane in each well of the GF plate and incubate at room temperature for 1 min. Seal plate. 9. Place the assembled plates on a clean square-well block to prevent cracking of the collection plate and centrifuge at 5,000 × g for 5 min to collect the DNA eluate. Remove the GF plate and discard it. 10. Cover the DNA microplate with cap-strips or aluminum PCR foil. This is your DNA and it can be temporarily stored at 4°C or at −20°C for long-term storage. Label it well.
3.9. Archival Specimen DNA Extraction (See Note 24)
1. Vortex the sample from lysis stage (about 150 μl) for 15 s (see Note 25). 2. Add 200 μl Buffer AL and vortex it (a white precipitate will most likely form). Add 200 μl EtOH 96% and vortex until it is homogeneous (there should be a lot less white precipitate). 3. Pipette the liquid (set the pipette to 650 μl) into a spin column, make sure to label the cap, and centrifuge it at 6,000 × g for 1 min. 4. Discard the collection tube (see Note 26) and put the spin column into a new tube. Add 500 μl Buffer AW1 and centrifuge at 6,000 × g for 1 min. 5. Discard the collection tube and put the spin column into a new tube. Add 500 μl Buffer AW2 and centrifuge at 20,000 × g for 5 min.
6. Discard the collection tube and remove the spin column carefully so as not to let it touch the liquid. Put the spin column into a 1.5 or 1.7-ml microcentrifuge tube (see Note 27) and label it. 7. Add 100 μl Buffer AE (elution buffer) and let it sit at room temperature for 1 min. Centrifuge at 6,000 × g for 1 min. 8. Pipette the DNA out of the bottom of the microcentrifuge tube (about 100 μl), place it back into the spin column, and place the spin column back in the microcentrifuge tube. Centrifuge for an additional 1 min at 6,000 × g. 9. The liquid in the bottom of the microcentrifuge tube is your DNA. DNA can be temporarily stored at 4°C or at −20°C for long-term storage. Label it well.
3.10. Designing PCR Primers
1. Primers should be between 20 and 30 nt in length. 2. Avoid complementarity within and between primers. 3. The GC content should be approximately 50%. 4. Avoid mono- or dinucleotide repetition within primers. 5. The primer should end on a G or a C. 6. Primers should end on the second (or first if necessary) position of a codon. 7. The melting temperatures of primer pairs should be within 5°C of one another. 8. To design COI primers for a particular taxonomic group, try aligning as many COI genes from closely related taxa as possible (try searching GenBank, http://www.ncbi.nlm.nih.gov/genbank/, for the desired species group). Design primers that are situated in regions that are conserved across all taxa. 9. Remember to target the “barcode” region [i.e., overlapping with the region targeted by the Folmer primers (20) (Table 4)]. 10. Primers can be tailed with M13 tails (Table 4) to improve amplification success (16) and facilitate high-throughput sequencing protocols (21). However, some tailed versions can form strong primer dimers, reducing PCR efficiency (e.g., LepF1 and LepR1).
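Several of the guidelines above (length, GC content, repeats, 3′ base, and matched melting temperatures) are simple enough to screen automatically. The sketch below is one hedged way to do so. The 40–60% GC window, the repeat thresholds, and the use of the rule-of-thumb Tm estimate (2°C per A/T, 4°C per G/C) are assumptions chosen for illustration; dedicated primer design software uses more accurate thermodynamic models, and complementarity and codon position are not checked here.

```python
import re


def gc_content(seq):
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def rough_tm(seq):
    """Crude rule-of-thumb Tm in degrees C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def check_primer(seq, gc_window=(40.0, 60.0)):
    """Return a list of guideline warnings for one primer (empty if none)."""
    seq = seq.upper()
    warnings = []
    if not 20 <= len(seq) <= 30:
        warnings.append("length outside 20-30 nt")
    if not gc_window[0] <= gc_content(seq) <= gc_window[1]:
        warnings.append("GC content far from 50%")
    if seq[-1] not in "GC":
        warnings.append("3' end is not G or C")
    # Assumed thresholds: 4+ identical bases, or a dinucleotide repeated 3+ times.
    if re.search(r"(.)\1{3,}|(..)\2{2,}", seq):
        warnings.append("mono- or dinucleotide repeat")
    return warnings


def tm_matched(forward, reverse, max_diff=5):
    """True if the rough Tm values of the pair differ by at most max_diff."""
    return abs(rough_tm(forward) - rough_tm(reverse)) <= max_diff


if __name__ == "__main__":
    lep_f1 = "ATTCAACCAATCATAAAGATATTGG"   # LepF1, Table 4
    lep_r1 = "TAAACTTCTGGATGTCCAAAAAATCA"  # LepR1, Table 4
    print(check_primer(lep_f1), check_primer(lep_r1), tm_matched(lep_f1, lep_r1))
```

Running the check on widely used primers such as LepF1 and LepR1 still raises warnings, which is a reminder that the list above describes guidelines rather than strict requirements.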
3.11. PCR Set-Up
1. Prepare PCR master mix either for a single tube or 96-well microplate following the recipe in Table 5 where details on ingredient preparation are also provided. 2. We use LepF1 and LepR1 (Table 4) as the primer pair in a first amplification attempt (see Note 28). 3. Remember as above to wear clean gloves, clean benches with Eliminase and work on top of a KimWipe. Also work in a cold block if possible.
Table 4 Common primers used for DNA barcoding insects
Name         Sequence                     Use with   Direction   References
LCO1490      GGTCAACAAATCATAAAGATATTGG    HCO2198    F           (20)
HCO2198      TAAACTTCAGGGTGACCAAAAAATCA   LCO1490    R           (20)
LepF1        ATTCAACCAATCATAAAGATATTGG    LepR1      F           (24)
LepR1        TAAACTTCTGGATGTCCAAAAAATCA   LepF1      R           (24)
MLepF1       GCTTTCCCACGAATAAATAATA       LepR1      F           (25)
MLepR1       CCTGTTCCAGCTCCATTTTC         LepF1      R           (25)
M13F (-21)   TGTAAAACGACGGCCAGT                      F           (26)
M13R (-27)   CAGGAAACAGCTATGAC                       R           (26)
Table 5 Basic recipe for PCR (amounts of ingredient in μl)
Ingredient                         Single tube   96-well microplate   Ingredient preparation
10% Trehalose                      6.25          625                  Dissolve 5 g D-(+)-trehalose dihydrate in 50 ml of total volume of molecular grade ddH2O. Store at −20°C
ddH2O                              2             200                  Store at 4°C
10× Buffer                         1.25          125                  10× PCR Buffer for Platinum Taq. Store at −20°C
50 mM MgCl2                        0.625         62.5                 50 mM MgCl2. Store at −20°C
10 mM dNTPs                        0.0625        6.25                 10 mM dNTPs mix. Store at −20°C in 100 μl aliquots
10 μM F Primer working solution    0.125         12.5                 Add 20 μl of 100 μM primer stock to 180 μl ultrapure ddH2O. Store at −20°C
10 μM R Primer working solution    0.125         12.5                 Add 20 μl of 100 μM primer stock to 180 μl ultrapure ddH2O. Store at −20°C
Taq (5 U/μl)                       0.06          6                    Platinum Taq polymerase. Store at −20°C in 50 μl aliquots
Total                              10.5          1,050
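The 96-well column of Table 5 is the single-tube column multiplied by 100, which leaves a small surplus over 96 reactions for pipetting losses. For other batch sizes the same scaling can be sketched as follows; the dictionary keys and function names are illustrative only.

```python
# Per-reaction volumes in microlitres, copied from the single-tube column of Table 5.
PCR_RECIPE_UL = {
    "10% trehalose": 6.25,
    "ddH2O": 2.0,
    "10x buffer": 1.25,
    "50 mM MgCl2": 0.625,
    "10 mM dNTPs": 0.0625,
    "10 uM forward primer": 0.125,
    "10 uM reverse primer": 0.125,
    "Taq (5 U/ul)": 0.06,
}


def master_mix(n_reactions, recipe=PCR_RECIPE_UL):
    """Scale every ingredient of the recipe to n_reactions."""
    if n_reactions <= 0:
        raise ValueError("n_reactions must be positive")
    return {name: volume * n_reactions for name, volume in recipe.items()}


def strip_tube_volume(mix, tubes=8):
    """Volume per tube when the mix is split across an 8-tube PCR strip."""
    return sum(mix.values()) / tubes


if __name__ == "__main__":
    mix = master_mix(100)  # reproduces the 96-well column of Table 5
    for name, volume in mix.items():
        print(f"{name}: {volume:.2f} ul")
    print(f"per strip tube: {strip_tube_volume(mix):.1f} ul")
```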
4. Label your mix tube and microplate (see Note 29). 5. Return PCR ingredients to the freezer. 6. Mixes in tubes can be stored at −20°C for up to 3 months (1–3 freeze–thaw cycles do not affect performance). The content of a tube should be mixed by pipetting before use. 7. For microplate (see Note 30): Aliquot 1/8 of total mix volume to each of the tubes in an 8-tube PCR strip (see Note 31). Dispense desired volume (10.5 μl for 12.5 μl reactions) into each well of the 96-well plate using a multichannel pipette. 8. Retrieve your DNA plate/tube from the fridge. Add 1–2 μl of DNA extract (see Note 32) to each tube/well (see Note 33). Seal and return DNA. Seal microplate with self-adhering aluminum foil (for PCR) or close the tube. 9. Centrifuge the microplate/tube at 1,000 × g for 20 s and start thermocycling.
3.12. PCR Thermocycle Program
1. Typical conditions for COI amplification include the initial denaturation at 94°C for 1 min; five cycles of 94°C for 30 s, annealing at 45–50°C for 40 s, and extension at 72°C for 1 min; followed by 30–35 cycles of 94°C for 30 s, 51–54°C for 40 s, and 72°C for 1 min; with a final extension at 72°C for 10 min, followed by indefinite hold at 4°C. 2. Centrifuge the microplate/tube at 1,000 × g for 20 s.
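Writing the program in step 1 as data makes it easy to see how long a run holds at temperature. In the sketch below the annealing temperatures (45°C and 51°C) and the 35 main cycles are one choice from the ranges given above, the tuple layout is an assumption for illustration, and ramping between temperatures and the final 4°C hold are not counted.

```python
# Each stage: (number of repeats, [(temperature in C, seconds), ...])
COI_PROGRAM = [
    (1, [(94, 60)]),                        # initial denaturation
    (5, [(94, 30), (45, 40), (72, 60)]),    # low-stringency cycles
    (35, [(94, 30), (51, 40), (72, 60)]),   # main cycles
    (1, [(72, 600)]),                       # final extension
]


def hold_seconds(program):
    """Total programmed hold time in seconds, ignoring ramp times."""
    return sum(
        repeats * sum(seconds for _, seconds in steps)
        for repeats, steps in program
    )


if __name__ == "__main__":
    total = hold_seconds(COI_PROGRAM)
    print(f"{total} s (about {total / 60:.0f} min) excluding ramping")
```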
3.13. High-Throughput PCR Check
1. Precast agarose gels (E-gels) and the docks (E-bases) on which they are run are available from Invitrogen™. This system is bufferless, so exposure to Ethidium Bromide is minimized. However, gloves should be worn when handling and loading the gel. 2. The recommended program for 2% Agarose E-gel® 96 gel is EG and the run time is 6 min. 3. Plug the Mother E-Base™ into an electrical outlet. Press and release the pwr/prg (power/program) button on the base to select program EG. 4. Remove gel from the package and remove plastic comb from the gel. 5. Slide gel into the two electrode connections on the Mother or Daughter E-Base™. 6. Load 16 μl of ddH2O into wells with a 12-channel pipettor. 7. Load 4 μl of sample from your PCR microplate. 8. To begin electrophoresis, press and release the pwr/prg button on the E-Base™. The red light changes to green. 9. At the end of run (signaled with a flashing red light and rapid beeping), press and release the pwr/prg button to stop the beeping.
10. Remove gel cassette from the base and capture a digital image of the gel on a UV transilluminator equipped with a digital camera. 11. As a rough guide, set the filter to two for Ethidium Bromide and the exposure time to 2 s. 12. Analyze the image and align or arrange lanes in the image using the E-editor™ 2.0 software available at: http://tools.invitrogen.com/egels/. 13. White bands indicate product; square slots are the loading wells.
3.14. PCR Check: Important and Old Specimens
1. This protocol requires you to make the gel yourself, which is more time consuming but cheaper on materials, and produces gels that are more sensitive to product. 2. Gel should be ~5 mm thick: measure the size of the gel tray and determine the volume of liquid you will need to make a 5-mm thick gel (e.g., for a gel tray measuring 10 × 20 cm you would need to start with 20 × 10 × 0.5 = 100 ml of 1× TAE buffer, see Note 34). 3. Make sure that your tray is on a flat surface with tape securely on the sides. 4. Tape the edges of the tray so that it will hold liquid. 5. Measure out the agarose powder onto a piece of weigh paper using a metal spatula. The amount of agarose powder that you need depends on the percentage of the gel. Generally, 2% gels are best (e.g., to make a 2% agarose gel for the tray that takes 100 ml of buffer you need 2 g of agarose powder). 6. Add the agarose to a large beaker or Erlenmeyer flask. Add the 1× TAE buffer to the agarose. 7. Place the flask in the microwave on high power for 30 s. Gently swirl the flask using heat resistant gloves. Heat for another few seconds until the agarose has dissolved. 8. Be very careful because the agarose could burn you. 9. Let the flask sit for 5 min on the lab bench at room temperature to cool. 10. Place a comb of desired well width into the tray. Pour the hot liquid into the middle of the tray, trying to avoid creating bubbles. Push any large bubbles to the edges of the tray using a clean pipette tip. 11. Allow the gel to cool for 30 min. Remove the tape from the edges. 12. Set up the gel rig. 13. The liquid in the base should be 1× TAE buffer (see Note 35). 14. The gel should now be set and you can remove the tape from the edges of the gel tray. 15. Slowly lower the tray into place in the gel rig.
16. Add more 1× TAE buffer to the gel rig until the gel is completely submerged. 17. Carefully remove the comb from the gel by rocking it back and forth while pulling up slowly. 18. Cut a piece of parafilm and place it flat on the lab bench. For every PCR product you will be adding to the gel, place a 1 μl drop of loading dye (see Note 36) onto the parafilm. Take your PCR product and add 6 μl to one of the droplets of dye. Using the same pipette tip draw up the PCR product/dye droplet. 19. With a steady hand, add this to a well in the gel. With multiple samples be sure to keep track of which well is holding which sample. 20. The loading dye makes the product heavy so it will sink to the bottom of the well. You can hold the tip directly above the well without entering the gel. When you add the samples to the wells be careful not to poke a hole in the gel. 21. Always run a DNA ladder in a well beside your samples. A ladder of 100 bp would be appropriate and you should add 1 μl of ladder for every 5 mm of well width. 22. Close the top of the gel rig. 23. Remember to have the black electrode near the wells and the red electrode at the opposite end of the gel. DNA runs towards the positive electrode. 24. Run the gel with the rig set to 150 V. 25. The loading dye forms two bands that you can see; wait until they have moved close to the bottom of the gel, then turn off the rig (approximately 20 min). 26. Carefully transfer the gel from the tray into a plastic container for staining. Pour in diluted GelRed (see Note 37) until it covers the gel. Let it sit with moderate manual mixing for 20 min. 27. Pour the GelRed back into the bottle carefully using the funnel. 28. Capture a digital image of the gel on a UV transilluminator equipped with a digital camera, usually located in your institution’s dark room. 29. As a rough guide set the filter to two for Ethidium Bromide and the exposure time to 2 s. 30. Print and save the image.
3.15. Cycle Sequencing Set-Up
1. When sequencing PCR product, you sequence in both forward and reverse directions. This is done with two different reactions and each reaction mix should include only a forward or reverse primer, not both. For example, for each microplate of PCR product, two microplates must be set up for sequencing, one with the forward primer and one with the reverse primer (see Note 38).
Table 6 Basic recipe for cycle sequencing (amounts of ingredient in μl)
Ingredient                       Single tube   96-well microplate
Dye terminator mix v3.1          0.25          26
5× ABI sequencing buffer         1.875         195
10% Trehalose                    5             520
10 μM Primer working solution    1             104
ddH2O                            0.875         91
Total                            9             936
2. Prepare cycle sequencing master mix either for a single tube or 96-well microplate following the recipe in Table 6 and details on ingredient preparation in Table 5. 3. Remember as above to wear clean gloves, clean benches with Eliminase and work on top of a KimWipe. Also work in a cold block if possible. 4. Label your mix tube and any microplates (see Note 29). 5. Return cycle sequencing ingredients to the freezer. 6. Mixes in tubes (or pre-made plates, see Note 39) can be stored at −20°C for up to 3 months (see Note 40). The content of a tube should be mixed by pipetting before use. 7. For microplate: Aliquot 1/8 of total mix volume (115 μl) into each of the tubes in an 8-tube PCR strip (see Note 31). Dispense desired volume (9 μl) into each well of the 96-well plate using a multichannel pipette, changing tips after every row (see Note 41). 8. Retrieve your PCR product from the fridge. Add 1.5 μl of PCR product (see Note 42) to each tube/well (see Note 33). Seal and return PCR plate. Seal cycle sequencing microplate with self-adhering aluminum foil (for PCR) or close the tube. 9. Centrifuge the microplate/tube at 1,000 × g for 20 s and start thermocycling.
3.16. Cycle Sequencing Thermocycle Program
1. Denaturation at 96°C for 2 min. 2. Thirty cycles of 96°C for 30 s, annealing at 55°C for 15 s. 3. Additional extension at 60°C for 4 min. 4. Indefinite hold at 4°C (see Note 43).
3.17. Sequencing Clean-Up and Analysis (Ivanova and Grainger (22))
1. Sequencers should be operated by specially trained technicians. Many sequencing facilities exist; they will often require the Cycle Sequencing microplate, a supply of sequencing primer, and a plate record (Table 7). 2. Measure dry Sephadex® G-50 with the MultiScreen® Column Loader into the Acroprep™ 96 Filter plate with 0.45 μm GHP membrane. 3. Hydrate the wells with 300 μl of ddH2O. 4. Let the Sephadex® hydrate overnight in the fridge or for 3–4 h at room temperature before use. 5. Put the Acroprep™ plate together with a MicroAmp® Optical 96-well Reaction Plate and secure with at least two rubber bands. 6. Make sure the two sets weigh the same (adjust weight by using different rubber bands). 7. Centrifuge at 750 × g for 3 min to drain the water from the wells. Discard water from MicroAmp® plates (these plates could be reused for the same procedure without autoclaving). 8. Add the entire volume of the cycle sequencing reaction to the center of the Sephadex® columns. 9. Add 25 μl of 0.1 mM EDTA pH 8.0 to each well of the new (or autoclaved) MicroAmp® plate.
Table 7 Example of a plate record for a 3730xl DNA Analyzer (Applied Biosystems)
Container name   Plate ID     Description    Container type   App type   Owner   Operator   Plate sealing   Schedule pref
LOP Plate1       LOP Plate1   COI-Barcodes   96-Well          Regular    CCDB    CCDB       Septa           1234
AppServer   AppInstance
Well   Sample name   Comment   Results group 1   Instrument Protocol 1   Analysis Protocol 1
A01    LOP001-11     LepF1     CC                FolA700                 3730BDTv3-KBDeNovo_v5.1
B01    LOP013-11     LepF1     CC                FolA700                 3730BDTv3-KBDeNovo_v5.1
10. To elute DNA attach the MicroAmp® plate to the bottom of the Acroprep™ plate; secure them with tape and with rubber bands. 11. Make sure the sets weigh the same (adjust weight by using different rubber bands). 12. Centrifuge at 750 × g for 3 min. 13. Remove MicroAmp® plate and cover its top with Septa. 14. Place MicroAmp® plate into the black plate base and attach the white plate retainer. 15. Stack the assembled plate in the 3730xl DNA Analyzer (Applied Biosystems); do not forget the barcode and plate record. 16. Discard Sephadex® from Acroprep™ plate. 17. Using the Plate Manager of the Data Collection software (Applied Biosystems), import the plate record(s) for the plate being run. 18. Begin the run within Run Scheduler.
3.18. Uploading Raw Sequences to BOLD
1. The sequencing run outputs a folder of files. The files you are interested in have the extension .ab1, e.g., LOP001-11_F.ab1. These raw files (traces) can be edited into the form in which we are used to seeing DNA sequences represented, i.e., a string of letters. However, as editing can be a subjective task, BOLD also requires the raw files (traces) to be uploaded as part of a barcode’s collateral data. 2. To add trace files of your new sequences to the appropriate records in BOLD, create a folder on your desktop called Traces and place in it all the .ab1 files that you would like to upload. 3. To create the list of files in the Traces folder, open a terminal window (in Windows, click Start > Run and type “cmd” to bring up the black command window), navigate to the Traces folder (see Note 44), and then run one of the following commands: Windows: dir > list.txt; MacOS: ls > list.txt. 4. Download a blank trace submission template from BOLD (from www.boldsystems.org click Documentation > Data management > Trace submission protocol > please click here to download a blank trace submission template). Save the file (data.xls) in the Traces folder on the desktop. 5. Open data.xls in Microsoft Excel. 6. You can then open list.txt and move the data into the Filename (.ab1) column in data.xls (see Note 45). Each cell in this column should contain the name of a trace file including the extension (.ab1) (see Note 46). 7. In the FORWARD PCR PRIMER column enter the registered name of the forward primer used during the PCR. Copy it down through the entire column to the end of your file list.
Table 8 Example of file data.xls
A | B | C | D | E | F | G | J
Filename (.ab1) | Score file (.phd1) | Forward PCR Primer | Reverse PCR Primer | Sequencing Primer | Read direction | Process ID | Marker
LOP001-11_F.ab1 | | LepF | LepR | LepF | Forward | LOP001-11 (a) | COI-5P
LOP001-11_R.ab1 | | LepF | LepR | LepR | Reverse | LOP001-11 (b) | COI-5P
(a) Formula typed into this cell is =left(A2,9)
(b) Formula in this cell is =left(A3,9)
8. In the REVERSE PCR PRIMER column enter the registered name of the reverse primer used during the PCR. Copy it down through the entire column to the end of your file list.
9. In the column SEQUENCING PRIMER enter the registered name of your sequencing primer. It will alternate between forward and reverse. For example, LepF1 should line up with the read direction Forward (Table 8).
10. In the Read Direction column enter Forward or Reverse depending on the direction of the .ab1 file it refers to (see Note 47).
11. In the Process ID column type the formula “=left(A2, $)”, where A2 is the cell containing your first .ab1 file name and $ is the number of characters in the Process ID. For example, LOP001-11 has nine characters ($ = 9). The value of $ may be larger or smaller depending on the number of letters in the Project Code.
12. Press Control and A to select the entire page. Press Control and C to copy the page, then go to Edit > Paste Special, choose Values, and press OK. This removes the formulas from your sheet.
13. Save this file under the name data.xls in your Traces folder.
14. Delete list.txt from your Traces folder.
15. The folder Traces needs to be zipped before submission to BOLD. See Subheading 3.5 for details on how to do this. Save as Traces.zip.
16. Navigate to your BOLD project’s Project Console and under the Uploads menu on the left click Trace Files. Browse through to Traces.zip and click Submit (see Note 48).
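The spreadsheet steps above (filling in the primer, read direction, and Process ID columns) can also be scripted. The following is a minimal sketch in Python and is not part of any BOLD tooling: it assumes file names of the form ProcessID_Direction.ab1 (e.g., LOP001-11_F.ab1), and the function names trace_rows and rows_to_csv, the default primer names, and the CSV output are illustrative assumptions. The result would still need to be pasted into the BOLD template (data.xls).

```python
import csv
import io

# Column headings follow Table 8; everything else here is illustrative.
COLUMNS = ["Filename (.ab1)", "Score file (.phd1)", "Forward PCR Primer",
           "Reverse PCR Primer", "Sequencing Primer", "Read direction",
           "Process ID", "Marker"]


def trace_rows(filenames, fwd_primer="LepF1", rev_primer="LepR1",
               marker="COI-5P"):
    """Build one Table 8 style row per .ab1 file name."""
    rows = []
    for name in sorted(filenames):
        if not name.lower().endswith(".ab1"):
            continue  # skip list.txt, data.xls and anything else
        stem = name[:-4]
        # Split at the last underscore: Process ID first, then F or R.
        process_id, _, direction = stem.rpartition("_")
        if direction not in ("F", "R") or not process_id:
            raise ValueError(f"unexpected file name: {name}")
        forward = direction == "F"
        rows.append({
            "Filename (.ab1)": name,
            "Score file (.phd1)": "",  # optional column
            "Forward PCR Primer": fwd_primer,
            "Reverse PCR Primer": rev_primer,
            "Sequencing Primer": fwd_primer if forward else rev_primer,
            "Read direction": "Forward" if forward else "Reverse",
            "Process ID": process_id,
            "Marker": marker,
        })
    return rows


def rows_to_csv(rows):
    """Return the rows as CSV text that can be opened in Excel."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example = trace_rows(["LOP001-11_R.ab1", "LOP001-11_F.ab1", "list.txt"])
print(rows_to_csv(example))
```

Splitting at the last underscore, rather than counting characters as the left() formula does, means the same code works for Project Codes of any length, provided the direction tag is always the final part of the file name.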
3.19. Sequence Editing
1. Open CodonCode (http://www.codoncode.com), choose Create a new project, and press OK.
2. Go to File > Import > Add Folder > Traces then press Import.
3. To see the files you just imported press ► beside the Unassembled Samples folder.
4. Your .ab1 files should be of the form “LOP001-11_F” where the first part “LOP001-11” refers to the Process ID and the second part “F” refers to the sequence direction, i.e., Forward.
5. Sort files by quality by double-clicking on Quality. Highlight any sequences that are of very poor quality or of short length and click the trashcan to delete them.
6. Next select the Contig menu and move the cursor over Advanced Assembly. From the options that appear select Assemble in Groups.
7. A window will appear asking if you would like to Define sample name parts? Choose Define names… to bring up another small window.
8. There are two parts to our filenames. The first will be the Process ID and for your purposes the option in the Meaning menu can be left as Clone. Since the Process ID is followed by an underscore, choose _ (underscore) in the Delimiter menu for Clone.
9. For the second part choose Direction in the Meaning menu. We can ignore the Delimiter for the Direction part because there is no actual delimiter following the direction.
10. Delete all the additional parts that may appear in this window.
11. Next click Preview… to check how the aligner is interpreting the sample names. Click Close to exit the preview.
12. Click OK to return to the Assemble in Groups window.
13. In order to assemble your files according to direction, choose Direction in the Name Part section. Then click Assemble.
14. You should now have two folders, one called F with the forward sequences and one called R with the reverse sequences.
15. Next you need to cut the primers from your sequences. Highlight the R folder and reverse-complement the sequences using the button with three black arrows on it.
16. Double click the R folder to open it. For the reverse sequences, you need to find the forward primer motif (e.g., LepF1) and delete it from the beginning of the consensus sequence. You will find the primer around 50 nucleotides from the end of the raw sequence. For example, in Fig. 2a, you would need to delete the sequence marked in bold and everything to the left of it.
Fig. 2. An example of sequence editing.
17. When you have located the primer, highlight it on the consensus sequence at the bottom of the window and press the Delete key.
18. Next go to the opposite end of the consensus, the far right, and delete the consensus sequence from the point where many Ns appear all the way to its very right-hand edge. For example, in Fig. 2b, you would delete the sequence marked in bold and everything to the right of it. Close the window.
19. Double click the F folder to open it. Go to the far right of the consensus sequence and find the reverse complement of the reverse primer (e.g., LepR1) at the very end. That is, at the right end of the forward sequences you will find the reverse primer complemented and read backwards (e.g., if the reverse primer is ATGC then you will find GCAT at the end of your forward sequence). This should be around position 690–700 bp on the consensus sequence. For example, in Fig. 2c, you would need to delete the sequence marked in bold and everything to the right of it.
20. When you have located the primer, highlight it on the consensus sequence at the bottom of the window and press the Delete key.
21. Next go to the opposite end of the consensus, the far left, and delete the consensus sequence from the point where many Ns appear all the way to its very left-hand edge. For example, in Fig. 2d, you would delete the sequence marked in bold and everything to the left of it.
22. Dissolve both folders by clicking on the button marked with a red X.
23. Highlight all sequences and press the button marked with a black N. This time, in order to assemble your files according to Process ID, choose Clone in the Name Part menu. Then click Assemble.
24. Specimens which only sequenced successfully in one direction will have files which remain in the Unassembled Samples folder (see Note 49).
25. Open each folder (contig) by double-clicking and make sure that forward and reverse sequences have the correct orientation, i.e., the forward sequence is in black with the arrow pointing to the right and the reverse in red with the arrow pointing to the left. If they are backwards, reverse-complement the two files in the folder by closing the window, highlighting the folder, and clicking the button with three black arrows.
26. Correct ambiguous positions (“N”s) and gaps (“-”s) in consensus sequences by checking the original trace chromatograms, which are present in the CodonCode project. This is done by double-clicking on the consensus sequence. Always open both trace files (forward and reverse) and compare them.
27. Generally, if reads conflict (i.e., different colored peaks appear in the same location on the forward and reverse chromatograms) you can decide which is more reliable based on sequence quality (e.g., less background noise, cleaner peaks, taller peaks).
28. Correct bases in contigs first, and then check the single sequences in the Unassembled Samples folder. This is a good idea because not all contigs will be kept; some will be dissolved or deleted.
29. Make sure single sequences are also in the correct orientation before uploading to BOLD.
30. To export the consensus sequences select all the folders using shift click, go to File > Export > Consensus sequences…, and choose Current selection. Open the Options and check Include gaps in FASTA but uncheck all other options. Press Export. Save the file to the desktop as sequences.fas (see Note 49).
3.20. Sequence Aligning
1. Open the file sequences.fas in BioEdit (see Note 50).
2. Make sure Mode: is set to Edit using the drop-down menu.
3. Another drop-down menu will become visible to the right of the Edit drop-down. Make sure this is set to Insert.
4. Sequences that have ended up in the FASTA file in the wrong orientation may be corrected by clicking on the sequence name to highlight it, opening the Sequences menu at the top of the screen, moving the cursor down the drop-down to Nucleic Acid, and clicking Reverse complement.
5. Sequences all need to be 658 bp and aligned to each other before being uploaded to BOLD. This can be done by typing additional Ns at the beginning and end of your sequences in the BioEdit Edit mode. Be sure to check across the whole alignment of your sequences that you have added the correct number of Ns.
6. In Fig. 3, which features a 50-bp barcode for simplification, LOP001-11 is of full length. LOP002-11 needs 6 Ns added to the left side of the sequence to become aligned, while LOP003-11 needs 4 Ns added to the right side of the sequence to be 50 bp long. LOP005-11_F needs to be reverse complemented to be in the same orientation as the other sequences.
7. Sequences which were not part of a consensus (i.e., when one direction failed but the single sequence is of sufficient length and quality for submission to BOLD) may appear in the FASTA still tagged with the direction. This tag needs to be deleted, e.g., the sequence named LOP005-11_F should be renamed as LOP005-11 (Fig. 3).
8. If you are having trouble with the alignment, a good quality (i.e., 658[0n]) sequence can be downloaded from BOLD and imported into your BioEdit file as a guide, e.g., MHAHC824-05 (Fig. 3). Be sure to delete this sequence before saving the file.
9. Save the file (File > Save).
Fig. 3. An example of an unaligned and aligned FASTA file.
3.21. Sequence Upload and Publication
1. Open the file sequences.fas in a text editor. This can usually be done by double-clicking on the file icon on the desktop.
2. Under the Edit menu, click Select All then under Edit, click Copy.
3. Navigate to your BOLD project’s Project Console and under the Uploads menu on the left click Sequences.
4. Right click on the box Paste sequences in fasta format: and click Paste.
5. Select the Markers: as COI-5P and select or register your Run site.
6. Click Submit.
7. Once you are happy with all your data, make your project public on BOLD by adding Public as a user (see Subheading 3.1).
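The orientation, naming, and padding operations described in Subheadings 3.19 and 3.20 can be illustrated in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the number of Ns to add on each side still has to be read off the alignment by eye (as in BioEdit), and the 50-bp target used in the example mirrors the simplified case of Fig. 3 rather than the real 658-bp barcode.

```python
# IUPAC complements, including the ambiguity codes and the gap character.
_COMPLEMENT = str.maketrans("ACGTURYMKWSBDHVN-", "TGCAAYRKMWSVHDBN-")

BARCODE_LENGTH = 658  # full-length COI-5P barcode


def reverse_complement(seq):
    """Reverse-complement a DNA string, e.g. ATGC becomes GCAT."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pad_with_n(seq, left=0, right=0, target=BARCODE_LENGTH):
    """Add Ns to either end and check the padded length."""
    padded = "N" * left + seq + "N" * right
    if len(padded) != target:
        raise ValueError(f"length {len(padded)}, expected {target}")
    return padded


def strip_direction(name):
    """Drop a trailing _F or _R tag from a sequence name."""
    return name[:-2] if name.endswith(("_F", "_R")) else name


# Toy examples in the spirit of Fig. 3 (50-bp barcode for simplification).
print(reverse_complement("ATGC"))
print(pad_with_n("A" * 44, left=6, target=50))
print(strip_direction("LOP005-11_F"))
```

Raising an error when the padded length is wrong is a deliberate choice here: it catches a miscounted number of Ns before the file is submitted, which is the same check step 5 of Subheading 3.20 asks you to make by eye.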
4. Notes
1. The user who creates the project becomes the Project Manager. The Project Manager is the only user who can add new users. The Project Manager can be changed by contacting the BOLD team at [email protected].
2. We recommend avoiding the letters A, C, G, T, U, R, Y, M, K, W, S, B, D, H, V, and N. These are IUPAC nucleotide codes, including the ambiguity codes which will appear in your sequences when a sequence-editing program is unable to make a base call. If you use these letters you may have difficulties later on during sequence editing and manipulation.
3. BOLD can store data for other regions besides COI-5P. The extracted DNA obtained by following these methods can be stored in a freezer and subsequently used as a template for other gene regions. For information about amplification of other regions frequently sequenced for insects see refs. 16 and 21.
4. To return to the Console of your new project simply log into BOLD and click on the project name in your list of projects.
5. If you will be creating multiple projects it may be worth launching a Campaign. To start a Campaign contact [email protected]. Your Campaign can then be selected on the Create New Project page.
6. We have had incidents where specimens that should have sequenced successfully failed unexpectedly. After consultation with the collector we were able to attribute the failure to storage conditions.
7. DNA leaks into the storage fluid (23). For large specimens which may be damaged by ethanol, the whole specimen can be stored dry and legs can be removed into ethanol.
8. If you enter Sample ID on any other sheet it will overwrite important macros. When copying and pasting information into the spreadsheet, use Paste Special and select Values to avoid overwriting formulas. Avoid using the Project Code in Sample IDs if possible, because the Project Code will form the basis of Process IDs. We advise Sample ID = Field ID where possible. Also keep in mind that each project in BOLD can have a maximum of 999 samples. Alternatively, Sample ID can refer directly to the catalog number of a museum collection voucher.
9. Museum abbreviations should follow standard registers for biorepositories, e.g., http://www.biorepositories.org/.
10. In the Sex column use M for male, F for female, or H for hermaphrodite. Reproduction refers to the type of lifecycle (use S for sexual, A for asexual, or P for parthenogenic). For Life Stage use either I for immature or A for adult. Extra info will show up on taxonomic identification trees generated by BOLD. Notes will not be seen on the tree, but they will appear on BOLD on the Specimen Page.
11. Latitude (North–South) and Longitude (East–West) must be in decimal format. A useful website for this conversion is http://www.calculatorcat.com/latitude_longitude.phtml. Elevation must be in meters, but it is not necessary to put “m” for meters beside the number.
12. To navigate to the folder, type cd desktop, press enter, type cd Images, press enter.
13. In Excel, go to File > Open… (change Enable: to All files), navigate to Desktop > Images > list.txt, and click Open. A window will open called Text Import Wizard. Select Fixed width and click Next >. Scroll down until you can see your first file named .jpg and move the arrowed line beside it over to the right a little so that it touches the file name, then click Next > and Finish.
14. For information on the optional columns (Measurement, Copyright, etc.), please refer to the BOLD handbook (http://www.boldsystems.org > Documentation > Image Submission Protocol).
15. If your machine does not have built-in zipping functionality try downloading a free program (WinZip: http://www.winzip.com; WinRar: http://www.rarsoft.com; or MacZipIt: http://www.maczipit.com).
16. BOLD will give a message advising if the submission was successful. If problems have occurred, refer to the Image Submission - Tips and Troubleshooting section of the BOLD handbook (http://www.boldsystems.org > Documentation > Image Submission Protocol).
17. Eliminase protocol:
(a) Get four clean jars. Label the first one “Eliminase” and add about 10 cm3 of Eliminase.
(b) Label the next three jars “Wash 1,” “Wash 2,” and “Wash 3” and fill them all with 10 ml or more of ddH2O.
(c) Dip your forceps into the Eliminase, shake them a bit, remove them, and wipe off excess liquid using a clean KimWipe.
(d) Dip your forceps and shake them slightly in Wash 1, then Wash 2, and finally in Wash 3. Dry them off using a clean KimWipe.
18. In these cases the tissue must be retrieved from the lysis buffer prior to undertaking the DNA extraction.
19. For single tube lysis add 50 μl of lysis buffer and 5 μl of ProK.
20. This is high speed, so place the lysis plate on a clean square-well block to prevent cracking.
21. You will need to put about 18 ml of Wash Buffer in the reservoir for a full plate.
22. You will need to put about 75 ml of Wash Buffer in the reservoir for a full plate.
23. Square-well blocks can be washed with ELIMINase® (or with any other DNA removing detergent), autoclaved, and reused.
24. The Qiagen kits can be stored at room temperature (15–25°C). Some ingredients may need storage at lower temperatures.
25. The Qiagen protocol suggests that you use a mortar and pestle to crush part of the insect into a powder, but we have found that this is unnecessary.
26. The following buffers contain toxic components that must not be thrown in the regular garbage: AL, ATL, AW1. Be sure that all liquid waste from DNA extraction is kept in a well-labelled glass jar.
27. This tube is not from the kit. It is one with a cap (see Subheading 2.5).
28. We find this has a very high success rate (99%) with recently collected specimens (<5 years old) and most likely these will be the only primers you will need. If you do not have success with these primers (see Subheading 3.14), the next strategy we recommend is two amplification reactions: (1) using primer combination MLepF1 and LepR1 and (2) using primer combination MLepR1 and LepF1 (see ref. 25 and Table 4). These products are then sequenced in a single direction using LepR1 for the first product and LepF1 for the second product. The sequences can then be combined into a single contig close in length to the target 658-bp full-length barcode. If you have a very old and very important specimen but the above strategies have been unsuccessful, a further (time-consuming and expensive) option is amplification using “micro” primers; consult (8) and (9).
29. Put a star in the corner A1.
30. If you plan to fill several microplates include extra volume to allow for pipetting mistakes and dead volume in the digital multichannel pipettor (e.g., for making ten microplates with 12.5 μl reactions each, include about 40 extra reactions).
31. Once you become proficient you can use the final row of the microplate instead of an 8-tube PCR strip.
32. You can use less DNA template if you find it is too concentrated.
33. Here it is essential to ensure your plates are in the same orientation.
34. TAE buffer is stored as a 50× stock. You will need to add ddH2O to the concentrated TAE buffer to get a 1× dilution, e.g., add 5 ml of 50× TAE to 245 ml of ddH2O to make 250 ml of 1× TAE.
35. Buffer in the rig is good for ~15 runs.
36. We use 6× loading dye, so this means 1 μl of dye for 6 μl of product.
37. GelRed should be diluted 10,000:1 with ddH2O. This means that you need to add 25 μl of GelRed to 250 ml water. GelRed can be used several times. Make sure to keep track of the number of uses.
38. An exception is when using MLepF1 and MLepR1 (see Note 28).
39. Addition of trehalose makes possible the freezing of aliquoted sequencing mixes. Currently, the Canadian Centre for DNA Barcoding uses a batch strategy for making sequencing plates. Mixes are aliquoted directly into 96-well plates, using a Biomek® FX robot; plates are covered with PCR film and stored at −20°C for up to 3 months.
40. The mixes are light sensitive so should be stored in a lightproof box or wrapped in aluminum foil.
41. If you are a skillful pipettor and set up your forward and reverse sequencing plates at the same time you can use the same tips to put PCR product into each plate. You will only need to use one box of tips if you only touch the very edge of the wells near the top, and do not contaminate the tips with primer in the process.
42. The volume of PCR product added to the sequencing reaction can be varied between 0.5 and 1.2 μl depending on the strength of the gel band.
43. The annealing temperature can be varied according to the primer specificity, but 55°C works well for most COI sequencing reactions.
44. To navigate to the folder, type cd desktop, press enter, type cd Traces, press enter.
45. In Excel, go to File > Open… (change Enable: to All files), navigate to Desktop > Traces > list.txt, and click Open. A window called Text Import Wizard will open. Select Fixed width and click Next >. Scroll down until you can see your first file named .ab1 and move the arrowed line beside it over to the right a little so that it touches the file name, then click Next > and Finish.
46. Score files are not compulsory and the column can be left blank; however, for more information on these and how you can include them see the BOLD handbook.
47. Usually the file names should be of the form Process ID (LOP001-11) and direction (F), e.g., LOP001-11_F.ab1, which should make this easier.
48. BOLD will give a message advising if the submission was successful. If problems have occurred, refer to the Trace Submission - Tips and Troubleshooting section of the BOLD handbook (http://www.boldsystems.org > Documentation > Trace Submission Protocol).
49. To export the single direction sequences, go to File > Export > Samples…, and choose Current selection. Open the Options and select Include gaps in FASTA but deselect all other options. Press Export. Save the file to the desktop as sequences.fas.
50. BioEdit can be downloaded for free from http://www.mbio.ncsu.edu/bioedit/bioedit.html.
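The arithmetic behind the coordinate conversion (Note 11) and the stock dilutions (Notes 34 and 37) is simple enough to check in code. The sketch below is illustrative only: the function names are hypothetical, the coordinates are made-up examples, and the dilution helper treats the stated volume as the final volume of the diluted solution.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and West are returned as negative values.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def stock_volume(final_volume, dilution_factor):
    """Volume of concentrated stock needed for a given final volume."""
    return final_volume / dilution_factor


# Made-up coordinates: 3 deg 7 min 30 s North, 101 deg 30 min 0 s West.
latitude = dms_to_decimal(3, 7, 30, "N")
longitude = dms_to_decimal(101, 30, 0, "W")

# 250 ml of 1x TAE from a 50x stock.
tae_ml = stock_volume(250, 50)       # ml of 50x TAE
water_ml = 250 - tae_ml              # ml of ddH2O to top up with

# 10,000-fold dilution of GelRed in 250 ml (250,000 ul).
gelred_ul = stock_volume(250_000, 10_000)

print(latitude, longitude, tae_ml, water_ml, gelred_ul)
```

For the GelRed case the stock volume is so small relative to the final volume that adding it to 250 ml of water, rather than topping up to 250 ml, makes no practical difference.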
Acknowledgment
Heather Braid compiled the “Barcoding in the Hanner Lab” protocols (http://barcoding.wikia.com/wiki/Barcoding_in_the_Hanner_Lab_Wiki), which greatly aided with the structuring and content of this chapter, and also provided Fig. 1.
References
1. Floyd R, Wilson JJ, Hebert PDN (2009) DNA barcodes and insect biodiversity. In: Footit RG, Adler PH (eds) Insect biodiversity: science and society. Blackwell Publishing, Oxford, pp 417–431
2. Ratnasingham S, Hebert PDN (2007) BOLD: The barcode of life data system (www.barcodinglife.org). Mol Ecol Notes 7:355–364
3. Janzen DH, Hajibabaei M, Burns JM et al (2005) Wedding biodiversity inventory of a large and complex Lepidoptera fauna with DNA barcoding. Phil Trans R Soc Lond B 360:1835–1845
4. Janzen DH, Hallwachs W, Blandin P, Burns JM, Cadiou J-M, Chacon I et al (2009) Integration of DNA barcoding into an ongoing inventory of complex tropical biodiversity. Mol Ecol Res 9:1–25
5. Zhou X, Robinson JL, Geraci CJ, Parker CR et al (2011) Accelerated construction of a regional DNA-barcode reference library: caddisflies (Trichoptera) in the Great Smoky Mountains National Park. J Nor Amer Benth Soc 30:131–162
6. iBOL (2010) Barcoding blitz targets Australian Lepidoptera. Barcode Bulletin 1(4):5
7. Vaglia T, Haxaire J, Kitching IJ et al (2008) Morphology and DNA barcoding reveal three cryptic species within the Xylophanes neoptolemus and loelia species-group (Lepidoptera: Sphingidae). Zootaxa 1923:18–36
8. Hausmann A, Hebert PDN, Mitchell A et al (2009) Revision of the Australian Oenochroma vinaria Guenée, 1858 species-complex (Lepidoptera, Geometridae, Oenochrominae): DNA barcoding reveals cryptic diversity and assesses status of type specimen without dissection. Zootaxa 2239:1–21
9. Wilson JJ, Landry JF, Janzen DH et al (2010) Identity of the ailanthus webworm moth, a complex of two species: evidence from DNA barcoding, morphology and ecology. Zookeys 46:41–60
10. Dincă V, Zakharov EV, Hebert PDN, Vila R (2011) Complete DNA barcode reference library for a country’s butterfly fauna reveals high performance for temperate Europe. Proc R Soc Lond B 278:347–355
11. Virgilio M, Backeljau T, Nevado B, de Meyer M (2010) Comparative performances of DNA barcoding across insect orders. BMC Bioinformatics 11:4567–4573
12. Vogler AP (2006) Will DNA barcoding advance efforts to conserve biodiversity more efficiently than traditional taxonomic methods? Front Ecol Environ 4:270–272
13. Pons J, Barraclough TG, Gomez-Zurita J et al (2006) Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Sys Biol 55:595–609
14. Winter WD Jr (2000) Basic techniques for observing and studying moths & butterflies (Memoir No 5). The Lepidopterist’s Society, Cambridge
15. Hanner R (2005) Proposed standards for BARCODE records in INSDC (BRIs). http://barcoding.si.edu/PDF/DWG_data_standards-Final.pdf
16. Regier JC (2008) Protocols, concepts, and reagents for preparing DNA sequencing templates. Version 12/4/08. http://www.umbi.umd.edu/users/jcrlab/PCR_primers.pdf
17. Porco D, Rougerie R, Deharveng L, Hebert PDN (2010) Coupling non-destructive DNA extraction and voucher retrieval for small soft-bodied Arthropods in a high-throughput context: the example of Collembola. Mol Ecol Res 10:942–945
18. Knölke S, Erlacher S, Hausmann A et al (2005) A procedure for combined genitalia extraction and DNA extraction in Lepidoptera. Insect Syst Evol 35:401–409
19. Ivanova NV, deWaard J, Hebert PDN (2006) An inexpensive, automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes 6:998–1002
20. Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotech 3:294–299
21. Wilson JJ (2010) Assessing the value of DNA barcodes and other priority gene regions for molecular phylogenetics of Lepidoptera. PLoS ONE 5:e10525
22. Ivanova N, Grainger C (2006) Protocols: Sequencing. Canadian Centre for DNA Barcoding CCDB Protocols. http://www.dnabarcoding.ca
23. Shokralla S, Singer GAC, Hajibabaei M (2010) Direct PCR amplification and sequencing of specimens’ DNA from preservative ethanol. BioTechniques 48:232–234
24. Hebert PDN, Penton EH, Burns J, Janzen DH, Hallwachs W (2004) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly, Astraptes fulgerator. Proc Nat Acad Sci USA 101:14812–14817
25. Hajibabaei M, Janzen DH, Burns JM, Hallwachs W, Hebert PDN (2006) DNA barcodes distinguish species of tropical Lepidoptera. Proc Nat Acad Sci USA 103:968–971
26. Messing J (1983) New M13 vectors for cloning. Meth Enzymol 101:20–78
Chapter 4
DNA Barcoding Methods for Invertebrates
Nathaniel Evans and Gustav Paulay
Abstract
Invertebrates comprise approximately 34 phyla, while vertebrates represent one subphylum and insects a (very large) class. Thus, the clades excepting vertebrates and insects encompass almost all of animal diversity. Consequently, the barcoding challenge in invertebrates is that of barcoding animals in general. While standard extraction, cleaning, PCR methods, and universal primers work for many taxa, taxon-specific challenges arise because of the sheer genetic and biochemical diversity present across the kingdom, and because problems arising as a result of this diversity, and solutions to them, are still poorly characterized for many metazoan clades. The objective of this chapter is to emphasize general approaches, and give practical advice for overcoming the diverse challenges that may be encountered across animal taxa, but we stop short of providing an exhaustive inventory. Rather, we encourage researchers, especially those working on poorly studied taxa, to carefully consider methodological issues presented below, when standard approaches perform poorly.
Key words: DNA barcoding, Invertebrates, CO1, Cytochrome c oxidase subunit I
1. Introduction
DNA barcoding as a tool for species-level identification was developed in zoology and remains most facile for animals (1), reflecting the unusually rapid rate of mitochondrial DNA (mtDNA) sequence evolution that characterizes most Metazoa (2). As a result, relatively short mtDNA sequences, as generated from single, routine PCR reactions, are sufficient for species delineation and identification in many taxa. To date, efforts have focused primarily on the initial ~650 base pair “Folmer” region of cytochrome c oxidase subunit I (COI), which typically accumulates several percent sequence difference between related animal species (1). In contrast, rates of mtDNA sequence evolution in other eukaryotes (protists, plants, and fungi) are much slower (2). Barcoding approaches in these
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_4, © Springer Science+Business Media, LLC 2012
groups require alternate, longer, or multiple gene regions, or are limited to supraspecific levels of differentiation. Thus, COI barcoding remains largely a zoological proposition. “Other invertebrates” encompasses almost all animal diversity. That is, invertebrates comprise approximately 34 phyla, while vertebrates represent one subphylum and insects a (very large) class. Consequently, the barcoding challenge in invertebrates is that of barcoding animals in general. While standard extraction, cleaning, PCR methods, and universal primers work for many taxa, taxon-specific challenges arise because of the sheer genetic and biochemical diversity present across the kingdom, and because problems arising as a result of this diversity, and solutions to them, are still poorly characterized for many metazoan clades. The objective of this chapter is to emphasize general approaches, and give practical advice for overcoming the diverse challenges that may be encountered across animal taxa, but we stop short of providing an exhaustive inventory. Rather, we encourage researchers, especially those working on poorly studied taxa, to carefully consider methodological issues presented below, when standard approaches perform poorly. Challenges for barcoding across the Metazoa can be grouped into two broad classes: intrinsic, genetic issues of species delimitation, evolutionary rate, and behavior of potential markers, and extrinsic, methodological issues from sample processing to sequence generation.
1.1. Intrinsic Challenges
Species delineation, by any technique, requires that differences in the character(s) used for delineation be greater between members of different species than among members of the same species. For markers to be useful at the species level, they need to accumulate measurable divergence over the time frame of speciation. Markers that show limited interspecific divergence, or comparable levels of intraspecific variation, do not perform well for species delineation. Thus, the general utility of any given molecular marker for species-level “barcoding” requires a fairly rapid, but broadly conserved rate of sequence evolution. Yet, the mode and tempo of speciation and molecular evolution are certainly not constrained and thus any “universal” marker may fail for those clades whose evolutionary dynamics deviate from that expected. This has clearly been borne out for DNA barcoding in Metazoa. The relatively high, but comparatively conserved rate of sequence evolution of mtDNA has made COI the marker of choice in animals. Yet it has been repeatedly demonstrated that mtDNA sequence divergence is generally too slow for species delineation in most lower, non-bilaterian, animals, much as it is in plants or protists (see below, and papers in this volume). Conversely, unusually rapid sequence evolution has also been detected within a number of metazoans including pulmonate land snails (3), minute animals such as many meiofauna (4) and some parasitic taxa (e.g., (5)).
1.1.1. Species Delineation
Species delineation involves characterization of diagnostic differences among species. Characters used for species delineation can be broadly grouped into three classes: morphological, genetic, and isolating. While morphological and genetic characters can be assessed directly, isolation is usually inferred from other sources of evidence and is rarely directly tested through mating/fertilization experiments. Rate variation among characters has important consequences for species delineation; these consequences are especially important to consider when characters are of different classes (Table 1). When the rate of morphological differentiation is slow relative to genetic differentiation and isolation, cryptic species result; a potentially common situation in some taxa (6). Genetic taxonomy has the most to contribute to species delineation in these cases. When the rate of genetic differentiation is slow relative to morphological divergence and isolation, as for example in some African rift lake cichlid species flocks (7), simple barcoding is less powerful than morphological species delineation. When isolation evolves more rapidly than divergence in DNA sequences or morphology, for example with polyploid speciation (8) or when selection acts directly on isolating mechanisms (e.g., on gamete recognition proteins; (9)), species recognition becomes especially challenging. Finally, when isolation evolves slowly relative to the accumulation of morphological or genetic variation within species, intraspecific polymorphism will result. Variation among rates of morphological divergence, genetic divergence, and emergence of isolating mechanisms implies that a priori criteria for species recognition, such as predefined levels of morphological character state changes or genetic thresholds, are vulnerable to error, especially when applied across broad taxonomic groups (10).
Table 1 Consequences of variation in character evolution
Character type: Morphological | Genetic | Isolating | Consequence
Fast | Slow | Slow | Polymorphism
Fast | Fast | Slow | Polymorphism
Fast | Slow | Fast | Morphospecies, poorly substantiated by barcoding
Slow | Fast | Fast | Cryptic species, barcoding powerful
Slow | Fast | Slow | Genetic polymorphism
Slow | Slow | Fast | Species recognition challenging
N. Evans and G. Paulay
Because speciation in animals is predominantly allopatric (11), initial differentiation usually takes place in isolation. Sympatry is usually secondary and can follow only when isolation is sufficient to prevent fusion of lineages. As a result, co-occurring taxa are typically well differentiated, making species recognition usually straightforward in sympatry. Reciprocal monophyly in two or more independent genetic or morphological characters implies reproductive isolation in a sympatric setting; thus such populations meet the criteria of the Biological Species Concept (12). Note that because all mitochondrial genes form a single locus, mitochondrial markers in themselves are insufficient to test for reproductive isolation, although a pattern of deep divergence between clades is suggestive of it. In an allopatric setting, reproductive isolation cannot be readily assessed; thus less stringent and subjective species concepts are usually applied. Reciprocal monophyly in two or more independent characters (genetic, morphological, or geographic) can be used to define Evolutionarily Significant Units (ESUs) or phylogenetic species (cf. (13)). While threshold-based definitions of species are problematic, it is useful to compare the average levels of divergence between sympatric species and allopatric ESUs in any taxon. ESUs that are separated by sequence divergence at least as deep as that between sympatric species in the same taxon make good species hypotheses.
1.1.2. Marker Choice
The ideal marker is easily amplified, exists in a single form (single copy, or multiple but identical copies) per cell or organism, and exhibits sufficient sequence variation to distinguish species. The “Folmer” region at the 5′ end of COI (14) was proposed as an ideal DNA barcode for these reasons (1), serves well for the majority of animals, and is the most widely used gene region for barcoding. The rapidly growing number of COI barcode sequences across the Metazoa, partly a consequence of large-scale efforts (CBOL, IBOL, BOLD, etc.), is producing a large library of barcodes from identified, vouchered specimens for comparison. For most taxa, the COI barcoding region remains an ideal choice, and is the focus of this chapter. Nevertheless, additional or alternative molecular markers may be considered when COI sequences are insufficient to distinguish recognized species, when they conflict with interpretations of morphological or isolating characters, or when amplification is challenging (see below). Because of slow rates of sequence evolution, the Folmer region of COI tends to be insufficient for species delineation across much of the Porifera, Cnidaria, Ctenophora, and Placozoa (15–21). Other single gene regions have been explored with varied success, but effective DNA taxonomy in these basal phyla is moving toward multiple gene region approaches (e.g., (17, 22)). Nevertheless, certain clades among non-bilaterians do exhibit rapid rates of mtDNA evolution and can be resolved at the species level using single mitochondrial gene regions (e.g., (23)).
1.2. Methodological Challenges
Methodological challenges in barcoding are those of isolating, amplifying, and sequencing DNA in general. These are briefly outlined here and are dealt with below under their respective protocols. First, tissues need to be preserved so that DNA does not degrade. There is substantial variation among animals in how rapidly tissue and DNA break down and in how easily liquid fixatives penetrate; this influences specimen handling and preservation. Second, depending on the DNA extraction protocol used, various other metabolites may coextract with the DNA, and some of these may cause PCR inhibition. Inhibitors tend to be clade specific: they are inconsequential for many large taxonomic groups, but important in others. Inhibition can be addressed by changing extraction protocols to minimize coextraction of the inhibitor, cleaning the DNA extract to remove the inhibitor, or diluting the extract to a level where the effect of the inhibitor is lost but sufficient DNA remains for amplification. Third, primers may fail to amplify the marker or may amplify unintended additional markers. The former results when the sequence at the annealing site has evolved too far from that of the primer used, and can be addressed by making PCR conditions less specific (using lower temperature for annealing, higher concentration of MgCl2, or degenerate primers), by designing better (i.e., taxon-specific) primers, or by changing to alternate markers. The latter can be addressed by more stringent PCR conditions in some cases, but becomes challenging when it is the result of gene duplication, whether as nuclear copies of mitochondrial genes (NUMTs), separate male and female mitochondrial lines, or heteroplasmy. Such multiple copies pose challenges as well as opportunities in some taxa and are addressed in more detail below.
2. Materials
Researchers should have access to standard field and molecular laboratory supplies and equipment. To prevent contamination in either setting, researchers should have and use materials that enable sterile techniques. This includes latex or nitrile gloves, filter pipette tips, Kimwipes, Bunsen burners to flame reusable dissecting instruments, and diluted bleach.
2.1. Tissue Subsample Preservation
1. 95–100% ethyl alcohol (EtOH) or DMSO–EDTA–salt buffer: 20% DMSO, 0.25 M sodium-EDTA, and NaCl to saturation, pH 7.5.
2. 2.0 ml Screw Cap Microtubes or 96-well plates with caps.
2.2. Preparation of DNA
1. DNAzol® genomic DNA isolating reagent (Molecular Research Center, Inc).
2. Proteinase K solution, 20 mg/ml: Combine and mix 100 mg Proteinase K with 5 ml sterile dH2O (or 2.5 ml dH2O and 2.5 ml glycerin). Store aliquots at −20°C.
3. Sterile polypropylene pellet pestles.
4. 100% and 75% EtOH (preferably ice cold).
5. 1.7 ml polypropylene microcentrifuge tubes.
6. TE buffer: 10 mM Tris–HCl pH 8.0, 1 mM EDTA (ethylenediaminetetraacetic acid).
2.3. DNA Quantification (Optional; See Subheading 3.4)
1. Spectrophotometer (e.g., NanoDrop™ variety by Thermo Fisher Scientific Inc.).
2. Mass DNA Ladder.
2.4. PCR Amplifications
1. PCR tubes or 96-well PCR plates.
2. Primers (10 μM): see Tables 6 and 7 for a list of universal and taxon-specific primers for the “Folmer” region of CO1, and a few alternative markers.
3. Deoxynucleotide (dNTP) solution mix at a concentration of 10 mM for each of the four nucleotides.
4. A variety of Taq polymerase enzymes and PCR reagents are available separately or in kits and most will be equally suitable (reagents are listed in Table 3). Be aware that concentrations and properties may vary between different varieties of PCR reagents. The efficacy of different Taq polymerases can also vary. Two that consistently work well are Taq DNA Polymerase (New England Biolabs) and Platinum Taq DNA Polymerase (Invitrogen). The latter is a heat-activated, “hot-start” polymerase and, though more expensive, is more tolerant of reaction assembly at room temperature.
5. 15% Trehalose (~0.4 M): Dissolve 7.5 g trehalose dihydrate in 50 ml sterile dH2O. Heating may be needed. Store aliquots at −20°C. Final concentration in a PCR cocktail should be approximately 0.2 M (24).
6. Bovine serum albumin (BSA) 2.5 μg/μl solution: Combine and mix 1 ml Ultrapure BSA (at 50 mg/ml; Invitrogen) with 20 ml sterile dH2O. Store aliquots at −20°C. Use at 0.2–0.4 μg/μl final concentration in a PCR cocktail (25).
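The stock concentrations listed above follow from simple mass/volume and C1V1 = C2V2 arithmetic, which can be checked as in the sketch below. This is an illustrative calculation only. The function names are hypothetical, the molecular weight of trehalose dihydrate (378.33 g/mol) is supplied here rather than taken from the chapter, the trehalose is treated as if made up to a final volume of 50 ml, and the 25 μl reaction volume is that of Table 3.

```python
def mass_per_volume(mass_mg, volume_ml):
    """Concentration in mg/ml (numerically equal to ug/ul)."""
    return mass_mg / volume_ml


def stock_volume_needed(c_stock, c_final, v_final):
    """Solve C1*V1 = C2*V2 for V1; c_stock and c_final must share units."""
    return c_final * v_final / c_stock


# Proteinase K: 100 mg in 5 ml
prot_k_mg_per_ml = mass_per_volume(100, 5)

# Trehalose: 7.5 g dihydrate, treated as made up to 50 ml (assumption)
TREHALOSE_DIHYDRATE_MW = 378.33  # g/mol, value supplied for this sketch
trehalose_percent_wv = 7.5 / 50 * 100
trehalose_molar = 7.5 / TREHALOSE_DIHYDRATE_MW / 0.050

# BSA: 1 ml of 50 mg/ml stock plus 20 ml water gives 21 ml total
bsa_ug_per_ul = mass_per_volume(1 * 50, 1 + 20)

# Volume of trehalose stock giving ~0.2 M in a 25 ul reaction
REACTION_UL = 25
trehalose_ul = stock_volume_needed(trehalose_molar, 0.2, REACTION_UL)

print(f"Proteinase K: {prot_k_mg_per_ml:.1f} mg/ml")
print(f"Trehalose: {trehalose_percent_wv:.1f}% w/v, {trehalose_molar:.2f} M")
print(f"BSA: {bsa_ug_per_ul:.2f} ug/ul")
print(f"Trehalose stock per 25 ul reaction: {trehalose_ul:.1f} ul")
```

Note that diluting 1 ml of 50 mg/ml BSA with 20 ml of water gives a value slightly below the nominal 2.5 μg/μl, which is immaterial given the 0.2–0.4 μg/μl working range.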
3. Methods
3.1. Field Methods
1. Here, we consider mostly specimen processing and data tracking; more detailed treatments of biodiversity survey methods are provided by Templado et al. and Eymann et al. (26, 27).
The Consortium for the Barcode of Life (CBOL) has defined Barcode Data Standards that include required as well as recommended data fields for barcode records. Required categories include a unique identifier (usually a collection catalog number) for the voucher specimen in a biorepository, an identification, and a country code. Recommended categories include latitude, longitude, collector, and collection date. As deposition of a voucher in a biorepository is required, additional data should also be collected for each specimen, to meet basic data standards of collection databases. These include depth/elevation, a hierarchy of location fields (e.g., state, county, specific locality), habitat/microhabitat, host association if any, notes, fixative, and preservative. Numerous other data fields are also used by various collections or for specific taxa. Finally, it is important to note the existence and unique identifiers of photo, tissue, or extraction samples taken from the specimen.
2. Recording these data and specimen tracking are best accomplished in a series of data tables in a Field Information Management System (FIMS). FIMS can be set up in spreadsheet or relational database formats; specifically designed FIMS databases have been created for a number of field biodiversity/barcoding projects (e.g., CReefs, Moorea Biocode). Spatiotemporal, habitat, and collector data are typically kept in the station table, and referred to in the specimen table through a unique station number. Specific notes about the specimen, such as microhabitat (unless this is parsed into the station table), host, fixation procedures, and reference to photos and tissue/extraction subsamples, are kept in a specimen table. Tables on photos and subsamples complete a basic FIMS (Table 2).
Table 2 Main fields/field types needed in a FIMS
Station table | Specimen table | Photo table | Subsample table
Station # | Field # | Photo # | Subsample #
Locality | Identification | Field # | Field #
Habitat | Station # | Photographer | Tissue type
Elevation/depth | Microhabitat | Date | Plate/well #
Coordinates | Fixative | Notes |
Date | Photo taken? | |
Collectors | Tissue taken? | |
Notes | Notes | |
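A relational FIMS along the lines of Table 2 can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema of any existing FIMS (such as those used by CReefs or Moorea Biocode): the table and column names are hypothetical renderings of the fields in Table 2, and the example records are invented.

```python
import sqlite3

# Column names are illustrative renderings of the Table 2 fields.
SCHEMA = """
CREATE TABLE station (
    station_no      TEXT PRIMARY KEY,
    locality        TEXT,
    habitat         TEXT,
    elevation_depth TEXT,
    coordinates     TEXT,
    collection_date TEXT,
    collectors      TEXT,
    notes           TEXT
);
CREATE TABLE specimen (
    field_no       TEXT PRIMARY KEY,
    identification TEXT,
    station_no     TEXT NOT NULL REFERENCES station(station_no),
    microhabitat   TEXT,
    fixative       TEXT,
    photo_taken    INTEGER,
    tissue_taken   INTEGER,
    notes          TEXT
);
CREATE TABLE photo (
    photo_no     TEXT PRIMARY KEY,
    field_no     TEXT NOT NULL REFERENCES specimen(field_no),
    photographer TEXT,
    photo_date   TEXT,
    notes        TEXT
);
CREATE TABLE subsample (
    subsample_no TEXT PRIMARY KEY,
    field_no     TEXT NOT NULL REFERENCES specimen(field_no),
    tissue_type  TEXT,
    plate_well   TEXT
);
"""


def build_fims(path=":memory:"):
    """Create an empty FIMS database with foreign-key checks enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = build_fims()
# Invented example records.
conn.execute(
    "INSERT INTO station (station_no, locality, collection_date) VALUES (?, ?, ?)",
    ("ST-001", "example reef flat", "2011-01-15"),
)
conn.execute(
    "INSERT INTO specimen (field_no, identification, station_no, fixative, tissue_taken) "
    "VALUES (?, ?, ?, ?, ?)",
    ("F-0001", "Morphospecies 1", "ST-001", "75% EtOH", 1),
)
conn.execute(
    "INSERT INTO subsample (subsample_no, field_no, tissue_type, plate_well) "
    "VALUES (?, ?, ?, ?)",
    ("SUB-0001", "F-0001", "muscle", "P1/A1"),
)

# Trace a plate well back to its specimen and station.
rows = conn.execute(
    "SELECT sub.subsample_no, sub.plate_well, sp.identification, st.locality "
    "FROM subsample AS sub "
    "JOIN specimen AS sp ON sp.field_no = sub.field_no "
    "JOIN station AS st ON st.station_no = sp.station_no"
).fetchall()
print(rows)
```

The foreign keys enforce the linkage described in the text: a specimen cannot be entered without an existing station, and a photo or subsample cannot be entered without an existing specimen.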
Note that each table has a unique identifier (for station, specimen, photo, and subsample) for sample tracking; additional unique identifiers (e.g., collection catalog number assigned to voucher specimen) may be added and linked to these.
3. Samples collected from a station are handled in the field appropriately for the method and taxa involved, so that specimens remain in good condition for preparation of vouchers, tissue subsamples, and photographs. Specimens may be fixed immediately in the field after collection, or transported live to the field lab for further processing. Bulk field fixation is more time-expedient, but prevents immediate taking and differential preservation of tissue subsamples, photographing live/fresh animals, or tracking specimen-level information. Live transport to the field lab allows specimens to be handled individually for photography, subsampling, and specific fixation protocols, but is more time consuming. Combining these approaches can be useful. Specimens that die and deteriorate rapidly (e.g., sponges) can be photographed and subsampled in the field, while specimens where lab photography and fixation are especially useful (e.g., opisthobranch mollusks, flatworms) can be transported live to the field lab for further processing. In contrast, bulk samples too large to process in the field and field lab (e.g., plankton samples) can be fixed immediately after collection. Some bulk collection methods include collection of the substratum (e.g., leaf litter, marine sediments, reef rock) from which specimens are extracted in the field lab. The use of chemicals that damage DNA (e.g., formalin) for extracting specimens should be avoided.
4. Live samples taken to the field lab should be sorted to morphospecies by people sufficiently knowledgeable about the taxon to make this relatively accurate.
Three to five specimens per morphospecies are a useful target in taxa where morphospecies accurately reflect genetic species, while more specimens are useful when cryptic complexes are expected. It is also informative to take samples from across the geographic range of species when possible, as geographically differentiated cryptic complexes are common. For animals that are too small to provide useable morphological and genetic samples from the same individual, two sets of specimens can be prepared for morphological vouchers and DNA sequencing. Detection of only one species in each set by subsequent analysis lends confidence that specimens pertain to the same species. Retention of the extracted specimen as specimen voucher is possible in microfauna where identification characters are cuticle-based (e.g., most arthropods), by gently digesting soft tissues for DNA (e.g., with proteinase K), and preserving the remaining cuticle “shell” for voucher.
5. Voucher preparation usually involves relaxing, killing, fixing, and preserving the specimen based on taxon-specific protocols (26).
Some taxa require fixation for morphological study in fixatives (e.g., formalin, glutaraldehyde) that are incompatible with DNA preservation; taking subsamples for genetic analysis prior to fixation is essential for these. Subsampling of ethanol-fixed specimens can be delayed until return to the home lab for field-expediency.
6. Photodocumentation can provide online access to the voucher and captures information often lost in fixed specimens. Photos should capture characters that allow identification. They can be of the whole organism, close-ups of diagnostic features, or various preparations. Thus, photo efforts should be guided by someone knowledgeable about the taxon. For some organisms (e.g., many sessile invertebrates), in situ photos can be especially informative, as even collection will disrupt their appearance. Photographs of fresh and relaxed specimens record living color and morphological features that may be altered by preservation, and can facilitate taxonomy as much as genetic data. In some taxa (e.g., in decapod crustaceans, opisthobranchs), the most closely related species differ mainly in color pattern rather than in structural morphology. In contrast, images of preserved or prepared specimens are as useful as, or more useful than, images of live specimens for other taxa (e.g., SEMs for bryozoans, sections for helminths).
3.2. Subsample Preservation
1. Subsamples should be taken from body parts that are not important for morphological identification, have a low probability of contamination (from symbionts, environment, food, etc.), and are rich in mitochondria (to minimize potential NUMT coamplification). 1–3 mm³ of tissue provides an ideal subsample, but much smaller amounts are also sufficient. Subsampling should be done on a clean surface, and the tools used flamed to prevent contamination.
2. Subsamples can be preserved by freezing, drying, EtOH, DMSO–EDTA–salt buffer, proprietary buffers, or placed directly into extraction buffers (see Chapter 14). Use at least 5× as much preservative as the volume of the tissue sample. Placing subsamples directly into extraction buffer followed by DNA extraction is efficient and provides high-quality genomic DNA, but uses up the subsample, preventing future alternative extraction procedures. Ethanol provides an ideal preservative that often doubles as a preservative for the morphological voucher. Ethanol concentrations between 70 and 100% all work well, but for voucher fixation only 70–75% should be used, as higher concentrations can make specimens dehydrated and brittle, and can lead to preservation artifacts. DMSO–EDTA–salt buffer is easier to transport and can leave higher quality genomic DNA in some taxa (28), but as tissue disintegrates in this solution, it is inappropriate for voucher
preservation. Keeping preserved subsamples in a fridge or freezer until extraction slows DNA degradation.
3. Subsamples are ideally placed in 96-well plate format (or in tubes arranged in 96-well format, such as Matrix Storage Tubes (Thermo Scientific)), and worked in that format through to sequencing. When subsamples are collected in small numbers, or for replicate samples of especially important specimens, small tissue vials can be used. As possibilities of sample mixup or cross-well contamination are substantial for 96-well plates, extra care needs to be taken to prevent this. Staggering samples of the same species among non-neighboring wells facilitates detection of contamination.
3.3. Preparation of DNA
There is a diversity of DNA preparation methods and most are suitable for any metazoan (see Note 1). For larger scale projects, we suggest the high-throughput, silica-based DNA extraction protocol described by Ivanova et al. (29) (see Note 2). Here, we present a manual extraction protocol using DNAzol®. This simple and reliable method works well with most metazoans, produces high-quality extractions, and is suitable for even decade-old specimens. The manufacturer’s protocol can be carried out in minimal lab settings at room temperature in less than 30 min. However, we suggest the following modified approach to improve DNA yields and quality.
1. Place a ~1–2 mm piece of tissue on parafilm. Remove extra storage buffer or EtOH by evaporation or blotting with a Kimwipe. Mincing tissue with a sterile blade can improve digestion and increase yield. Shaving off the outer tissue layer can reduce inclusion of contaminants.
2. Transfer tissue to a 1.7-ml polypropylene microcentrifuge tube.
3. Add 750 μl DNAzol® and 5 μl proteinase K (20 mg/ml). For challenging tissues, let it stand for ~10 min then add an additional 5–10 μl proteinase K.
4. Grind tissue with a sterile pellet pestle (see Note 3). Alternatively, for particularly soft-bodied taxa (e.g., medusozoans) a simple vortex is sufficient.
5. Allow tissue to digest for ~24 h at room temperature, preferably on a rocking shaker. For fresh, high-quality tissue a 1-h digest may be sufficient, while for poor quality samples digestion can be extended for >24 h.
6. Centrifuge sample at ~12,000 × g for 15 min.
7. Carefully pipette supernatant into a new 1.7 ml tube, avoiding disturbance or transfer of the pellet at the bottom of the original tube. Leaving some (<25 μl) supernatant behind will prevent such transfer and should not significantly affect yields. Discard original tube.
8. Add 375 μl (i.e., ~50% of volume of supernatant) of 95–100% ice-cold EtOH.
9. Invert tube gently a few times. DO NOT vortex, as DNA is particularly vulnerable to shearing at this stage.
10. Store between 4°C and −20°C for ~1 h. Additional time may result in co-precipitation of salts and affect extraction quality.
11. Centrifuge sample at ~12,000 × g for >5 min. Orient tubes in the same direction (e.g., all cap hinges facing out), to facilitate locating and thus avoiding the DNA pellet (angled at bottom of tube) during subsequent pipetting.
12. Carefully pipette or decant nearly all the supernatant without disturbing the DNA pellet. If the pellet is disturbed, repeat from step 9.
13. Rinse sample twice, adding ~1 ml of 75% EtOH, centrifuging at 12,000 × g, and decanting (or pipetting) as above.
14. Carefully remove any remaining EtOH from the tube, by pipetting or evaporation, without disturbing the DNA pellet. Residual EtOH can be evaporated by leaving the tube open but protected from contamination (e.g., covered with a Kimwipe) for up to a few hours.
15. Add 30–50 μl TE buffer or sterile dH2O, then mix by flicking the tube or gently vortexing. Leave at room temperature for several hours or overnight to dissolve the DNA pellet. If the extraction is noticeably viscous, the concentration is likely high and additional TE buffer or sterile dH2O should be added. Alternatively, if yields are consistently low, consider adding less TE/dH2O in future extractions.
16. The resulting genomic DNA extraction is ready for use in PCR. However, quantification and dilution of the extraction should be considered.
3.4. DNA Quantification
The ideal DNA concentration for PCR falls broadly around 50–100 ng/μl and extractions tend to approximate this range. DNA quantification is optional, but we recommend it when PCR success is variable (see Note 4). In these cases genomic DNA extractions can be reliably quantified with a spectrophotometer. However, cruder estimates can be successfully made by electrophoresis of a 1-μl aliquot of DNA on a 0.8% agarose gel in parallel with a properly mass-quantified DNA ladder.
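When a spectrophotometer reading falls well above the range given here, the amount of diluent needed can be worked out from C1V1 = C2V2. The sketch below is illustrative only: the function name is hypothetical, and the default target of 100 ng/μl is simply the upper end of the range stated above.

```python
def dilution_plan(measured_ng_per_ul, aliquot_ul, target_ng_per_ul=100.0):
    """Return (diluent_ul, final_volume_ul) to bring an aliquot down to the target.

    Extractions already at or below the target are left undiluted.
    """
    if measured_ng_per_ul <= 0 or aliquot_ul <= 0 or target_ng_per_ul <= 0:
        raise ValueError("concentrations and volumes must be positive")
    if measured_ng_per_ul <= target_ng_per_ul:
        return 0.0, float(aliquot_ul)
    # C1 * V1 = C2 * V2, solved for the final volume V2
    final_volume = measured_ng_per_ul * aliquot_ul / target_ng_per_ul
    return final_volume - aliquot_ul, final_volume


# Example: a 10 ul aliquot measured at 400 ng/ul
diluent_ul, final_ul = dilution_plan(400, 10)
print(f"add {diluent_ul:.1f} ul TE or dH2O for a final volume of {final_ul:.1f} ul")
```

Diluting an aliquot rather than the whole extraction keeps the concentrated stock available for archiving.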
3.5. DNA Storage
Genomic DNA will degrade unless it is kept cold. Freezing is preferable but repeated freeze–thaw cycles also result in degradation of DNA (see Note 5). For repeated use over short term (days) store DNA at 4°C, but for longer term (weeks to a few years) keep DNA extracts at least at −20°C. Only temperatures at or below −80°C effectively halt degradation and should be used for long-term storage.
3.6. PCR Amplifications
1. Calculate reagent volumes needed for the PCR cocktail based on volumes in Table 3 multiplied by the number of reactions. A negative control reaction (i.e., no DNA template) should always be included; a positive control (of easily/previously amplified DNA) is also useful. To allow for pipetting errors, it can be useful to add extra reaction volumes (we suggest approximately one additional reaction volume per 24 samples).
2. When including PCR enhancers, replace the corresponding volume of ddH2O with enhancer solution (Table 4). Inclusion of PCR enhancers provides a powerful and cheap method to improve PCR success, especially for poor quality DNA extractions or those that contain inhibitory compounds (see Note 6).
Table 3 Standard PCR cocktail for one 25 μl reaction
Reagents | 25 μl rxn
ddH2O (a) | 19.5 μl
10× buffer | 2.5 μl
50 mM MgCl2 (b) | 1.25 μl
10 μM forward primer | 0.25 μl
10 μM reverse primer | 0.25 μl
10 mM dNTPs | 0.125 μl
Taq polymerase (5 units/μl) | 0.125 μl
PCR cocktail total (1 rxn) | 24 μl
DNA template | 1 μl
See Tables 6 and 7 for appropriate primer pairs
(a) May adjust ddH2O volume to accommodate additional reagents, DNA template, or PCR enhancing additives
(b) Yields 2.5 mM MgCl2; a range between 0.5 and 3 mM is recommended, with higher concentrations making primer annealing less stringent
Table 4 Optional PCR enhancing additives. To include, appropriately decrease ddH2O volume in 25 μl PCR cocktail
PCR enhancers | Volume
15% Trehalose stock | 12.6 μl
BSA 2.5 μg/μl stock | 2–4 μl
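The scaling of the PCR cocktail described in steps 1 and 2 of Subheading 3.6 can be sketched as follows. This is a minimal illustration under stated assumptions: per-reaction volumes are those of Table 3, any enhancer volume is subtracted from the ddH2O, and the overage rule (one extra reaction volume per 24 samples, rounded up) is one possible reading of the "approximately one additional reaction volume per 24 samples" suggestion. The function and dictionary names are hypothetical.

```python
import math

# Per-reaction volumes in microliters, from Table 3 (24 ul cocktail + 1 ul template).
PER_REACTION_UL = {
    "ddH2O": 19.5,
    "10x buffer": 2.5,
    "50 mM MgCl2": 1.25,
    "10 uM forward primer": 0.25,
    "10 uM reverse primer": 0.25,
    "10 mM dNTPs": 0.125,
    "Taq polymerase (5 units/ul)": 0.125,
}


def master_mix(n_samples, n_controls=1, enhancers=None):
    """Return total volume (ul) of each reagent for the whole cocktail.

    enhancers: optional dict of {name: ul per reaction}; the same volume
    of ddH2O is removed so each reaction still receives 24 ul of cocktail.
    """
    enhancers = dict(enhancers or {})
    recipe = dict(PER_REACTION_UL)
    recipe["ddH2O"] -= sum(enhancers.values())
    if recipe["ddH2O"] < 0:
        raise ValueError("enhancer volume exceeds available ddH2O volume")
    recipe.update(enhancers)
    # Assumed overage rule: one extra reaction volume per 24 samples, rounded up.
    n_volumes = n_samples + n_controls + math.ceil(n_samples / 24)
    return {name: round(ul * n_volumes, 3) for name, ul in recipe.items()}


# 23 samples plus one negative control, with 2 ul BSA stock per reaction
mix = master_mix(23, enhancers={"BSA 2.5 ug/ul stock": 2.0})
for reagent, volume in mix.items():
    print(f"{reagent:30s}{volume:8.3f} ul")
print(f"{'total':30s}{sum(mix.values()):8.3f} ul")
```

For 23 samples and one negative control this prepares 25 reaction volumes, that is, 600 μl of cocktail to be dispensed at 24 μl per tube.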
3. Thaw, on ice, DNA samples and all necessary PCR reagents, except for Taq polymerase. Taq is especially sensitive to thermal degradation; it is usually suspended in glycerin, remains liquid at −20°C, and can therefore be pipetted directly from “frozen” tubes.
4. Combine PCR reaction cocktail on ice (see Notes 7 and 8).
5. Vortex and spin cocktail.
6. Pipette 24 μl of PCR cocktail into each PCR reaction tube.
7. Pipette 1 μl of appropriate DNA template into each reaction tube, except the negative control.
8. Firmly seal reaction tubes and place in thermal cycler.
9. Start appropriate PCR thermal cycling profile program.
3.7. PCR Thermal Cycling Profiles
PCR profiles vary greatly and researchers should make an initial effort to test and optimize reactions for their particular thermal cycler, targeted marker, primer pair, and taxa of interest. Suggested thermal cycling profiles appear in Table 5 (see Note 9 for detailed explanations). These profiles, adjusted to the suggested annealing temperatures for the primers used (Table 7), should provide a starting point for amplification of sequences less than approximately 800 bp in length. Longer sequences will require longer extension times and may require multiple amplifications and primer pairs. Use a “heated lid” option for the thermal profile program whenever available, to prevent PCR reactions from condensing on the tube lids.
Table 5 Three thermal cycling approaches
Approach | Utility | Thermal cycling profile (a)
Standard | Well suited for taxon-specific primers | 94°C for 5 min, 35 cycles (94°C for 30 s, Ta [given in Table 7] for 45 s, 72°C for 1 min), 72°C for 5 min, hold at 4°C
Step-up | Decreases annealing specificity, appropriate for universal primers. Risk: co-amplification of contaminants or nontarget sequence | 94°C for 5 min, 5 cycles (94°C for 30 s, ~5°C below Ta [given in Table 7] for 45 s, 72°C for 1 min), 30 cycles (94°C for 30 s, Ta [given in Table 7] for 45 s, 72°C for 1 min), 72°C for 5 min, hold at 4°C
Step-down | Increases annealing specificity, eliminating co-amplified products. Risk: no amplification of target sequence | 94°C for 5 min, 5 cycles (94°C for 30 s, ~5°C above Ta [given in Table 7] for 45 s, 72°C for 1 min), 30 cycles (94°C for 30 s, Ta [given in Table 7] for 45 s, 72°C for 1 min), 72°C for 5 min, hold at 4°C
(a) See Note 9 for description; see Table 7 for appropriate annealing temperatures
3.8. PCR Product Confirmation
Successful PCR amplification should be confirmed by electrophoresis of 2.5–5 μl of the amplicon and a molecular ladder in neighboring wells in a 1% agarose gel, made (or stained) with a dilute (~0.5 μg/ml) ethidium bromide (EtBr) solution and visualized under UV light (see Notes 10 and 11). Successful amplicons will appear as single, distinct bands. Faint or additional, spurious bands suggest that PCR or primer optimization is needed. Those conducting high-throughput efforts may consider commercially available 96-well precast gel systems (e.g., Invitrogen E-gel 96 system).
3.9. Sequencing
Given the infrastructure, resources, and expertise needed to do sequencing, this step is usually contracted out to commercial or university core facilities. It is prudent to “shop around” for sequencing services (even internationally), comparing everything from volumes of PCR reactions requested, to average completion times. In our experience, researchers who can guarantee a high volume of samples may be able to negotiate a better price. We also recommend working with those facilities capable of affordably handling raw PCR products. However, in some cases it may still be more cost effective to “clean” or purify PCR products before submitting them. A variety of methods can be found in this volume and elsewhere (see Note 12).
3.10. Sequence Data Processing and Verification
1. Construct bidirectional contigs: Bidirectional sequence data (i.e., from forward and reverse sequencing primer reactions) should be assembled into a single contig to provide a reliable consensus sequence (see Note 13). This can be accomplished with various software including Geneious, Sequencher, UGENE, and MEGA (see Note 14).
2. Check sequence identity: Sequence identity should be checked to reduce the likelihood of proceeding with contaminant sequences, misidentified samples, or pseudogenes (discussed below). This can be carried out by conducting a BLAST query (http://blast.ncbi.nlm.nih.gov/Blast.cgi) or a “Taxon ID Tree” in BOLD (http://www.boldsystems.org/) (see Note 15). The quality of these queries is directly related to the library of available sequences. For understudied clades (especially highly divergent ones), these queries perform less favorably. Be cautious if results suggest unexpected taxon affinities.
3. Create alignment: Before comparative analyses can proceed, sequences (of the same gene region) should be assembled into a multiple sequence alignment. There are a number of free programs capable of performing this (e.g., MAFFT, MUSCLE, ClustalW) (see Note 16).
4. Detection of contaminant and pseudogene sequences: Poor quality, contaminant, or nontarget sequences can also be identified when they fail to align properly to other sequences, or if they possess unique indels or atypical sequence regions. For protein-coding genes (e.g., CO1), nucleotide sequences should be translated into amino acid data to confirm an open reading frame (i.e., no stop codons). The presence of stop codons indicates that a nonfunctional pseudogene region was amplified (a serious concern), although their absence does not guarantee that the sequence is functional. For mtDNA genes, be careful to choose the correct mitochondrial genetic code for inferring amino acid sequences, as this varies among some animal phyla (see Note 17). Indels and introns, while rare, do exist within metazoan mitochondrial protein-coding genes (21, 30–33). If translational frameshifts result, such sequences could be incorrectly identified as pseudogenes (30).
3.11. Alternative or Additional Markers
Additional or alternative molecular markers should be considered when CO1 data is insufficient to distinguish recognized species, when it conflicts with interpretations of morphological or isolating characters, or when CO1 amplification remains challenging (but see Notes 18–20). A brief overview of the most commonly used alternative species level markers can be found below (see Note 21 for mitochondrial, and Note 22 for nuclear markers). A limited selection of universal primers and additional references are provided in Tables 6 and 7. Successful amplification of many of these markers will require additional research and troubleshooting (see Notes 19 and 20).
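Returning to the open reading frame check in Subheading 3.10, step 4, the effect of the chosen genetic code can be illustrated with a short stop-codon scan. This is a minimal sketch, not a replacement for the translation tools in the software mentioned above. The table numbers are NCBI translation table identifiers, only the stop codons of four tables are encoded, the function names are hypothetical, and the test sequence is invented. Which code applies to a given phylum should be taken from Note 17 and the references in Table 6.

```python
# Stop codons for a few NCBI translation tables (DNA alphabet).
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},  # standard code
    4: {"TAA", "TAG"},         # mold, protozoan, and coelenterate mitochondrial
    5: {"TAA", "TAG"},         # invertebrate mitochondrial
    9: {"TAA", "TAG"},         # echinoderm and flatworm mitochondrial
}


def find_stops(seq, table=5, frame=0):
    """Return [(codon_index, codon), ...] for stop codons in the given frame."""
    stops = STOP_CODONS[table]
    seq = seq.upper().replace("-", "")  # drop alignment gaps before reading codons
    hits = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in stops:
            hits.append(((i - frame) // 3, codon))
    return hits


def open_frames(seq, table=5):
    """Return the forward frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not find_stops(seq, table, f)]


# Invented fragment: frame 0 reads ATG TGA GGA ACA TTT
fragment = "ATGTGAGGAACATTT"
print("standard code, frame 0:", find_stops(fragment, table=1))
print("invertebrate mt code, frame 0:", find_stops(fragment, table=5))
print("open frames under invertebrate mt code:", open_frames(fragment, table=5))
```

Because TGA encodes tryptophan in these mitochondrial codes, the same fragment is flagged under the standard code but reads through under the invertebrate mitochondrial code. Amplicons do not necessarily begin at the first codon position, so all three frames are checked.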
4. Notes
1. Preparation of DNA from a tissue subsample can be accomplished by either extraction protocols or cruder DNA “release” methods (34). DNA “release” approaches digest the tissue such that DNA is quickly brought into solution, but they stop short of isolating it from the lysate. These methods are both fast and inexpensive but do not produce DNA samples suitable for archiving and can contain PCR inhibiting compounds. However, if organisms or tissue samples are exceedingly small (e.g., meiofauna) and extraction protocols fail to produce suitable DNA yields, a DNA release approach ensures that no
Table 6 Phylum specific strategies, caveats and references for Barcoding invertebrate fauna Phylum
Alternative CO1 “Folmer” region primers
Caveats
Notable references
Acanthocephala
1,2
(61–63)
Acoelomorpha
1
(64, 65)
Annelida
2,3
(42, 66–71)
Arthropoda: (excl. Hexapoda)
Chelicerate-F1, Chelicerate-R1, Chelicerate-R2, HCOoutout, CrustDF1, CrustDR1, CrustF1, CrustF2
2,3
(5, 40, 60, 72–83)
Brachiopoda
Cohen-Fwd, Cohen-Rev
1,2
(84, 85)
1,2
(86)
Bryozoa Cephalochordata
AmphL109, AmphH1325
(87)
Chaetognatha
(88)
Cnidaria: Anthozoa
MCOIF, MCOIR
2,3,4,5
(16, 17, 31, 32, 89–93)
Cnidaria: Medusozoa
LCOjf
2,4
(94–97)
Cnidaria: Myxozoa
1,2,4
(98, 99)
Ctenophora
1,2,4,5
(18, 100, 101)
Cycliophora
CycF, CycR
Echinodermata
COIceF, COIceR
(102) 2,4
(41, 103)
Entoprocta
1
(104)
Gastrotricha
1
(105)
1
(106)
Hemichordata
1,2,4
(107, 108)
Kinorhyncha
1
Loricifera
1
Micrognathozoa
1
(109)
2,3
(10, 13, 110–119)
Nematoda
2,3,4,5
(48, 55, 56, 82, 120, 121)
Nematomorpha
1
Gnathostomulida
Mollusca
Nemertea
COI-7, COI-D
dgLCO, dgHCO
HCOoutout
1,2
(122–124)
Onychophora
2
(125–127)
Orthonectida
1
Phoronida
1
(128)
Placozoa
2,3,4
(20, 21, 129)
Platyhelminthes
2,4,5
(65, 130–132)
2,3,4,5
(30, 47, 59, 133, 134)
Porifera
dgLCO, dgHCO
Priapulida
1
Dicyemida
1,3
(135)
Rotifera
2,3
(136–138)
Sipuncula
1,2
(139, 140)
Tardigrada
HCOoutout
1,2
(141–144)
Urochordata
Tun_fwd, Tun_rev2
1,2,4
(145, 146)
1,2
(119, 147)
Xenoturbellida
Caveats:
1. Limited or no DNA barcoding completed for this clade
2. Additional/alternative markers, primer sets, or PCR strategies reported (see references)
3. Peculiar genetics or biology warrant caution (see references)
4. Genetic code may deviate from standard invertebrate mtDNA code (see Note 17 or references)
5. “Folmer” CO1 region may be insufficient for DNA barcoding (see references)
DNA is discarded (a problem of varying degrees for most DNA extraction protocols). We recommend the DNA release protocol described by deWaard et al. (35) which utilizes Chelex® 100 (Bio-RAD). DNA extraction methods are more appropriate for isolating DNA from older tissues, from organisms known to possess PCR inhibiting compounds, and when high-quality, stable DNA is desired for molecular work or archiving purposes. Organic DNA extractions remain the cheapest and often most effective extractions methods but require the use of toxic materials (e.g. phenol and chloroform) and unless automated, are labor intensive. Where both tissue and funds are sufficient, commercially available silicabased DNA extraction kits provide both supplies and simple protocols that yield high-quality extractions. QIAGEN DNeasy® and Clonetech Nucleospin® tissue kits are among the most recommended, and are available in individual and 96-well plate formats. 2. This affordable high-throughput protocol has been adopted by the Canadian Centre for DNA Barcoding and can also be found on their website (http://www.ccdb.ca/pa/ge/research/ protocols). 3. To clean polypropylene pestels before reuse, soak them in bleach for >30 min, rinse with water, wrap in foil, and autoclave.
Table 7 Universal and phylum-specific primer pairs, sequences, and annealing temperatures
Primer name (forward/reverse) | 5′–3′ forward primer sequence | 5′–3′ reverse primer sequence | Ta | Taxon utility | Amplicon size | References
CO1 (5′ “Folmer” region)
LCO1490/HCO2198 | GGTCAACAAATCATAAAGATATTGG | TAAACTTCAGGGTGACCAAAAAATCA | 40–55°C | Universal Metazoa | 650 bp | (14)
dgLCO/dgHCO | GGTCAACAAATCATAAAGAYATYGG | TAAACTTCAGGGTGACCAAARAAYCA | 40–44°C | Degenerate Universal Metazoa | 650 bp | (148)
CycF/CycR | CGRATGGARCTYTCTCAYCC | TTAAAATTACGRTCTGTYAAAAG | 45°C | Cycliophora | 487 bp | (102)
Cohen-Fwd/Cohen-Rev | ATTYTBCCNGGRTTTGG | TACCCYCGNCAAAAAC | NA | Brachiopoda | 663 bp | (84)
Tun_fwd/Tun_rev2 | TCGACTAATCATAAAGATATTAG | AACTTGTATTTAAATTACGATC | 50°C | Urochordata | 586 bp | (145)
AmphL109/AmphH1325 | ATTCGNGCNGAAYTNTCNCAGCC | TCNGAATAYCGNCGWGGTATNCC | 45–60°C | Cephalochordata | 1 kb | (87)
COIceF/COIceR | ACTGCCCACGCCCTAGTAATGATATTTTTTATGGTNATGCC | TCGTGTGTCTACGTCCATTCCTACTGTRAACATRTG | 51°C | Echinodermata | 655 bp | (41)
COI-7/COI-D | ACNAAYAARCAYGAYATYGGNAC | TCTGGGTGTCCRAARAAYCARAA | 40°C | Gnathostomulida, Annelida | 657 bp | (149)
LCO1490/HCOoutout | See above | GTAAATATATGRTGDGCTC | 42 and 50°C | Tardigrada, Chelicerata, Myriapoda | 742 bp | (77, 144, 150, 151)
Chelicerate-F1/Chelicerate-R1 | TACTCTACTAATCATAAAGACATTGG | CCTCCTCCTGAAGGGTCAAAAAATGA | 45 and 50°C | Arachnida | 660 bp | (79)
Chelicerate-F1/Chelicerate-R2 | See above | GGATGGCCAAAAAATCAAAATAAATG | 45 and 50°C | Arachnida | 660 bp | (79)
CrustDF1/CrustDR1 | GGTCWACAAAYCATAAAGAYATTGG | TAAACYTCAGGRTGACCRAARAAYCA | 45 and 51°C | Crustacea | 650 bp | (80)
CrustF1/HCO2198 | TTTTCTACAAATCATAAAGACATTGG | See above | 45 and 51°C | Crustacea | 650 bp | (73)
CrustF2/HCO2198 | GGTTCTTCTCCACCAACCACAARGAYATHGG | See above | 42°C | Crustacea | 650 bp | (73)
MCOIF/MCOIR | TCTACAAATCATAAAGACATAGG | GAGAAATTATACCAAAACCAGG | 55°C | Scleractinia | 650 bp | (32, 152)
LCOjf/HCO2198 | GGTCAACAAATCATAAAGATATTGGAAC | See above | 45–51°C | Medusozoa | 650 bp | (153, 154)
16S rDNA (mitochondrial LSU)
16sar-L/16sbr-H | CGCCTGTTTATCAAAAACAT | CCGGTCTGAACTCAGATCACGT | 45–55°C | Universal Metazoa | 500 bp | (155)
18S rDNA (nuclear SSU)
18S-A/18S-B | AACCTGGTTGATCCTGCCAGT | TGATCCTTCCGCAGGTTCACCT | 47°C | Universal Metazoa | 1.8 kb (a) | (156, 157)
28S rDNA (nuclear LSU) D1-D2 region
LSU D1D2 fw1/LSU D1D2 rev1 | AGCGGAGGAAAAGAAACTA | TACTAGAAGGTTCGATTAGTC | 52.5°C | Universal Metazoa | 0.8–1.3 kb | (57)
LSU D1D2 fw1/LSU D1D2 rev2 | See above | ACGATCGATTTGCACGTCAG | 52.5°C | Universal Metazoa | 0.8–1.3 kb | (57)
ITS1, 5.8S, ITS2
ITS5/ITS4 | GGAAGTAAAAGTCGTAACAAGG | TCCTCCGCTTATTGATATGC | NA | Universal | >600 bp | (158)
Ta = reported annealing temperatures. Two temperatures indicate a “Step-up” or “Step-down” approach; a range indicates that the reported Ta commonly varies.
(a) 18S can, and often should, be amplified and sequenced using multiple internal primers (see references).
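Several primers in Table 7 are written with IUPAC ambiguity codes (for example, Y = C or T, R = A or G, N = any base). As an illustrative sketch, not part of the published protocol, the Python snippet below computes the fold degeneracy of such a primer, lists the explicit oligonucleotides it represents, and reports the possible range of GC content. The function names (`degeneracy`, `expand`, `gc_range`) are hypothetical helpers introduced here for illustration; the primer sequences are those listed in Table 7.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences a degenerate primer encodes."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """Return every explicit sequence encoded by a degenerate primer.

    Intended for modestly degenerate primers; the list grows
    multiplicatively with each ambiguous position.
    """
    options = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*options)]


def gc_range(primer):
    """Return (min, max) percent GC across all variants of the primer."""
    primer = primer.upper()
    # A position always contributes to GC only if every option is G or C.
    low = sum(1 for b in primer if set(IUPAC[b]) <= {"G", "C"})
    # A position can contribute to GC if any option is G or C.
    high = sum(1 for b in primer if set(IUPAC[b]) & {"G", "C"})
    return (100 * low / len(primer), 100 * high / len(primer))


if __name__ == "__main__":
    dg_lco = "GGTCAACAAATCATAAAGAYATYGG"  # dgLCO, from Table 7
    print(degeneracy(dg_lco))
    print(gc_range(dg_lco))
    for variant in expand(dg_lco):
        print(variant)
```

Under these assumptions, dgLCO is fourfold degenerate, and the original LCO1490 sequence is one of its four variants. The same functions can be used to compare a candidate primer against the length, GC content, and degeneracy guidelines in Note 20.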
4. Quantification of genomic DNA extractions is optional, because both low and high concentrations of quality DNA can provide good PCR amplifications. However, it is not uncommon to recover extractions as high as 500–1,000 ng/μl, and such high concentrations may inhibit PCR reactions and should be avoided. Highly viscous extracts are indicative of high DNA concentration. Creating and testing serial dilutions (e.g., 1:10, 1:100, and 1:1,000 dilutions of DNA in dH2O) can overcome PCR inhibition from high DNA concentration as well as from inhibiting contaminants present in an extraction.
5. Frost-free freezers often function by cycling temperatures and thus can promote sample degradation. Their use should generally be avoided for tissue, DNA, and reagent storage.
6. For problematic amplifications, including those from samples suspected to contain inhibitory compounds, a variety of PCR-enhancing additives can be used (24, 25, 36, 37). Betaine, BSA, dimethyl sulphoxide (DMSO), and trehalose are among the most successfully used. Most enhancers function by stabilizing contaminants, enzymes, and PCR products in the reaction, and many lower the melting temperature of GC-rich templates. We recommend both trehalose and BSA (Table 4). Given their consistent positive effects and minimal cost, we suggest always including one of these additives in a PCR reaction. Although we found no record of these reagents being used together, we suspect they would not interfere with one another. If PCR enhancers and dilutions (see Note 4) fail to provide reliable amplification, it may be appropriate to try removing PCR inhibitors from DNA samples with commercially available DNA purification kits, or to try alternate DNA extraction methods. Unfortunately, purification kits can negatively affect final DNA concentrations or quality (through size exclusion).
7. Prepared PCR cocktails are thermally unstable and should be mixed just prior to use. However, addition of trehalose to the cocktail mix will enable storage and use of the cocktail mix for a few months, although repeated freeze–thaw cycles should be avoided.
8. Reactions of double (50 μl) or half (12.5 μl) this volume are also commonly used. These all typically still include ~1 μl of DNA template.
9. A “Standard” PCR profile is widely applicable, especially for taxon-specific primers. However, degenerate or universal primers (e.g., the “Folmer” CO1 primers) may be more successful with a “Step-up” PCR approach in which lower annealing temperatures in the initial cycles facilitate less specific but more successful primer annealing. This creates a greater pool of target sequences for subsequent amplification cycles, which run at higher, more optimal annealing temperatures. The downside to this approach is that lower initial annealing temperatures also facilitate amplification of nontarget regions or contaminant DNA. Some of these concerns can be avoided by not setting annealing temperatures lower than ~45°C. When amplification of nontarget sequences is of specific concern (e.g., when symbionts are known to be present), higher annealing temperatures can help and even a “Step-down” approach may be employed. “Step-down” profiles, like the related “Touchdown” approach (38), can avoid spurious PCR amplicons by beginning at higher, suboptimal annealing temperatures to increase primer specificity and the pool of target sequences before cycling at lower temperatures in which co-amplification occurs. Both “Step-up” and “Step-down” approaches can be powerful tools for troubleshooting.
10. TAE (Tris–Acetate–EDTA) or TBE (Tris–Borate–EDTA) buffers work equally well for gel preparation and as electrophoresis buffers. If products are to be cut out of the gel for further molecular work or preparations, TAE is recommended.
11. Given that EtBr is a known mutagen, use of gloves is necessary (nitrile gloves provide better protection than latex). Review MSDS information and handle with care.
12. For ease of use, we suggest ExoSAP-IT (Affymetrix) for cleaning PCR products when this is not handled by the sequencing facility. The manufacturer's protocol works well, but using a 1:10 dilution of ExoSAP-IT is more affordable and still effective when the 37°C incubation period is extended by 15–30 min. Be aware that this method can make subsequent quantification of PCR products with a spectrophotometer unreliable.
13. If assembly of quality sequences into a bidirectional contig is not possible or several polymorphic sites are present, check the identity of each sequence for contaminants (see appropriate section). Tracking down the source of this error is important and may involve reevaluating original specimens, DNA extractions, PCR reactions, or sequencing efforts. Additional troubleshooting may be required, including cloning, repeated sequencing, PCR amplification, or extraction, or application of alternative primers.
14. Geneious Pro has a number of tools specifically developed to automate or speed up sequence editing and checking, and provides helpful video tutorials (http://www.geneious.com/).
15. Be aware that GenBank (the database queried in a BLAST search) is known to harbor sequences from misidentified species, pseudogenes, and contaminants, so caution should be taken when interpreting BLAST queries (39).
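The bidirectional check described in Note 13 can be sketched in a few lines of Python. This is a minimal illustration under a strong simplifying assumption: both reads have already been trimmed to exactly the same region, so the reverse-complemented reverse read can be compared with the forward read position by position, without alignment. Real reads generally need alignment in an assembly program first. The function names (`reverse_complement`, `disagreements`) are hypothetical and introduced here only for illustration.

```python
# Complement table covering the four bases and the IUPAC ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (IUPAC codes allowed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def disagreements(forward_read, reverse_read):
    """Return 1-based positions where forward and reverse reads disagree.

    Assumes both reads were trimmed to the same region, so that the
    reverse complement of the reverse read has the same length and
    coordinates as the forward read.
    """
    forward = forward_read.upper()
    flipped = reverse_complement(reverse_read)
    if len(forward) != len(flipped):
        raise ValueError("reads must cover the same region (equal length)")
    return [i + 1 for i, (a, b) in enumerate(zip(forward, flipped)) if a != b]


if __name__ == "__main__":
    fwd = "GGTCAACAAATCATAAAGATATTGG"
    rev = reverse_complement(fwd)      # a perfectly concordant reverse read
    print(disagreements(fwd, rev))     # no conflicting positions expected
```

A long list of conflicting positions would, as the note suggests, be a reason to re-examine the specimen, extraction, PCR, and sequencing steps before the sequence is used.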
16. For protein coding genes with few or no indels (e.g., CO1), we suggest employing simpler, faster algorithms. For more complex markers such as ribosomal DNA (rDNA) or those with large introns, alignment methods employing multiple iterative refinement steps are necessary.
17. NCBI (http://www.ncbi.nlm.nih.gov/) maintains an updated list of known alternative genetic codes under TAXONOMY > TOOLS > Genetic Codes. NCBI’s Translation Table 5 describes the standard invertebrate mitochondrial code. Relevant taxa with known deviations from this are:
(a) Cnidaria, Ctenophora, and Placozoa (NCBI’s Translation Table 4).
(b) Porifera (NCBI’s Translation Tables 4 and 5).
(c) Echinodermata and Hemichordata (NCBI’s Translation Table 9).
(d) Urochordata (NCBI’s Translation Table 13).
(e) Nematoda (NCBI’s Translation Tables 5 and 14).
(f) Platyhelminthes (NCBI’s Translation Tables 5, 9, 14, and 21).
Understanding alternative codes for stop codons is of greatest concern when inferring the presence of pseudogenes from amino acid alignments.
18. When amplification is attempted in new or diverse sets of taxa (where primer efficacy has not been evaluated), primer fidelity can be a challenging problem. In these cases, taxon-specific primers can be developed for groups where universal primers do not work well (e.g., see refs. 40, 41).
19. Taxon-specific primers can be designed from alignments of sequences (partial or complete) from closely related taxa (try searching BOLD or GenBank). Focus on developing primers from highly conserved regions on either end of the desired gene. See Hoareau and Boissin (41) for a thorough example of this approach.
20. When developing new primers it is wise to adhere to the following guidelines:
(a) Primers should be between 18 and 30 nucleotides, with a 40–60% GC content.
(b) Melting temperatures of a primer pair should be between 52°C and 65°C and within 5°C of each other.
(c) Avoid nucleotide repeats and base pair complementarities between and within primers. These can promote the formation of hairpin loops, self-annealing, and primer dimers, and will negatively affect target sequence amplification.
(d) Try including a G and/or C at or near the 3′ end to provide a “GC clamp.”
(e) For protein coding genes, avoid ending a primer on the third base position of a codon.
(f) Primers should not exceed a total tenfold degeneracy (e.g., twofold = two nucleotides substituted in one position).
21. Mitochondrial markers: Most metazoans share a highly conserved repertoire of approximately 37 mitochondrial genes (13 protein subunits, 2 rDNAs, 22 tRNAs) (42). In addition to this they often carry one or more non-coding “control” regions or intergenic spacers (IGS). Though sequences of both tRNAs and noncoding regions have been successfully used for species-level work (especially the latter), concerns with duplication or lack of true homology make these markers problematic (5, 43–45). Furthermore, gene order rearrangement in mtDNA is common across Metazoa, even within closely related clades. As a result, some caution should be taken when attempting to amplify through multiple gene regions (42, 46).
I3-M11 partition of CO1: This region of CO1 (approximately 450 bp) is just downstream of the Folmer region and has been demonstrated to be a more variable, and thus more successful, species-level marker for several clades including Porifera, Anthozoa, and Nematoda (47, 48).
Ribosomal DNA (rDNA or rRNA): mtDNA includes sequences for a large ribosomal RNA subunit (termed LSU or 16S rDNA) and a small ribosomal RNA subunit (termed SSU or 12S rDNA). These sequences possess highly conserved, easily alignable domains interspersed with fairly divergent, difficult-to-align regions. Though care must be taken with alignment, these genes are often highly informative for both higher level and species-level phylogenetic analyses. Many taxonomic communities prefer the dual utility of these genes, particularly the 5′ end of 16S.
Additional mtDNA protein coding genes: Twelve other protein coding genes are typically encoded in metazoan mtDNA (NADH 1-6, NADH 4L, CO 2, CO 3, Cyt b, ATP6, ATP8). These genes have each been used with varying success across Metazoa, but a thorough review of their clade-by-clade species-level utility is outside the scope of this chapter. However, we refer readers to more universal protocols presented in the supplemental material of ref. 49.
22. Nuclear markers: Development of nuclear, species-level markers for Metazoa is limited by a number of intrinsic challenges. Nuclear protein coding genes typically evolve at a much slower rate than mitochondrial ones, and can exist as members of complex gene families that make homology difficult to infer. Even closely related taxa can display unique gene duplication and extinction patterns. Thus, single copy nuclear genes are
often not conserved as such across larger taxonomic scales (50). Furthermore, highly conserved nuclear genes often possess exons with little variation but can have introns that are highly variable and difficult to align. Some researchers have advocated taking advantage of this by developing exon primed intron crossing (EPIC) primers. EPIC markers amplify putative homologous intron regions that should be informative at or below the species level and thus may uniquely complement mtDNA barcoding data. Recent work by Chenuil et al. (51) provides both protocols for developing EPIC markers in Metazoa and a list of broad “universal” primers. Though effective species-level nuclear protein coding genes do exist for some metazoan clades, none has emerged as a likely candidate for DNA barcoding across the kingdom. Traditional nuclear rDNAs that have long been used for multiple levels of analyses remain the best Metazoa-wide alternatives (52). Nuclear rDNA is composed of three subunits separated by two internal transcribed spacers (ITS). These are arranged in a single, tandem repeating unit from 5′ to 3′: 18S, ITS1, 5.8S, ITS2, and 28S. These high copy, repeating units typically maintain a single intragenomic sequence identity through concerted evolution (53), although this process can nevertheless maintain multiple unique rDNA copies. In addition, some clades are vulnerable to mobile elements that target rDNA and create a number of pseudogenes. For these reasons, many advocate caution when using these markers (54). These concerns can usually be addressed by intensive cloning and sequencing directed at a small fraction of the samples as well as amplification of larger sequences followed by nested PCRs.
18S (or SSU) rDNA: This large gene region (~1.8 kb) is generally better suited for higher level metazoan phylogenetics because of slow divergence rates. However, because it can be readily amplified it has served as a sort of higher taxon level DNA barcode. It is also popular in “species” level barcoding of nematodes (but see refs. 48, 55) and meiofauna in general, even though 18S may not show sufficient variation to distinguish between closely related species (4, 56).
28S (or LSU) rDNA D1-D2 region: At approximately 3.2 kb, 28S is even larger than 18S; however, it appears to have more informative, hypervariable regions, including the ~0.80–1-kb region of D1-D2, that some advocate using for lower taxonomic level analyses (57, 58). This marker certainly will be phylogenetically informative, but its broad utility at the species level has not been well documented (but see refs. 55, 59, 60).
ITS1 and ITS2: Internal transcribed spacers 1 and 2 are thought to be under significantly less selective pressure than the rDNA subunit sequences (53). Their high sequence divergence rates
are consistent with this, and have long made them important species-level markers (52). However, this also likely explains why multiple unique copies are maintained in some metazoans. Yet these markers have played an important role in augmenting or replacing CO1 data in some clades where mtDNA performs poorly, especially many non-bilaterians. The length of ITS1 and ITS2 can vary significantly at the interspecific, intraspecific, and sometimes the intragenomic level, posing further challenges for species-level analyses. This should be taken into consideration when troubleshooting these markers.

References
1. Hebert PDN, Cywinska A, Ball SL, Dewaard JR (2003) Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci 270:313–321
2. Lynch M, Koskella B, Schaack S (2006) Mutation pressure and the evolution of organelle genomic architecture. Science 311:1727–1730
3. Davison A, Blackie RLE, Scothern GP (2009) DNA barcoding of stylommatophoran land snails: a test of existing sequences. Mol Ecol Resour 9:1092–1101
4. Creer S, Fonseca VG, Porazinska DL et al (2010) Ultrasequencing of the meiofaunal biosphere: practice, pitfalls and promises. Mol Ecol 19(Suppl 1):4–20
5. Hassanin A (2006) Phylogeny of Arthropoda inferred from mitochondrial sequences: strategies for limiting the misleading effects of multiple changes in pattern and rates of substitution. Mol Phylogenet Evol 38:100–116
6. Knowlton N (1993) Sibling species in the sea. Ann Rev Ecol Syst 24:189–216
7. Verheyen E, Salzburger W, Snoeks J, Meyer A (2003) Origin of the superflock of cichlid fishes from Lake Victoria, East Africa. Science 300:325–329
8. Gregory TR, Mable BK (2005) Polyploidy in animals. In: Gregory TR (ed) The evolution of the genome. Academic, Waltham, MA, pp 428–501
9. Landry C, Geyer LB, Arakaki Y, Uehara T, Palumbi SR (2003) Recent speciation in the Indo-West Pacific: rapid evolution of gamete recognition and sperm morphology in cryptic species of sea urchin. Proc R Soc Lond B Biol Sci 270:1839–1847
10. Meyer CP, Paulay G (2005) DNA barcoding: error rates based on comprehensive sampling. PLoS Biol 3:e422
11. Coyne J, Orr H (2004) Speciation. Sinauer Associates, Sunderland, MA, p 545
12. Mayr E (1963) Animal species and their evolution. Harvard University Press, Cambridge, p 797
13. Meyer C, Geller J, Paulay G (2005) Fine scale endemism on coral reefs: archipelagic differentiation in turbinid gastropods. Evolution 59:113–125
14. Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol 3:294–299
15. Chen I-P, Tang C-Y, Chiou C-Y et al (2009) Comparative analyses of coding and noncoding DNA regions indicate that Acropora (Anthozoa: Scleractinia) possesses a similar evolutionary tempo of nuclear vs. mitochondrial genomes as in plants. Mar Biotechnol 11:141–152
16. Huang D, Meier R, Todd PA, Chou LM (2008) Slow mitochondrial COI sequence evolution at the base of the metazoan tree and its implications for DNA barcoding. J Mol Evol 66:167–174
17. McFadden CS, Benayahu Y, Pante E et al (2010) Limitations of mitochondrial gene barcoding in Octocorallia. Mol Ecol Resour 11:1–13
18. Ortman BD (2008) DNA barcoding the Medusozoa and Ctenophora. Ph.D. Dissertation, University of Connecticut, Storrs, CT
19. Shearer TL, Coffroth MA (2008) DNA BARCODING: barcoding corals: limited by interspecific divergence, not intraspecific variation. Mol Ecol Resour 8:247–255
20. Signorovitch AY, Dellaporta SL, Buss LW (2006) Caribbean placozoan phylogeography. Biol Bull 211:149–156
21. Signorovitch AY, Buss LW, Dellaporta SL (2007) Comparative genomics of large mitochondria in placozoans. PLoS Genet 3:e13
22. Wörheide G, Erpenbeck D, Menke C (2008) The Sponge Barcoding Project: aiding in the identification and description of poriferan taxa. In: Custódio M, Lôbo-Hajdu G, Hajdu E, Muricy G (eds) Porifera research: biodiversity, innovation and sustainability. Museu Nacional de Rio de Janeiro Book Series. Rio de Janeiro, Brazil, pp 123–128
23. Dawson MN, Jacobs DK (2001) Molecular evidence for cryptic species of Aurelia aurita (Cnidaria, Scyphozoa). Biol Bull 200:92
24. Spiess A-N-L, Mueller N, Ivell R (2004) Trehalose is a potent PCR enhancer: lowering of DNA melting temperature and thermal stabilization of Taq polymerase by the disaccharide trehalose. Clin Chem 50:1256–1259
25. Kreader CA (1996) Relief of amplification inhibition in PCR with bovine serum albumin or T4 gene 32 protein. Appl Environ Microbiol 62:1102–1106
26. Templado J, Paulay G, Gittenberger A, Meyer C (2010) Sampling the marine realm. In: Eymann J, Degreef J, Häuser C et al (eds) Manual on field recording techniques and protocols for all taxa biodiversity inventories, vol 8. ABC Taxa. Belgian National Focal Point for the GTI, Brussels, pp 273–307
27. Eymann J, Degreef J, Häuser C, Monje JC, Samyn Y, Van den Spiegel D (eds) (2010) Manual on field recording techniques and protocols for all taxa biodiversity inventories, vol 8. ABC Taxa. Belgian National Focal Point for the GTI, Brussels
28. Gaither M, Szabó Z, Crepeau M et al (2011) Preservation of corals in salt-saturated DMSO buffer is superior to ethanol for PCR experiments. Coral Reefs 30:329–333
29. Ivanova NV, Dewaard JR, Hebert PDN (2006) An inexpensive, automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes 6:998–1002
30. Rosengarten RD, Sperling EA, Moreno MA, Leys SP, Dellaporta SL (2008) The mitochondrial genome of the hexactinellid sponge Aphrocallistes vastus: evidence for programmed translational frameshifting. BMC Genomics 9:33
31. Sinniger F, Pawlowski J (2009) The partial mitochondrial genome of Leiopathes glaberrima (Hexacorallia: Antipatharia) and the first report of the presence of an intron in COI in black corals. Galaxea 11:21–26
32. Fukami H, Chen CA, Chiou C-Y, Knowlton N (2007) Novel group I introns encoding a putative homing endonuclease in the mitochondrial cox1 gene of Scleractinian corals. J Mol Evol 64:591–600
33. Milbury CA, Gaffney PM (2005) Complete mitochondrial DNA sequence of the eastern oyster Crassostrea virginica. Mar Biotechnol 7:697–712
34. Hajibabaei M, DeWaard JR, Ivanova NV et al (2005) Critical factors for assembling a high volume of DNA barcodes. Philos Trans R Soc Lond B Biol Sci 360:1959–1967
35. DeWaard J, Ivanova N, Hajibabaei M, Hebert P (2008) Assembling DNA barcodes. Analytical protocols. In: Martin C (ed) Methods in molecular biology. Humana, Totowa, pp 275–293
36. Bickley J, Hopkins D (1999) Inhibitors and enhancers of PCR. In: Saunders GC, Parkes HC (eds) Analytical molecular biology: quality and validation. Royal Society of Chemistry, Cambridge, UK, pp 81–102
37. Ralser M, Querfurth R, Warnatz H-J et al (2006) An efficient and economic enhancer mix for PCR. Biochem Biophys Res Comm 347:747–751
38. Hecker KH, Roux KH (1996) High and low annealing temperatures increase both specificity and yield in touchdown and stepdown PCR. Biotechniques 20:478–485
39. Siddall ME, Fontanella FM, Watson SC et al (2009) Barcoding bamboozled by bacteria: convergence to metazoan mitochondrial primer targets by marine microbes. Syst Biol 58:445–451
40. Schubart C (2009) Mitochondrial DNA and decapod phylogenies: the importance of pseudogenes and primer optimization. In: Martin JW, Crandall KA, Felder DL (eds) Decapod crustacean phylogenetics. CRC, Boca Raton, FL, pp 47–64
41. Hoareau TB, Boissin E (2010) Design of phylum-specific hybrid primers for DNA barcoding: addressing the need for efficient COI amplification in the Echinodermata. Mol Ecol Resour 10:960–967
42. Gissi C, Iannelli F, Pesole G (2008) Evolution of the mitochondrial genome of Metazoa as exemplified by comparison of congeneric species. Heredity 101:301–320
43. Chen C, Chiou CY, Dai CF, Chen CA (2008) Unique mitogenomic features in the scleractinian family Pocilloporidae (Scleractinia: Astrocoeniina). Mar Biotechnol 10:538–553
44. Rawlings TA, Collins TM, Bieler R (2003) Changing identities: tRNA duplication and remolding within animal mitochondrial genomes. Proc Natl Acad Sci USA 100:15700–15705
45. Walther E, Schofl G, Mrotzek G et al (2011) Paralogous mitochondrial control region in the giant tiger shrimp, Penaeus monodon (F.) affects population genetics inference: a cautionary tale. Mol Phylogenet Evol 58:404–408
46. Machida R, Miya M, Nishida M, Nishida S (2006) Molecular phylogeny and evolution of the pelagic copepod genus Neocalanus (Crustacea: Copepoda). Marine Biol 148:1071–1079
47. Erpenbeck D, Hooper JNA, Wörheide G (2006) CO1 phylogenies in diploblasts and the “Barcoding of Life” – are we sequencing a suboptimal partition? Mol Ecol Notes 6:550–553
48. Derycke S, Vanaverbeke J, Rigaux A et al (2010) Exploring the use of cytochrome oxidase c subunit 1 (COI) for DNA barcoding of free-living marine nematodes. PLoS One 5:e13716
49. Simon C, Buckley TR, Frati F, Stewart JB, Beckenbach AT (2006) Incorporating molecular evolution into phylogenetic analysis, and a new compilation of conserved polymerase chain reaction primers for animal mitochondrial DNA. Ann Rev Ecol Syst 37:545–579
50. Simpson R, Wilding C, Grahame J (2005) Intron analyses reveal multiple calmodulin copies in Littorina. J Mol Evol 60:505–512
51. Chenuil A, Hoareau TB, Egea E et al (2010) An efficient method to find potentially universal population genetic markers, applied to metazoans. BMC Evol Biol 10:276
52. Hwang UW, Kim W (1999) General properties and phylogenetic utilities of nuclear ribosomal DNA and mitochondrial DNA commonly used in molecular systematics. Korean J Parasitol 37:215
53. Eickbush TH, Eickbush DG (2007) Finely orchestrated movements: evolution of the ribosomal RNA genes. Genetics 175:477–485
54. Harris DJ, Crandall KA (2000) Intragenomic variation within ITS1 and ITS2 of freshwater crayfishes (Decapoda: Cambaridae): implications for phylogenetic and microsatellite studies. Mol Biol Evol 17:284
55. Derycke S, Fonseca G, Vierstraete A et al (2008) Disentangling taxonomy within the Rhabditis (Pellioditis) marina (Nematoda, Rhabditidae) species complex using molecular and morphological tools. Zool J Linn Soc 152:1–15
56. Bhadury P, Austen MC (2010) Barcoding marine nematodes: an improved set of nematode 18S rRNA primers to overcome eukaryotic co-interference. Hydrobiologia 641:245–251
57. Sonnenberg R, Nolte A (2007) An evaluation of LSU rDNA D1-D2 sequences for their use in species identification. Front Zool 4:6
58. Markmann M, Tautz D (2005) Reverse taxonomy: an approach towards determining the diversity of meiobenthic organisms based on ribosomal RNA signature sequences. Philos Trans R Soc Lond B Biol Sci 360:1917–1924
59. Cárdenas P, Rapp HT, Schander C, Tendal OS (2010) Molecular taxonomy and phylogeny of the Geodiidae (Porifera, Demospongiae, Astrophorida)–combining phylogenetic and Linnaean classification. Zoolog Scripta 39:89–106
60. McLain DK, Li J, Oliver JH (2001) Interspecific and geographical variation in the sequence of rDNA expansion segment D3 of Ixodes ticks (Acari: Ixodidae). Heredity 86:234–242
61. Benesh DP, Hasu T, Suomalainen L-R, Valtonen ET, Tiirola M (2006) Reliability of mitochondrial DNA in an acanthocephalan: the problem of pseudogenes. Int J Parasitol 36:247–254
62. Martínez-Aquino A, Reyna-Fabián ME, Rosas-Valdez R, Razo-Mendivil U, de León GP-P, García-Varela M (2009) Detecting a complex of cryptic species within Neoechinorhynchus golvani (Acanthocephala: Neoechinorhynchidae) inferred from ITSs and LSU rDNA gene sequences. Int J Parasitol 95:1040–1047
63. Steinauer ML, Nickol BB, Ortí G (2007) Cryptic speciation and patterns of phenotypic variation of a highly variable acanthocephalan parasite. Mol Ecol 16:4097–4109
64. Sikes JM, Bely AE (2008) Radical modification of the A-P axis and the evolution of asexual reproduction in Convolutriloba acoels. Evol Dev 10:619–631
65. Telford MJ, Herniou EA, Russell RB, Littlewood DT (2000) Changes in mitochondrial genetic codes as phylogenetic characters: two examples from the flatworms. Proc Natl Acad Sci USA 97:11359–11364
66. Chang C-H, Rougerie R, Chen J-H (2009) Identifying earthworms through DNA barcodes: pitfalls and promise. Pedobiologia 52:171–180
67. Aguado MT, Nygren A, Siddall ME (2007) Cladistics analysis of nuclear and mitochondrial genes. Cladistics 23:552–564
68. Carr CM (2010) The Polychaeta of Canada: exploring diversity and distribution patterns using DNA barcodes. MSc Thesis, University of Guelph, Guelph, ON
69. James SW, Porco D, Decaëns T et al (2010) DNA barcoding reveals cryptic diversity in Lumbricus terrestris L., 1758 (Clitellata): resurrection of L. herculeus (Savigny, 1826). PLoS One 5:e15629
70. Zhou H, Zhang Z, Chen H et al (2010) Integrating a DNA barcoding project with an ecological survey: a case study on temperate intertidal polychaete communities in Qingdao, China. Chin J Oceanol Limnol 28:899–910
71. Bely AE, Weisblat DA (2006) Lessons from leeches: a call for DNA barcoding in the lab. Evol Dev 8:491–501
72. Costa FO, Henzler CM, Lunt DH et al (2009) Probing marine Gammarus (Amphipoda) taxonomy with DNA barcodes. Syst Biod 7:365
73. Costa FO, DeWaard JR, Boutillier J, Ratnasingham S, Dooh RT, Hajibabaei M, Hebert PDN (2007) Biological identifications through DNA barcodes: the case of the Crustacea. Can J Fish Aquat Sci 64:272–295
74. Böttger-Schnack R, Machida RJ (2010) Comparison of morphological and molecular traits for species identification and taxonomic grouping of oncaeid copepods. Hydrobiologia 666:111–125
75. Bradford T, Adams M, Humphreys W, Austin A, Cooper S (2010) DNA barcoding of stygofauna uncovers cryptic amphipod diversity in a calcrete aquifer in Western Australia’s arid zone. Mol Ecol Resour 10:41–50
76. Goolsby JA, De Barro PJ, Makinson JR, Pemberton RW, Hartley DM, Frohlich DR (2006) Matching the origin of an invasive weed for selection of a herbivore haplotype for a biological control programme. Mol Ecol 15:287–297
77. Murienne J, Edgecombe GD, Giribet G (2010) Including secondary structure, fossils and molecular dating in the centipede tree of life. Mol Phylogenet Evol 57:301–313
78. Navajas M, Navia D (2010) DNA-based methods for eriophyoid mite studies: review, critical aspects, prospects and challenges. Exp Appl Acarol 51:257–271
79. Barrett RDH, Hebert PDN (2005) Identifying spiders through DNA barcodes. Can J Zool 83:481–491
80. Radulovici AE, Sainte-Marie B, Dufresne F (2009) DNA barcoding of marine crustaceans from the Estuary and Gulf of St Lawrence: a regional-scale approach. Mol Ecol Resour 9:181–187
81. Ros VID, Breeuwer JAJ (2007) Spider mite (Acari: Tetranychidae) mitochondrial COI phylogeny reviewed: host plant relationships, phylogeography, reproductive parasites and barcoding. Exp Appl Acarol 42:239–262
82. Hurst GDD, Jiggins FM (2005) Problems with mitochondrial DNA as a marker in population, phylogeographic and phylogenetic studies: the effects of inherited symbionts. Proc R Soc Lond B Biol Sci 272:1525–1534
83. Engelstädter J, Hurst GDD (2009) The ecology and evolution of microbes that manipulate host reproduction. Ann Rev Ecol Evol Syst 40:127–149 84. Cohen BL, Bitner MA, Harper EM et al (2011) Vicariance and convergence in Magellanic and New Zealand long-looped brachiopod clades (Pan-Brachiopoda: Terebratelloidea). Zoolog J Linn Soc 162. doi: 10.1111/j.1096-3642.2010.00682.x 85. Lüter C, Cohen B (2002) DNA sequence evidence for speciation, paraphyly and a Mesozoic dispersal of cancellothyridid articulate brachiopods. Mar Biol 141:65–74 86. Gómez A, Wright PJ, Lunt DH et al (2007) Mating trials validate the use of DNA barcoding to reveal cryptic speciation of a marine bryozoan taxon. Proc R Soc Lond B Biol Sci 274:199–207 87. Kon T, Nohara M, Nishida M et al (2006) Hidden ancient diversification in the circumtropical lancelet Asymmetron lucayanum complex. Mar Biol 149:875–883 88. Jennings RM, Bucklin A, Pierrot-Bults A (2010) Barcoding of arrow worms (Phylum Chaetognatha) from three oceans: genetic diversity and evolution within an enigmatic phylum. PLoS One 5:e9949 89. Sinniger F, Reimer JD, Pawlowski J (2008) Potential of DNA sequences to identify zoanthids (Cnidaria: Zoantharia). Zoolog Sci 25:1253–1260 90. Concepcion GT, Crepeau MW, Wagner D et al (2007) An alternative to ITS, a hypervariable, single-copy nuclear intron in corals, and its use in detecting cryptic species within the octocoral genus Carijoa. Coral Reefs 27:323–336 91. Coleman AW, van Oppen MJH (2008) Secondary structure of the rRNA ITS2 region reveals key evolutionary patterns in acroporid corals. J Mol Evol 67:389–396 92. Chiou CY, Chen IP, Chen C et al (2008) Analysis of Acropora muricata Calmodulin (CaM) indicates that scleractinian corals possess the ancestral exon/intron organization of the eumetazoan CaM gene. J Mol Evol 66:317–324 93. 
Flot J-F, Magalon H, Cruaud C et al (2008) Patterns of genetic structure among Hawaiian corals of the genus Pocillopora yield clusters of individuals that are compatible with morphology. C R Biol 331:239–247 94. Miranda LS, Collins AG, Marques AC (2010) Molecules clarify a cnidarian life cycle – the “hydrozoan” Microhydrula limopsicola is an early life stage of the Staurozoan Haliclystus antarcticus. PLoS One 5:e10182
95. Moura CJ, Harris DJ, Cunha MR, Rogers AD (2008) DNA barcoding reveals cryptic diversity in marine hydroids (Cnidaria, Hydrozoa) from coastal and deep-sea environments. Zoolog Scripta 37:93–108 96. Dawson MN (2004) Some implications of molecular phylogenetics for understanding biodiversity in jellyfishes, with emphasis on Scyphozoa. Hydrobiologia 530–531: 249–260 97. Ortman BD, Bucklin A, Pagès F, Youngbluth M (2010) DNA barcoding the Medusozoa using mtCOI. Deep Sea Res II 57: 2148–2156 98. Henderson M, Okamura B (2004) The phylogeography of salmonid proliferative kidney disease in Europe and North America. Proc R Soc Lond B Biol Sci 271:1729 99. Whipps CM, Kent ML (2006) Phylogeography of the cosmopolitan marine parasite Kudoa thyrsites (Myxozoa: Myxosporea). J Eukaryot Microbiol 53:364–373 100. Podar M, Haddock SH, Sogin ML, Harbison GR (2001) A molecular phylogenetic framework for the phylum Ctenophora using 18S rRNA genes. Mol Phylogenet Evol 21: 218–230 101. Gorokhova E, Lehtiniemi M, Viitasalo-fro S, Haddock SHD (2009) Molecular evidence for the occurrence of ctenophore Mertensia ovum in the northern Baltic Sea and implications for the status of the Mnemiopsis leidyi invasion. Limnol Oceanogr 54:2025–2033 102. Obst M, Funch P, Giribet G (2005) Hidden diversity and host specificity in cycliophorans: a phylogeographic analysis along the North Atlantic and Mediterranean Sea. Mol Ecol 14:4427–4440 103. Lessios HA (2008) The great American schism: divergence of marine organisms after the rise of the Central American Isthmus. Ann Rev Ecol Evol Syst 39:63–91 104. Fuchs J, Iseto T, Hirose M, Sundberg P, Obst M (2010) The first internal molecular phylogeny of the animal phylum Entoprocta (Kamptozoa). Mol Phyl Evol 56:370–379 105. Todaro MA, Kånneby T, Dal Zotto M, Jondelius U (2011) Phylogeny of thaumastodermatidae (gastrotricha: macrodasyida) inferred from nuclear and mitochondrial sequence data. PLoS One 6:e17892 106. 
Sørensen MV, Sterrer W, Giribet G (2006) Cladistics four molecular loci and morphology. Cladistics 22:32–58 107. Cannon JT, Rychel AL, Eccleston H, Halanych KM, Swalla BJ (2009) Molecular phylogeny of hemichordata, with updated
status of deep-sea enteropneusts. Mol Phyl Evol 52:17–24 108. Smith SE, Douglas R, Burke K, Swalla BJ (2003) Morphological and molecular identification of Saccoglossus species (Hemichordata: Harrimaniidae) in the Pacific Northwest. Can J Zool 141:133–141 109. Giribet G, Sorensen MV, Funch P et al (2004) Investigations into the phylogenetic position of Micrognathozoa using four molecular loci. Cladistics 20:1–13 110. Doucet-Beaupré H, Breton S, Chapman EG et al (2010) Mitochondrial phylogenomics of the Bivalvia (Mollusca): searching for the origin and mitogenomic correlates of doubly uniparental inheritance of mtDNA. BMC Evol Biol 10:50 111. Campbell DC, Johnson PD, Williams JD et al (2008) Identification of “extinct” freshwater mussel species using DNA barcoding. Mol Ecol Resour 8:711–724 112. Ghiselli F, Milani L, Passamonti M (2011) Strict sex-specific mtDNA segregation in the germ line of the DUI species Venerupis philippinarum (Bivalvia: Veneridae). Mol Biol Evol 28:949–961 113. Allcock AL, Barratt I, Eléaume M et al (2010) Cryptic speciation and the circumpolarity debate: a case study on endemic Southern Ocean octopuses using the COI barcode of life. Deep Sea Res II 114. Kelly RP, Sarkar IN, Eernisse DJ, Desalle R (2007) DNA barcoding using chitons (genus Mopalia). Mol Ecol Notes 7:177–183 115. Dinapoli A, Klussmann-Kolb A (2010) The long way to diversity–phylogeny and evolution of the Heterobranchia (Mollusca: Gastropoda). Mol Phylogenet Evol 55:60–76 116. Barr NB, Cook A, Elder P et al (2009) Application of a DNA barcode using the 16S rRNA gene to diagnose pest Arion species in the USA. J Molluscan Stud 75:187–191 117. Ladoukakis ED, Theologidis I, Rodakis GC, Zouros E (2011) Homologous recombination between highly diverged mitochondrial sequences: examples from maternally and paternally transmitted genomes. Mol Biol Evol 28:1–40 118. 
Breton S, Stewart DT, Shepardson S et al (2011) Novel protein genes in animal mtDNA: a new sex determination system in freshwater mussels (Bivalvia: Unionoida)? Mol Biol Evol 28:1645–1659 119. Bourlat SJ, Nakano H, Åkerman M et al (2008) Feeding ecology of Xenoturbella bocki (phylum Xenoturbellida) revealed by genetic barcoding. Mol Ecol Resour 8:18–22
120. Bhadury P, Austen MC, Bilton DT et al (2007) Exploitation of archived marine nematodes – a hot lysis DNA extraction protocol for molecular studies. Zoolog Scripta 36:93–98 121. De Ley P, De Ley IT, Morris K et al (2005) An integrated approach to fast and informative morphological vouchering of nematodes for applications in molecular barcoding. Philos Trans R Soc Lond B Biol Sci 360: 1945–1958 122. Mateos E, Giribet G (2008) Exploring the molecular diversity of terrestrial nemerteans (Hoplonemertea, Monostilifera, Acteonemertidae) in a continental landmass. Zoolog Scripta 37:235–243 123. Maslakova S, Norenburg J (2008) Revision of the smiling worms, genus Prosorhochmus Keferstein, 1862, and description of a new species, Prosorhochmus belizeanus sp. nov. (Prosorhochmidae, Hoplonemertea, Nemertea) from Florida and Belize. J Nat History 42:1219–1260 124. Sundberg P, Thuroczy Vodoti E, Strand M (2010) DNA barcoding should accompany taxonomy – the case of Cerebratulus spp (Nemertea). Mol Ecol Resour 10:274–281 125. Daniels SR, Ruhberg H (2010) Molecular and morphological variation in a South African velvet worm Peripatopsis moseleyi (Onychophora, Peripatopsidae): evidence for cryptic speciation. J Zool 282:171–179 126. Trewick SA (2000) Mitochondrial DNA sequences support allozyme evidence for cryptic radiation of New Zealand Peripatoides (Onychophora). Mol Ecol 9:269–281 127. Podsiadlowski L, Braband A, Mayer G (2008) The complete mitochondrial genome of the onychophoran Epiperipatus biolleyi reveals a unique transfer RNA set and provides further support for the ecdysozoa hypothesis. Mol Biol Evol 25:42–51 128. Santagata S, Cohen BL (2009) Phoronid phylogenetics (Brachiopoda; Phoronata): evidence from morphological cladistics, small and large subunit rDNA sequences, and mitochondrial cox1. Zool J Linn Soc 157:34–50 129. Voigt O, Collins AG, Pearse VB et al (2004) Placozoa – no longer a phylum of one. Current Biol 14:944–945 130. 
Sanna D, Lai T, Francalacci P et al (2009) Population structure of the Monocelis lineata (Proseriata, Monocelididae) species complex assessed by phylogenetic analysis of the mitochondrial Cytochrome c Oxidase subunit I (COI) gene. Gen Mol Biol 32:864–867 131. Moszczynska A, Locke SA, McLaughlin JD et al (2009) Development of primers for the
mitochondrial cytochrome c oxidase I gene in digenetic trematodes (Platyhelminthes) illustrates the challenge of barcoding parasitic helminths. Mol Ecol Resour 9:75–82 132. Zarowiecki MZ, Huyse T, Littlewood DTJ (2007) Making the most of mitochondrial genomes–markers for phylogeny, molecular ecology and barcodes in Schistosoma (Platyhelminthes: Digenea). Int J Parasitol 37:1401–1418 133. Pöppe J, Sutcliffe P, Hooper JNA et al (2010) CO I barcoding reveals new clades and radiation patterns of Indo-Pacific sponges of the family Irciniidae (Demospongiae: Dictyoceratida). PLoS One 5:e9950 134. Wang X, Lavrov DV (2008) Seventeen new complete mtDNA sequences reveal extensive mitochondrial genome evolution within the Demospongiae. PLoS One 3:e2723 135. Watanabe KI, Bessho Y, Kawasaki M, Hori H (1999) Mitochondrial genes are found on minicircle DNA molecules in the mesozoan animal Dicyema. J Mol Biol 286:645–650 136. Derry AM, Hebert PDN, Prepas EE (2003) Evolution of rotifers in saline and subsaline lakes: a molecular phylogenetic approach. Limn Oceanograph 48:675–685 137. Fontaneto D, Kaya M, Herniou EA, Barraclough TG (2009) Extreme levels of hidden diversity in microscopic animals (Rotifera) revealed by DNA taxonomy. Mol Phy Evol 53:182–189 138. Gómez A, Serra M, Carvalho GR, Lunt DH (2002) Speciation in ancient cryptic species complexes: evidence from the molecular phylogeny of Brachionus plicatilis (Rotifera). Evolution 56:1431–1444 139. Du X, Chen Z, Deng Y, Wang Q (2009) Comparative analysis of genetic diversity and population structure of Sipunculus nudus as revealed by mitochondrial COI sequences. Biochem Genet 47:884–891 140. Kawauchi GY, Giribet G (2010) Are there true cosmopolitan sipunculan worms? A genetic variation study within Phascolosoma perlucens (Sipuncula, Phascolosomatidae). Marine Biol 157:1417–1431 141. Blaxter M, Mann J, Chapman T, Thomas F, Whitton C, Floyd R, Abebe E (2005) Defining operational taxonomic units using DNA barcode data. 
Philos Trans R Soc Lond B Biol Sci 360:1935–1943 142. Sands CJ, Convey P, Linse K, McInnes SJ (2008) Assessing meiofaunal variation among individuals utilising morphological and molecular approaches: an example using the Tardigrada. BMC Ecol 8:7
143. Schill RO (2007) Comparison of different protocols for DNA preparation and PCR amplification of mitochondrial genes of tardigrades. J Limnol 66:164–170 144. Cesari M, Bertolani R, Rebecchi L, Guidetti R (2009) DNA barcoding in Tardigrada: the first case study on Macrobiotus macrocalix Bertolani & Rebecchi 1993 (Eutardigrada, Macrobiotidae). Mol Ecol Resour 9:699–706 145. Stefaniak L, Lambert G, Gittenberger A et al (2009) Genetic conspecificity of the worldwide populations of Didemnum vexillum Kott, 2002. Aquat Invasion 4:29–44 146. Nydam ML, Harrison RG (2007) Genealogical relationships within and among shallow-water Ciona species (Ascidiacea). Marine Biol 151:1839–1847 147. Bourlat SJ, Nielsen C, Lockyer AE et al (2003) Xenoturbella is a deuterostome that eats molluscs. Nature 424:925–928 148. Meyer CP (2003) Molecular systematics of cowries (Gastropoda: Cypraeidae) and diversification patterns in the tropics. Biol J Linn Soc 79:401–459 149. Kojima S, Segawa R, Hashimoto J, Ohta S (1997) Molecular phylogeny of vestimentiferans collected around Japan, revealed by the nucleotide sequences of mitochondrial DNA. Marine Biol 127:507–513 150. Prendini L (2005) Systematics of the group of African whip spiders (Chelicerata: Amblypygi): Evidence from behaviour, morphology and DNA. Organ Div Evol 5:203–236 151. Schwendinger PJ, Giribet G (2005) The systematics of the south-east Asian genus
Fangensis Rambla (Opiliones: Cyphophthalmi: Stylocellidae). Invertebr Syst 19:297–323 152. Fukami H, Budd AF, Levitan DR et al (2004) Geographic differences in species boundaries among members of the Montastraea annularis complex based on molecular and morphological markers. Evolution 58:324–337 153. Dawson MN (2005) Incipient speciation of Catostylus mosaicus (Scyphozoa, Rhizostomeae, Catostylidae), comparative phylogeography and biogeography in south-east Australia. J Biog 32:515–533 154. Martínez DE, Iñiguez AR, Percell KM et al (2010) Phylogeny and biogeography of Hydra (Cnidaria: Hydridae) using mitochondrial and nuclear DNA sequences. Mol Phyl Evol 57:403–410 155. Palumbi SR, Martin A, Romano S et al (2002) The simple fool’s guide to PCR, Version 2.0. Department of Zoology and Kewalo Marine Laboratory, Honolulu, HI 156. Apakupakul K, Siddall ME, Burreson EM (1999) Higher level relationships of leeches (Annelida: Clitellata: Euhirudinea) based on morphology and gene sequences. Mol Phylogenet Evol 12:350–359 157. Medlin L, Elwood HJ, Stickel S, Sogin ML (1988) The Characterization of enzymatically amplified eukaryotic 16S-like rRNA-coding regions. Gene 71:491–499 158. White T, Bruns T, Lee S, Taylor J (1990) Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In: Innis M, Gelfand D, Sninsky J, White T (eds) PCR Protocols: a guide to methods and applications. Academic, New York, pp 315–322
Chapter 5
DNA Barcoding Amphibians and Reptiles
Miguel Vences, Zoltán T. Nagy, Gontran Sonet, and Erik Verheyen
Abstract
Only a few major research programs are currently targeting COI barcoding of amphibians and reptiles (including chelonians and crocodiles), two major groups of tetrapods. Amphibian and reptile species are typically old, strongly divergent, and contain deep conspecific lineages, which might lead to problems in species assignment when reference databases are incomplete. As far as is known, there is no single pair of COI primers that will guarantee a sufficient rate of success across all amphibian and reptile taxa, or within major subclades of amphibians and reptiles, which means that the PCR amplification strategy needs to be adjusted depending on the specific research question. In general, many more amphibian and reptile taxa have been sequenced for 16S rDNA, which for some purposes may be a suitable complementary marker, at least until a more comprehensive COI reference database becomes available. DNA barcoding has successfully been used to identify amphibian larval stages (tadpoles) in species-rich tropical assemblages. Tissue sampling, DNA extraction, and amplification of COI are straightforward in amphibians and reptiles. Single primer pairs are likely to have a failure rate between 5 and 50% if taxa of a wide taxonomic range are targeted; in such cases the use of primer cocktails or subsequent hierarchical usage of different primer pairs is necessary. If the target group is taxonomically limited, many studies have followed a strategy of designing specific primers, which then allow an easy and reliable amplification of all samples.
Key words: Amphibia, Testudines, Crocodylia, Sphenodontia, Squamata, COI primers
1. Introduction
In contrast to numerous other taxa, especially fishes and birds among vertebrates, DNA barcoding of amphibians and reptiles is at a very early stage. Here we use the term amphibians to encompass all Lissamphibia, i.e., frogs, salamanders, and caecilians (as of February 2012, totaling 6,922 species: 6,115 frogs, 618 salamanders, and 189 caecilians) (1). Reptiles are a paraphyletic group, and we use the term here to include all nonavian extant taxa of the Testudines, Crocodylia, Sphenodontia, and Squamata (as of February 2008, 8,734 species: 313 turtles, 23 crocodiles, 2 tuataras, and 8,396 squamates) (2).
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_5, © Springer Science+Business Media, LLC 2012
Only a few DNA barcoding campaigns on reptiles have been initiated recently, e.g., DNA barcoding of the South African reptile fauna (see also the International Barcode of Life web site; www.ibol.org). To date, the number of studies and publications dedicated to DNA barcoding of reptiles in general is very limited. An exception is the small, manageable number of marine turtle species, which are of high conservation relevance and for which good progress in DNA barcoding was recently achieved (3, 4). Related to conservation biology and genetics, DNA barcoding was also recently applied to identify species targeted by bushmeat practices, among them alligators and crocodiles (5, 6). In amphibians, several test cases of COI DNA barcoding have been published (7–9), and an extensive DNA barcoding program is currently being carried out on Central and South American taxa and has already led to remarkable results (10). From our own work in progress, rich data sets on the amphibians and reptiles of Madagascar are available (taxon coverage ca. 90 and 80%, respectively), with research continuing to achieve complete taxon coverage, while ongoing field surveys will enable us to initiate similar barcoding efforts for the frogs of the Congo basin and of Cuba. Given the critical conservation status of many amphibians in particular, implementation of larger amphibian DNA barcoding programs would be very useful. Such programs would allow the distribution area and habitat use of endangered species to be delimited more efficiently, including on the basis of larvae or juveniles, which currently cannot be reliably identified. Integration of molecular assessment would help to accelerate the pace of species discovery and improve the quality of species hypotheses (11, 12).
Until 2010, the vast majority of amphibian and reptile COI sequences were not produced in the framework of the global DNA barcoding initiative but were mostly the result of phylogenetic or phylogeographic studies in which COI was used as one of the genetic markers. In addition, numerous COI sequences in GenBank originated from sequencing strategies in which a stretch containing full or partial ND1 and ND2 genes, intervening tRNAs, and only a short section (100–200 bp) of the 5′ terminus of the COI gene was obtained for phylogenetic analysis (e.g., for amphibians see refs. 13, 14). We have not considered the studies involving this fragment in the primer tables given herein. Beyond investigations on DNA barcoding and phylogeny, there is a growing number of mitogenomic studies that have yielded COI sequences. Among the ones with stronger impact or including several species are (15–21) for reptiles and (22–25) for amphibians. These studies have certainly contributed to the number of available COI sequences, but are otherwise not related to the DNA barcoding effort as such. However, the available coverage of higher taxa such as orders and families in mitogenomic studies is of crucial importance because it allows the design of primers for a variety of regions of the mitochondrial genome (26),
including targeted COI primers for particular taxonomic groups or species in which universal primers may fail.
A common theme in amphibian and reptile DNA barcoding is that there is no single pair of primers that will guarantee a sufficient rate of success across all taxa, which means that the strategy needs to be adjusted depending on the specific research question. As far as is known, there are also no primers universal within major amphibian or reptile subgroups, such as salamanders, frogs, snakes, or lizards. Our experience shows that for amplifying and sequencing large numbers of samples from a restricted taxonomic group (a single species or a complex of closely related species), it is most convenient to design specific primers. If a wide array of taxa is to be screened, either a primer cocktail or a hierarchical approach is advisable (first using one pair of universal primers, and subsequently using a different set of primers for samples that failed to amplify in the first attempt). A first compilation of mitochondrial DNA primers used in amphibians was published in 1999 (27) but included only a few COI primers. Although not comprehensive, Tables 1 and 2 give a representative overview of primers and annealing temperatures used so far in studies that involved sequencing of COI in a larger number of samples of amphibians or reptiles, respectively. The specificity of primers and the targeted fragment size vary case by case; the position of primers in the COI gene, and relative to the Folmer region (28), is shown in Figs. 1 and 2.
When barcoding amphibians and reptiles, it should be kept in mind that many species and species complexes are evolutionarily old and contain cryptic candidate species and deep conspecific lineages (refs. 7, 29; see also Note 1). This situation appears to be more commonly encountered in the tropics.
In temperate regions, on the one hand, species are better studied, so that new cryptic lineages are discovered less frequently; on the other hand, these species have often expanded from glacial refuges in the Pleistocene, so that similar mitochondrial haplotypes can be encountered over vast geographic ranges and divergences within species are less deep.
Altogether, DNA barcoding of amphibians and reptiles based on COI is not fundamentally different from that in other animal groups and holds the same promise. Specifics to be kept in mind are mainly the old age of many species and the potential presence of very deeply diverged mitochondrial lineages within species, which (a) make it necessary to have very complete COI reference databases for successful species identification and (b) accentuate the problem of primer failure in single samples even within species or species complexes.
Below we give a brief overview of laboratory methods for tissue sampling and for extracting DNA as well as amplifying and sequencing COI from amphibian and reptile specimens. These methods,
Table 1 Selection of primers used for amplifying COI (fragments) in phylogenetic or phylogeographic studies of amphibians with details on taxon specificity and PCR conditions
Primer name | Specificity/origin | Position | Sequence (5′–3′) | Direction | Annealing temperature (°C) | Primer reference | Used for | Studies
COIf | Universal | 6,047 | CCTGCAGGAGGAGGAGAYCC | F | 45; 57 | (46) | Tungara frogs (Physalaemus); dirt frogs (Craugastor) | (49, 51)
COIa | Universal | 6,707 | AGTATAAGCGTCTGGGTAGTC | R | 45; 57 | (46) | Tungara frogs (Physalaemus); dirt frogs (Craugastor) | (49, 51)
COIa2 | Universal | 6,662 | CCTGCYARYCCTARRAARTGTTGAGG | R | 45 | (50) | Tungara frogs (Physalaemus) | (48)
LCO1490 | Universal | 5,406 | GGTCAACAAATCATAAAGATATTGG | F | 50; 49–50 | (28) | Poison frogs (Oophaga); Malagasy frogs (Mantellidae) | (7, 8, 51)
HCO2198 | Universal | 6,089 | TAAACTTCAGGGTGACCAAAAAATCA | R | 50; 49–50 | (28) | Poison frogs (Oophaga); Malagasy frogs (Mantellidae) | (7, 8, 51)
VF2_t1 | Fishes | 5,392 | TGTAAAACGACGGCCAGTCAACCAACCACAAAGACATTGGCAC | F | 52, used in cocktail (tailed) | (52) | Clawed frogs (Xenopus) | (53)
FishF2_t1 | Fishes | 5,391 | TGTAAAACGACGGCCAGTCGACTAATCATAAAGATATCGGCAC | F | 52, used in cocktail (tailed) | (52) | Clawed frogs (Xenopus) | (53)
Table 1 (continued)
Primer name | Specificity/origin | Position | Sequence (5′–3′) | Direction | Annealing temperature (°C) | Primer reference | Used for | Studies
FishR2_t1 | Fishes | 6,086 | CAGGAAACAGCTATGACACTTCAGGGTGACCGAAGAATCAGAA | R | 52, used in cocktail (tailed) | (52) | Clawed frogs (Xenopus) | (53)
FR1d_t1 | Fishes | 6,086 | CAGGAAACAGCTATGACACCTCAGGGTGTCCGAARAAYCARAA | R | 52, used in cocktail (tailed) | (54) | Clawed frogs (Xenopus) | (53)
VF1-d | Fishes | 5,405 | TTCTCAACCAACCACAARGAYATYGG | F | 45 and 51 | (55) | Various frog and salamander taxa | (9)
VR1-d | Fishes | 6,089 | TAGACTTCTGGGTGGCCRAARAAYCA | R | 45 and 51 | (55) | Various frog and salamander taxa | (9)
LepF1 | Butterflies | 5,406 | ATTCAACCAATCATAAAGATATTGG | F | 45 and 51 | (56) | Various frog and salamander taxa | (9)
LepRI | Butterflies | 6,089 | TAAACTTCTGGATGTCCAAAAAATCA | R | 45 and 51 | (56) | Various frog and salamander taxa | (9)
BirdF1 | Birds | 5,408 | TTCTCCAACCACAAAGACATTGGCAC | F | 49–50 | (57) | Malagasy frogs (Mantellidae) | (8)
BirdR1 | Birds | 6,129 | ACGTGGGAGATAATTCCAAATCCTG | R | 49–50 | (57) | Malagasy frogs (Mantellidae) | (8)
BirdR2 | Birds | 6,129 | ACTACATGTGAGATGATTCCGAATCCAG | R | 49–50 | (57) | Malagasy frogs (Mantellidae) | (8)
“Desmognathus-forward” | Dusky salamanders | 5,370 | CGGCCACTTTACCYRTGATAATYACTCG | F | 52 | (58) | Dusky salamanders (Desmognathus) | (58)
“Desmognathus-reverse” | Dusky salamanders | 6,005 | GTATTAAGATTTCGGTCTGTTAGAAGTAT | R | 52 | (58) | Dusky salamanders (Desmognathus) | (58)
MVZ_201 | Arboreal salamanders (Aneides) | 5,408 | TCAACAAAYCATAAAGATATTGGCACC | F | NA | (7) | Arboreal salamanders (Aneides) | (7)
(continued)
Table 1 (continued)
Primer name | Specificity/origin | Position | Sequence (5′–3′) | Direction | Annealing temperature (°C) | Primer reference | Used for | Studies
MVZ_202 | Arboreal salamanders (Aneides) | 6,695 | GCGTCWGGGTARTCTGAATATCGTCG | R | NA | (7) | Arboreal salamanders (Aneides) | (7)
PP6 | Physalaemus | 6,302 | TCTGCAACAATAATYATYCGCAATTCCAAC | F | Internal sequencing primer | (50) | Tungara frogs (Physalaemus) | (48)
PP7 | Physalaemus | 6,302 | GTTGGAATTGCRATGATTATTGTTGCAGA | R | Internal sequencing primer | (50) | Tungara frogs (Physalaemus) | (48)
PP8 | Physalaemus | 6,467 | TCTCTAGAYATTGTATTACATGA | F | Internal sequencing primer | (50) | Tungara frogs (Physalaemus) | (48)
PP9 | Physalaemus | 6,467 | TCATGTAATACAATGTCTAGAGA | R | Internal sequencing primer | (50) | Tungara frogs (Physalaemus) | (48)
COI-1 | Fire-bellied toads | 5,412 | CAAATCACAAAGACATTGGCACCCT | F | NA | (38) | Fire-bellied toads (Bombina) | (38)
COI-2 | Fire-bellied toads | 6,503 | GATACGACATAGTGGAAGTGGGCTAC | R | NA | (38) | Fire-bellied toads (Bombina) | (38)
COI-3 | Fire-bellied toads | 6,503 | GACAGAACATAGTGGAAGTGAGCTAC | R | NA | (38) | Fire-bellied toads (Bombina) | (38)
COI-4 | Fire-bellied toads | 5,903 | CCAGCAATGTCACAATACCAAAC | F | NA | (38) | Fire-bellied toads (Bombina) | (38)
Table 1 (continued)
Primer name | Specificity/origin | Position | Sequence (5′–3′) | Direction | Annealing temperature (°C) | Primer reference | Used for | Studies
COI-5 | Fire-bellied toads | 5,984 | TGGTAATTCCTGCAGCAAGAAC | R | NA | (38) | Fire-bellied toads (Bombina) | (38)
COI-6 | Fire-bellied toads | 5,840 | GCAGGGGTGTCCTCAATTCTAG | F | NA | (38) | Fire-bellied toads (Bombina) | (38)
Cox | Australian Litoria frogs | 6,089 | TGATTCTTTGGGCATCCTGAAG | F | NA | (59) | Australian Litoria frogs | (59–61)
Coy | Australian Litoria frogs | 6,695 | GGGGTAGTCAGAATAGCGTCG | R | NA | (59) | Australian Litoria frogs | (59–61)
COI-smallF | Australian frogs (Litoria aurea) | 6,222 | TTGGCCTGCTAGGTTTTATTG | F | Step-down profile: 60, 58, 56, 54 | (37) | Australian frogs (Litoria aurea) | (37)
COI-smallR | Australian frogs (Litoria aurea) | 6,526 | CAAATACGGCCCCCATAGAT | R | Step-down profile: 60, 58, 56, 54 | (37) | Australian frogs (Litoria aurea) | (37)
KLPf | Australian frogs (Litoria) | 6,137 | AAAGAACCTTTTGGTTACATGGG | F | NA | (62) | Australian frogs (Litoria) | (62)
HmCO1 | South American hylid frogs (Dendropsophus minutus) | 5,162 | CGTCACTCAGTACCAAACCCCC | F | NA | (63) | South American hylid frogs (Dendropsophus minutus) | (63)
CO1AXen-H | South American hylid frogs (Dendropsophus minutus) | 6,707 | TGTATAAGCGTCTGGGTAGTC | R | NA | (63) | South American hylid frogs (Dendropsophus minutus) | (63)
CO1h-L | Toads (Bufonidae) | 5,908 | GGAATTATTTCCCAYGTWGTAAC | F | NA | (27) | Toads (Bufonidae) | (27)
CO1g-L | Toads (Bufonidae) | 6,176 | TTCATACGTGGTAACATTTTAGTCAAG | F | NA | (27) | Toads (Bufonidae) | (27)
Position is given relative to the complete mitochondrial genome sequence of Discoglossus galganoi (GenBank accession number: AY585339). When multiple annealing temperatures are given, they refer to alternative temperatures used in different studies for the same primer or primer combination.
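Several primers in Table 1 contain degenerate IUPAC bases (e.g., R, Y, W), and the Introduction recommends a hierarchical approach in which a second primer pair is tried when the first one fails. The following Python sketch illustrates both ideas in silico, as a rough aid for reading the table rather than as a description of any published protocol. It assumes exact matching of primers to a template, which is a simplification because real priming tolerates some mismatches. The function names, the order of the two primer pairs, and the toy template are hypothetical and serve illustration only. The primer sequences are those given in Tables 1 and 2.

```python
import re
from math import prod

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence, keeping IUPAC ambiguity codes."""
    return seq.translate(_COMPLEMENT)[::-1]


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer)


def primer_regex(primer: str) -> str:
    """Regular expression matching every sequence the primer can stand for."""
    return "".join(f"[{IUPAC[base]}]" for base in primer)


def amplicon_length(template: str, fwd: str, rev: str):
    """Length from forward site to reverse site, or None if either site is absent.

    Assumption: exact matches only; no mismatch tolerance is modelled.
    """
    f = re.search(primer_regex(fwd), template)
    if f is None:
        return None
    downstream = template[f.start():]
    r = re.search(primer_regex(reverse_complement(rev)), downstream)
    return None if r is None else r.end()


def first_working_pair(template: str, pairs):
    """Hierarchical screening: return the first pair with both sites present."""
    for name, fwd, rev in pairs:
        length = amplicon_length(template, fwd, rev)
        if length is not None:
            return name, length
    return None


# Primer pairs from the tables; the order tried here is an arbitrary assumption.
PRIMER_PAIRS = [
    ("LCO1490/HCO2198", "GGTCAACAAATCATAAAGATATTGG", "TAAACTTCAGGGTGACCAAAAAATCA"),
    ("VF1-d/VR1-d", "TTCTCAACCAACCACAARGAYATYGG", "TAGACTTCTGGGTGGCCRAARAAYCA"),
]

# Hypothetical toy template (not a real COI sequence): it carries one concrete
# variant of the VF1-d and VR1-d sites but no LCO1490 site.
TOY_TEMPLATE = (
    "ACGT" * 3
    + "TTCTCAACCAACCACAAAGACATTGG"
    + "ACGT" * 25
    + reverse_complement("TAGACTTCTGGGTGGCCAAAGAATCA")
    + "TTTT"
)

if __name__ == "__main__":
    print(degeneracy("TTCTCAACCAACCACAARGAYATYGG"))
    print(first_working_pair(TOY_TEMPLATE, PRIMER_PAIRS))
```

In this toy case the first pair finds no binding site and the second pair is used, which mirrors the fallback logic of the hierarchical approach. The same helper functions also show that the reverse complement of PP9 is one of the variants encoded by the degenerate primer PP8, consistent with both being listed at the same position in Table 1.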
Table 2 Primers used for amplifying COI (fragments) in phylogenetic or phylogeographic studies of reptiles with details on taxon specificity and PCR conditions
Primer name | Specificity | Position | Sequence (5′–3′) | Direction | Annealing temperature (°C) | Primer reference | Used for | Studies
LCO1490 | Universal | 5,262 | GGTCAACAAATCATAAAGATATTGG | F | 42–45 | (28) | Lizard, turtle, gecko | (62–64)
HCO2198 | Universal | 5,921 | TAAACTTCAGGGTGACCAAAAAATCA | R | 42–45 | (28) | Lizard, turtle, gecko | (62–64)
C1-J-1718 | Universal | 5,466 | GGAGGATTTGGAAATTGATTAGTTCC | F | 42 | (67) | Lizard | (66)
C1-J-2191 | Universal | 5,939 | CCCGGTAAAATTAAAATATAAACTTC | R | 42 | (67) | Lizard | (66)
CO1a | Vertebrata | 6,539 | AGTATAAGCGTCTGGGTAGTC | R | 45–58 | (45) | Turtle, tortoise, iguana, skink, crocodile | (68–73)
CO1f | Vertebrata | 5,898 | CCTGCAGGAGGAGGAGAT(orY)CC | F | 45–58 | (45) | Turtle, tortoise, iguana, skink, crocodile | (68–73)
COIcXen | Vertebrata | 5,787 | TCGTTTGATCAGTATTAATCAC | F | 47 | (45) | Anolis | (74)
COIfXen | Vertebrata | 5,307 | CCTGCCGGAGGAGGTGACCC | F | 47 | (45) | Anolis | (74)
COIaXen | Vertebrata | 6,539 | TGTATAAGCGTCTGGGTAGTC | R | 47 | (45) | Anolis | (74)
COIeXen | Vertebrata | 6,398 | CCAGTAAATAACGGGAATCAGTG | R | 47 | (45) | Anolis | (74)
Table 2 (continued)
Primer name | Specificity | Position | Sequence (5′–3′) | Direction | Annealing temperature (°C) | Primer reference | Used for | Studies
VF2_t1 | Vertebrata (COI-3 cocktail) | 5,265 | [M13F]CAACCAACCACAAAGACATTGGCAC | F | 51.1 5×, then 56.9 30× | (54) | Crocodile | (5)
FishF2_t1 | Vertebrata (COI-3 cocktail) | 5,265 | [M13F]CGACTAATCATAAAGATATCGGCAC | F | 51.1 5×, then 56.9 30× | (54) | Crocodile | (5)
FishR2_t1 | Vertebrata (COI-3 cocktail) | 5,918 | [M13R]ACTTCAGGGTGACCGAAGAATCAGAA | R | 51.1 5×, then 56.9 30× | (54) | Crocodile | (5)
FR1d_t1 | Vertebrata (COI-3 cocktail) | 5,918 | [M13R]ACCTCAGGGTGTCCGAARAAYCARAA | R | 51.1 5×, then 56.9 30× | (54) | Crocodile | (5)
M13F (221) | Universal (COI-3 cocktail) | NA | TGTAAAACGACGGCCAGT | F | 51.1 5×, then 56.9 30× | (54) | Crocodile | (5)
M13R (227) | Universal (COI-3 cocktail) | NA | CAGGAAACAGCTATGAC | R | 51.1 5×, then 56.9 30× | (54) | Crocodile | (5)
VF1 | Vertebrata | 5,262 | TTCTCAACCAACCACAAAGACATTGG | F | NA | (55) | Boelen’s python | (76)
VR1 | Vertebrata | 5,921 | TAGACTTCTGGGTGGCCAAAGAATCA | R | 52 | (55) | Boelen’s python, watersnake | (75, 76)
RepCOI-F | Squamata | 5,256 | TNTTMTCAACNAACCACAAAGA | F | 48.5 | (77) | Squamata | (77)
RepCOI-R | Squamata | 5,921 | ACTTCTGGRTGKCCAAARAATCA | R | 48.5 | (77) | Squamata | (77)
M72 | Testudines | 5,946 | TGATTCTTCGGTCACCCAGAAGTGTA | F | 48 or 55 | (69) | Side-necked turtle | (69)
(continued)
Table 2 (continued)
Primer name | Specificity | Position | Sequence (5′–3′) | Direction | Annealing temperature (°C) | Primer reference | Used for | Studies
M73 | Testudines | 6,342 | CCTATTGATAGGACGTAGTGGAAGTG | R | 48 or 55 | (69) | Side-necked turtle | (69)
L-330COI | Testudines | 5,564 | TACTTTTACTCCTAGCCTCCTCAG | F | 50–54 | (78) | Yunnan box turtle | (78)
H-610COI | Testudines | 5,843 | GTATTTAGGTTTCGGTCAGTGAG | R | 50–54 | (78) | Yunnan box turtle | (78)
H-715COI | Testudines | 5,946 | GCCAAATCCTGGTAAGATTAAGAT | R | 50–54 | (78) | Yunnan box turtle | (78)
L-turtCOIc | Testudines | 5,234 | TACCTGTGATTTTAACCCGTTGAT | F | 56 | (79) | Yunnan box turtle | (78)
H-turtCOIc | Testudines | 6,066 | TGGTGGGCTCATACAATAAAGC | R | 56 | (79) | Yunnan box turtle | (78)
L-turtCOI | Testudines | 5,968 | ACTCAGCCATCTTACCTGTGATT | F | 56–58 | (79) | Turtle | (79)
H-turtCOI | Testudines | 6,059 | CCCATACGATGAAGCCTAAGAA | R | 56–58 | (79) | Yunnan box turtle | (78)
H-turtCOIb | Testudines | 6,119 | GTTGCAGATGTAAAATAGGCTCG | R | 56–58 | (79) | Yunnan box turtle | (78)
L-COIint | Testudines | 5,792 | TGATCAGTACTTATCACAGCCG | F | For sequencing only | (79) | Yunnan box turtle, anoles | (74, 78)
H-COIint | Testudines | 5,634 | TAGTTAGGTCTACAGAGGCGC | R | For sequencing only | (79) | Yunnan box turtle | (78)
CoxIH2
COIf-ot1
COIr-ot2
COIf-ot2
COIr-ot1
L7354
H7794
rTrp–1L
rCOI−1H
LCOI5973
HCOI6576
LCOI5982
HCOI6570
NA
Crocodylia
Crocodylia
Crocodylia
Crocodylia
Crocodylia
Squamata
Squamata
Squamata
Squamata
Squamata
Squamata
Squamata
Squamata
Serpentes
5,222
5,864
5,317
5,921
5,262
6,332
4,879
6,365
5,925
5,871
5,654
5,595
5,891
6,042
R
F
F
R
F
R
R
F
Direction
TCAGCCATACTACCTG TGTTCA
TGCTGGGTCGAAGAA GGTNGT
GGTATAACCGGAACA GCCCTNAGY
TAAACTTCAGGGTGA CCAAAAAATCA
GGTCAACAAATCATAAA GATATTGG
F
R
F
R
F
TAGTGGAARTGKGCTACTAC R
TAAACCARGRGCCTTCAAAG F
ATAATGGCAAATACTGCCCC
TACCAACACCTATTCTGATT
CGAAACYTAAACACTACCTT
CAGCAAGATGAAGGG AGAAGAT
CGCCGGTACAGGATGAAC
TTGGTATAGRATTGGA TCYCC
CCTAAGAAGCCAATTG ATATTATGC
GGCTACTGCCACTAA TAATCGC
CoxIL2
Crocodylia
5,478
Primer name Position Sequence (5¢–3¢)
Specificity
52
50
50
46–50
46–50
48 5×, 58 35×
48 5×, 58 35×
47–55
47–55
NA
(65)
(65)
(65)
(65)
(15)
(15)
(68)
(68)
50–46 touchdown (6)
50–46 touchdown (6)
50–46 touchdown (6)
Snake
Gecko
Gecko
Gecko
Gecko
Gecko, Komodo dragon
Gecko, Komodo dragon
Iguana, lizard
Iguana, lizard
Dwarf crocodile
Dwarf crocodile
Dwarf crocodile
Dwarf crocodile
50–46 touchdown (6)
Dwarf crocodile
Used for
Dwarf crocodile
(6)
Primer reference
(6)
50
50
Annealing temperature (°C)
DNA Barcoding Amphibians and Reptiles
(continued)
(75)
(65)
(65)
(65)
(65)
(15, 81)
(15, 81)
(68, 80)
(68, 80)
(6)
(6)
(6)
(6)
(6)
(6)
Studies
5 89
COI(−)bdeg
COI(+)b
Serpentes
Serpentes
TAAATAATATAAGCTTCT GACTGCTACCACC
ATTATTGTTGCYGCT GTRAARTAGGCTCG F
R
F
Direction
56.5
56.5–65
56.5–65
Annealing temperature (°C)
(83)
(82)
(82)
Primer reference
Snake
Snake
Snake
Used for
(83)
(82)
(82)
Studies
Position is given relative to the complete mitochondrial genome of Furcifer oustaleti (GenBank accession number: NC_008777). When multiple annealing temperatures are given it refers to alternative temperatures used in different studies for the same primer or primer combination
5,535
6,119
AAGCTTCTGACTNCTA CCACCNGC
COI(+)deg1
Serpentes
5,538
Primer name Position Sequence (5¢–3¢)
Specificity
Table 2 (continued)
90 M. Vences et al.
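As an illustration of the notation used in Table 2, the following Python sketch expands the [M13F] and [M13R] tags into the M13 tail sequences listed in the table and counts how many distinct oligonucleotides a degenerate primer represents under the standard IUPAC ambiguity codes. The function names are hypothetical and are not part of any published tool.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# M13 tail sequences as listed for M13F and M13R in Table 2.
TAILS = {
    "[M13F]": "TGTAAAACGACGGCCAGT",
    "[M13R]": "CAGGAAACAGCTATGAC",
}


def full_sequence(primer: str) -> str:
    """Replace a leading [M13F]/[M13R] tag by its tail and drop spaces."""
    seq = primer.replace(" ", "")
    for tag, tail in TAILS.items():
        if seq.startswith(tag):
            return tail + seq[len(tag):]
    return seq


def degeneracy(seq: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in seq:
        count *= len(IUPAC[base])
    return count


def expand(seq: str) -> list:
    """List every non-degenerate sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]
```

For example, RepCOI-R (ACTTCTGGRTGKCCAAARAATCA) contains three twofold-degenerate positions, so `degeneracy` returns 8 for it.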
Fig. 1. Some primers used to amplify COI in amphibians sorted according to their specificity (for details, see Table 1). Black triangles represent forward primers and empty squares reverse primers. Numbers on the axis refer to the position on the complete mitochondrial genome of Discoglossus galganoi (GenBank accession number: AY585339). The Folmer region is marked on the axis.
however, are straightforward and similar to those established in other vertebrates (see Note 2). We also provide an overview of selected primers that have thus far been used to amplify COI from amphibians and reptiles, and which should be helpful to design amplification strategies in future DNA barcoding studies targeting these animals. To obtain this compilation of primers, we focused on studies where the COI gene as a molecular genetic marker was targeted, in particular, the standard animal barcoding region, the so-called Folmer region (28).
2. Materials
2.1. DNA Extraction and Preservation (See Note 3)
1. For routine DNA barcoding, we recommend a salt extraction protocol.
2. Extraction buffer: 0.01 M Tris–HCl (pH 8.0), 0.1 M NaCl, 0.01 M EDTA (pH 8.0) in dH2O.
Fig. 2. Some primers used to amplify COI in reptiles sorted according to their specificity (for details, see Table 2). Black triangles represent forward primers and empty squares reverse primers. Numbers on the axis refer to the position on the complete mitochondrial genome of Furcifer oustaleti (GenBank accession number: NC_008777). The Folmer region is marked on the axis.
3. Proteinase K: 4.5 U/ml solution, typically corresponding to 0.15 mg/ml.
4. SDS: 10% solution in dH2O (10 g SDS in 100 ml dH2O).
5. NaCl solution: 5 M NaCl in dH2O.
6. Isopropanol: absolute; cooled in a freezer.
7. Ethanol: 80% solution.
8. DNA elution buffer: 100 mM Tris–HCl pH 8.0.
9. DNA extracts can be preserved at (ultra-)low temperature or in dry storage systems at room temperature (e.g., GenVault or GenTegra systems, both GenVault Corp.).
2.2. PCR Amplification and Sequencing (See Note 4)
1. dH2O.
2. Reaction buffer (5× or 10×, supplied with Taq polymerase by the respective manufacturer).
3. dNTPs: 10 mM.
4. Forward primer: 10 µM.
5. Reverse primer: 10 µM.
6. Taq polymerase: 5 U/µl.
7. DNA.
8. SAP (Shrimp Alkaline Phosphatase): 10 U/µl (Promega).
9. EXO-nuclease I: 10 U/µl (NEB).
10. BigDye Terminator v1.1 (Applied Biosystems by Life Technologies) diluted 5×.
11. ABI sequencing buffer: supplied as 5× buffer with BigDye.
12. Sequencing primer: 0.4 µM.
2.3. Collection and Hygiene
1. Sodium hypochlorite: 5% solution, diluted to 1% as needed.
3. Methods
3.1. Sample Collection
For amphibians and reptiles, a broad spectrum of tissue samples can be used for the DNA extraction, amplification, and sequencing steps of the DNA barcoding process. Because of the large number of undescribed species in species-rich tropical regions (refs. 29, 30 for amphibians), it is advisable to integrate any DNA barcoding study on tropical amphibians and reptiles with adequate collection of representative voucher specimens for subsequent taxonomic study. In some cases, however, killing the animals and depositing them in public collections is undesirable or impossible; for instance, in critically endangered species where populations are so small that removing even single individuals poses a serious further threat (e.g., in some tortoises), or when collection permits for whole specimens could not be obtained. In such cases, blood samples (most reptiles of medium to large body size), tail tips (e.g., lizards and snakes), toe clips (frogs and salamanders), fin clips (aquatic salamanders and tadpoles), and scale clips (snakes) can be taken. In moderate- to large-sized specimens, buccal saliva swabs give good results. However, in small-sized specimens, especially amphibians, it needs to be considered that introducing such a swab into the mouth is probably more stressful and harmful to the animal than the superficially more invasive method of toe clipping. Swabs can be air-dried and then stored, but in humid environments, storage in pure ethanol is preferable. A further alternative source of DNA is exuviae (shed skin, especially of snakes), but if these are encountered in the wild, the identity of the specimen will not always be ascertainable. For dead animals, including museum specimens that have not been fixed in formalin, liver, heart, or muscle tissues are optimal sources of genomic, and especially mitochondrial, DNA. In general, ethanol-fixed and ethanol-preserved museum specimens of amphibians and reptiles can be suitable for DNA extraction and barcoding if they are up to 10–20 years old. Older specimens might also yield DNA, but the success rate becomes very low, and it is usually strongly advisable to collect fresh samples. Dry specimens (like tortoise shells or nontreated skins) typically still have tiny remnants of dried original soft tissue attached, and these might yield appropriate genomic DNA even from older specimens.
Cut a small piece (at least 100 mg) of tissue or take blood or buccal swab samples, and make sure that the preservation method of your choice maintains high-quality genomic DNA over long periods. For a commented overview of the common collection, sampling, and preservation methods, see ref. 31. Most tissue samples can be kept in absolute ethanol and/or frozen (preferably at ultra-low temperature) for longer periods. Proper and permanent marking as well as easy traceability of the samples is of central importance.
3.2. Tadpole Sampling
Based on recent large-scale studies in Madagascar (32), we propose the following work protocol for sampling (see Note 5):
1. In the field, collect tadpoles from a water body exhaustively (if semiquantitative data are needed, follow a particular plot or transect sampling design).
2. Kill the collected tadpoles using a chlorobutanol or MS222 solution. Using a magnifying glass or field stereo microscope, sort them into morphospecies according to easily recognizable external characters (size, color, oral disk, and body shape). Make 1–2 duplicate series of each morphospecies to have a better chance of sampling cryptic species.
3. From each series, select one representative specimen, take an obvious tissue sample (e.g., a triangular piece of the tail muscle), and store it in pure ethanol. Fix and preserve the sampled specimen with the rest of the series in an equal mixture of 4% buffered formalin and 70% ethanol (formalin fixation is important for subsequent morphological study).
4. After species identification through DNA barcoding, the morphology of the tadpole is studied on the basis of the barcoded voucher specimen (identifiable because of the tissue sampling). The remaining specimens of a series (whose identification was not ascertained) are used to describe variation.
5. When collecting tadpoles is not possible or desired, the same protocol can be carried out with subsequent release of the specimens. However, sorting living tadpoles into morphospecies can be difficult. The DNA voucher specimen that will be sampled for tissue (a clip of the dorsal or terminal part of the caudal fin) should be photographed. Little experience with anesthetizing tadpoles is available, but it should be possible with an appropriate dosage of MS222, as is common practice with fishes.
3.3. Documentation (See Note 6)
1. Whenever samples are collected, documentation of color in life should be a crucial component of each study. Digital photos should be taken of both the dorsal and ventral sides. If voucher specimens are collected, the photos will later document their color in life, since many amphibians and lizards (like colorful geckos or chameleons) lose their color within hours in preservative (and sometimes in life while being handled).
2. If voucher specimens are not collected, photographic documentation becomes even more important and should include lateral, dorsal, and ventral photos as well as close-ups (e.g., of the head and the ventral sides of hands and feet), to make sure that diagnostic characters will be recognizable and allow subsequent verification of the species identification of barcoded specimens.
3. In frogs, vocalizations are essential to identify and describe cryptic species, and whenever possible advertisement calls should be documented. Because in species-rich tropical communities many frog species often call from the same sites, it is essential to observe the calling specimen during emission of the calls, i.e., to observe the movement of the vocal sac. If this can be achieved, this “call voucher” specimen should be collected separately and documented photographically, and its call recording and tissue sample should be marked unambiguously.
4. Often it is preferable to obtain and barcode one or a few well-documented specimens with linked bioacoustic data than to collect large series of specimens without any further biological information.
3.4. Hygiene Protocols (See Notes 7 and 8)
Due to the spread of amphibian diseases causing drastic declines (33), the application of strict hygiene protocols is advisable in many cases, especially when researchers move among different sites.
1. Amphibians can be handled using bare hands as long as the handler washes their hands between amphibians in water to which the animals would normally be exposed; this will ensure that the risks to frogs of exposure are not increased above environmental levels.
2. If no water is available for washing hands between amphibians, and in situations where disease transmission could be an issue, the handler should wear unused disposable gloves, or wear an unused plastic bag, or wipe their hands with a sterilizing alcohol-based hand disinfectant between amphibians.
3. If amphibians are held in a container prior to return to the wild, the container should not previously have been used for holding other amphibians or, if previously used, should be disinfected prior to use.
4. Surgical instruments, such as scissors used for toe tip clipping, should be sterilized between amphibians by chemical disinfection with either 70% ethanol or 1 mg/L Virkon (DuPont) for 1 min (both alternatives are effective against chytrid and ranaviruses, see also refs. 34, 35).
5. Nonsurgical equipment used in a stream or water body should be disinfected prior to use in any other water body. Footwear should be washed to remove any mud and disinfected prior to being used in a separate water catchment or in a water body isolated from the initial one. For disinfection, the easiest options are bleach (sodium hypochlorite; 1% for 1 min is effective against chytrid; 4% for 15 min is effective against chytrid and ranaviruses), complete drying for at least 3 h or heating to 37°C for 4 h (only chytrid), or heating to 60°C for 15 min (chytrid and ranaviruses). Clothes will be appropriately disinfected if washed at 60°C for at least 15 min.
3.5. Sampling to Test for Infection with Batrachochytrium dendrobatidis
The following protocol was developed (34) to allow field biologists to nondestructively sample amphibians in the field to test for infection with Batrachochytrium dendrobatidis. It was published in AmphibiaWeb (1) by Vredenburg and Briggs, and is reproduced here in a slightly condensed format.
1. Use the same swabs as described below for buccal swabbing of amphibians and reptiles (Medical Wire and Equipment, catalog number MW113) and, as vials, sterilized screw cap 1.5 ml microcentrifuge tubes.
2. Preferably, capture amphibians by hand. Wear gloves when swabbing animals and change gloves between animals. If you are using a dip net, be aware that B. dendrobatidis zoospores could be caught on the net and transferred between individuals; therefore, use different nets whenever possible or disinfect the net as often as you can (there is no perfect solution to this problem).
3. Swab the underside or ventrum of adults/metamorphs 30 times. Remember that you are in effect scraping small amounts of tissue from the skin. Some pressure must be applied, but this does not mean that you must squash the animal. Areas to target are the drink patch, thighs, and webbing between the toes.
4. Air dry the swab for approximately 5 min; avoid direct sunlight if possible (if conditions are too humid to air dry, store in 95% EtOH).
5. Break the swab ~3 cm from the tip and drop it into an empty screw cap tube. The swab stick should not touch or bump against the top of the vial. Screw the cap on the vial and store it in the shade.
6. Samples can be kept at room temperature for a week or maybe longer, but it is best to keep the samples cool and placed as soon as possible in a 4°C freezer. Avoid extreme high temperature and direct sunlight. Samples may be stored in a freezer for many months without problems.
3.6. Tissue Sampling and DNA Extraction After (36) (See Also Notes 9 and 10)
Tissue sampling for DNA barcoding of amphibians and reptiles is similar to any sampling procedure for genetic analysis, and the protocol given here is one example of such procedures (see Note 3). Use appropriate dissecting tools such as a scalpel, scissors, pins, or forceps to remove a solid tissue sample from the specimen. Buccal saliva swabs of larger specimens can be collected with small sterile swabs, such as those available from Medical Wire and Equipment, http://mwe-usa.com/mwe/mwe.php, product code MW113. Use bleach or a flame to sterilize your tools before each sampling. Storage vials labeled with a unique laser-etched 2D barcode (e.g., ABgene 2D Barcoded 2 ml Screw Cap Tubes, Thermo Scientific) can be used to securely identify and track samples.
1. Cut a small piece of tissue of about 1 mm³ from the original sample and put it into a clean 1.5-ml microtube. Add 410 µl extraction buffer (see Subheading 2.1), 80 µl 10% SDS, and 10 µl proteinase K solution.
2. Incubate the sample at 37°C in the thermo mixer (up to 65°C to accelerate the lysis process) until the tissue is completely dissolved. Depending on the fixation of the tissue, this may take between 3 and 12 h (see Note 10). Then vortex the sample.
3. Centrifuge the tubes with dissolved tissues for 5 min at maximum speed (typically 8,000–15,000 × g), then transfer the supernatant into a new 1.5 ml tube and add 180 µl NaCl (5 M). Vortex for at least 30 s.
4. Centrifuge for 5 min at maximum speed. Transfer the supernatant quickly into a clean 1.5 ml tube and quickly add 420 µl cold isopropanol (mix by gently inverting the tube several times). Centrifuge for 5 min at maximum speed to pellet the DNA, then discard the supernatant. The DNA pellet (which may not be visible) is now at the bottom of the tube. Wash the pellet by adding 250 µl 80% EtOH, mix by gently inverting the tube several times, and centrifuge for 5 min at full speed.
5. Repeat the EtOH washing from step 4 once.
6. Carefully remove all alcohol and let the pellet dry for 20 min in a thermoblock at 37°C, with the lids of the microcentrifuge tubes open.
7. Rehydrate the DNA by adding 100 µl ddH2O (or DNA elution buffer), and leave it overnight at room temperature. Freeze the DNA at –20°C.
8. Quantity and purity of the extracted DNA can be evaluated with a spectrophotometer (e.g., NanoDrop ND-1000 spectrophotometer, NanoDrop Technologies), for which only a small amount (1–2 µl) of the DNA extract is needed. Potential degradation of the DNA can be visualized with agarose gel electrophoresis (3–5 µl DNA solution is loaded on a 1.2% agarose gel and stained with GelRed or ethidium bromide).
3.7. PCR and Sequencing of the COI Fragment (See Notes 11 and 12)
1. For each 10 µl PCR mix, combine the following: 6.44 µl dH2O, 2.0 µl 5× reaction buffer, 0.2 µl 10 mM dNTPs, 0.24 µl each of 10 µM forward and reverse primer, 0.08 µl Taq, and 0.8 µl DNA.
2. The typical PCR profile consists of an initial denaturation of the DNA at 94°C for 3–5 min; 35–40 cycles of denaturation at 94°C for 30–45 s, annealing for 30–45 s, and extension at 72°C for 60–90 s; and a final extension at 72°C for 7 min.
3. The annealing temperature depends on the primers and can vary among the cycles if a touchdown protocol is used (Table 1, see also Note 12). Some authors have cloned the obtained PCR products before sequencing, but usually direct amplification, and direct sequencing with the same primers, is sufficient and is the routine method applied in most cases.
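The recipe in step 1 scales linearly with the number of reactions. The following Python sketch assumes that the listed volumes are microlitres and prepares a master mix without template DNA, which is added to each tube separately. The pipetting overage is a hypothetical parameter and not part of the protocol.

```python
# Per-reaction volumes in microlitres for one 10 ul PCR (step 1 above).
PER_REACTION_UL = {
    "dH2O": 6.44,
    "5x reaction buffer": 2.0,
    "dNTPs (10 mM)": 0.2,
    "forward primer (10 uM)": 0.24,
    "reverse primer (10 uM)": 0.24,
    "Taq polymerase": 0.08,
    "template DNA": 0.8,
}


def master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Return master mix volumes (ul) for n_reactions, excluding template DNA.

    overage is a hypothetical allowance for pipetting loss (0.10 = 10%).
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "template DNA"
    }
```

Calling `master_mix(96)` gives the volumes for a full plate with 10% overage; 9.2 µl of the mix is then dispensed per well before 0.8 µl of DNA is added.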
3.8. Purification of the PCR Products
Purification can be performed rapidly and cost-efficiently on a special 96-well plate using the NucleoFast 96 PCR Plate (Macherey-Nagel) and the NucleoVac 96 Vacuum manifold (Macherey-Nagel). Alternatively, a very fast and cost-effective purification can be performed using the EXO-SAP method, which uses shrimp alkaline phosphatase and EXO-nuclease I (e.g., available through Promega). The following is a recipe to generate 6.5 µl cleaned product. If more or less product is needed, volumes need to be adjusted accordingly (e.g., if only 0.5–2.5 µl of the product are needed per sequencing reaction, halve the recipe).
1. Per sample, use 1.125 U SAP (Shrimp Alkaline Phosphatase, Promega), 1.2 U EXO-nuclease I (NEB), and dH2O to make up 1.5 µl.
2. Make a master mix for all PCR products you want to purify (or more, to prepare a stock).
3. Fill 5 µl per PCR product into 0.2 ml (strip) tubes.
4. Add 1.5 µl Exo-SAP master mix to each tube with 5 µl PCR product.
5. Centrifuge briefly.
6. Place the tubes in a thermocycler: 15 min at 37°C followed by 15 min at 80°C (to inactivate the enzymes).
7. The samples are now purified and ready for use as sequencing reaction template.
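The Exo-SAP master mix in steps 1 and 2 can be calculated from the enzyme units per sample and the stock concentrations. The following Python sketch assumes stock concentrations of 10 U/µl for both enzymes (our reading of Subheading 2.2, to be checked against the tube labels); the function name is hypothetical.

```python
def exo_sap_master_mix(n_samples: int,
                       sap_stock_u_per_ul: float = 10.0,   # assumed stock
                       exo_stock_u_per_ul: float = 10.0,   # assumed stock
                       sap_units: float = 1.125,           # U per sample
                       exo_units: float = 1.2,             # U per sample
                       mix_volume_ul: float = 1.5) -> dict:
    """Return volumes (ul) of SAP, EXO-nuclease I and water for n_samples."""
    sap_ul = sap_units / sap_stock_u_per_ul
    exo_ul = exo_units / exo_stock_u_per_ul
    water_ul = mix_volume_ul - sap_ul - exo_ul
    if water_ul < 0:
        raise ValueError("enzyme stocks too dilute for the requested mix volume")
    per_sample = {"SAP": sap_ul, "EXO-nuclease I": exo_ul, "dH2O": water_ul}
    return {name: volume * n_samples for name, volume in per_sample.items()}
```

With the assumed stocks, each 1.5 µl portion of the mix contains 0.1125 µl SAP and 0.12 µl EXO-nuclease I, which is why preparing a master mix for many samples is more practical than pipetting per tube.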
3.9. Usage of the 16S rRNA Gene as Complementary Barcoding Marker (See Note 13)
COI has been well established as the universal DNA barcoding marker, and we endorse completing the reference database of this gene in amphibians and reptiles. As a complement, and as an alternative for specific barcoding applications, a specific fragment of the mitochondrial 16S rRNA gene has been proposed (7, 8). This gene fragment has three advantages that can be decisive in some cases: (a) the existing reference database for amphibians and reptiles is much more taxon-rich; (b) while for COI species-specific primers often need to be developed (e.g., refs. 37, 38), truly universal primers exist for 16S; and (c) because the phylogenetic signal in the gene is stronger, even with incomplete taxon sampling in the reference database, assignment of species to higher clades (families, genera, and subgenera) is more reliable.
The most comprehensive and taxon-rich studies published by various research teams to reconstruct the amphibian tree of life have been based at least partly on mitochondrial genes (39–44). However, they did not include COI as a target gene but instead included fragments or complete sequences of the 16S rRNA and 12S rRNA genes. The total number of amphibian 16S sequences in GenBank is therefore considerably higher than that of COI sequences, although COI datasets have recently been produced for several taxon-rich studies (10). The situation is similar for reptiles, although 12S and 16S are less universally used in taxon-rich phylogenetic studies, which often employ ND1, ND2, ND4, and cytochrome b. In general, many more amphibian and reptile taxa have been sequenced for 16S rRNA, which for some purposes (such as identification of bushmeat that might originate from amphibians and reptiles or from other vertebrates) may be a suitable complementary marker, at least until a more comprehensive COI reference database becomes available.
Several universal primers for the 12S and 16S rRNA genes exist (e.g., refs. 45, 46), of which one pair has been used with particular frequency in amphibian and, to a lesser extent, reptile studies (e.g., refs. 10, 29, 32, 47), and has been recommended for DNA barcoding (7, 8). The primers 16SrA-L (5′-CGC CTG TTT ATC AAA AAC AT-3′) and 16SrB-H (5′-CCG GTC TGA ACT CAG ATC ACG T-3′), which are often named 16SA-L and 16SB-H, amplify a 500–550 bp stretch of the 3′ portion of 16S reliably across vertebrates, from fishes to mammals, with low failure rates in most taxa. An annealing temperature of 55°C is recommended, but the primer pair will work with a large array of PCR protocols. In amphibians, observed failure rates were 1.3% (10) and 0% (8). This short fragment of 16S furthermore performs better in reconstructing relationships among deep lineages of vertebrates (8). It is therefore more reliable for assigning species that are not represented in the reference database to the correct major lineages. However, as the COI reference database becomes more complete, this disadvantage of COI will disappear in due time.
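Whether the 16SrA-L/16SrB-H pair would be expected to amplify from a given mitochondrial sequence, and with which product length, can be checked in silico before ordering primers. The following Python sketch uses exact string matching only, so it will miss binding sites that contain mismatches; the function names are hypothetical, and the template is assumed to be given in the same orientation as the forward primer.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primer sequences as given in the text (spaces removed).
PRIMER_16SRA_L = "CGCCTGTTTATCAAAAACAT"      # forward
PRIMER_16SRB_H = "CCGGTCTGAACTCAGATCACGT"    # reverse


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str,
                    forward: str = PRIMER_16SRA_L,
                    reverse: str = PRIMER_16SRB_H) -> Optional[int]:
    """Length of the expected product including both primers, or None."""
    template = template.upper()
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer binds the opposite strand, so its reverse
    # complement is searched for downstream of the forward primer.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start
```

A returned value of `None` only means that no perfect match was found; it does not show that the PCR would fail.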
100
M. Vences et al.
4. Notes 1. Especially in amphibians, the age of species in many genera will easily be of several million years, and even conspecific lineages are often highly divergent. On the one hand, this can lead to failure even of specific primers when previously unknown deep lineages are sampled. On the other hand, it also implies that successful species assignment of sequences will strongly depend on the completeness of the reference database (7, 8, 47). 2. One PCR protocol for COI in amphibians (48) also included cutting of bands from agarose gels and resuspending these for a second round of PCR. 3. Extraction of total genomic DNA from amphibian and reptile samples does not require specific methods. Any other DNA extraction protocol used in other vertebrates (e.g., fish and birds) may be suitable as well. A variety of methods (salt, phenol–chloroform, and Chelex) have been successfully used to extract DNA from amphibian and reptile tissue samples. Alternatively, it is possible to use spin columns containing silica membranes that are able to reversibly bind DNA by differential affinity. This technique is fast, easy, and provides a clean product that is well suitable for further applications like PCR and sequencing. Commercial kits such as the NucleoSpin® Tissue kit (Macherey-Nagel) or the DNeasy® Blood and Tissue kit (Qiagen) provide all components for the lysis of the tissue samples and the subsequent extraction of genomic DNA. DNA extraction protocols using spin columns with silica membranes typically include four steps: tissue lysis, DNA precipitation and binding to the silica membrane, washing, and elution of the isolated DNA (for a more detailed protocol, see the respective manufacturer’s protocol). 4. Typically, a PCR mix (e.g., total volume of 25 ml) consists in (final concentration): Taq DNA polymerase (0.03 U/ml), PCR buffer 1× containing MgCl2 (1.5 mM), dNTPs (0.2 mM each), forward and reverse primers (0.2 mM each), and DNA (ca. 
20–80 ng) and ultrapure water (dH2O; ad 25 ml). For routine amplification and large series, reaction volumes can be decreased to 10 ml. 5. The larvae of frogs (tadpoles) are often encountered in large quantities and densities in water bodies, especially in speciesrich tropical frog communities. For many frog species, the tadpoles are unknown and have not been morphologically described. Furthermore, tadpoles of related species can be externally similar to each other while the unknown amount of variability through phenotypic plasticity will complicate matters even more. Field identification of tadpoles is therefore often
5
DNA Barcoding Amphibians and Reptiles
101
difficult and unreliable. We therefore suggest using DNA barcoding as a strategy to identify tadpoles to species. Because of the high numbers of individuals that are usually collected, DNA barcoding of each individual is often not possible. 6. We reiterate that proper documentation (including photo documentation of the specimens) of the whole DNA barcoding process is of central importance (see Meyer et al. on LIMS, this volume). 7. In general, during field trips that successively lead to various geographically separated areas, care should be taken to avoid spreading diseases of amphibians and reptiles, and of course also of other animals, among sites and populations. In general, all equipment and clothing should be reasonably cleaned to avoid introducing pathogens as well as seeds of any nonnative organisms, especially invasive plants or insects. In amphibians, these general precautionary measures need to be intensified because of the worldwide spreading of an emergent infectious disease, chytridiomycosis, caused by the amphibian chytrid fungus, Batrachochytrium dendrobatidis. This pathogen has already led to dramatic declines and extinctions (33), and researchers working with amphibians must follow strict safety protocols to avoid its further spreading. We here summarize (a) in Section 3.4 a general hygiene protocol for amphibian studies that will avoid the spread of chytrid fungus as well as other amphibian pathogens and (b) in Section 3.5 a protocol to collect samples from amphibian specimens in the wild which subsequently can be tested for chytrid infection using molecular methods. For the latter, typically real-time PCR (34) rather than COI barcoding is used, and we will not further describe the lab methods here. 8. The information and recommendations to avoid spreading of amphibian diseases, especially chytridiomycosis and ranaviruses, among amphibian populations are excerpts, with few modifications, from a hygiene protocol developed by Speare et al. (35). 
Handling of amphibians should be done in a manner that does not significantly increase their risks of exposure to infectious disease above those normally experienced in the absence of handling. People handling amphibians should not be expected to reduce risks below the natural level for those amphibians. However, current data do not indicate that scientific activities have so far played a significant role in the transmission of chytridiomycosis or other pathogens of amphibians in the wild in Australia or any other country. There is no evidence that the amphibian chytrid fungus or other pathogens of amphibians have been transmitted between water catchments by vehicles, footwear, or clothing.
102
M. Vences et al.
As the amphibian chytrid fungus is extremely sensitive to temperatures above 29°C and will die at 32°C, it will not grow on human skin. Ranaviruses, the other major pathogen of amphibians, also show sensitivity to temperature, being unable to grow above 33°C. Complete drying will kill the amphibian chytrid fungus, but will not kill ranaviruses. The greatest risk of transmission of infectious agents is when amphibians are placed together in contact or in the same container or in containers reused for holding amphibians without disinfection between specimens. The practical recommendations in section 3.4 above are especially relevant when traveling between localities or even between countries. 9. The DNA extraction protocol presented here is robust against variation of numerous parameters, such as the amount of tissue (which can be distinctly less or at least up to five times more than recommended) or the relative centrifugal force. If final DNA concentration is too high, it can be a source of PCR problems; in this case, try to dilute DNA by 1–3 orders of magnitude. 10. For DNA extraction of samples that are very small (less than 5 mg) and/or old (more than 10 years) and/or mainly consisting of hard tissue (e.g., scales of reptiles), the lysis period could be prolonged (up to 48 h) and the elution volume reduced (ca. 70 ml). For older specimens (dry samples like turtle or tortoise shells or nontreated skins), a dedicated ancient DNA lab and verification of the results at least in one additional lab is strongly advised. Protocols for extracting, amplifying, and sequencing true ancient DNA, not restricted to amphibians and reptiles, are a burgeoning field of study with numerous new and recent developments, and will not be discussed here. 11. 
So far, no truly universal COI primers for either amphibians or reptiles have been proposed, and due to the deep divergences among species and higher taxa in these groups, it is unlikely that such primers can be ever developed (7, 8). In amphibians, failure rates of 50–70% of individual primer pairs have been observed (8) which however were not designed for amphibians. By contrast, our own experiences with Malagasy squamate reptiles yielded rather low failure rates of about 15% for a single— newly designed and degenerated—primer pair (RepCOI-F, RepCOI-R; see Table 2). In general, failure rates can be strongly reduced by using primer cocktails or by using a hierarchical approach in which first all samples are amplified with the most promising primer pair, and the ones that failed are subsequently amplified with the next alternative primer pair, and so on, for at least 3–4 primer pairs if the samples are important. 12. The most comprehensive COI barcoding study in amphibians to date (10), with 300 individuals corresponding to 63 named species and various undescribed candidate species, used a mix
of primers and reported a failure rate of 7.7% for the COI primers. To overcome the problem of failure or poor performance of universal primers, several authors of phylogeographic studies have followed a strategy to design species-specific or genus-specific primers, either on the basis of previous sequences obtained by more universal primers or on available full mitochondrial genome sequences (e.g., refs. 37, 38). This approach could also be useful in DNA barcoding studies that target limited numbers of closely related species. 13. One possible major shortcoming of DNA barcoding with the 16S gene and universal primers might be a particular sensitivity of the PCR to primer or template DNA degradation and other artifacts. In several labs, we have observed that the universal primer pair suggested here, although of high reliability, can suddenly switch to producing PCRs with unspecific fragment lengths (visible as smear rather than sharp bands on agarose gels). This problem often persisted despite usage of newly synthesized primers and fresh PCR reagents and might relate to degradation of DNA (of the primers or of the template DNA) in which case this primer pair might be particularly prone to amplify fragments of different length. In some cases, this problem can be alleviated by strongly diluting the template DNA. Although we have occasionally encountered similar problems with protein-coding genes or more specific 16S primers, it occurred more frequently with universal primers. In forensic studies, such as the analysis of bushmeat, which might originate from an amphibian or reptile species but could also be from a mammal, bird, or fish, it probably is more advisable to use a primer cocktail for COI and complement this with a second PCR using the universal vertebrate primers for 16S (see below), given that very large comparative data are also available for this mitochondrial fragment for basically all vertebrate groups. 14. 
While this book was in press, two new pairs of COI primers specifically aimed at amphibians were published (84). These primers, Chmf4 (TYT CWA CWA AYC AYA AAG AYA TCG G) / Chmr4 (ACY TCR GGR TGR CCR AAR AAT CA), and COI-C02 (AYT CAA CAA ATC ATA AAG ATA TTG G) / COI-C04 (ACY TCR GGR TGA CCA AAA AAT CA), will likely facilitate DNA barcoding campaigns in amphibians, as they had a high success rate in many disparate taxonomic groups of amphibians.
Acknowledgment We thank the Belgian Science Policy Office for supporting the Joint Experimental Molecular Unit.
References 1. AmphibiaWeb (2010) Information on amphibian biology and conservation [Web application]. AmphibiaWeb, Berkeley, CA. http://www.amphibiaweb.org/. Accessed 1 Feb 2012 2. Uetz P, Goll J, Hallermann J (2010) The reptile database [web application]. http://www.reptile-database.org/. Accessed 1 Nov 2010 3. Vargas SM, Araújo FCF, Santos FR (2009) DNA barcoding of Brazilian sea turtles (Testudines). Genet Mol Biol 32:608–612 4. Naro-Maciel E, Reid B, Fitzsimmons NN et al (2010) DNA barcodes for globally threatened marine turtles: a registry approach to documenting biodiversity. Mol Ecol Res 10:252–263 5. Eaton MJ, Meyers GL, Kolokotronis SO et al (2010) Barcoding bushmeat: molecular identification of Central African and South American harvested vertebrates. Conserv Genet 11:1389–1404 6. Eaton MJ, Martin A, Thorbjarnarson J, Amato G (2009) Species-level diversification of African dwarf crocodiles (Genus Osteolaemus). A geographic and phylogenetic perspective. Mol Phylogenet Evol 50:496–506 7. Vences M, Thomas M, Bonett RM, Vieites DR (2005) Deciphering amphibian diversity through DNA barcoding: chances and challenges. Phil Trans R Soc Lond 360:1859–1868 8. Vences M, Thomas M, van der Meijden A et al (2005) Comparative performance of the 16S rRNA gene in DNA barcoding of amphibians. Front Zool 2:5 9. Smith MA, Poyarkov NA Jr, Hebert PDN (2008) CO1 DNA barcoding amphibians: take the chance, meet the challenge. Mol Ecol Res 8:235–246 10. Crawford AJ, Lips KR, Bermingham E (2010) Epidemic disease decimates amphibian abundance, species diversity, and evolutionary history in the highlands of central Panama. Proc Natl Acad Sci USA 107:13777–13782 11. Padial JM, de la Riva I (2007) Integrative taxonomists should use and produce DNA barcodes. Zootaxa 1586:67–68 12. Padial JM, Miralles A, de la Riva I, Vences M (2010) The integrative future of taxonomy. Front Zool 7:e16 13.
Kozak KH, Larson A, Bonett RM, Harmon LJ (2005) Phylogenetic analysis of ecomorphological divergence, community structure, and diversification rates in dusky salamanders, Desmognathus. Evolution 59:2000–2016 14. Weisrock DW, Papenfuss TJ, Macey JR et al (2006) A molecular assessment of phylogenetic relationships and lineage accumulation rates within the family Salamandridae (Amphibia, Caudata). Mol Phylogenet Evol 41:368–383 15. Kumazawa Y, Endo H (2004) Mitochondrial genome of the Komodo dragon: efficient sequencing method with reptile-oriented primers and novel gene rearrangements. DNA Res 11:115–125 16. Kumazawa Y (2004) Mitochondrial DNA sequences of five squamates: phylogenetic affiliation of snakes. DNA Res 11:137–144 17. Dong S, Kumazawa Y (2005) Complete mitochondrial DNA sequences of six snakes: phylogenetic relationships and molecular evolution of genomic features. J Mol Evol 61:12–22 18. Parham JF, Feldman CR, Boore JL (2006) The complete mitochondrial genome of the enigmatic bigheaded turtle (Platysternon): description of unusual genomic features and the reconciliation of phylogenetic hypotheses based on mitochondrial and nuclear DNA. BMC Evol Biol 6:11 19. Macey JR, Kuehl JV, Larson A et al (2008) Socotra Island the forgotten fragment of Gondwana: unmasking chameleon lizard history with complete mitochondrial genomic data. Mol Phylogenet Evol 49:1015–1018 20. Okajima Y, Kumazawa Y (2009) Mitogenomic perspectives into iguanid phylogeny and biogeography: Gondwanan vicariance for the origin of Madagascan oplurines. Gene 441:28–35 21. Okajima Y, Kumazawa Y (2010) Mitochondrial genomes of acrodont lizards: timing of gene rearrangements and phylogenetic and biogeographic implications. BMC Evol Biol 10:141 22. Mueller RL, Macey JR, Jaekel M et al (2004) Morphological homoplasy, life history evolution, and historical biogeography of plethodontid salamanders inferred from complete mitochondrial genomes. Proc Natl Acad Sci USA 101:13820–13825 23. San Mauro D, Gower DJ, Oommen OV et al (2004) Phylogeny of caecilian amphibians (Gymnophiona) based on complete mitochondrial genomes and nuclear RAG1. Mol Phylogenet Evol 33:413–427 24. Zhang P, Papenfuss TJ, Wake MH et al (2008) Phylogeny and biogeography of the family Salamandridae (Amphibia: Caudata) inferred from complete mitochondrial genomes. Mol Phylogenet Evol 49:586–597 25. Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH (2005) Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol 54:391–400
26. Kurabayashi A, Sumida M (2009) PCR primers for the Neobatrachian mitochondrial genome. Curr Herpetol 28:1–11 27. Goebel AM, Donnelly J, Atz M (1999) PCR primers and amplification methods for the 12S ribosomal DNA, cytochrome oxidase I, cytochrome b, the control region in bufonids and other frogs and an overview of PCR primers available for analyses of amphibians. Mol Phylogenet Evol 11:163–199 28. Folmer OM, Black W, Hoeh R et al (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol 3:294–299 29. Vieites DR, Wollenberg KC, Andreone F et al (2009) Vast underestimation of Madagascar’s biodiversity evidenced by an integrative amphibian inventory. Proc Natl Acad Sci USA 106:8267–8272 30. Fouquet A, Gilles A, Vences M et al (2007) Underestimation of species richness in Neotropical frogs revealed by mtDNA analyses. PLoS One 2:e1109 31. Nagy ZT (2010) A hands-on overview of tissue preservation methods for molecular genetic analyses. Org Divers Evol 10:91–105 32. Strauss A, Reeve E, Randrianiaina RD et al (2010) The world’s richest tadpole communities show functional redundancy and low functional diversity: ecological data on Madagascar’s stream-dwelling amphibian larvae. BMC Ecol 10:12 33. Wake DB, Vredenburg VT (2008) Are we in the midst of the sixth mass extinction? A view from the world of amphibians. Proc Natl Acad Sci USA 105:11466–11473 34. Boyle DG, Boyle DB, Olsen V et al (2004) Rapid quantitative detection of chytridiomycosis (Batrachochytrium dendrobatidis) in amphibian samples using real-time Taqman PCR assay. Dis Aquat Organ 60:141–148 35. Speare RL, Berger LF, Skerratt RA et al (2004) Hygiene protocol for handling amphibians in field studies. Amphibian Disease Group, James Cook University, Townsville, Australia. http://www.jcu.edu.au/school/phtm/PHTM/frogs/field-hygiene.pdf 36.
Bruford MW, Hanotte O, Brookfield JFY, Burke T (1992) Single-locus and multilocus DNA fingerprint. In: Hoelzel AR (ed) Molecular genetic analysis of populations: a practical approach. IRL Press, Oxford, pp 225–270 37. Burns EL, Eldridge MDB, Crayn DM, Houlden BA (2007) Low phylogeographic structure in a widespread endangered Australian frog Litoria aurea (Anura: Hylidae). Cons Genet 8:17–32
38. Zheng Y, Fu J, Li S (2009) Toward understanding the distribution of Laurasian frogs: a test of Savage’s biogeographical hypothesis using the genus Bombina. Mol Phylogenet Evol 52:70–83 39. Darst CR, Cannatella DC (2004) Novel relationships among hyloid frogs inferred from 12S and 16S mitochondrial DNA sequences. Mol Phylogenet Evol 31:462–475 40. Frost DR, Grant T, Faivovich J et al (2006) The amphibian tree of life. Bull Am Mus Nat Hist 297:1–370 41. Heinicke MP, Duellman WE, Hedges SB (2007) Major Caribbean and Central American frog faunas originated by ancient oceanic dispersal. Proc Natl Acad Sci USA 104:10092–10097 42. Wollenberg KC, Vieites DR, van der Meijden A et al (2008) Patterns of endemism and species richness in Malagasy cophyline frogs support a key role of mountainous areas for speciation. Evolution 62:1890–1907 43. Santos JC, Coloma LA, Summers K et al (2009) Amazonian amphibian diversity is primarily derived from late Miocene Andean ancestors. PLoS Biol 7:1–14 44. Van Bocxlaer I, Loader SP, Roelants K et al (2010) Gradual adaptation toward a range-expansion phenotype initiated the global radiation of toads. Science 327:679–682 45. Palumbi S, Martin A, Romano S et al (1991) The simple fool’s guide to PCR. Department of Zoology, University of Hawaii, Hawaii 46. Kessing B, Croom H, Martin A et al (1989) The simple fool’s guide to PCR. Department of Zoology, University of Hawaii, Hawaii 47. Nielsen R, Matz M (2006) Statistical approaches for DNA barcoding. Syst Biol 55:162–169 48. Weigt LA, Crawford AJ, Rand AS, Ryan MJ (2005) Biogeography of the túngara frog, Physalaemus pustulosus: a molecular perspective. Mol Ecol 14:3857–3876 49. Crawford AJ, Bermingham E, Polanía SC (2007) The role of tropical dry forest as a long-term barrier to dispersal: a comparative phylogeographical analysis of dry forest tolerant and intolerant frogs. Mol Ecol 16:4789–4807 50.
Cannatella DC, Hillis DM, Chippindale PT et al (1998) Phylogeny of frogs of the Physalaemus pustulosus species group, with an examination of data incongruence. Syst Biol 47:311–335 51. Wang IJ, Shaffer HB (2008) Rapid color evolution in an aposematic species: a phylogenetic analysis of color variation in the strikingly polymorphic strawberry poison-dart frog. Evolution 62:2742–2759
52. Ward RD, Zemlak TS, Innes BH et al (2005) DNA barcoding Australia’s fish species. Phil Trans R Soc Lond B 360:1847–1857 53. Du Preez LH, Kunene N, Hanner R et al (2009) Population-specific incidence of testicular ovarian follicles in Xenopus laevis from South Africa: a potential issue in endocrine testing. Aquat Toxicol 95:10–16 54. Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN (2007) Universal primer cocktails for fish DNA barcoding. Mol Ecol Notes 7:544–548 55. Ivanova NV, deWaard JR, Hebert PDN (2006) An inexpensive automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes 6:998–1002 56. Hebert PDN, Penton EH, Burns JM et al (2004) Ten species in one: DNA barcoding reveals cryptic species in the Neotropical skipper butterfly Astraptes fulgerator. Proc Natl Acad Sci USA 101:14812–14817 57. Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004) Identification of birds through COI DNA barcodes. PLoS Biol 2:1–7 58. Beamer DA, Lamb T (2008) Dusky salamanders (Desmognathus, Plethodontidae) from the Coastal Plain: multiple independent lineages and their bearing on the molecular phylogeny of the genus. Mol Phylogenet Evol 47:143–153 59. Schneider CJ, Cunningham M, Moritz C (1998) Comparative phylogeography and the history of endemic vertebrates in the Wet Tropics rainforests of Australia. Mol Ecol 7:487–498 60. James CH, Moritz C (2000) Intraspecific phylogeography in the sedge frog Litoria fallax (Hylidae) indicates pre-Pleistocene vicariance of an open forest species from eastern Australia. Mol Ecol 9:349–358 61. Hoskin CJ, Higgie MA, McDonald KR, Moritz C (2005) Reinforcement drives rapid allopatric speciation. Nature 437:1353–1356 62. McGuigan K, McDonald K, Parris K, Moritz C (1998) Mitochondrial DNA diversity and historical biogeography of a wet forest restricted frog (Litoria pearsoniana) from mid-east Australia. Mol Ecol 7:175–186 63. 
Hawkins MA, Sites JW Jr, Noonan BP (2007) Dendropsophus minutus (Anura: Hylidae) of the Guiana Shield using DNA barcodes to assess identity and diversity. Zootaxa 1540:61–67 64. Feldman CR, Parham JF (2004) Molecular systematics of Old World stripe-necked turtles (Testudines: Mauremys). As Herp Res 1:28–37 65. Kasapidis P, Magoulas A, Mylonas M, Zouros E (2005) The phylogeography of the gecko Cyrtopodion kotschyi (Reptilia: Gekkonidae) in the Aegean archipelago. Mol Phylogenet Evol 35:612–623
66. Kyriazi P, Poulakakis N, Parmakelis A et al (2008) Mitochondrial DNA reveals the genealogical history of the snake-eyed lizards (Ophisops elegans and O. occidentalis) (Sauria: Lacertidae). Mol Phylogenet Evol 49:795–805 67. Simon C, Frati F, Beckenbach A et al (1994) Evolution, weighting and phylogenetic utility of mitochondrial gene sequences and a compilation of polymerase chain reaction primers. Ann Entomol Soc Am 87:651–701 68. Frost DR, Crafts HM, Fitzgerald LA, Titus TA (1998) Geographic variation, species recognition, and molecular evolution of cytochrome oxidase I in the Tropidurus spinulosus complex (Iguania: Tropiduridae). Copeia 1998:839–851 69. Georges A, Birrell J, Saint KM et al (1998) A phylogeny for side-necked turtles (Pleurodira) based on mitochondrial gene sequence variation. Biol J Linn Soc 67:213–246 70. Daniels SR, Heideman NJL, Hendricks MGJ, Crandall KA (2006) Taxonomic subdivisions within the fossorial skink subfamily Acontinae (Squamata: Scincidae) reconsidered: a multilocus perspective. Zool Scripta 35:353–362 71. Daniels SR, Hofmeyr MD, Henen BT, Crandall KA (2007) Living with the genetic signature of Miocene induced change: evidence from the phylogeographic structure of the endemic angulate tortoise Chersina angulata. Mol Phylogenet Evol 45:915–926 72. Daniels SR, Heideman NJL, Hendricks MGJ (2009) Examination of evolutionary relationships in the Cape fossorial skink species complex (Acontinae: Acontias meleagris meleagris) reveals the presence of five cryptic lineages. Zool Scripta 38:449–463 73. Venegas-Anaya M, Crawford AJ, Escobedo Galván AH et al (2008) Mitochondrial DNA phylogeography of Caiman crocodilus in Mesoamerica and South America. J Exp Zool 309A:614–627 74. Stenson AG, Thorpe RS, Malhotra A (2004) Evolutionary differentiation of bimaculatus group anoles based on analyses of mtDNA and microsatellite data. Mol Phylogenet Evol 32:1–10 75. 
Makowsky R, Marshall JC Jr, McVay J et al (2010) Phylogeographic analysis and environmental niche modeling of the plain-bellied watersnake (Nerodia erythrogaster) reveals low levels of genetic and ecological differentiation. Mol Phylogenet Evol 55:985–995 76. Austin CC, Spataro M, Peterson S et al (2010) Conservation genetics of Boelen’s python (Morelia boeleni) from New Guinea: reduced genetic diversity and divergence of captive and wild animals. Cons Genet 11:889–896 77. Nagy ZT, Sonet G, Glaw F, Vences M (2012) First large-scale DNA barcoding assessment of
reptiles in the biodiversity hotspot of Madagascar, based on newly designed COI primers. PLoS ONE 7:e34506 78. Parham JF, Stuart BL, Bour R, Fritz U (2004) Evolutionary distinctiveness of the extinct Yunnan box turtle (Cuora yunnanensis) revealed by DNA from an old museum specimen. Proc R Soc Lond B 271:S391–S394 79. Stuart BL, Parham JF (2004) Molecular phylogeny of the critically endangered Indochinese box turtle (Cuora galbinifrons). Mol Phylogenet Evol 31:164–177 80. Passoni JC, Benozzati ML, Rodrigues MT (2008) Phylogeny, species limits, and biogeography of the Brazilian lizards of the genus Eurolophosaurus (Squamata: Tropiduridae) as inferred from mitochondrial DNA sequences. Mol Phylogenet Evol 46:403–414
81. Zhou K, Wang Q (2008) New species of Gekko (Squamata: Sauria: Gekkonidae) from China: morphological and molecular evidence. Zootaxa 1778:59–68 82. Schätti B, Utiger U (2001) Hemerophis, a new genus for Zamenis socotrae Günther, and a contribution to the phylogeny of Old World racers, whip snakes and related genera (Reptilia: Squamata: Colubrinae). Rev Suisse Zool 108:919–948 83. Utiger U, Helfenberger N, Schätti B et al (2002) Molecular systematics and phylogeny of Old and New World ratsnakes, Elaphe auct., and related genera (Reptilia, Squamata, Colubridae). Russ J Herpetol 9:105–124 84. Che J, Chen HM, Yang JX et al (2012) Universal COI primers for DNA barcoding amphibians. Mol Ecol Resour 12:247–258
Chapter 6
DNA Barcoding Fishes
Lee A. Weigt, Amy C. Driskell, Carole C. Baldwin, and Andrea Ormos
Abstract
This chapter is an overview of the techniques for DNA barcoding of fishes, from field collection to DNA sequence analysis. Recommendations for modifications of field protocols and best tissue sampling practices are made. A variety of DNA extraction protocols is provided, including high-throughput robot-assisted methods. A pair of well-tested forward and reverse primers for PCR amplification and sequencing is presented. These primers have been successfully used for DNA barcoding on a wide array of marine fish taxa and also work well in most freshwater and cartilaginous fishes. Recipes and cycling protocols for both PCR amplification and sequencing, as well as cleanup methods for the reaction products, are provided. A method for the consistent production of high-quality DNA barcodes from DNA sequence data is given, and stringent guidelines for judging the quality of raw sequence data are laid out.
Key words: DNA barcoding, Fish, PCR, DNA extraction
1. Introduction
Fishes are the largest and most diverse class of vertebrates and fortunately among the easiest groups for which to generate DNA barcode data. Estimates of species numbers generally exceed 30,000 (1), with about 300 new species described each year. Commercial fisheries value estimates exceed US$200 billion, and efforts in our lab have contributed to standardized protocols for fish barcoding (2, 3), other standardized field surveys and bioassessments (4–6), and testing standardized storage methods and ethanol-recycling instrumentation for contaminant carryover (7, 8). Eggs, larvae, and juvenile fish can be difficult to identify to species using morphology alone, and we have promoted DNA barcoding as a tool to assist all these efforts (9–11). We find that the correct taxonomy can be illuminated in complex and morphologically challenging species groups by combining comparative investigation of living color patterns, examination of traditional
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_6, © Springer Science+Business Media, LLC 2012
morphological characters in preserved specimens, and DNA barcodes (12, 13). Efforts in our lab have focused on bony marine fish, but we have used these methods on both freshwater and cartilaginous fish, though there are other larger barcoding campaigns for both groups (14). We have focused on full-length barcode sequences, but, based on the success of others (15), will be starting “mini-barcode” work shortly on major collections of scientific importance collected 50+ years ago. We have found high-resolution digital images of freshly captured specimens to be extremely useful in elucidating species in taxonomically challenging species groups, and each of our specimens is photographed as soon as possible after collection and before preservation. Though it is possible to work with many types of fish tissue including scales, fin clips, blood, buccal swabs, bones, and others (16, 17), our preferred starting material is a small muscle biopsy from a fresh specimen. Since we photograph the fish (and subsequently examine much of the morphology) from the left side (fish “looking” to the left, in lateral view), we remove the biopsy from the right side and try to minimize damage to any important morphological characters. DNA extraction protocols vary by lab, but with fresh tissue from fish, almost all extraction protocols will yield amplifiable DNA. Significant value can be added to a specimen by performing archival quality DNA extractions, though these are more expensive than some alternative extraction methods (e.g., Chelex) that do not produce extracts viable for a great length of time. Some newer extraction protocols have emerged from the Guelph Barcoding lab (18), but as they are newer, the extracts have not been tested for longevity in archival biobanks. The “gold standard” for DNA extractions from animals is the phenol:chloroform method (19), but if performed manually, it has some critical drawbacks, including handling and disposing of hazardous chemicals. 
We present here our primary method—an automated version of the phenol–chloroform protocol that reduces the problematic factors. However, as not everyone will be able to afford the instrument on which this protocol is performed, we also present a manual method and other alternatives. These include a smaller instrument which utilizes a magnetic bead-based approach and a common filter membrane-based kit. Amplification of fish DNA via PCR is rarely problematic, and follows standard protocols. The primers are very robust, amplifying almost all taxa tested to date. The same is true for DNA sequencing. Finally, as much of fish biodiversity is known, we have been able to generate a reference sequence. This reference (with degenerate bases to represent variation among fish taxa) eases the alignment and analysis of fish data. We list specific brands used in our lab, but acceptable equivalents usually exist, for both consumables and many instruments.
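The idea of a reference sequence that uses degenerate bases to represent variation among taxa can be illustrated with a short sketch. The Python below is an assumption-laden illustration, not the procedure used to build the actual reference: it collapses each column of a pre-aligned set of sequences into the IUPAC code for the bases observed there. The function name `degenerate_consensus` and the toy alignment are made up for this example.

```python
# Map each set of observed bases to its standard IUPAC ambiguity code.
IUPAC_BY_SET = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def degenerate_consensus(aligned):
    """Collapse aligned sequences into one sequence with IUPAC codes (hypothetical helper).

    Characters other than A, C, G, T (gaps, existing ambiguity codes) are
    ignored; a column with no unambiguous base is reported as N.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*(seq.upper() for seq in aligned)):
        observed = frozenset(base for base in column if base in "ACGT")
        consensus.append(IUPAC_BY_SET.get(observed, "N"))
    return "".join(consensus)

# Toy alignment invented for illustration; these are not real fish barcodes.
toy_alignment = [
    "TCAACCAATCAT",
    "TCAACTAATCAC",
    "TCAACCAACCAT",
]

if __name__ == "__main__":
    print(degenerate_consensus(toy_alignment))
```

A reference of this kind matches any of the observed variants at each position, which is why it can ease alignment of new sequences; a frequency threshold for including rare variants would be a natural refinement but is not shown here.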
2. Materials
Materials common to all laboratory steps include latex/nitrile gloves, as well as pipettes with disposable tips, both filtered and nonfiltered. In addition, a centrifuge with a rotor that can accommodate a microtiter plate and is capable of speeds greater than 1,000 rcf, and a thermal cycler, are required for many steps.
2.1. Sample Collection
1. Photo documentation: Digital camera, scale bar, color scale. 2. Tissue sampling: Scalpels, tweezers, tubes (Matrix-brand 2D labeled or other Matrix tubes or 2.0-ml cryovials), handheld barcode scanner, data-recording materials (computer with spreadsheet when possible), bleach or Decon ELIMINase, alcohol burner.
2.2. Tissue Storage
1. Tissue preservation buffer (21): 0.25 M EDTA, 25% DMSO, saturated with NaCl—500 ml DMSO, 1 L 0.5 M EDTA pH 8.0, 500 ml water, >200 g NaCl. Stir solution while adding salt; continue adding salt until no more goes into solution and it begins to collect on the bottom of the mixing vessel. 2. 95% Ethanol (see Note 1).
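The stated final concentrations of the preservation buffer follow from simple dilution arithmetic on the listed volumes. The sketch below, a convenience illustration rather than part of the protocol, checks that arithmetic and scales the recipe to another batch size. It ignores any volume change caused by dissolving NaCl, and the helper names are hypothetical.

```python
# Volumes from the recipe in item 1 (millilitres).
DMSO_ML = 500.0
EDTA_STOCK_ML = 1000.0      # 0.5 M EDTA, pH 8.0
WATER_ML = 500.0
NACL_MIN_G = 200.0          # recipe says ">200 g", added until saturation
EDTA_STOCK_MOLAR = 0.5

TOTAL_ML = DMSO_ML + EDTA_STOCK_ML + WATER_ML   # assumes volumes are additive

def final_concentration(stock_conc, stock_ml, total_ml):
    """C1 * V1 / V2 dilution formula (hypothetical helper)."""
    return stock_conc * stock_ml / total_ml

def scale_recipe(target_total_ml):
    """Scale the recipe to a different batch volume (hypothetical helper)."""
    factor = target_total_ml / TOTAL_ML
    return {
        "DMSO_ml": DMSO_ML * factor,
        "EDTA_0.5M_ml": EDTA_STOCK_ML * factor,
        "water_ml": WATER_ML * factor,
        "NaCl_min_g": NACL_MIN_G * factor,   # still add salt until saturated
    }

edta_final_molar = final_concentration(EDTA_STOCK_MOLAR, EDTA_STOCK_ML, TOTAL_ML)
dmso_final_percent = final_concentration(100.0, DMSO_ML, TOTAL_ML)

if __name__ == "__main__":
    print(f"EDTA: {edta_final_molar} M, DMSO: {dmso_final_percent}% (v/v)")
    print(scale_recipe(500.0))
```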
2.3. DNA Extraction: Automated and Manual Extractions
1. Autogen Prep 965 DNA extraction: Autogen 965 robot, and kit buffers including M1, M2, R3, R4, R5, R6, R7, R8, R9, and proteinase-K for use with animal extractions; 96-well deep well plates (Costar #3960). 2. Automated extractions using Qiagen BioSprint96: Biosprint robot, and BioSprint 96 DNA Blood Kit (940057), Buffer ATL (Qiagen 19076), and proteinase-K. 3. For both automated extraction methods: AxyMat silicone lids (Axygen) for 96-well digestion blocks, plexiglass (or other firm solid material) rectangles cut to fit the tops of the 96-well blocks, and 0.2% Tween® 20: 200 µl Tween® 20 in 100 ml H2O. 4. Manual extraction lysis buffer: 100 mM EDTA, 25 mM Tris pH 7.5, and 1% SDS. 5. 100 mg/ml proteinase-K dissolved in water. 6. Phenol, equilibrated to pH 7.5 with Tris–HCl pH 8.0. 7. Chloroform:isoamyl alcohol (24:1 ratio). 8. TE solution: 10 mM Tris, 1 mM EDTA pH 8.0; equilibrated to pH 7.6 with Tris–HCl pH 7.0. 9. Incubator or incubator shaker for tissue digestion, capable of maintaining a temperature of >50°C.
2.4. Polymerase Chain Reaction: Amplification and Purification
1. 10 mM deoxynucleotide (dNTP) mix. 2. 100 µM oligonucleotide primers (IDT Technologies, USA). 3. Biolase Taq DNA Polymerase (BioLine). 4. 10× PCR Buffer for Bioline Taq. 5. 50 mM Magnesium chloride. 6. Liquidator 96-channel benchtop pipette (Rainin). 7. ExoSAP-IT (USB 78201) for purification. 8. 96-Well (0.2 ml volume) plastic PCR plates (Genemate T-3060-1). 9. Silicone plate mat (lid) for PCR plates (Genemate T-3161-1).
2.5. Polymerase Chain Reaction: Visualization
1. Agarose. 2. 1× TBE buffer: 0.089 M Tris base, 0.089 M boric acid, 0.002 M Na-EDTA; prepare by mixing 108 g Tris, 55 g boric acid, and 7.4 g Na-EDTA in a beaker together with 400 ml water, mix until dissolved, and add deionized water to 10 L. 3. Sample loading dye: 0.083% bromophenol blue; 0.083% xylene cyanol, and 10% glycerol. 4. DNA stain: Ethidium bromide (10 mg/ml) or SYBR SAFE (Invitrogen). 5. Optional: DNA size standard (“ladder”; Hi Lo DNA marker, Minnesota Molecular, Inc.). 6. Electrophoresis rig and power supply. 7. Gel imaging system/camera for use over UV light box.
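The masses given for the TBE recipe can be converted to molar concentrations for any final volume, which is a quick way to confirm whether a batch is at working (1×) or stock (10×) strength. The following is a rough cross-check only. The molecular weights are standard values, but the recipe says only "Na-EDTA", so the use of disodium EDTA dihydrate (372.24 g/mol) is an assumption; the function name is hypothetical.

```python
# Molecular weights in g/mol. The EDTA salt form is an assumption (see lead-in).
MOLECULAR_WEIGHT = {
    "Tris base": 121.14,
    "boric acid": 61.83,
    "Na2-EDTA dihydrate": 372.24,
}

# Masses from the recipe (grams).
MASS_G = {
    "Tris base": 108.0,
    "boric acid": 55.0,
    "Na2-EDTA dihydrate": 7.4,
}

def molarity(mass_g, mol_weight, volume_l):
    """Moles of solute per litre of final solution (hypothetical helper)."""
    return mass_g / mol_weight / volume_l

def tbe_concentrations(volume_l):
    """Molar concentration of each TBE component at a given final volume."""
    return {
        name: molarity(MASS_G[name], MOLECULAR_WEIGHT[name], volume_l)
        for name in MASS_G
    }

if __name__ == "__main__":
    for litres in (1.0, 10.0):
        print(f"Final volume {litres:g} L:")
        for name, conc in tbe_concentrations(litres).items():
            print(f"  {name}: {conc:.4f} M")
```

Under these assumptions, the listed masses made up to 10 L give roughly 0.089 M Tris, 0.089 M boric acid, and 0.002 M EDTA, while the same masses in 1 L would correspond to a tenfold concentrate.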
2.6. Sanger Sequencing Components: BigDye Reactions
1. 5× Sequencing Buffer: 400 mM Tris–HCl pH 9.0, and 10 mM MgCl2. 2. Oligonucleotide Primers: Dissolved in water to 10 µM (Table 1 lists all primers used for fish). 3. BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems). 4. 96-Well (0.2 ml volume) plastic PCR plates (Genemate T-3060-1). 5. Silicone plate mat (lid) for PCR plates (Genemate T-3161-1).
2.7. Sephadex Purification of Cycle-Sequencing Products
1. Sephadex® G50 (Sigma). 2. Hi-Di™ formamide (Applied Biosystems). 3. Multiscreen® HTS filter plates (Millipore MSHVN4550). 4. Multiscreen column loader (Millipore MACL09645). 5. Liquidator 96-channel manual pipette (Rainin). 6. 96-Well (0.2 ml volume) semi-skirted plastic PCR plates (Genemate T-3085-1). 7. Septaseal rubber mats (ABI #4315933).
Table 1
Primer table for fish PCR (and M13 sequencing primers) (from refs. 10, 22, 23)

Barcode primer name    Barcode primer sequence 5′ → 3′
FISHCO1LBC             TCAACYAATCAYAAAGATATYGGCAC
FISHCO1HBC             ACTTCYGGGTGRCCRAARAATCA
FISHCO1LBCm13F         CACGACGTTGTAAAACGACTCAACYAATCAYAAAGATATYGGCAC
FISHCO1HBCm13R         GGATAACAATTTCACACAGGACTTCYGGGTGRCCRAARAATCA
16SAR                  CGCCTGTTTATCAAAAACAT
16SBR                  CCGGTCTGAACTCAGATCACGT
m13F                   CACGACGTTGTAAAACGAC
m13R                   GGATAACAATTTCACACAGG
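The two M13-tailed primers in Table 1 are the plain barcode primers with the corresponding M13 sequencing primer prepended at the 5′ end. The short sketch below makes that relationship explicit using only the sequences from the table; the helper names `add_tail` and `strip_tail` are hypothetical and serve only as illustration.

```python
# Sequences copied from Table 1 (5' -> 3').
M13F = "CACGACGTTGTAAAACGAC"
M13R = "GGATAACAATTTCACACAGG"
FISHCO1LBC = "TCAACYAATCAYAAAGATATYGGCAC"
FISHCO1HBC = "ACTTCYGGGTGRCCRAARAATCA"

def add_tail(tail, primer):
    """Prepend a sequencing tail to the 5' end of a primer (hypothetical helper)."""
    return tail + primer

def strip_tail(tail, tailed_primer):
    """Recover the gene-specific part of a tailed primer (hypothetical helper)."""
    if not tailed_primer.startswith(tail):
        raise ValueError("primer does not start with the given tail")
    return tailed_primer[len(tail):]

FISHCO1LBC_M13F = add_tail(M13F, FISHCO1LBC)
FISHCO1HBC_M13R = add_tail(M13R, FISHCO1HBC)

if __name__ == "__main__":
    print("FISHCO1LBCm13F", FISHCO1LBC_M13F)
    print("FISHCO1HBCm13R", FISHCO1HBC_M13R)
```

Because every amplicon made with the tailed primers carries the same M13 ends, a single pair of sequencing primers (m13F and m13R) can be used for all samples regardless of the gene-specific portion.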
2.8. Genetic Analyzer Components
1. ABI 3130XL genetic analyzer: Polymer POP-7; 36-cm capillary array run using the ABI Template Protocol “RapidSeq36_POP7” with a run time of 2,280 s. 2. ABI 3730XL genetic analyzer: Polymer POP-7; 50-cm capillary array run using the ABI Template Protocol “LongSeq50_POP7” with a run time of 4,000 s.
2.9. Data Processing and Quality Control
1. Sequencher vers. 4.10.1 (Gene Codes). 2. Geneious (BioMatters)—use of the Geneious program and the BioCode plugin is discussed elsewhere in this volume.
3. Methods
3.1. Tissue Sampling
1. Photography processing—fish orientation—left side, fish’s head on left; see Note 2. 2. Tissues to sample (in order of decreasing desirability)—muscle biopsy from right side, right eye, portion of right pectoral fin, other fin clip, gill tissue, swabs, and scales. 3. Muscle biopsy—from right side, caudal region, dorsal to lateral line; avoid heavily parasitized areas (e.g., gills and guts) and areas of important morphological characters (e.g., fins and lateral line area); from larvae and small specimens, it may be necessary to destructively sample a portion of the specimen, and consultation with taxonomists is advised so as to avoid critical morphological regions (e.g., the suction disk on clingfish). 4. Clean all tools prior to touching specimen using bleach solution or flame sterilization, etc. Scrape off scales from the area to be sampled and carefully dissect out a small portion of muscle—the amount of tissue to sample is dependent on the
size of the specimen and storage vessel (do not exceed a tissue:buffer ratio of 1:4 if possible) (see Note 3). 5. From EtOH-preserved and -stored specimens: DNA leached from the specimen can be extracted from the alcohol in the storage container (8). (After distillation, the used ethanol can be recycled without risk of contamination (7).) However, increased yields will be obtained via more substantial yet destructive tissue biopsy (as above in Subheading 3.1, step 2).
3.2. Tissue Storage
1. BioBanking: One of the significant contributions of the DNA barcoding enterprise is a repository of genetic materials. These are tied to voucher specimens in public collections and the identity and integrity of the specimens have been validated genetically. These materials can then serve as a starting point for subsequent molecular investigations. Therefore, it is important to maximize the utility of all collected materials. 2. Frozen storage: Freezing tissues (−20°C or lower) is the recommended method of preservation to maximize potential future uses of the material. Vapor-phase liquid nitrogen is ideal. Frequently this is not feasible, particularly in the field, so alternatives are presented. 3. Salt/DMSO buffer storage: Transportation of ethanol and other flammables has become an issue, and salt/DMSO buffer is an option in those cases. Place small tissue chunks in the buffer, taking care not to overwhelm the buffer with too much tissue—a good ratio of tissue:buffer is 1:4. 4. Ethanol (95%) storage: 70% ethanol should be avoided (see Note 1).
3.3. DNA Extraction: Autogen Prep 965
1. Prepare fresh lysis solution for every run by dissolving appropriate aliquots of the proteinase-K provided in the kit with each aliquot of Reagent M1. The standard concentration of proteinase-K in M1 lysis buffer for overnight digestion of animal tissue is 0.4 mg/ml. 2. For tissue lysis, place tissue into the appropriate well of a 96 deep-well plate (Costar #3960), and add 150 µl of Reagent M2 and 150 µl of Reagent M1 containing the predissolved proteinase-K at a concentration of 0.4–1.0 mg/ml. Cover the plate with a silicone mat (Axygen) and one or more plexiglass plates cut to fit the block to minimize evaporation of buffer and prevent contamination between wells. The silicone mat and plexiglass plates are taped firmly to the block. 3. Incubate the samples overnight at 56°C with shaking. 4. Spin the plates briefly to remove condensed droplets from the lid. Load lysis plates on the AutoGenprep 965, with an equal number of output plates for DNA and tips following manufacturer’s instructions. A maximum of four 96-well plates can be run simultaneously on the machine (see Note 4).
5. Load Reagents R3, R4, R5/R6/R7, R8, and R9 into the appropriate reservoirs following the manufacturer’s instructions.
6. The standard resuspension volume is 0.05 ml of buffer R9. If a large quantity of DNA is expected, such as when extracting vertebrate tissues or large amounts of other tissue, we change this volume to 0.1 ml.
7. Run the protocol on the instrument.
8. Upon completion, using a 96-well pipettor (if available), portion the DNA extracts into the desired amounts for the working and archival stocks for BioBanking.
3.4. DNA Extraction: Qiagen Biosprint Magnetic Bead Protocol (96-Well Plate Protocol)
1. We follow the Qiagen protocol for Biosprint extractions. Following are our deviations from the published protocol and our observations on particular steps.
2. Before beginning, check buffer ATL for white precipitate and take steps to resuspend it (see Note 6).
3. The MagAttract particles settle out of solution very quickly. Before adding this mix to the master mix, vortex at high speed for 3 min. Use immediately. Vortex again if much time has elapsed (>2–3 min).
4. Prepare the master mix with the following volumes per sample: Buffer AL = 100 µl; isopropanol = 100 µl; MagAttract Suspension G = 15 µl. Prepare 10% more master mix than is required for the total number of sample purifications to be performed.
5. Cut 5–25 mg of each tissue into small pieces and place in a 96-well S-Block. Add buffer ATL and proteinase-K.
6. Seal the plate following the same method as the Autogen lysis above (Subheading 3.3, step 2).
7. Place the sealed plate in an incubator/shaker and digest overnight at 56°C (see Note 7).
8. Following lysis with ATL, briefly centrifuge the S-Block containing the samples to remove drops from underneath the lid.
9. Vortex the master mix containing Buffer AL, isopropanol, and MagAttract Suspension G (see Note 6) for at least 1 min. Add 215 µl of this master mix to each sample in the S-Block.
10. Place the blocks on the instrument and run the protocol.
11. Upon completion, using a 96-well pipettor (if available), portion the DNA extracts into the desired amounts for the working and archival stocks for the genetic repository (see Note 8).
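The master-mix arithmetic in step 4 can be sketched in a few lines of Python. The per-sample volumes and the 10% excess come from the protocol above; the function name, the dictionary layout, and the rounding to 0.1 µl are assumptions made for this illustration and are not part of the Qiagen protocol.

```python
# Per-sample volumes in microliters, taken from step 4 of Subheading 3.4.
PER_SAMPLE_UL = {
    "Buffer AL": 100.0,
    "isopropanol": 100.0,
    "MagAttract Suspension G": 15.0,
}


def master_mix(n_samples, overage=0.10):
    """Return the volume (ul) of each component for n_samples plus overage.

    overage=0.10 reproduces the "10% more" rule in step 4. The helper name
    and signature are hypothetical, chosen only for this sketch.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 1) for name, vol in PER_SAMPLE_UL.items()}


# One full 96-well plate.
mix = master_mix(96)
for name, vol in mix.items():
    print(f"{name}: {vol} ul")
print(f"total: {round(sum(mix.values()), 1)} ul")
```

For a full plate this gives 10,560 µl each of Buffer AL and isopropanol and 1,584 µl of MagAttract Suspension G, which is enough for 215 µl per well with about 10% to spare.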
3.5. Manual Extraction: Phenol:Chloroform Protocol, Following Ref. 20
1. It is typically easiest to carry out the extraction in 1.7–2-ml centrifuge tubes.
2. For lysis, prepare fresh stock of lysis buffer (from Subheading 2.3, item 5) and add proteinase-K to 1 mg/ml.
3. Combine 1 ml lysis buffer with ~1 cm² of tissue.
4. Incubate with shaking overnight at 56°C.
5. Add no more than 700 µl of the lysed sample to the extraction tube. If a large amount of tissue is being extracted, it may be best to dilute the extraction with more extraction buffer, divide the sample into more than one tube, and extract each separately (see Note 5).
6. Phenol extraction: Add an equal volume of phenol to the lysed sample, and vortex vigorously to mix the phases. Spin in a microcentrifuge at top speed for 1–2 min to separate the phases. Remove the aqueous phase (top layer) to a new tube, being careful to avoid the phase interface.
7. Repeat the phenol extraction two more times.
8. Chloroform:isoamyl extraction: As in Subheading 3.5, step 6, use an equal volume of chloroform:isoamyl alcohol (instead of phenol) to remove any trace phenol. Repeat once more.
9. Precipitate the DNA with an equal volume of isopropanol, and incubate at −20°C for 1 h or longer.
10. Spin the sample with alcohol in a microcentrifuge for 10 min at maximum speed to pellet the DNA.
11. Before resuspension, set the tube on the counter at room temperature (RT), covered with a tissue, for 10 min or until all residual ethanol has evaporated.
12. Resuspend the DNA pellet in a volume (usually, 50–200 µl) of TE (or a 1/10 dilution of TE) to achieve the desired concentration. Vortex to mix.
3.6. PCR Methods: Amplification
1. Thaw and prepare reagents in proper concentration for the PCR reaction. Wait for each reagent to thaw completely and then mix thoroughly. Dilute primers to a 10 µM working stock for the PCR reaction. If starting with lyophilized primers, spin down before opening the tube and resuspend to 100 µM in molecular-grade water to form the stock solution. Dilute this stock 1:10 to make the working stock.
2. Mix all reagents in the volumes listed in the PCR recipe (Table 2) to form the master mix. Keep your reaction plate (or tubes) and master mix on ice. Vortex the master mix vigorously or pipette up and down to mix well (vortexing will cause liquid to be trapped on the cap of the tube, so follow with a 15-s spin in a mini-centrifuge).
3. Aliquot 9 µl of master mix into each well of the 96-well plate (Genemate).
4. Aliquot 1 µl of each DNA template (undiluted) to each well. Change tips between samples.
Table 2
PCR reaction cocktail

PCR reagents            Each well (µl)    96-Well plate (µl)
ddH2O                   6.4               640
10 mM dNTPs             0.5               50
10× buffer              1                 100
50 mM MgCl2             0.4               40
10 µM primer F          0.3               30
10 µM primer R          0.3               30
Bioline Taq (5 U/µl)    0.1               10
Total                   9                 900
5. Add 1 µl nuclease-free water to well H12 to function as the negative PCR control.
6. Place a silicone plate mat over the top of the 96-well plate and secure it with a roller. Centrifuge the plate in a plate centrifuge or plate spinner for 10–15 s at approximately 3,950 rcf (= “centrifuge briefly”).
7. Place the plate in a thermal cycler block and run PCR with the following cycling parameters: 95°C for 5 min; 35 cycles of 95°C for 30 s, 52°C for 30 s, and 72°C for 45 s; a final extension at 72°C for 5 min; and an indefinite hold at 10°C (see Notes 9–11).
3.7. PCR Methods: Visualization via Agarose Gel Electrophoresis
1. Cast a 1.5% agarose gel for each PCR reaction plate (e.g., add 0.75 g agarose to 50 ml 1× TBE buffer and boil in the microwave until the agarose is dissolved) (see Note 12).
2. Cool the solution, then add 1 µl ethidium bromide (10 mg/ml) per 50 ml of agarose solution, mix well, and pour immediately into the casting tray.
3. Let the gel set for approximately 30 min or until firm. Remove the combs from the gel and place it into an electrophoresis rig filled with 1× TBE buffer.
4. Add 2 µl 2× loading dye to each well of an empty 96-well plate.
5. Add 2 µl of each PCR product to the loading dye plate. Use new tips for each transfer.
6. Mix PCR product and dye by pipetting up and down, and then load 4 µl of the mixture to each well of the gel (see Note 12).
Table 3
ExoSAP-IT purification of PCR product

Reagents      Each well (µl)    96-Well plate (µl)
ddH2O         1.5               150
ExoSAP-IT     0.5               50
Total         2                 200
7. Run the gel at 100 V for approximately 12 min or until the bromophenol blue and xylene cyanol dyes in the loading dye are clearly separated.
8. At the end of the run, remove the gel from the rig and place on a UV transilluminator. Use a gel-imaging system to capture a digital image of the gel.
3.8. PCR Purifications: ExoSAP-IT
1. We use a fourfold dilution of the ExoSAP-IT mix. In a 1.7-ml microcentrifuge tube, mix nuclease-free water and ExoSAP-IT in the volumes listed in Table 3. Keep the enzyme mix on ice or a cold block at all times (see Note 13).
2. Vortex the diluted mix vigorously or pipette up and down to mix well.
3. Centrifuge the plate briefly to bring down all condensation from the sides of the wells and lid to the bottom.
4. Aliquot 2 µl of ExoSAP-IT mix to each well of the 96-well plate.
5. Place the silicone plate mat back on the top of the 96-well plate, press it down with a roller, and centrifuge briefly.
6. Place the plate in a thermal cycler block and run with the following parameters: 37°C for 30 min, 80°C for 20 min, and hold at 10°C.
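The fourfold dilution in step 1, and the stronger dilutions mentioned in Note 13, follow from simple proportion: in an n-fold dilution the enzyme makes up 1/n of the final volume. The sketch below works this out for the 2 µl per-well aliquot of Table 3. The function name is hypothetical, and the multiplier of 100 is read off Table 3, where the plate volumes are 100 times the per-well volumes.

```python
def dilution_volumes(final_volume_ul, fold):
    """Split a final volume into (stock, diluent) for a `fold`-fold dilution.

    Hypothetical helper: in a fourfold dilution the stock is 1/4 of the
    final volume and the diluent (water) makes up the rest.
    """
    if fold < 1:
        raise ValueError("fold must be >= 1")
    stock = final_volume_ul / fold
    return stock, final_volume_ul - stock


# Fourfold dilution at 2 ul per well, as in Table 3.
enzyme, water = dilution_volumes(2.0, 4)
print(enzyme, water)  # 0.5 1.5

# Plate-scale volumes, using the 100x multiplier seen in Table 3.
multiplier = 100
print(enzyme * multiplier, water * multiplier)  # 50.0 150.0

# Tenfold dilution (the upper limit given in Note 13), same 2 ul aliquot.
enzyme10, water10 = dilution_volumes(2.0, 10)
print(round(enzyme10, 2), round(water10, 2))
```

Note 13 states that the 37°C incubation must be lengthened when less enzyme is used; the sketch covers only the volumes, not the adjusted incubation time.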
3.9. Sanger Cycle Sequencing Protocol
1. Thaw and prepare reagents in proper concentration for the cycle sequencing reaction. Wait for each reagent to thaw completely and then mix thoroughly. Keep the BigDye on ice and in the dark since it is both light and temperature sensitive.
2. Mix all four reagents in the volumes listed in Table 4. Create two master mixes, one containing the forward primer and one containing the reverse (see Note 14).
3. Vortex the mixes vigorously or pipette up and down to mix. Centrifuge briefly.
4. Aliquot 9 µl of master mix into each well of the 96-well plate, keeping both plate and master mix on ice.
Table 4
Cycle sequencing reaction recipe

Reagents               Each well (µl)    96-Well plate (µl)
ddH2O                  6.25              625
5× SEQ buffer          1.75              175
BigDye                 0.5               50
10 µM primer F or R    0.5               50
Total                  9                 900
5. Aliquot 1 µl of purified PCR product into each well of the 96-well plate. Change tips after each sample.
6. Place a silicone plate mat on the 96-well plate, press it down with a roller, and centrifuge briefly.
7. Place the plates in a thermal cycler and run the cycle sequencing program with the following parameters: 30 cycles of 95°C for 30 s, 50°C for 30 s, and 60°C for 4 min, and hold at 10°C.
8. After the program has finished, store the plates at 4°C in a dark refrigerator until purification (see Note 14).
3.10. Sephadex Purification of Sanger Sequence Reactions
1. Measure dry Sephadex G50 using the multiscreen black column loader (predrilled uniform holes which measure and deliver the correct amount of Sephadex) into a Multiscreen® HTS filter plate for each 96-well cycle sequencing reaction plate to be cleaned.
2. Add 300 µl molecular-grade water to the wells and allow to sit at room temperature for at least 2 h in order to completely hydrate the Sephadex matrix.
3. Place the filter plate on top of a 96-well PCR plate (the “catch” plate) and tape together with laboratory tape.
4. Centrifuge at 750 × g for 5 min to drain the excess water from the wells. Discard the water. The catch plate can be used again, but ONLY for this purpose, not PCR.
5. Add the entire volume of the sequencing reaction to the center of the Sephadex columns (not along the side walls), taking care not to touch the column surface or destroy the integrity of the column.
6. Attach a new 96-well PCR plate to the bottom of each Sephadex plate and secure with lab tape. Make sure that the orientation of the plates is the same: the A1 well of the Sephadex plate is over the A1 well of the catch plate.
7. Centrifuge at 750 × g for 5 min to elute the cleaned sequencing product into the catch plate.
8. Dry the cleaned sequencing products on a heat block at 90°C for 10–15 min or in a Sorvall Speedvac. Cover the plate with a Septa Seal mat and store at −20°C (see Note 15).
9. To prep for running on the genetic analyzer, add 10 µl Hi-Di™ Formamide to each well (under a fume hood).
10. Denature the DNA by heating the plate at 97°C for 3 min.
11. Cool the plate at 4°C for 3 min.
12. Load the plates into the Genetic Analyzer.
3.11. Genetic Analyzer Methods
1. ABI 3130XL: Run on a 36-cm capillary array using the ABI template protocol “RapidSeq36_POP7” with a run time of 2,280 s.
2. ABI 3730XL: Run on a 50-cm capillary array using the ABI template protocol “LongSeq50_POP7” with a run time of 4,000 s (see Note 16).
3.12. Data Processing and Quality Control Methods
1. Production of the final DNA barcode sequence from the raw sequencer output (the “traces”) involves several steps (forward and reverse traces for one set of 96 specimens are processed together).
2. The method we outline here uses Sequencher vers. 4.10.1 (Gene Codes, Corp). Alternatively, use of the Geneious program (BioMatters) and the BioCode plugin is discussed elsewhere in this volume.
3. Trace trimming: trimming is based on the phred quality scores of each base call (see Note 17). Trimming criteria, as implemented in Sequencher, are as follows: trim from the 5′ and 3′ ends until the first (or last) 20 bases contain fewer than 3 bases with a phred score <20, and from both ends until the first (or last) 10 bases contain <3 ambiguous bases.
4. Filtering traces: after trimming, all traces <450 bp in length are discarded, as are those where <90% of the bases have an average phred score of 20 or higher (see Note 18).
5. Contig building: using the “Assemble by Name” option in Sequencher (if a reference sequence is used, then “Assemble by Name to Reference”; see Note 19 for our degenerate reference sequence), forward and reverse trimmed traces are assembled into contigs.
6. If the two reads are less than 97% identical, the sequences are set aside for manual interpretation of cause (see Note 20). Such dissimilarity may be due to low-quality traces (if both reads are on the lower end of our quality spectrum, a good contig may not be possible), laboratory error, sequencer error, or contamination.
Manual base calls in the consensus sequence resulting from these contig assemblies are based on “confidence” (phred scores).
7. If there is a discrepancy in a base call between the forward and reverse reads, the base call in the consensus sequence will reflect the base with the higher confidence of the two (under the CONTIG menu, choose “Consensus by Confidence”) (see Note 21).
8. Inspection: contig assemblies are manually inspected if any ambiguities or gaps exist or if there are more than three disagreements between the two reads (this information is available using Sequencher’s “Get Info” option). Base calls are double checked by eye and left as ambiguities when neither base call has high confidence (see Note 22).
9. Finalizing the consensus sequence: all reads are assembled together with a reference sequence into one large assembly (usually using the lowest similarity requirement, 60%). The reference sequence is used as a length gauge, and primer sequences are trimmed from all forward and reverse reads at this point, if not done in a prior step.
10. The assembly is examined for gaps, which at this point usually stem from a base call error (e.g., one base mistakenly interpreted by the software as two), and these mistakes are rectified. The large assembly is then “dissolved” and each specimen’s pair of traces reassembled as in step 3. The consensus sequence is then exported from each trace contig assembly as a concatenated fasta-formatted text file.
11. Contamination check: if, in the previous step, a read does not assemble, then it is compared to the NCBI nucleotide database using BLAST (see Note 23).
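The trimming, filtering, and consensus rules in steps 3, 4, and 7 can be illustrated with a short Python sketch. This is one possible reading of the criteria stated above and not Sequencher's actual implementation. All function names are hypothetical. The filter is interpreted as "at least 90% of bases have a phred score of 20 or higher". The two reads are assumed to be already aligned and of equal length. Ties between equal phred scores go to the forward read, which is an arbitrary choice. The ambiguous-base criterion of step 3 is omitted for brevity.

```python
def phred_to_error(q):
    """Error probability implied by a phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_window(quals, window=20, max_low=2, q_min=20):
    """Return (start, end) such that the first and the last `window` scores
    of quals[start:end] each contain at most `max_low` values below `q_min`
    ("fewer than 3 bases with a phred score <20" means at most 2)."""
    def low(chunk):
        return sum(1 for q in chunk if q < q_min)

    start, end = 0, len(quals)
    # Trim the 5' end one base at a time until the leading window passes.
    while end - start >= window and low(quals[start:start + window]) > max_low:
        start += 1
    # Trim the 3' end one base at a time until the trailing window passes.
    while end - start >= window and low(quals[end - window:end]) > max_low:
        end -= 1
    return start, end


def passes_filter(quals, min_len=450, min_frac=0.90, q_min=20):
    """Keep a trimmed trace only if it is at least `min_len` bases long and
    at least `min_frac` of its bases score `q_min` or higher."""
    if len(quals) < min_len:
        return False
    good = sum(1 for q in quals if q >= q_min)
    return good / len(quals) >= min_frac


def consensus_by_confidence(fwd, fwd_q, rev, rev_q):
    """Consensus of two aligned, equal-length reads: where the reads
    disagree, take the base with the higher phred score (ties: forward)."""
    if not (len(fwd) == len(fwd_q) == len(rev) == len(rev_q)):
        raise ValueError("reads and quality lists must have equal length")
    return "".join(
        fb if fb == rb or fq >= rq else rb
        for fb, fq, rb, rq in zip(fwd, fwd_q, rev, rev_q)
    )


# Toy trace: poor-quality ends around 500 high-quality bases.
quals = [8, 10, 12, 15] + [40] * 500 + [12, 9, 7]
start, end = trim_window(quals)
print(start, end, passes_filter(quals[start:end]))

# One disagreement at position 2; the reverse read is more confident there.
print(consensus_by_confidence("ACGTA", [40, 40, 10, 40, 40],
                              "ACCTA", [40, 40, 35, 40, 40]))
```

The sliding window deliberately leaves up to two low-quality bases at each end, since the criterion only requires fewer than three per window; a stricter trim would need a different rule.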
4. Notes
1. Denatured ethanol is not conducive to molecular work and should be avoided.
2. If “painting” fins with formalin to better photograph a specimen’s natural colors, the biopsy should be taken from the right side first or from an area not exposed to any formalin, as this can hinder DNA extraction and/or amplification.
3. When sampling specimens in the field, cutting the larger tissue sample into smaller pieces suitable for DNA extraction saves time back in the lab. Upon returning, one then only needs to open the tube and pick out one or two small minced pieces to place into the tube/plate for cell lysis and DNA extraction. Not doing this means the larger (pencil eraser size) piece of
tissue must be removed from the tube and subsampled at a later time, requiring more tools, sterilization, and, most importantly, more time. Partially mincing the tissue also aids penetration of the stabilizing buffer/solution.
4. There is a minimum of 2 plates and 24 samples (12 samples of Row A in each plate) required to balance the built-in centrifuge of the Autogen 965.
5. It is difficult to extract volumes smaller than 100 µl. A sample can be concentrated by resuspending in a smaller volume after precipitation.
6. Check that Buffer ATL does not contain a white precipitate. If necessary, incubate for 30 min at 37°C with occasional shaking to dissolve the precipitate. MagAttract magnetic particles copurify RNA and DNA if both are present in the sample. If RNA-free DNA is required, add RNase A to the sample before starting the procedure. The concentration of RNase should be 2 mg/ml (add 2 µl of 100 mg/ml RNase A solution to each 100 µl of sample).
7. Place a weight on top of the caps during incubation. Mix occasionally during incubation to disperse the sample or place on a rocking platform. Lysis time varies depending on the type, age, and amount of tissue being processed. Lysis is usually complete in 1–3 h, but optimal results will be achieved after overnight lysis. After incubation, the lysate may appear viscous, but should not be gelatinous as it may clog the DNeasy 96 membrane: dilution may be required.
8. Elution with volumes less than 200 µl increases the final DNA concentration but may reduce the overall DNA yield. For samples containing less than 1 µg DNA, elution in 50 µl Buffer AE is recommended. For maximum DNA yield, repeat step 16 with another 200 µl Buffer AE. A second elution with 200 µl Buffer AE will increase the total DNA yield by up to 25%. However, due to the increased volume, the DNA concentration is reduced. If a higher DNA concentration is desired, the second elution step can be performed using the 200 µl eluate from the first elution.
This will increase the yield by up to 15%.
9. Tailed primers (e.g., FISHCO1LBCm13F and FISHCO1HBCm13R) can be used in the PCR reaction for easier downstream processing. In this case, the sequencing primers will be the m13F and m13R primers.
10. If the PCR amplification is not successful, we use the standard animal 16S primers (16Sar and 16Sbr) (21) to check the quality of the DNA. The PCR recipe and cycling parameters are the same as for CO1, but with an annealing temperature of 48°C instead of 52°C.
11. Silicone plate mats can be cleaned by soaking for a minimum of 10 min in a 10% bleach solution followed by rinsing with
distilled water. After drying, UV irradiation is recommended prior to reuse.
12. Agarose gels can be stored for 2–3 days prior to use. To store, wrap gels in plastic wrap and keep at 4°C. Gels can also be made up in large volumes and stored in aliquots of a volume sufficient for pouring one gel (in 50-ml Falcon tubes, which can be reused). At the time of use, the solid gel is put into a beaker and microwaved until liquid.
13. The original-strength ExoSAP-IT can be diluted as much as tenfold and still be effective; this will cut down costs even further. Incubation time at 37°C must be increased to accommodate the lower quantity of enzyme. Diluted ExoSAP-IT mixes cannot be refrozen and have a short shelf life at 4°C. We discard any excess diluted ExoSAP-IT immediately.
14. Keep sequencing reactions in a dark place at all times to avoid degradation of the light-sensitive sequencing product. If tailed primers FISHCO1LBCm13F and FISHCO1HBCm13R were used in the PCR reaction, the sequencing primers can be the same as the amplification primers or, alternatively, the m13 sequencing primers m13F and m13R.
15. Multiscreen® HTS filter plates (as well as the plastic catch plates used for Sephadex) can be reused by filling the filter plate with 300 µl distilled water and spinning for 5 min at 700 × g. Repeat this procedure once more and invert to dry. Dried cleaned sequencing reactions can be kept for months at −20°C or can be shipped. While methods other than Sephadex cleanup can be used and are sometimes less expensive (e.g., ethanol precipitations), improper execution by inexperienced users can lead to premature degradation of the capillary arrays on the genetic analyzer. In the hands of most users, the Sephadex method consistently produces high-quality cleaned sequencing products.
16. The newest polymer from Applied Biosystems, POP7, is used for all sequencing runs, and the running buffer is changed 3×/week.
The number of samples in the 3730XL queue can be increased by combining four sets of 96 samples into a 384-well plate. This can maximize the number of samples run over a weekend. Once run, the sequencing reactions are kept at 4°C for a week. They can be re-run on the sequencer with no significant loss of signal during this period.
17. Phred scores are essentially a measure of the accuracy of a base call made by the sequencing software. A phred score Q corresponds to an estimated error probability of 10^(−Q/10), so a phred score of 20 indicates a 99% probability of an accurate base call. We consider base calls with a phred score of <20 as “low” quality and “high”-quality calls have a phred score >40 (users with different parameter definitions can generate incongruent results).
18. Occasionally, sequences with an initial overall quality <80% (see Subheading 3.7, step 2) can be improved via visual inspection and manual trimming. We only do this when it is clear that the trimming parameters failed to remove additional poor-quality bases at the end of the sequence, thus lowering the overall quality. It is sometimes possible to cut out the poor-quality bases and raise the overall sequence quality while maintaining a length >450 bp. In Sequencher, candidates for this treatment may be recognized as post-trim sequences of adequate (or longer) length with 70–80% overall quality.
19. Our degenerate fish reference sequence is 5′ → 3′ NNTTTAT NTAGTATTTGGTGCCTGAGCCGGAATAGTAGGCACA GCCCTAAGCYTAYTAATTCGAGCTGAACTAAGCCAAC CTGGCGCCCTNNTNGGNGACGACCAAATTTATAATG TAATCGTAACTGCCCACGCCTTTGTAATAATTTTCTT TATAGTAATACCAATTATGATTGGAGGCTTTGGAAAC TGAYTAATCCCCCTAATGATTGGGGCCCCCGACATGG CCYTCCCYCGAATAAACAACATAAGCTTTTGNNTNNT NCCNCCNTCNNTCNTNNTNNTNNTNGCATCCTCTG GNNTNGAAGCCGGGGCCGGAACAGGATGAACAGTT TANCCNCCNNTAGCNGGAAACYTAGCCCACGCAGGA GCCTCTGTAGACCTAACAATTTTCTCCYTTCATCTAG CAGGAATNTCCTCAATNNTNGGNGCAATTAACTTTA TTACAACAATYYTNAANATGAAACCNCCNGCNATNTC NNNNTACCAAACACCCYTATTTGTTTGAGCNNNNYT AATTACAGCCGTNNTNNTNNTNNTNTNNYTCCNNG TCYTTGCTGCTGGCATTACAATGYTTYTCACAGACCG AAACYTAAACACAACCTTCTTTGACCCTGCAGGAGGA GGAGACCCCATTYTGTACCAACACYTAYTC.
20. If a contig has a few disagreements between the sequences but no ambiguities or gaps, each base in the sequence has a “solid” call based on relative confidence. Contigs with >3 disagreements and no ambiguities are visually inspected to ascertain the cause(s) of the disagreement.
21. To use the “Assemble by Name” option in Sequencher, which is really the only way to create a large number of assemblies without a tedious amount of file renaming, it is important to plan ahead.
Trace files need to come from the sequencer with names that can be readily combined; therefore, files from a run of 96 using multiple primers or plates need to be named in a similar fashion so that the entire batch can be treated at once. Standardization is the key. Our fish trace files are labeled fieldnumber_Genus_species_barcode_F (only part of the generic and specific names is used; e.g., BLZ8013_Scaru_iseri_89893427_F, a forward trace file). Sequencher breaks up a name as directed; underscores are the most common delimiter (the above
example would break into five names). We often use regular expressions to define the parameters to be used for the splitting so that we can get the names we want on our final fasta files (e.g., (.*)_(.*) will take off only the F or R from the above example and the remainder will be the consensus sequence fasta file defline).
22. At the point of visual inspection, there are generally no changes to be made to the contig. All of the previous stringent filtering, trimming, etc. leave very few miscalls. The most common errors remaining are multiple calls: 2 real As are called as 3, or 3 real Gs are called as 2. Errors such as these can cause a gap to be inserted in the consensus sequence. Miscalls such as these can slip through all of our QC steps and are quite pernicious. Errors where one extra base is called can be caught by assembling all sequences in a project together with a reference sequence. Errors where too few bases are called are more vexing, and there is no foolproof method for identifying them. When they occur very near the 3′ end of a sequence, if not caught, they can lead to artificial increases in distances from other sequences or to stop codons.
23. In general, sequences from almost all fish species will readily assemble to one another, and if a sequence does not, then it is likely a contaminant. Contaminants in our fish sequences are generally bacterial or fungal in origin. BLASTing against GenBank is the best way to verify this, as these contaminating sequences are often not a match to any known COI sequence.
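The behavior of the regular expression in Note 21 can be checked outside Sequencher with Python's standard re module. In the pattern (.*)_(.*) the first group is greedy, so it captures everything up to the last underscore and leaves only the read direction in the second group. The sketch below uses the example file name from Note 21; the function name and the pairing logic are hypothetical, file extensions are assumed to have been removed, and none of this reflects how Sequencher itself is implemented.

```python
import re
from collections import defaultdict

# Greedy first group: everything before the LAST underscore is the specimen
# label (the future fasta defline); the remainder is the read direction.
NAME_PATTERN = re.compile(r"(.*)_(.*)")


def pair_traces(trace_names):
    """Group trace names by specimen label, keyed by direction (F or R).

    Hypothetical helper for illustration only.
    """
    pairs = defaultdict(dict)
    for name in trace_names:
        match = NAME_PATTERN.fullmatch(name)
        if match is None:
            raise ValueError(f"unexpected trace name: {name!r}")
        specimen, direction = match.groups()
        pairs[specimen][direction] = name
    return dict(pairs)


names = [
    "BLZ8013_Scaru_iseri_89893427_F",
    "BLZ8013_Scaru_iseri_89893427_R",
]

# Splitting on every underscore yields the five name parts noted above.
print(names[0].split("_"))

# The regular expression removes only the trailing F or R.
print(list(pair_traces(names)))  # ['BLZ8013_Scaru_iseri_89893427']
```

A specimen whose dictionary holds only one direction is missing its partner read, which is a quick way to spot failed traces before assembly.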
Acknowledgments
Our Fish DNA Barcoding Team would like to thank the NMNH Fish scientists and taxonomists for taking the time to educate us on fish biology and biodiversity and for making this work thoroughly enjoyable. We also thank David Erickson for his tireless efforts to produce this important volume.
References
1. Eschmeyer WN, Fricke R, Fong JD, Polack DA (2010) Marine fish diversity: history of knowledge and discovery (Pisces). Zootaxa 2525: 19–50 2. Handy SM, Deeds JR, Ivanova NV et al (2010) A single laboratory validated method for the generation of DNA barcodes for the identification of fish for regulatory compliance. J AOAC Int 94:1–10
3. Yancy HF et al (2008) A protocol for validation of DNA-barcoding for the species identification of fish for FDA regulatory compliance. FDA Lab Inf Bull LIB No 4420 4. Pilgrim EM, Jackson SM, Swenson S, Turcsanyi I, Friedman E, Weigt LA, Bagley MJ (2010) Incorporation of DNA barcoding into a large-scale biomonitoring program: opportunities and pitfalls. J NABS 30:217–231
5. Gemeinholzer B, Rey I, Weising K, Grundmann M, Muellner AN et al (2010) Organizing specimen and tissue preservation in the field for subsequent molecular analyses, chap. 7. In: Eymann J, Degreef J, Hauser C, Monje JC, Samyn Y, Vanden Spiegel D (eds) Manual on field recording techniques and protocols for all taxa biodiversity inventories. ABCTaxa (Belgium) 8:653 6. Leponce M, Meyer C, Hauser CL, Bouchet P, Delabie J, Weigt L, Basset Y (2010) Challenges and solutions for planning and implementing large-scale biotic inventories, chap. 3. In: Eymann J, Degreef J, Hauser C, Monje JC, Samyn Y, VandenSpiegel D (eds) Manual on field recording techniques and protocols for all taxa biodiversity inventories. ABCTaxa (Belgium) 8:653 7. Keel WG, Moser W, Giaccai J, Ormos A, Tanner J, Weigt LA (2011) Alcohol recycling at the Smithsonian Institution, National Museum of Natural History (NMNH). Collect Forum 25(1):10–21 8. Shokralla S, Singer GAC, Hajibabaei M (2010) Direct PCR amplification and sequencing of specimens’ DNA from preservative ethanol. Biotechniques 48:305–306 9. Venegas RP, Hueter R, Gonzalez J, Tyminski J, Remolina JFG et al (2011) An unprecedented aggregation of whale sharks, Rhincodon typus, in Mexican coastal waters. PLoS One 6(4): e18994. 10. Baldwin CC, Mounts JH, Smith DG, Weigt LA (2009) Genetic identification and color descriptions of early life-history stages of Belizean Phaeoptyx and Astrapogon (Teleostei: Apogonidae) with comments on identification of adult Phaeoptyx. Zootaxa 2008:1–22 11. Baldwin CC, Weigt LA, Smith DG, Mounts JH (2009) Reconciling genetic lineages with species in Western Atlantic Coryphopterus (Teleostei: Gobiidae). Smithsonian contribution. Mar Sci 38:111–138 12. Baldwin CC, Castillo CI, Weigt LA, Victor BC (2011) Seven new species within western Atlantic Starksia atlantica, S. lepicoelia, and S. sluiteri (Teleostei: Labrisomidae), with comments on congruence of DNA barcodes and species. Zookeys 79:21–72
13. Tornabene L, Baldwin CC, Weigt LA, Pezold F (2010) Exploring the diversity of western Atlantic Bathygobius (Teleostei: Gobiidae) with cytochrome c oxidase-I, with descriptions of two new species. Aqua 16:141–170 14. Wong EH-K, Shivji MS, Hanner RH (2009) Identifying sharks with DNA barcodes: assessing the utility of a nucleotide diagnostic approach. Mol Ecol Res 9:243–256 15. Hajibabaei M, Smith MA, Janzen DH, Rodriguez JJ, Whitfield JB, Hebert PDN (2006) A minimalist barcode can identify a specimen whose DNA is degraded. Mol Ecol Notes 6:959–964 16. Kumar R, Singh PJ, Nagpure NS, Kushwaha B, Srivastava SK, Lakra WS (2007) A non-invasive technique for rapid extraction of DNA from fish scales. Ind J Exp Biol 45:992–997 17. Lucentini L, Caporali S, Palomba A, Lancioni H, Panara F (2006) A comparison of conservative DNA extraction methods from fins and scales of freshwater fish: a useful tool for conservation genetics. Cons Genet 7:1009–1012 18. Ivanova NV, Dewaard JD, Hebert PDN (2006) An inexpensive, automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes 6:998–1002 19. Sambrook J, Fritsch EF, Maniatis T (1989) Purification of nucleic acids. In: Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, pE.3–4 20. Palumbi SR (1996) Nucleic acids II: the polymerase chain reaction. In: Hillis DM, Moritz C, Mable BK (eds) Molecular systematics. Sinauer, Sunderland, MA 21. Seutin G, White BN, Boag PT (1990) Preservation of avian blood and tissue samples for DNA analysis. Can J Zool 69:82–90 22. Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN (2007) Universal primer cocktails for fish DNA barcoding. Mol Ecol Notes 7:544–548 23. Simon C, Frati F, Beckenbach A, Crespi B, Liu H, Flook P (1994) Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Ann Entomol Soc Am 87:651–701
Chapter 7
DNA Barcoding Birds: From Field Collection to Data Analysis
Darío A. Lijtmaer, Kevin C.R. Kerr, Mark Y. Stoeckle, and Pablo L. Tubaro
Abstract
As of February 2011, COI DNA barcode sequences (a 648-bp segment of the 5′ end of the mitochondrial gene cytochrome c oxidase I, the standard DNA barcode for animals) have been collected from over 23,000 avian specimens representing 3,800 species, more than one-third of the world’s avifauna. Here, we detail the methodology for obtaining DNA barcodes from birds, covering the entire process from field collection to data analysis. We emphasize key aspects of the process and describe in more detail those that are particularly relevant in the case of birds. We provide elemental information about collection of specimens, detailed protocols for DNA extraction and PCR, and basic aspects of sequencing methodology. In particular, we highlight the primer pairs and thermal cycling profiles associated with successful amplification and sequencing from a broad range of avian species. Finally, we succinctly review the methodology for data analysis, including the detection of errors (such as contamination, misidentifications, or amplification of pseudogenes), assessment of species resolution, detection of divergent intraspecific lineages, and identification of unknown specimens.
Key words: Birds, Cytochrome c oxidase I, DNA barcodes, Collection, DNA extraction, Neighbor joining, Polymerase chain reaction, Pectoral muscle, Sequencing, Toe pad
1. Introduction
Taxonomy and phylogenetic affinities are better understood in birds than in any other large group of organisms. Additionally, they are probably the best represented group of vertebrates in frozen tissue collections, with more than 300,000 tissue samples covering nearly 75% of known bird species (1). These characteristics make birds an ideal group to analyze the effectiveness of a standardized genetic method for species identification, i.e., DNA barcoding. Consequently, they were the first taxonomic group for which a large-scale barcoding study was performed (2) and were the focus of one
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_7, © Springer Science+Business Media, LLC 2012
Fig. 1. Collection localities of bird tissue samples that had been DNA barcoded as of February 2011.
of the first global barcoding campaigns, the All Birds Barcoding Initiative (ABBI), which was launched in September 2005 with the goal of obtaining barcodes for the ca. 10,000 extant bird species. The initial avian barcoding study (2), in which roughly 40% of the North American species were examined, showed that a 648-bp segment of the 5′ end of the mitochondrial gene cytochrome c oxidase I (COI) was highly effective for species identification, presaging similar findings in diverse animal groups. Following this preliminary analysis, coverage of North American species expanded to near completion (3) and avian barcode surveys commenced in other regions, including the Neotropics (4–6), the Palearctic (7–9), the Indomalayan region (10, 11), and Australasia (12). As of February 2011, more than 23,000 avian barcode sequences from more than 3,800 species had been obtained, representing more than one-third of the world’s avifauna (Fig. 1). These studies demonstrated that a COI barcode effectively distinguishes among known species of birds (see also ref. 13) and additionally highlights species and species groups in which further analyses of taxonomic boundaries are necessary. Even in a well-studied group such as birds, barcoding projects have revealed previously undetected lineages, many of which could represent new species (e.g., ref. 14). Increasingly comprehensive public libraries of bird barcodes also enable new lines of scientific inquiry and practical applications, including analyses of diversification patterns at continental scales (15), identification of species involved in birdstrikes (16–18), studies of avian reservoirs for human and animal diseases from blood meals of arthropod vectors (19), and identification of bird species occupying cavity nests (20).
DNA Barcoding Birds: From Field Collection to Data Analysis
Avian barcoding projects also present challenges. Approximately 25% of species are not represented in frozen tissue collections, and obtaining high-quality DNA from historical specimens, such as study skins or skeletons, is arduous and often unsuccessful. Collecting new specimens is relatively expensive and time consuming as compared to collecting other animal groups, such as insects. This is particularly challenging in the context of ABBI’s goal of obtaining barcodes from multiple individuals—ideally belonging to geographically distant populations—of all avian species. In addition, many of those species absent from tissue collections are rare or endangered, limiting the possibility of field collection (1). Blood is often collected from threatened species to avoid sacrificing individual birds, but this adds another challenge because avian red cells are nucleated and contain few mitochondria, making blood a relatively poor source of mitochondrial DNA. This results in a higher risk of amplifying nuclear-mitochondrial pseudogenes, or numts, instead of the desired COI mitochondrial copy (21). Finally, necessary permits for collecting specimens and transporting samples within and across national borders are often particularly complex for birds (see Note 1). In this chapter, we detail the methods for barcoding birds, covering the entire process from field collection to data analysis. We focus on the methods that have been most widely used, and include a brief description of other options where appropriate. Most of the methods that are used to barcode birds are also used for other animal organisms and in many other molecular studies. Therefore, although we outline all the information that a reader needs to barcode avian samples, we emphasize those aspects that are specific to avian barcoding, addressing the aforementioned challenges posed by birds.
2. Materials
2.1. Collecting, Documenting, and Storing Tissue Samples and Their Associated Vouchers
Materials needed for the first steps of the pipeline are not specific to barcoding and include general equipment for collecting specimens (see Note 2), dissecting tools for obtaining tissue samples, materials for storing tissue samples, taxidermy materials for preparing vouchers, and imaging equipment for documenting specimens. See refs. 22–28 for a general description of materials needed to collect, prepare and store avian tissue samples and their associated vouchers.
2.2. DNA Extraction
The following are the working solutions for the manual glass fiber (GF) protocol for DNA extraction using 96-well plates (29, 30) or individual tubes. The volumes described are those needed to extract ten 96-well plates or approximately 1,000 samples (see the protocol available from the Canadian Centre for DNA Barcoding (CCDB) website (www.dnabarcoding.ca) for recipes for preparing stock solutions and for descriptions and suppliers of reagents, disposables, and equipment; ref. 30).
1. Vertebrate lysis buffer (VLB): 0.5% SDS, 100 mM NaCl, 50 mM Tris–HCl pH 8.0, and 10 mM EDTA pH 8.0. To prepare 50 ml of VLB, mix 0.25 g of SDS, 5 ml of 1 M NaCl, 2.5 ml of 1 M Tris–HCl pH 8.0, and 1 ml of 0.5 M EDTA pH 8.0, and then add double-distilled water (ddH2O) to reach the final volume.
2. Binding buffer (BB): 6 M guanidine thiocyanate, 20 mM EDTA pH 8.0, 10 mM Tris–HCl pH 6.4, and 4% Triton X-100. To prepare 100 ml of BB, mix 70.9 g of guanidine thiocyanate, 4 ml of 0.5 M EDTA pH 8.0, 10 ml of 0.1 M Tris–HCl pH 6.4, and 4 ml of Triton X-100, and then add ddH2O to reach the final volume (see Note 3).
3. Wash buffer (WB): 60% ethanol, 50 mM NaCl, 10 mM Tris–HCl pH 7.4, and 0.5 mM EDTA pH 8.0. To prepare 750 ml of WB, mix 475 ml of 96% ethanol, 37.5 ml of 1 M NaCl, 7.5 ml of 1 M Tris–HCl pH 7.4, and 0.75 ml of 0.5 M EDTA pH 8.0, and then add ddH2O to reach the final volume. Mix well and store at −20°C.
4. Binding mix (BM): To prepare 100 ml of BM, mix 50 ml of BB and 50 ml of 96% ethanol. BM is stable for about 1 week at room temperature.
5. Protein wash buffer (PWB): To prepare 180 ml of PWB, mix 47 ml of BB and 126 ml of 96% ethanol, and then add ddH2O to reach the final volume. PWB is stable for about 1 week at room temperature and it should be discarded if any crystallization occurs.
2.3. PCR Amplification of Cytochrome C Oxidase I
The following are the reagents needed to amplify the COI barcode region via Polymerase Chain Reaction (PCR) according to the protocol generated at the CCDB (31).
1. Taq polymerase: We recommend Platinum Taq polymerase (Invitrogen), which is the one currently used at the CCDB (see Note 4).
2. 10× PCR buffer (for Platinum Taq). Store at −20°C.
3. 50 mM MgCl2. Store at −20°C.
4. 10 mM dNTP mix. Store at −20°C.
5. 10 μM primer solution (obtained by diluting tenfold the 100 μM stock solution, which should be prepared from the desiccated primer by dissolving each nmol of the primer in 10 μl of ultrapure water). Store at −20°C.
6. 10% Trehalose (obtained by dissolving each gram of D-(+)-trehalose dihydrate in 10 ml of ultrapure water). Approximately 6.25 ml of 10% trehalose are used for 1,000 PCRs. Store at −20°C (see Note 5).
7. Ultrapure water.
2.4. Sanger Sequencing
The following are the reagents needed to obtain the COI barcode sequence following the protocol developed at the CCDB (32).
1. BigDye Terminator v3.1 reaction mix (ABI).
2. 5× sequencing buffer (400 mM Tris–HCl, pH 9.0, 10 mM MgCl2).
3. 10% Trehalose (see above).
4. 10 μM primer solution (see above).
5. Ultrapure water.
6. Sephadex G-50 (Sigma-Aldrich) for purification of sequence products.
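The stock volumes quoted for the working solutions in Subheading 2.2 follow from the dilution relation C1 × V1 = C2 × V2. As a minimal sketch for checking or rescaling a recipe (the function name `stock_volume_ml` is hypothetical and not part of the CCDB protocol), the vertebrate lysis buffer volumes can be recomputed as follows:

```python
def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """Return the stock volume V1 (ml) satisfying C1 * V1 = C2 * V2.

    stock_conc and final_conc must be expressed in the same unit (here mM).
    """
    if stock_conc <= 0 or not 0 <= final_conc <= stock_conc:
        raise ValueError("require stock_conc > 0 and 0 <= final_conc <= stock_conc")
    return final_conc * final_volume_ml / stock_conc


# Vertebrate lysis buffer (VLB), 50 ml final volume; concentrations in mM.
vlb_volumes = {
    "1 M NaCl": stock_volume_ml(1000, 100, 50),
    "1 M Tris-HCl pH 8.0": stock_volume_ml(1000, 50, 50),
    "0.5 M EDTA pH 8.0": stock_volume_ml(500, 10, 50),
}

for reagent, volume in vlb_volumes.items():
    print(f"{reagent}: {volume:g} ml")
```

The same call with a stock of 1,000 mM, a final concentration of 50 mM, and a final volume of 750 ml gives the 37.5 ml of 1 M NaCl listed for the wash buffer.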
3. Methods
3.1. Collecting, Documenting, and Storing Tissue Samples and Their Associated Vouchers
3.1.1. Obtaining and Storing Tissue Samples
Avian barcoding projects often rely on tissue samples already deposited in museums (3, 7–9). However, current collections do not always provide adequate coverage of all bird species or regions. In these cases, barcoding projects are a catalyst for the growth of bird tissue collections (e.g., ref. 6). Collecting avian specimens, obtaining and storing tissue samples, and preparing vouchers are complex tasks. Here, we focus on the key points and emphasize those that are particularly relevant in the context of barcoding. Regardless of whether tissue samples are collected for the project or obtained from museums, it must be borne in mind that they should be sampled broadly across the geographic range of species so that intraspecific genetic variation is accurately represented. To achieve this objective, as a rule of thumb, it is recommended to analyze five to ten specimens per species selected from sites distributed throughout its range. However, this number varies depending on various factors, such as the complexity of the species population structure, availability of specimens, permits, and financial limitations (see Note 6).
1. Collection of specimens should follow standard procedures (e.g., ref. 26) in accordance with the collection permits obtained for the project (see Note 1). Ideally, tissue samples should be obtained immediately after collecting the specimen. If this is not possible, the intact specimen should be kept frozen until tissue sampling can be performed (see Note 7). The most widely used tissue source is pectoral muscle (which is large and easy to access), but other sources of muscle, such as the heart, are also commonly used. The liver has also been sampled traditionally, but, likely due to its high enzymatic activity, liver samples are not as good as muscle samples for amplifying mitochondrial DNA (A. Borisenko, personal communication).
2. For long-term preservation of tissue samples, immediate freezing is recommended. The use of nitrogen tanks is a popular method to achieve this while in the field. Once the sample is deposited in a permanent collection, it will likely be kept in either nitrogen or an ultracold freezer at −80°C (see Note 8). Regular microtubes can be used for storing the samples in ultracold freezers, but cryogenic tubes, which are thicker walled, are required if samples are stored in nitrogen, even if only for a few days. If samples cannot be frozen either in the field or permanently, then they should be fixed (see Note 9). The most common fixative is ethanol (96% or higher, see Note 10), but there are other options, such as DMSO buffer. In this case, regular microtubes can be used (see Note 11).
3. When only a blood sample is obtained (e.g., with rare or endangered species), it is also recommended to freeze the sample, although blood is usually immersed in ethanol or lysis buffer (see Note 12) regardless of the temperature of storage.
3.1.2. Preparing, Documenting, and Storing Vouchers
1. The presence of vouchers associated with tissue samples is critical when establishing a barcode library (see Notes 13–17) and their preparation and storage should follow standard procedures (22–28).
2. Documentation of vouchers is an important step in the barcode pipeline; photographs of specimens should be included with the information uploaded to the Barcode of Life Data Systems (BOLD; www.boldsystems.org; ref. 33). BOLD is an online repository of barcode records and a workbench for DNA barcoding projects. Photographs typically document prepared specimen skins (which should also feature a scale and the voucher number, Fig. 2) but alternatively could feature the living bird (see Note 18) or the specimen prior to preparation if only a skeleton will be prepared (both photographs of the nonprepared specimen and the skeleton can be uploaded). When a project uses samples already deposited in museum collections, photographs of vouchers should still be included as data to support the DNA barcode sequence.
3.1.3. Data Associated with the DNA Barcode Sequence
1. The collection information associated with a barcode record is vital for its use either for identification purposes, aiding species discovery, or any evolutionary or ecological study. Several information fields can be uploaded to BOLD, and increasing the amount of information enhances the value of the barcode record. The information about the data that should be included in each barcode record and the spreadsheet used to upload the data can be obtained from the BOLD website (see Note 19).
2. As previously mentioned, images of each specimen should also be uploaded to BOLD. The image submission protocol can also be obtained from the BOLD website.
Fig. 2. Photographs of vouchers (particularly museum skins) add value to BOLD records. Picture taken by K. Kerr.
3.2. DNA Extraction
3.2.1. 96-Well Plate Format
A homemade GF DNA extraction method has been developed by Ivanova et al. (29) with the objective of providing high-quality DNA extracts at low cost and thus increasing the efficiency of the barcoding process. This protocol can be applied to either vertebrates or invertebrates with only slight modifications, and has already been used on thousands of bird specimens (6, 8). Several different types of tissue samples can be used with this method, including muscle, blood, feathers, and toe pads (see Note 20). This method can also be adapted to various scales, from single reactions to 384-well plates. We describe below a version of the protocol for 96-well plates, which is a commonly used scale when developing large bird barcoding surveys, and a small-scale version for individual tubes. Both utilize the working solutions listed in Subheading 2.2. In high-throughput laboratories, an automated version of the protocol may be employed (this version is not described here; see ref. 29 for a detailed protocol).
1. For each plate, mix 5 ml of VLB and 0.5 ml of Proteinase K (20 mg/ml) and dispense 50 μl of this Lysis Mixture in each well.
2. Add a small piece of tissue (approx. 1–2 mm³, see Note 21) to each well of the plate (see Notes 22–24). To avoid cross contamination, it is highly recommended to gently cover the plate with caps before transferring the tissue and only uncover the row that is being used (see Notes 25 and 26). After transferring all the samples into the plate, the caps need to be fully inserted to avoid evaporation during incubation.
3. To allow digestion, incubate at 56°C overnight (or for a minimum of 6 h, see Note 27).
4. Centrifuge the plate at 1,500 × g for 15 s to remove any condensate from the caps.
5. Remove the caps and add 100 μl of BM to each sample using a multichannel pipette. Mix by pipetting.
6. Transfer the lysate (about 150 μl) from the wells of the microplate into the wells of a GF plate (e.g., AcroPrep 96 1 ml filter plate with 1.0 μm GF media) placed on top of a square-well block (e.g., PP MASTERBLOCK, 96-well, 2 ml) to be used as a catch plate, using a multichannel pipette. Seal the plate with self-adhering cover (e.g., Axyseal sealing film).
7. Centrifuge the GF plate with the square-well block at 5,000 × g for 5 min to bind DNA to the GF membrane.
8. Remove the cover and add 180 μl of PWB to each well of the GF plate. Seal the plate with a new cover and centrifuge at 5,000 × g for 2 min.
9. Add 750 μl of WB to each well of the GF plate. Seal with a new cover and centrifuge at 5,000 × g for 5 min.
10. Discard the square-well block and the flow through (see Note 28). Remove the cover film of the GF plate, place it on the lid of a tip box, and incubate at 56°C for 30 min to evaporate residual ethanol.
11. Place the GF plate on top of the microplate that will be used to collect the DNA. Dispense 30–60 μl of ddH2O (prewarmed to 56°C) directly onto the membrane in each well of the GF plate and incubate at room temperature for 1 min before sealing the plate.
12. Place the assembled plates on a clean square-well block (to prevent cracking of the collection plate) and centrifuge at 5,000 × g for 5 min to elute the DNA into the microplate (see Note 28). Remove the GF plate and discard it.
13. Cover the microplate containing the DNA extracts with caps or aluminum PCR foil. Extracts can be temporarily stored in a refrigerator (4°C), but a freezer (−20°C) is recommended for long-term storage. Between 1 and 5 μl of the DNA extract should be used for PCR depending mainly on the quality of the DNA source (e.g., fresh tissue sample vs. toe pad sample).
3.2.2. Individual Tubes Format
1. For each sample to be processed, mix 50 μl of VLB and 5 μl of Proteinase K (20 mg/ml). Put 50 μl of this Lysis Mixture in each tube.
2. Add a small piece of tissue (approx. 1–2 mm³, see Note 21) to each tube (see Notes 22–24 and 26).
3. To allow digestion, incubate the tubes at 56°C overnight (or for a minimum of 6 h) (see Note 27).
4. Centrifuge the tubes at 5,000–8,000 × g for 15 s to remove any condensate from the walls and lids.
5. Add 100 μl of BM to each tube.
6. Mix the tubes by pipetting and transfer the lysate into spin columns (e.g., Epoch Biolabs spin columns with attached lid) placed on top of collection tubes.
7. Centrifuge the columns and tubes at 5,000–6,000 × g for 2 min to bind DNA to the GF membrane.
8. Add 180 μl of PWB to each column and centrifuge the columns and tubes at 5,000–6,000 × g for 2 min. Pour out the contents of each collection tube or replace it with a new tube.
9. Add 700 μl of WB to each column and centrifuge the columns and tubes at 5,000–6,000 × g for 4 min. Pour out the flow through from each collection tube or replace it with a clean one and centrifuge the columns and tubes at 10,000 × g for an extra 4 min.
10. Replace the collection tubes with 1.5 ml tubes with removed lids and incubate the columns (with the lids opened) and tubes at 56°C for 15–30 min (incubation can alternatively be done at room temperature).
11. Add 50–80 μl of ddH2O (prewarmed to 56°C) directly onto the membrane of each column and incubate the columns and tubes at room temperature for 1 min before closing the lids of the columns.
12. Centrifuge the columns and tubes at 10,000 × g for 5 min to collect the DNA eluate.
13. Transfer the DNA extract into a clean tube. Extracts can be temporarily stored in a refrigerator (4°C), but a freezer (−20°C) is recommended for long-term storage. Between 1 and 5 μl of the DNA extract should be used for PCR depending mainly on the quality of the DNA source (e.g., fresh tissue sample vs. toe pad sample).
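The centrifugation steps in both extraction formats are given as relative centrifugal force (× g), whereas many benchtop centrifuges are set in rpm. A minimal sketch of the standard conversion, RCF = 1.118 × 10⁻⁵ × r × rpm² with the rotor radius r in cm, is shown below. The 10 cm radius is an assumed example value (the actual radius must be taken from the rotor in use), and the function names are hypothetical.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) that produces the requested relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


ASSUMED_RADIUS_CM = 10.0  # assumption for illustration only

for rcf in (1500, 5000, 10000):
    rpm = rpm_from_rcf(rcf, ASSUMED_RADIUS_CM)
    print(f"{rcf} x g corresponds to about {rpm:.0f} rpm at r = {ASSUMED_RADIUS_CM} cm")
```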
3.3. PCR Amplification of COI
3.3.1. Primer Selection
1. The target region in birds is the standard animal COI barcode, a 648-bp fragment of the 5′ end of COI. Primer pairs that have been used to amplify this gene region are shown in Table 1, and corresponding amplicon sizes in Table 2 (see also Fig. 3). For efficient analysis of large numbers of samples, it is desirable to use “universal” primers, i.e., those which work with a broad taxonomic range of birds. To date, the largest number of successful amplifications have utilized the forward BirdF1 and the reverse COIbirdR2 primers (3, 6, 8, 9). Other primer pairs generally effective for avian samples include CO1F + CO1R (16), PasserF1 + PasserR1 (10), and AWCF1 + AWCR6 (12). These four sets of primers should be considered as first-choice primer pairs (combinations of them could also give good results, such as the pair AWCF1 + COIbirdR2 for passerines; ref. 12). Other primers should be considered when these recommended first options fail. The forward primer FalcoFA, for example, has been shown to work well in cases in which the combination BirdF1 + COIbirdR2 tended to fail (3). In situations where DNA yields are low (e.g., degraded samples), one of us (KCRK) found that a nested PCR approach often succeeds when standard amplification attempts fail. The primer pair LTyr + COI907aH2 (13) can be used for an initial PCR and the amplified fragment can then be used as a template for a second PCR using the primer pair COIaRt + COI748Ht (13). The latter primer pair is also used for sequencing.
2. The aforementioned primers are designed to amplify the entire barcode region (or an even longer portion of the mitochondrial genome) as a single fragment and are generally effective when using high-quality DNA (i.e., obtained from frozen or ethanol-preserved tissue samples). For many species or geographic regions, however, these kinds of tissue samples are not available. In such cases, traditional museum collections of study skins or skeletons can be sampled. The drawback of using historical samples is that their DNA is usually highly degraded, so that full-length barcode amplification is usually unsuccessful. In these cases, short, overlapping fragments (each of them not longer than 300–400 bp) must be amplified using internal primers. Two groups of primers used to amplify relatively short fragments are shown in Table 3 and Fig. 4, one consisting of four primers that generate the barcode in two fragments (6) and the other consisting of six primers that generate the barcode in three fragments (12).
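Several of the primers in Tables 1 and 3 contain IUPAC ambiguity codes (e.g., Y, W, R, N), so each one is really a pool of oligonucleotides. The sketch below, which uses hypothetical helper names and only the standard IUPAC nucleotide codes, counts how many distinct sequences a degenerate primer represents and builds a regular expression that can be used to locate its binding site in a reference sequence.

```python
import re
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every unambiguous sequence encoded by the primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


def primer_regex(primer):
    """Compile a regular expression matching any variant of the primer."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


AWCF1 = "CGCYTWAACAYTCYGCCATCTTACC"   # sequence as listed in Table 1
H10884 = "GGRTCRAANCCRCAYTCRTANGG"    # sequence as listed in Table 1

print(degeneracy(AWCF1), degeneracy(H10884))
```

Reverse primers are listed 5′–3′ on the opposite strand, so they would need to be reverse-complemented before searching a forward-strand reference.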
3.3.2. PCR Mixture
1. The ingredients of the PCR mixture are detailed in Table 4 (using the reagents listed in Subheading 2.3). This recipe should be followed when performing PCR reactions in 12.5 μl volumes, which has the advantage of reducing the total amount of reagents needed (and thus the cost of each reaction) and allows sequencing without cleanup because primers and dNTPs are consumed during the PCR reaction. Always prepare more volume than needed to account for pipetting error and dead volumes (e.g., prepare enough PCR mix for 100 reactions to amplify an entire 96-well plate, see Table 4). Due to the high efficiency of the Platinum Taq, it is important to minimize the risks of contamination (see Note 29 for recommendations).

Table 1 Avian DNA barcode primers (primer name, orientation, sequence, length, position of 3′ nucleotide relative to COI start codon [designated position 0], and primary reference are shown)
Name | Orientation | Sequence (5′–3′) | Length | Location | Primary reference
BirdF1 | F | TTCTCCAACCACAAAGACATTGGCAC | 26 | 51 | (2)
BirdR1 | R | ACGTGGGAGATAATTCCAAATCCTG | 25 | 747 | (2)
BirdR2 | R | ACTACATGTGAGATGATTCCGAATCCAG | 28 | 747 | (2)
BirdR3 | R | AGGAGTTTGCTAGTACGATGCC | 22 | 1,064 | (2)
FalcoFA | F | TCAACAAACCACAAAGACATCGGCAC | 27 | 51 | (3)
VertebrateR1 | R | TAGACTTCTGGGTGGCCAAAGAATCA | 26 | 707 | (3)
COIbirdR2 | R | ACGTGGGAGATAATTCCAAATCCTGG | 26 | 746 | (6)
CO1-ExtF | F | ACGCTTTAACACTCAGCCATCTTACC | 26 | −2 | (9)
CO1-ExtR | R | AACCAGCATATGAGGGTTCGATTCCT | 26 | 1,551 | (9)
PasserF1 | F | CCAACCACAAAGACATCGGAACC | 24 | 52 | (10)
PasserR1 | R | GTAAACTTCTGGGTGACCAAAGAATC | 26 | 708 | (10)
AWCF1 | F | CGCYTWAACAYTCYGCCATCTTACC | 25 | −2 | (12)
AWCR6 | R | ATTCCTATGTAGCCGAATGGTTCTTT | 26 | 794 | (12)
LTyr | F | TGTAAAAAGGWCTACAGCCTAACGC | 25 | −24 | (13)
COIaRt | F | AACAAACCACAAAGATATCGC | 21 | 49 | (13)
COI748Ht | R | TGGGARATAATTCCRAAGCCTGG | 23 | 746 | (13)
COI745h2 | R | ACRTGNGAGATRATTCCRAANCCNG | 25 | 747 | (13)
COI907aH2 | R | GTRGCNGAYGTRAARTATGCTCG | 23 | 905 | (13)
CO1F | F | TTCTCGAACCAGAAAGACATTGGCAC | 26 | 51 | (16)
CO1R | R | ACTTCTGGGTGGCCAAAGAATCAGAA | 26 | 704 | (16)
L6615 | F | CCYCTGTAAAAAGGWCTACAGCC | 23 | −30 | (46)
H10884 | R | GGRTCRAANCCRCAYTCRTANGG | 23 | 4,260 | (46)
H8121 | R | GGGCAGCCRTGRATTCAYTC | 20 | 1,478 | (47)
Additional references (row assignment not recoverable): (3, 6–9, 37); (3, 7, 9); (8, 9); (11); (4, 5); (4, 5)

Table 2 Primer pairs that have been successfully used to obtain bird barcodes and sizes of the amplified fragment in base pairs (sizes of fragments obtained with first-choice primer sets are in bold; the forward × reverse grid layout is not recoverable)
Forward primers: PasserF1, FalcoFA, CO1F, BirdF1, COIaRt, AWCF1, CO1-ExtF, LTyr, L6615
Reverse primers: CO1R, VertebrateR1, PasserR1, COIbirdR2, BirdR1, BirdR2, COI748Ht, AWCR6, COI907aH2, BirdR3, H8121, CO1-ExtR, H10884
Fragment sizes (bp): 704, 707, 705, 746, 799, 747, 746, 750, 749, 740, 847, 976, 1,060, 1,551, 1,605, 4,336

Fig. 3. Schematic of avian DNA barcode primers and COI landmarks.

Table 3 Primers used to obtain the avian DNA barcode when working with degraded DNA
Primer | Primer sequence (5′–3′) | Primer length | Primer 3′ end location relative to COI start codon | Original reference for the primer
BirdF1 | TTCTCCAACCACAAAGACATTGGCAC | 26 | 52 | (2)
AvMiF1 | CCCCCGACATAGCATTCC | 18 | 285 | (6)
AvMiR1 | ACTGAAGCTCCGGCATGGGC | 20 | 411 | (6)
COIbirdR2 | ACGTGGGAGATAATTCCAAATCCTGG | 26 | 747 | (6)
AWCF1 | CGCYTWAACAYTCYGCCATCTTACC | 25 | −2 | (12)
AWCintF2 | ATAATCGGAGGCTTCGGAAACTGA | 24 | 245 | (12)
AWCintF4 | TCCTCAATCCTGGGAGCAATCAACTT | 26 | 493 | (12)
AWCintR2 | ATGTTGTTTATGAGTGGGAATGCTATG | 27 | 275 | (12)
AWCintR4 | TGGGAKAGGGCTGGTGGTTTTATGTT | 26 | 510 | (12)
AWCintR6 | GGATTAGGATGTAGACTTCTGGGTG | 25 | 720 | (12)
The top four primers generate two overlapping fragments and the bottom six primers generate three shorter, overlapping fragments (see Fig. 4 for a scheme showing how to combine each set of primers)

Fig. 4. Schematic of avian DNA barcode primers that can be used to amplify short, overlapping fragments when dealing with degraded DNA.

Table 4 Recipe for the PCR mix (this recipe is for using 2 μl of DNA extract, but the volume of ddH2O can be adjusted to use between 1 and 4 μl of DNA template)
Reagent | Amount for 1 reaction (μl) | Amount for 100 reactions (one plate) (μl)
10% Trehalose | 6.25 | 625
ddH2O | 2 | 200
10× Buffer | 1.25 | 125
50 mM MgCl2 | 0.625 | 62.5
10 μM forward primer | 0.125 | 12.5
10 μM reverse primer | 0.125 | 12.5
10 mM dNTPs | 0.0625 | 6.25
Taq polymerase (5 U/μl) | 0.06 | 6
Total | 10.5 | 1,050
DNA template | 2 μl per well or tube |

3.3.3. Thermal Cycling Programs
1. The amplification parameters for first-choice primer sets applied to high-quality DNA are outlined in Table 5. The thermal cycle parameters for primer sets used for degraded DNA are outlined in Table 6. For programs used for other primer pairs, please check the references listed in Table 1.
2. PCR success can be visualized on 2% agarose gels. If working with 96-well plates, we recommend the use of the Invitrogen E-gel 96 system.
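The PCR mix recipe in Table 4 scales linearly with the number of reactions, and the water volume shifts when more or less DNA template is used. The following is a minimal sketch of that arithmetic, not part of the CCDB protocol: the per-reaction volumes are taken from Table 4, while the function name `master_mix` and the `overage` parameter are hypothetical conveniences.

```python
# Per-reaction volumes from Table 4, in microliters (for 2 ul of DNA template)
PCR_MIX_UL = {
    "10% trehalose": 6.25,
    "ddH2O": 2.0,
    "10x buffer": 1.25,
    "50 mM MgCl2": 0.625,
    "10 uM forward primer": 0.125,
    "10 uM reverse primer": 0.125,
    "10 mM dNTPs": 0.0625,
    "Taq polymerase (5 U/ul)": 0.06,
}


def master_mix(n_reactions, template_ul=2.0, overage=0.0):
    """Scale the Table 4 recipe; volumes are returned in microliters.

    template_ul: DNA template per reaction (1-4 ul); water is adjusted so the
                 mix plus template stays at the same total volume.
    overage:     extra fraction to prepare for pipetting losses (hypothetical
                 parameter, e.g. 0.05 for 5%).
    """
    if not 1.0 <= template_ul <= 4.0:
        raise ValueError("template volume must be between 1 and 4 ul")
    recipe = dict(PCR_MIX_UL)
    recipe["ddH2O"] = PCR_MIX_UL["ddH2O"] + (2.0 - template_ul)
    scale = n_reactions * (1.0 + overage)
    return {reagent: volume * scale for reagent, volume in recipe.items()}


plate = master_mix(100)
for reagent, volume in plate.items():
    print(f"{reagent}: {volume:g} ul")
```

Preparing 100 reactions for a 96-well plate, as Table 4 does, already builds in roughly 4% spare volume.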
Table 5 Thermocycler parameters for first-choice primer sets for high-quality DNA
Primer pair | Thermocycle program
BirdF1 + COIbirdR2 | 94°C for 1 min, 5 cycles (94°C for 1 min, 45°C for 40 s, 72°C for 1 min), 35 cycles (94°C for 1 min, 51°C for 40 s, 72°C for 1 min), 72°C for 5 min
CO1F + CO1R | 94°C for 2 min, 25 cycles (94°C for 20 s, 48°C for 45 s, 72°C for 30 s), 72°C for 3 min
PasserF1 + PasserR1 | 95°C for 3 min, 40 cycles (94°C for 1 min, 58°C for 40 s, 72°C for 1.5 min), 72°C for 5 min
AWCF1 + AWCR6 | 94°C for 2 min, 35 cycles (94°C for 30 s, 57.5°C for 30 s, 72°C for 30 s), 72°C for 4 min
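The programs in Table 5 can be written down as data, which makes it easy to check them before entering them into a thermocycler and to estimate how long a run occupies the instrument. The sketch below encodes two of the programs in a hypothetical stage format and sums the programmed hold times; ramping between temperatures is ignored, so actual run times will be longer.

```python
# Each stage is (number_of_cycles, [(temperature_C, seconds), ...]).
# Values are transcribed from Table 5; the data layout itself is an assumption.
PROGRAMS = {
    "BirdF1 + COIbirdR2": [
        (1, [(94, 60)]),
        (5, [(94, 60), (45, 40), (72, 60)]),
        (35, [(94, 60), (51, 40), (72, 60)]),
        (1, [(72, 300)]),
    ],
    "CO1F + CO1R": [
        (1, [(94, 120)]),
        (25, [(94, 20), (48, 45), (72, 30)]),
        (1, [(72, 180)]),
    ],
}


def hold_time_seconds(program):
    """Total programmed hold time, excluding temperature ramping."""
    return sum(cycles * sum(seconds for _, seconds in steps)
               for cycles, steps in program)


for name, program in PROGRAMS.items():
    minutes = hold_time_seconds(program) / 60
    print(f"{name}: {minutes:.1f} min of programmed holds")
```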
Table 6 Thermocycle parameters for primer sets used to amplify degraded DNA
Primer set | Thermocycle program
BirdF1 + AvMiR1; AvMiF1 + COIbirdR2 | 94°C for 1 min, 25 cycles (94°C for 1 min, 45°C for 1.5 min, 72°C for 1.5 min), 35 cycles (94°C for 1 min, 55°C for 1.5 min, 72°C for 1.5 min), 72°C for 5 min
AWCF1 + AWCintR2; AWCintF2 + AWCintR4; AWCintF4 + AWCR6 | 94°C for 2 min, 10 cycles (94°C for 20 s, 55°C for 20 s, 72°C for 20 s), 30 cycles (94°C for 20 s, 50°C for 20 s, 72°C for 20 s), 72°C for 4 min

3.4. Cycle Sequencing
3.4.1. Sequencing Mixture
The recipe for the sequencing mix is detailed in Table 7. This recipe includes a stabilizer (i.e., 10% trehalose) so that premade mixes can be stored in a −20°C freezer for up to 3 months. The primer used depends on those used during PCR. For some primer pairs, unique primers are introduced for sequencing (see ref. 13). For high-throughput sequencing, M13 tails can be added to PCR primers to streamline cycle sequencing reactions (34).
3.4.2. Thermal Cycling Program
Cycle sequencing should generally involve the following thermal cycle: 96°C for 2 min; 30 cycles of 96°C for 30 s, 55°C for 15 s, and 60°C for 4 min; then hold at 4°C until samples are removed from the thermocycler.
Table 7 Recipe for the sequencing mix (this recipe is for using 1 μl of DNA extract)
Reagent | Amount for 1 reaction (μl) | Amount for 100 reactions (one plate) (μl)
BigDye Terminator v3.1 | 0.25 | 25
5× ABI sequencing buffer | 1.875 | 187.5
10% Trehalose | 5 | 500
10 μM sequencing primer | 1 | 100
ddH2O | 0.875 | 87.5
Total | 9 | 900
DNA template | 1 μl per well or tube |

3.4.3. Sequencing Clean-up
The following protocol, adapted from that developed at the CCDB (32), details how to clean cycle sequencing reactions using the Sephadex method in 96-well plate format (plates are best prepared two at a time to provide a balance for centrifugation steps).
1. Measure out Sephadex G-50 proportions for a 96-well plate using a multiscreen column loader (Millipore), and then invert over a 96-well filter plate with 0.45 μm pore size.
2. Add 300 μl of ultrapure water to each well, and then allow the plate to sit for at least 1 h (alternatively, plates can be left overnight in a refrigerator at 4°C).
3. Assemble the filter plate with hydrated Sephadex into a sandwich with a 96-well microtiter catch plate. Centrifuge at 750 × g for 3 min to remove the water from the wells. Discard the flow through (the collection plate can be reused for this wash step in the future).
4. Pipette the entire contents of the sequence reaction onto the centre of the Sephadex columns.
5. Add 25 μl of 0.1 mM EDTA pH 8.0 to each well of a final collecting 96-well plate.
6. Place the filter plate over the final collecting plate (pay careful attention to plate orientation), and then centrifuge at 750 × g for 3 min.
7. Discard Sephadex from the filter plate. Dry the contents of the final collection plate and seal once completely dry.
3.5. Data Analysis
3.5.1. Sequence Assembly and Verification
Bidirectional sequencing—a requisite for high-confidence base calls and GenBank keyword barcode compliance—necessitates contig assembly from forward and reverse sequencing reads. Several programs exist that can facilitate this process through automation (e.g., Sequencher or CodonCode). Assembled contigs should then be aligned. If no ambiguous base calls occur at the end of the sequences (i.e., sequences are of full length between primer-binding sites) or are relatively few, then sequences may be easily aligned by eye. Otherwise, sequence alignment software may be required (e.g., MEGA). Amino acid translations should be carefully scrutinized for rare mutations, which may be indicative of errors introduced during contig assembly.
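Before a forward and a reverse read can be compared, the reverse read has to be reverse-complemented so that both lie in the same orientation. The sketch below illustrates only that step and a position-by-position comparison; it is not how Sequencher or CodonCode assemble contigs, and it assumes (for simplicity) that both reads have already been trimmed to exactly the same region. The function names are hypothetical.

```python
# Complement table covering A/C/G/T and the IUPAC ambiguity codes
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def disagreements(forward_read, reverse_read):
    """List (position, forward_base, reverse_base) where the two reads differ.

    reverse_read is given as sequenced (5'-3' on the opposite strand).
    """
    oriented = reverse_complement(reverse_read)
    if len(oriented) != len(forward_read):
        raise ValueError("reads must be trimmed to the same region first")
    return [(i, f, r)
            for i, (f, r) in enumerate(zip(forward_read.upper(), oriented))
            if f != r]


print(reverse_complement("ACGTTG"))
print(disagreements("ACGTTG", "CAACGA"))
```

Positions reported by `disagreements` are the ones where the chromatograms would need to be inspected before a consensus base is accepted.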
3.5.2. Tree-Based Verification
Newly obtained sequences should be reviewed for possible sources of error, including contamination, misidentifications, and pseudogenes. Most of these are readily identified via phylogenetic methods, such as tree construction. Sequences uploaded to BOLD may be easily reviewed using the “Taxon ID Tree” tool under the “Sequence Analysis” menu, which generates a neighbor-joining tree of the selected sequences. The more species (and specimens per species) included in this analysis, the better the chance of detecting possible sources of error. When large numbers of sequences are included, trees can be colorized by sequence age to highlight the specimens that were recently added to BOLD and still need to be checked.
3.5.3. Contamination
Contamination may originate from lab products (which involve common laboratory species, such as Mus musculus or Sus domestica), from cross-well contaminants when using 96-well plates, or from diverse sources when using samples obtained from historical specimens (such as toe pad samples from museum skins). The first is readily identifiable, while cross-well contamination can usually be detected when the sequences obtained from two samples of the plate that belong to different species are identical, particularly if these species are quite different from each other and not likely to be confused (Fig. 5). However, cross-well contamination might be more challenging to confirm when closely related species are processed in the same plate (see Note 25). Sequences obtained from degraded DNA should be carefully scrutinized for accuracy because contaminant DNA can outcompete the target DNA; contamination should be suspected if there is an unexpected result.
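The cross-well check described above (identical sequences recovered from samples assigned to different species on the same plate) is easy to automate as a first screen. The following sketch uses hypothetical sample identifiers and invented placeholder sequences; it only flags candidates, which still have to be judged against the tree and the plate layout.

```python
from collections import defaultdict


def flag_shared_sequences(records):
    """Group samples from one plate that share an identical sequence.

    records: iterable of (sample_id, species, sequence).
    Returns a list of groups; each group is a sorted list of
    (sample_id, species) tuples containing more than one species.
    """
    by_sequence = defaultdict(list)
    for sample_id, species, sequence in records:
        by_sequence[sequence.upper()].append((sample_id, species))
    flagged = []
    for samples in by_sequence.values():
        if len({species for _, species in samples}) > 1:
            flagged.append(sorted(samples))
    return flagged


# Hypothetical plate with invented placeholder sequences
plate_records = [
    ("PLATE1-A01", "Cranioleuca pyrrhophia", "ACGTACGTACGT"),
    ("PLATE1-A02", "Poospiza torquata", "ACGTACGTACGT"),
    ("PLATE1-A03", "Poospiza torquata", "ACGTTCGAACGT"),
]

for group in flag_shared_sequences(plate_records):
    print("identical sequence shared by:", group)
```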
3.5.4. Misidentifications
If lab errors and contamination have been ruled out, disagreements between specimen identification and barcode results should prompt the inspection of voucher materials. This is most important when the species indicated by the barcode is similar to the one originally identified based on morphology (Fig. 6). In these cases, the identification of the specimen should be carefully reviewed.
Fig. 5. An example of a BOLD species ID tree that depicts a cross-well contamination. Note the presence of a sequence theoretically belonging to a specimen of Poospiza torquata placed within the haplogroup of Cranioleuca pyrrophia. The evidence that this is likely a case of contamination includes two aspects: (a) the sequence is identical to others in the same plate (note that the process ID of the sample of P. torquata is LBARG050-10, and there are two other identical sequences from close wells in the plate: LBARG048-10 and LBARG058-10) and (b) P. torquata, which belongs to the family Emberizidae, is quite different from C. pyrrophia, which is a furnariid, and therefore an identification error can be ruled out.
Fig. 6. An example of a BOLD species ID tree that flags a possible misidentification. Note that Saltator coerulescens and S. similis are clearly separated in the neighbor-joining tree, but one specimen of S. similis (MACN-Or-ct 5072) is placed within the S. coerulescens clade. Because juveniles of S. coerulescens resemble the adults of S. similis, it is likely that a juvenile of S. coerulescens was collected and misidentified in the field as S. similis. The voucher has to be reanalyzed in cases like this, which is the main reason why it is so important to use vouchered samples for barcoding projects.
3.5.5. Nuclear Mitochondrial Pseudogenes (Numts)
Numts are copies of mitochondrial genes that have been translocated to the nuclear genome. These copies are typically nonfunctional and, thus, accumulate nonsensical mutations, such as frameshifts and stop codons. These features usually allow pseudogenes to be rapidly identified. Full-length COI pseudogenes are uncommon but might be more frequently encountered when working with avian blood samples. In exceptional cases, characteristic features might be absent from a pseudogene, in which case a more thorough phylogenetic analysis may be required to properly identify it (21).
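As a rough illustration of the stop-codon screen mentioned above, the following Python sketch scans each reading frame of an unaligned sequence for stop codons of the vertebrate mitochondrial genetic code (TAA, TAG, AGA, and AGG; TGA encodes tryptophan in this code). The function names and the toy sequence are assumptions made for the example. A sequence with no stop-free frame is a pseudogene candidate, but a stop-free frame does not prove authenticity, because, as noted above, some pseudogenes lack these features.

```python
# Stop codons in the vertebrate mitochondrial genetic code.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def internal_stops(sequence, frame=0):
    """Return the codon indices (0-based) that are stop codons
    when the sequence is read in the given frame (0, 1, or 2)."""
    sequence = sequence.upper()
    codons = [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]
    return [index for index, codon in enumerate(codons)
            if codon in VERT_MITO_STOPS]


def stop_free_frames(sequence):
    """Return the reading frames that contain no stop codon."""
    return [frame for frame in range(3)
            if not internal_stops(sequence, frame)]


# Toy sequence, not a real COI fragment.
toy = "ATGTGAGGCTAAGGC"
print(internal_stops(toy, frame=0))
print(stop_free_frames(toy))
```

Frameshifts caused by insertions or deletions are not detected directly by this scan; they usually reveal themselves as stop codons downstream of the indel or as gaps in an alignment against a reference barcode.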
3.5.6. Assessing Species Resolution of DNA Barcodes
The “Nearest Neighbor Summary” tool, available under the “Sequence Analysis” menu in BOLD, can be used to identify low values of interspecific variation. Alternatively, neighbor-joining trees can also be used to highlight species that fail to produce independent haplogroups. Species pairs with highly similar sequences can be subjected to character-based analysis (35, 36) to identify diagnostic character states (i.e., single-nucleotide polymorphisms) that distinguish otherwise very similar sequences.
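The idea behind a nearest-neighbor summary can be sketched offline as follows. This is not the BOLD implementation: it uses the uncorrected p-distance on pre-aligned sequences as a simplifying assumption, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among aligned positions where
    both sequences carry an unambiguous base (A, C, G, or T)."""
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_neighbors(records):
    """For each species, return (closest other species, minimum distance).

    records: list of (species, aligned_sequence) tuples.
    """
    nearest = {}
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        if sp_a == sp_b:
            continue
        distance = p_distance(seq_a, seq_b)
        for focal, other in ((sp_a, sp_b), (sp_b, sp_a)):
            if focal not in nearest or distance < nearest[focal][1]:
                nearest[focal] = (other, distance)
    return nearest


# Toy aligned sequences, not real barcodes.
records = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_B", "ACGTACGTAT"),
    ("sp_C", "TTGTACCTAT"),
]
for species, (neighbor, distance) in nearest_neighbors(records).items():
    print(species, neighbor, round(distance, 3))
```

Species pairs with a very small nearest-neighbor distance are the ones worth passing on to the character-based analysis mentioned above.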
3.5.7. Identifying Divergent Intraspecific Lineages
The “Distance Summary” tool, available under the “Sequence Analysis” menu in BOLD, can be used to identify species with high levels of intraspecific variation, which can be indicative of cryptic species. Alternatively, neighbor-joining trees may be generated with bootstrap support to identify lineages that garner stronger statistical support when clustered independently rather than as one cohesive group.
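A comparable offline check for deep intraspecific divergence might look like the sketch below. It again assumes the uncorrected p-distance on pre-aligned sequences, and the threshold is an arbitrary value chosen by the user for illustration; the text above does not prescribe one.

```python
from collections import defaultdict
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among aligned positions where
    both sequences carry an unambiguous base (A, C, G, or T)."""
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def max_intraspecific(records):
    """Return {species: largest pairwise distance within that species}.
    Species represented by a single sequence get 0.0."""
    by_species = defaultdict(list)
    for species, sequence in records:
        by_species[species].append(sequence)
    return {
        species: max((p_distance(a, b) for a, b in combinations(seqs, 2)),
                     default=0.0)
        for species, seqs in by_species.items()
    }


def flag_divergent(records, threshold):
    """List species whose maximum intraspecific distance exceeds threshold."""
    return sorted(species
                  for species, distance in max_intraspecific(records).items()
                  if distance > threshold)


# Toy aligned sequences; the 0.1 threshold is illustrative only.
records = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAC"),
    ("sp_B", "ACGTACGTAC"),
    ("sp_B", "TTGTACGTAC"),
]
print(flag_divergent(records, threshold=0.1))
```

Flagged species are candidates for closer study (for example, with a bootstrapped neighbor-joining tree as described above), not confirmed cryptic species.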
3.5.8. Identifying Unknown Specimens
If barcode sequences have been acquired for the purpose of species identification (as opposed to generating a reference library), then these may be identified singly using either the BOLD Identification Engine (BOLD-IDS) or the Basic Local Alignment Search Tool (BLAST) available from NCBI (37). The main advantage of using the former system is that BOLD includes not only the public sequences that are available in GenBank, but also those that belong to particular projects, which are not visible to all users but are part of the database searched by the identification engine, increasing in this way the chances of a correct identification. In addition, identifications in BOLD can not only be done simply by sequence similarity (as in a BLAST search), but also using a tree-based approach.
4. Notes
1. It is usually necessary to obtain a permit to collect whole bird specimens, blood samples, or even feathers. A transit permit is also generally necessary to move specimens or samples between provinces/states within each country. A permit (or a series of permits) is usually also needed to export/import tissue samples or DNA extracts. If samples originate from a species protected by the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), then an additional permit is needed. It is essential to check national regulations before starting any barcoding project that potentially includes collection or movement of tissue samples or DNA extracts across borders.
2. Usually small- to medium-sized birds (ranging from hummingbirds to doves) can be captured with mist nets and larger species are hunted using firearms. There are also special types of nets used for particular groups of birds, such as cannon- or rocket-projected nets used for capturing shorebirds.
3. Weigh the guanidine thiocyanate first, add the required volumes of stock solutions and part of the ddH2O needed to reach the final volume, mix vigorously on a heated magnetic stirrer until the guanidine thiocyanate is fully dissolved, and finally add more ddH2O if necessary (when working at a smaller scale and preparing smaller volumes in a tube that cannot be placed on a magnetic stirrer, the solution can be warmed at 56°C and mixed with a vortex). If any recrystallization occurs, prewarm the solution at 56°C to dissolve the guanidine thiocyanate before using it.
4. The use of Platinum Taq polymerase (Invitrogen) has several advantages compared to the standard Taq, including higher success rate, higher amount of amplicons produced per sample, the need for a “hot start” for activation (which reduces amplification of nonspecific fragments), and stability at room temperature.
5. The use of trehalose stabilizes the PCR and allows freezing of aliquoted PCR mixes, which can simplify and accelerate workflow in high-throughput facilities.
6. For example, in studies supported by the funding sources of the International Barcode of Life project (iBOL), ten specimens per species is the upper limit. This is because the objective is to maximize the number of species represented in the database. However, in the case of species for which two or more lineages with deep genetic divergence are found, one might want to include more individuals to better study the genetic structure of the species. In fact, if the divergence between lineages suggests that they have been isolated for a long period of time and are evolving independently, up to ten specimens per lineage might be sampled for barcoding.
7. Depending on the characteristics of the collecting trip, specimens can be frozen in a freezer or in a cooler containing dry ice. Wet ice should be avoided because of its humidity and higher temperature, which do not guarantee that all specimens in the cooler will be kept frozen.
8. Other options are available for long-term storage of tissue samples at room temperature or in a refrigerator, such as drying the sample on ceramic beads or using FTA paper, but nitrogen or ultracold freezers are the most widely used and have been demonstrated to be effective for long-term storage.
9. Even though the sample is fixed, it is highly recommended to store the tube in an ultracold freezer or in nitrogen once it is deposited in the permanent collection if this is possible.
10. Ethanol should be replaced a few days after placing the sample inside the tube. This is because the water in the tissue sample is replaced by ethanol and released into the tube, thus diluting the ethanol. This is particularly relevant when the sample is large relative to the amount of ethanol.
11. We recommend writing the identification of the tube contents on adhesive labels in pencil (and covering them with tape), given that it is common for inked labeling to wear off after extended periods in an ultracold freezer or liquid nitrogen. In the case of samples fixed with ethanol, it is particularly important to use a pencil or to make sure that the marker is alcohol-proof because most permanent markers are only waterproof and are easily removed by ethanol (even by the ethanol vapors that can accumulate inside the boxes or bags in which the tubes are stored). Scratching the information on the tube is a highly discouraged alternative because it is impractical and the data are later difficult to retrieve.
12. Freezing the sample is not recommended for some lysis buffers due to the tendency of some of their components to precipitate. In this case, we recommend keeping the samples in a refrigerator (particularly for long-term storage).
13. Vouchers are vital because they allow the identification of species to be confirmed. In addition, they provide important information (e.g., about morphology) that can complement the conclusions reached through the DNA analysis. The importance of collecting bird specimens and preserving vouchers in addition to tissue samples has been emphasized in the literature (38–43).
14. There are different standards for vouchers, depending on the accuracy with which they permit species identification. The “gold standard” for most birds is a study skin from an adult male in breeding plumage, but other options are also valid when these ideal conditions cannot be met, including the use of juveniles or females (which might be as useful as males for species that lack sexual dimorphism) and the preparation of skeletons or ethanol-preserved specimens (ideally, in at least 96% ethanol).
15. We advise preserving as many sources of information from the specimen as possible. The ideal situation is to prepare a study skin and keep a partial skeleton, including the bones that are removed from the skin during preparation. Other materials, such as internal organs, the syrinx, stomach contents, or parasites, can also be preserved.
16. When blood samples are obtained and the individuals are not collected (as in the case of rare or endangered species), we recommend taking diagnostic photographs before the bird is released to serve as e-vouchers.
17. For projects using preexisting museum collections, the presence (and type) of vouchers should be considered when selecting samples.
18. This is particularly relevant when obtaining an e-voucher of a specimen that is bled and then released. However, it is also useful to take pictures of the living bird even if it will be collected because some characteristics might be better observed in the individual when it is alive (such as eye color, for example). Pictures can also aid in confirming the identification of the specimen after returning from the field (particularly in cases when preparation of vouchers is not done immediately).
19. It is very useful to store the data associated with the specimens in a database as compatible as possible with the data submission spreadsheet to minimize the work needed to upload this information to BOLD.
20. The toe pad is the best source of DNA from study skins. This is because the contact with preservatives that may damage DNA is reduced in this area and because the risk of cross-contamination is also minimized (particularly if the sample is obtained at some depth and not from the toe pad surface).
21. Use very small amounts of tissue. Larger samples complicate the extraction process rather than facilitating it. A rule of thumb is that the tissue sample is big enough as long as it can be seen with the naked eye.
22. Dissection instruments used for transferring tissue samples to the plate (or tube) should be sterilized between samples. This can be done using a DNA-removing detergent (such as ELIMINase). The instrument should be placed for a few seconds in the detergent and then washed three times using three different water containers (this way, the concentration of detergent decreases in each subsequent container and after the third wash, there should be no detergent left on the instrument). Alternatively, instruments can be sterilized with bleach or a flame (although the latter can damage the instruments much more than the detergent).
23. When working with toe pad samples, it is recommended to rehydrate and soften the tissue before digestion. Samples can be preincubated in phosphate-buffered saline (PBS) for 24 h
before digesting them with the Lysis Mix (44). A recipe for PBS includes the following final concentration of components: 137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 2 mM KH2PO4, adjusted to pH 7.4. Note that the same plates or tubes that are used for the extraction can be used for the incubation with PBS, in which case the PBS should be pipetted out of each well or tube before adding the Lysis Mix (it is recommended to pulse centrifuge the plate or tube for a few seconds after pipetting the PBS and then pipette again any leftovers before adding the Lysis Mix).
24. When using feathers as a DNA source, undissolved keratin from feather barbs and barbules may clog the filter plates, obstructing DNA extraction. When working with larger feathers (e.g., rectrices), isolate the section of the feather shaft containing the superior umbilicus and use only this for your sample. When working with smaller feathers (e.g., contour feathers from passerine birds), feather keratin may be digested by adding 20 μl of 1 M dithiothreitol (DTT) solution to the Lysis Mix for each sample. Alternatively, DNA may be extracted from feathers using a Chelex-based method (45), but this results in a poor-quality DNA extract.
25. Because cross-well contamination between conspecific samples is virtually impossible to detect, we recommend not placing tissue samples from the same species in contiguous wells.
26. It is always a good practice to clean the bench top with ethanol before starting the extraction protocol and to change gloves if they touch possible sources of contamination. Furthermore, pipettes used for the extraction protocol should never be used to handle PCR products. These caveats are particularly important when extracting DNA from samples obtained from historical specimens (such as toe pad samples from study skins). In this case, change gloves often, use filter tips, and work as isolated as possible from other areas of the laboratory in which other samples, DNA extracts, or PCR products are handled.
27. In the case of samples obtained from toe pads or feathers in which digestion tends to be more complicated, longer incubation periods might be necessary to completely digest the tissue. Incubate samples for 24 h or until there is no visible undegraded tissue. Adding 2–3 μl of proteinase K (20 mg/ml) after the first 6 h of incubation is recommended.
28. Square-well blocks can be washed with a DNA-removing detergent, autoclaved, and reused.
29. To minimize the risks of contamination, always clean the bench top before preparing the PCR mix, change gloves if they touch any contaminants, add the DNA template last, after reagents have been returned to the freezer, and try not to work with PCR products in the area where PCR mix is prepared. Ideally, a different set of pipettes should be used for handling PCR reagents and PCR products (and a third one for performing DNA extraction). Contamination is particularly problematic when amplifying the barcode from an extract containing degraded DNA (such as with toe pad samples) because a fragment of the nondegraded, contaminant DNA might have higher chances of being amplified than the target DNA. In these cases, it is vital to be extra careful; change gloves often, always use filter tips to prepare the PCR mix, and work as isolated as possible from other areas of the laboratory in which other samples, DNA extracts, or PCR products are handled. To detect contamination of reagents, always include a negative control.
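The PBS recipe in Note 23 is given only as final concentrations. As a worked example, the Python sketch below converts those concentrations into grams of salt for a chosen volume. The molar masses are for the anhydrous salts, which is an assumption: if a hydrated salt is used (for example, a hydrate of Na2HPO4), its molar mass must be substituted. The function and dictionary names are hypothetical.

```python
# Molar masses in g/mol for the anhydrous salts (assumption: no hydrates).
MOLAR_MASS = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
    "KH2PO4": 136.09,
}

# Final concentrations in mM, as listed in Note 23.
PBS_MM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 10.0, "KH2PO4": 2.0}


def grams_needed(concentrations_mm, volume_ml):
    """Return {salt: grams} required to reach the given millimolar
    concentrations in volume_ml of solution."""
    litres = volume_ml / 1000.0
    return {
        salt: round(mm / 1000.0 * litres * MOLAR_MASS[salt], 3)
        for salt, mm in concentrations_mm.items()
    }


# Masses for 1 l of PBS; the pH is still adjusted to 7.4 afterwards.
for salt, grams in grams_needed(PBS_MM, volume_ml=1000).items():
    print(f"{salt}: {grams} g")
```

The same helper can be reused for any stock solution whose recipe is stated as a molarity, provided the correct molar mass is supplied.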
Acknowledgments
Our work related to barcoding has been possible thanks to the financial support provided by the National Research Council of Argentina (CONICET), the Agencia Nacional de Promoción Científica y Tecnológica (ANPCyT), Lounsbery Foundation, Fundación Williams, the Consortium for the Barcode of Life (CBOL), the iBOL, the International Development Research Centre of Canada (IDRC), and the Biodiversity Institute of Ontario, Canadian Centre for DNA Barcoding (BIO-CCDB). We also thank the authorities of National Fauna and the provincial offices of fauna of Argentina, the National Parks Administration of Argentina, and Fundación Pearson. Finally, we thank N. Ivanova and A. Borisenko for their invaluable help in diverse aspects of the barcoding process.
References
1. Stoeckle M, Winker K (2009) A global snapshot of avian tissue collections: state of the enterprise. Auk 126:684–687
2. Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004) Identification of birds through DNA barcodes. PLoS Biol 2:1657–1663
3. Kerr KCR, Stoeckle MY, Dove CJ et al (2007) Comprehensive DNA barcode coverage of North American birds. Mol Ecol Notes 7:535–543
4. Vilaça ST, Lacerda DR, Sari EHR, Santos FR (2006) DNA-based identification applied to Thamnophilidae (Passeriformes) species: the first barcodes of Neotropical birds. Revista Bras Ornitol 14:7–13
5. Chaves AV, Clozato CL, Lacerda DR et al (2008) Molecular taxonomy of Brazilian tyrant-flycatchers (Passeriformes: Tyrannidae). Mol Ecol Resour 8:1169–1177
6. Kerr KCR, Lijtmaer DA, Barreira AS et al (2009) Probing evolutionary patterns in Neotropical birds through DNA barcodes. PLoS One 4:e4379. doi:10.1371/journal.pone.0004379
7. Yoo HS, Eah JY, Kim JS et al (2006) DNA barcoding Korean birds. Mol Cells 22:323–327
8. Kerr KCR, Birks SM, Kalyakin MV et al (2009) Filling the gap – COI barcode resolution in eastern Palearctic birds. Front Zool 6:29
9. Johnsen A, Rindal E, Ericson PGP et al (2010) DNA barcoding of Scandinavian birds reveals divergent lineages in trans-Atlantic species. J Ornithol 151:565–578
10. Lohman DJ, Prawiradilaga DM, Meier R (2009) Improved COI barcoding primers for Southeast Asian perching birds (Aves: Passeriformes). Mol Ecol Res 9:37–40
11. Lohman DJ, Ingram KK, Prawiradilaga DM et al (2010) Cryptic genetic diversity in “widespread” Southeast Asian bird species suggests that Philippine avian endemism is gravely underestimated. Biol Cons 143:1885–1890
12. Patel S, Waugh J, Millar CD, Lambert DM (2009) Conserved primers for DNA barcoding historical and modern samples from New Zealand and Antarctic birds. Mol Ecol Res 10:431–438
13. Tavares ES, Baker AJ (2008) Single mitochondrial gene barcodes reliably identify sister-species in diverse clades of birds. BMC Evol Biol 8:81
14. Sanín C, Cadena CD, Maley JM et al (2009) Paraphyly of Cinclodes fuscus (Aves: Passeriformes: Furnariidae): implications for taxonomy and biogeography. Mol Phylogenet Evol 53:547–555
15. Lijtmaer DA, Kerr KCR, Barreira AS et al (2011) DNA barcode libraries provide insight into continental patterns of avian diversification. PLoS One 6:e20744. doi:10.1371/journal.pone.0020744
16. Dove CJ, Rotzel NC, Heacker M, Weigt LA (2008) Using DNA barcodes to identify bird species involved in birdstrikes. J Wildlife Manag 72:1231–1236
17. Marra PP, Dove CJ, Dolbeer R et al (2009) Migratory Canada geese cause crash of US Airways Flight 1549. Front Ecol Environ 7:297–301
18. Waugh J, Evans MW, Millar CD, Lambert DM (2010) Birdstrikes and barcoding: can DNA methods help make the airways safer? Mol Ecol Res. doi:10.1111/j.1755-0998.2010.02884.x
19. Alcaide M, Rico C, Ruiz S et al (2009) Disentangling vector-borne transmission networks: a universal DNA barcoding method to identify vertebrate hosts from arthropod bloodmeals. PLoS One 4:e7092
20. Robert M, Vaillancourt MA, Drapeau P (2010) Characteristics of nest cavities of Barrow’s Goldeneyes in eastern Canada. J Field Ornithol 81:287–293
21. Kerr KCR (2010) A cryptic, intergeneric cytochrome c oxidase I pseudogene in tyrant flycatchers (family: Tyrannidae). Genome 53:1103–1109
22. Blake ER (1949) Preserving birds for study. Fieldiana Tech 7:1–38
23. Johnson NK, Zink RM, Barrowclough GF, Marten JA (1984) Suggested techniques for modern avian systematics. Wilson Bull 96:543–560
24. Bates J, Hackett S, Zink RM (1993) Técnicas y materiales para la preservación de tejidos congelados. In: Escalante-Pliego P (ed) Curación moderna de colecciones ornitológicas. American Ornithologists’ Union, Washington, DC
25. Proctor NS, Lynch PJ (1993) Manual of ornithology: avian structure and function. Yale University Press, New Haven; London
26. Winker K (2000) Obtaining, preserving, and preparing bird specimens. J Field Ornithol 71:250–297
27. de Queiroz Piacentini V, Silveira LF, Costa Straube F (2010) A coleta de aves e a sua preservação em coleções científicas. In: Von Matter S, Straube FC, Accordi IA, de Queiroz Piacentini V, Cândido-Jr JF (eds) Ornitologia e conservação: ciencia aplicada, técnicas de pesquisa e levantamento. Technical Books Editora, Rio de Janeiro
28. Roos AL (2010) Capturando aves. In: Von Matter S, Straube FC, Accordi IA, de Queiroz Piacentini V, Cândido-Jr JF (eds) Ornitologia e conservação: ciencia aplicada, técnicas de pesquisa e levantamento. Technical Books Editora, Rio de Janeiro
29. Ivanova NV, Dewaard JR, Hebert PDN (2006) An inexpensive, automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes 6:998–1002
30. Ivanova NV, Dewaard JR, Hebert PDN. Protocols. Glass fiber plate DNA extraction. CCDB website: www.dnabarcoding.ca. Last accessed on February 27, 2012
31. Ivanova NV, Grainger C. Protocols. COI amplification. CCDB website: www.dnabarcoding.ca. Last accessed on February 27, 2012
32. Ivanova NV, Grainger C. Protocols. Sequencing. CCDB website: www.dnabarcoding.ca. Last accessed on February 27, 2012
33. Ratnasingham S, Hebert PDN (2007) BOLD: the barcode of life data system. Mol Ecol Notes. www.barcodinglife.org. doi:10.1111/j.1471-8286.2006.01678.x
34. Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN (2007) Universal primer cocktails for fish DNA barcoding. Mol Ecol Resour 7:544–548
35. DeSalle R, Egan MG, Siddall M (2005) The unholy trinity: taxonomy, species delimitation and DNA barcoding. Phil Trans R Soc B 360:1905–1916
36. Sarkar IN, Planet PJ, Desalle R (2008) CAOS software for use in character-based DNA barcoding. Mol Ecol Resour 8:1256–1259
37. Altschul SF, Gish W, Miller W et al (1990) Basic local alignment search tool. J Mol Biol 215:403–410
38. Goodman SM, Lanyon SM (1994) Scientific collecting. Cons Biol 8:314–315
39. Remsen JV Jr (1995) The importance of continued collecting of bird specimens to ornithology and bird conservation. Bird Cons Int 5:145–180
40. Winker K (1996) The crumbling infrastructure of biodiversity: the avian example. Cons Biol 10:703–707
41. Winker K, Braun MJ, Graves GR (1996) Voucher specimens and quality control in avian molecular studies. IBIS 138:345–346
42. Bates JM, Bowie RCK, Willard DE et al (2004) A need for continued collecting of avian voucher specimens in Africa: why blood is not enough. Ostrich 75:187–191
43. Winker K (2005) Bird collections: development and use of a scientific resource. Auk 122:966–971
44. Campagna L, Benites P, Lougheed SC et al (2012) Rapid phenotypic evolution during incipient speciation in a continental avian radiation. Proc R Soc B, in press. doi: rspb.2011.2170v1-rspb20112170
45. Walsh PS, Metzger DA, Higuchi R (1991) Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material. Biotechniques 10:506–513
46. Sorenson MD, Ast JC, Dimcheff DE et al (1999) Primers for a PCR-based approach to mitochondrial genome sequencing in birds and other vertebrates. Mol Phylogenet Evol 12:105–114
47. Sorenson MD (2003) Avian mtDNA primers. http://people.bu.edu/msoren/Bird.mt.Primers.pdf
Chapter 8
DNA Barcoding in Mammals
Natalia V. Ivanova, Elizabeth L. Clare, and Alex V. Borisenko
Abstract
DNA barcoding provides an operational framework for mammalian taxonomic identification and cryptic species discovery. Focused effort to build a reference library of genetic data has resulted in the assembly of over 35 K mammalian cytochrome c oxidase subunit I sequences and outlined the scope of mammal-related barcoding projects. Based on the above experience, this chapter recounts three typical methodological pathways involved in mammalian barcoding: routine methods aimed at assembling the reference sequence library from high quality samples, express approaches used to attain cheap and fast taxonomic identifications for applied purposes, and forensic techniques employed when dealing with degraded material. Most of the methods described are applicable to a range of vertebrate taxa outside Mammalia.
Key words: Mammalia, Molecular diagnostics, Molecular biodiversity, Molecular methods, DNA extraction, PCR, Primers, Sequencing, Cytochrome c oxidase subunit I
1. Introduction
Mammals represent a minute fraction of biological diversity, with only ~5,500 species currently recognized. Despite their large size and charismatic nature, garnering much taxonomic scrutiny, it is projected that their global diversity is still severely underestimated with at least 7,000 mammalian species in existence (1). The present rate of species discovery in mammals is surprisingly high (~10% increase in 15 years (1, 2)) with more than half of newly recognized species classified as “cryptic” (2) and their discovery largely attributable to the use of non-morphological character complexes. The recently introduced Genetic Species Concept (3, 4) emphasizes genetic rather than reproductive isolation as key to mammalian species’ definition and utilizes genetic divergence to assess species boundaries. While this concept does not define species based on genetic divergence alone and is thus not analogous to molecular taxonomy, it calls for collecting large amounts of
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_8, © Springer Science+Business Media, LLC 2012
information on inter- and intraspecific genetic divergence as an important first step in the process of elaborating taxonomic hypotheses. This coincides with the DNA barcoding approach (5, 6), which offers an operational framework for species identification and discovery using short, standardized gene fragments. Although cytochrome b has been traditionally used for studying mammalian alpha-taxonomy (e.g., ref. 3), cytochrome c oxidase subunit 1 (COI) was the marker of choice for many groups outside Chordata, such as insects and certain marine invertebrates (7–9), and has been adopted as the standard barcoding marker for the animal kingdom (10). While COI evolves more slowly than cyt b (11), it performs equally well in mammalian diagnostics (12) and can yield complementary data for combined phylogenetic analyses (13). The utility of COI DNA barcoding in mammals has been demonstrated in a range of different studies, including bioinformatics (14–16), verification of field-made taxonomic assignments (17, 18), and assessment of genetic diversity patterns in regional (19, 20) and continental faunas (21). The standard animal DNA barcode region has been used in conjunction with other genes for taxonomic revision (13, 22–24) and new species’ description (25). As of 2010, more than 35 K mammalian sequences from over 1 K species have been assembled at the Biodiversity Institute of Ontario as part of an ongoing international effort to build the reference library of DNA barcodes. The sources from which our samples were obtained ranged from degraded archival specimens to high quality cryopreserved tissue. This experience has helped to outline the scope of mammal-related barcoding projects—from forensic cases to the creation of the reference DNA barcode library—and the array of methodological approaches that optimally suit each particular case. 
While one should keep in mind that mammalian DNA barcoding also intersects with important topics including the ethics of sampling vertebrate collections (26), approaches to conservation genetics (27), specimen examination (28), and the front-end logistics of the barcoding pipeline (29) for efficient high throughput molecular processing, these topics fall outside the scope of this paper and are discussed elsewhere. Here, we have attempted to summarize the methodological approaches employed for the molecular aspects of DNA barcoding in mammals. The molecular protocols are similar to those used in other animals (30), particularly vertebrates, and can be readily applied when dealing with sample sets which include multiple vertebrate groups. We describe three molecular pathways depending on application.
1. Routine barcoding: the assembly of the reference barcode library from high grade tissue samples. This approach frequently employs high throughput methodology in 96-well plate-based manual applications but is also applicable to robotic liquid handling protocols (31). The outcome is the generation of high quality genomic DNA extracts suitable for long-term archival and bidirectional reads of the full-length barcode region of COI (657 bp).
2. Express barcoding: applied barcoding used in ecological surveys and rapid taxonomic assessments. Similar to routine methods, the approach generally utilizes high grade tissue sources and high throughput techniques, but can be scaled down for small numbers of samples. Express protocols use comparatively fewer reagents and require simpler equipment to be cost-effective, but the resulting DNA extracts are not of archival grade. Unidirectional short length (420 bp) sequences are often generated which are sufficient for reliable species-level identification in mammals (32). An offshoot of this approach is the design and use of microarrays (12, 14).
3. Forensic barcoding: applied barcoding aimed at generating DNA-based identifications when the DNA is degraded and contaminated with fungi or bacteria or when the samples are otherwise recalcitrant. This approach requires polymerase chain reaction (PCR) primers amplifying shorter fragments of DNA (33) and quality checks for cross-contamination. If the tissue is fresh, quick alkaline lysis can be used for DNA extraction; otherwise, we recommend following the DNA extraction protocol for routine barcoding (31).
The protocols below are centered on high throughput approaches towards routine barcoding adapted for laboratories lacking robotic liquid handling equipment, but highlight the methodological deviations required for express and forensic barcoding. The utility of these protocols has been validated against a wide range of mammalian taxa representing all major extant orders.
2. Materials
1. Tissue preservation (routine DNA barcoding):
(a) Ethanol 96% (store in a flammable liquid cabinet).
(b) ELIMINase (Decon Labs Inc.) for sterilizing instruments.
(c) Cryotubes with O-ring caps.
(d) Acid-free label paper (Rite-in-the-Rain or equivalent).
(e) Fine smooth-tip forceps (Dumont or equivalent; see Note 1).
2. Tissue subsampling and lysis (routine DNA barcoding):
(a) Ethanol 96% (store in a flammable liquid cabinet).
(b) Hard-shell skirted microplate (Eppendorf twin-tec 96 PCR plate, Fisher Scientific; Fig. 1).
Fig. 1. 96-Well microplate shown from below demonstrating the recommended amount of tissue to sample in each well.
(c) 12-Strip flat PCR caps (Thermo Scientific).
(d) ELIMINase (Decon Labs Inc.).
(e) 12 × 8 Cryotube holding rack for arranging tubes in plate format.
(f) Fine forceps (Dumont or equivalent, see Note 1).
(g) KimWipes (Kimberly-Clark, Inc.).
(h) Glass jars: one 4 oz for ELIMINase and three 8 oz for tool rinsing.
(i) Multichannel pipette 5–200 μl LTS or Liquidator (Rainin); see Note 2.
(j) Proteinase K: Proteinase K (20 mg/ml) in 10 mM Tris–HCl pH 7.4, 50% glycerol v/v. (Add 20 ml of water and 0.5 ml of 1 M Tris–HCl pH 7.4 to a vial with 1 g of Proteinase K, close the lid, and mix well by inverting; do not shake. Pour into a graduated cylinder, add water to 25 ml, then add 25 ml of glycerol, and mix well on a magnetic stirrer. Do not filter.)
(k) VLB buffer: 100 mM NaCl, 50 mM Tris–HCl pH 8.0, 10 mM EDTA pH 8.0, 0.5% SDS (20 ml 1 M NaCl, 10 ml 1 M Tris–HCl pH 8.0, 1 g sodium dodecyl sulfate (SDS), water to 200 ml). Store at room temperature for up to 6 months.
(l) Lysis mix: Mix 5 ml of VLB and 0.5 ml of Proteinase K in sterile container.
3. DNA extraction [routine DNA barcoding: glass fiber (GF) method]; see Note 3: (a) ELIMINase (Decon Labs Inc.). (b) Ethanol 96% (store in a flammable liquid cabinet). (c) 8-Strip flat PCR caps (Thermo Scientific). (d) GF plate: AcroPrep 96 1 ml filter plate with 1.0 mm GF media (PALL, Catalog No. 5051). (e) Matrix Impact2 pipette, 15–1,250 ml with tips (Matrix Technologies). (f ) Square-well block (PROgene Deep-Well Storage Plate 2 ml, Ultident). (g) PALL collar (SBS Receiver Plate Collar, PALL) (see Note 4). (h) Filter unit (0.2 mm CN membrane, Nalgene, Catalog No. 450-0020). (i) Hard-shell skirted microplate; also used as collection plate for DNA eluates. (j) Aluminum Sealing Film (Axygene Scientific, VWR) (to seal DNA plate). (k) Clear Sealing Film (Axygene Scientific, VWR) (used to cover GF plates during centrifugation). (l) Refrigerated centrifuge with swinging deep-well plate bucket rotor (Allegra 25R, Beckman Coulter) (see Note 5). (m) 1 M Tris–HCI pH 8.0 [26.5 g Trizma base (SigmaAldrich)], 44.4 g Trizma HCl (Sigma-Aldrich, water to 500 ml) (see Note 6 for all stock solutions and buffers). (n) 1 M Tris–HCI pH 7.4 (9.7 g Trizma base, 66.1 g Trizma HCl, water to 500 ml). (o) 0.1 M Tris–HCI pH 6.4 (6.06 g Trizma base, water to 500 ml); dissolve in smaller volume than 500 ml, adjust pH with HCl to 6.4–6.5, and then add water to attain the final volume. (p) 1 M NaCl (29.22 g NaCl, water to 500 ml). (q) 1 N NaOH (20 g NaOH, water to 500 ml). (r) 0.5 M EDTA pH 8.0 (186.1 g EDTA, ~20.0 g NaOH, water to 500 ml). Vigorously mix on magnetic stirrer with heater. Disodium salt of EDTA does not dissolve until pH is adjusted to ~8.0 with NaOH. Briefly rinse NaOH granules with ddH2O in a separate glass before dissolving. Adjust pH to 8.0 with 1 N NaOH, before bringing to the final volume. (s) BB buffer: 6 M guanidinium thiocyanate (GuSCN), 20 mM EDTA pH 8.0, 10 mM Tris–HCl pH 6.4, 4%
158
N.V. Ivanova et al.
Triton X-100 (354.6 g GuSCN, 20 ml 0.5 M EDTA pH 8.0, 50 ml 0.1 M Tris–HCl pH 6.4, 20 ml Triton X-100, water to 500 ml). Store at room temperature in the dark for up to 2 months. Weigh dry components first, then add required volumes of stock solutions, followed by a small volume of water; dissolve on a warm water bath while constantly stirring; add water to attain the final volume. Filter while solution is still warm. Although the filtering is optional, it helps to minimize crystallization. If crystallization occurs, heat to 56°C before use to redissolve any salts (caution: GuSCN is an irritant; see Note 7).
(t) WB buffer: 60% Ethanol, 50 mM NaCl, 10 mM Tris–HCl pH 7.4, 0.5 mM EDTA pH 8.0 (600 ml ethanol 96%, 23.75 ml 1 M NaCl, 9.5 ml 1 M Tris–HCl pH 7.4, 0.950 ml 0.5 M EDTA pH 8.0, water to 950 ml; do not adjust pH); store at –20°C.
(u) BM buffer: 250 ml of BB buffer, 250 ml ethanol 96% (store at room temperature in the dark for up to 1 month).
(v) PWB buffer: 260 ml of BB buffer, 700 ml ethanol 96%, 40 ml water (store at room temperature in the dark for up to 1 month).
4. PCR (same for all pathways):
(a) 10% Trehalose: 5 g of D-(+)-Trehalose dihydrate (Sigma-Aldrich, No. 90210-50g; see Note 8), molecular grade water to 50 ml. Store in 1–2 ml aliquots at –20°C.
(b) 10× PCR Buffer, Minus Mg (Invitrogen); store at –20°C.
(c) 50 mM Magnesium Chloride (Invitrogen); store at –20°C.
(d) 10 mM Deoxynucleotide Solution Mix (Promega); store in 100 µl aliquots at –20°C.
(e) 100 µM primer stock: Dissolve desiccated primer (IDT DNA, USA) in the amount of water indicated by the manufacturer to produce the final solution of 100 µM (i.e., add the number of nmol × 10 µl of water; see Note 9); store at –20°C.
(f) 10 µM primer stock: 20 µl of 100 µM stock, 180 µl of water; store at –20°C.
(g) Platinum Taq DNA Polymerase (Invitrogen); store at –20°C.
(h) Eppendorf Mastercycler ep gradient S Thermocycler (Eppendorf).
(i) Heat sealer (Eppendorf) (used to apply heat sealing film on PCR plates prior to thermal cycling). (j) Aluminum Sealing Film (Axygene Scientific, VWR) (used to reseal DNA plate after use).
(k) Clear Sealing Film (Axygene Scientific, VWR) (used for temporary cover of PCR plates prior to use).
(l) Heat sealing Film (GE Uniseal Al Heat Seal Film, Lab Planet) (used to seal PCR plate prior to thermal cycling).
(m) Hard-shell skirted microplate (see Note 10).
(n) Swinging bucket centrifuge with microplate rotor (Thermo Scientific).
5. PCR Product Check (same for all pathways):
(a) Gel documentation system (AlphaImager, Alpha Innotech).
(b) Pre-cast agarose gels (2% E-Gel 96, Invitrogen).
(c) E-Base Integrated power supply (Invitrogen).
6. Cycle sequencing reaction (same for all pathways):
(a) 10% Trehalose: 5 g of D-(+)-Trehalose dihydrate (Sigma-Aldrich, Catalog No. T9531-100g), molecular grade water to 50 ml. Store in 1–2 ml aliquots at –20°C.
(b) 5× Sequencing Buffer (400 mM Tris–HCl pH 9.0 + 10 mM MgCl2).
(c) 100 µM primer stock: Dissolve desiccated primer (IDT DNA, USA) in the amount of water indicated by the manufacturer to produce the final solution of 100 µM (i.e., add the number of nmol × 10 µl of water; see Note 9); store at –20°C.
(d) 10 µM primer stock: 20 µl of 100 µM stock, 180 µl of water; store at –20°C.
(e) BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems).
(f) Microplate (any microplate can be used at this stage).
(g) AirClean Systems Ductless PCR Workstation (Fisher Scientific).
7. Sequencing reaction cleanup (same for all pathways):
(a) Sephadex G50 (Sigma).
(b) Acroprep 96 Filter plate, 0.45 µm GHP (PALL Corporation Catalog No. 5030).
(c) Pop-7 Polymer for 3730xl DNA Analyzers (Applied Biosystems).
(d) 3730xl DNA Analyzer Capillary Array, 50 cm (Applied Biosystems).
(e) 10× Running Buffer for 3730xl DNA Analyzers (Applied Biosystems).
(f) Collection plate: MicroAmp 96-well reaction plate (Applied Biosystems).
Fig. 2. FTA Elute blotting card with blood blots demonstrating the recommended amount of tissue to sample in each blotting circle.
(g) Hydroclave MC8 Steam Sterilizer (Barnstead International).
(h) AirClean Systems Ductless PCR Workstation (Fisher Scientific).
(i) 3730xl DNA Analyzer (Applied Biosystems).
(j) 8-Channel Matrix multichannel pipette (Matrix Impact2 pipette, 15–1,250 µl, Matrix Technologies).
8. Tissue preservation (express DNA barcoding: FTA Elute blotting cards; Fig. 2):
(a) Small cotton swabs, e.g., ear swabs for liquid tissue (see Note 11).
(b) FTA Elute blotting cards (Whatman, GE Healthcare).
9. Tissue subsampling and lysis (express DNA barcoding: FTA Elute blotting cards):
(a) Ethanol, 96%.
(b) Harris Micropunch (1.2 mm, Sigma-Aldrich).
(c) Harris self-healing cutting mat (Sigma-Aldrich), as support surface for blotting and punching.
(d) Microplate (Eppendorf twin-tec PCR plate, Fisher Scientific).
10. DNA extraction (express DNA barcoding: FTA Elute blotting cards):
(a) Swinging bucket centrifuge with microplate rotor (Thermo Scientific).
11. PCR (express DNA barcoding): materials listed under item 4.
12. PCR product check (express DNA barcoding): materials listed under item 5.
13. Cycle sequencing reaction (express DNA barcoding): materials listed under item 6.
14. Sequencing reaction cleanup (express DNA barcoding): materials listed under item 7.
15. Tissue preservation (forensic DNA barcoding): materials listed under item 1.
16. Tissue subsampling and alkaline lysis (forensic DNA barcoding): materials listed under item 2.
17. DNA extraction (forensic DNA barcoding: alkaline lysis); see Note 3:
(a) AL buffer: 0.1 N NaOH, 0.3 mM EDTA; pH 13.0 (5 ml 1 N NaOH, 30 µl 0.5 M EDTA pH 8.0). Store in small aliquots at –20°C.
(b) NT buffer: 0.1 M Tris–HCl pH 7.0 (6.06 g Trizma base, water to 500 ml). Adjust pH with HCl to 7.0; add water to the final volume. Store in small aliquots at –20°C.
18. PCR (forensic DNA barcoding): materials listed under item 4.
19. PCR product check (forensic DNA barcoding): materials listed under item 5.
20. Cycle sequencing reaction (forensic DNA barcoding): materials listed under item 6.
21. Sequencing reaction cleanup (forensic DNA barcoding): materials listed under item 7.
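The recipes in this section follow the usual dilution rule C1 × V1 = C2 × V2. The short Python sketch below is not part of the protocol; it is a minimal check, under the assumption that stocks are exactly at their nominal concentrations, of a few recipes listed above (VLB buffer, BB buffer, and the working primer stock). The function names and the 25 nmol primer amount are hypothetical and chosen only for illustration.

```python
def final_concentration(stock_conc, stock_volume, final_volume):
    """Dilution rule C1*V1 = C2*V2, solved for C2."""
    return stock_conc * stock_volume / final_volume


def stock_volume_needed(stock_conc, target_conc, final_volume):
    """Dilution rule C1*V1 = C2*V2, solved for V1."""
    return target_conc * final_volume / stock_conc


def primer_resuspension_volume_ul(primer_nmol, target_um=100.0):
    """Water (in µl) that brings a dry primer to target_um (in µM)."""
    # nmol divided by µM gives ml, so multiply by 1,000 to obtain µl.
    return primer_nmol / target_um * 1000.0


# VLB buffer: 1 M (1,000 mM) stocks diluted to a final volume of 200 ml.
vlb_nacl_mm = final_concentration(1000.0, 20.0, 200.0)
vlb_tris_mm = final_concentration(1000.0, 10.0, 200.0)
# BB buffer: 20 ml of 0.5 M (500 mM) EDTA in a final volume of 500 ml.
bb_edta_mm = final_concentration(500.0, 20.0, 500.0)
# Working primer stock: 20 µl of 100 µM stock in 200 µl total.
primer_working_um = final_concentration(100.0, 20.0, 200.0)
# Hypothetical 25 nmol primer tube (the amount is an assumption).
water_ul = primer_resuspension_volume_ul(25.0)

print(vlb_nacl_mm, vlb_tris_mm, bb_edta_mm, primer_working_um, water_ul)
```

With the default 100 µM target, `primer_resuspension_volume_ul` reduces to the "nmol × 10 µl" rule given for the primer stock. The same helper functions can be used to recheck any other recipe before a buffer is prepared.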
3. Methods
1. Tissue sampling and preservation (routine DNA barcoding):
(a) Dissect and remove piece of skeletal muscle (see Note 12) with clean scissors/forceps.
(b) Fill cryotube with 96% ethanol (see Note 13) and label the tube with individual specimen number (see Note 14) on the outside (ethanol-resistant ink or pencil) or on paper (use ethanol-resistant marker/pencil and acid-free paper). Ensure that at least a 10:1 ratio is maintained between fixative and tissue; e.g., the volume of tissue placed in a 2 ml cryotube should not exceed 5 × 5 × 5 mm.
(c) Mince tissue in the tube thoroughly with scissors to allow fixative penetration; place label inside the tube (if applicable).
(d) Between samples, remove visible tissue particles from tool tips, sterilize tool tips with detergent (e.g., ELIMINase), and rinse with several changes of water to remove detergent residue (see Note 15).
(e) Replace fixative after 2–10 days of storage (see Note 16).
(f) Store tissues at low temperatures (ideally, below 0°C) and away from light (see Note 17).
2. Subsampling and lysis (routine DNA barcoding; see Note 18):
(a) Prefill a 96-well microplate with 30 µl (or one drop) of 96% ethanol per well. Cover plate with 12-cap strips. Do not use ethanol if the samples were previously fixed in another medium (see Note 16).
(b) Assemble specimens and prepare a map of sample locations based on the alphanumeric row-and-column grid of the 96-well plate (see Note 19).
(c) Using forceps, sample ca. 1 mm3 of tissue into each well of the plate. While working, keep only one row uncapped at a time to minimize the chance of error and cross-contamination.
(d) Between samples, sterilize forceps by wiping with KimWipe, washing in ELIMINase, and rinsing with three changes of distilled water.
(e) Seal each row firmly with cap strips.
(f) Prior to lysis, centrifuge plate for 1 min at 1,000 × g to remove ethanol and tissue from cap strips.
(g) Remove cap strips and evaporate residual ethanol at 56°C (secure plate against workbench surface and remove caps slowly to prevent spraying of well contents).
(h) Add 50 µl of Lysis Mix to each well and reseal with sterile 8-cap strips.
(i) Incubate at 56°C for >12 h to allow digestion (see Notes 20 and 21).
3. DNA extraction: GF protocol for routine barcoding (see Note 22): This method utilizes a bind–wash–elute approach commonly used in commercial kits for DNA extraction, but delivers similar or even better results at a fraction of the cost. The DNA binds to a GF membrane in the presence of chaotropic salts, while contaminants are washed away using two different wash buffers.
After two wash stages and drying, DNA is eluted from membrane using molecular grade water or low-salt buffer (see Note 23).
(a) Centrifuge plates with lysed samples [from step 2(i)] at 1,500 × g for 1 min to remove any condensate from cap strips.
(b) Open cap strips (secure plate against workbench surface and remove caps slowly to prevent spraying of lysate).
(c) Add 100 µl of BM buffer to each sample using multichannel pipette or Liquidator (see Note 2).
(d) Mix lysate by gently pipetting; transfer lysate (about 150 µl) from microplate to corresponding wells of GF plate resting on square-well block. Seal GF plate with clear film.
(e) Centrifuge at 5,000 × g for 5 min to bind DNA to GF membrane.
(f) Perform the first wash step: Add 180 µl of PWB buffer to each GF plate well. Seal with new cover and centrifuge at 5,000 × g for 2 min.
(g) Perform the second wash step: Add 750 µl of WB buffer to each GF plate well. Seal with new clear film and centrifuge at 5,000 × g for 5 min.
(h) Remove clear film and place GF plate over collection microplate. Incubate at 56°C for 30 min to evaporate residual ethanol.
(i) Position PALL collar on collection microplate and place GF plate on top. Dispense 50–60 µl of ddH2O (prewarmed to 56°C) directly on each GF plate well membrane to elute DNA (see Note 23) and incubate at room temperature for 1 min. Seal GF plate with clear film.
(j) Place plate assembly on clean square-well block to prevent collection plate cracking; centrifuge at 5,000 × g for 5 min to collect eluted DNA.
(k) Remove GF plate and discard.
(l) Cover DNA plate with strip cap or aluminum PCR foil.
(m) Use 1–2 µl of DNA for PCR. DNA can be stored temporarily at 4°C or at –20°C for long-term storage.
(n) To reuse 2 ml square-well blocks, wash them with hot water and ELIMINase and rinse with deionized water. Dry and expose to UV light for >15 min prior to use.
4. PCR (routine DNA barcoding): M13-tailed primer cocktails are more effective than conventional degenerate primers, allowing barcode work on taxonomically diverse samples to be performed in a high throughput fashion.
(a) Select primer cocktails, depending on application (Table 1); refer to volume proportions indicated in the column “Ratio in cocktail” of Table 1 (see also Note 24). (b) Prepare PCR reagent mix using volumes listed in Table 2.
Table 1 Mammalian primer sets for various applications (RB routine DNA barcoding and reference library, EB express DNA barcoding and ecological surveys, FB forensic DNA barcoding and museum samples)

Primer name | Primer sequence | M13 seq primer | Ratio in cocktail | Application | Thermocycler program | Product length (bp) [a] | Reference

Mammal Cocktail [C_VF1LFt1 + C_VR1LRt1]
LepF1_t1 | TGTAAAACGACGGCCAGTATTCAACCAATCATAAAGATATTGG | M13F | 1 | RB, EB | MammCOI | 658 | (46)
VF1_t1 | TGTAAAACGACGGCCAGTTCTCAACCAACCACAAAGACATTGG | M13F | 1 | RB, EB | MammCOI | 658 | (46)
VF1d_t1 | TGTAAAACGACGGCCAGTTCTCAACCAACCACAARGAYATYGG | M13F | 1 | RB, EB | MammCOI | 658 | (46)
VF1i_t1 | TGTAAAACGACGGCCAGTTCTCAACCAACCAIAAIGAIATIGG | M13F | 3 | RB, EB | MammCOI | 658 | (46)
LepR1_t1 | CAGGAAACAGCTATGACTAAACTTCTGGATGTCCAAAAAATCA | M13R | 1 | RB, EB | MammCOI | 658 | (46)
VR1d_t1 | CAGGAAACAGCTATGACTAGACTTCTGGGTGGCCRAARAAYCA | M13R | 1 | RB, EB | MammCOI | 658 | (46)
VR1_t1 | CAGGAAACAGCTATGACTAGACTTCTGGGTGGCCAAAGAATCA | M13R | 1 | RB, EB | MammCOI | 658 | (46)
VR1i_t1 | CAGGAAACAGCTATGACTAGACTTCTGGGTGICCIAAIAAICA | M13R | 3 | RB, EB | MammCOI | 658 | (46)

Fish Cocktail [C_FishF1t1 + C_FishR1t1]
FishF2_t1 | TGTAAAACGACGGCCAGTCGACTAATCATAAAGATATCGGCAC | M13F | 1 | RB, EB | Fish52 | 652 | (46)
VF2_t1 | TGTAAAACGACGGCCAGTCAACCAACCACAAAGACATTGGCAC | M13F | 1 | RB, EB | Fish52 | 652 | (46)
FishR2_t1 | CAGGAAACAGCTATGACACTTCAGGGTGACCGAAGAATCAGAA | M13R | 1 | RB, EB | Fish52 | 652 | (46)
FR1d_t1 | CAGGAAACAGCTATGACACCTCAGGGTGTCCGAARAAYCARAA | M13R | 1 | RB, EB | Fish52 | 652 | (46)

Folmer degenerate [dgLCO-1490 + dgHCO-2198]
dgLCO-1490 | GGTCAACAAATCATAAAGAYATYGG | N/A | N/A | RB, EB | COIfast | 658 | (47)
dgHCO-2198 | TAAACTTCAGGGTGACCAAARAAYCA | N/A | N/A | RB, EB | COIfast | 658 | (47)

Forward primers for express and forensic barcoding
RonM_t1 | TGTAAAACGACGGCCAGTGGMGCMCCMGATATRGCATTCCC | M13F | N/A | FB, EB | MammCOI54 | 421 | (12)
AquaF2 | ATCACRACCATCATYAAYATRAARCC | N/A | N/A | FB, EB | ExpressCOI | 187 | This study

Rat primers [BatL5310 + R6036R]
BatL5310 | CCTACTCRGCCATTTTACCTATG | N/A | N/A | RB, EB | RatCOI | 702 | (48)
R6036R | ACTTCTGGGTGTCCAAAGAATCA | N/A | N/A | RB, EB | RatCOI | 702 | (48)

Primers for forensic and express barcoding of bats
VF1 [b] | TTCTCAACCAACCACAAAGACATTGG | N/A | N/A | RB, EB, FB | ExpressCOI | – | (31)
BC1R | GCATGAGCTGTTACGATTACG | N/A | N/A | FB, EB | ExpressCOI | – | This study
BC2F | CTGGGGCCCTATTAGGTGAT | N/A | N/A | FB, EB | ExpressCOI | – | This study
BC2R | TATATCGGGTGCGCCAATTA | N/A | N/A | FB, EB | ExpressCOI | – | This study
BC3F | GGGGGATTCGGTAATTGATT | N/A | N/A | FB, EB | ExpressCOI | – | This study
BC3R | GGCTAGTGGTGGGTATACGG | N/A | N/A | FB, EB | ExpressCOI | – | This study
BC4F | TACAGTTGAAGCTGGCGTTG | N/A | N/A | FB, EB | ExpressCOI | – | This study
BC4R | AGCTCCAAGGATTGACGAAA | N/A | N/A | FB, EB | ExpressCOI | – | This study
BC5F | TCTCTCTTCACCTAGCCGGA | N/A | N/A | FB, EB | ExpressCOI | – | This study
BC5R | GGCAGTGATTAAAACGGATCA | N/A | N/A | FB, EB | ExpressCOI | – | This study
BC6F | GCCCTCTCTCAATATCAAACAC | N/A | N/A | FB, EB | ExpressCOI | – | This study
VR1 [b] | TAGACTTCTGGGTGGCCAAAGAATCA | N/A | N/A | RB, EB, FB | ExpressCOI | – | (49)
Product lengths (bp) [a] listed for the bat primer set: 161, 93, 96, 134, 106, 117

Sequencing primers
M13F (-21) [c] | TGTAAAACGACGGCCAGT | N/A | N/A | – | Seq3.1 | – | (50)
M13R (-27) [c] | CAGGAAACAGCTATGAC | N/A | N/A | – | Seq3.1 | – | (50)

M13-tailed primer cocktails are more effective than conventional degenerate primers, allowing barcode work on taxonomically diverse samples to be performed in a high throughput fashion
[a] Product lengths are given without primers
[b] VF1/VR1 are universal vertebrate primers that can be used for some mammal species as a stand-alone primer pair; their tailed modification is used in the M13-tailed mammal cocktail
[c] M13F/M13R are used for sequencing products generated with M13-tailed primers and their cocktails. In all other sequencing reactions, use the same primers as for the PCR reaction
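The "Ratio in cocktail" column of Table 1 can be turned into pipetting volumes with simple proportional arithmetic. The Python sketch below is an illustration only. It assumes that all primers are at the same working concentration and that the M13F-tailed primers are pooled into the forward cocktail (C_VF1LFt1) and the M13R-tailed primers into the reverse cocktail (C_VR1LRt1); the function name and the 600 µl batch size are hypothetical.

```python
def cocktail_volumes(ratios, total_volume_ul):
    """Split total_volume_ul among primers in proportion to their ratio parts."""
    total_parts = sum(ratios.values())
    return {
        primer: total_volume_ul * parts / total_parts
        for primer, parts in ratios.items()
    }


# Ratios copied from Table 1 (mammal cocktail); the grouping into forward
# and reverse pools by M13 tail is an assumption of this sketch.
MAMMAL_FORWARD = {"LepF1_t1": 1, "VF1_t1": 1, "VF1d_t1": 1, "VF1i_t1": 3}
MAMMAL_REVERSE = {"LepR1_t1": 1, "VR1d_t1": 1, "VR1_t1": 1, "VR1i_t1": 3}

# Hypothetical batch size of 600 µl per cocktail.
forward_mix = cocktail_volumes(MAMMAL_FORWARD, 600.0)
reverse_mix = cocktail_volumes(MAMMAL_REVERSE, 600.0)

for primer, volume in {**forward_mix, **reverse_mix}.items():
    print(f"{primer}: {volume:.1f} µl")
```

For the fish cocktail, where every ratio is 1, the same function simply divides the batch into equal parts.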
Table 2 PCR reagent mix used in mammal DNA barcoding

Reagent | Each well (µl), regular PCR | 96-Well plate (µl), regular PCR | Each well (µl), FTA Elute disks | 96-Well plate (µl), FTA Elute disks
10% Trehalose | 6.25 | 625.0 | 6.25 | 625.0
ddH2O | 2.00 | 200 | 4.00 | 400
10× Buffer | 1.25 | 125.0 | 1.25 | 125.0
50 mM MgCl2 | 0.625 | 62.5 | 0.625 | 62.5
10 µM primer A | 0.125 | 12.5 | 0.125 | 12.5
10 µM primer B | 0.125 | 12.5 | 0.125 | 12.5
10 mM dNTPs | 0.0625 | 6.25 | 0.0625 | 6.25
Platinum Taq (5 U/µl) | 0.060 | 6.0 | 0.060 | 6.0
Total | 10.5 | 1,050 | 12.5 | 1,250
1/8 Aliquot | – | 130 | – | 155
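The plate columns of Table 2 equal the per-well volumes multiplied by 100, which leaves a small excess over the 96 wells actually filled. The Python sketch below reproduces that scaling so the mix can be recalculated for a different number of reactions. It is a minimal sketch rather than part of the protocol; the dictionary keys, the function name, and the choice of 100 as the multiplier are taken from or inferred from Table 2, and any other reaction count is the user's assumption.

```python
# Per-well volumes (µl) copied from Table 2, regular PCR column.
REGULAR_PCR_UL = {
    "10% trehalose": 6.25,
    "ddH2O": 2.00,
    "10x buffer": 1.25,
    "50 mM MgCl2": 0.625,
    "primer A": 0.125,
    "primer B": 0.125,
    "10 mM dNTPs": 0.0625,
    "Platinum Taq": 0.060,
}
# The FTA Elute disk recipe differs only in the water volume.
FTA_DISK_UL = dict(REGULAR_PCR_UL, ddH2O=4.00)


def scale_mix(per_well_ul, n_reactions):
    """Multiply every per-well volume by the number of reactions."""
    return {
        reagent: round(volume * n_reactions, 3)
        for reagent, volume in per_well_ul.items()
    }


# Table 2 uses 100 reactions' worth of mix for one 96-well plate.
plate_mix = scale_mix(REGULAR_PCR_UL, 100)
fta_plate_mix = scale_mix(FTA_DISK_UL, 100)

for reagent, volume in plate_mix.items():
    print(f"{reagent}: {volume} µl")
print("total per well:", round(sum(REGULAR_PCR_UL.values()), 1), "µl")
```

Calling `scale_mix(REGULAR_PCR_UL, 50)` would give a half-plate batch with the same proportional excess.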
(c) Defrost reagents listed in Table 2 (except Taq polymerase) at room temperature; briefly vortex and spin down in a mini-centrifuge. Keep Taq polymerase at –20°C until immediately before use (see Notes 25 and 26); prior to use, spin the tube with Taq polymerase for 10–15 s in a mini-centrifuge. Do not mix Taq polymerase by vortexing or pipetting.
(d) Add reagents to a 2 ml tube in volumes listed for “96-well plate” (Table 2).
(e) Mix vigorously with vortex or pipette (vortexing traps liquid under the cap of tube, necessitating a subsequent 15-s spin in a mini-centrifuge).
(f) Dispense 10.5 µl of PCR mix into each well of the 96-well plate (the same pipette tip can be used for all transfers).
(g) Add 1–2 µl of DNA template to each well (use new filter pipette tip for each sample).
5. PCR thermocycling (same for all pathways):
(a) Place aluminum foil or heat-seal cover over the top of 96-well plate and centrifuge for 1 min at 1,000 × g.
(b) Place the plate in a thermocycler block, close the lid, and run thermocycling program appropriate for the primer cocktail employed (Table 1). Apply thermal cycling conditions from Table 3.
(c) The resulting PCR plate can be stored for several weeks at 4°C or several months at –20°C until ready for sequencing.
Table 3 PCR programs and thermal cycling parameters for different primer sets used in mammal DNA barcoding

PCR program | Primer combinations | Thermal cycling conditions
MammCOI | Mammal Cocktail (C_VF1LFt1 + C_VR1LRt1) | 94°C for 2 min; 5 cycles (94°C for 30 s, 50°C for 40 s, and 72°C for 1 min); 35 additional cycles (94°C for 30 s, 55°C for 40 s, and 72°C for 1 min); final extension at 72°C for 10 min; then hold indefinitely at 10°C
Fish52 | Fish Cocktail (C_FishF1t1 + C_FishR1t1) | 94°C for 2 min; 40 cycles (94°C for 30 s, 52°C for 40 s, and 72°C for 1 min); final extension at 72°C for 10 min; then hold indefinitely at 10°C
MammCOI54 | 421 bp Fragment (RonM_t1 + C_VR1LRt1) | 94°C for 2 min; 40 cycles (94°C for 30 s, 54°C for 40 s, and 72°C for 1 min); final extension at 72°C for 10 min; then hold indefinitely at 10°C
MammCOI54 | 419 bp Fragment (RonM_t1 + C_FishR1t1) | Same as above, but use 52°C for annealing
RatCOI | Rat primers (34) (BatL5310 + R6036R) | 94°C for 2 min; 35 cycles (94°C for 30 s, 60°C for 30 s, and 72°C for 1 min); final extension at 72°C for 5 min; then hold indefinitely at 10°C
COIfast | Folmer degenerate primers (dgLCO-1490 + dgHCO-2198) | 94°C for 2 min; 5 cycles (94°C for 30 s, 45°C for 40 s, and 72°C for 1 min); 35 additional cycles (94°C for 30 s, 51°C for 40 s, and 72°C for 1 min); final extension at 72°C for 10 min; then hold indefinitely at 10°C
ExpressCOI | ~200 bp Fragment (AquaF2 + C_VR1LRt1) or (AquaF2 + C_FishR1t1) | 94°C for 2 min; 40 cycles (94°C for 30 s, 51°C for 40 s, and 72°C for 30 s); final extension at 72°C for 10 min; then hold indefinitely at 10°C
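When planning instrument time, it can help to write the programs of Table 3 as data and add up the programmed hold times. The Python sketch below does this for four of the programs. It is an illustration under stated assumptions: the data layout and function names are hypothetical, and the totals cover programmed holds only, so actual run times will be longer because block ramping between temperatures is not included.

```python
# Each program is a list of stages: (repeats, [(temperature_C, seconds), ...]).
# Values are transcribed from Table 3; the final 10°C hold is omitted.
PROGRAMS = {
    "MammCOI": [
        (1, [(94, 120)]),
        (5, [(94, 30), (50, 40), (72, 60)]),
        (35, [(94, 30), (55, 40), (72, 60)]),
        (1, [(72, 600)]),
    ],
    "Fish52": [
        (1, [(94, 120)]),
        (40, [(94, 30), (52, 40), (72, 60)]),
        (1, [(72, 600)]),
    ],
    "RatCOI": [
        (1, [(94, 120)]),
        (35, [(94, 30), (60, 30), (72, 60)]),
        (1, [(72, 300)]),
    ],
    "ExpressCOI": [
        (1, [(94, 120)]),
        (40, [(94, 30), (51, 40), (72, 30)]),
        (1, [(72, 600)]),
    ],
}


def programmed_seconds(stages):
    """Sum of all programmed hold times, ignoring ramping between steps."""
    return sum(
        repeats * sum(seconds for _, seconds in steps)
        for repeats, steps in stages
    )


def total_cycles(stages):
    """Number of amplification cycles (stages with more than one step)."""
    return sum(repeats for repeats, steps in stages if len(steps) > 1)


for name, stages in PROGRAMS.items():
    minutes = programmed_seconds(stages) / 60
    print(f"{name}: {total_cycles(stages)} cycles, {minutes:.1f} min of holds")
```

The same structure can hold the remaining programs (MammCOI54, COIfast) by copying their rows from Table 3.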
6. PCR product check (E-gel, same for all pathways; see Note 27):
(a) Preset program EG on Mother E-Base; set the run time to 4 min.
(b) Remove gel from package and detach plastic comb. Slide gel into electrode connections on Mother E-Base (caution: gel contains ethidium bromide; see Note 27).
(c) Dispense 12 µl of ddH2O into each E-gel well.
(d) Load 4 µl of PCR product into each corresponding E-gel well.
(e) Begin electrophoresis by briefly pressing “pwr/prg” button. Red indicator light should change to green during the run. End of run is signaled by flashing red light and sound alarm; press and release the “pwr/prg” button when finished.
Fig. 3. E-gel image of PCR products generated from an FTA Elute card.
(f) Remove gel from base and capture a digital image (we use Alpha Imager documentation system; Fig. 3). PCR products can be visualized under any UV-emitting source.
(g) If success rate is over 75%, all samples from the PCR plate should be sequenced. Success rates below 75% require hit-picking of amplified products and failure tracking (amplification of failed samples with alternative primer sets) of the remaining samples (see Note 28). Hit-picked PCR products obtained with different M13-tailed primers (e.g., mammal and fish cocktails) can be combined in the same plate for sequencing; ensure that the resulting position of rearranged samples is plotted in the new plate map.
7. Cycle sequencing (same for all pathways):
(a) Defrost reagents (Table 4) at room temperature. BigDye is light sensitive and should not be exposed to light for more than a few minutes. Do not vortex undiluted BigDye stock; gently mix by pipetting instead.
(b) Prepare “forward” and “reverse” sequencing mixes: Combine forward primer with reagents from Table 4 in a single 2 ml tube in proportions listed under “96-well plate”; repeat the procedure for reverse primer. Mark tubes accordingly.
(c) Mix contents of each tube gently but thoroughly by pipetting or vortexing for about 5–10 s (vortexing traps liquid under the cap of tube, necessitating a subsequent 15-s spin in a mini-centrifuge).
Table 4 Cycle sequencing reaction recipe used in mammal DNA barcoding

Reagents | Each well (µl) | 96-Well plate (µl)
Primer | 1 | 104
2.5× SEQ Buffer | 1.875 | 195
BigDye | 0.25 | 26
ddH2O | 0.875 | 91
10% Trehalose | 5 | 520
Subtotal | 9 | 936
DNA template | 1–1.5 | N/A
Total | 10–10.5 | N/A
1/8 Aliquot | 115 | N/A
(d) Assemble “forward” and “reverse” sequencing plates: Dispense 9.0 µl of forward-sequencing mix into each well of an empty 96-well plate; repeat the procedure for reverse mix. Use a new set of pipette tips to transfer forward- and reverse-sequencing mixes (tips can be reused between transfers within each plate). Mark plates accordingly.
(e) Add 1–1.5 µl of nonpurified PCR product (see Notes 29 and 30) from the PCR plate [step 6(g)] to each well of the “forward” and “reverse” sequencing plates (use new pipette tip for each well).
(f) Place aluminum foil or heat-seal cover over sequencing plates and centrifuge at 1,000 × g for 1 min to collect sequence cocktail and PCR template at the bottom of the wells.
(g) Place each sequencing plate in a thermocycler block and start program Seq3.1: initial denaturation at 96°C for 2 min; followed by 30 cycles at 96°C for 30 s, annealing at 55°C for 15 s, and extension at 60°C for 4 min; followed by indefinite hold at 4°C.
(h) Store processed sequencing plates for up to 1 week at 4°C in a dark enclosure to avoid degradation of light-sensitive sequencing products (see Note 31).
8. Sequencing cleanup (same for all pathways; see Note 32):
(a) Measure dry Sephadex G50 with black column loader into 350 µl PALL filter plate. Prepare two plates to balance
each other or prepare a single plate and appropriate counterbalance.
(b) Add 300 µl molecular grade water to plate wells prefilled with Sephadex powder; leave mixture to hydrate overnight at 4°C or for 3–4 h at room temperature before use.
(c) Join Sephadex plate with MicroAmp collection plate to catch water flow-through and secure with two rubber bands.
(d) Balance two sets of plates for centrifuging; use additional rubber bands as needed.
(e) Centrifuge at 750 × g for 3 min to remove excess water and generate a packed Sephadex column.
(f) Add the entire sequencing reaction to the center of Sephadex columns by pipette. Do not insert the pipette tip into the column; dispense liquid onto the upper surface of the column. Damaging the column with a pipette tip or dispensing solution onto the side of the column may result in incomplete removal of unincorporated BigDye terminators, which will adversely affect sequencing results.
(g) Add 25 µl of 0.1 mM EDTA to each well of a new MicroAmp collection plate (see Note 33).
(h) Elute clean sequencing reaction by attaching the collection plate containing EDTA to the bottom of Sephadex plate and securing with rubber bands.
(i) Balance two sets of plates for centrifuging; use additional rubber bands as needed.
(j) Centrifuge at 750 × g for 3 min and remove Sephadex plate.
(k) Cover the top of collection plate with septum.
(l) Place collection plate into black plate base of capillary sequencer and attach white plate retainer.
(m) Stack assembled plates into ABI 3730xl capillary sequencer and import plate records using Plate Manager module of the Data Collection software (Applied Biosystems).
(n) Begin sequencing run within Run Scheduler.
9. Tissue sampling and preservation (express DNA barcoding: FTA Elute blotting card; see Note 34):
(a) Dissect and remove a 3–4-mm3 piece of skeletal muscle with clean scissors/forceps. If sampling from live animal (see Note 11), collect ca. 10–20 µl of blood on a cotton swab.
(b) Dab muscle or cotton swab against blotting circle of the FTA Elute card. Do not oversample (see Note 35).
(c) If employing reusable tools, sterilize them between samples as per step 1(d); if using swabs, discard them after each sample.
(d) Map the position of samples on the recording portion of the card (see Note 36).
(e) Leave the card to air dry with the blotting portion opened (see Note 37). Do not expose to direct sunlight.
(f) Store the card away from light at room temperature in a dry environment or in a sealed bag with desiccant (e.g., silica gel; see Note 38).
10. Tissue lysis (express DNA barcoding: FTA Elute blotting card):
(a) Wipe Harris mat with ethanol; open FTA Elute card and slide mat under blotting surface (filter paper portion).
(b) Punch FTA Elute card (filter portion only) using a 1.2 mm Harris Micropunch.
(c) Place one punch into each well of 96-well microplate and create sample map as above [step 2(b)].
(d) Between sampling, clean micropunch by punching clean filter paper.
11. DNA extraction (express DNA barcoding: FTA Elute blotting card; muscle blot or blood):
(a) Ensure that all supplies are ready. Protocol must be completed in timed consecutive steps (prolonged incubation in water results in DNA loss).
(b) Add 100 µl of water to each well of 96-well microplate from step 10(c) using pipette or Liquidator; seal with aluminum film. Vortex plate for 10–15 s and centrifuge immediately for 1 min at 1,000 × g.
(c) Aspirate and discard water from each well using pipette or Liquidator (make sure that disks remain in their wells after water is removed). Incubate open plate with disks at 56°C for 5 min to evaporate residual water. Do not cover the plate because disks easily stick to film or cap strips during movement. After drying, disks can be used directly in PCR reaction.
12. PCR (express DNA barcoding):
(a) Select primer cocktails for express barcoding (Table 1); refer to volume proportions indicated in the column “Ratio in cocktail” (see also Note 24).
(b) Prepare PCR reagent mix using volumes indicated for FTA Elute disks in Table 2.
(c) Follow step 4(a–d) for routine barcoding.
(d) Dispense 12.5 µl of PCR mix into each well of plate containing prewashed FTA disks. Change pipette tips after each well.
13. PCR thermocycling (express DNA barcoding): procedures listed under step 5.
14. PCR product check (express DNA barcoding): procedures listed under step 6; see also Note 27.
15. Cycle sequencing (express DNA barcoding): procedures listed under step 7.
16. Sequencing cleanup (express DNA barcoding): procedures listed under step 8; see also Note 32.
17. Tissue sampling and preservation (forensic DNA barcoding; if applicable): procedures listed under step 1.
18. Tissue subsampling and lysis (forensic DNA barcoding: alkaline lysis protocol):
(a) Follow subsampling step 2(a–g), if applicable; when subsampling for immediate lysis, prefill plate with AL buffer instead of ethanol in step 2(a).
19. DNA extraction (forensic DNA barcoding: alkaline lysis protocol):
(a) Add 35 µl of AL buffer to ~0.5–1 mm3 of fresh skeletal muscle or hairs with follicles in tubes or 96-well microplate.
(b) Incubate for 5 min in a thermocycler at 95°C or in freshly boiled water in a water bath.
(c) Centrifuge for 1 min to remove condensate from caps or plate cover.
(d) Add 65 µl of NT buffer and mix by pipetting.
(e) Use 1–2 µl of crude lysate for PCR reaction (label PCR plate for reference using sticker or alcohol-resistant marker). Store at –20°C for up to 2 weeks (see Note 39).
20. PCR (forensic DNA barcoding):
(a) Select primer cocktails for forensic barcoding (Table 1); refer to volume proportions indicated in the column “Ratio in cocktail” (see also Note 24).
(b) Prepare PCR reagent mix using volumes indicated for regular PCR in Table 2.
(c) Follow step 4(a–g) for routine barcoding.
21. PCR thermocycling (forensic DNA barcoding): procedures listed under step 5.
22. PCR product check (forensic DNA barcoding): procedures listed under step 6; see also Note 27.
23. Cycle sequencing (forensic DNA barcoding)—procedures listed under step 7. 24. Sequencing cleanup (forensic DNA barcoding)—procedures listed under step 8; see also Note 32.
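The plate map of step 2(b) and the 75% decision rule of step 6(g) lend themselves to a small bookkeeping script. The Python sketch below is one possible way to record gel results per well, decide between sequencing the whole plate and hit-picking, and keep the new plate map for rearranged samples. It is a sketch under assumptions, not the authors' software: the function names, the row-major fill order of the hit-picked plate, and the treatment of a rate of exactly 75% as hit-picking are all choices made here for illustration.

```python
ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)


def plate_wells():
    """Well names of a 96-well plate in row-major order: A01 ... H12."""
    return [f"{row}{col:02d}" for row in ROWS for col in COLUMNS]


def plan_sequencing(gel_results, threshold=0.75):
    """Apply the 75% rule of step 6(g) to {well: amplified?} gel scores."""
    wells = [w for w in plate_wells() if w in gel_results]
    hits = [w for w in wells if gel_results[w]]
    failures = [w for w in wells if not gel_results[w]]
    rate = len(hits) / len(wells)
    if rate > threshold:
        # Over 75%: the whole PCR plate goes to sequencing.
        return {"rate": rate, "action": "sequence_all",
                "sequence": wells, "failure_tracking": [], "new_map": {}}
    # Otherwise: hit-pick amplified wells into a new plate and record
    # where each one lands; failed wells go to failure tracking.
    new_map = dict(zip(plate_wells(), hits))
    return {"rate": rate, "action": "hit_pick",
            "sequence": hits, "failure_tracking": failures, "new_map": new_map}


# Hypothetical gel scores: every third well failed to amplify.
scores = {well: (i % 3 != 0) for i, well in enumerate(plate_wells())}
plan = plan_sequencing(scores)
print(plan["action"], f"{plan['rate']:.0%}", len(plan["failure_tracking"]))
```

The `new_map` dictionary pairs each destination well with the source well it was picked from, which is the information step 6(g) asks to be plotted in the new plate map.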
4. Notes
1. Do not use forceps with cross-hatched or serrated tips due to the difficulty of removing tissue remains during sterilization.
2. There are multiple manufacturers of single- and multichannel pipettes; however, pipetting accuracy and design features may vary. The most important feature for a multichannel pipette is the uniformity of tip loading and unloading. Pipettes should be made from durable material; if O-rings are used to seal tips, frequently check their integrity and have spare O-rings available for replacement. In a high throughput facility, Liquidator 96 from Rainin can be utilized for convenient and fast liquid transfers; this device offers manual pipetting of 96 samples at a time. Although the recommended volume range of the Liquidator is 5–200 µl, it can still be used for volumes of 1.5–2 µl.
3. All chemicals used for tissue lysis and DNA extraction should be of molecular biology grade or equivalent, of the highest purity. Wash all labware with ELIMINase and rinse with distilled or deionized water. Weigh reagents using a clean spatula. Use molecular grade doubly distilled (dd) H2O in all buffer formulations. Millipore MilliQ purified water (18 MΩ) can be used for lysis and DNA extraction buffers, while commercially manufactured molecular grade water is preferred for PCR and sequencing. Do not use DEPC-treated water, as it may inhibit PCR and cycle sequencing reactions. Store molecular grade water in small aliquots to prevent contamination. Filter buffers through a 0.2 µm Nalgene filter into a clean sterile bottle; prepare working aliquots of smaller volume (e.g., 100 ml). Store stock solutions and working aliquots at 4°C.
4. A PALL collar is used to secure a GF plate on top of the Microplate during the elution stage.
5. The centrifuge for DNA extraction should have deep buckets to handle filter plate assemblies and should deliver at least 5,000 × g.
6.
To prepare stock solutions and buffers, dissolve salts and other components in a smaller volume than that of the final solution and then add water to attain the final volume. 7. GuSCN does not dissolve without heating (this is an endothermic reaction). Caution: All GuSCN solutions are irritants
and toxic and should be disposed of with care. Use nitrile gloves and goggles at all times and protective mask while handling powder; avoid contact with acids to prevent release of cyanide gas.
8. Some trehalose brands contain traces of pig DNA which may contaminate reactions.
9. To minimize the risk of contamination, do not mix 100 µM stocks by pipetting. After adding water, incubate tubes for 15–30 min at room temperature, vortex for 30 s, and briefly spin tubes in a tabletop mini-centrifuge prior to opening.
10. Although non-skirted microplates can be used in PCR setup, we recommend hard-shell skirted microplates, such as Eppendorf twin-tec 96 PCR plate, to prevent spraying of PCR products while reopening cap strips. Skirted microplates are compatible with heat sealing film, which is more convenient to use in a high throughput setting. Alternatively, 8-cap strips, sealing mats, aluminum-sealing film, or PCR-grade clear film can be used to cover plates prior to thermal cycling. Prior to running actual samples, test the sealing technique by thermal cycling of a plate filled with water. Most clear sealing films do not provide a good seal.
11. Although express barcoding is instrumental in analyzing tissue taken from live mammals (17, 18), this chapter does not cover the tools or methods used in biopsy and live sampling of liquid tissue or the respective animal handling techniques; refer to specialized literature (e.g., refs. 34, 35) for details.
12. While there are numerous guidelines on mammal tissue preservation, many existing tissue collections and protocols focus on studies of chemical contaminants or other specialized tasks (e.g., allozyme analyses) and promote collecting tissue from internal organs, such as liver (e.g., ref. 36). Both liver and muscle have high mitochondria content (37); however, our experience corroborates other studies (38, 39) which indicate that internal organs (e.g., liver and kidney) are suboptimal sources of mitochondrial DNA.
The yield of full-length mitochondrial PCR products is lower or complicated by pseudogenes (numts; Fig. 4), which impede or prevent accurate sequencing. These effects appear inconsistent between samples, presumably dependent on the time between euthanasia and tissue collection, type and quality of initial fixation, and subsequent storage conditions. Liver is characterized by the early (<1 h) onset of postmortem autolysis causing ultrastructural changes in mitochondria while nuclear heterochromatin condenses and is less affected (40, 41); similar processes appear to affect cardiac muscle (42). Chromatin structure protects DNA against oxidative and other damage (43) and may provide protection against autolysis. The preferential amplification of
Fig. 4. Fragment of a sequencer chromatogram trace file showing a nuclear pseudogene at the 5′ end.
numts from some kidney, heart, and liver samples when broad-range primer cocktails are used may reflect degradation of the primary mtDNA target. Although liver, kidney, and heart remain the most commonly preserved mammal tissues, their use for high throughput amplification of mtDNA is discouraged in favor of skeletal muscle. Our data also show high success of barcode recovery from brain tissue, perhaps as a result of good protection of mtDNA against degradation (44). 13. The quality of DNA in ethanol-preserved tissues is dependent on many factors, including ethanol acidity and concentration, storage temperature, exposure to light, sample age, as well as the quality of the tissue. Ethanol is a good preservative of morphological integrity of the tissue, but it does not suppress enzymatic activity. Therefore, it is preferred that ethanol-preserved tissue is stored at low temperatures (ideally, −20°C or lower). Do not use denatured ethanol with unknown additives for tissue preservation and DNA extraction. Although 200 proof ethanol is the traditionally preferred choice for molecular protocols, reagent-grade (histological) ethanol (e.g., Fisher Scientific; Cat. No. A962-4) is also suitable for tissue fixation and all molecular protocols described in this chapter. Cryopreservation in liquid nitrogen is ideal for tissue (26), but may not be feasible in many field situations. FTA cards are convenient, but sensitive to oversampling and ambient humidity (17). When freezing or ethanol fixation is logistically impractical, alternative methods of tissue preservation for routine and express barcoding methods include (a) DMSO, a common salt solution for tissue preservation (39). (b) Laundry detergent with enzymes (e.g., Persil MegaPearls): small tissue samples should be placed in tubes/bags filled with dry detergent and stored away from moisture. Samples should be removed from medium prior to lysis, but washing is not required.
(c) Dried muscle: DNA barcodes can be successfully recovered from muscle of air-dried mammal carcasses prepared
for clearing skeletons in a dermestarium and stored at room temperature. Tissues can be shipped for analysis without additional fixation, but minute samples in microplate wells should be soaked with ethanol to prevent static displacement. This protocol can be used by small museums that prepare skin, skull, and skeleton vouchers but do not have a designated tissue collection. Biopsies (e.g., ear, skin, or hair samples) can be preserved the same way as muscle tissues; do not use analgesics, such as EMLA cream, as they may cause DNA degradation. Regular lysis protocols can be used for fresh tissue and wing membrane punches (in bats) while alkaline lysis is better for hairs. 14. If using the Barcode of Life Data Systems (BOLD; http://www.boldsystems.org) as the online workbench for barcode data management, ensure that the syntax of individual specimen/sample numbers matches the Sample ID numbers submitted to BOLD. 15. Use of flame for instrument cleaning is discouraged as it quickly degrades equipment. 16. This protocol describes the cheapest reliable solution for preserving mammal tissue. If field cryopreservation in liquid nitrogen is used (26), then ethanol fixation is not required. Ethanol-preserved tissues can be stored in the same solution indefinitely after the first change. If the fixative changes color, a follow-up replacement is recommended. Do not change from one type of fixative to another. In particular, transfer of samples from DMSO to ethanol leads to rapid DNA degradation and should be avoided. 17. Recommended storage is at –80°C; DNA preserves indefinitely under these conditions. If ultracold storage is unfeasible, samples will preserve adequately at –20°C. For ethanol, use freezers rated for storage of flammable liquids. If unavailable, ethanol can be drained after 1–2 weeks following the last fixative change. 18. An outline of standard sampling instructions for 96-well microplates can be found on the iBOL Web site: http://ibol.org/get-involved/for-scientists/ under “Sampling Instructions”. 19. Premade data templates for plate map creation can be used, such as the CCDB electronic lab book (45) or CCDB plate record available on the iBOL Web site. 20. If the plate is ready for immediate laboratory analysis (no transport), it can be prefilled with lysis buffer and Proteinase K [step 2(h)] rather than ethanol; step 2(a–g) can thus be omitted. 21. In high throughput facilities, plates with mammal tissue can be frozen after the lysis stage and stored at –20°C for up to several
weeks before extraction. Plates should be defrosted at room temperature and prewarmed for 10–15 min at 56°C to ensure that all precipitated salts are fully dissolved prior to extraction. This approach can be used with other vertebrates, but is not applicable for invertebrate samples. 22. With minor modifications, the GF DNA extraction protocol can be performed using individual spin columns (Epoch Biolabs Cat. No. 1920-250). All reagent volumes will be the same as in the plate-based protocol. This method requires a benchtop centrifuge for microtubes. Use 5,000–6,000 × g for binding and wash stages and 10,000 × g for elution. Replace collection tubes with clean ones after the first wash step to prevent overflow (2 ml tubes can be used as a replacement with the caps removed). After the final wash step, place spin column into a clean collection tube and centrifuge at 10,000 × g for 4 min to dry the membrane. Prior to drying and elution, replace collection tube again with a clean decapped 2 ml tube. After elution, transfer DNA into clean tubes with caps. 23. Low-salt buffer, e.g., 10 mM Tris–HCl, pH 8.0, can be used for DNA elution. We do not recommend using even diluted EDTA in the elution buffer as it might inhibit PCR reactions. 24. Primer cocktails are routinely mixed in 1.5 ml tubes from 10 µM primer stocks. Proportions are by volume; for example, to make forward cocktail C_VF1LFt1, mix 100 µl of LepF1_t1, 100 µl of VF1_t1, 100 µl of VF1d_t1, and 300 µl of VF1i_t1. 25. It is traditionally recommended to keep PCR reagents (dNTP, primers, and Taq polymerase) on ice; however, using ice can increase the chance of contamination and is not convenient in a high throughput setting. All reagents, except Taq polymerase, can be defrosted in a clean tube rack at room temperature. After defrosting, briefly vortex all solutions (except Taq) and spin down in the mini-centrifuge. To avoid multiple freeze–thaw cycles (more than ten), store primers and dNTPs in smaller aliquots.
To extend the shelf-life of Taq polymerase, it should be kept in the freezer or on a cooling block and added as a last component, just before returning all reagents back to the freezer. Platinum Taq is a very stable “hot-start” enzyme which is not active at room temperature, thus allowing PCR mix or premade plates to be kept at room temperature for at least 15–20 min during the reaction setup. 26. In high throughput facilities, operations can be standardized by premaking plate batches containing PCR and sequencing reagent mixes with 5% trehalose used as a cryoprotector. In order to prevent DNA contamination of batches, reagents for PCR plates should be dispensed only using DNA-free equipment in a UV-sterilized workspace. Plates should be covered with clear film and stored frozen at –20°C until use. Prior to
premaking each large batch of plates, validation is recommended both for individual reagents and for final reagent mixes. Predispensed plates should be defrosted at room temperature and centrifuged for 1 min at 1,000 × g before use. 27. To expedite confirmation of PCR recovery, we recommend using precast agarose E-gels available in 96-well format. If kept moist in original packaging, E-gels can be reused up to three times, preferably within 24 h. Observe water content inside gel wells if intended for reuse; water reloading is usually unnecessary if rerunning the gel within 1 h after the previous load. Load the follow-up round of samples in the same positions as in the previous run; read PCR yield from the top row of bands. Use E-gel Editor software (Invitrogen) to process digital gel images. Caution: E-gels are prestained with ethidium bromide and should be handled and disposed of following appropriate precautions (e.g., refer to MSDS requirements, http://fscimage.fishersci.com/msds/45442.htm, or equivalent national safety standards). 28. Hit-picking is recommended to avoid wasting sequencing reagents (especially BigDye) on failed samples. Failure tracking is a procedure for recovering barcode sequences from DNA extracts that failed to yield PCR products or sequences of adequate quality during the first round of PCR. Routinely, Mammal Cocktail (C_VF1LFt1 + C_VR1LRt1) is used for first-pass PCR and Fish Cocktail (C_FishF1t1 + C_FishR1t1) for failure tracking. Alternative forward primers (RonM_t1 or AquaF2) can be combined with either C_VR1LRt1 or C_FishR1t1 to recover short length sequences (approx. 400 or 200 bp, respectively) from degraded samples. Ensure that a new plate map is assembled to track the position of each sample following rearrangement. 29. To reduce cost and minimize the chance of contamination, we do not clean up PCR products.
The PCR master mix recipe listed above contains low concentrations of dNTPs and primers, thus allowing the normalization of the final concentration of the PCR product and minimization of interference of unincorporated PCR primers with the cycle sequencing reaction. Sequencing of unpurified products might lead to lower sequence quality for the first 20–50 bp. However, such signal degradation is of little concern for bidirectional sequencing. 30. Severe overloading of the sequencing reaction with PCR product might lead to shorter reads; therefore, minor adjustments might be required based on PCR product concentration (the optimal range for products of 600–700 bp is 25–50 ng/reaction). The amount of required product may also vary, depending on the sequencing cleanup protocol. For example, for Sephadex cleanup described in this chapter, we do not dilute PCR products, but for other cleanup systems, such as AutoDTR 96
from Edge Biosystems or ethanol precipitation, PCR product dilution is recommended. 31. Bidirectional sequencing is used in reference library applications, while unidirectional sequencing is sufficient for express barcoding and some forensic applications. 32. For large-scale operations involving liquid handling robots, we recommend performing sequence cleanup with the AutoDTR96 kit from Edge Biosystems. The protocol for robotic Edge cleanup can be downloaded from http://www.dnabarcoding.ca/CCDB_DOCS/CCDB_Sequencing.pdf. 33. 0.1 mM EDTA is a less toxic substitute for Hi-Di Formamide providing higher signal. Samples dissolved in EDTA should be processed on the sequencer within 36 h. If preparation of the sequencing reaction takes place in a separate facility or is intended for a sequencing run at a later date, then the purified sequencing reaction can be collected into an empty plate and dried in a SpeedVac concentrator. Dry purified sequencing products can be stored in the dark at –20°C for up to 2 weeks. 34. FTA Elute cards are suitable for cheek swab, blood, and fresh tissue (muscle, brain, or gonad) blots. In the latter case, a small tissue aliquot should be blotted onto the card for safe and easy shipment followed by quick processing in the molecular lab, while the bulk tissue sample can be deposited in a cryostorage. FTA Elute cards can serve as an additional tissue backup for liquid nitrogen tanks during field collection, but are not recommended as the sole form of tissue archival storage. 35. FTA Elute cards are sensitive to excessive volume of sample; large quantities of tissue prevent proper fixation and drying and lead to DNA degradation and cross-contamination. Liquid tissue (especially blood) should not saturate the filter paper in the blotting circle; refer to Fig. 2 for an example of resulting appearance. 36. This applies to the FTA Elute card with 96 blotting circles, which has a portion for recording sample numbers in a 12 × 8-grid map.
This step is optional and should be performed in addition to filling a digital plate map. 37. If the blotting card is closed before the blotting circles have dried, portions of tissue can transfer onto the cover and contaminate adjacent circles. 38. Humidity leads to extraction of preservation agents from the filter paper and their crystallization on its surface; it can also cause leakage of tissue blots between circles. 39. This extraction method is recommended only if DNA extracts are not intended for long-term storage, e.g., for forensic applications.
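Several of the notes above rely on a digital plate map, and hit-picking requires a new map that tracks where each sample ends up after rearrangement. The following is a minimal Python sketch of that bookkeeping, not the CCDB lab book or any BOLD tool. The function names, the column-by-column fill order, and the sample IDs are assumptions made for illustration only.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]   # plate rows A-H
COLUMNS = range(1, 13)       # plate columns 1-12


def plate_wells():
    """Return all 96 well names, ordered column by column (A1, B1, ..., H12).

    The column-wise order is an assumption; use the order your lab follows.
    """
    return [f"{row}{col}" for col in COLUMNS for row in ROWS]


def hit_pick(plate_map, passed_wells):
    """Re-array samples that passed PCR into consecutive wells of a new plate.

    plate_map    -- dict mapping source well name to sample ID
    passed_wells -- wells scored as successful on the gel
    Returns a list of (sample_id, source_well, destination_well) tuples,
    which doubles as the new plate map.
    """
    wells = plate_wells()
    invalid = set(plate_map) - set(wells)
    if invalid:
        raise ValueError(f"not 96-well positions: {sorted(invalid)}")
    passed = set(passed_wells)
    unknown = passed - set(plate_map)
    if unknown:
        raise ValueError(f"wells missing from plate map: {sorted(unknown)}")
    sources = [well for well in wells if well in passed]
    return [(plate_map[src], src, dst) for src, dst in zip(sources, wells)]


if __name__ == "__main__":
    # Hypothetical sample IDs, for illustration only.
    first_pass = {"A1": "SAMPLE-001", "B1": "SAMPLE-002",
                  "C1": "SAMPLE-003", "A2": "SAMPLE-004"}
    for sample_id, src, dst in hit_pick(first_pass, ["A1", "C1", "A2"]):
        print(sample_id, src, "->", dst)
```

Wells that failed (here B1) are simply left out of the new map; the same function could be run on the failed wells to assemble a failure-tracking plate.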
Acknowledgments
We thank Judith Eger, Mark Engstrom, Burton Lim, Don Stewart, Charles Francis, Sergey Kruskop, Andrey Lissovsky, Vladimir Lebedev, Natalia Abramson, Ivan Kuzmin, Bernard Agwanda, Anna Bannikova, Ticul Alvarez, Fernando Cervantez, Curtis Strobeck, Jack Millar, and William Pruitt, Jr. for providing materials for analysis; Robert Hanner for advice on protocol development; Agata Pawlowski and Miranda Elliott for protocol testing; and Paul Hebert for administrative support. DNA analyses were performed at the Canadian Centre of DNA Barcoding, Biodiversity Institute of Ontario, and University of Guelph, and supported by grants to Paul Hebert from the Gordon and Betty Moore Foundation, Genome Canada through the Ontario Genomics Institute (2008-OGI-ICI-03), the Canada Foundation for Innovation, the Ontario Innovation Trust, and the Natural Sciences and Engineering Research Council of Canada.
References
1. Reeder DM, Helgen KM, Wilson DE (2007) Global trends and biases in new mammal species discoveries. Occas Papers Mus Texas Technol Univ 269:1–35 2. Ceballos G, Ehrlich PR (2009) Discoveries of new mammal species and their implications for conservation and ecosystem services. Proc Natl Acad Sci 106:3841–3846 3. Baker RJ, Bradley RD (2006) Speciation in mammals and the genetic species concept. J Mammal 87:643–662 4. Bradley RD, Baker RJ (2001) A test of the genetic species concept: cytochrome-b sequences and mammals. J Mammal 82:960–973 5. Hebert PDN, Cywinska A, Ball S, deWaard JR (2003) Biological identifications through DNA barcodes. Proc R Soc Lond B 270:313–321 6. Hebert PDN, Ratnasingham S, deWaard JR (2003) Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc R Soc Lond B 270:S96–S99 7. Broughton RE, Reneau PC (2006) Spatial covariation of mutation and nonsynonymous substitution rates in vertebrate mitochondrial genomes. Mol Biol Evol 23:1516–1524 8.
Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol 3:294–299
9. Simmons RB, Weller SJ (2001) Utility and evolution of cytochrome b in insects. Mol Phylogenet Evol 20:196–210 10. Consortium for the Barcode of Life [Internet]. http://www.barcoding.si.edu/ 11. Meiklejohn C, Montooth K, Rand D (2007) Positive and negative selection on the mitochondrial genome. Trends Genet 23:259–263 12. Pfunder M, Holzgang O, Frey J (2004) Development of microarray-based diagnostics of voles and shrews for use in biodiversity monitoring studies, and evaluation of mitochondrial cytochrome oxidase I vs. cytochrome b as genetic markers. Mol Ecol 13:1277–1286 13. Lissovsky AA, Ivanova NV, Borisenko AV (2007) Molecular phylogenetics and taxonomy of the subgenus Pika (Ochotona, Lagomorpha). J Mammal 88:1195–1204 14. Hajibabaei M, Singer GA, Clare EL, Hebert PDN (2007) Design and applicability of DNA arrays and DNA barcodes in biodiversity monitoring. BMC Biol 5:24 15. Lorenz J, Jackson W, Beck J, Hanner R (2005) The problems and promise of DNA barcodes for species diagnosis of primate biomaterials. Philos Trans R Soc B 360:1869–1877 16. Hajibabaei M, Singer G, Hickey D (2006) Benchmarking DNA barcodes: an assessment using available primate sequences. Genome 49:851–854 17. Borisenko AV, Lim BK, Ivanova NV, Hanner RH, Hebert PDN (2008) DNA barcoding
in surveys of small mammal communities: a field study in Suriname. Mol Ecol Resour 8:471–479 18. Ivanova NV, Borisenko AV, Hebert PDN (2009) Express barcodes: racing from specimen to identification. Mol Ecol Resour 9:35–41 19. Clare EL, Lim BK, Engstrom MD, Eger JL, Hebert PDN (2007) DNA barcoding of Neotropical bats: species identification and discovery within Guyana. Mol Ecol Notes 7:184–190 20. Francis CM, Borisenko AV, Ivanova NV, Eger JL, Lim BK, Guillén-Servent A, Kruskop SV, Mackie I, Hebert PDN (2010) The role of DNA barcodes in understanding and conservation of mammal diversity in Southeast Asia. PLoS ONE 5:e12575 21. Clare EL, Lim BK, Fenton MB, Hebert PDN (2011) Neotropical bats: estimating species diversity with DNA barcodes. PLoS ONE 6(7):e22648 22. Clare EL, Adams AM, Maya-Simoes AZ, Eger JL, Hebert PDN, Fenton MB. Cryptic species in Parnell’s Mustached Bat (Pteronotus parnellii): molecular, morphological and acoustic evidence. Mol Phylogenet Evol (in review) 23. Clare EL. Cryptic species? Patterns of maternal and paternal gene flow in 8 Neotropical bats. PLoS ONE 6(7):e21460 24. Maya-Simões AZ, Clare EL, Fenton MB (2010) Parnell’s mustached bat (Pteronotus parnellii): a morphologically cryptic species complex. Chiroptera Neotropical 16:130–132 25. Borisenko AV, Kruskop SV, Ivanova NV (2008) A new mouse-eared bat (Mammalia: Chiroptera: Vespertilionidae) from Vietnam. Russ J Theriol 7:57–69 26. Engstrom M, Murphy R, Haddrath O (1999) Sampling vertebrate collections for molecular research: practice and policies. In: Metsger D, Byers S (eds) Managing the modern herbarium: an interdisciplinary approach. Elton-Wolf, Vancouver, pp 315–330 27. DeSalle R, Amato G (2004) The expansion of conservation genetics. Nat Rev Genet 5:702–712 28. Ruedas L, Salazar-Bravo J, Dragoo J, Yates T (2000) The importance of being earnest: what, if anything, constitutes a “specimen examined?”. Mol Phylogenet Evol 17:129–132 29.
Borisenko AV, Sones JE, Hebert PDN (2009) The front-end logistics of DNA barcoding: challenges and prospects. Mol Ecol Resour 9:27–34 30. deWaard J, Ivanova N, Hajibabaei M, Hebert P (2008) Assembling DNA barcodes: analytical protocols. In: Martin C (ed) Environmental genomics, methods in molecular biology, vol 410. Humana Press, Totowa, 275–283
31. Ivanova N, deWaard J, Hebert P (2006) An inexpensive, automation-friendly protocol for recovering high quality DNA. Mol Ecol Notes 6:998–1002 32. Borisenko AV (2008) Reconnaissance survey of the small mammal community in the Churchill Northern Studies Centre Area. Polar Barcode Life Newsl 1:4 33. Hajibabaei M, Smith MA, Janzen DH, Rodriguez J, Whitfield JB, Hebert PD (2006) A minimalist barcode can identify a specimen whose DNA is degraded. Mol Ecol Notes 6:959–964 34. American Society of Mammalogists Animal Care and Use Committee (1998) Guidelines for the capture, handling, and care of mammals as approved by the American Society of Mammalogists. J Mammal 79:1416–1431 35. Wilson DE, Cole FR, Nichols JD, Rasanayagam R, Foster MS (eds) (1996) Measuring and monitoring biological diversity. Standard methods for mammals. Smithsonian Institution Press, Washington; London 36. Lillestolen T, Foster N, Wise S (1993) Development of the National Marine Mammal Tissue Bank. Sci Total Environ 139:97–107 37. Triant DA, DeWoody JA (2007) The occurrence, detection, and avoidance of mitochondrial DNA translocations in mammalian systematics and phylogeography. J Mammal 88:908–920 38. Hanner R, Corthals A, Dessauer H (2005) Salvage of genetically valuable tissues following a freezer failure. Mol Phylogenet Evol 34:452–455 39. Kilpatrick C (2002) Noncryogenic preservation of mammalian tissues for DNA extraction: an assessment of storage methods. Biochem Genet 40:53–62 40. Nunley WC, Schuit KE, Dickie MW, Kinlaw JB (1972) Delayed, in vivo hepatic post-mortem autolysis. Virchows Arch Abt B Zellpath 11:289–302 41. Tomita Y, Nihira M, Ohno Y, Shigeru S (2004) Ultrastructural changes during in situ early postmortem autolysis in kidney, pancreas, liver, heart and skeletal muscle of rats. Leg Med 6:25–31 42. Herdson P, Kaltenbach J, Jennings R (1969) Fine structural and biochemical changes in dog myocardium during autolysis. Am J Pathol 57:539–557 43. 
Ljungman M, Hanawalt PC (1992) Efficient protection against oxidative DNA damage in chromatin. Mol Carcinog 5:264–269 44. Scheuerle A, Pavenstaedt I, Schlenk R, Melzner I, Rödel G, Haferkamp O (1993) In situ autolysis of mouse brain: ultrastructure of mitochondria and the function of oxidative phosphorylation and mitochondrial DNA. Virchows Arch B Cell Pathol 63:331–334
45. Borisenko A, Dooh R (2007) An electronic lab book to facilitate high throughput DNA barcoding. CCDB Adv Meth Release 8:1 46. Ivanova N, Zemlak T, Hanner R, Hebert P (2007) Universal primer cocktails for fish DNA barcoding. Mol Ecol Notes 7:544–548 47. Meyer CP (2003) Molecular systematics of cowries (Gastropoda: Cypraeidae) and diversification patterns in the tropics. Biol J Linn Soc 79:401–459
48. Robins JH, Hingston M, Matisoo-Smith E, Ross HA (2007) Identifying Rattus species using mitochondrial DNA. Mol Ecol Notes 7:717–729 49. Ward R, Zemlak T, Innes B, Last P, Hebert P (2005) DNA barcoding Australia’s fish species. Philos Trans R Soc B 360: 1847–1857 50. Messing J (1983) New M13 vectors for cloning. Methods Enzymol 101:20–79
Chapter 9
Methods for DNA Barcoding of Fungi
Ursula Eberhardt
Abstract
This chapter describes methods currently used for DNA barcoding of fungi, including some comments on the barcoding of aged herbarium material. The collecting procedures are focussed on macro-fungi. The laboratory methods are for medium-throughput DNA barcoding, targeted at the 96-well format, but without the assistance of robotics. In the absence of an approved and standardized DNA barcoding locus for fungi, the chapter outlines the amplification and sequencing of nuclear ribosomal genes, ITS, and LSU D1/D2 which are most widely used for the identification of fungi from diverse environments.
Key words: ITS, LSU D1/D2, Museum specimens, Silica columns, Glass fibre filter plates, DNA barcode
1. Introduction
Fungi are all around us: in the soil, as spores in the air, in the sea, as harmless or parasitic endophytes or mycorrhizal partners in plants, in our houses, in our food, on our body surfaces and, if we are unlucky, also as pathogens inside our bodies. As such, fungi have an enormous biological and economic impact. Only occasionally do fungi display the morphological characters needed for identification, which can then often only be done by experts. In many cases, fungi need to be cultured under various conditions to obtain a species identification with non-molecular methods. In addition, the great majority of fungi still cannot be cultured. This is why DNA barcoding is such a great opportunity to make fungal diversity accessible, not only to mycologists, but also to non-experts. Fungi are a diverse group of organisms, representing a large range of life styles and life forms. This chapter concentrates on DNA barcoding of macro-fungi, i.e. fungi that lend themselves to specimen-based barcoding. This definition would also include fungi that form, for instance, sori or other structures with sufficient
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_9, © Springer Science+Business Media, LLC 2012
mycelium concentration to facilitate direct isolation of DNA. Yet, the molecular methods detailed below can also be applied to mycelium or spores sampled from cultures. At the CBS culture collection, where the strains are kept as mycelium under liquid nitrogen or as freeze-dried spores, the same methods are used for DNA barcoding diverse fungi without prior cultivation. Obviously, there are other strategies for fungal DNA barcoding than specimen-based. Cultures may be obtained from mycelia growing out from living or dead substrates. Cultivation details will greatly depend on the kind of fungi and substrates investigated (see e.g. refs. 1, 2). Therefore, discussing all possible collection scenarios would be beyond the scope of this chapter. Some taxonomic groups may require special procedures, for example, the Glomeromycota (3) which have more diverse genomes than other fungi. Environmental barcoding is likely to discover fungi that are commonly overlooked, because they do not form obvious, or in fact, any kind of spore-bearing structures or they do not survive in culture. Fungal species level taxonomy has been greatly changed through the advent of molecular methods. Some species seem to have wide distributions (e.g. some circumpolar species), but more often than not, species that were thought to occur on more than one continent, e.g. Lactarius deliciosus (4), were subsequently shown to be different species. Even when collecting in well-explored geographic areas for fungal groups, the discovery of potentially new or cryptic species is likely (5). Therefore, thorough documentation of the fresh material beyond the obligatory habitat photographs is advisable, if the DNA barcoded collections are to be used for species descriptions. The term “cryptic” species is increasingly used for taxa when researchers are not able to distinguish groups of specimens morphologically that deviate with respect to one or several molecular markers (6).
Voucher-based DNA barcoding coupled with good initial specimen documentation offers an excellent opportunity for getting a more refined view on such “cryptic species”. Experienced fungal morphologists will likely be able to find differences among putative cryptic species, allowing the differentiation of morphospecies supporting molecular findings (ref. 7 and references therein). In addition, increasingly many mycologists are ready to accept taxa that can only be delimited by molecular markers (e.g. multi-locus sequence typing), and not by morphological, physiological, or mating behavioural characters (8). The apparent ubiquity of cryptic species underlines the necessity of sequencing DNA barcode loci of taxonomic type specimens or ex-type strains to ascertain the taxonomic assignment of barcode records, provided the collections that hold these types allow the removal of material and there is sufficient material of the type collection (see Note 1). At this moment, the selection of standardized and CBOL-approved DNA barcoding marker(s) for fungi is still ongoing.
Conrad Schoch (NCBI) is leading an initiative that aims at testing a number of loci for their applicability as species barcodes. For more information, consult http://www.fungalbarcoding.org/. An application to CBOL for an alternative locus for fungal DNA barcoding is planned for 2012. Without wishing to anticipate the decision on the fungal barcode marker(s), this chapter concentrates on the two loci that have been most commonly used in fungal identification in the past years, the Internal Transcribed Spacer region (ITS) and the D1/D2 region of the Large Subunit (further referred to as LSU in this chapter), both of the nuclear ribosomal RNA genes. Normally, the ITS is more variable and more difficult to align because of length differences. The LSU is normally not considered to be strictly species specific, but in the absence of a comprehensive reference database this locus tends to be more efficient in placing an unknown sequence in a known context than the ITS. The main reasons why these loci are commonly used are that they can be easily amplified using relatively universal primers, the same or similar protocols can be used for many groups of fungi, and the reference sequence data volume is currently the largest for these loci for fungi. Whether or not the tandem use of these two loci will be advocated for DNA barcoding, it should be noted that they do not complement each other as two independent loci would. The expectation is that for both markers more species will eventually be distinguished than different “ribotypes” occur. This idea is based on the finding that for a number of economically important and species-rich genera more species were distinguished based on various other genes than based on the ITS alone (9–11). In such cases, the discriminatory power of LSU tends to be even smaller, typically around the genus level.
In spite of this expected shortcoming of the ITS and LSU (or, in fact, of any small number of barcode loci), DNA barcoding based on ITS and LSU still has a high potential of discovering new species. The DNA barcode standard requires that certain data elements (collection data including geographical information, images, trace files, and the barcode sequence) are submitted to public databases (see Lijtmaer et al., this volume). If all of the requirements are met, a sequence will receive the “barcode flag” from GenBank. Until any other locus is approved, only fungal COI sequences are eligible for barcode approval. However, whether or not ITS or LSU will be part of the future DNA barcode for fungi, depositing such information publicly will greatly enhance the value of the sequence data. Many DNA extraction and PCR protocols have been published for fungi. Any protocol that succeeds in producing the required target sequences is in principle applicable for DNA barcoding. The suggested extraction protocols are in the medium cost range. The individual tube protocol was first published in ref. 12 and was adopted, because it only requires one transfer of the
extract, thus reducing the possibilities for errors and the volume of plastic waste. This protocol has been successfully used for a number of years for a wide range of fungi. The 96-well plate scale protocols (slightly modified from ref. 13) have not been tested as extensively in this lab, but are sufficiently robust to warrant inclusion here. The presented methods represent medium throughput DNA barcode technology, meaning a few thousand sequences per month, without robotics. Whenever possible, the 96-well format is used.
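The data elements that the DNA barcode standard asks for (collection data including geographical information, images, trace files, and the barcode sequence) can be checked per record before submission. The Python sketch below illustrates one possible completeness check. The class, field names, and the "geography" key are hypothetical and do not reflect the GenBank or BOLD submission schema.

```python
from dataclasses import dataclass, field


@dataclass
class BarcodeRecord:
    """One specimen-based barcode record (field names are illustrative only)."""
    sample_id: str
    collection_data: dict = field(default_factory=dict)  # e.g. collector, date, geography
    image_files: list = field(default_factory=list)
    trace_files: list = field(default_factory=list)
    sequence: str = ""


def missing_elements(record):
    """List the data elements named in the text that the record still lacks."""
    missing = []
    if not record.collection_data:
        missing.append("collection data")
    elif not record.collection_data.get("geography"):
        missing.append("geographical information")
    if not record.image_files:
        missing.append("images")
    if not record.trace_files:
        missing.append("trace files")
    if not record.sequence.strip():
        missing.append("barcode sequence")
    return missing


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    record = BarcodeRecord(
        sample_id="FUNGUS-0001",
        collection_data={"collector": "A. Collector", "geography": "52.09 N, 5.12 E"},
        image_files=["FUNGUS-0001_habitat.jpg"],
    )
    print(missing_elements(record))
```

A check of this kind only confirms that each element is present; whether a record actually qualifies for the barcode flag is decided by the database, not by this sketch.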
2. Materials
2.1. Fungal Material
1. The standard way to preserve fungal specimens is to dry them. Recently dried specimens are an excellent source of DNA for the amplification of multi-copy genes, provided the drying temperature was not too hot and the material was placed on the drier relatively soon after collecting. Single copy genes may be more problematic. Accordingly, recent specimens from public collections are a good source for DNA barcoding, provided the collection permits the removal of material for DNA extractions. Since the fifties of the last century (M. Noordeloos, personal communication) electric driers have commonly been used to preserve fungal herbarium specimens. Some herbaria advocate methods to press fungal specimens after drying (14). The re-wetting of the specimens is potentially harmful to DNA, but does not render the material useless for DNA extraction. In the nineteenth century, multiple preservation methods were in use, employing diverse chemicals for preservation. Materials treated in this way are not good sources of DNA.
2.2. Field Collecting
1. Knife or, if necessary, other tools to remove fungal fruit bodies from wood or other hard surfaces.
2. Hand lens.
3. Lidded boxes of different sizes, e.g. tackle boxes and food container boxes; size depending on target organisms; aluminium foil to compensate for a lack of boxes.
4. Field book and pencil (dictaphone or other electronic equipment if preferred).
5. Small paper slips, optionally pre-numbered for identifying specimens in photographs, boxes, and later in the drying process.
6. Digital camera, tripod, and white board for setting the white balance.
7. GPS and/or map and compass.
2.3. Specimen Preparation
1. Paper and cover slips for spore prints (if applicable).
2. Colour charts.
3. Electric dryer (see Note 2).
4. Pre-numbered vials.
5. 2× CTAB buffer: 100 mM Tris–HCl pH 8.0, 20 mM EDTA, 1.4 M NaCl, 2% CTAB (if applicable; see ref. 15 and Note 3).
6. Chemicals for identification (if applicable; see Note 4 and ref. 16).
7. Herbarium labels.
8. Ziplock plastic bags for specimens.
2.4. Sample Preparation for DNA Extraction
1. Forceps.
2. Dissecting microscope (for small or partially infected specimens).
3. Safe-lock 1.5 ml tubes (Eppendorf) or comparable.
4. Steel beads approx. 3 mm diameter, e.g. ball bearing balls (alternatively: glass beads, 3 mm diameter).
5. TissueLyser (Qiagen) or other bead mill apparatus compatible with 1.5 ml reaction tubes.
2.5. Tissue Lysis
1. PrepMan Ultra buffer (Applied Biosystems).
2. Water bath: set to boiling.
3. Microcentrifuge for 1.5 ml reaction tubes.
2.6. DNA Extraction and Purification: Individual Tube Method
1. Prepman Jetquick DNA Clean Up Spin Kit (Genomed).
2. 100% Ethanol, for reconstituting wash buffer M2 in kit.
3. Microcentrifuge with rotor for 2 ml reaction tubes.
4. ddH2O at 65°C, or 10 mM Tris–HCl pH 8 at 65°C, for elution of DNA.
2.7. DNA Extraction and Purification: 96-Well Plate Format
1. 1× 96-well plate (200 µl well volume) for mixing lysates and binding buffer, 1× 96-well plate (0.5–0.6 ml well volume) for elution; 0.2× square well blocks (2 ml) for waste, tubes for DNA storage.
2. AcroPrep 96-well plates, 1 ml well volume with glass fibre filters, 1 µm (Pall Corporation).
3. Vacuum manifold (Pall Corporation) and vacuum pump.
4. Table-top centrifuge with swinging bucket rotors to accommodate microplates, capable of a maximum speed of ≥5,000 × g.
5. Adapter collar (Pall) for holding AcroPrep plates on receiver plates during centrifugation.
6. Binding buffer: mix guanidine thiocyanate stock (6 M GuSCN, 20 mM EDTA pH 8.0, 10 mM Tris–HCl pH 6.4, 4% Triton X-100) and ddH2O at a 5:1 ratio.
7. Wash buffer 1: mix guanidine thiocyanate stock and concentrated ethanol at a 1:1 ratio.
8. Wash buffer 2: 60% EtOH, 50 mM NaCl, 10 mM Tris–HCl pH 7.4, 0.5 mM EDTA pH 8.0.
9. ddH2O or 10 mM Tris–HCl pH 8, pre-warmed to 65°C, for elution of DNA.
2.8. Primers for ITS and nucLSU PCR Amplification
All primers are non-degenerate and have a wide spectrum of applicability. Forward and reverse primers can be used in all possible combinations for PCR (see also Note 5).
1. ITS forward: ITS5 (17), ITS1-F (18), or V9G (19).
2. ITS reverse: ITS4 (17). Paired with one of the forward primers, this primer will amplify the entire ITS, including the 5.8S.
3. Intermediate ITS primers: ITS2 (17) reverse, ITS3 (17) forward. Paired with one of the forward primers, ITS2 will amplify the ITS1. Paired with one of the reverse primers, ITS3 will amplify most of the 5.8S rDNA and the ITS2.
4. LSU forward: LR0R (20) or NL1 (21).
5. LSU reverse: LR3 (22), NL4 (21), or LR5 (22). Paired with one of the LSU forward primers, LR3 and NL4 will amplify ca. 600–650 bp of the LSU; LR5 will amplify more than 900 bp.
2.9. PCR Template Mixture
Standard PCR reactions are done in 12.5–25 µl volume. The reaction mix contains:
1. 5–10% DNA extract (individual tube extraction) or 2.5–10% (plate extraction).
2. 10× PCR buffer (Bioline).
3. 0.075 mM dNTP mix (Bioline).
4. 1.875 mM MgCl2.
5. 0.025 U/µl reaction volume Taq (Bioline) (see Note 6).
6. 5% DMSO (Carl Roth), optional.
7. 0.2 µM of forward and reverse primer.
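The recipe above gives final concentrations only, so pipetting volumes depend on the stocks at hand. The following is a minimal sketch of the usual C1·V1 = C2·V2 calculation for a master mix. The final concentrations are taken from the list above, with the primer and Taq units read as µM and U/µl and the 10× buffer assumed to be used at 1×. All stock concentrations, the 10% pipetting overage, and the function name `master_mix` are assumptions for illustration and need to be replaced by the values of the reagents actually used.

```python
# Final concentrations follow Subheading 2.9; stock concentrations are ASSUMED.
# Each entry: name -> (final concentration, assumed stock concentration), same unit.
COMPONENTS = {
    "PCR buffer (x)": (1.0, 10.0),
    "dNTP mix (mM)": (0.075, 2.5),        # assumed 2.5 mM stock
    "MgCl2 (mM)": (1.875, 50.0),          # assumed 50 mM stock
    "Taq (U/ul)": (0.025, 5.0),           # assumed 5 U/ul stock
    "forward primer (uM)": (0.2, 10.0),   # assumed 10 uM working stock
    "reverse primer (uM)": (0.2, 10.0),   # assumed 10 uM working stock
}


def master_mix(n_reactions, reaction_ul=25.0, template_fraction=0.10,
               dmso_fraction=0.0, overage=0.10):
    """Return (master mix volumes in ul, template volume per reaction in ul)."""
    # C1 * V1 = C2 * V2, solved for the stock volume V1 per reaction.
    per_reaction = {name: reaction_ul * final / stock
                    for name, (final, stock) in COMPONENTS.items()}
    per_reaction["DMSO"] = reaction_ul * dmso_fraction
    template_ul = reaction_ul * template_fraction
    # Water fills the reaction up to its final volume.
    water_ul = reaction_ul - template_ul - sum(per_reaction.values())
    if water_ul < 0:
        raise ValueError("components exceed the reaction volume")
    per_reaction["water"] = water_ul
    # The template is added per tube, so it is not part of the master mix.
    scale = n_reactions * (1 + overage)
    return {name: vol * scale for name, vol in per_reaction.items()}, template_ul


if __name__ == "__main__":
    mix, template = master_mix(96, dmso_fraction=0.05)
    for name, vol in mix.items():
        print(f"{name:22s}{vol:9.1f} ul")
    print(f"template per reaction: {template:.2f} ul")
```

The same function can be pointed at the "slim" recipe of Subheading 2.10 by changing the dNTP and primer entries.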
2.10. Alternative PCR Template Mixture
“Slim” PCR reactions (23), optimized so as to dispense with PCR purification, contain:
1. 5–10% DNA extract (individual tube extraction) or 2.5–10% (plate extraction).
2. 10× PCR buffer (Bioline).
3. 0.05 mM dNTP mix (Bioline).
4. 1.875 mM MgCl2.
5. 0.025 U/µl reaction volume Taq (Bioline).
6. 5% DMSO (Carl Roth), optional.
7. 0.15 µM of forward and reverse primer.
2.11. PCR Product Purification
1. For each (ca. 20 µl) PCR product add 0.5 µl exonuclease I 20 U/µl (EXO) and 2 µl Shrimp Alkaline Phosphatase 1 U/µl (SAP) (Fermentas).
2. PCR thermal cycler for incubation of ExoSap.
2.12. Cycle Sequencing
1. Sequencing buffer: 160 ml 1 M Tris–HCl pH 9.0, 3 ml 1 M MgCl2, 50 ml tetramethylene sulphone (melting point around 37°C), 1 ml Tween 20; add ddH2O to 500 ml and filter sterilize.
2. The cycle sequencing reaction mix contains 0.5–1.5 µl PCR product, 0.5–1 µl BigDye® Terminator v3.1 (Applied Biosystems), 3.5 µl sequencing buffer and 0.5 µl of 10 µM primer. Reaction volume is 10 µl.
3. Any of the primers listed under Subheading 2.8 can be used for cycle sequencing.
4. PCR machine and PCR plasticware.
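The reaction mix in item 2 leaves the water volume implicit. As a minimal sketch, and assuming that ddH2O makes up the remainder of the 10 µl reaction, the volumes for a per-primer master mix can be worked out as follows. The 10% overage and the function name `cycle_seq_mix` are illustrative assumptions; the component volumes and permitted ranges are those listed above, read in µl.

```python
def cycle_seq_mix(n_reactions, template_ul=1.0, bigdye_ul=0.5, overage=0.10):
    """Volumes (ul) for a cycle sequencing master mix for one primer."""
    if not 0.5 <= template_ul <= 1.5:
        raise ValueError("template outside the 0.5-1.5 ul range of the recipe")
    if not 0.5 <= bigdye_ul <= 1.0:
        raise ValueError("BigDye outside the 0.5-1 ul range of the recipe")
    buffer_ul, primer_ul, total_ul = 3.5, 0.5, 10.0
    # Assumption: water fills the reaction up to the stated 10 ul.
    water_ul = total_ul - template_ul - bigdye_ul - buffer_ul - primer_ul
    scale = n_reactions * (1 + overage)
    return {
        "BigDye": bigdye_ul * scale,
        "sequencing buffer": buffer_ul * scale,
        "primer": primer_ul * scale,
        "water": water_ul * scale,
        # Volume of master mix to dispense per well before adding template.
        "master mix per well": total_ul - template_ul,
    }


if __name__ == "__main__":
    for name, vol in cycle_seq_mix(96).items():
        print(f"{name:20s}{vol:8.1f} ul")
```

With 0.5 µl of a 10 µM primer in 10 µl, the final primer concentration is 0.5 µM.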
2.13. Cycle Sequencing Product Purification: Sephadex Clean-Up
1. MultiScreen Column Loader and a MultiScreen-HV 0.45 µm filter plate (Millipore).
2. Sephadex G-50 fine (GE Healthcare).
3. ddH2O.
4. 96-well PCR plate for waste (reusable) and 96-well PCR plate for purified sequence solution, compatible with ABI sequencers.
5. Table-top centrifuge with swinging bucket rotor for microplates.
2.14. Alternative Cycle Sequencing Product Purification: Isopropanol Clean Up
1. Table-top centrifuge with swinging bucket rotor for microplates.
2. Sticky foil for covering plate.
3. 75% Isopropanol.
4. 70% Isopropanol.
5. Folded tissue fitting plate holders of centrifuge for drying the cycle sequencing products by centrifugation with the plate turned upside down.
6. ddH2O.
3. Methods
3.1. Collecting (see also Notes 7 and 8)
1. For each collection, immediately note the precise location (GPS) and the setting, e.g. on the ground, on wood, kind of wood, potential mycorrhizal partner (if applicable), etc.
2. Only gather specimens in a single collection when they obviously grow together in the same spot, i.e. if they can be assumed to have been formed by the same mycelium. If in doubt, make separate collections. When removing fungi from the substrate, make sure to collect the entire fruit bodies, including parts that are buried in the substrate. If available, include fruit bodies of all age states in a collection to facilitate morphological identification. Avoid fruit bodies contaminated by other fungi.
3. It is recommended to take photographs in situ, assembling all fruit bodies belonging to the same collection. The photograph should document all macroscopic features. The easiest way to take photographs in the field is:
(a) Choose a place with even light conditions that is not exposed to the full sun.
(b) Use a digital camera and a tripod; do not use flash; use a small aperture for maximum depth of field.
(c) Set the white balance of your camera (for details, consult the manual of your camera); depending on the conditions, this may have to be done for each photograph.
(d) Arrange the collection so that all features are visible in the picture.
(e) Collection numbers or some standard measure may be included in the photograph.
4. Allow a separate container for each collection, include a paper slip with a collection ID, and if necessary, pad with moss or similar material to avoid damage during transport. Part of the material may already be placed in 2× CTAB in the field.
3.2. Documentation
The most important morphological characters for fungal identification are normally microscopic. However, they can be determined from dry material later. Therefore, the initial documentation should focus on the characters that cannot be, or were not, adequately documented by photograph. Many people find it helpful to use forms for documenting fungal specimens so as not to forget any important characters. Note down for each collection:
1. Characteristics of surface, consistency, and size. Colours and colour changes may be determined with the help of a colour
chart, particularly if the photographs were taken without determining the white balance. Chemical tests may aid later identification (see Note 4).
2. Smell.
3. Taste. For some fungal groups, e.g. the Russulaceae, the taste is an important character for identification. However, as fungi can be highly toxic, it is advisable only to taste fruit bodies if you know what you are doing. In some parts of the world, it may not be advisable to taste any fruit bodies because of the occurrence of hantaviruses or other potentially harmful organisms.
4. Spore print colour. The colour of the accumulated spores is an important identification character for many fungi, particularly if you do not have a clear idea what genus you are dealing with. In some taxonomic groups, the spore print colour is also a character used for species identification. To obtain a spore print from a mushroom, place the cap of the mushroom horizontally on a piece of paper, cover and wait for a few hours or overnight. Remove the cap and let the paper with the spores dry. Take two cover slips and scrape the spores into a small heap that is then taken up between the two cover slips (ideally forming a dense patch of spores) and place on a colour chart for determining the spore print colour (see also Note 9).
5. If photographs were not taken in the field, it is best to use a standardized background.
6. If required, part of the collection can be stored in CTAB buffer now, unless this was done in the field already.
7. Fresh fungal collections can be stored in the fridge for a limited time, e.g. overnight. The sooner the material is dried, the better the chances are of good preservation of the DNA.
8. Place the material together with identification markers on an electric dryer. Big fleshy fruit bodies should be cut into slices. Small specimens can be placed in small sieves (e.g. tea strainers). The drying temperature should be below 40°C (50°C), but above 30°C.
Dry the material completely and, while the specimens are still warm, place them together with prepared herbarium labels, spore prints, and loose notes into ziplock bags.
3.3. General Lab Procedures
1. Always wear gloves and a lab coat to prevent contamination.
2. All plastic ware should be obtained DNase-free or autoclaved.
3. Water used should be molecular biology grade.
4. Pre- and post-PCR working steps should be separated and lab equipment should be dedicated to either pre- or post-PCR.
5. Lysis buffer and extraction chemicals are kept at room temperature. PCR and cycle sequencing chemicals should be kept in
the freezer. If in doubt, follow the recommendations of the manufacturer.
6. Follow the waste disposal rules of your institution.
7. Use repetitive pipettes or (extendable!) multi-channel pipettes whenever the number of pipetting steps can be reduced (see Note 10).
3.4. Sample Preparation for DNA Extraction
1. It is recommended to use a dissecting microscope for small specimens, corticoid fungi or if the material is not in a good state. Flame forceps between specimens to avoid carryover of material. Any part of the fruit body or spores can be used, avoiding darkly pigmented parts of the fungus, if possible. More material may be used if the material is old.
2. Dry material is disrupted without the addition of liquids. When working from dried material, prepare pre-numbered safe-lock tubes with one metal bead each. Add the equivalent of 1–2 mm² of gill.
3. When working from soft material (fresh or frozen material, or material conserved in CTAB or other liquids), it is recommended to disrupt the material in the lysis buffer. Prepare pre-numbered safe-lock tubes with 110 µl PrepMan Ultra for single tube purification or 70 µl for plate purification. Fungal material can be kept in PrepMan Ultra at room temperature for some time. Metal beads should only be added to the buffer immediately prior to disruption and DNA extraction.
3.5. Tissue Lysis
1. Disrupt the material in a bead mill for 2 × 3 min at 30 Hz or until the entire contents of the tube are pulverized (see also Note 11). When working from dry material, add 110 µl PrepMan Ultra lysis buffer for single column purification or 70 µl lysis buffer per well for plate purification (see also Notes 12 and 13). Once the lysis buffer is added, immediately proceed with DNA extraction.
2. Seal tubes well and place in floating tube holders.
3. Incubate in boiling water for 10–20 min (see Note 13).
4. Centrifuge at maximum speed for 5 min.
3.6. Purification of DNA Extracts: Individual Tube Format
1. Label and assemble Jetquick columns and receiver tubes.
2. While the lysates are centrifuged (Subheading 3.5, step 4), add 400 µl M1 to each column (see Note 14).
3. Add up to 100 µl of the supernatant of the lysate to each column and mix by pipetting. Discard the tube with pellet and metal bead.
4. Centrifuge for 1 min at maximum speed. Discard the flow-through.
5. Wash by adding 700 µl M2 (make sure the EtOH was added previously) and centrifuge for 1 min. Discard the flow-through.
6. Centrifuge for an additional 2 min.
7. Place the columns in the labelled 1.5 ml tubes and discard the receiver tube with the flow-through. Add 100 µl of ddH2O pre-warmed to 65°C (see Note 15), leave at room temperature for 2–5 min (up to 20 min is recommended if the expected DNA yield is low) and centrifuge for 1 min.
8. Add another 100 µl of ddH2O pre-warmed to 65°C, leave at room temperature for 2–5 min and centrifuge for 1 min.
3.7. Purification of DNA Extracts: 96-Well Plate Format
This protocol was initially described for plants (13) and is also recommended with modifications for algae (see Saunders, this volume). We have used the protocol successfully for fungi, using manifold and multi-channel pipettes and using a plate centrifuge for the elution steps.
1. Add 100 µl of binding buffer to each well of a 200 µl 96-well plate.
2. Add 50 µl of the supernatant from Subheading 3.5, step 4 to each well, mix by pipetting and incubate for 4 min on the lab bench.
3. Assemble the manifold with a 2 ml square well plate as receiver plate and an AcroPrep filter plate. Transfer the mix from above to the filter plate and place the plate under vacuum (23 inHg) for 2 min.
4. Add 180 µl of wash buffer 1 to each well and apply vacuum for 3 min.
5. Add 220 µl of wash buffer 2 to each well and apply vacuum for 2 min.
6. Add 660 µl of wash buffer 2 to each well and apply vacuum for 10 min.
7. Place the Pall AcroPrep plate on a 2 ml square-well block and centrifuge at 5,000 × g for 2 min.
8. Dry off plates in an incubator (45–50°C) for 30 min.
9. Add 50 µl of pre-warmed (65°C) ddH2O to the filter plate and incubate at room temperature for 5 min.
10. Assemble Pall collar and a 0.6 ml collection plate. Place filter plate on top and centrifuge for 3 min at 5,000 × g.
11. Repeat step 9, but centrifuge for 5 min at 5,000 × g. Proceed with PCR before transferring the purified DNA extracts to storage tubes (see also Note 16).
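Before starting a plate, it helps to know how much of each buffer to prepare. The sketch below totals the per-well volumes from steps 1 and 4–6 (read in µl) and splits the binding buffer and wash buffer 1 into their parts using the 5:1 and 1:1 mixing ratios given in Subheading 2.7. The 10% overage for dead volume and the function name `plate_buffer_volumes_ml` are assumptions for illustration.

```python
# Per-well volumes in ul, from Subheading 3.7 (steps 1, 4, 5 and 6).
PER_WELL_UL = {
    "binding buffer": 100,
    "wash buffer 1": 180,
    "wash buffer 2": 220 + 660,
}


def plate_buffer_volumes_ml(n_wells=96, overage=0.10):
    """Buffer volumes (ml) needed for n_wells, with an assumed overage."""
    total = {name: ul * n_wells * (1 + overage) / 1000.0
             for name, ul in PER_WELL_UL.items()}
    return {
        # Binding buffer: GuSCN stock and ddH2O at a 5:1 ratio.
        "GuSCN stock for binding buffer": total["binding buffer"] * 5 / 6,
        "ddH2O for binding buffer": total["binding buffer"] / 6,
        # Wash buffer 1: GuSCN stock and ethanol at a 1:1 ratio.
        "GuSCN stock for wash buffer 1": total["wash buffer 1"] / 2,
        "ethanol for wash buffer 1": total["wash buffer 1"] / 2,
        "wash buffer 2": total["wash buffer 2"],
    }


if __name__ == "__main__":
    for name, ml in plate_buffer_volumes_ml().items():
        print(f"{name:32s}{ml:7.2f} ml")
```

Wash buffer 2 dominates the total because the second wash uses 660 µl per well.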
3.8. PCR Amplification
Prepare the PCR master mix by first adding the generic ingredients (see also Notes 17 and 18), then the primers and last the polymerase. Mix well before distributing to the reaction tubes using a repetitive pipette. The template (DNA extract) is added last. The template concentration is typically 10–20% (e.g. 2.5 µl in 25 µl PCR reaction volume) for the single tube purified extracts and 5–20% for the plate-purified extracts. A PCR strategy for aged material is outlined in Note 19. A typical PCR programme consists of:
1. 5 min initial denaturation at 95°C.
2. Thirty-five cycles of: 30 s denaturation at 95°C, 30 s of annealing at 55°C, 1 min elongation at 70°C.
3. 5 min at 70°C for one cycle.
4. Hold at 10°C indefinitely.
5. PCR success is checked by electrophoresis on agarose gels (see also Note 6). Proceed only with verified PCR products.
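For planning thermocycler time, the programme above can be written down as data and its nominal run time added up. This is only a sketch: the step times and temperatures are those listed above, while the variable and function names are illustrative, and ramping between temperatures and the final hold are not counted, so real runs take longer.

```python
# PCR programme from Subheading 3.8 as (label, temperature in deg C, seconds).
INITIAL_STEPS = [("initial denaturation", 95, 5 * 60)]
CYCLE_STEPS = [
    ("denaturation", 95, 30),
    ("annealing", 55, 30),
    ("elongation", 70, 60),
]
FINAL_STEPS = [("final elongation", 70, 5 * 60)]
N_CYCLES = 35


def programme_minutes(initial, cycle, final, n_cycles):
    """Nominal run time in minutes, ignoring ramp times and the final hold."""
    seconds = sum(s for _, _, s in initial)
    seconds += n_cycles * sum(s for _, _, s in cycle)
    seconds += sum(s for _, _, s in final)
    return seconds / 60


if __name__ == "__main__":
    print(programme_minutes(INITIAL_STEPS, CYCLE_STEPS, FINAL_STEPS, N_CYCLES))  # 80.0
```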
3.9. PCR Product Purification (see also Note 20)
1. Mix the two enzymes EXO and SAP in the required amounts, depending on the number (typically a full plate) and volume of PCR products.
2. Distribute the EXO-SAP mix to the PCR products.
3. Place in a PCR thermal cycler and incubate for 30 min at 37°C, followed by 15 min at 85°C to inactivate the enzymes, and hold at 10°C.
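The "required amounts" in step 1 follow from the per-product quantities in Subheading 2.11 (0.5 µl EXO and 2 µl SAP per ca. 20 µl PCR product). The sketch below scales them to a plate. Scaling the enzyme volumes linearly with the PCR product volume and adding a 10% overage are assumptions, as is the function name `exosap_mix`.

```python
def exosap_mix(n_products, product_ul=20.0, overage=0.10):
    """EXO and SAP volumes (ul) for n_products PCR products."""
    # Assumption: enzyme volumes scale linearly with the PCR product volume.
    factor = product_ul / 20.0
    exo_per_product = 0.5 * factor
    sap_per_product = 2.0 * factor
    scale = n_products * (1 + overage)
    return {
        "EXO_ul": exo_per_product * scale,
        "SAP_ul": sap_per_product * scale,
        # Volume of the combined mix to dispense into each PCR product.
        "mix_per_product_ul": exo_per_product + sap_per_product,
    }


if __name__ == "__main__":
    print(exosap_mix(96))
```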
3.10. Cycle Sequencing
DNA barcoding requires both strands to be sequenced. All primers listed under Subheading 2.8 can be used for sequencing. We have not observed any negative effects of using the PCR primers as sequencing primers. Any forward primer located downstream, or reverse primer located upstream, of the PCR primers originally used can also be applied for sequencing. The amount of PCR product and BigDye required in a sequencing reaction depends on the purification method chosen: Sephadex purification requires a “richer” mix. Excessively strong signals are hardly ever observed in trace files. With the isopropanol purification method, the cycle sequencing mix must be more carefully optimized, but uniformly good results can be achieved with lower concentrations of BigDye and template. Using this method, choose a 96-well plate for cycle sequencing that is compatible with an ABI automated sequencer.
1. Prepare a cycle sequencing master mix for each primer needed, mix well, and distribute the required amount per reaction to the reaction plate.
2. Add the required amount of template (PCR product).
3. Mix and centrifuge contents down to the bottom of the plate (5 s at 100 × g) before placing into the PCR machine.
4. The standard cycle sequencing program consists of:
(a) Initial denaturation at 96°C for 1 min.
(b) 25 (up to 35) cycles of 96°C for 10 s denaturation, 50°C for 5 s annealing, 60°C for 4 min elongation.
(c) Hold at 10°C indefinitely.
(d) Thermal ramp is 1°C/s in all steps.
3.11. Cycle Sequencing Product Purification: Sephadex Purification
Sephadex tends to draw water. However, it is critical that the powder is completely dry. Therefore, it is recommended to keep a working stock of Sephadex. Use only completely dry MultiScreen plates. These plates can be reused about 25× or until the filters appear pink (from the stain in the BigDye) against the light.
1. Shake a small amount of Sephadex on the MultiScreen Column Loader and distribute the powder into the wells of the loader.
2. Place a MultiScreen plate on the loader, turn over, and tap the back of the loader firmly to fill the wells of the MultiScreen plate evenly.
3. Add 300 µl of ddH2O to each of the wells.
4. Let the filled MultiScreen plate (henceforth Sephadex plate) rest on the lab bench for 3 h. Cover and place in the fridge if stored for more than 3 h.
5. Assemble the Sephadex plate on top of a 200 µl 96-well collecting plate and centrifuge at 910 × g for 5 min.
6. Discard the flow-through.
7. Add 100 µl of ddH2O and centrifuge again with the receiver plate for 5 min at 910 × g.
8. Discard the flow-through.
9. Apply the cycle sequencing products to the Sephadex plate. It is important to pipette the sample into the middle of the well, without touching the Sephadex plugs. Place the loaded Sephadex plate on top of a 96-well plate compatible with your automated sequencer and centrifuge for 5 min with standard settings. Seal the 96-well plate. The plate is now ready for sequencing. Store in the freezer until used and protect from light.
10. Shake the Sephadex plugs out of the MultiScreen plate. Clean the plate with ddH2O and leave to dry.
3.12. Alternative Cycle Sequencing Product Purification: Isopropanol Purification
1. Add 40 µl 75% isopropanol to each sample. Cover with sticky foil and vortex. Incubate on the lab bench for at least 15 min (but less than 24 h).
2. Centrifuge at 3,000 × g for 45 min.
3. Prepare folded tissue. Immediately after the centrifuge has stopped, remove the plate, remove the cover from the plate (can be reused), turn the plate upside down and empty into a waste vial.
Without turning it back over, put the plate on the folded tissue and blot until the tissue stays more or less dry. Only now turn the plate upright. (Note: If you are not able to proceed immediately when the centrifuge stops, centrifuge for another 2 min before proceeding.)
4. Add 150 µl of 70% isopropanol to the samples, reseal with foil and mix by turning the plate upside down a few times.
5. Centrifuge at 2,000 × g for 10 min.
6. Again, turn the plate upside down and empty into a waste vial, then put the plate on folded tissue without turning it upright. Prepare a generous multi-layered fold of tissue to fit in the centrifuge carrier. Transfer the plate upside down to the folded tissue in the centrifuge carrier. Centrifuge at 700 × g for 1 min.
7. Let air dry on the bench top. Store in the freezer until used and protect from light.
8. Add 10 µl of ddH2O for sequencing.
3.13. Sequence Editing
The original trace files are part of any approved DNA barcode. Archive the unchanged trace files until submission to a public database. As a matter of principle, any sequence editing software can be used (see also Note 21). The standard procedures of sequence editing include:
1. Trim the ends of the obtained sequence reads, either based on sequence quality values or on visual control.
2. Align forward and reverse trace(s) for the same sequence and, if applicable, reverse-complement the sequence to 5′–3′ direction.
3. Inspect visually the unpaired ends of the sequence and correct or cut off if not readable or of insufficient quality.
4. Search for sequence conflicts, and correct based on visual inspection. For ambiguous readings (e.g. overlaying or “double” peaks) that are present in forward and reverse sequence for a given position, use the IUPAC code (see also Note 22).
5. Should ITS and/or LSU be included in the CBOL approved DNA barcode markers for fungi, requirements for sequence quality and length will be formulated. For Saccharomyces cerevisiae, the beginnings and the ends of the ribosomal genes and regions have been determined experimentally. The sequence GenBank acc. no. D89886 can thus serve as a guide as to whether an ITS sequence is complete. However, it is not always easy to locate all of the different region borders exactly.
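Steps 2 and 4 can be illustrated with a few lines of code. The sketch below is not a replacement for sequence editing software: it assumes that the forward read and the reverse-complemented reverse read have already been trimmed to the same region and contain no gaps, and the function names are illustrative. It reverse-complements a read (including IUPAC ambiguity codes), lists the positions where the two reads disagree so that they can be checked in the trace files, and returns the IUPAC code for a set of bases seen as a double peak.

```python
# Complement table covering the IUPAC nucleotide ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# IUPAC code for each set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def reverse_complement(seq):
    """Reverse-complement a read so that it runs 5' to 3' on the other strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def iupac_code(bases):
    """IUPAC code for the bases observed at one position, e.g. 'CT' -> 'Y'."""
    return IUPAC[frozenset(bases.upper())]


def conflicts(forward, reverse_read):
    """Positions (0-based) where forward and reverse-complemented reads differ."""
    rc = reverse_complement(reverse_read)
    if len(rc) != len(forward):
        raise ValueError("reads must cover the same region (assumed pre-aligned)")
    return [(i, f, r) for i, (f, r) in enumerate(zip(forward.upper(), rc)) if f != r]


if __name__ == "__main__":
    print(reverse_complement("ACGTY"))   # RACGT
    print(iupac_code("CT"))              # Y
    print(conflicts("ACGA", "ACGT"))     # [(3, 'A', 'T')]
```

Conflicting positions are only reported, not resolved automatically, in keeping with the instruction to correct them by visual inspection of the traces.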
3.14. Database and Trace File Management
1. A DNA barcode database fulfils the dual purpose of a data depository and a tool for monitoring the results and progress. As a minimum requirement, it should contain all data or links
to all data that will be needed to meet the DNA barcode data standard. To serve as a monitoring tool, the database has to be kept up to date throughout the project. It is a matter of taste whether the database is kept on a private or shared PC in Excel or in a database system like Access or FileMaker in combination with alignment and Blast software, or whether an online resource like BOLD (24) or PlutoF (25) is used. Some institutions may also own licences for software packages like BioNumerics (Applied Maths), BioloMICS (Bio-aware), or Geneious (Biomatters; see Meyer-LIMS, this volume) that can also be applied to manage DNA barcode data and have a number of powerful features dealing with DNA sequence data and associated information.
2. If creating your own database, choose the data format (database fields) such that the data can be easily exported to the online database of destination when the data are published (see Kress, Erickson and Lopez, this volume; also Meyer-LIMS, this volume). It is recommended to create a database with two linked tables, one for collections or strains and another one for DNA extracts. It is further recommended to use a system of extraction numbers rather than the collection number, e.g. to distinguish extracts if the same specimen has been used more than once for extraction, or to avoid duplicate numbers if the collections originate from different sources. It is also highly recommended to use a stringent naming system for the trace files that allows for easy retrieval and importation into databases (unless an LIMS or similar is used that takes care of file management or if all data are entered into a public resource immediately after production).
3. Blast searches are a fast and efficient way of checking the taxonomic assignment of obtained data. At this moment, BOLD does not contain many public fungal ribosomal sequences.
GenBank is still the best-known one-stop solution for a “quick and dirty” identification of fungal ribosomal sequences. PlutoF, which is associated with UNITE (26), offers the possibility of blasting ITS sequences against GenBank and the UNITE database at the same time. UNITE and PlutoF also offer the option of blasting many sequences in one go. Fasta and Blast searches of up to ten sequences simultaneously against published fungal ribosomal sequences are also offered at http://www.borealfungi.uaf.edu/.
4. It is advisable to check the first twenty or so hits of the Blast results to assess the reliability of a GenBank taxon assignment and, if in doubt, check the origin of the reference sequences. Generally speaking, published sequences are more reliable than unpublished ones, and sequences gained as part of a species-level taxonomic study are presumably the most reliable. Table 1 gives an overview of public dedicated, and in most cases also curated, databases for the identification of fungal sequences. A number of these databases also allow multi-locus sequence typing including other genes.
5. It is advisable to create your own Blast database, including your own sequences and possibly other “curated” sequences that are considered trustworthy and relevant references. A number of commercially available programmes offer the possibility of creating Blast databases [e.g. SeqMan Pro (DNASTAR), Geneious (Biomatters)]. For those who do not want to negotiate the freely distributed BLAST+ software (NCBI), BioEdit (T. Hall, freeware) may be a less challenging solution. If you establish your project in BOLD or PlutoF, you can use the Blast search options of those resources.
6. In addition, it is recommended to assemble alignments with barcode sequences and make at least simple neighbour joining analyses to detect potential problem taxa or sequences that do not cluster as expected. There are many options for doing this, and it depends on the local setup which software or online resources work best. Two things are important:
(a) Use an alignment method that aligns identical or very similar sequences the same way, regardless of whether the sequences are entered in the alignment in direct succession or some much later than others.
(b) Whatever phylogenetic analysis method you prefer, choose a tree output that includes branch lengths.

Table 1 Public databases for the identification of fungi using sequence data
Database name | URL
Fungal Barcoding database. A computerized key to species of Penicillium subgenus Penicillium | http://www.fungalbarcoding.org/BioloMICSSequences.aspx?file=all
The Dermatophytes ITS DNA barcode database | http://www.cbs.knaw.nl/dermatophytes/
TrichOKEY (Trichoderma, Hypocrea) | http://www.isth.info/
UNITE. A database for the identification of fungi | http://unite.ut.ee/
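The recommendation in step 2 of this subheading (two linked tables, independent extraction numbers, and a strict trace file naming scheme) can be sketched with the SQLite module of the Python standard library. The table and column names, the file name pattern, and the example collection ID below are hypothetical and would need to be adapted to the fields of the destination database; `.ab1` is used as the trace file extension on the assumption that ABI sequencers are used.

```python
import sqlite3


def create_barcode_db(path=":memory:"):
    """Create a minimal two-table barcode database (hypothetical schema)."""
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript("""
        CREATE TABLE collection (
            collection_id   TEXT PRIMARY KEY,  -- collector's or herbarium number
            taxon           TEXT,
            collector       TEXT,
            collection_date TEXT,
            latitude        REAL,
            longitude       REAL,
            herbarium       TEXT
        );
        CREATE TABLE extract (
            extract_no      INTEGER PRIMARY KEY,  -- independent running number
            collection_id   TEXT NOT NULL REFERENCES collection(collection_id),
            extraction_date TEXT,
            method          TEXT
        );
    """)
    return con


def trace_file_name(extract_no, locus, primer):
    """Build a trace file name from extract number, locus and sequencing primer."""
    return f"E{extract_no:05d}_{locus}_{primer}.ab1"


if __name__ == "__main__":
    con = create_barcode_db()
    con.execute("INSERT INTO collection (collection_id, taxon) VALUES (?, ?)",
                ("COLL-001", "Russula sp."))
    # Two extracts from the same specimen receive different extraction numbers.
    for _ in range(2):
        con.execute("INSERT INTO extract (collection_id) VALUES (?)", ("COLL-001",))
    for (extract_no,) in con.execute("SELECT extract_no FROM extract"):
        print(trace_file_name(extract_no, "ITS", "ITS4"))
```

Keeping the extraction number as the key for everything downstream (PCR, trace files, sequences) means that repeated extractions of one specimen never collide, which is the point made in step 2.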
4. Notes
1. For completing DNA barcode data sets, it may be necessary to use older material for rare species or taxonomic types. It is absolutely necessary to consult the herbarium as to whether types or, in fact, any material are available for DNA extraction and barcoding. In addition, it is recommended to process type specimens and old material separately from recent collections and to take all possible precautions against cross-contamination, such as strictly separating pre- and post-PCR, strictly using filter tips and trying to confirm the results from independent extractions.
2. Electric dryers are recommended because the drying temperature is crucial. Obviously, other sources of heat can also be used if an even temperature can be maintained. Theoretically, there is an inverse relationship between added heat and DNA preservation. However, no adverse effects have been observed from leaving the specimens on the dryer a few days longer than strictly necessary.
3. Under certain conditions, it is advisable to preserve some material in ways other than just drying in order to have an alternative source of DNA, e.g. if the material is fragile, the climate is hot and moist, reliable drying facilities are unavailable or problems are foreseen with amplifying single copy genes. Preservation methods other than 2× CTAB include: freezing of fresh material at −20°C or colder; preservation in 70–90% ethanol (works well for short-term preservation, but the ethanol will eventually degrade and the material will spoil); or Whatman FTA paper (27).
4. For some groups of fungi, chemical tests of fresh fruit bodies can greatly aid identification. The most important chemicals, also for microscopy, are listed in ref. 16. The use of chemicals in the field is recommended if such tests can help to avoid collecting the same taxa many times. Otherwise, it may be easier to test chemical reactions back at the base, as many of the chemicals are corrosive.
5. The forward ITS primer ITS1 (17) was omitted from the list because it does not amplify from many ascomycete taxa. For further information and primers see also refs. 28–30.
6. If agarose gels for verifying the amplification success are stained with a nucleic acid stain, the use of stained Taq [e.g. BioTaq Red (Bioline)] is recommended.
7. Fungal fruiting shows a strong dependence on weather events, often paired with a strong seasonality. If collecting away from home, it is strongly recommended to keep a flexible collecting
schedule and find a local contact (colleague; local mycological societies) to survey the fruiting situation in the target habitat(s) and organisms. While rain is needed to trigger fruiting and spore formation of many fungi, collecting and specimen preservation work better when done in dry weather. The time window for collecting great numbers of specimens can be very small in some years. While there are typical fruiting times for seasonal climates, be aware that there are always species with deviating fruiting times.
8. If collecting is good, a single person can easily collect enough material in a few hours to be busy until late at night with sample preparation and documentation. Take that into account when planning your collecting.
9. For obtaining good spore prints, it is important to have mature, but not over-mature specimens. It is recommended to create a moist atmosphere by covering the fruit body on the paper, adding a bit of moist tissue or placing the paper with the fruit body in a small plastic bag. Sometimes, it is necessary to “sacrifice” half caps for obtaining a spore print. It is often not possible to get spore prints at low temperatures, such as in the fridge.
10. Many labs use economy pipette tips. When relying on multi-channel pipettes, the fit of the tips can make an enormous difference, both in terms of accuracy and time saved. If the tips are often troublesome to eject, fill unevenly or cannot hold their liquid well, it is advisable to test other brands of tips. The fit of the tips of a given brand may vary with pipette volume.
11. The disruption step can be omitted when working with yeasts. Disruption may not be strictly necessary for all other fungi, but it ensures a better overall success.
12. When disrupting the material in liquid, the use of heavy beads is recommended. For dried material, glass beads (e.g. two 3 mm beads per sample) may be sufficient.
13. When glass beads or other non-corroding beads are used, the exposure to the lysis buffer may be extended. After incubation at 100°C, the samples can be left on the lab bench for an hour or even overnight. Extending the incubation at 100°C to more than 20 min is not advisable.
14. Most purification protocols using silica columns recommend pipetting the supernatant into a new 1.5 ml reaction tube, adding the binding buffer, mixing and only then adding the mixture to the column. The recommended Jetquick columns do not leak within the time normally used for processing 24–30 samples (= the rotor capacity of an average microcentrifuge), so that the mixing can be done in the columns, thus saving plastics and time.
15. Using pre-warmed ddH2O or elution buffer ensures that the long DNA fragments are also eluted from the column. Waiting for more than 2–5 min may even improve results.
16. Sometimes, with old material, salt precipitation methods, such as the Gentra Puregene kit (Qiagen) or Masterpure kit (Epicentre Technologies), are successful where silica-column or magnetic bead-based methods fail. These methods can also be up-scaled to the 96-well format.
17. The addition of DMSO to PCR reactions seems to increase the success rate of ITS PCR reactions more often than that of LSU PCR reactions.
18. It is possible to amplify the ITS and LSU in a single amplification reaction, using any of the ITS forward primers paired with any of the LSU reverse primers. However, if dealing with a set of extracts from diverse fungi or from specimens of different ages, and if a high initial success rate is desired, it is recommended to amplify the ITS and the LSU separately.
19. When working with aged material, fragmentation of DNA seems to be one of the main sources of PCR problems. The following strategy is recommended for difficult materials:
(a) Standard PCR amplifications of the ITS and LSU.
(b) For DNA extracts with negative ITS results, try the amplification of the ITS in two parts, using intermediate primers (see Subheading 2.8, item 3) located in the 5.8S rDNA; for negative LSU results, try using one of the primer combinations for amplifying a shorter fragment (ca. 600–650 bp) of the LSU.
(c) Re-extract DNA from the specimens that still do not show any amplification success, if possible from another element of the collection or part of the fruit body, and start again at (a). If the second cycle is equally unsuccessful and the collection is of sufficient importance, other extraction methods may be considered (see Note 16).
Apart from optimizing PCR reactions, trying out different Taq polymerases may also help. Be aware that contaminations that are relatively rare when working with DNA extracts from recent specimens are rather common when working with old specimens. Therefore, it may be advisable to confirm important or even slightly doubtful results by a second independent result, starting from another DNA extract.
20. Whether or not PCR product purification is necessary depends on the local sequencing setup, the PCR mix, and the length of the desired sequence: the longer the sequence, the greater the benefit from purification. If PCR products are sent between institutions for sequencing, it is generally recommended to purify the PCR products.
21. If contigs are to be exchanged between researchers, interchangeable export formats of contigs or trace files may be a
consideration as to which editing software is chosen. If edited trace files can be exported in .scf format, contigs can be recreated in any other software. However, the unmodified trace files are part of the obligatory barcode data and should not be overwritten.
22. Sequence editors that do not change the trace image when bases (or gaps) are inserted or removed are the best choice when difficult traces are negotiated on a routine basis. Examples are Sequencher (Gene Codes), CodonCode Aligner (CodonCode Corporation), and 4Peaks (A. Griekspoor and T. Groothuis, www.mekentosj.com; the latter is a simple freeware trace viewer; contigs cannot be made).
Ribosomal genes are multi-copy genes and have been shown to occur in different versions in the genome (31). In direct sequencing, this only plays a role if a small number of deviating ribotypes are present at relatively high proportions. It is a recurrent and often taxon-specific problem with ITS and, to a lesser degree, also with LSU sequences. This then causes overlaid peaks in traces. The least troublesome manifestation of this phenomenon is isolated “double peaks” that can be observed in corresponding positions in both strands of otherwise low-noise trace data. Such ambiguous peaks can easily be translated into IUPAC code. Obviously, there is no absolutely sure way of telling such differences from noise. However, when comparing sequences from different collections of the same taxon, it can often be observed that such ambiguities typically occur at the same positions. They may be observed in many of the sequences from the same species originating from a wide geographical range. When length differences occur, mixed templates become problematic. Length differences of the ITS are more often observed in direct sequences of basidiomycetes (dikarya!) than of ascomycetes. As with single base pair exchanges, these results tend to be highly reproducible for the same specimen, if sequencing is repeated, and even for different specimens from the same taxon. In some taxa, almost all genets could be affected. In traces, this phenomenon is visible as “double” or even multiple peaks (if varying numbers of base pairs were inserted or deleted) downstream of the point of insertion or deletion (see Fig. 1). If such differences occur in inter-simple sequence repeats or microsatellites where different numbers of repeats may interfere with the readability of the downstream sequence, it is normally not easy to tell whether these differences are the result of “reading errors” in the in vitro amplification or also occur in vivo. The omission of a number of bases could also be the result of secondary structure formation if elongation already proceeds during the annealing step and annealing is
Fig. 1. Direct sequencing result of two separate ITS haplotypes of Hebeloma sp. that differ by a 1 base InDel and a single base pair mutation. The forward sequence is at the bottom, the reverse on top. The grey boxes highlight the positions where the two ITS haplotypes differ.
done at low temperatures. This effect may be avoided by designing primers with higher annealing temperatures than the universal ITS primers (e.g., ref. 32). However, not all mixed template variation follows patterns suggestive of this process. If the trace data are of good quality and not more than two peaks are superimposed on each other in any given place, one might consider editing these sequences anyway, in view of the fact that cloning may produce outsider sequences (31) and that a great majority of the members of a taxon may show the same sequence pattern. The most honest but also most time-consuming approach is to edit each of the traces separately, thereby formulating a hypothesis of how the observed pattern of double peaks came about (see Fig. 1), and then to test this hypothesis when aligning the two opposite traces with each other. Unfortunately, there is no accepted way of expressing the result in a single sequence, as there is no code for “A or nothing”, “C or nothing”, etc. If more than two peaks are superimposed in a stretch of sequence, it is impossible to proceed with Sanger sequencing without cloning.
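The translation of isolated double peaks into IUPAC code described in Note 22 can be sketched in a few lines of Python. This is only an illustration, not part of the protocol: the function names `call_double_peak` and `merge_haplotypes` are hypothetical, and the sketch assumes that the two superimposed haplotypes have already been worked out by eye and are of equal length. The lookup table itself is the standard IUPAC nucleotide ambiguity code. In line with the note, a position that is "base or gap" is rejected, because no IUPAC symbol exists for it.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they cover.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_double_peak(bases):
    """Return the IUPAC code for the bases superimposed at one trace position.

    Hypothetical helper: `bases` is any iterable of single-letter base calls.
    """
    key = frozenset(b.upper() for b in bases)
    if "-" in key:
        # There is no IUPAC symbol for "A or nothing", "C or nothing", etc.
        raise ValueError("IUPAC has no code for 'base or gap'")
    return IUPAC[key]


def merge_haplotypes(hap_a, hap_b):
    """Merge two equal-length, gap-free haplotypes into one ambiguity-coded sequence."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have equal length")
    return "".join(call_double_peak(pair) for pair in zip(hap_a, hap_b))


if __name__ == "__main__":
    # Two haplotypes differing by G/A at position 3 and A/C at position 6.
    print(merge_haplotypes("ACGTTA", "ACATTC"))  # ACRTTM
```

A sketch like this only covers single base pair exchanges. Length differences between haplotypes still have to be resolved by editing the two traces separately, as described above.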
Acknowledgements
My former colleagues from the Department of Forest Mycology and Pathology at the Swedish University of Agricultural Sciences in Uppsala, the members of the Genome Center of Uppsala University, and my current colleagues from the CBS-KNAW Fungal Biodiversity Centre (Centraalbureau voor Schimmelcultures) in Utrecht have contributed over the years to the protocols and experiences assembled in this chapter. Their contributions are greatly appreciated.
References
1. Crous PW, Verkley GJM, Groenewald JZ, Samson RA (2009) Fungal biodiversity. CBS laboratory manual series vol. 1. CBS Fungal Biodiversity Centre, Utrecht
2. Mueller GM, Bills GF, Foster MS (2004) Biodiversity of fungi. Inventory and monitoring methods. Elsevier Academic Press, London
3. Stockinger H, Krüger M, Schüßler A (2010) DNA barcoding of arbuscular mycorrhizal fungi. New Phytol 187:461–474
4. Nuytinck J, Verbeken A, Miller SL (2007) Worldwide phylogeny of Lactarius section Deliciosi inferred from ITS and glyceraldehyde-3-phosphate dehydrogenase gene sequences. Mycologia 99:820–832
5. Brock PM, Döring H, Bidartondo MI (2009) How to know unknown fungi: the role of a herbarium. New Phytol 181:719–724
6. Crespo A, Lumbsch HT (2010) Cryptic species in lichen-forming fungi. IMA Fungus 1:167–170
7. Del-Prado R, Cubas P, Lumbsch HT, Divakar PK, Blanco O, Amo de Paz G, Molina C, Crespo A (2010) Genetic distances within and among species in monophyletic lineages of Parmeliaceae (Ascomycota) as a tool for taxon delimitation. Mol Phylogenet Evol 56:125–133
8. Taylor JW, Turner E, Pringle A et al (2007) Fungal species: thoughts on their recognition, maintenance and selection. In: Gadd GM, Watkinson SC, Dyer PS (eds) Fungi in the environment. Cambridge University Press, Cambridge, pp 313–339
9. Geiser DM, Klich MA, Frisvad JC et al (2007) The current status of species recognition and identification in Aspergillus. Stud Mycol 59:1–10
10. Seifert KA, Samson RA, deWaard JR et al (2007) Prospects for fungus identification using CO1 DNA barcodes, with Penicillium as a test case. Proc Natl Acad Sci USA 104:3901–3906
11. Aveskamp MM, de Gruyter J, Woudenberg JHC et al (2010) Highlights of the Didymellaceae: a polyphasic approach to characterise Phoma and related pleosporalean genera. Stud Mycol 65:1–60
12. Taylor AFS, Hills A, Simonini G et al (2007) Xerocomus silwoodensis sp. nov., a new species within the European X. subtomentosus complex. Mycol Res 111:403–408
13. Ivanova NV, Fazekas AJ, Hebert PDN (2008) Semi-automated, membrane-based protocol for DNA isolation from plants. Plant Mol Biol Rep 26:186–198
14. Ryman S, Holmåsen I (1992) Svampar. En fälthandbok, 3rd edn. Interpublishing, Stockholm
15. Halling RE (1996) Recommendations for collecting mushrooms for scientific study. In: Alexiades MN, Sheldon JW (eds) Selected guidelines for ethnobotanical research: a field manual. Botanical Garden Press, New York, pp 135–141
16. Largent DL, Baroni TJ (1988) How to identify mushrooms to genus VI: modern genera. Mad River Press, Eureka, CA
17. White TJ, Bruns T, Lee S, Taylor J (1990) Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In: Innis MA, Gelfand DH, Sninsky JJ, White TJ (eds) PCR protocols: a guide to methods and applications. Academic Press, San Diego, CA, pp 282–287
18. Gardes M, Bruns TD (1993) ITS primers with enhanced specificity for Basidiomycetes – application to the identification of mycorrhizae and rusts. Mol Ecol 2:113–118
19. De Hoog GS, Gerrits van den Ende AHG (1998) Molecular diagnostics of clinical strains of filamentous Basidiomycetes. Mycoses 41:183–189
20. Rehner SA, Samuels GJ (1994) Taxonomy and phylogeny of Gliocladium analysed from nuclear large subunit ribosomal DNA sequences. Mycol Res 98:625–634
21. O’Donnell K (1993) Fusarium and its near relatives. In: Reynolds DR, Taylor JW (eds) The fungal holomorph: mitotic, meiotic, and pleomorphic speciation in fungal systematics. CAB International, Wallingford, pp 225–233
22. Vilgalys R, Hester M (1990) Rapid genetic identification and mapping of enzymatically amplified ribosomal DNA from several species of Cryptococcus. J Bacteriol 172:4238–4246
23. Groenewald M, Groenewald JZ, Crous PW (2005) Distinct species exist within the C. apii morphotype. Phytopathology 95:951–959
24. Ratnasingham S, Hebert PDN (2007) BOLD: The barcode of life data system (www.barcodinglife.org). Mol Ecol Notes 7:355–364
25. Abarenkov K, Tedersoo L, Nilsson RH et al (2010) PlutoF – a Web based workbench for ecological and taxonomic research, with an online implementation for fungal ITS sequences. Evol Bioinformatics 6:1–8
26. Kõljalg U, Larsson K-H, Abarenkov K et al (2005) UNITE – a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol 166:1063–1068
27. Dentinger BM, Margaritescu S, Moncalvo J-M (2010) Rapid and reliable high-throughput methods of DNA extraction for use in barcoding and molecular systematics of mushrooms. Mol Ecol Res 10:628–633
28. Martin K, Rygiewicz P (2005) Fungal-specific primers developed for analysis of the ITS region of environmental DNA extracts. BMC Microbiol 5:28
29. Vancov T, Keen B (2009) Amplification of soil fungal community DNA using the ITS86F and ITS4 primers. FEMS Microbiol Lett 296:91–96
30. Bellemain E, Carlsen T, Brochmann C et al (2010) Environmental DNA barcode for fungi: an in silico approach reveals potential PCR biases. BMC Microbiol 10:189
31. Simon UK, Weiss M (2008) Intragenomic variation of fungal ribosomal genes is higher than previously thought. Mol Biol Evol 25:2251–2254
32. Bakkeren G, Kronstad JW, Lévesque CA (2000) Comparison of AFLP fingerprints and ITS sequences as phylogenetic markers in Ustilaginomycetes. Mycologia 92:510–521
Chapter 10
Methods for DNA Barcoding Photosynthetic Protists Emphasizing the Macroalgae and Diatoms
Gary W. Saunders and Daniel C. McDevit
Abstract
This chapter outlines the current practices used in our laboratory for routine DNA barcode analyses of the three major marine macroalgal groups, viz., brown (Phaeophyceae), red (Rhodophyta), and green (Chlorophyta) algae, as well as for the microscopic diatoms (Bacillariophyta). We start with an outline of current streamlined field protocols, which facilitate the collection of substantial numbers (hundreds to thousands) of specimens during short (days to weeks) field excursions. We present the current high-throughput DNA extraction protocols, which can, nonetheless, be easily modified for manual molecular laboratory use. We are advocating a two-marker approach for the DNA barcoding of protists with each major lineage having a designated primary and secondary barcode marker of which one is always the LSU D2/D3 (divergent domains D2/D3 of the nuclear ribosomal large subunit DNA). We provide a listing of the primers that we currently use in our laboratory for amplification of DNA barcode markers from the groups that we study: LSU D2/D3, which we advocate as a eukaryote-wide barcode marker to facilitate broad ecological and environmental surveys (secondary barcode marker in this capacity); COI-5P (the standard DNA barcode region of the mitochondrial cytochrome c oxidase 1 gene) as the primary barcode marker for brown and red algae; rbcL-3P (the 3′ region of the plastid large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase) as the primary barcode marker for diatoms; and tufA (plastid elongation factor Tu gene) as the primary barcode marker for chlorophytan green algae. We outline our polymerase chain reaction and DNA sequencing methodologies, which have been streamlined for efficiency and to reduce unnecessary cleaning steps. The combined information should provide a helpful guide to those seeking to complete barcode research on these and related “protistan” groups (the term protist is not used in a phylogenetic context; it is simply a catch-all term for the bulk of eukaryotic diversity, i.e., all lineages excluding animals, true fungi, and plants).
Key words: Bacillariophyta, Chlorophyta, COI-5P, DNA barcode, LSU, Phaeophyceae, Protist, rbcL-3P, Rhodophyta, tufA, DNA barcoding
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_10, © Springer Science+Business Media, LLC 2012
1. Introduction
Few tasks are as challenging as trying to use dichotomous keys and other primary literature to identify marine macroalgae. Even a taxonomic specialist, when confronted with exclusively vegetative
material (reproduction often an indication of ordinal and familial affinities) or even with reproductive material when considering species within a genus, can find frustration in this seemingly simple task. The challenges derive from commonalities of marine macroalgae, viz., simple morphology and anatomy, rampant convergence (in part owing to the previous), remarkable degrees of phenotypic plasticity in response to environmental factors, and incompletely understood life histories with alternation of heteromorphic generations (1). Microalgae are challenging to identify for all of the same reasons as the macroalgae, but have the added challenge of being small such that all diagnostic features are observable only at the microscopic level. In many geographic regions and for many lineages dichotomous keys are not even available and, where they are, many only take the researcher to the level of a species complex or genus. It is for this reason that algal taxonomists have increasingly resorted to molecular tools for routine sample identification, species discovery, and other related taxonomic tasks (see ref. 1 for a summary). Although a step in the right direction, Saunders (1) noted that the tendency for different laboratories to focus on their “pet gene of choice” was generating an unsatisfactory shortcoming in the ongoing efforts among algal systematists—the lack of a universally applied marker was limiting the development of a global system for algal identification and for routine comparisons of species identifications among research groups. There is unquestionably ample justification for the development of multiple and divergent molecular markers for phylogenetic and taxonomic studies, but agreement on a standard marker among all researchers for the purposes of quick and accurate species identification would be a powerful tool for the practicing taxonomist in generating a database to allow for global comparisons of species diversity and identification. 
The field of DNA barcoding has explicitly recognized the previous shortcoming and has championed the use of the 5′ end of the mitochondrial marker cytochrome c oxidase subunit 1 (COI-5P) to generate a global comparative genetic database for the animals. In a pair of foundation publications, Hebert et al. (2, 3) established the utility of COI-5P as a “core of a global bio-identification system for animals.” These authors established that, for a variety of animal species, this gene could be used to assign unknown species to high-level taxa, and where comprehensive databases were established, species-level assignments were possible. It is now clear that a single barcode marker does not meet the needs of the biodiversity community, which is not overly surprising, especially for the protists, an artificial assemblage of lineages so broad in their evolutionary breadth and depth that they dwarf the better known Animalia, Fungi, and Plantae. Regardless, the minimalist agenda (with regard to the number of markers necessary to include all of
eukaryote life in a global database) of the barcode vision remains a worthwhile objective and a guiding principle in our marker selection strategy (see Note 1). Here we present streamlined procedures, from our approach to collecting macroalgae in the field to marker selection and the final generation of sequence data, that have greatly facilitated our efforts for the DNA barcoding of particular lineages of protists. We expect that much of what we have learned will be broadly applicable to researchers of the protist community interested in this global initiative.
2. Materials
2.1. DNA Extraction
1. DNA extraction tubes: 1.1 ml PROgene® Mini Tube System (UltiDent Scientific, St. Laurent, QC).
2. Tissue lyser: Tissuelyser II (Qiagen Inc., Valencia, CA).
3. Red and green algal extraction buffer: 0.1 M Tris base pH 8.0, 2.5 M potassium acetate, 0.05 M disodium ethylenediaminetetraacetic acid (EDTA), 0.2 M NaCl (4).
4. Brown algal and diatom extraction buffer: 0.1 M Tris base pH 8.0, 0.3 M CaCl2, 0.05 M EDTA, and 0.2 M NaCl (4, 5).
5. Extraction buffer additions: 10 ml of 10% Tween-20 and 1 ml of Proteinase K (20 mg/ml) for each 100 ml of buffer used (4).
6. DNA extraction robot: Qiaxtractor DNA purification robot (Qiagen Inc.).
7. DNA extraction robot filter plate: 1 ml Acroprep 96-well filter plate (1 µm glass fiber media; Pall Life Sciences, Ann Arbor, MI).
2.2. DNA Binding, Wash, Elution, and Storage Buffers
1. Binding buffer: 6 M GuSCN, 20 mM EDTA pH 8.0, 10 mM Tris–HCl pH 6.4, and 4% Triton® X-100 (6).
2. First wash buffer: 1:1 mixture of the binding buffer: 95% EtOH (6).
3. Second wash buffer: 60% EtOH, 50 mM NaCl, 10 mM Tris–HCl pH 7.4, and 0.5 mM EDTA pH 8.0 (6).
4. DNA elution and storage buffer (TE): 10 mM Tris–HCl (pH 8.0) and 1 mM EDTA.
2.3. PCR Reaction Mix
The ratio of reagents was optimized from the manufacturer’s protocol to minimize the level of unincorporated dNTPs in a technique modified from Ivanova et al. (7).
1. 5× MgCl2-free buffer (Takara, Shiga, Japan).
2. MgCl2: 25 mM.
3. dNTPs: 2.5 mM each.
4. Forward and reverse primers, each: 10 µM.
5. Ex Taq polymerase: 5 U/µl (Takara, Shiga, Japan).
6. Molecular biology grade H2O (Fisher Scientific, Fair Lawn, NJ).
2.4. Sequence Reaction Mix
1. 5× Big Dye master mix (PE Applied Biosystems (ABI), Foster City, CA).
2. 5× Sequencing Buffer: 350 mM Tris–HCl pH 9.0, 1.25 mM MgCl2.
3. Molecular biology-grade H2O.
4. Primer: 0.16–0.33 µM (see Note 2) per 10 µl reaction.
3. Methods
3.1. Intertidal Field Procedures for Macroalgae
Our field procedures for macroalgae have been modified dramatically over the years in efforts to move algal barcoding to a high-throughput format (see Note 3).
1. Intertidal work is best done in teams of two or more. For each specimen, we take a photograph in situ (if practical) and write the number of the image on a piece of write-in-the-rain paper along with any other collection details, such as where the alga was growing (e.g., location in the intertidal, low to high, and whether or not it was in a tide pool), what it was growing on (e.g., rock, other algae, etc.), and any other distinguishing features that may be lost when the specimen is removed from the field (e.g., iridescence).
2. The alga is then placed in a plastic bag with the corresponding piece of paper and we move on to the next collection. Using this approach, we can easily obtain a few hundred samples during a collecting episode (e.g., a low tide).
3.2. Subtidal Field Procedures for Macroalgae
Obviously, the degree of detail that we can record during scuba collecting is less than for intertidal specimens, and taking in situ images is virtually impossible unless you are willing to settle for a low number of samples per dive, which greatly increases the cost per sample.
1. For scuba collecting, we try to place collections into individual bags, and then place these into larger bags with other specimens from a similar depth (in essence, “binning” depths on a dive relative to each dive’s overall profile).
2. If the alga is growing on substrate that can be co-collected with the sample, this is done so that this information can be later recorded.
3.3. Field Procedures for Microalgae (Diatoms)
We have done little in our laboratory with regard to enhanced sampling of microalgal lineages. We still rely on plankton tows with the samples retained in seawater (or a DNA-friendly fixative for single-cell trials) and transported to the laboratory in jars stored on ice.
3.4. Post-field Preparation for Macroalgae
On return to the laboratory, macroalgal specimens are processed in turn.
1. Each sample is photographed if this was not completed in the field.
2. The specimen is then searched for a relatively clean region that is further cleaned, blotted dry with paper towel, and then wrapped in clean kimwipe for later use in molecular work (Fig. 1a).
3. Diagnostic pieces of the alga, e.g., the base, branch tips, and reproductive material, are then subsampled to serve as a voucher for later microscopy. Voucher fragments are coarsely cleaned, blotted dry, and then placed in prenumbered vials filled to two-thirds with silica gel that are sealed in parafilm to reduce exposure to moisture (Fig. 1b) (see Note 4).
4. The DNA subsample, wrapped in kimwipe, is then placed in the prenumbered tube (see Note 4), which is closed and resealed with parafilm.
Fig. 1. Field samples are photographed and then a small piece of sample is cleaned of all obvious surface contaminants, blotted dry, and wrapped in kimwipe (a). An additional subsample emphasizing diagnostic traits (vegetative base and tips, as well as reproductive material) is coarsely cleaned, blotted dry, and placed into a vial prenumbered and filled to two-thirds with silica gel (b). On return of samples to the laboratory, part of the material in kimwipe for DNA analyses is recovered into 96-well plates in preparation for DNA extraction (c).
Fig. 2. Database for accumulating all of the metadata and images associated with each specimen for use in the field as a quick guide to what we have collected and as a storage mechanism for data until they are uploaded to BOLD.
5. All metadata and images are recorded in our database on the day of collecting to ensure the highest quality of maintenance for each record (Fig. 2), and to contribute to a growing field-accessible listing and picture guide of what we have collected from different areas.
3.5. Post-field Preparation for Microalgae (Diatoms)
For the microalgae, essentially unicells and simple colonies, we are currently using manual pipetting procedures to establish unialgal cultures (see Note 5) (we plan to trial cell sorting technologies in the near future).
3.6. DNA Extraction for Macroalgae
Using this method, DNA extraction from 95 samples can be accomplished in about 2 h.
1. Approximately 5 mm² of each silica-dried sample is loaded into DNA extraction tubes (Fig. 1c) (see Subheading 2.1, item 1) (see Note 6).
2. A 5-mm stainless steel bead is added to each sample and the plate is sealed with 8-strip caps.
3. Samples are then ground using the Tissuelyser II (see Subheading 2.1, item 2) with two rounds of shaking at 20 Hz for 20 s followed by centrifugation at 2,000 × g for 2 min.
4. For brown algae, 1 ml of acetone is added to each sample after mechanical grinding (as above), followed by incubation at room temperature for 10 min with regular agitation. Samples are then centrifuged at 2,500 × g for 2 min and the acetone removed. The samples are allowed to air dry for 10 min.
5. Next, 200 µl (red and green algae) or 600 µl (brown algae) of taxon-appropriate DNA extraction buffer (see Subheading 2.1, items 3 and 4) is added to each sample (4). At the time of extraction, Tween-20 and Proteinase K are added (see Subheading 2.1, item 5).
6. Samples are incubated for 1 h at room temperature, placed on ice for 10 min, and centrifuged at 2,500 × g for 10 min at 4°C.
7. Samples are then moved to the deck of a Qiaxtractor DNA purification robot (see Subheading 2.1, item 6). Samples are purified using a modification of ref. 6. Briefly, 100 µl of each sample is robotically added to a well in a deep-well plate and combined with 200 µl binding buffer (see Subheading 2.2, item 1).
8. Each sample is mixed four times and moved to a well in a 1-ml Acroprep 96-well filter plate (see Subheading 2.1, item 7).
9. Vacuum is applied to the filter plate at 25 mmHg for 2 min to discard the filtrate.
10. Next, 200 µl of first wash buffer (see Subheading 2.2, item 2) is added to each well and vacuum applied for 1 min.
11. Next, 200 µl of second wash buffer (see Subheading 2.2, item 3) is added to each sample and vacuum applied for 1 min.
12. Next, 600 µl of second wash buffer (see Subheading 2.2, item 3) is added to each sample and vacuum applied for 2 min.
13. Vacuum is applied for an additional 5 min at 25 mmHg to dry the filter plate.
14. Finally, 200 µl of elution buffer (see Subheading 2.2, item 4) at 70°C is added to each well and the vacuum applied for 1 min to elute the DNA (see Note 7).
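When planning a plate, it can help to work out how much extraction buffer, Tween-20, and Proteinase K to prepare. The following Python sketch is only an illustration under stated assumptions: the function name `extraction_buffer_plan` and the 10% pipetting overage are hypothetical and not part of the protocol. The per-sample buffer volumes (200 µl for red and green algae, 600 µl for brown algae) and the ratio of additions (10 parts 10% Tween-20 and 1 part Proteinase K stock per 100 parts buffer) are taken from the steps and materials above.

```python
# Buffer volume per sample in microlitres, by algal group (from step 5 above).
PER_SAMPLE_UL = {"red": 200, "green": 200, "brown": 600}


def extraction_buffer_plan(n_samples, taxon, overage=0.10):
    """Return buffer and addition volumes (µl) for one extraction run.

    Hypothetical helper. `overage` is an assumed pipetting allowance,
    not a value given in the protocol.
    """
    buffer_ul = round(n_samples * PER_SAMPLE_UL[taxon] * (1 + overage), 1)
    return {
        "buffer_ul": buffer_ul,
        # 10 parts of 10% Tween-20 per 100 parts of buffer.
        "tween20_10pct_ul": round(buffer_ul * 10 / 100, 1),
        # 1 part of Proteinase K stock (20 mg/ml) per 100 parts of buffer.
        "proteinase_k_ul": round(buffer_ul * 1 / 100, 1),
    }


if __name__ == "__main__":
    for name, volume in extraction_buffer_plan(95, "red").items():
        print(f"{name}: {volume}")
```

For a full run of 95 red algal samples with the assumed 10% overage, this gives 20.9 ml of buffer, to which 2.09 ml of 10% Tween-20 and 209 µl of Proteinase K are added at the time of extraction.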
3.7. DNA Extraction for Diatoms
For diatoms, the brown algal DNA extraction buffer is used (see Subheading 2.1, items 4 and 5), but initially minus the Proteinase K.
1. The samples are not mechanically ground but are subjected to five rounds of freeze–thaw cycles (samples transferred between liquid nitrogen and a 65°C water bath).
2. Proteinase K is then added and the samples are incubated at 65°C for 20 min.
3. Samples are then subjected to incubation at 95°C for 5 min, allowed to cool to room temperature, and then placed at 4°C for 20 min.
4. The remainder of the extraction protocol continues as above (low-specimen protocols for microalgae are being actively explored in our laboratory).
3.8. PCR Profiles and Primers
Each 12.5 µl reaction includes 11.5 µl of PCR reaction mix (see Subheading 2.3, item 1) and 1 µl of template genomic DNA (see Note 8). Primer development remains the single most active area of ongoing protocol development in our laboratory (see Note 9).
1. PCR reaction mix: 1.25 µl 5× MgCl2-free buffer, 1.25 µl MgCl2 stock, 0.24 µl dNTPs (2.5 mM each), 0.125 µl forward primer (10 µM), 0.125 µl reverse primer (10 µM), 0.06 µl Ex Taq Polymerase (Takara, Shiga, Japan), 1 µl template, and 8.45 µl of molecular biology grade H2O (Fisher Scientific) per 12.5 µl reaction.
2. The thermal profile for PCR amplification for LSU D2/D3 includes an initial 5-min denaturation at 94°C, 38 cycles of 94°C for 30 s, 50°C annealing for 30 s, and 72°C extension for 1 min, followed by 72°C final extension for 7 min (8). For primers, see Fig. 3 (see Note 10).
3. The thermal profile for PCR amplification for COI-5P includes an initial denaturation at 95°C for 2 min followed by 5 cycles of 30-s denaturation at 95°C, 30-s anneal at 45°C, and 1-min extension at 72°C, followed by 35 cycles of 30-s denaturation at 95°C, 30-s anneal at 46.5°C, and 1-min extension at 72°C, followed by an additional 7 min at 72°C and storage at 4°C (modified from ref. 7). For primers, see Fig. 3 (see Note 11).
4. The thermal profile for PCR amplification for rbcL-3P includes an initial denaturation at 95°C for 2 min, followed by 35 cycles of 94°C for 20 s, 50°C for 30 s, and 72°C for 2 min, and a final extension of 72°C for 7 min (8). For primers, see Fig. 3 (see Note 12).
5. The thermal profile for PCR amplification for tufA includes an initial 4-min denaturation at 94°C, 38 cycles of 94°C for 1 min, 45°C annealing for 30 s, and 72°C extension for 1 min, followed by 72°C final extension for 7 min (9). For primers, see Fig. 3 (see Note 13).
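The reaction mix and thermal profiles above lend themselves to a quick arithmetic check before a plate is set up. The Python sketch below is an illustration only. It takes all volumes in microlitres and the primer stocks in micromolar. The function names, the 10% master-mix overage, and the data layout are assumptions made for this example, and the time estimate counts programmed hold times only (ramping between temperatures and the final 4°C storage are ignored).

```python
# Per-reaction volumes in microlitres, excluding the 1 µl of template DNA.
PER_REACTION_UL = {
    "5x MgCl2-free buffer": 1.25,
    "MgCl2 (25 mM)": 1.25,
    "dNTPs (2.5 mM each)": 0.24,
    "forward primer (10 µM)": 0.125,
    "reverse primer (10 µM)": 0.125,
    "Ex Taq polymerase": 0.06,
    "H2O": 8.45,
}


def master_mix(n_reactions, overage=0.10):
    """Scale the reaction mix to n reactions; `overage` is an assumed allowance."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


def final_concentration(stock, volume_ul, reaction_ul=12.5):
    """Dilution of a stock in the final reaction (same unit as `stock`)."""
    return stock * volume_ul / reaction_ul


# Each profile: (initial hold in s, [(cycles, [step durations in s])], final extension in s).
PROFILES = {
    "LSU D2/D3": (300, [(38, [30, 30, 60])], 420),
    "COI-5P": (120, [(5, [30, 30, 60]), (35, [30, 30, 60])], 420),
    "rbcL-3P": (120, [(35, [20, 30, 120])], 420),
    "tufA": (240, [(38, [60, 30, 60])], 420),
}


def hold_time_minutes(profile):
    """Sum of programmed hold times in minutes (ramp times are not included)."""
    initial, blocks, final = profile
    cycling = sum(cycles * sum(steps) for cycles, steps in blocks)
    return (initial + cycling + final) / 60


if __name__ == "__main__":
    for name, volume in master_mix(96).items():
        print(f"{name}: {volume} µl")
    for marker, profile in PROFILES.items():
        print(f"{marker}: {hold_time_minutes(profile):.1f} min")
```

With these inputs, the per-reaction volumes sum to the stated 11.5 µl, the final MgCl2 concentration works out to 2.5 mM and each primer to 0.1 µM, and the LSU D2/D3 profile adds up to 88 min of programmed hold time.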
3.9. Sequencing
1. Each 10 µl reaction contains 9 µl of sequence reaction mix (see Subheading 2.4, item 1) and 1 µl of PCR product as template.
2. Sequence reaction mix: 0.5 µl 5× Big Dye master mix (ABI), 1.75 µl 5× Sequencing Buffer, 5.75 µl of molecular biology grade H2O (Fisher Scientific), and 0.16–0.33 µM primer (see Note 2) per 10 µl reaction and 1 µl PCR as above.
Fig. 3. A diagrammatic representation of our primer schemes for the barcode markers LSU D2/D3, COI-5P, rbcL-3P, and tufA, respectively. Primers in BOLD indicate the forward and reverse primers currently in the widest use in our laboratory for each marker.
3. The thermal profile for the sequencing reaction is 25 cycles: 10-s denaturation at 96°C, 15-s anneal at 50°C, and 2-min extension at 60°C, followed by storage at 4°C.
4. Products are precipitated and rehydrated in HiDi formamide (ABI) following the manufacturer’s recommended protocol and analyzed using an ABI 3130XL automated sequencer.
3.10. Sequence Data Editing and Management
1. We typically edit our sequence data in Sequencher™ 4.8 (Gene Codes Corporation, Ann Arbor, MI) and align them by eye using MacClade (Version 4.06) for OSX (10). We are also trialing the program Geneious (11), which combines the previous
functions with many additional features, such as tree construction and plugins for a variety of applications.
2. We routinely screen all sequences through the Barcode of Life Data Systems (BOLD; www.boldsystems.org) identification engine to check for contaminants and to match a specimen to the current genetic species groups, where a match exists in BOLD (see Note 14).
4. Notes 1. DNA barcoding is in essence an effort to identify a short region of the genome (typically <700 bp such that a single read on Sanger sequencers can read the full length of a barcode) that can be used to assign a biological specimen to a recognized species. The choice of genomic region must be “standardized and agreed upon.” Whereas the animal, fungal, and plant communities are dealing with clearly defined and relatively recently derived lineages on the tree of life, which potentially facilitates the task of identifying a suitable marker region (although this has not always been true, e.g., in the case of the embryophyte plants), protists represent numerous deeply branching lineages with evolutionary depth and breadth that dwarf the previously listed lineages. This presents a substantial challenge to the protist community in identifying a suitable strategy for DNA barcoding. One approach would be to deal with protists at the class or phylum level and let each group of researchers working on those groups assess and identify the best marker for their branch on the tree of life. This is the ideal model for the systematist concerned with the identification of every species in their lineage of study, but it also has disadvantages. Most importantly, the protist community typically studies broader (taxonomically speaking) aspects of biodiversity and biogeography through molecular ecological surveys of, for example, plankton tows or sediment grabs for which a universal marker is better suited even if it comes with some loss in species resolution. Another point worthy of consideration is that consistency with markers being identified for the other kingdoms (e.g., COI-5P for animals and rbcL + matK for embryophyte plants) should guide marker selection in the protist community to facilitate meaningful global barcode analyses (fewer markers across the tree of life will facilitate faster identification of unknown biological specimens). 
In our laboratory, we are advocating a two-marker system for protists (e.g., refs. 8, 9) using the LSU D2/D3 in combination with a lineage-specific marker (Table 1). The LSU
Table 1 Summary of primary and secondary barcode markers trialed and recommended for use in taxa currently under study in our laboratory

Taxonomic group               Primary barcode marker    Secondary barcode marker
Brown algae (Phaeophyceae)    COI-5P (16)               LSU D2/D3 (advocated here)
Diatoms (Bacillariophyta)     rbcL-3P (8)               LSU D2/D3 (8)
Green algae (Chlorophyta)     tufA (9)                  LSU D2/D3 (9)
Red algae (Rhodophyta)        COI-5P (1)                LSU D2/D3 (advocated here)

D2/D3 region has reasonable universality, which can easily be improved with further primer design, and is found in both heterotrophic and phototrophic protists (not true of plastid markers), as well as the amitochondrial lineages (for which mitochondrial markers are of no use). We consider that this marker should be used as the Primary Barcode Marker in those lineages for which it provides species resolution (discriminatory power) and as a Secondary Barcode Marker (i.e., it should be generated for at least one representative of each protist species regardless of the primary marker in use for routine species identification in that lineage) in all other lineages to facilitate global molecular ecology surveys (e.g., refs. 12, 13) and avoid some pitfalls of a single-locus system in resolving taxonomic issues. For lineages that fail to achieve species-level resolution with the LSU D2/D3, the Primary Barcode Marker should be one that allows for species-level resolution and has good universality and sequence quality, with an attempt, where possible, at compatibility with DNA barcode markers for the animals, plants, and fungi. With these guiding principles in mind, we have established that COI-5P is a suitable primary barcode marker for representative red (Rhodophyta) (1, 14) and brown (Phaeophyceae) (15, 16) algae. In our laboratory, we have also investigated DNA barcoding in the diatoms and green algae (Chlorophyta) for which we are advocating the primary barcode markers rbcL-3P (8) and tufA (9), respectively. We should acknowledge that this two-marker strategy has not yet been formally submitted to the Consortium for the Barcode of Life (CBOL; the organization tasked with sanctioning official markers for the global barcode initiative, an important filter in the quest for standardization) for approval. 
However, we have developed this approach in consultation with other protistan barcode researchers, all of whom agree with the two (or more)-marker approach and on the importance of the LSU D2/D3 region in any system. Agreement on the
additional marker(s) for each taxonomic group has proven more contentious, but the principle of a primary and a secondary marker, as well as the necessity to minimize markers by using those sanctioned for other taxa (e.g., COI-5P), has received general acceptance (personal correspondence between the senior author and international protistan barcode researchers). Currently, BOLD accepts data for all of the markers that we advocate under the following labels: LSU D2/D3 data are entered under the BOLD marker “28S-D2—28S Ribosomal RNA Section D2” (note that there are a number of alternative LSU entries in BOLD); COI-5P data under “COI-5P—Cytochrome Oxidase Subunit 1 5′ Region”; rbcL-3P data under “rbcL—Ribulose-1,5-Bisphosphate Carboxylase/Oxygenase Large Subunit” (not under “rbcLa,” which is specifically for the 5′ region of rbcL that is used as a barcode marker in land plants; interestingly, for the algae we study at least, that region is less variable than the (albeit longer) 3′ region advocated here for diatom barcoding); and tufA data under “tufA—Elongation factor Tu (EF Tu).” For consistency, we recommend that the protist community use these labels for the entry of data into BOLD for the markers advocated here.
2. Primer concentration for sequencing is dependent on primer composition. For primers containing fewer than three degenerate bases, 1 μl of 1.6 μM primer is used in each 10 μl reaction (final primer concentration 0.16 μM). For primers with three or more degenerate bases, 1 μl of 3.3 μM primer is used in each 10 μl sequencing reaction (final primer concentration 0.33 μM).
3. In order to facilitate the procurement of substantial numbers of collections, we have had to greatly modify typical field procedures associated with phycology. 
Whereas the plant press has long been the standard by which macroalgal vouchers have been prepared, we find that this method greatly reduces the number of samples that can be processed during a collecting trip, especially when the work is completed “on the road” rather than at a field station or other facility (e.g., with pressed vouchers, we might manage 150–200 samples per week, whereas with the method outlined here close to 1,000 samples can be processed). Further, although the gross morphology is lost, the silica-dried vouchers have proven superior for sectioning to facilitate observations of anatomical attributes with light microscopy, which is in no small part explained by the fact that many of our presses start to mold in the field (extended damp rainy trips along the coast are not conducive to rapid drying of pressed vouchers). In combination with a good initial habit image, we find that our method provides adequate vouchering with superior material for both anatomical study and molecular procedures (previously, algal portions for DNA were thrown
into plastic bags with silica and were typically degraded to small fragments by the time they reached the laboratory).
4. We use scintillation vials (Fig. 1b), which can be conveniently ordered in boxes of 500 (5 racks of 100; Wheaton, Millville, NJ). A typical collecting trip will have us ship 1,500–3,000 prenumbered vials (3–6 boxes), two-thirds filled with silica, to our collection destination to be ready for our arrival.
5. The advantage of establishing cultures, for example relative to protocols that fix, photograph, and then grind up a cell for DNA, is that a living voucher is established that provides material both for DNA extraction (typically, harvested by centrifugation) and for detailed microscopic analyses and documentation of specific traits. Although laborious, this approach has the advantage of establishing a high-quality DNA barcode library for species that are amenable to culture (plus cultures can be cryopreserved for future scientific or applied use). The disadvantage is that many protists are not amenable to culturing.
6. To facilitate compatibility with high-throughput PCR and sequencing, samples are loaded by column, starting with well A1 through H1 and continuing through well G12. Well H12 is left empty to act as a negative control.
7. Elution can be accomplished with warm water; however, the use of TE lessens any DNase activity and aids in preserving the extracted DNA. DNA is eluted from the filter plate in 200 μl of liquid to allow for direct transfer to PCR plates without diluting the sample.
8. This PCR mastermix recipe is designed to end the reaction with a minimum of unincorporated primer and dNTPs. It allows for the direct sequencing of PCR product without the need for product cleaning.
9. Primer development for the markers advocated here is an ongoing process in our laboratory. We commonly receive correspondence from colleagues experiencing problems amplifying a particular taxon for one of the markers. 
Most often the problem is one that we have already encountered and is a primer specificity issue. Typically we instruct the user to try a more up-to-date primer pair. In many cases, primers that we have not used in years (e.g., GazF1 and GazR1 (1)) are still being employed in other laboratories. All of our primers are recorded on the BOLD database as they are developed and tested and are publicly accessible (enter any public project, which is typically taxon specific, and a list of the primers used in that study will be available). We strongly encourage all researchers to take advantage of this service on BOLD, as well as to contact the senior author to stay up to date on primer advances.
10. The LSU D2/D3 region is an ideal barcode marker from the perspective of universality, and it actually does reasonably well in most protistan groups for its discriminatory power and the quality of sequence data (8, 9, 17). The D2/D3 variable domain is flanked by highly conserved regions and we have successfully used the primer combination T16N versus T24U (T24 less universal) modified from refs. 18, 19 in red, brown, and some green macroalgae, as well as for the diatoms (Fig. 3). Interestingly, T16N, despite the taxonomic breadth observed in our studies, does not work for some green algae indicating that further primer development is necessary to identify the “perfect” primer pair (9). There is limited sequence data available outside of the target region of the LSU for the problem taxa, which has hampered primer design. T16C is a new primer that we are only now testing in the chlorophytan greens as the first step toward designing a novel and more universal primer to amplify this region from protists. 11. The COI-5P has excellent resolving power among species of red and brown algae, but designing universal primers has proven a substantial challenge. Over the past 5 years, we have designed and tested dozens of primer pairs and this is an ongoing activity in our laboratory. Currently, the combination GWSFn and GWSRx yields PCR amplification for most red algal lineages, as well as for some brown algae (Fig. 3). We typically start with this primer pair, although we are starting to test GWSRin as a more universal substitute to GWSRx. For problem red taxa, we variously trial the following combinations: GWSFn versus GWSRi; GWSFa (particularly Halymeniales) or GWSFt versus GWSRx or GWSRi; and GWSFi versus GWSRx. GWSFi and GWSRi typically fail to yield amplification product when used in combination. 
For the brown algae, we still rely heavily on GazF2 and GazR2, although for species that fail with this primer combination we are typically finding that the newer red algal primers (e.g., GWSFn and GWSRx) work well. We are, thus, slowly moving toward primers that work for both red and brown algae. Although we are not there yet, we are clearly on a path toward more universal COI-5P primers that will amplify this marker region from a diverse sampling of species from these two disparate algal lineages (in fact, we have used this primer combination successfully on a variety of invertebrate animals as well). 12. The primers used here for the rbcL-3P, CfD (8) and DPrbcL7 (20), work well across a diversity of diatoms. The apparent (testing is still ongoing) universality is likely a result of their application to what is essentially a class-level taxon relative to the taxonomic breadth that we are targeting for both the LSU D2/D3 and COI-5P. Nonetheless, the target region is sufficiently variable with the current primers anchored in relatively
conserved regions of this gene (Fig. 3). Thus, any future primer modifications should be easily managed for this marker in this group of algae. 13. To amplify tufA, we developed the primer tufGF4 to use with tufAR (21). In fact, four forward primers were initially designed for testing (Fig. 3), but tufGF4 worked so well (9) that we have not pursued the other primers in any meaningful trials. We have tested these primers on a wide variety of marine green macroalgae and in all cases (except the Cladophorales, which have not worked for any plastid or mitochondrial markers), they have yielded species-resolution equivalent to other markers currently in use for green algal work (e.g., rbcL), but without the hindrance of introns and with substantially better universality (single primer pair thus far for all taxa tested) and sequence quality (9). 14. This step is critical for our macroalgae because they are typically infested with epi- and endophytic biota from all of the major groups of life, which can be amplified in lieu of target sequence, especially where the primers used are a poor match to the desired target. Failure to complete this step rapidly diminishes BOLD’s ability to provide meaningful genetic matches to unknown sequence queries.
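Two of the notes above describe simple rules that are easy to get wrong at the bench: the sequencing primer concentration rule in Note 2 and the column-wise plate layout in Note 6. The Python sketch below is one possible way to encode them. All function names are illustrative, the two example sequences in the usage lines are made up for demonstration and are not any of the primers named in this chapter, and the degenerate-base test assumes standard IUPAC ambiguity codes.

```python
# IUPAC ambiguity codes; any base other than A, C, G, or T counts as degenerate.
DEGENERATE_BASES = set("RYSWKMBDHVN")


def count_degenerate(primer):
    """Number of degenerate positions in a primer sequence."""
    return sum(base in DEGENERATE_BASES for base in primer.upper())


def sequencing_primer_stock_uM(primer):
    """Note 2: 1.6 uM stock for fewer than three degenerate bases, otherwise 3.3 uM."""
    return 1.6 if count_degenerate(primer) < 3 else 3.3


def final_concentration_uM(stock_uM, primer_ul=1.0, reaction_ul=10.0):
    """Final primer concentration after adding primer_ul of stock to the reaction."""
    return stock_uM * primer_ul / reaction_ul


def plate_layout(sample_ids):
    """Note 6: fill a 96-well plate by column (A1..H1, A2..H2, ...); H12 is the negative control."""
    wells = [f"{row}{col}" for col in range(1, 13) for row in "ABCDEFGH"]
    control_well = wells.pop()  # the last well in column order is H12
    if len(sample_ids) > len(wells):
        raise ValueError("at most 95 samples fit on one plate")
    layout = dict(zip(wells, sample_ids))
    layout[control_well] = "NEGATIVE_CONTROL"
    return layout


if __name__ == "__main__":
    # Hypothetical sequences, for demonstration only.
    for seq in ("ACGTACGTRACGT", "ACGTRYNACGTAC"):
        stock = sequencing_primer_stock_uM(seq)
        print(seq, count_degenerate(seq), stock, round(final_concentration_uM(stock), 2))
    layout = plate_layout([f"sample_{i:03d}" for i in range(1, 96)])
    print(layout["A1"], layout["H1"], layout["A2"], layout["G12"], layout["H12"])
```

Generating the layout from the sample list, rather than typing well positions by hand, keeps the extraction plate, PCR plate, and sequencing plate in the same order.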
Acknowledgments
Everyone who has worked in the Saunders laboratory since our digression into the realm of DNA barcoding is thanked for his/her contributions toward the overall objective of improved protocols, better primers, and preferred markers; it has truly been a group effort. This research was supported through funding to the Canadian Barcode of Life Network from Genome Canada through the Ontario Genomics Institute, NSERC, and other sponsors listed at www.BOLNET.ca. Additional support was provided by the Canada Research Chair Program, the Canada Foundation for Innovation, and the New Brunswick Innovation Foundation.

References
1. Saunders GW (2005) Applying DNA barcoding to red macroalgae: a preliminary appraisal holds promise for future applications. Phil Trans R Soc B 360:1879–1888
2. Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proc R Soc Lond B 270:313–322
3. Hebert PDN, Ratnasingham S, deWaard JR (2003) Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc R Soc Lond B 270:S96–S99
4. Saunders GW (1993) Gel purification of red algal genomic DNA: an inexpensive and rapid method for the isolation of polymerase chain reaction-friendly DNA. J Phycol 29:251–254
5. Saunders GW, Kraft GT (1995) The phylogenetic affinities of Notheia anomala (Fucales, Phaeophyceae) as determined from partial small-subunit rRNA gene sequences. Phycologia 34:383–389
6. Ivanova NV, Fazekas AJ, Hebert PDN (2008) Semi-automated, membrane-based protocol for DNA isolation from plants. Plant Mol Biol Rep 26:186–198
7. Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN (2007) Universal primer cocktails for fish DNA barcoding. Mol Ecol Notes 7:544–548
8. Hamsher SE, Evans KM, Mann DG, Poulícková A, Saunders GW (2011) Barcoding diatoms: exploring alternatives to COI-5P. Protist 162:405–422
9. Saunders GW, Kucera H (2010) An evaluation of rbcL, tufA, UPA, LSU and ITS as DNA barcode markers for the marine green macroalgae. Crypt Algol 31:487–528
10. Maddison WP, Maddison DR (2003) MacClade, version 4.06. Sinauer, Sunderland
11. Drummond AJ, Ashton B, Buxton S, Cheung M, Cooper A, Heled J, Kearse M, Moir R, Stones-Havas S, Sturrock S, Thierer T, Wilson A (2010) Geneious v5.1. http://www.geneious.com
12. Guillou L, Nézan E, Cueff V, Erard-Le Denn E, Cambon-Bonavita MA, Gentien P, Barbier G (2002) Genetic diversity and molecular detection of three toxic dinoflagellate genera (Alexandrium, Dinophysis, and Karenia) from French Coasts. Protist 153:223–238
13. McDonald SM, Sarno D, Zingone A (2007) Identifying Pseudo-nitzschia in natural samples using genus-specific PCR primers and clone libraries. Harmful Algae 6:849–860
14. Saunders GW (2008) A DNA barcode examination of the red algal family Dumontiaceae in Canadian waters reveals substantial cryptic species diversity. 1. The foliose Dilsea-Neodilsea complex and Weeksia. Botany 86:773–789
15. Kucera H, Saunders GW (2008) Assigning morphological variants of Fucus (Fucales, Phaeophyceae) in Canadian waters to recognized species using DNA barcoding. Botany 86:1065–1079
16. McDevit DC, Saunders GW (2009) On the utility of DNA barcoding for species differentiation among brown macroalgae (Phaeophyceae) including a novel extraction protocol. Phycol Res 57:131–141
17. Trobajo R, Mann DG, Clavero E, Evans KM, Vanormelingen P, McGregor RC (2011) The use of partial cox1, rbcL and LSU rDNA sequences for phylogenetics and species identification within the Nitzschia palea species complex (Bacillariophyceae). Eur J Phycol 45:413–425
18. Harper JT, Saunders GW (2001) The application of sequences of the ribosomal cistron to the systematics and classification of the florideophyte red algae (Florideophyceae, Rhodophyta). Cah Biol Mar 42:25–38
19. Harper JT, Saunders GW (2001) Molecular systematics of the Florideophyceae (Rhodophyta) using nuclear large- and small-subunit rDNA sequence data. J Phycol 37:1073–1082
20. Levialdi Ghiron JH, Amato A, Montresor M, Kooistra WH (2008) Plastid inheritance in the planktonic raphid pinnate diatom Pseudo-nitzschia delicatissima (Bacillariophyceae). Protist 159:91–98
21. Fama P, Wysor B, Kooistra W, Zuccarello GC (2002) Molecular phylogeny of the genus Caulerpa (Caulerpales, Chlorophyta) inferred from chloroplast tufA gene. J Phycol 38:1040–1050

Chapter 11
DNA Barcoding Methods for Land Plants
Aron J. Fazekas, Maria L. Kuzmina, Steven G. Newmaster, and Peter M. Hollingsworth
Abstract
DNA barcoding in the land plants presents a number of challenges compared to DNA barcoding in many animal clades. The CO1 animal DNA barcode is not effective for plants. Plant species hybridize frequently, and there are many cases of recent speciation via mechanisms, such as polyploidy and breeding system transitions. Additionally, there are many life-history trait combinations, which combine to reduce the likelihood of a small number of markers effectively tracking plant species boundaries. Recent results, however, from the two chosen core plant DNA barcode regions rbcL and matK plus two supplementary regions trnH–psbA and internal transcribed spacer (ITS) (or ITS2) have demonstrated reasonable levels of species discrimination in both floristic and taxonomically focused studies. We describe sampling techniques, extraction protocols, and PCR methods for each of these two core and two supplementary plant DNA barcode regions, with extensive notes supporting their implementation for both low- and high-throughput facilities.
Key words: DNA barcoding, Plant field collecting, Plant DNA extraction, PCR amplification, Cycle sequencing, rbcL, matK, trnH–psbA, Internal transcribed spacer
1. Introduction The land plants encompass an enormous diversity of form and function. They consist of the seed plants (angiosperms and gymnosperms), along with the bryophytes (mosses, hornworts, and liverworts), ferns, and fern allies. Estimates of total species numbers vary greatly among authors (1–3), but a recent estimate has suggested that there are approximately 380,000 species of land plants, comprising ca. 352,000 species of angiosperms, ca. 1,300 species of gymnosperms, and ca. 13,000 species each of bryophytes and ferns/fern allies (4). The standard animal DNA barcode comprising a portion of the mitochondrial gene CO1 evolves too slowly in plants to serve as a
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_11, © Springer Science+Business Media, LLC 2012
useful DNA barcode (5). This has led to the search for an equivalent DNA barcode for land plants. The primary focus of this search has been the plastid genome, with many authors recognizing that multiple regions are required (5–12). Selecting a standard plant DNA barcode has been difficult, as all of the various candidate loci have different strengths and weaknesses, with no clear-cut front runners. In a community-authored paper, the combination of portions of the plastid regions rbcL and matK was suggested as the core DNA barcode for land plants (13) and subsequently provisionally adopted by the Consortium for the Barcode of Life. In addition to this core DNA barcode, other loci are often required to increase the levels of species resolution. At the 2009 International Barcode of Life Conference in Mexico City, it was recommended that the community continue to gather data from additional DNA barcoding loci to establish whether other loci should be formally incorporated into the plant DNA barcode. The two most widely used supplementary loci are the plastid intergenic spacer trnH–psbA (one of the leading contenders for the core plant DNA barcode) and the nuclear ribosomal internal transcribed spacers (ITS). The nuclear ribosomal ITS regions had previously been discounted as a standard DNA barcode due to concerns over paralogy and the presence of putative pseudogenes which lead to sequencing difficulties in many plant groups (e.g., refs. 14–18). However, the increased resolution of ITS over plastid DNA barcodes in many studies (e.g., ref. 19) suggests that it should continue to be explored as part of the plant DNA barcode, and some authors have noted that just using a subset of the ribosomal cassette (ITS2) can lead to greater amplification and sequencing success compared to the entire ITS region (20). 
We, therefore, include methods for all four of these regions [rbcL, matK, trnH–psbA, and ITS (including ITS2)] to provide the maximum utility to users of plant DNA barcoding. Details of other loci that have been used in plant DNA barcoding studies can be found elsewhere (e.g., refs. 5, 11, 13, 21). It should be noted that levels of species discrimination in plants with standard DNA barcoding loci are in general lower than those obtained by CO1 in many animal groups (22). This is in part due to the lower rate of nucleotide substitution in the plastid genome, but also due to other reasons, including hybridization, polyploidy, speciation via breeding system transitions, species defined on very narrow taxon concepts, large ancestral population sizes, and low levels of intraspecific gene flow for plastid markers (23, 24). These issues are not evenly distributed among all plant groups; therefore, it is expected that resolution at the species level will be reasonably good in some groups and quite poor in others. In floristic contexts where geographical limitation usually restricts the number of closely related species, rates of species discrimination are expected to be greater (e.g., refs. 25, 26). Methods are invariably open to improvement from a variety of sources, and there are often many ways to achieve the same result.
For example, the reader may have a different way of drying plant samples or prefer to do PCR in larger reaction volumes. Where multiple methods are commonly in use, we attempt to provide details for each. The notes provided in the last section illuminate some of the principles that the methods we have provided aim to achieve. Some of the methods provided have been optimized to be cost-efficient, and are those currently in use at the Canadian Centre for DNA Barcoding (http://www.ccdb.ca/pa/ge/research/protocols).
2. Materials
2.1. Field Collecting
1. Field press with blotting paper and spacers for voucher preparation. 2. Jewelry tags for labeling. 3. Silica gel (with 10–30% indicating silica beads). 4. Waterproof markers or pens. 5. Container(s) for silica drying of tissue, e.g., 20-ml scintillation vials, sealable plastic whirl-packs, zip-lock bags, or coin envelopes/tea bags that can be placed in a sealable container.
2.2. Tissue Sample Storage
1. Use of a climate-controlled facility if available or airtight containers filled with silica gel desiccant to archive tissue samples.
2.3. Tissue Subsampling for DNA Extraction
1. Grinding beads: for example, stainless steel 440C 3.17 mm beads. 2. Small forceps. 3. Latex or nitrile disposable gloves. 4. Ethanol: 100%. 5. ELIMINase®, DNA AWAY®, or a similar product. 6. Alcohol burner. 7. For single tube-based extractions: 2-ml screw-cap tubes with O-ring seals that are strong enough to withstand the homogenization process without breaking. 8. For plate-based extractions: Racked sterile mini tube strips with cap strips (e.g., PROgene® Mini Tube System 1.1 ml 8 Strip Pre-sterilized Mini Tube and sterile cap strips).
2.4. DNA Extraction: Single Sample-Based Extraction: Commercial Kits
1. Equipment for tissue grinding: for example, FastPrep® or TissueLyser with tube adaptor. 2. Microcentrifuge with a rotor for 2-ml tubes. 3. Vortex.
4. Ethanol: 100%. 5. Heating block/incubator capable of heating to 70°C. 6. Pipettes and pipette tips. 7. 1.5- or 2-ml microcentrifuge tubes. 8. Individual tube-based DNA extraction kit. 9. Latex or nitrile disposable gloves. 10. ELIMINase®, DNA AWAY®, or a similar product.
2.5. DNA Extraction: Single Sample Extraction: Non-kit-Based Method (Adapted from Ref. 26)
1. ELIMINase®, DNA AWAY®, or a similar product. 2. Silica-membrane spin columns (e.g., EconoSpin® mini spin columns, Epoch Life Science Inc.). 3. Equipment for tissue grinding: FastPrep® or TissueLyser with tube adaptor. 4. Microcentrifuge with a rotor for 2-ml tubes. 5. Vortex. 6. Ethanol: 100%. 7. Molecular biology grade water. 8. Heating block/incubator capable of heating to 70°C. 9. Pipettes and pipette tips. 10. Latex or nitrile disposable gloves. 11. 1.5- and 2-ml microcentrifuge tubes. 12. CTAB lysis buffer: 2% cetyltrimethylammonium bromide (CTAB), 100 mM Tris–HCl pH 8.0, 20 mM EDTA, and 1.4 M NaCl. 13. Binding buffer: 5 M guanidine thiocyanate, 20 mM EDTA pH 8.0, 10 mM Tris–HCl pH 6.4, and 4% Triton® X-100. 14. First wash buffer: 50% ethanol, 3 M GuSCN, 10 mM EDTA pH 8.0, 5 mM Tris–HCl pH 6.4, and 2% Triton® X-100. 15. Second wash buffer: 60% ethanol, 50 mM NaCl, 10 mM Tris– HCl pH 7.4, and 0.5 mM EDTA pH 8.0.
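For laboratories that prepare the CTAB lysis buffer (item 12) from concentrated stock solutions, the required volumes follow from the dilution relation C1·V1 = C2·V2. The Python sketch below illustrates the arithmetic only. The stock concentrations (1 M Tris–HCl, 0.5 M EDTA, 5 M NaCl) are assumptions chosen for illustration and are not specified in this chapter, the 2% CTAB is read here as weight per volume, and the function names are hypothetical.

```python
def stock_volume_ml(final_conc_M, stock_conc_M, final_volume_ml):
    """Solve C1*V1 = C2*V2 for the stock volume V1 (both concentrations in mol/L)."""
    if final_conc_M > stock_conc_M:
        raise ValueError("stock solution is more dilute than the target concentration")
    return final_conc_M * final_volume_ml / stock_conc_M


# Target concentrations from item 12 (CTAB lysis buffer).
CTAB_BUFFER_TARGETS_M = {"Tris-HCl pH 8.0": 0.100, "EDTA": 0.020, "NaCl": 1.4}

# Assumed stock concentrations; adjust to the stocks actually on the shelf.
ASSUMED_STOCKS_M = {"Tris-HCl pH 8.0": 1.0, "EDTA": 0.5, "NaCl": 5.0}


def ctab_lysis_buffer(final_volume_ml, stocks_M=ASSUMED_STOCKS_M):
    """Return stock volumes (ml) and CTAB mass (g) for the requested final volume."""
    volumes_ml = {
        name: stock_volume_ml(target, stocks_M[name], final_volume_ml)
        for name, target in CTAB_BUFFER_TARGETS_M.items()
    }
    ctab_g = 2.0 * final_volume_ml / 100.0  # 2% interpreted as 2 g per 100 ml
    return {"volumes_ml": volumes_ml, "ctab_g": ctab_g, "bring_to_ml": final_volume_ml}


if __name__ == "__main__":
    recipe = ctab_lysis_buffer(100)
    for name, vol in recipe["volumes_ml"].items():
        print(f"{name}: {vol:.1f} ml of stock")
    print(f"CTAB: {recipe['ctab_g']:.1f} g; bring to {recipe['bring_to_ml']} ml with water")
```

The same `stock_volume_ml` helper applies to the binding and wash buffers in items 13–15, provided the stock concentrations are supplied in matching units.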
2.6. DNA Extraction: Plate-Based Extraction (96 Samples): Commercial Kits
1. Equipment for tissue grinding (e.g., TissueLyser with plate adaptor). 2. Centrifuge with a deep-well swinging bucket rotor capable of achieving 5,600–6,000 × g force. 3. Ethanol: 100%. 4. Incubator capable of heating to 70°C. 5. Pipettes and pipette tips. 6. Latex or nitrile disposable gloves. 7. ELIMINase®, DNA AWAY®, or a similar product.
8. 96-well microplate. 9. Reagent reservoirs (100 ml). 10. Plate-based DNA extraction kit.
2.7. DNA Extraction: Plate-Based Extraction (96 samples): Non-kit-Based Method (Adapted from Ref. 26)
1. 96-well microplate. 2. AcroPrep™ 96 1 ml filter plate with 1.0 μm Glass Fiber media (PALL Life Sciences). 3. Equipment for tissue grinding: FastPrep® or TissueLyser with tube adaptor. 4. Centrifuge with a deep-well swinging bucket rotor capable of achieving 5,600–6,000 × g force. 5. Vortex. 6. Orbital Shaker for microplates. 7. Laboratory tape. 8. Molecular biology grade water. 9. Ethanol: 100%. 10. Incubator capable of heating to 70°C. 11. Pipettes and pipette tips. 12. Latex or nitrile disposable gloves. 13. ELIMINase®, DNA AWAY®, or a similar product. 14. CTAB lysis buffer: 2% cetyltrimethylammonium bromide (CTAB), 100 mM Tris–HCl pH 8.0, 20 mM EDTA, and 1.4 M NaCl. 15. Binding buffer: 5 M guanidine thiocyanate, 20 mM EDTA pH 8.0, 10 mM Tris–HCl pH 6.4, and 4% Triton® X-100. 16. First wash buffer: 50% ethanol, 3 M GuSCN, 10 mM EDTA pH 8.0, 5 mM Tris–HCl pH 6.4, and 2% Triton® X-100. 17. Second wash buffer: 60% ethanol, 50 mM NaCl, 10 mM Tris– HCl pH 7.4, and 0.5 mM EDTA pH 8.0. 18. Square-well block PALL collar (PALL Life Sciences). 19. Square-well block.
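The centrifuge requirement in item 4 is given as relative centrifugal force (5,600–6,000 × g), whereas many instruments are set in revolutions per minute. The standard conversion is RCF = 1.118 × 10^-5 × r × rpm^2, with the rotor radius r in centimeters. The Python sketch below applies that relation. The 15 cm radius in the usage lines is purely an assumed example value; the actual radius must be taken from the rotor documentation.

```python
import math

# RCF = RCF_CONSTANT * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rpm_for_rcf(rcf_g, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("rcf_g and radius_cm must be positive")
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rcf_for_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


if __name__ == "__main__":
    assumed_radius_cm = 15.0  # hypothetical rotor radius, for illustration only
    for target in (5600, 6000):
        rpm = rpm_for_rcf(target, assumed_radius_cm)
        print(f"{target} x g at r = {assumed_radius_cm} cm needs about {rpm:.0f} rpm")
```

Because the required speed scales with the inverse square root of the radius, a swinging bucket rotor with a shorter arm needs a noticeably higher rpm setting to reach the same force.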
2.8. PCR
1. D-(+)-Trehalose dihydrate: 10 and 20% solutions. 2. 10× Polymerase Chain Reaction (PCR) Buffer, without Mg (Invitrogen). 3. Magnesium chloride: 50 mM solution. 4. Molecular biology grade water. 5. Latex or nitrile disposable gloves. 6. Pipettes and pipette tips. 7. Deoxynucleotide solution mix: 10 mM. 8. Oligonucleotide primers (Table 1).
Table 1 General PCR mix for rbcL, ITS, ITS2, and trnH–psbA

Order to add PCR components   Component                   Stock concentration   Final concentration   Volume for 1 reaction (μl)   Volume for 100 reactions(a) (μl)
First                         Molecular-grade water                                                   2                            200
Second                        Trehalose buffer            10%                   5%                    6.25                         625
Third                         10× Buffer                  10×                   1×                    1.25                         125
Fourth                        MgCl2                       50 mM                 2.5 mM                0.625                        62.5
Fifth                         dNTPs                       10 mM                 0.05 mM               0.0625                       6.25
Sixth                         Forward primer              10 μM                 0.1 μM                0.125                        12.5
Seventh                       Reverse primer              10 μM                 0.1 μM                0.125                        12.5
Eighth                        Polymerase                  5 U/μl                0.025 U/μl            0.0625                       6.25
                              Total volume of PCR mix                                                 10.5                         1,050
Last                          DNA (30–50 ng/μl)                                                       2
                              Total volume of reaction                                                12.5

(a) Recommended amount to mix for a 96-well plate

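The volumes in Table 1 can be checked and rescaled with a few lines of arithmetic: each final concentration equals the stock concentration multiplied by the component volume and divided by the 12.5 μl reaction volume, and the plate-scale column is the per-reaction volume multiplied by 100. The Python sketch below transcribes the table and reproduces both calculations. The names `COMPONENTS`, `master_mix`, and `final_concentration` are illustrative only.

```python
# (component, stock concentration, unit, volume per reaction in ul), transcribed from Table 1.
COMPONENTS = [
    ("Molecular-grade water", None, None, 2.0),
    ("Trehalose buffer", 10.0, "%", 6.25),
    ("10x Buffer", 10.0, "x", 1.25),
    ("MgCl2", 50.0, "mM", 0.625),
    ("dNTPs", 10.0, "mM", 0.0625),
    ("Forward primer", 10.0, "uM", 0.125),
    ("Reverse primer", 10.0, "uM", 0.125),
    ("Polymerase", 5.0, "U/ul", 0.0625),
]
TEMPLATE_UL = 2.0   # DNA, added last to each well
REACTION_UL = 12.5  # total reaction volume


def mix_per_reaction_ul():
    """Volume of master mix dispensed per well, before template is added."""
    return sum(volume for _name, _stock, _unit, volume in COMPONENTS)


def master_mix(n_reactions):
    """Component volumes (ul) for a master mix covering n_reactions."""
    return {name: volume * n_reactions for name, _stock, _unit, volume in COMPONENTS}


def final_concentration(stock, volume_ul, reaction_ul=REACTION_UL):
    """Final concentration in the reaction, in the same unit as the stock."""
    return stock * volume_ul / reaction_ul


if __name__ == "__main__":
    plate_mix = master_mix(100)  # 100 reactions leaves a small excess for a 96-well plate
    for name, stock, unit, volume in COMPONENTS:
        final = "" if stock is None else f"{final_concentration(stock, volume):g} {unit}"
        print(f"{name:22s} {plate_mix[name]:8.2f} ul  {final}")
    print(f"Total mix: {sum(plate_mix.values()):.1f} ul")
```

Mixing for 100 rather than 96 reactions, as the table footnote recommends, provides roughly 4% excess to cover pipetting losses.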
9. Platinum Taq DNA Polymerase (Invitrogen). 10. PCR 96-well microplate. 11. Aluminum Sealing Film (Axygen Scientific, VWR). 12. Clear Sealing Film (Axygen Scientific, VWR). 13. Thermocycler. 14. Microcentrifuge. 15. Centrifuge with a swinging bucket rotor for microplates. 16. PCR workstation.
2.9. PCR Product Determination: Precast E-gel Method
1. Precast agarose gel (e.g., 2% E-gel, Invitrogen). 2. E-Base. 3. Reagent reservoir. 4. Molecular biology grade water. 5. Latex or nitrile disposable gloves. 6. Pipette and pipette tips. 7. Gel imaging system.
2.10. PCR Product Determination: Routine Agarose Gels
1. Gel rig and combs. 2. Agarose. 3. Latex or nitrile disposable gloves. 4. Pipette and pipette tips. 5. Gel imaging system. 6. 1× TBE buffer: 90 mM Tris base, 90 mM boric acid, 2 mM EDTA. 7. DNA stain: Ethidium bromide or equivalent (e.g., SYBR® Safe DNA gel stain, Invitrogen). 8. Gel loading solution (e.g., Sigma G7654), if not already in the PCR mixture. 9. Size standard (e.g., 1 kb DNA ladder). 10. Power supply.
2.11. Cycle Sequencing
1. D-(+)-Trehalose dihydrate: 10% solution. 2. 5× Sequencing Buffer: 400 mM Tris–HCl pH 9.0, 10 mM MgCl2. 3. Molecular biology grade water. 4. Latex or nitrile disposable gloves. 5. Pipette and pipette tips. 6. 96-well PCR microplate. 7. Aluminum sealing foil.
8. Clear sealing film. 9. Microcentrifuge. 10. Thermocycler. 11. Centrifuge with a swinging bucket rotor for microplates. 12. PCR workstation. 13. Oligonucleotide primer: 10 μM. 14. BigDye™ Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems).
2.12. Cycle Sequencing Reaction Cleanup and Processing for an ABI 3730xl Capillary Sequencer
1. Sephadex® G50 (Sigma). 2. Acroprep™ 96 Filter plate, 0.45 μm GHP (PALL Corporation Catalog No. 5030). 3. Molecular biology grade water. 4. Latex or nitrile disposable gloves. 5. Pipette and pipette tips. 6. Septum (Applied Biosystems). 7. Black plate base (Applied Biosystems). 8. White plate retainer (Applied Biosystems). 9. Pop-7™ Polymer for 3730xl DNA Analyzers (Applied Biosystems). 10. 3730xl DNA Analyzer Capillary Array, 50 cm (Applied Biosystems). 11. 10× Running buffer for 3730xl DNA Analyzers (Applied Biosystems). 12. MicroAmp 96-well reaction plate (Applied Biosystems).
3. Methods
3.1. Field Collecting
1. Prior to going to the field, dispense the silica gel into scintillation vials (~2/3–3/4 full), plastic bags (~15 ml of silica), or a 1-L container (~15% full) for coin envelopes or tea bags. 2. Harvest the plant: whole plant if small, or a branch with leaves from woody shrubs or trees. 3. Place the voucher in the field press such that identifying features (flowers, fruits, both sides of leaves) can be easily inspected when dried. 4. Identify the voucher with a unique collecting number, either with a jewelry tag attached to the voucher or by writing on the paper the sample is pressed in.
5. Take a small amount of leaf tissue (3–10 cm²; see Notes 1–6), and place it in one of the following: the scintillation vial containing silica gel, the plastic bag containing silica gel, or a coin envelope/tea bag, which is placed in the 1-L container with silica gel. 6. Label the container or coin envelope with the same collection number as the voucher.
3.2. Sample Storage
1. Store the tissue samples in a dry location or retain in silica until ready to subsample for DNA extraction (see Note 7).
3.3. Tissue Subsampling: For DNA Extraction Using Single Tubes
1. Clean the bench working area with ELIMINase®, DNA AWAY®, or a similar product. 2. With clean gloves and forceps, add one clean grinding bead to each tube and recap tubes. 3. Sterilize the forceps by dipping them in alcohol and flaming them. 4. Open a container with the sample, break off a piece of leaf or find a piece of the right size (see Notes 8–13), and insert it into a tube. 5. Label the tube with the collection number. 6. Clean the forceps by dipping them in alcohol and flaming, and then repeat step 4 for the remaining samples. 7. Change gloves often (or any time you feel that they may have become contaminated).
3.4. Tissue Subsampling: For DNA Extraction Using 96-Well Plate Format
1. At a computer, organize the sample names in a spreadsheet in the plate format (8 rows × 12 columns). A good practice is to organize samples such that different genera are in adjacent wells. This facilitates the detection of cross contamination. 2. In the lab, clean the bench working area with ELIMINase®, DNA AWAY®, or a similar product. 3. With clean gloves and forceps, add one clean grinding bead to each tube in the plate, and add the strip caps to the tubes. 4. Organize the physical tissue samples in silica gel (vials, bags, or coin envelopes) on the bench, in columns and rows corresponding with the spreadsheet created in step 1. 5. Work with one strip of eight tubes (each corresponding to a numbered column) at a time. Remove one set of eight tubes to a new holder to physically separate the eight tubes being filled from the others. 6. Remove the lids from the strip of eight tubes and put them somewhere where they will not be contaminated by any flying plant material (e.g., between two kimwipes or on a kimwipe covered with a plastic lid).
7. Sterilize the forceps by dipping them in alcohol and flaming them. 8. Open the container with the sample, break off a piece of leaf or find a piece of the right size (see Notes 8–13), and insert it into the correct tube. Pieces of plant tissue that are linear in shape (e.g., grass leaves and stems, conifer needles) need to be broken into smaller pieces to achieve proper homogenization using the grinding beads. 9. Clean the forceps by dipping them in alcohol and flaming, and wipe the gloves with a kimwipe moistened with ethanol in order to remove any plant tissue. 10. Repeat steps 8 and 9 for the remaining seven samples in the column. 11. Once the column of eight tubes is loaded, discard the gloves and put on new ones. 12. Attach the clean strip cap to the tubes, making sure that the lids are on tightly (they may pop off if not pushed all the way on). 13. Repeat the process from step 5, changing gloves after each set of eight tubes (or any time you feel that they may have become contaminated).
3.5. Tissue Disruption
1. Homogenize the plant material with the grinding bead using a FastPrep®, TissueLyser, or a similar instrument: for the TissueLyser, apply 28 Hz for 30 s, then rotate the adaptors, and repeat once (or a maximum of two more times if necessary to obtain good disruption) (see Note 14). 2. Briefly centrifuge the tubes or the plate of strip tubes after homogenization to limit the amount of material stuck to the cap (see Note 15).
3.6. DNA Extraction: Kit-Based Protocols
1. For kit-based methods, follow the manufacturer’s instructions.
3.7. DNA Extraction: For Non-kit, Single Sample-Based Methods (Adapted from Ref. 26)
1. Carefully remove the screw caps from each tube and discard the caps. Powderized plant tissue will be adhered to the cap and will easily dislodge if the caps are not handled carefully (see Note 15). 2. Dispense 200 μl of CTAB lysis buffer to each tube and recap the tubes with new caps. 3. Gently invert each tube in order to mix the powderized plant material with the lysis buffer, and briefly centrifuge the tubes at 1,000 × g force for 1 min to collect the sample at the bottom. 4. Incubate the samples for 1 h at 65°C with occasional mixing by inversion.
5. Centrifuge the tubes at 1,500 × g force for 1 min. 6. Remove the caps and transfer 50 μl of lysate from each sample to a new 1.5-ml microcentrifuge tube (see Note 16). 7. Add 100 μl of binding buffer to each tube with lysate. 8. Immediately after addition of the binding buffer, carefully and slowly mix three to four times by aspirating and dispensing 100 μl. 9. Transfer 150 μl of each lysate into a spin column, placed in a 1.5-ml microcentrifuge tube, and close the cap on the spin column. 10. Centrifuge at 5,000 × g force for 5 min to bind the DNA to the membrane of the spin column. 11. Add 200 μl of the first wash buffer to each spin column. 12. Centrifuge at 5,000 × g force for 2 min. 13. Remove the spin column from the tube, discard the flow through, and replace the spin column in the tube. 14. Add 500 μl of the second wash buffer to the spin column. 15. Centrifuge at 5,000 × g force for 5 min. 16. Remove the spin column from the tube and discard the tube and contents. 17. Open the cap of the spin column, place the spin column on the lid of a tip box, and incubate at 56°C for 30 min to evaporate residual ethanol. 18. Place the spin column in a new 1.5-ml microcentrifuge tube. 19. Add 50 μl of ddH2O (at 56°C) to the center of the spin column. 20. Incubate at room temperature for 1 min. 21. Centrifuge at 5,000 × g force for 5 min to collect the DNA eluate. 22. Remove the spin column and discard it. 23. Store the DNA at 4°C for short-term storage or at −20°C (preferably at −80°C) for long-term storage.
3.8. DNA Extraction: For Non-kit, Plate-Based Methods (Adapted from Ref. 26)
1. Remove one set of strip tubes to a separate holder for cap removal and addition of CTAB lysis buffer. 2. Carefully remove the strip of caps using each individual cap tab to pull the cap off the tube, and discard the strip caps. Powderized plant tissue will be adhered to the cap and will easily dislodge if the caps are not handled carefully (see Note 15). 3. Dispense 200–350 μl of CTAB lysis buffer to each tube (depending on the amount of sample) and recap the tubes with a new strip cap.
4. Repeat steps 2 and 3 for the remaining 11 sets of strip tubes. 5. Use tape to tightly seal the caps on the tubes (which may otherwise pop off during incubation). 6. Gently invert the rack of tubes once to mix the powderized plant material with the lysis buffer. 7. Briefly centrifuge the tubes at 1,000 × g force for 1 min to collect the sample at the bottom. 8. Incubate the samples for 1 h at 65°C using a shaker (80–100 rpm). Do not invert the rack. 9. Centrifuge the plate at 1,500 × g force for 1 min. 10. Remove the strip caps and transfer 50 μl of lysate from each sample to the corresponding position of a 96-well microplate (see Note 16). 11. Add 100 μl of binding buffer to each well. 12. Immediately after addition of the binding buffer, carefully and slowly mix three to four times by aspirating and dispensing 100 μl. 13. Transfer 150 μl of each lysate into a well in a 1 ml Acroprep™ 96-well glass fiber plate, placed on a 2-ml square-well block (see Note 17). 14. Seal the glass fiber plate with clear PCR film. 15. Centrifuge at 5,000 × g force for 5 min to bind the DNA to the glass fiber membrane. 16. Remove the PCR film and add 200 μl of the first wash buffer to each well of the glass fiber plate. 17. Seal the plate with clear PCR film and centrifuge at 5,000 × g force for 2 min. 18. Remove the PCR film and add 750 μl of the second wash buffer to each well of the glass fiber plate. 19. Seal the plate with clear PCR film and centrifuge at 5,000 × g force for 5 min. 20. Remove the seal, place the glass fiber plate on the lid of a tip box, and incubate at 56°C for 30 min to evaporate residual ethanol. 21. Position a collar on the collection microplate (optional) and place the glass fiber plate on top. 22. Add 50 μl of ddH2O (at 56°C) to each well of the glass fiber plate. 23. Seal the glass fiber plate with clear PCR film. 24. Incubate at room temperature for 1 min. 25. Place the assembled glass fiber plate and microplate on top of a square-well block to prevent cracking of the collection plate and centrifuge at 5,000 × g force for 5 min to collect the DNA eluate.
26. Remove the glass fiber plate and retain it at −20°C as a backup until the extraction is determined to be successful, after which it can be discarded. 27. Cover the DNA plate with aluminum sealing film and store at 4°C for short-term storage or at −20°C (preferably at −80°C) for long-term storage.
3.9. PCR: PCR Mixture
1. Prepare and label a 1.5-ml microcentrifuge tube for the PCR cocktail of 100 reactions (Table 1). This number of reactions is recommended when using a 96-well plate to accommodate pipetting error. 2. Defrost all components of the cocktail at room temperature, except the polymerase which has to be kept at −20°C at all times prior to use. 3. Prepare the PCR cocktail adding the components in order listed in Tables 1–3 (see Notes 18–21). See also Table 4 for the standard primers for amplification of rbcL, matK, ITS, ITS2, and trnH–psbA. 4. Vortex the mix and centrifuge at 1,000 × g force briefly. 5. Dispense 10.5 μl of the PCR cocktail in each well using the same tip [replace tip occasionally (every 16 wells) to reduce pipetting error]. 6. Add 2 μl of the sample DNA (30–50 ng/μl) to each well. Leave one or two wells blank as a negative control. Use a fresh tip for each DNA sample. 7. Seal the plate tightly with aluminum foil (using a roller to seal) or thermo-seal cover (apply heat to seal) (see Note 22). 8. Centrifuge the plate at 1,000 × g force for 1 min (see Note 23). 9. Place the plate into the thermo-cycling block, close it, and apply the appropriate PCR program.
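The volumes in Table 1 follow from the dilution relation C(stock) × V(added) = C(final) × V(reaction). The Python sketch below is not part of the original protocol; it only recomputes the final concentrations and scales the cocktail to 100 reactions as a cross-check before pipetting. The `Component` class and the function names are hypothetical helpers introduced for illustration, and the numbers are transcribed from Table 1.

```python
from dataclasses import dataclass
from typing import Optional

REACTION_UL = 12.5   # total reaction volume from Table 1, including DNA
DNA_UL = 2.0         # template DNA added to each well after the cocktail


@dataclass(frozen=True)
class Component:
    name: str
    stock: Optional[float]  # stock concentration; None for water
    unit: str               # unit shared by stock and final concentration
    volume_ul: float        # volume per reaction in microliters (Table 1)


# Values transcribed from Table 1, in the order of addition.
TABLE_1 = [
    Component("Molecular-grade water", None, "", 2.0),
    Component("Trehalose", 10.0, "%", 6.25),
    Component("10x buffer", 10.0, "x", 1.25),
    Component("MgCl2", 50.0, "mM", 0.625),
    Component("dNTPs", 10.0, "mM", 0.0625),
    Component("Forward primer", 10.0, "uM", 0.125),
    Component("Reverse primer", 10.0, "uM", 0.125),
    Component("Polymerase", 5.0, "U/ul", 0.0625),
]


def final_concentration(c: Component,
                        reaction_ul: float = REACTION_UL) -> Optional[float]:
    """C_final = C_stock * V_added / V_reaction; returns None for water."""
    if c.stock is None:
        return None
    return c.stock * c.volume_ul / reaction_ul


def scale_cocktail(components, n_reactions: int) -> dict:
    """Volume of each component (microliters) needed for n_reactions."""
    return {c.name: c.volume_ul * n_reactions for c in components}


per_well_ul = sum(c.volume_ul for c in TABLE_1)   # cocktail dispensed per well
cocktail_100 = scale_cocktail(TABLE_1, 100)       # 100 reactions per 96-well plate

for c in TABLE_1:
    conc = final_concentration(c)
    shown = "" if conc is None else f"{conc:g} {c.unit}"
    print(f"{c.name:<22}{cocktail_100[c.name]:>8.2f} ul  {shown}")
print(f"Cocktail per well: {per_well_ul} ul; "
      f"total for 100 reactions: {sum(cocktail_100.values())} ul")
```

Preparing 100 reactions for 96 wells leaves four reactions' worth of cocktail as the allowance for pipetting error mentioned in step 1. The same helper could be pointed at the values in Tables 2 and 3 by changing the component list and the reaction volume.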
3.10. PCR Thermal Cycling Programs
1. rbcL, trnH–psbA (see Notes 24 and 25): 94°C for 4 min; 35 cycles of 94°C for 30 s, 55°C for 30 s, 72°C for 1 min; final extension 72°C for 10 min. 2. trnH–psbA for ferns and allies, and bryophytes (see Note 25): 94°C for 4 min; 2 cycles of 94°C for 45 s, 50°C for 45 s, 72°C for 1 min; 35 cycles of 94°C for 45 s, 45°C for 45 s, 72°C for 1 min; final extension 72°C for 10 min. 3. trnH–psbA using Phusion polymerase (see Note 26, Table 3): 98°C for 45 s; 35 cycles of 98°C for 10 s, 64°C for 30 s, 72°C for 40 s; final extension 72°C for 10 min. 4. matK first round (matK-KIM1R/matK-KIM3F) (see Note 27): 94°C for 1 min; 35 cycles of 94°C for 30 s, 52°C for 20 s, 72°C for 50 s; final extension 72°C for 5 min.
Table 2 PCR mix for matK

Order to add     Component                  Stock           Final           Volume for 1    Volume for 100
PCR components                              concentration   concentration   reaction (μl)   reactions(a) (μl)
First            Molecular-grade water      -               -               2.60            260
Second           Trehalose buffer           20%             5%              1.875           187.5
Third            10× Buffer                 10×             1×              0.75            75
Fourth           MgCl2                      50 mM           1.5 mM          0.225           22.5
Fifth            dNTPs                      10 mM           0.2 mM          0.15            15.0
Sixth            Forward primer             10 μM           0.5 μM          0.375           37.5
Seventh          Reverse primer             10 μM           0.5 μM          0.375           37.5
Eighth           Polymerase                 5 U/μl          0.1 U/μl        0.15            15.0
                 Total volume of PCR mix                                    6.5             650
Last             DNA (3–5 ng/μl)            -               -               1               -
                 Total volume of reaction                                   7.5             -

(a) Recommended amount to mix for a 96-well plate
Table 3 PCR mix for use with Phusion polymerase

Order to add     Component                                  Stock           Final           Volume for 1    Volume for 100
PCR components                                              concentration   concentration   reaction (μl)   reactions(a) (μl)
First            DMSO                                       -               3%              0.3             30
Second           Molecular-grade water                      -               -               6.32            632
Third            HF buffer (containing 1.5 mM MgCl2)(b)     5×              1×              2               200
Fourth           dNTPs                                      10 mM           0.056 mM        0.056           5.6
Fifth            Forward primer                             10 μM           0.1 μM          0.1             10
Sixth            Reverse primer                             10 μM           0.1 μM          0.1             10
Seventh          Polymerase                                 2 U/μl          0.025 U/μl      0.125           12.5
                 Total volume of PCR mix                                                    9               900
Last             DNA (30–50 ng/μl)                          -               -               1               -
                 Total volume of reaction                                                   10              -

(a) Recommended amount to mix for a 96-well plate
(b) Note that in limited trials HF buffer does not appear to be compatible with trehalose
Table 4 Primers commonly used for DNA barcoding in plants

Region      Primer name    Sequence (5′–3′)               Direction  References
rbcL        rbcLa-F        ATGTCACCACAAACAGAGACTAAAGC     F          Levin et al. (35), modified from Soltis et al. (34)
            rbcLa-R        GTAAAATCAAGTCCACCRCG           R          Kress and Erickson (24), modified from Fofana et al. (36)
            rbcLajf634R    GAAACGGTCTCTCCAACGCAT          R          Fazekas et al. (5)
matK        matK-KIM1R     ACCCAGTCCATCTGGAAATCTTGGTTC    F          Ki-Joong Kim, personal communication
            matK-KIM3F     CGTACAGTACTTTTGTGTTTACGAG      R          Ki-Joong Kim, personal communication
            matK-390f      CGATCTATTCATTCAATATTTC         F          Cuenoud et al. (37)
            matK-1326r     TCTAGCACACGAAAGTCGAAGT         R          Cuenoud et al. (37)
            NY552F         CTGGATYCAAGATGCTCCTT           F          Damon Little, personal communication
            NY1150R        GGTCTTTGAGAAGAACGGAGA          R          Damon Little, personal communication
            matKpkF4       CCCTATTCTATTCAYCCNGA           F          Fazekas et al. (5)
            matKpkR1       CGTATCGTGCTTTTRTGYTT           R          Fazekas et al. (5)
ITS2        ITS-S2F        ATGCGATACTTGGTGTGAAT           F          Chen et al. (20)
            ITS4           TCCTCCGCTTATTGATATGC           R          White et al. (38)
ITS         AB101          ACGAATTCATGGTCCGGTGAAGTGTTCG   F          Sun et al. (39)
            AB102          TAGAATTCCCCGGTTCGCTCGCCGTTAC   R          Sun et al. (39)
trnH–psbA   psbAF          GTTATGCATGAACGTAATGCTC         R          Sang et al. (29)
            trnH2          CGCGCATGGTGGATTCACAATCC        F          Tate and Simpson (28)
            psbA           CGAAGCTCCATCTACAAATGG          R          Hamilton (30)
            trnH(GUG)      ACTGCCTTGATCCACTTGGC           F          Hamilton (30)
            psbA501f       TTTCTCAGACGGTATGCC             R          Cox et al. (31)

See Notes 24–27 for primer usage and alternatives. As different authors use different conventions as to what constitutes “forward” and what constitutes “reverse” primers, the notation of F and R on primer names can mean different things. This is particularly problematic for matK and trnH–psbA. The “Direction” column indicates primer orientation with reference to the direction of the reading frame of rbcL and matK and following the convention of clockwise orientation for trnH–psbA.
5. matK second-round failure tracking (matK-390f/matK-1326r) (see Note 27): 94°C for 1 min; 35 cycles of 94°C for 30 s, 50°C for 40 s, 72°C for 40 s; final extension 72°C for 5 min. 6. ITS (AB101/AB102) (see Note 28): 94°C for 5 min; 30 cycles of 94°C for 1 min, 55°C for 1 min, 72°C for 1 min 45 s; final extension 72°C for 10 min. 7. ITS2 (ITS-S2F/ITS4): 94°C for 5 min; 35 cycles of 94°C for 30 s, 56°C for 30 s, 72°C for 45 s; final extension 72°C for 10 min. See Note 29 for situations in which PCR results are unsuccessful or poor.
3.11. PCR Product Determination: Electrophoresis with Precast E-gels
1. Open the package with precast agarose gel (see Note 30), remove the plastic comb, and place the gel on the mother E-base. 2. Set the mother E-base at “EG” program and a runtime of 4 min. 3. Load 14 μl of molecular-grade water into each well of the 96-well precast agarose gel. 4. Load 3–4 μl of each PCR product into the corresponding E-gel well. 5. Slide E-gel into electrode connections of mother E-base and start electrophoresis. A green light indicates the beginning of run. A red light and beeping indicate the end of run. Stop the current by pressing pwr/prg button. 6. Remove E-gel from base and capture a digital image with the imaging documentation system.
3.12. PCR Product Determination: Electrophoresis with Routine Agarose Gels
There is a large selection of gel combs and trays on the market designed to accommodate different numbers of samples. Please refer to the manufacturer’s notes for the recommended volume of agarose to be used. 1. Select the appropriate gel tray and combs for the number of samples to be run (leaving an appropriate number of wells free for size standards). Seal the ends of the tray with masking tape or use a gel-forming cassette. 2. Weigh out the agarose and place in a glass conical flask. To check PCR success, a 1% agarose gel is used; 1% agarose gel = 1 g of agarose per 100 ml of 1× TBE buffer. 3. Add the appropriate volume of 1× TBE buffer to the agarose and gently swirl. 4. Heat the solution in a microwave on maximum heat setting for approx. 30 s, remove flask from the microwave, and gently swirl to mix. Continue to heat, mixing occasionally. Carefully
remove the solution from the microwave, gently swirl, and check that all the agarose has dissolved. 5. Place the gel on the bench and leave to cool (or cool under cold running water), until it is comfortable to touch the side of the conical flask. 6. Add the appropriate volume of DNA gel stain (see Note 30). For SYBR Safe, this is 1 μl/10 ml of agarose gel; for ethidium bromide, this should be to a final concentration of 0.5 μg/ml of agarose gel. Gently swirl the solution to mix. 7. Pour the gel into the gel tray and leave to set for approx. 30 min. 8. If gel-loading solution is not already in the PCR mixture, prepare your samples for gel electrophoresis by mixing the gel-loading solution with the PCR product (3 μl gel-loading solution plus 5 μl PCR product). 9. Carefully remove the masking tape or undo the clamp of the gel-forming cassette and gently remove the comb. 10. Place the agarose gel in the electrophoresis tank containing 1× TBE buffer, making sure that the gel is totally immersed in buffer. The buffer should just be covering the surface of the gel. 11. Load the recommended volume of size standard into the assigned lanes (typically, 0.1 μg of standard per millimeter lane width). Then, load the samples into the subsequent wells. 12. Run the gel for 30 min to 1 h at 80 V. 13. Transfer the gel to the imaging documentation system and capture a digital image.
3.13. Cycle Sequencing
1. Dilute the PCR product: (a) For rbcL, ITS, ITS2, trnH–psbA: one part of PCR product/two parts of water. (b) For matK: one part of PCR product/nine parts of water. 2. Cover the plate with plastic seal, and spin at 1,000 × g force for 1 min. 3. Defrost sequencing reagents (Table 5) at room temperature. Keep BigDye™ away from light exposure prior to use. 4. Prepare sequencing mix adding components in the order listed in Table 5 (see Note 31). After adding BigDye™, mix components gently by inverting the tube several times. Do not vortex. Add one primer. Mix gently with tip. Note that separate reactions are carried out using the forward or reverse primers. 5. Dispense 9.0 μl of sequencing mix into each well of 96-well plate.
Table 5 General cycle-sequencing mix

Order to add   Component                                      Stock           Final           Volume for 1    Volume for 104
components                                                    concentration   concentration   reaction (μl)   reactions(a) (μl)
First          Trehalose (Sigma-Aldrich, No. T9531-100 g)     10%             -               5               520
Second         Molecular-grade water                          -               -               0.875           91
Third          Sequencing buffer(b)                           5×              -               1.875           195
Fourth         BigDye™                                        -               -               0.25            26
Fifth          Primer                                         10 μM           -               1               104
               Total volume of sequencing mix                                                 9.0             936
Last           Diluted PCR product                            -               -               2               -
               Total volume of reaction                                                       11              -

(a) Recommended amount to mix for a 96-well plate
(b) Sequencing buffer: for 50 ml: 20 ml of 1 M Tris–HCl pH 9, 500 μl of 1 M MgCl2, 29.5 ml of molecular-grade water
6. Add 2 μl of diluted PCR product to each well (use a fresh tip for each PCR product). 7. Place aluminum foil or heat-seal cover over the top of the 96-well plate. Apply heat for heat-seal cover, and use a roller to close the plate tightly (see Note 22). 8. Spin the plate in a centrifuge at 1,000 × g force for 1 min (see Note 23). 9. Place the plate into the thermocycler block and apply the program (see Note 32): 96°C for 2 min; 30 cycles of 96°C for 30 s, 55°C for 15 s, 60°C for 4 min; hold at 4°C. 10. After the cycle sequencing reaction is complete, keep the plate in a dark box at 4°C to avoid degradation of BigDye™.
3.14. Cycle Sequencing Cleanup and Processing for an ABI 3730xl Capillary Sequencer
1. Measure dry Sephadex G-50 (Sigma-Aldrich, Cat. No. G5080500 g) with the MultiScreen Column Loader (Millipore, Cat. No. MACL09645) into the Acroprep 96 Filter plate with 0.45 μm GHP membrane (PALL, Cat. No. PN5030). This loader adds the specific amount of Sephadex required (see Note 33). 2. Hydrate each well with 300 μl of molecular-grade water using a pipette. 3. Let the Sephadex hydrate overnight at 4°C or for 3–4 h at room temperature before use. 4. Assemble the Sephadex plate onto the collection plate and secure with two rubber bands. 5. Centrifuge at 750 × g force for 3 min to drain the water from wells. Discard water from the collection plate (when centrifuging two plates, make sure that both sets have equal weight which can be achieved by using additional rubber bands). The collection plate can be reused without autoclaving. 6. Add the entire volume of the sequencing reaction to the centre of the Sephadex columns using a pipette. 7. Add 25 μl of 0.1 mM EDTA to each well of the Sephadex plate. 8. Elute clean sequencing reaction by attaching a 96-well plate to the bottom of Sephadex plate and secure with rubber bands. 9. To balance two plates, attach additional rubber bands as needed. 10. Centrifuge at 750 × g force for 3 min. Remove Sephadex plate. 11. Cover the top of the collection plate with a septum. 12. Place 96-well plate into black plate bases and attach white plate retainer.
13. Stack the assembled plate in the ABI 3730xl capillary sequencer and import the plate record using the Plate Manager module of the Data Collection software (Applied Biosystems). 14. Begin the sequencing run with Run Scheduler.
3.15. Sequence Editing
Careful and consistent editing of the raw sequence data is a critical component of generating a high-quality dataset. There are a number of software programs (e.g., Sequencher, CodonCode Aligner, etc.) that allow the import of raw trace files and include a variety of editing features. Since each sequence editing program is different, we cannot include a software-specific detailed editing procedure. We present instead the chain of events involved in going from the output of the sequencer to a useable sequence. 1. Retrieve electropherogram trace files from sequencer. 2. Import trace files into a sequence editing software package. 3. Generate sequence-quality scores for individual trace files. 4. Trim primer sequences from the sequences. 5. Trim sequences from both ends based upon minimum quality threshold (e.g., mean QV > 20 and no more than 2 bp QV < 20 in any 20-bp window). 6. Assemble forward and reverse sequence traces for each individual sample to create a sequence contig. 7. Manually edit individual sequences: pay particular attention to bases with low-quality scores or ambiguous calls (see Notes 34–37). 8. Acquire sequence-quality statistics for individual forward and reverse sequences (e.g., length of read, proportion of bases with QV > 20). 9. Generate consensus sequence. 10. Acquire consensus sequence quality statistics (e.g., length of consensus, percentage of bidirectional coverage, proportion of bases with QV > 20 for unidirectional and bidirectional portions of the consensus). 11. Export consensus sequence for downstream analysis.
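The example threshold in step 5 (mean QV > 20 and no more than 2 bp with QV < 20 in any 20-bp window) and the statistic in step 8 can be illustrated with a short Python sketch. This is one possible reading of the rule, not the algorithm of any particular editing program: it assumes Phred-style quality values supplied as a list of integers, and it keeps the span from the first window that passes to the last window that passes. The function names are hypothetical.

```python
from typing import List, Tuple


def window_passes(quals: List[int], min_qv: int = 20, max_low: int = 2) -> bool:
    """True if the window has mean QV > min_qv and at most max_low bases below min_qv."""
    mean_qv = sum(quals) / len(quals)
    n_low = sum(1 for q in quals if q < min_qv)
    return mean_qv > min_qv and n_low <= max_low


def trim_by_quality(seq: str, quals: List[int], window: int = 20,
                    min_qv: int = 20, max_low: int = 2) -> Tuple[str, List[int]]:
    """Trim both ends: keep the span from the first passing window to the last."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    starts = [i for i in range(len(seq) - window + 1)
              if window_passes(quals[i:i + window], min_qv, max_low)]
    if not starts:
        return "", []          # read too short, or no window meets the threshold
    begin, end = starts[0], starts[-1] + window
    return seq[begin:end], quals[begin:end]


def fraction_above(quals: List[int], min_qv: int = 20) -> float:
    """Proportion of bases with QV > min_qv (a read-level statistic as in step 8)."""
    if not quals:
        return 0.0
    return sum(1 for q in quals if q > min_qv) / len(quals)


# Synthetic 60-bp read: 10 poor bases, 40 good bases, 10 poor bases.
seq = "ACGT" * 15
quals = [5] * 10 + [40] * 40 + [5] * 10
trimmed_seq, trimmed_quals = trim_by_quality(seq, quals)
print(len(trimmed_seq), round(fraction_above(trimmed_quals), 3))
```

Because the rule tolerates two low-quality bases per window, the trimmed synthetic read keeps 44 of its 60 bases, including two poor bases at each end; these are the positions that step 7 asks the editor to inspect manually.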
4. Notes
1. Properly collected plant tissue is essential for maximizing PCR and sequencing success. Key to this process is that material from which DNA is extracted must be dried as quickly as possible to prevent the degradation of the DNA. Field collections
of specimens must be immediately split into two components: (a) the voucher and (b) a portion of the voucher (typically, leaf tissue) which is placed in a container with silica gel or similar drying agent. It is important that the portion taken for DNA is put into silica gel as rapidly as possible after harvesting from the field. This should ideally be done immediately, but if impractical, it should be done no later than at the end of the collecting day. Delays to drying material in silica gel can result in samples with reduced DNA quality and lower PCR success. 2. It is important to keep the freshly collected tissue samples in separate containers. Pooling different samples into a single Ziploc bag, for example, increases the chances of cross contamination. 3. We describe three types of containers that we have used in various settings, each with relative advantages and disadvantages. (a) Scintillation vials provide a separate enclosed environment for each sample. This can be useful in humid conditions, in which coin envelopes may absorb some moisture from the air, slowing the tissue drying process, or for tissue that has a high water content and dries more slowly inside a coin envelope rather than when in direct contact with the silica. (b) Coin envelopes are probably the simplest medium for sampling plant tissue. It is easier to insert a sample into an envelope than into the narrow opening of a scintillation vial. Multiple coin envelopes can be stored in an airtight container with silica gel, requiring less space than scintillation vials. The envelopes also keep the silica gel separate from the tissue, facilitating tissue subsampling. Tea bags can also be used in place of envelopes; they are more porous, facilitating the drying process, but are also slightly more fragile. 
(c) Small (~10 × 15 cm) plastic bags with silica can work well in the field, but are prone to punctures from thorns or prickles, and are somewhat permeable which does exhaust the silica over time. When the samples are dry, the plastic bags need to be handled carefully to prevent excessive breakage of the plant tissue. 4. In the case of specimens that are likely to take a long time to dry (such as samples with waxy leaves), tear the leaf sample into smaller fragments or chop with a sterile blade to increase the surface area available for contact with the silica gel. 5. The best samples for plant DNA extraction typically come from actively growing plant tissues; senescing, damaged, or infected tissues should be avoided. The usual choice of plant tissue is leaves, but shoot tips or flower buds or petals can also be used. For canopy tree species in which reaching leaf or flower material
is logistically challenging, an alternative approach is to use a leather punch to obtain samples of cambium tissue which avoids the need for tree climbing (27). 6. Sampling herbarium material for DNA extraction can be successful, but success is often variable and unpredictable. The quality of the extraction is most likely a function of the age of the specimen, the species in question, and the speed with which samples have been dried, which is often unknowable. The priority should be given to samples not much greater than 10 years old. However, the most critical criterion is that the samples should still be green in color. Brown coloration of the herbarium sample indicates that the tissue quickly oxidized after collection or was infected by mold, indicating that the DNA is most likely degraded and/or contaminated by fungal DNA. 7. Tissue samples from which DNA extractions are made should be prevented from rehydrating from the atmosphere. This can be achieved through a climate-controlled facility or in airtight containers (refresh the silica as necessary). Long-term experiments are still needed to provide empirical data on optimum storage procedures for tissue samples. 8. The sampling of silica-dried material into tubes for DNA extraction and the extraction process are probably the most important steps in the process of generating good-quality DNA barcode data. It is the step that is the easiest for contamination or sample mix-up to occur. Thus, it is very important to follow the steps outlined in Subheading 3.3 to prevent this. A poor-quality extraction will result in inefficient or failed PCR reactions. 9. The appropriate amount of plant material to sample for DNA extraction is 10–15 mg dry tissue. In this case, more is not better; using more than this amount of tissue will result in a poorly ground sample, overwhelm the buffers used in the extraction process, and result in low-yield or poor-quality DNA. 
This amount usually corresponds to ~0.5 cm², but may be smaller depending on the leaf thickness. Plastic materials (such as sampling tubes) often have a static charge that will attract small particles of plant tissue. Fragments of plant material literally jump from one well to another, so care must be exercised when placing bits of leaves into the tubes. 10. Plant tissues that are linear in shape (e.g., grass leaves and stems, conifer needles) need to be broken into smaller pieces to achieve proper pulverization using the grinding beads. 11. When sampling plant tissue from herbarium samples in areas where an alcohol burner is prohibited, it is good practice to wipe the forceps after each sample with a kimwipe moistened with ethanol.
12. It is important to keep freshly collected silica-dried material and older herbarium-sampled tissue in separate extraction plates, as they may require different extraction protocols. 13. A note on bryophytes: Extreme care is required when sampling bryophytes due to the common occurrence of mixed species samples being collected from the field. Tissue subsampling is best done at the same time as determinations are made. 14. A frequency higher than 28 Hz can destroy the tubes. We do not recommend homogenization for longer than a total of 1 min, with the exception of samples with very tough tissue in which case an additional run of 30 s can be applied. 15. After the plant tissue is ground to a fine powder, the tubes require careful handling. Centrifuging does not help significantly in removing powderized plant tissue from the lids or caps, as the static charge is strong enough to keep it adhered to the interior surface of the tube’s walls and caps. Opening the caps should be done with extreme care to avoid cross contamination prior to addition of the lysis buffer. 16. In the non-kit-based protocols provided, the entire volume of the CTAB lysate is not used. Unused lysate can be stored at −20°C as a backup until the extraction is determined to be successful, as indicated by the results of the first PCR reaction. The lysate can also be used as a source for additional extractions if more testing of the DNA is necessary. 17. The square-well blocks that are specified in the protocol have enough volume to collect all the wash buffers without needing to discard between washes. However, if a block with a smaller volume is used, it may be necessary to discard the wash buffer between steps 16 and 17 of Subheading 3.8. 18. Trehalose (which is also a potent PCR enhancer) acts as a cryoprotectant for Taq polymerase when PCR mixes are prepared in large volume batches and frozen for future use. 19. Many available PCR protocols for matK include 4% DMSO.
Experiments based on several hundred reactions have demonstrated that a 5% Trehalose solution can replace DMSO without any significant difference in PCR success or sequence quality. 20. After DNA extraction, it is recommended to begin the first round of PCR for the rbcL DNA barcoding marker using the nearly universal primers rbcLa-F/rbcLa-R; a greater degree of PCR success and quality is obtained in bryophytes with the reverse primer rbcLajf634R. These primers generate a high rate of PCR success with DNA of good quality. Hence, this first PCR for rbcL acts as a test for DNA quality for a broad variety of taxa among angiosperms, gymnosperms, ferns, and mosses.
DNA Barcoding Methods for Land Plants
21. PCR cleanup is both expensive and time consuming, but can be avoided through the use of low concentrations of primers and dNTPs in the PCR mix and the subsequent dilution of the PCR product prior to the cycle sequencing reaction. This protocol provides a high success rate for PCR and sequences for regions that are amplified by universal, highly conserved primers (plastid rbcL, trnH–psbA, and nuclear ribosomal ITS2). In contrast, the matK DNA barcoding region needs distinct conditions for successful PCR amplification. For matK, the concentration of the primers, dNTPs, and Taq polymerase cannot be significantly reduced. Based on experiments optimizing the PCR conditions for matK, we recommend a protocol with diluted DNA (0.3–0.5 ng/μl) and a smaller PCR reaction volume (7.5 μl). These conditions have yielded a higher rate of PCR success and increased sequence quality over the general PCR mix.
22. The volumes of the PCR and cycle sequencing reactions recommended here are very small. Thus, it is very important to follow the instructions in Subheadings 3.9 and 3.14 carefully. The foil or thermal-seal cover should be placed evenly and tightly over the PCR plate without wrinkles or holes to prevent evaporation during PCR cycling.
23. Centrifuging is required to collect the PCR components at the bottom of the well and eliminate any air bubbles that might have been trapped. It also aids in mixing the PCR components with the DNA sample, or cycle sequencing mix with PCR product.
24. Although rbcL is present in the vast majority of land plants, there are some groups, such as holoparasites, that no longer have a functioning copy of this gene. As a result, the primers most commonly used typically do not work in these groups.
25. The primers most widely used for PCR amplification of the plastid trnH–psbA intergenic spacer for DNA barcoding are those recommended by Kress et al. (7) or Kress and Erickson (10) (Table 4). They are, respectively, trnH2 (originally from ref. 28) and psbAF (originally from ref. 29) or trnH(GUG) and psbA (originally from ref. 30). In bryophytes, this region is often short (<200 bp) and an alternative primer, psbA501f (31), located further back in the psbA gene, can be used in combination with the primer trnH2 to obtain additional characters.
26. The trnH–psbA region often contains homopolymer runs that are known to reduce sequence quality after the run. Fazekas et al. (32) have shown that the use of alternative polymerases, such as Phusion (Finnzymes) or Herculase II Fusion (Agilent), can improve quality for runs of up to 13–14 bases.
27. The matK gene region is more difficult to amplify and sequence than rbcL for a number of reasons. First, it is approximately 300 bp longer than rbcL, and thus more sensitive to DNA degradation. Second, the presence of mononucleotide repeats in some groups dramatically affects the quality of the sequence reaction, resulting in a contig that is primarily supported by two unidirectional sequences with only a small amount of overlap. Finally, matK requires different primer combinations for different taxonomic groups. Therefore, it is important to estimate the taxonomic composition of the plate prior to amplification. If the plate contains angiosperms from different genera, families, and even orders, the combination of matK-KIM1R/matK-KIM3F is the optimal first choice. This primer combination was recommended as the first choice by the CBOL Plant Working Group (13), and has been confirmed in thousands of PCRs from floristic projects in biodiversity hot spots. Samples that fail in the first round are collected into a new plate and subjected to a second round of PCR using the primers matK390f/matK1326r (failure tracking). Usually, the combination of these two sets of matK primers yields around 80% successful sequences in floristic projects focusing on angiosperms. However, if the project is represented by one specific group, such as a genus or family, that does not work well with any of these primer combinations, it is best to search for appropriate primers for this group. A selection of order-specific primers has been published by Dunning and Savolainen (33). Two alternate primer pairs for gymnosperms (NY552F/NY1150R and matKpkF4/matKpkR1) are given in Table 4. Routine amplification of this region is difficult in non-seed plants and the development of primers for ferns/fern allies and bryophytes is currently underway.
28. ITS and ITS2 offer higher levels of species discrimination in some groups. However, one risk with ITS is that of fungal contamination. Even the cleanest leaf sample will likely have fungal hyphae associated with it, and in some groups this can be a serious source of contamination. For the entire ITS region, the use of the angiosperm-specific primers AB101 and AB102 can reduce this problem for flowering plants.
29. In situations where PCR is unsuccessful or patchy in its success, some optimization of PCR conditions can improve success. An often-successful first approach is to dilute the DNA tenfold. DNA is often not limiting in PCR, but the extraction process occasionally does not remove all PCR inhibitors sufficiently. A dilution often reduces inhibitors to the point where PCR can succeed. Other steps to improve success are as follows: (a) for faint or absent PCR products: a decrease in primer-annealing temperatures, an increase in primer concentration, an increase
in MgCl₂, or an increase in the number of cycles and (b) when multibanded PCR products are produced: an increase in primer-annealing temperatures, a decrease in primer concentration, or a decrease in MgCl₂. These various approaches of course depend on the initial PCR conditions one starts with; there are minimums and maximums, which, if exceeded, will usually result in failed PCR.
30. Ethidium bromide is highly toxic and a mutagen. Handling agarose gels and/or buffers containing ethidium bromide requires a designated work area isolated from other space in the lab, dedicated pipettes, and water reservoirs. Nitrile gloves should be worn when handling any components, and discarded when the PCR check is completed to avoid contamination by ethidium bromide. There are numerous alternatives to ethidium bromide that are less toxic. SYBR® Safe DNA gel stain, for example, is not classified as hazardous waste under the US federal regulations but has a comparable sensitivity to ethidium bromide.
31. The use of 5.5% trehalose in the sequencing reaction mix allows sequencing mixes to be premade in primer-specific batches, aliquoted directly into plates, covered with PCR film, and frozen for future use. The mix with the primer can be stored at −20°C for up to 3 months. However, one must avoid unnecessary thawing of the mix before use, which can cause degradation of the Dye Terminator. The mix should be thawed only just prior to use.
32. The annealing temperature of the sequencing reaction can also be adjusted to the specific primer conditions. For example, for those primers which require a 50°C annealing temperature during PCR, the annealing temperature in the cycle sequencing program also can be set at 50°C. However, it is not recommended to use an annealing temperature for the sequencing reaction lower than 50°C.
33. For higher throughput, the semiautomated AutoDTR™ method from EdgeBio® (http://www.edgebio.com/catalog/dye-terminator-removalproducts-AutoDTR™-96-c-28_1005.html) has been used at the Canadian Centre for DNA Barcoding. This sequencing cleanup method is less sensitive to PCR product concentration and allows longer high-quality reads and a further reduction of BigDye™ in the sequencing reaction due to increased sensitivity.
34. Check and maintain the proper orientation of the sequences. Sequences that are in the reverse complement orientation disrupt proper alignment, causing subsequent analytical problems if not corrected.
35. Look for odd gaps or insertions in the sequence, especially at the ends of the read. The trace file you view is a result of an
algorithm that interprets the fluorescence pattern, and compression artifacts often result at the ends of a read. These are particularly problematic with multiple repeats, where the algorithm may have difficulty distinguishing whether there are, for example, four or five adenine nucleotides in a row.
36. Check for an open reading frame (ORF) in coding regions, such as rbcL and matK. This is an important first step in identifying stop codons and errors in editing. In the correct reading frame, there should be no internal stop codons present in coding regions. There are two potential reasons for observing stop codons in the sequence. First, it may be a real stop codon indicating that the region that has been sequenced is no longer functional, and likely a pseudogene. These cases are quite rare in both rbcL and matK genes, and usually observed in only a few specific groups of plants. Second, it can be evidence of errors due to editing, e.g., miscalls, or a frameshift of the ORF due to missed or extra base calls. Each instance needs to be carefully investigated and addressed.
37. Check for chimeric sequences (a result of nonspecific PCR amplification from multiple templates) and for bacterial, viral, fungal, and algal contamination, and remove them from the consensus sequence.
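The orientation and reading-frame checks in Notes 34 and 36 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the published protocol: the function names are hypothetical, the input is assumed to be an edited consensus containing only A, C, G, and T, and the stop codons are taken to be TAA, TAG, and TGA. Because a barcode amplicon is an internal fragment of the gene, any stop codon found in a frame is treated as internal.

```python
# Illustrative sketch only: the function names are hypothetical and the
# input is assumed to contain only the unambiguous bases A, C, G and T.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def internal_stops(seq, frame):
    """Return 0-based start positions of stop codons in frame 0, 1 or 2."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def open_frames(seq):
    """List (strand, frame) pairs that contain no stop codon at all."""
    result = []
    for strand, oriented in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            if not internal_stops(oriented, frame):
                result.append((strand, frame))
    return result
```

If a sequence has open frames only on the "-" strand, it is probably in the reverse complement orientation (Note 34). If no frame is open on either strand, the stop codon positions reported by `internal_stops` indicate where to re-inspect the trace files for miscalls or frameshifts before concluding that the region is a pseudogene.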
Acknowledgments
We are grateful to Michelle Hollingsworth, Alan Forrest, David Erickson, and John Kress for helpful comments on this manuscript.
References
1. Govaerts R (2001) How many species of seed plants are there? Taxon 50:1085–1090
2. Thorne RF (2002) How many species of seed plants are there? Taxon 51:511–522
3. Scotland RW, Wortley AH (2003) How many species of seed plants are there? Taxon 52:101–104
4. Paton AJ, Brummitt N, Govaerts R, Kehan H, Hinchcliffe S, Allkin B, Lughadha EN (2008) Towards target 1 of the global strategy for plant conservation: a working list of all known plant species: progress and prospects. Taxon 57:602–611
5. Fazekas AJ, Burgess KS, Kesanakurti PR et al (2008) Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS One 3:e2802
6. Chase MW, Salamin N, Wilkinson M et al (2005) Land plants and DNA barcodes: short-term and long-term goals. Phil Trans Lond B 360:1889–1895
7. Kress WJ, Wurdack KJ, Zimmer EA et al (2005) Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci USA 102:8369–8374
8. Newmaster SG, Fazekas AJ, Ragupathy S (2006) DNA barcoding in the land plants: an evaluation of rbcL in a multigene tiered approach. Can J Bot 84:335–341
9. Cowan RS, Chase MW, Kress WJ, Savolainen V (2006) 300,000 species to identify: problems, progress, and prospects in DNA barcoding of land plants. Taxon 55:611–616
10. Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH–psbA spacer region. PLoS One 2:e508
11. Chase MW, Cowan RS, Hollingsworth PM, van den Berg C, Madriñan S, Petersen G, Seberg O, Jørgensen T, Cameron KM, Carine M, Pedersen N, Hedderson TAJ, Conrad F, Salazar GA, Richardson JE, Hollingsworth ML, Barraclough TE, Kelly L, Wilkinson M (2007) A proposal for a standardised protocol to barcode all land plants. Taxon 56:295–299
12. Newmaster SG, Fazekas AJ, Steeves RAD, Janovec J (2008) Testing candidate plant barcode regions with species of recent origin in the Myristicaceae. Mol Ecol Notes 8:480–490
13. CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106:12794–12797
14. Álvarez I, Wendel JF (2003) Ribosomal ITS sequences and plant phylogenetic inference. Mol Phylogenet Evol 29:417–434
15. Bailey CD, Carr TG, Harris SA, Hughes CE (2003) Characterization of angiosperm nrDNA polymorphism, paralogy, and pseudogenes. Mol Phylogenet Evol 29:435–455
16. Moeller M (2000) How universal are universal rDNA primers? A cautionary note for plant systematists and phylogeneticists. Edinburgh J Bot 57:151–156
17. Gonzalez MA, Baraloto C, Engel J et al (2009) Identification of Amazonian trees with DNA barcodes. PLoS One 4:e7483
18. Razafimandimbison SG, Kellogg EA, Bremer B (2004) Recent origin and phylogenetic utility of divergent ITS putative pseudogenes: a case study from Naucleeae (Rubiaceae). Syst Biol 53:177–192
19. Okuyama Y, Kato M (2009) Unveiling cryptic species diversity of flowering plants: successful biological species identification of Asian Mitella using nuclear ribosomal DNA sequences. BMC Evol Biol 9:105
20. Chen S, Yao H, Han J et al (2010) Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS One 5:e8613
21. Taberlet P, Coissac E, Pompanon F et al (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Res 35:e14
22. Fazekas AJ, Kesanakurti PR, Burgess KS et al (2009) Are plant species inherently harder to discriminate than animal species using DNA barcoding markers? Mol Ecol Res 9:130–139
23. Hollingsworth PM, Graham SW, Little DP (2011) Choosing and using a plant DNA barcode. PLoS One 6(5):e19254. doi:10.1371/journal.pone.0019254
24. Kress WJ, Erickson DL, Jones FA et al (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci USA 106:18621–18626
25. Burgess KS, Fazekas AJ, Kesanakurti PR et al (2011) Discriminating plant species in a local temperate flora using the rbcL + matK DNA barcode. Methods Ecol Evol 2:333–340
26. Ivanova NV, Fazekas AJ, Hebert PDN (2008) Semi-automated, membrane-based protocol for DNA isolation from plants. Plant Mol Biol Rep 26:186–198
27. Colpaert N, Cavers S, Bandou E et al (2005) Sampling tissue for DNA analysis of trees: trunk cambium as an alternative to canopy leaves. Silvae Genet 54:265–269
28. Tate JA, Simpson BB (2003) Paraphyly of Tarasa (Malvaceae) and diverse origins of the polyploid species. Syst Bot 28:723–737
29. Sang T, Crawford DJ, Stuessy TF (1997) Chloroplast DNA phylogeny, reticulate evolution and biogeography of Paeonia (Paeoniaceae). Am J Bot 84:1120–1136
30. Hamilton MB (1999) Four primer pairs for the amplification of chloroplast intergenic regions with intraspecific variation. Mol Ecol 8:513–525
31. Cox CJ, Goffinet B, Shaw AJ, Boles SB (2004) Phylogenetic relationships among mosses based on heterogeneous Bayesian analysis of multiple genes from multiple genomic compartments. Syst Bot 29:234–250
32. Fazekas AJ, Steeves R, Newmaster SG (2010) Improving sequencing quality from PCR products containing long mononucleotide repeats. Biotechniques 48:277–285
33. Dunning LT, Savolainen V (2010) Broad-scale amplification of matK for DNA barcoding plants, a technical note. Bot J Linn Soc 164:1–9
34. Soltis PS, Soltis DE, Smiley CJ (1992) An rbcL sequence from a Miocene Taxodium (bald cypress). Proc Natl Acad Sci USA 89:449–451
35. Levin RA, Wagner WL, Hoch PC, Nepokroeff M, Pires JC, Zimmer EA, Sytsma KJ (2003) Family-level relationships of Onagraceae based on chloroplast rbcL and ndhF data. Am J Bot 90:107–115
36. Fofana B, Harvengt L, Jardin P, Baudoin JP (1997) New primers for the polymerase chain amplification of cpDNA intergenic spacers in Phaseolus phylogeny. Belg J Bot 129:118–122
37. Cuenoud P, Savolainen V, Chatrou LW, Powell M, Grayer RJ, Chase MW (2002) Molecular phylogenetics of Caryophyllales based on nuclear 18S rDNA and plastid rbcL, atpB, and matK DNA sequences. Am J Bot 89:132–144
38. White TJ, Bruns T, Lee S, Taylor J (1990) Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In: Innis MA, Gelfand DH, Sninsky JJ, White TJ (eds) PCR protocols: a guide to methods and applications. Academic Press, New York, pp 315–322
39. Sun Y, Skinner DZ, Liang GH, Hulbert SH (1994) Phylogenetic analysis of Sorghum and related taxa using internal transcribed spacers of nuclear ribosomal DNA. Theor Appl Gen 89:26–32
Part III Generating DNA Barcode Data
Chapter 12
Field Information Management Systems for DNA Barcoding
John Deck, Joyce Gross, Steven Stones-Havas, Neil Davies, Rebecca Shapley, and Christopher Meyer
Abstract
Information capture pertaining to the “what?”, “where?”, and “when?” of biodiversity data is critical to maintain data integrity, interoperability, and utility. Moreover, DNA barcoding and other biodiversity studies must adhere to agreed upon data standards in order to effectively contextualize the biota encountered. A field information management system (FIMS) is presented that locks down metadata associated with collecting events, specimens, and tissues. Emphasis is placed on ease of use and flexibility of operation. Standardized templates for data entry are validated through a flexible, project-oriented validation process that assures adherence to data standards and thus data quality. Furthermore, we provide export functionality to existing cloud-based solutions, including Google Fusion Tables and Flickr to allow sharing of these data elements across research collaboration teams and other potential data harvesters via API services.
Key words: Field data, Metadata, FIMS, Templates, bioValidator, Google Fusion Tables
1. Introduction
All DNA barcoding projects begin with a tissue sample from which all subsequent genetic processing takes place. For a GenBank submission of DNA barcode records to receive the keyword “Barcode,” certain metadata elements associated with the specimen from which the tissue was derived are required. The three essential elements are: (1) a voucher specimen with a unique identifier, (2) the name of the species, and (3) the country of origin (see Note 1). Additional metadata are strongly recommended; these include (4) latitude and longitude, (5) date of collection, (6) name of collector, and (7) name of identifier (see Note 2). Many researchers in biodiversity science recognize that this is but a fraction of the data elements required to truly contextualize biodiversity in space and time. Other data elements associated with both the collecting
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_12, © Springer Science+Business Media, LLC 2012
event and the specimen itself are important to capture for long-term comparative analyses. Most of these metadata types are included in the Darwin Core Standard (see Note 3). Specimens and ultimately tissues for DNA barcoding or any other genetic investigations are sourced from one of two places: new collections in the field (e.g., expeditions) or from existing collections (e.g., museums). There are advantages and disadvantages for gathering genetic material from each source. Table 1 reviews these pros and cons and should be consulted as a general guide for formulating an effective strategy for DNA barcoding. Regardless of where the source material is derived, some data capture system is required to digitally lock down the associated metadata and specimen photos and to track the information through the DNA barcoding pipeline. In this chapter, we present a field information management system (FIMS) as a vital part of the barcoding information flow. This is where the location, time, habitat, collectors, and initial taxonomic identification are recorded.
Table 1 Evaluating the source of specimens
Criterion | Existing collections | Field collections
DNA quality | (−) Can be low | (+) Likely high
Permitting | (+) Presumably completed | (−) Significant effort may be required
Photo documentation | (−) Poor (color gone, desiccated, etc.) | (+) Better (in situ, colors, etc.)
Habitat/taxon coverage | (−) Opportunistic, rely on what is there | (+) Sample strategically, targeted
Diversity | (+) Rare species available | (−) Rare species hard
New curation costs | (+) Already committed | (−) Significant (~80%)
Field metadata quality | (−) Possibly poor (Country or County, etc.) | (+) Can be exact (microhabitat, GPS, etc.)
Taxonomic identification | (+) Likely high precision | (−) Burden to taxonomic community
Access and logistics | (+) Group aggregated in one place | (−) Multiple trips
Collection cost of specimen | (+) No collection cost | (−) Expeditions costs
Focus of study | (+) Systematics focus | (+) Geographic focus: monitoring, ecology
Taxonomic focus | (−) Traditional preservation precludes molecular work (e.g., formalin) | (+) Fresh material available
Temporal perspective | (+) Change over time | (−) Single time slice
This information is necessary to verify each barcode sequence that is run and is essential for any further reporting or information gathering. What we present herein is based on our experience with the Moorea Biocode Project, an effort to genetically characterize all macrobiota on a tropical island in French Polynesia. We have generalized our approach in order to be useful for any other researchers intending to initiate DNA barcoding projects or field expeditions.
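As a rough illustration of the seven metadata elements listed above, the following Python sketch reports which essential and recommended elements are missing from a single specimen record. The field names are hypothetical placeholders chosen for this example; they are not GenBank, BOLD, or Darwin Core terms, and the sketch is not part of any submission tool.

```python
# Hypothetical field names for illustration only; they do not reproduce
# the column names of any real submission template.
ESSENTIAL = ("voucher_id", "species_name", "country")
RECOMMENDED = ("lat_lon", "collection_date", "collector", "identifier")


def is_blank(value):
    """True when a value is absent or contains only whitespace."""
    return value is None or str(value).strip() == ""


def check_record(record):
    """Return (missing_essential, missing_recommended) for one record."""
    missing_essential = [f for f in ESSENTIAL if is_blank(record.get(f))]
    missing_recommended = [f for f in RECOMMENDED if is_blank(record.get(f))]
    return missing_essential, missing_recommended


# A made-up record with all essential elements but no recommended ones.
example = {
    "voucher_id": "EXAMPLE-0001",
    "species_name": "Genus species",
    "country": "French Polynesia",
    "collector": "",
}
missing_essential, missing_recommended = check_record(example)
```

A record with an empty first list carries the three essential elements; the second list shows which of the recommended elements would still need to be recovered from field notes.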
2. Field Information Management Workflow
2.1. Data Entry via Spreadsheets
The design of the FIMS is based on a compromise between the needs of information managers (database engineers) and the front-end users (collectors). The protocol presented here does not assume an internet connection in the field. As such it can be deployed and used in remote locations. After trying a variety of data ingestion methods, including Web forms, we ultimately settled on using Microsoft Excel spreadsheets as most researchers are familiar with the interface. Ease of use is the primary concern for getting buy-in and cooperation at the beginning of the data cascade in order to ensure metadata capture. These spreadsheets are then validated using a distributed bioValidator client software tool in order to minimize data post-processing, reduce the burden on information managers, and provide direct feedback when there are problems to the input users, who are more familiar with the original metadata. Finally, when an internet connection is available, data may be uploaded to Fusion Tables for exploration, mapping, sharing, recombining, and even exchange into other tools like LIMS.
Your project will require adopting a project spreadsheet template or creating a new one. The Moorea Biocode project created a spreadsheet template for all researchers to use in the field, and which the validation tool “bioValidator” is able to interpret. Likewise, Barcode of Life Data Systems (BOLD) has created a different template with a subset of the Biocode fields, based on the requirements of the barcoding community for collecting specimen-based data. BOLD spreadsheet templates can be similarly validated using bioValidator in order to minimize user input error. Whatever your goals are for your project, it is important to spend time at the beginning to think about all of the information that you require. There is a balance between creating so many fields that the user becomes overwhelmed and being explicit enough to capture the pertinent information.
We maintain a simple concept behind the organization of the data in the Biocode spreadsheet templates that has been effective for project management and workflow. The template includes two sheets. In the first sheet, each line represents a collecting event and includes
all metadata pertaining to where and when a particular specimen was collected (e.g., time, latitude and longitude, elevation, collectors, collection method). Because a single event can have multiple specimens (one to many relationship), we then have a second sheet in which each line represents a specimen from a particular collecting event and its relevant metadata (e.g., taxonomy, microhabitat, fixation, life stage). Each specimen must have a corresponding collecting event (one of the validation steps checked by bioValidator). Importantly, a third data object critical to the barcoding pipeline is the tissue sample. All DNA barcoding requires the collection of some genetic material for processing. Because many tissue samples can come from the same specimen, and because some genetic analyses may specifically target certain tissue subtypes (e.g., transcriptomics), we enforce unique tissue identifiers in the validation process. Tissues are entered into the specimen spreadsheet as a matter of convenience, but upon validation and uploading to Fusion Tables each line is based on a unique tissue identifier. In this way, the Tissue_ID is the pivot point linking the field and specimen metadata to the molecular workflows, linking the FIMS to the Laboratory Information Management System (LIMS) (see Chapter 13). The existing Biocode template can identify up to two unique tissue objects: a tissue sample in a plate/well format meant for destructive sampling and an archival tissue that has to have a unique identifier entered into the “tissue_barcode” field.
2.1.1. Download Spreadsheet Templates
Follow the links here to download either the Biocode spreadsheet template (http://biocode.berkeley.edu/excel/BiocodeTemplate.xls) or the BOLD spreadsheet template (http://www.boldsystems.org/docs/SpecimenData.xls). New spreadsheet templates and mechanisms for validating that data can be accommodated by the authors on a case by case basis. Please contact JD for more information. Open the files using a spreadsheet utility (see Note 4).
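The two-sheet organization described in Subheading 2.1 (collecting events, specimens, and a unique tissue identifier as the pivot point) can be sketched as a simple join. The sketch below is an assumption-laden illustration in plain Python: the column names `event_id`, `specimen_id`, and `tissue_id` are hypothetical and do not reproduce the actual Biocode or BOLD template columns, and the function is not how bioValidator itself is implemented.

```python
# Hypothetical column names; the real templates define their own.
def link_tissues(events, specimens):
    """Join each specimen row to its collecting event, keyed by tissue ID.

    Raises ValueError when a specimen refers to an unknown collecting
    event or when a tissue identifier is used more than once.
    """
    by_event = {event["event_id"]: event for event in events}
    tissues = {}
    for row in specimens:
        event = by_event.get(row["event_id"])
        if event is None:
            raise ValueError(
                f"specimen {row['specimen_id']} has no collecting event")
        if row["tissue_id"] in tissues:
            raise ValueError(f"duplicate tissue ID {row['tissue_id']}")
        # One record per tissue: event metadata plus specimen metadata.
        tissues[row["tissue_id"]] = {**event, **row}
    return tissues


# Made-up rows: one collecting event with two specimens.
events = [{"event_id": "E1", "date": "2010-05-01", "collector": "A. Collector"}]
specimens = [
    {"specimen_id": "S1", "event_id": "E1", "tissue_id": "T1"},
    {"specimen_id": "S2", "event_id": "E1", "tissue_id": "T2"},
]
linked = link_tissues(events, specimens)
```

Keying the joined records by tissue identifier mirrors the role the text gives to the Tissue_ID: every downstream laboratory record can be traced back to exactly one specimen and one collecting event.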
2.1.2. Managing Spreadsheets
Spreadsheets are a familiar tool for researchers that allow quick and easy manipulation of data. However, certain practices need to be followed to allow for success. It is up to the user, for instance, to maintain backup copies of their spreadsheets, and to ensure that data are handled correctly while using the spreadsheet. A classic mistake that may be difficult to recover from is sorting only one column in an entire sheet of data. This is a type of error that even the most sophisticated validation tool may not be able to spot. Typically, it is better to keep spreadsheets to less than 1,000 rows. This will make the validation process go more smoothly, make it easier to spot and fix errors, and help with validation, photo-matching, loading, and processing. This requires some planning with how to store and manipulate your spreadsheets. Some teams organize their spreadsheets by tissue plates, while others may organize by taxonomic group, collecting event, collecting trip, or a particular collector.
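The two practices above (never sorting a single column in isolation, and keeping each spreadsheet under 1,000 rows) can be illustrated with a short sketch that treats every row as one indivisible record. This is only an illustration in plain Python with hypothetical column names; it does not read or write Excel files and is not part of bioValidator.

```python
# Each row is a dict, so sorting or splitting moves whole rows and can
# never separate a value from the rest of its record.
def sort_rows(rows, column):
    """Return a new list of rows ordered by one column."""
    return sorted(rows, key=lambda row: row[column])


def split_rows(rows, max_rows=999):
    """Split rows into consecutive chunks of at most max_rows each."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


# Made-up data: 2,500 specimen rows spread over three plates.
rows = [{"specimen_id": f"S{n}", "plate": f"P{n % 3}"} for n in range(2500)]
chunks = split_rows(sort_rows(rows, "plate"))
```

Sorting by plate before splitting keeps most rows of the same plate in the same file, which matches the way some teams organize their spreadsheets by tissue plate.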
Not all fields are required before validating and uploading your spreadsheets. Certain core fields must be filled, but other information can be updated at a later stage (see Note 5).
2.2. Validation Using bioValidator
Once sufficient event, specimen, and tissue metadata have been entered into the spreadsheets, these data can be validated based on user-defined parameters. bioValidator is a tool that is designed to validate event, specimen, and tissue data while offline. Operating offline allows researchers to check their spreadsheet data while it is being entered in remote locations and to promote the use of unique identifiers and linking to photos and other metadata before it is uploaded to an online server. bioValidator thus provides the validation funnel that organizes field data and provides the linkage to other data that are generated in the molecular barcoding process (see Note 6). This section demonstrates how to use bioValidator effectively.
2.2.1. Download bioValidator
bioValidator can be downloaded at the following link (https://sourceforge.net/projects/biovalidator/). Information about bioValidator can be found at: https://sourceforge.net/projects/biovalidator/. bioValidator installs and runs in a single file. It is recommended, however, that you create its own directory when downloading it. If you have Java 1.5 or later installed on your computer, everything will run fine. Otherwise, you will need to download and install Java 1.5 (http://www.java.com/en/download/manual.jsp).
2.2.2. Building a Configuration File to Process Validation Rules
Before bioValidator can be run, a project must agree on a set of rules that the data must abide by. Examples of rules are: enforcing unique names for a particular column, requiring data for particular columns, or restricting data elements in a column. This section discusses the validation rules that are available and is not meant to be a guide for actually building a rules file (see Note 7). Recognizing that all projects are different, the rules file is configurable and can be used to support projects in any domain and different types of spreadsheet templates. Two error levels can be set for each rule: error or warning. If the error level is set and the rule fails for a row, that row cannot be loaded to an online database and photo-matching cannot be started. The rules file is written in XML and can call any of the following predefined methods (Table 2). Currently, the bioValidator configuration file must be written by hand as there is not a mechanism to generate it automatically. An automated validation file generator is planned soon. Meanwhile, the developers can be contacted for help with setting up new files (see http://biovalidator.sourceforge.net/).
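To make the idea of configurable rules with two severity levels concrete, the following Python sketch implements three checks in the spirit of the rule types named in Table 2 (unique values, required columns, and a bounding box). It is an illustration only: it does not reproduce bioValidator's Java code or its XML rules format, the function and column names are hypothetical, and the bounding-box check assumes decimal degrees and a box that does not cross the 180° meridian.

```python
# Illustrative sketch; not bioValidator's implementation or rule syntax.
def unique_value(rows, column):
    """Report rows whose value in `column` has already been seen."""
    seen, messages = set(), []
    for n, row in enumerate(rows, start=1):
        value = row.get(column)
        if value in seen:
            messages.append((n, f"duplicate {column}: {value}"))
        seen.add(value)
    return messages


def required_columns(rows, columns):
    """Report rows with an absent or empty value in any required column."""
    messages = []
    for n, row in enumerate(rows, start=1):
        for column in columns:
            value = row.get(column)
            if value is None or str(value).strip() == "":
                messages.append((n, f"missing {column}"))
    return messages


def bounding_box(rows, south, north, west, east):
    """Report rows whose coordinates fall outside the box (decimal degrees)."""
    messages = []
    for n, row in enumerate(rows, start=1):
        try:
            lat, lon = float(row["latitude"]), float(row["longitude"])
        except (KeyError, TypeError, ValueError):
            messages.append((n, "latitude/longitude missing or not numeric"))
            continue
        if not (south <= lat <= north and west <= lon <= east):
            messages.append((n, f"point ({lat}, {lon}) outside bounding box"))
    return messages


def validate(rows, rules):
    """Run (level, rule, arguments) triples; level is 'error' or 'warning'."""
    report = {"error": [], "warning": []}
    for level, rule, arguments in rules:
        report[level].extend(rule(rows, **arguments))
    return report


def status(report):
    """Summarize a report as 'error', 'warning' or 'valid'."""
    if report["error"]:
        return "error"
    return "warning" if report["warning"] else "valid"
```

In this sketch a project's agreed rules are simply a list such as `[("error", unique_value, {"column": "specimen_id"}), ("warning", bounding_box, {...})]`, so the same functions can serve different templates by changing only the list, which is the design idea behind keeping the rules in a separate configuration file.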
2.2.3. Validating Spreadsheets
Once you have created or downloaded your spreadsheet template and filled in some data, you can run the validation. By default, the Biocode validation schema will be loaded but you can change this
Table 2 Rule types implemented in bioValidator
Rule type | Purpose
uniqueValue | Enforce unique values in a particular column
checkLowestTaxonLevel | Ensure that the lowestTaxonLevel fields match known values
checkTissueColumns | Given plateName and wellNumber attributes, check for consistency in naming, parseable names, and nonduplication
checkInXMLFields | Check value against a list of acceptable values
RequiredColumns | These columns must exist in the spreadsheet and also not have empty data
DwCLatLngChecker | Check for acceptable latitude, longitude, error radius, and datum values
BoundingBox | Check that the latitude and longitude are within a specified bounding box
to point to any of the other available validation schemas by selecting the “Load Validation” button. After selecting a new schema, bioValidator will continue to use that schema every time you start the application. The next step is to load your spreadsheet. Click the “Load Spreadsheet” button and select your spreadsheet from your file system. When the spreadsheet is loaded you can click the “Run” button to run the validation (Fig. 1). You will be able to see the progress of the validator as it processes each rule. When it is done running, you will see either a green “valid” message, orange “warning” message, or red “error” message depending on the errors in your spreadsheet. By clicking on the Specimen or Collecting Event Results buttons you can see the context of the messages and fix the issues in your spreadsheet. When the issues have been addressed in your spreadsheet, click the “Run” button again and repeat the process until everything is “valid” or you can live with the “warning” messages. At this point, you are ready to continue entering data, load your data to Google Fusion Tables, or match photos.
2.3. Data Exchange Using Fusion Tables
Here, we take advantage of Google Fusion Tables. Fusion Tables is a public tool created by Google developers that allows researchers to visualize and publish their data online. Data from multiple researchers can be combined into one table. The Google Fusion Tables API enables programmatic access to the Fusion Tables content. We envision using Fusion Tables much like a project-oriented TAPIR/DIGIR service provider: collaborating researchers can share their data in a standardized format, maintain updated information, and allow remote access via the API to other tools, such as Laboratory Information Systems, that can link workflows through unique identifiers to annotate subsequent processes.

Fig. 1. Main Page of bioValidator program.

2.3.1. Uploading Process
After you have validated your spreadsheet, you can upload it to a Google Fusion Table. Press the “Upload to Fusion Table” button and enter your Google Account information (a Google Account can be created for free online with any email address). The loading process may take several minutes, or longer if you are on a slow connection. When it has finished, you should see a message like the following:

    Uploading to Fusion Tables … be patient, this can take awhile …
    Create table Specimens_fishlarvae
    Inserting 188 rows into Specimens_fishlarvae (be patient)
    Create table Collecting_Events_fishlarvae
    Inserting 11 rows into Collecting_Events_fishlarvae (be patient)
    Specimens fusion table data is named Specimens_fishlarvae and is visible at: http://www.google.com/fusiontables/DataSource?dsrcid=723314.
Note that each time you load a spreadsheet, you can choose either to replace an existing table, selected by name, or to create a new one. You can also enter the ID of the table to replace, if you know it.

2.3.2. Fusion Tables Functionality
Google Fusion Tables has many functions aside from storing data online. While we discuss a few features in this section that are of particular interest in working with barcode data, it is best to also look at the Fusion Tables Web site for a comprehensive list of current features and capabilities (http://www.google.com/fusiontables).

When collecting event data are loaded, the latitude and longitude fields can be used to automatically generate a map showing the point locations of the data. A map offers another visual validation tool and helps tell the story of collecting events to others. It is also possible to select different styles for the points on the map, or to export to Google Earth.

It is possible to construct views of tables or to make a union of multiple tables into one larger table. This is useful for projects that want a comprehensive view of all data that have been collected or, alternatively, want to display only a subset of the data that have been loaded. For example, suppose you load a spreadsheet with 100 different columns and you wish to view or share only three or four columns’ worth of data. Using Views, you can make a special table that shows just those columns.

Finally, with Fusion Tables, we have a user-authenticated online storage system that we can easily use to share data between systems. This allows anyone in the world to adopt FIMS and LIMS systems and work with data seamlessly between the two systems. See Chapter 13 for information on how to connect your field data to the LIMS system.
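The view and union operations described above can be illustrated offline. The following is a minimal Python sketch and does not use the Fusion Tables API: tables are modeled as lists of dictionaries, and the function names, column names, and sample rows are hypothetical.

```python
def make_view(rows, columns):
    """Return a new table that keeps only the requested columns."""
    return [{col: row.get(col) for col in columns} for row in rows]


def union_tables(*tables):
    """Concatenate several tables into one larger table, row by row."""
    merged = []
    for table in tables:
        merged.extend(dict(row) for row in table)
    return merged


# Hypothetical specimen tables contributed by two collaborators.
team_a = [{"Specimen_Num_Collector": "CM_1234", "Phylum": "Chordata",
           "Country": "French Polynesia", "Notes": "reef"}]
team_b = [{"Specimen_Num_Collector": "CM_1300", "Phylum": "Mollusca",
           "Country": "French Polynesia", "Notes": "lagoon"}]

combined = union_tables(team_a, team_b)            # one larger table
shared = make_view(combined, ["Specimen_Num_Collector", "Phylum"])  # two-column view
```

Here `shared` exposes only the two chosen columns, while `combined` still holds every column from both source tables.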
2.3.3. Updating Data
While Fusion Tables is useful for loading, extracting, combining, and visualizing data, it does not have a robust user interface for editing data. Thus, we recommend that you make changes in your original spreadsheet and then use bioValidator to reload the data into Fusion Tables. When loading data into Fusion Tables, you can choose the online table to replace either by selecting it from a drop-down list of all tables associated with your user account or by entering its table ID. In either case, the data for the items in the particular spreadsheet will be replaced in the Fusion Table (see Note 8). These data will be synchronized with the LIMS system as discussed in Chapter 13.
2.4. Managing Photos
Management of specimen photos in either the field or a museum requires some ability to attach this digital record to the specimen record. Complex workflows (multiple camera types, in situ photography, underwater images, etc.) are often difficult to manage in the field. We have created a photo-managing tool within bioValidator to link these images with the various specimens and to provide a method to post them to a shared data source (Flickr), where they can be retrieved by multiple users using automated calls.

2.4.1. Naming Conventions
The photo naming system for photo loading consists of the collector’s specimen number, a “+” sign as a delimiter, and then any other information about the photo after the “+”. Photos whose name before the “+” does not exactly match a specimen number will not be loaded. Examples of photo names for a specimen with the identifier CM_1234 are: CM_1234+1.jpg and CM_1234+2.jpg; or CM_1234+a.jpg and CM_1234+b.jpg; or CM_1234+vellidae-0001.jpg and CM_1234+vellidae-0002.jpg; or, if you have only one photo for a specimen, simply CM_1234.jpg.
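The naming convention can be checked before loading with a few lines of Python. This is a minimal sketch of the rule as described here, not bioValidator’s own code; the function names are hypothetical.

```python
import os


def specimen_id_from_photo(filename):
    """Return the specimen number encoded in a photo filename.

    Everything before the first "+" is taken as the specimen number; if
    there is no "+", the whole name without its extension is used.
    """
    stem, _extension = os.path.splitext(os.path.basename(filename))
    return stem.split("+", 1)[0]


def match_photos(filenames, specimen_ids):
    """Group photos by specimen; photos without an exact match are skipped."""
    known = set(specimen_ids)
    matched, skipped = {}, []
    for name in filenames:
        specimen = specimen_id_from_photo(name)
        if specimen in known:
            matched.setdefault(specimen, []).append(name)
        else:
            skipped.append(name)
    return matched, skipped


photos = ["CM_1234+1.jpg", "CM_1234+vellidae-0001.jpg", "CM_1234.jpg", "CM_12345+a.jpg"]
matched, skipped = match_photos(photos, ["CM_1234"])
```

Because the match must be exact, `CM_12345+a.jpg` is skipped rather than attached to specimen CM_1234.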
2.4.2. Photo-Matching in bioValidator
After you have loaded and validated a spreadsheet in bioValidator, you can match a directory of photos to the specimens in your spreadsheet. If your spreadsheet has any errors, you will not be allowed to match photos until those errors are fixed. The photo-matching process essentially renames images with the specimen_num_collector value (or another value designated by your bioValidator configuration file). Renamed copies of the images are written to a specified output directory; the input image files are not changed or renamed in any way, and the output files differ from them only in name. Use the following procedure to match photos:

● Click the “PhotoMatcher” tab on bioValidator.
● Select “Input Directory” to select a directory that contains the images that you want to match. This will build a local cache of thumbnails and may take a minute or two to process.
● Select “Output Directory” to set the directory to which the renamed photos will be copied.
● Choose any additional fields to view in the specimen scroller by clicking on the green plus icon (Fig. 2).
● Navigate the input photos (on the left side of the screen) and the specimens (on the right side of the screen) until you are able to match a photo to a particular specimen.
● When you find a match, control-click on the image, which copies it over to the rename selection box. This is especially useful for quickly adding multiple images to an individual specimen (Fig. 3).
● When you are done matching photos for a particular specimen, click “rename” to rename the images and copy them into the output directory.
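The copy-and-rename step at the end of this procedure can be sketched as follows. This is an illustration only: the output name pattern (specimen number, “+”, original file name) is an assumption based on the naming convention in Subheading 2.4.1, and the function name and file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_specimen_name(image_path, specimen_id, output_dir):
    """Copy an image into output_dir under a specimen-based name.

    The input file is left untouched; only the copy carries the new name.
    """
    source = Path(image_path)
    target_dir = Path(output_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / f"{specimen_id}+{source.name}"  # assumed pattern
    shutil.copy2(source, destination)
    return destination


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "input" / "IMG_0001.jpg"
    original.parent.mkdir()
    original.write_bytes(b"not a real image")

    renamed = copy_with_specimen_name(original, "CM_1234", Path(tmp) / "output")

    renamed_name = renamed.name
    original_still_exists = original.exists()
    copy_exists = renamed.exists()
```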
Fig. 2. Screenshot in PhotoMatcher Tool in bioValidator for selecting additional fields.
Fig. 3. Matching images to specimens in PhotoMatcher.
2.4.3. Photo Loading to Flickr
Photos can be bulk-loaded to Flickr using bioValidator, or loaded directly into Flickr and assigned appropriate machine tags so that the photos can be harvested later. You must have a Flickr account to use this feature. A free Flickr account allows you to load up to 200 photos. If you wish to load more photos than that, you must subscribe to Flickr, which costs $25/year.
The process for loading photos to Flickr using bioValidator is as follows:

● Click the “PhotoLoader” tab.
● Click the “Select Photo Directory” button.
● Select the images you want to load to Flickr; selecting only one image will show the image and the details of the specimen with which it is associated.
● Once you have selected the images to load, click the “Upload to Flickr” button to begin the upload process. The “FlickrStatus” should update to show each image as it is loaded to your Flickr account (Fig. 4).
Once your photos are in Flickr, they can be referred to online and searched using machine tags that are embedded in the site along with the image. The Moorea Biocode LIMS software (described in Chapter 13) integrates with the Flickr images uploaded by bioValidator. It may take up to 2 h for images that you have loaded into Flickr to be searchable via the LIMS system. In addition, bioValidator sets the taxonomy:binomial construct to enable harvesting by other biodiversity providers, such as the Encyclopedia of Life (EOL) (see Note 9). The following machine tags are set by bioValidator for the photo that you uploaded (Fig. 4):
Fig. 4. Screenshot in Flickr tool.
● bioValidator:specimen=MParis0979
● bioValidator:file=MParis0979+02329.jpeg
● bioValidator:date=2011-04-20_15:58:58
● taxonomy:binomial=Blenniella gibbifrons
● geo:lat=-17.613028
● geo:lon=-149.821611
● dwc:recordedby=Biocode User
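Each machine tag follows the pattern namespace:predicate=value. The Python sketch below builds and parses tags of that shape from the example above; it is an illustration with hypothetical function names and does not call the Flickr API or reproduce bioValidator’s code.

```python
def build_machine_tags(specimen, filename, timestamp, binomial, lat, lon, recorded_by):
    """Return machine tags in the namespace:predicate=value form listed above."""
    return [
        f"bioValidator:specimen={specimen}",
        f"bioValidator:file={filename}",
        f"bioValidator:date={timestamp}",
        f"taxonomy:binomial={binomial}",
        f"geo:lat={lat}",
        f"geo:lon={lon}",
        f"dwc:recordedby={recorded_by}",
    ]


def parse_machine_tag(tag):
    """Split a tag into (namespace, predicate, value).

    The "=" is split first because a value may itself contain ":" characters,
    as the date tag does.
    """
    prefix, value = tag.split("=", 1)
    namespace, predicate = prefix.split(":", 1)
    return namespace, predicate, value


tags = build_machine_tags(
    "MParis0979", "MParis0979+02329.jpeg", "2011-04-20_15:58:58",
    "Blenniella gibbifrons", -17.613028, -149.821611, "Biocode User",
)
parsed = [parse_machine_tag(tag) for tag in tags]
```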
3. Notes

1. Required DNA barcode data elements: While no specific Unique Identifier is noted, the DNA barcoding community recommends using the Darwin Core Triplet consisting of [Institutional Acronym:Collection Code:Catalog ID]. A listing of institutional acronyms can be found at http://www.biorepositories.org. For species names, provisional species are allowed (e.g., Galathea sp.8 Poupin). Finally, country of origin can include a body of water for marine taxa not within territorial waters (e.g., Pacific Ocean).
2. Recommended DNA barcode data elements: In addition to the four listed recommended elements, we highly recommend “basis of ID” as an additional field. In order to minimize the propagation of error, this field would allow tracking of consistent mistakes in identity (e.g., mislabeled field guides) and would distinguish IDs based on morphological features from those based on the molecular sequence itself.
3. Darwin Core standards: A listing of the Darwin Core Standards can be found online at http://rs.tdwg.org/dwc/index.htm. This is a good place to start in defining fields for your project. Other metadata standards include those sanctioned by the Genomic Standards Consortium at http://gensc.org.
4. Spreadsheet templates: A description of the Moorea Biocode fields and spreadsheet template is available at http://biocode.berkeley.edu/batch_upload_help.html. Likewise, you can get started with the BOLD systems spreadsheet template by visiting http://www.boldsystems.org/docs/handbook.php?page=datasubprotocol.
5. Essential spreadsheet fields: Biocode required fields: Collection Event ID, Entered By, TaxTeam, Country, Year Collected, Month Collected, Specimen Number, Phylum, Holding Institution. BOLD required fields: Sample ID, Field ID, Museum ID, Institution Storing, Country.
6. FIMS as a stand-alone tool: This proposed FIMS system can be adopted for any expeditionary work regardless of whether genetic postprocessing of the material will occur. Because we follow the data standards adopted by the museum community, simple ingestion scripts can be written to port the validated data to museum databases, such as KE EMu, Specify, or other adopted database systems.
7. Default validation schema: If your project adopts the Biocode spreadsheet template, then a prebuilt validation rules file that works with this template is installed automatically with bioValidator.
8. Updating data in Fusion Tables: Each time data are loaded to the Fusion Tables system, data are written to a new table instance in Fusion Tables. If you want to re-upload a spreadsheet, then you will need to note the Fusion Table ID in your LIMS system to update data there.
9. Flickr tags: Adding or changing the machine tag data in Flickr is possible, but changes to the bioValidator:specimen field may render your image unsearchable to other systems.
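Three of the rule types in Table 2 (RequiredColumns, uniqueValue, and BoundingBox) are simple enough to sketch in Python. The code below is a rough illustration of what such checks do, not bioValidator’s implementation; the function names, column names, and sample rows are hypothetical.

```python
def check_required_columns(rows, required):
    """RequiredColumns: each column must exist and hold a non-empty value."""
    errors = []
    for number, row in enumerate(rows, start=1):
        for column in required:
            value = row.get(column)
            if value is None or not str(value).strip():
                errors.append(f"row {number}: '{column}' is missing or empty")
    return errors


def check_unique_value(rows, column):
    """uniqueValue: no two rows may share a value in the given column."""
    first_seen = {}
    errors = []
    for number, row in enumerate(rows, start=1):
        value = row.get(column)
        if value in first_seen:
            errors.append(
                f"row {number}: '{column}' value {value!r} repeats row {first_seen[value]}"
            )
        else:
            first_seen[value] = number
    return errors


def check_bounding_box(rows, south, north, west, east,
                       lat_column="DecimalLatitude", lon_column="DecimalLongitude"):
    """BoundingBox: coordinates must fall inside the given box.

    This simple version does not handle boxes that cross the 180th meridian.
    """
    errors = []
    for number, row in enumerate(rows, start=1):
        lat, lon = float(row[lat_column]), float(row[lon_column])
        if not (south <= lat <= north and west <= lon <= east):
            errors.append(f"row {number}: ({lat}, {lon}) is outside the bounding box")
    return errors


# Hypothetical rows; the column names are illustrative only.
rows = [
    {"Specimen_Num_Collector": "CM_1234", "Country": "French Polynesia",
     "DecimalLatitude": "-17.613028", "DecimalLongitude": "-149.821611"},
    {"Specimen_Num_Collector": "CM_1234", "Country": "",
     "DecimalLatitude": "48.85", "DecimalLongitude": "2.35"},
]

required_errors = check_required_columns(rows, ["Specimen_Num_Collector", "Country"])
unique_errors = check_unique_value(rows, "Specimen_Num_Collector")
box_errors = check_bounding_box(rows, south=-18.0, north=-17.0, west=-150.5, east=-149.0)
```

In this example the second row fails all three checks: its Country is empty, its specimen number repeats the first row, and its coordinates fall outside the box.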
Chapter 13

Laboratory Information Management Systems for DNA Barcoding

Meaghan Parker, Steven Stones-Havas, Craig Starger, and Christopher Meyer

Abstract

In the field of molecular biology, laboratory information management systems (LIMSs) have been created to track workflows through a process pipeline. For the purposes of DNA barcoding, this workflow involves tracking tissues through extraction, PCR, cycle sequencing, and consensus assembly. Importantly, a LIMS that serves the DNA barcoding community must link required elements for public submissions (e.g., primers, trace files) that are generated in the molecular lab with specimen metadata. Here, we demonstrate an example workflow, from a specimen’s entry into the LIMS database to the publishing of the specimen’s genetic data to a public database, using Geneious bioinformatics software. Throughout the process, the connections between steps in the workflow are maintained to facilitate post-processing annotation, structured reporting, and fully transparent edits to reduce subjectivity and increase repeatability.

Key words: Laboratory information management systems, Workflow, DNA barcoding, Submission, Bioinformatics
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_13, © Springer Science+Business Media, LLC 2012

1. Introduction

Only recently have powerful, customizable laboratory information management system (LIMS) software programs been built and published for public purchase and use (www.limsource.com). Previously, researchers who needed a comprehensive method of databasing and tracking specimens either had to build a program by hand or contract a programmer to customize a system that would meet their specific needs. Though many of these LIMS programs were unique in one way or another, they were all built to do the same thing: track specimens and workflows. Because low-cost, customizable LIMS programs were not available on the bioinformatics market, researchers and programmers spent an exorbitant amount of energy building many slightly different LIMS software programs.

While the concept behind all LIMS software programs is inherently the same, LIMS programs do need to be customizable because of the individual needs of the scientist. There are now many marketed LIMS software programs that allow users to customize the program to fit their needs without having to script an entirely new program. Further, many LIMS software developers have incorporated the ability not only to track individual specimens, but also to capture all laboratory workflows associated with each specimen, creating a detailed, transparent electronic notebook for users and collaborators.

The Moorea Biocode Project (MBP; http://www.mooreabiocode.org/) was awarded funds for 3 years from the Gordon and Betty Moore Foundation to genetically index all plant, animal, and fungal macrobiotic species living on the island of Moorea, French Polynesia. The project’s scientists understood that the number of specimens to be collected during the 3-year period would be monumental and that a LIMS software program would be essential. The MBP contracted Biomatters Ltd., a New Zealand bioinformatics company, to cooperatively build a LIMS software program that would track all molecular processes conducted with each individual specimen and link to field specimen metadata. This electronic notebook has allowed project scientists and collaborators from all over the world to view, edit, and share data. Scientists working on the MBP have, to date, genetically databased over 25,000 specimens using the Geneious LIMS software program. Since 2008, Biomatters Ltd. has worked with the MBP to create a LIMS software program that meets not only all of Biocode’s needs, but also the needs of any researcher who, with either large or small datasets, wants to database and track all genetic samples with confidence.

There are now many high-quality LIMS software programs available on the bioinformatics market for single and group users. The Biocode LIMS is intended to fill a niche for people who wish to perform DNA barcoding but lack the capacity to develop or purchase their own LIMS. Moreover, it is adaptable to any similar Sanger-based molecular workflow. In this chapter, we demonstrate an example workflow, from a specimen’s entry into the LIMS database to the publishing of the specimen’s genetic data to GenBank, using the Geneious bioinformatics software and the Biocode plugin.
2. Materials

2.1. Downloading and Installing Geneious Software

You need to download Geneious for Windows, Mac, or Linux, available from the Biomatters Web site (http://www.geneious.com/). To run the Biocode software, you need Geneious version 5.1 or higher. Three files, in addition to Geneious, need to be downloaded in order to use the Biocode plugin: the Biocode plugin itself, the Biocode GenBank Submission plugin (which allows you to submit completed contigs to GenBank), and the MySQL Connector (which allows you to connect to FIMS and LIMS databases). All of these downloads can be retrieved from http://software.mooreabiocode.org/. To install the plugins, open Geneious and drag each plugin file onto the Geneious interface. You will see three new icons: two in the toolbar and one in the service tree (Fig. 1).
2.2. Setting Up Databases
Once the Biocode plugin has been installed, the next step is to set up the two required databases: the field information management system (FIMS) and the LIMS. The FIMS database stores information relating to “field” work (e.g., specimen records, collecting events, and taxonomy). The LIMS stores data relating to lab work (e.g., workflows). The Biocode plugin includes a LIMS that links to data from a FIMS database of your choosing. Since all molecular processes start with a tissue sample, we assume that this is the entry point to the LIMS and that the tissue’s unique ID can be used to track the appropriate specimen metadata from the FIMS.

Connect to the Biocode plugin by right-clicking on the Biocode icon in the service tree. The Biocode login screen will open (Fig. 2). The login screen has two sections: one for FIMS and one for LIMS. At the bottom of the screen, you need to specify a MySQL Driver (the MySQL connector file that you downloaded with the Biocode plugin file). Click the Browse button and locate the file on your disk.

Your FIMS database can be set up in one of the following four ways: an Excel file (see Note 1), a server such as TAPIR (see Note 2), a Google Fusion Table (see Note 3), or a MySQL database (see Note 4).

● The Excel file is most useful for single users or people who cannot set up a server.
● TAPIR, MySQL, and Google Fusion Tables are most useful for groups who want to store their lab information in a single FIMS or for collaborative projects, where the data are stored on a central server.

Fig. 1. Biocode plugin icons along the main toolbar (left) and Biocode plugin/login icon in service tree (right).

Fig. 2. The Biocode plug-in login screen.
You can also set up your LIMS database in one of the following two ways: using a Remote LIMS (see Note 5) or a Local LIMS (see Note 6).

● Using the Remote LIMS, the plugin stores your lab workflows on a remote MySQL server. This is the best option for labs or collaborative projects.
● Using the Local LIMS, the plugin stores your lab workflows on your local machine. This is intended for single users.

3. Laboratory Information Management System Workflows

Workflows (see Note 7) are an integral part of the LIMS system (Fig. 3). Workflows represent the laboratory path that a tissue extraction takes for a particular locus. We define a locus as an expected amplicon region that may contain multiple genes. A workflow is limited to a single extraction, but can have any number of other reactions. Workflows can be queried for tracking progress, troubleshooting, and reports. Information from workflows is also used by the assembler module to aid in the assembly and analysis of your sequences, so it is important that workflows are set up correctly. Figure 3 illustrates an example workflow. We begin with an example set of 96 reactions and demonstrate how these samples move through the LIMS.
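The workflow concept can be made concrete with a small data-structure sketch in Python. This illustrates the rule stated above (one extraction per workflow, any number of later reactions) and is not the Biocode plugin’s internal schema; the class names, field names, and identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reaction:
    kind: str                 # "PCR" or "cycle sequencing"
    plate: str
    well: str
    status: str = "not run"   # not run, run, passed, or failed


@dataclass
class Workflow:
    extraction_id: str        # the single extraction this workflow follows
    locus: str                # expected amplicon region
    reactions: List[Reaction] = field(default_factory=list)

    def add_reaction(self, reaction: Reaction) -> None:
        """Append a downstream reaction; a second extraction is rejected."""
        if reaction.kind == "extraction":
            raise ValueError("a workflow is limited to a single extraction")
        self.reactions.append(reaction)

    def latest(self, kind: str) -> Optional[Reaction]:
        """Return the most recently added reaction of the given kind, if any."""
        for reaction in reversed(self.reactions):
            if reaction.kind == kind:
                return reaction
        return None


workflow = Workflow(extraction_id="EXT_0001", locus="COI")
workflow.add_reaction(Reaction("PCR", "PCR_plate_01", "A01", status="failed"))
workflow.add_reaction(Reaction("PCR", "PCR_plate_02", "A01", status="passed"))
workflow.add_reaction(Reaction("cycle sequencing", "CS_plate_01", "A01"))

latest_pcr = workflow.latest("PCR")
```

Keeping failed and repeated reactions on the same workflow is what allows progress tracking and troubleshooting queries of the kind described above.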
Fig. 3. Laboratory information management systems workflow (see also Note 7). Yellow connectors represent potential paths in a workflow.
Fig. 4. Geneious main toolbar with Biocode plugin.
3.1. The LabBench Module: Generating Plates and Associated Workflows

3.1.1. Generating the Extraction Plate
Following the LIMS workflow, the first plate generated is the extraction plate. Click the “New Reaction” button on the Geneious main toolbar (Fig. 4). In the drop-down menu of the “New Reaction” window (Fig. 5), you can choose the type of reaction (extraction, PCR, or cycle sequencing) and the plate size (one of the three predefined sizes, or a number of individual reactions). A new window will open displaying a blank extraction plate (or the specific number of samples you have chosen) (Fig. 6). Your FIMS data must be uploaded to the blank extraction plate so that each specimen’s field data are correctly associated with that same specimen’s laboratory workflow. This can be done using the “Bulk Edit” button in the toolbar of the new extraction plate (Fig. 6). When using the bulk-editor to generate an extraction plate, a number of tools are available in the “Tools” menu within the “Bulk Edit” window (Fig. 7).

● “Get Tissue ID’s from archive plate” allows you to fill your plate with extraction IDs from the FIMS database (either from extraction or tissue archive plates). If you are using an Excel, MySQL, or Google Fusion Table FIMS database, you need to have plate and well columns in your spreadsheet, and to set those in the login screen (see Notes 1–4).
● “Import Extraction Barcodes” allows you to import barcode values directly from the output file of the scanner if you are using 2D well barcodes.
● “Fetch extractions from barcodes” is used during “cherry picking” to populate newly reconstituted plates from prior plate locations.
● “Generate Extraction IDs” automatically generates appropriate extraction IDs based on the tissue sample IDs.

Fig. 5. New reaction window.

Fig. 6. Blank Plate document.

The well locations are displayed on the left-hand side of each column to make placement easier (Fig. 7). You do not have to enter anything in the workflow ID column. New workflows will be generated when you save the plate. “Swap Direction” switches the order in which the plate is read between across-then-down and down-then-across.
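The two reading directions can be illustrated with a short Python sketch that lists well positions for a plate. It is an illustration only; the function name and the zero-padded “A01” well format are assumptions, and the plugin’s own ordering and naming may differ.

```python
import string


def well_names(rows=8, cols=12, across_first=True):
    """List well positions for a plate in one of two reading directions.

    across_first=True reads along each row before moving down (A01, A02, ...);
    across_first=False reads down each column before moving across (A01, B01, ...).
    """
    letters = string.ascii_uppercase[:rows]
    numbers = range(1, cols + 1)
    if across_first:
        return [f"{letter}{number:02d}" for letter in letters for number in numbers]
    return [f"{letter}{number:02d}" for number in numbers for letter in letters]


# Hypothetical tissue IDs placed on a 96-well plate, reading down then across.
tissue_ids = ["T001", "T002", "T003"]
layout = dict(zip(well_names(across_first=False), tissue_ids))
```

With the defaults, the same three IDs would instead land in A01, A02, and A03; a 384-well plate corresponds to `rows=16, cols=24`.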
Fig. 7. Bulk-editor window displaying “Tools” drop-down window.
Once the field data have been added to the extraction plate, you can edit each well. “Edit All Wells” (also a button in the toolbar of the new plate) opens a customizable viewer and editor for your plate documents. It is shown both when creating new plates and when viewing existing plates in the database (Fig. 6). You can select wells in the plate by dragging the mouse across the plate, or select a single well by clicking it. Hold down the shift and ctrl (command on Mac) keys to select multiple individual wells. When wells have been selected, click “Edit Selected Wells” to customize those wells. The edit dialog (Fig. 8) has a column of checkboxes on its left-hand side. Values in the checked fields are applied to all selected reactions, and unchecked fields are left as they are. Most values can simply be entered into a dialog box, with the exception of PCR cocktails, cycle sequencing cocktails, thermocycler profiles, and primers, which are set elsewhere.

The plate editor can display any number of fields from your FIMS or LIMS database. Click the Display Options button in the toolbar to open the display dialog. The split pane at the center of the dialog allows you to choose the fields to be displayed on your wells (Fig. 9). The available fields are shown in the left-hand pane.
Fig. 8. Edit dialog window within extraction plate.
Fig. 9. Display option panel for viewing plates.
Select the fields you want to display, and click the right arrow to move them to the right-hand pane. To stop fields from being displayed in the wells, select them in the right-hand pane and click the left arrow to return them to the left-hand pane. Once you have chosen your fields to display, you can choose their order using the up/down arrows on the right-hand side of the dialog. The fields will appear in the wells in the order you choose here.

The bottom part of the dialog controls the well coloring. Wells are colored by one field, with each different value for that field being given a different color. Choose the field you are interested in, and all possible values will be displayed below it. You can change the color of any value by clicking it.

You can save any settings you make here as a template by clicking the “Select a template” button at the top of the dialog, and clicking “Create template.” Click the “Save as Default” button to make that template the default (separate defaults are stored for extraction, PCR, and sequencing plates). Display preferences are not saved with the plate, so in order to preserve the view for your plate, you should save your settings as a template and/or set them as the default for the reaction type.

3.1.2. Generating a PCR Plate
From this point forward, generating new plates, whether PCR or cycle sequencing, is very similar to generating the extraction plate. Geneious can use an existing plate as a guide to create a new plate so that new reactions are appended onto the corresponding workflows. To do this, highlight the existing plate (in the plate viewer window, Fig. 10) and click “New Reaction” in the Geneious main toolbar. Select the “Create plate from existing document” checkbox (Fig. 11). If the reaction types are the same (e.g., creating an extraction plate as a working aliquot from an archival extraction plate), then all reaction parameters will be copied to the new plate. If the reaction types are different (e.g., an extraction plate to a PCR plate), then only the extraction IDs will be copied across. It is possible to create a new 384-well plate from a group of 96-well plates, and to create a group of 96-well plates from a 384-well plate (see Note 8).

When the new PCR plate has been generated (either from an existing document or as a new document), it can be edited using the same commands used to edit the extraction plate. You can select the target locus for each well individually or for all wells at once in the “Edit Wells” dialog box. Select the wells to be edited and click “Edit Selected Wells” in the toolbar. When the “Edit Wells” window for the new PCR plate opens (Fig. 12), the edit dialog again has a column of checkboxes on the left-hand side. Values in the checked fields are applied to all selected reactions, and unchecked fields are left as they are. You may mark a PCR as not run, run, passed, or failed. Additionally, options to customize your PCR cocktails (see Note 9) and thermocycler profiles (see Note 10) are available. You can create your own local primer database or select primers from a global database (see Note 11).

Fig. 10. Plate viewer window.

Fig. 11. Generate plate from the existing document checkbox.

3.1.3. Uploading Gel Images
You can attach gel images to all types of plates. Click the “Add/Remove GEL Image” button in the plate editor toolbar (Fig. 6), click “Add,” and then browse to the image file on your hard disk. You can also add notes to each gel image. Geneious accepts images in JPEG, GIF, PNG, and TIFF formats. You can split a gel image into wells using a draggable grid by clicking the Split GEL button (located above the gel image in the gel viewer window) (Fig. 13).
Fig. 12. PCR “Edit Wells” window.
Fig. 13. Splice tool for scoring PCRs via gel image.
Choose the number of rows and columns, the start row and column, and the direction of the wells in the grid, and then drag your mouse on the image to generate the grid. If you misplace the grid, you can start again by dragging the mouse. Click “OK” when done. Automated calling of pass/fail is possible, but should be verified through the various display options of the PCR plate (Fig. 13).

3.1.4. Generating Cycle Sequencing Plates
To append new cycle sequencing plates onto existing workflows, highlight the existing PCR plate (in the plate viewer window, Fig. 10) and click “New Reaction,” followed by checking the box labeled “Create plate from existing document” (Fig. 11). A plate very similar to the PCR plate will be generated, and you can edit it in the same manner as the extraction and PCR plates; all associated metadata will carry through. The “Edit Wells” window (Fig. 14) should now begin to look familiar. Here, again, you can customize the dialog boxes as well as add new cycle sequencing cocktails, cycle sequencing thermocycler profiles, and primers (see Notes 9–11).

Fig. 14. Cycle sequencing “Edit Wells” window.

3.1.5. Adding Traces
Once the plate or samples have been sequenced, you can add the traces to your cycle sequencing plates or reactions. To add traces to individual reactions in the plate view, open the reaction well to which you would like to add a trace file (double-click the specific well), and click “Add/Edit Traces” in the “Edit Wells” dialog (Fig. 14). You can then click the “Add Sequences” button in the upper left corner to add one or more sequences to the well. To remove one or more sequences from the well, select the sequences you want to remove and click “Remove Sequence(s).” You can also import the sequences into Geneious as documents by clicking “Import Sequence(s) into Geneious.”

You also have the option to bulk upload traces onto the cycle sequencing reactions or plate. To do so, open the appropriate cycle sequencing plate. Click “Bulk Add Traces” on the plate’s toolbar and click “Browse” to locate the trace files on your hard drive. Traces from each cycle sequencing plate should be in a separate folder. The traces are matched to the appropriate well by field or well number, which must be present in the filename of the trace file. You need to tell Geneious where in the filename to look for either “Well number” or “Field.” For example, if you want Geneious to link a sample’s trace file to that sample’s well position on the cycle sequencing plate and the trace file is named “3726294_A01_capture.ab1”, you would select “Match 2nd” part of name and “separated by Underscore” (both from drop-down menus) and click “OK” (Fig. 15). Geneious would then attach the sample’s trace file to its appropriate reaction well.

After all traces have been attached and saved, they are ready to be downloaded into the Geneious Assembler. When the download is complete, all FIMS and LIMS metadata will be attached to the correct sample, and each sample will be ready for editing and assembly.

Fig. 15. Bulk add traces from cycle sequencing plate toolbar.

3.2. “The Assembler” Module: Downloading, Assembling, and Exporting Edited Data

3.2.1. Importing Traces
The Assembler workflow is graphically represented in Fig. 16. Once trace files have been downloaded into Geneious’s assembler module, they can then be edited, saved, exported, and ultimately published. To import the trace files into Geneious for assembly, first generate a new folder for the traces in your “Local” database (located in the service tree). To generate a new folder, highlight “Local” in the service tree and the select File > New Folder under the main toolbar. The easiest way to import your traces into Geneious is to download them directly from the LIMS database (if you have previously attached your trace files to a sequencing plate). If downloading from the LabBench, traces will have all associated metadata. If you import traces from disk, you will have to manually set parameters such as read direction (see Note 12) and choose “Annotate from FIMS/LIMS Data” to link in other metadata (see Note 13). To download your traces from the LIMS, select the sequencing plates you want to download in the Biocode plugin search, and
M. Parker et al.
Fig. 16. Workflow in the assembler module.
13
Laboratory Information Management Systems for DNA Barcoding
Fig. 17. An example of downloading two sequencing plates (one for each direction).
then select Biocode > Download Traces from LIMS (Fig. 17). Typically, you would choose one forward- and one reverse-sequencing plate for a set of sequences. You may select a single plate if both your forward and reverse reads are contained on it. Click “OK” and Geneious will ask you to choose a folder and then begin downloading your sequences. Once complete, you will have all of the traces from the plates you entered and they will already have their read directions set and be annotated with the necessary data from the FIMS. If you know the names of your sequencing plates, you can download them directly without having to perform a search. Select the folder you want to download to in Geneious, then select Biocode > Download Traces from LIMS, and enter your plate names manually.
3.2.2. Batch Rename
If you want to change the names of your reads to reflect aspects of the FIMS data, from the main toolbar select Edit > Batch Rename to copy your choice of fields into the name column. This feature is also available for renaming assemblies.
3.2.3. Trimming
Geneious treats trimming as an annotation class so that information is not lost once a sequence is trimmed. The underlying raw data are maintained throughout downstream analyses for possible adjustment later in the pipeline. Assembly and other analyses automatically take the trims into account, and exclude these regions in all calculations. To trim sequences, highlight all of the sequences you are going to assemble, and from the main toolbar, select Sequence > Trim Ends. You can also add trim ends to the main toolbar by right clicking on the toolbar and turning on Trim Ends (Fig. 18). For most applications, the default of Error Probability Limit 0.05 is a good start. This option works by trimming the sequence to find the longest possible untrimmed region, which has an overall error probability <0.05. Decrease the limit to trim more aggressively. Other options include screening for vectors, which
Fig. 18. The Geneious trimming dialog.
uses a clone of NCBI’s VecScreen tool, screening for primers, and basic limiting options, such as minimum amount to trim. When you click “OK” to run trimming, it will add annotations to each of the sequences, which correspond to the trimmed regions. You can flick through all of the trimming results in the Sequence View before saving the changes. If there are reads which are obviously not of good enough quality for assembly, then you can mark these as failed in the LIMS, but it is easier to let the assembly report pick these out for you (described in Subheading 3.2.5).
Before using the primer screening feature, you need to establish a database of primers in Geneious (see Note 11). This is done by creating an oligo-type sequence that can be stored anywhere in your local folders (or you can generate a specific “Primer” folder if you want to store them all in one place). Oligo sequences are generated in one of the following ways.
● Extract a primer/probe annotation from a sequence.
● Select “Sequence” → “New Sequence” from the menu and choose Primer or Probe as the type of the new sequence.
● Select one or more existing primer sequences (these may be imported from a file, e.g., a fasta file) and then click “Primers” → “Primer Characteristics” to transform them into oligo-type sequences.
Once you have generated oligo sequences for all of your primers, the “Screen for Primers” option will search for matches with
any of them. Geneious uses a custom Smith–Waterman search to locate primer matches in sequences.
You can retrim sequences using different parameters at any stage. To do this, just select the sequences for retrimming and follow the steps above for trimming. The only difference is that you should select the “Annotate new trimmed regions” option to have the new trims replace the old ones. When a sequence is retrimmed, Geneious currently stores the history of the trims that were used in the Notes tab for that sequence. Sequences can also be trimmed manually by selecting a region at the end of a sequence in the Sequence View, clicking “Annotate,” and choosing “Trimmed” for the annotation type. If a sequence has more than one trimmed annotation at its end, then the largest trimmed annotation will be used. You can also manually edit any trimmed region by clicking and dragging either end of the trim annotation in the Sequence View.
3.2.4. Binning
“Binning” is used to group traces and assemblies into three categories (high, medium, and low) based on various measures of quality (see Binning Parameters below). The purpose of binning is to speed up processing by summarizing the properties of sequencing results. The Bin column is hidden by default. To view it, click on the small icon in the top-right corner of the table and then check the “Bin” item (this can also be done by right clicking on the table header). Documents can be sorted according to Bin by clicking on the Bin table header.
The binning parameters define thresholds for each bin (Fig. 19). They cover metrics such as the percentages of high- and low-quality bases in your sequences, sequence length, number of ambiguities, and, in the case of assemblies, coverage (see Note 14). For information on any of the binning parameters, hold the mouse over the option to get a description. There are three levels at which binning parameters can be set: globally (for all local and server documents), per-folder (all documents inside a particular folder or any subfolder), and per-document. To set the global binning parameters, from the main toolbar, select Tools > Preferences and go to the Sequencing tab. To set per-folder or per-document parameters, select the folder or documents you want to change and, from the main toolbar, select Sequence > Set Binning Parameters. The most specific parameters are used in favor of less specific ones: per-document parameters are used over any per-folder or global parameters that are set.
To help in the detection of frameshifts, you can set the number of stop codons as an optional binning parameter. The number of stop codons is calculated for the specified genetic code, and is defined as the minimum count of stop codons in the consensus sequence over all frames (i.e., frames 1, 2, and 3 are checked in the forward and reverse directions, the stop codons in each are counted, and the frame with the fewest stops is taken).
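The stop-codon measure described above can be illustrated with a short Python sketch. This is not the Geneious implementation; the function names are hypothetical, and the default stop-codon set assumes the standard genetic code (pass a different set for other codes, e.g., TAA and TAG only for the invertebrate mitochondrial code).

```python
# Minimal sketch of the "minimum stop codons over six frames" measure.
# Assumption: the default stop set is for the standard genetic code; the
# function names are illustrative and are not part of Geneious.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_stops(seq, frame, stops=STANDARD_STOPS):
    """Count stop codons in one reading frame (frame offset 0, 1, or 2)."""
    seq = seq.upper()
    return sum(seq[i:i + 3] in stops for i in range(frame, len(seq) - 2, 3))


def min_stop_codons(consensus, stops=STANDARD_STOPS):
    """Minimum stop-codon count over three forward and three reverse frames."""
    strands = (consensus, reverse_complement(consensus))
    return min(count_stops(s, f, stops) for s in strands for f in range(3))
```

A consensus with an intact reading frame gives a minimum of zero, whereas a frameshift usually introduces stops in every frame, so a nonzero value is a reasonable flag for closer inspection.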
Fig. 19. The binning parameters dialog box.
As an example, when looking at assembly results, the bins could be used in the following way (if the parameters are set up accordingly).
● High = There is probably no need to look at these assemblies.
● Medium = These assemblies may need to be edited.
● Low = Fail: These assemblies are likely beyond rescue and should be marked as failed.
To set up the parameters in this way, you need to have strict parameters for the high bin, for example 0 ambiguities, a consensus length of 658 bp, and a coverage of 1.8 (for the COI barcode). The medium bin can be quite relaxed depending on how many assemblies you want to examine. Users can create binning profiles and share these with collaborators or use them for standard operating procedures to ensure repeatability.
3.2.5. Assembly
Select all of the reads you are going to assemble (and a reference sequence or list if you have one) and then click the “Assembly”
Fig. 20. The assembly dialog box.
button in the main toolbar. To assemble pairs of reads by name, check “Assemble by name” and choose the appropriate delimiters (Fig. 20). The recommended Sensitivity is “Highest Sensitivity/Slow.” It is also possible to choose “Custom Sensitivity,” and choose your own parameters (e.g., minimum overlap). If you have already trimmed your sequences, select “Use existing trim regions”; otherwise, specify trim options. “Save assembly report” and “Save results in a new subfolder” should both be selected. After clicking “OK,” a new subfolder called “Assemblies” will be generated and assemblies will be added to it as the operation runs. When the operation is finished, an Assembly Report and list of Consensus Sequences will also be added to the folder. Geneious generates a new subfolder each time an assembly is run. The assembly report (Fig. 21) provides a record of which reads were assembled successfully and which reads failed. For example, click the blue hyperlink next to the red “X” to select all reads which failed to assemble and use the “Mark as Failed in LIMS” tool to mark these reads for resequencing.
3.2.6. Viewing and Editing Assemblies
As with the traces, assemblies are each assigned a bin based on various quality criteria. By default, Geneious uses a Highest Quality consensus, which rarely generates ambiguities because the highest quality base call is used automatically. However, ambiguities are generated in situations where the qualities of conflicting bases are similar.
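The idea behind a highest-quality consensus can be sketched in Python as follows. This is only an illustration under stated assumptions: the exact rule Geneious applies is not given here, and the `tie_margin` parameter (how close two quality scores must be to count as “similar”) and the function names are hypothetical.

```python
# Minimal sketch of a highest-quality consensus call for aligned reads.
# Assumptions: bases are uppercase A/C/G/T, qualities are Phred scores, and
# "similar" quality means within tie_margin of the best call (hypothetical).
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_base(calls, tie_margin=5):
    """Pick the consensus for one column from (base, quality) calls."""
    best_quality = max(quality for _, quality in calls)
    # Bases whose quality is within tie_margin of the best are contenders.
    contenders = frozenset(
        base for base, quality in calls if best_quality - quality <= tie_margin
    )
    if len(contenders) == 1:
        return next(iter(contenders))
    # Conflicting bases of similar quality: emit an IUPAC ambiguity code.
    return IUPAC[contenders]


def consensus(columns, tie_margin=5):
    """Build a consensus string from a list of alignment columns."""
    return "".join(consensus_base(column, tie_margin) for column in columns)
```

With a forward call of A at quality 40 and a reverse call of G at quality 10, the sketch returns A; if the two qualities were 30 and 28, it would return the ambiguity code R, which is the kind of position that needs manual review.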
Fig. 21. The assembly report (showing assembled and unassembled reads).
The procedure for checking disagreements in an assembly is as follows.
● Turn on Allow Editing in the Viewer toolbar.
● Select an assembly to display an overview. Disagreements are shown as small black marks on the sequences and the trimmed regions can be seen.
● Highlight disagreements or ambiguities in the Display tab of the control panel to the right of the viewer. Ctrl+D on Windows and Linux or Command+D on Mac OS jumps between highlighted bases. If this is the first assembly you have looked at, you should zoom in to a level you are comfortable with. Geneious remembers this zoom level for the next assembly (Fig. 22).
● If you agree with a call or an ambiguity in the consensus, then you can go to the next disagreement because the call has already been made.
● If you disagree, you can resolve the conflict by editing either of the traces or by editing the consensus (editing the consensus is a shortcut for changing all base calls at that position).
● Continue editing through the disagreements until you have looked at all of them. Save the assembly and repeat for the next assembly.
Fig. 22. A sample assembly view in Geneious.
If you decide that some assemblies are not good enough despite having assembled correctly, then you can mark these as failed at this point. Select the assemblies that have failed and go to “Mark as Failed in LIMS.” It is a good idea to move the failed assembly to a new subfolder (e.g., named “fail”).
3.2.7. Alignment of Consensus Sequences
An alignment of consensus sequences is a useful tool for checking and correcting assembly accuracy, especially near the ends of traces, where there might be poor coverage. To generate an alignment of consensus sequences:
● Select all of the assemblies you want to align and click the “Alignment” button in the toolbar.
● In the Alignment dialog box (Fig. 23), click “Consensus Align,” select “Generate alignment of consensus sequences only,” and choose an alignment algorithm (e.g., MUSCLE, MAFFT, or ClustalW).
● Click “OK” and an alignment is generated as a new document (Fig. 24).
The alignment retains all information from the original assemblies. Clicking the small blue arrow button to the left of each name brings you to the associated assembly (Fig. 24). Geneious currently does not propagate changes in the alignment back to the original assembly, but you can use the alignment for downstream steps so that alignment edits are not lost.
Fig. 23. Alignment dialog box.
Fig. 24. Nucleotide alignment consensus dialog box.
To view the alignment translation, follow these steps.
● In the options to the right of the alignment view, change the Colors option to “By Translation.”
● Turn off the Highlighting option.
● Open the Complement and Translation section and set up the appropriate translation options, such as Genetic Code and Frame. We recommend that “Translation” is set “Relative to Consensus.” You can also set the amino acid Color scheme here (e.g., MacClade).
● You should also turn off Annotations so that editing history annotations do not interfere with the layout.
● Check the alignment for frameshifts and stop codons (binning should have identified these previously).
Clicking on Help in the toolbar while viewing an alignment displays documentation on editing and shortcut keys.
3.3. Verify Taxonomy and Loci
To help verify taxonomy annotated in the FIMS and identify contaminants (high-quality sequences, but wrong targets), Geneious can run a specialized batch BLAST against the NCBI public DNA sequence database. This can be run on any selection of contigs and alignments of contigs. If you have performed an alignment as above, then you should use the alignment to make sure that you are using the edited consensus sequence.
● Select an assembly, a list of assemblies, or an alignment.
● From the main toolbar, select Biocode > Verify Taxonomy (Fig. 25). This brings up the standard BLAST options. “Fully annotate hit summaries” must be turned on, but the rest of the options can be modified as necessary. Click “OK” to begin the search. The search can take a long time to run because it relies on BLAST.
Fig. 25. Verify taxonomy dialog box.
Fig. 26. A sample verify taxonomy table; see “Note columns” for explanation of headers.
● When the process is complete, a “Verify Taxonomy Results” document will be produced (Fig. 26). This displays a table, which has a row for each of the queries comparing them with each of their top hits returned from BLAST. As with traces and assemblies, customizable binning options are available for efficient reporting on the results (see Note 15).
● Rows can be selected in the table by clicking/dragging and holding shift/ctrl/command while clicking. Click on “Go To Queries” to jump to the assemblies associated with the selected rows. Click “Show Other Hits” to see additional hits that were downloaded for the selected row. “Show Other Hits” is only enabled when one row is selected. Double clicking on a row also shows other hits.
The Verify Taxonomy Results may reveal that some sequences do not match the expected taxonomy. If you decide that the sequencing was a failure (possibly due to contamination), you can go back to the assemblies, select “Mark as Failed in LIMS,” and list the reason in the notes section. Also, as mentioned above, it is always a good idea to move any failed assemblies to a new subfolder (e.g., named “fail” or “contaminants”).
3.4. Mark Sequences as Pass in LIMS
Once you have verified taxonomy, ensured that all sequence quality parameters are acceptable, and trimmed the primers, select either the assemblies themselves from the Assembly folder or the aligned consensus sequences from the alignments and select “Mark As Pass in LIMS” under the Biocode button. This action writes the following data to the LIMS database:
● The extraction ID.
● The consensus sequence (with sequence quality values).
● The parameters used to trim and assemble the reads.
● The average coverage of the assembly.
● The number of disagreements in the assembly.
● A record of any edits made to the sequences in the assembly.
● The assembly bin.
Marking a sequence as Pass saves the consensus sequence of your assembly to the LIMS. This is the sequence that you submit to public sequence databases. You should make sure that the sequence is of sufficient quality and that you have completed all edits before you mark it as passed.
3.5. Searching the Database
Biocode searches return four types of documents as follows.
● Tissue sample documents: each of these represents a tissue sample in the field database. Tissue documents contain collection information, and optionally taxonomy and photographs.
● Plate documents: these represent a plate in the lab database, and contain a diagram of the wells, as well as the plate’s thermocycle and attached gel images if available.
● Workflow documents: these contain a linked set of reactions performed on an extraction.
● Sequence documents: sequences entered into the LIMS when traces/assemblies are marked as pass/fail.
You can perform either a basic search or an advanced search. Basic searches are performed by entering text into the search box, and return all documents that have a field with a similar value to the text entered (Fig. 27). You can restrict searches to particular types of documents by unchecking some of the checkboxes to the right of the search box. Advanced searches explicitly search against particular fields. They are performed by clicking the More Options button. Click the + and − buttons to add and remove fields from the search.
Fig. 27. Search box.
Fig. 28. The cherry picking window.
Choose the fields you want to search using the leftmost drop-down, and choose the search condition using the drop-down box to its right (see Note 16).
3.6. Cherry Picking
The Cherry Picking function (Fig. 28) is available in the Biocode drop-down menu and allows you to select reactions from one or more plates, based on the criteria that you specify (e.g., failed reactions for second attempts or extractions based on taxonomy for additional genes). You can use these selected reactions to create a new plate (or plates) or have them returned to you as a list. To perform Cherry Picking, select the plates containing the reactions you want to pick and click on Cherry Picking in the Biocode toolbar menu. Choose your destination, and then choose the criteria to select your reactions. You can add additional criteria using the orange “+” button on the right.
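As a rough illustration of what Cherry Picking does conceptually, the following Python sketch selects reactions that match a set of criteria and lays them out on new 96-well plates. It does not use or reproduce the Biocode plugin; the record fields (`plate`, `well`, `status`), the column-wise fill order, and the plate naming are all assumptions made for the example.

```python
# Minimal sketch of cherry picking: filter reactions, then fill new plates.
# Assumptions: reactions are plain dicts, plates have 96 wells, and wells are
# filled column by column (A01, B01, ... H01, A02, ...). None of this is the
# plugin's actual data model.
from itertools import product

ROWS = "ABCDEFGH"
WELLS_96 = [f"{row}{col:02d}" for col, row in product(range(1, 13), ROWS)]


def cherry_pick(reactions, criteria):
    """Return reactions for which every criterion (field -> value) matches."""
    return [
        reaction for reaction in reactions
        if all(reaction.get(field) == value for field, value in criteria.items())
    ]


def layout_new_plates(picked, plate_prefix="cherry"):
    """Assign picked reactions to wells on as many new plates as needed."""
    placed = []
    for index, reaction in enumerate(picked):
        plate_number, well_index = divmod(index, len(WELLS_96))
        placed.append({
            "source_plate": reaction["plate"],
            "source_well": reaction["well"],
            "plate": f"{plate_prefix}_{plate_number + 1}",
            "well": WELLS_96[well_index],
        })
    return placed
```

For example, `cherry_pick(reactions, {"status": "failed"})` followed by `layout_new_plates(...)` would gather failed reactions from several plates onto one plate for a second attempt, keeping a record of where each reaction came from.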
3.7. Preparing Your Documents for Genbank Submission
Genbank has stringent requirements for submitted sequences. It is important that you correctly prepare your sequences before you begin the submission process. This section outlines the fields you need for your submission, and how to attach them to your Geneious documents. All fields that are a part of your Genbank submission need to be either entered in the submission dialog or annotated on your sequences. In order to receive the BARCODE keyword in Genbank, the following fields need to be annotated on your sequences:
● Specimen Voucher/ID.
● Sequence ID.
● Target locus.
● Collector.
● Collection date.
● Identified By.
● Organism.
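Before starting a submission, it can help to check a batch of records for the fields listed above. The Python sketch below assumes that each sequence’s annotations have been exported as a dictionary keyed by the field names exactly as written in the list; that export format and the function name are assumptions, not features of Geneious.

```python
# Minimal sketch: report which BARCODE-keyword fields are missing or blank.
# Assumption: annotations are available as a dict keyed by the field names
# used in the list above; this is not a Geneious or Genbank API.
BARCODE_REQUIRED_FIELDS = (
    "Specimen Voucher/ID",
    "Sequence ID",
    "Target locus",
    "Collector",
    "Collection date",
    "Identified By",
    "Organism",
)


def missing_barcode_fields(annotations):
    """Return the required fields that are absent or blank in one record."""
    missing = []
    for field in BARCODE_REQUIRED_FIELDS:
        value = annotations.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

Running this over every record and fixing anything it reports in the FIMS or in Document Notes avoids discovering the gaps only at validation time.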
For non-barcode submissions, requirements vary depending on what type of sequences you are submitting (e.g., AFLPs, SNPs, nDNA, etc.). We recommend that you check with the NCBI Trace Archive: http://eutils.ncbi.nih.gov/Traces/trace.fcgi?cmd=show&if=rfc.
Fig. 29. The document notes tab, from a group of selected assembly documents (showing primer information attached).
3.7.1. Attaching Data to Your Sequences
The easiest way to attach data to your sequences is to have included it in your FIMS database. When you download the traces from LIMS (or annotate the sequences with FIMS/LIMS data; see Note 13), the information will be automatically attached to your sequences. If you are using the submission tool without a FIMS or LIMS or you have extra information you want to attach, then you can use Document Notes. Document Notes appear as the rightmost viewer for any selected document(s), and enable you to store arbitrary information on your sequences (Fig. 29). When you click on the notes tab, you will see a list of the notes currently added to your documents, displayed in name/value pairs. To add a note, click the “Add Custom Note” button in the toolbar to see a list of predefined note types. The “Genbank Submission” note type contains the fields most commonly used by submitters. Any notes (and note fields) added to your documents can be attached to your Genbank submissions. If you do not see a note type that meets your needs, you can generate your own by clicking “Edit note types.” Click the “Generate Note Type” button in the note types dialog, and click the orange + buttons to add some note fields. We recommend choosing “Text” as the field type for Genbank fields. Once you have generated your note type, add it to your selected documents by selecting it from the “Add Custom Note” drop-down menu.
Any values you enter in the viewer are applied to all selected documents when you click save.
3.8. Genbank Submission
The Genbank Submission plugin allows you to submit your contigs to Genbank once you have completed all edits.
3.8.1. Preparing Your Submission
If you did not install the Genbank submission plugin when you set up the Biocode plugin, do so now (http://software.mooreabiocode.org/index.php?title=Download). Once the plugin is installed, select the contigs and/or sequences that you want to submit to Genbank, and click Tools > Submit to Genbank. Fill in the options and fields (Fig. 30) according to the following guidelines.
● The submission name is a free-form field and does not affect the results of your submission, so it can be filled in as desired.
● Click Edit Publisher Details to edit your author/publication information. This information is preserved between submissions, so in many cases it does not need to be changed.
● The next set of options matches fields annotated to your documents. You may choose a field from the drop-down to map a field on your documents to a Genbank submission field. All fields displayed in the main dialog are required. If you want to add optional fields, click the “Additional Source Fields” button. You can choose the fields you want from the drop-down menu, and click the + button to add more fields.
● If you have selected documents with traces and want to include them in your submission (required for BARCODE keyword), click the “Include Traces” checkbox. The required fields are variable, so the options you see will change depending on what values are selected. You can use the additional fields button to add optional fields to your submission.
● Check the “Include Primers” checkbox to include primers. If you are submitting sequences annotated from the LIMS, then primers will have been annotated on your sequences as document notes. If not, you can annotate the primers yourself by clicking the notes tab when viewing the selected documents and choosing “Sequencing Primer” in the “Add Custom Note” drop-down.
● If you have selected assemblies, you can choose the options used for building the consensus sequence (the passed consensus is what is submitted to Genbank).
● If you have chosen BARCODE as your experimental strategy, then you are able to enter the target locus (gene) of your sequences. This will be included in your submission as gene annotations spanning the entire length of your sequences.
Fig. 30. The Genbank submission options. Some of these options may not be displayed, depending on your selected documents.
3.8.2. Validation and Submission
You may either generate a submission file or upload the submission directly to NCBI (you need a BankIt FTP account to do this; see Note 17). You can make your choice at the top of the submission options dialog. If you are updating an existing submission, you can choose the update option and enter the BankIt ID in the field provided. Otherwise, choose “Upload New Submission” and a new submission ID is generated for you.
Fig. 31. The Genbank validation dialog.
Your submission is validated using tbl2asn, and you will be shown any problems before the submission is commenced. The validation result window has two tabs. (1) The Validation errors/warnings tab shows you a list of errors that may prevent your submission from being accepted. (2) The Discrepancies tab shows you potential errors that you may have made, based on common errors made in Genbank submissions. It is recommended that you thoroughly review the information in both tabs before proceeding. If you have chosen to automatically upload your submission, further validation will take place on the server once the upload is complete. Geneious informs you whether your submission has been Accepted, Accepted with Warnings, or Rejected (Fig. 31). You should also receive an e-mail from Genbank detailing your submission. Once your submitted sequence(s) have been assigned accession numbers, you should receive a further e-mail from Genbank with the details.
3.9. Getting Help
The Biocode plugins are complex, and while learning how to use them can seem like a daunting task, help is available. Your first port of
call should be the Geneious introduction video (http://www.biomatters.com/assets/demonstrations/biocode.html), which walks you through the plugins, and an instruction manual is available at http://software.mooreabiocode.org. A user community and technical support are available from http://connect.barcodeoflife.org/group/lims. Here, you can engage with the wider community, get help from experienced users, and make suggestions about how to improve the plugins. If you have any questions or suggestions that you do not want to post to the community, you can e-mail at
[email protected].
4. Notes
1. EXCEL FIMS
For single users, or users who cannot set up a field information management system (FIMS) server, the EXCEL FIMS is the easiest way to connect to a specimen/tissue database. Geneious will read data from an Excel workbook and convert the rows into specimen/tissue records. It is assumed that all molecular processes start with tissue and this is the entry point into the LIMS. Your Excel workbook must conform to the following:
Your workbook should have only one sheet. If you have more than one sheet, only the first sheet will be read.
Each column corresponds to a data field in your database. You can have as many columns as you like, but you must have at least a specimen ID column and a tissue sample ID column.
The first row of the table should be the names of the columns. The other rows should contain the data.
Right click on the biocode icon in the Geneious service panel (on the left-hand side of the main window), and click login. Choose “EXCEL FIMS” in the uppermost drop-down select box, and enter the location of the Excel file. Choose the columns that contain the specimen and tissue IDs from the drop-downs (if you use only one ID for specimens and tissues, enter the same column in both drop-downs). Also enter the taxonomy fields in your Excel file, in order of highest to lowest, using the + and − buttons to add and remove fields. You can also specify plate name and well columns in your sheet if you keep your tissues in plates. Just check the “The FIMS database contains plate information” checkbox, and enter your plate and well fields. You will then be able to make a direct copy of the tissue plate when making new extraction plates.
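The workbook layout described in this note (column names in the first row, one record per following row, and mandatory specimen ID and tissue ID columns) can be checked before connecting. The Python sketch below assumes the first sheet has already been read into a list of rows by whatever spreadsheet library you use; the function name and the column names in the usage example are hypothetical.

```python
# Minimal sketch: validate the sheet layout and turn rows into records.
# Assumption: `rows` is the first sheet read into a list of lists, with the
# column names in the first row. This is not how Geneious reads the file.
def rows_to_records(rows, specimen_column, tissue_column):
    """Convert sheet rows (header first) into one dict per tissue record."""
    if not rows:
        raise ValueError("the sheet is empty")
    header = [str(name).strip() for name in rows[0]]
    for required in (specimen_column, tissue_column):
        if required not in header:
            raise ValueError(f"missing required column: {required}")
    # Every remaining row becomes one record keyed by column name.
    return [dict(zip(header, row)) for row in rows[1:]]
```

For instance, calling `rows_to_records(rows, "Specimen ID", "Tissue ID")` raises an error if either ID column is absent, which is the situation in which the FIMS connection could not map records to tissues.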
2. TAPIR FIMS
TAPIR, or TDWG Access Protocol for Information Retrieval, is a standard protocol for sharing specimen data. The TAPIR FIMS connection reads in tissue data from a TAPIR server. Reliably integrating museum collection management systems (CMS) or FIMS to a LIMS can be difficult. Often, data needs to be exported and then reimported into each system. The TAPIR protocol is an attempt to standardize the way in which collection databases communicate, and to remove the difficulties associated with collaborative collection management.
Setting up a TAPIR provider with LIMS extension
These instructions assume the use of the TapirLink software (written in PHP), a collections management database with tissue records stored in a TapirLink-compatible relational database, and the free version of Geneious (Biomatters software) with the Biocode plug-in installed.
Step 1: Set up TAPIR.
There are several TAPIR software installation tools (http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirSoftware). This tutorial assumes that you are using the TapirLink software (http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink). TapirLink requires a Web server running PHP, with an appropriate module to connect to your FIMS database. We recommend Apache, although this is not required. Download the TapirLink installation archive, and follow the installation instructions included.
Step 2: Incorporate the LIMS extension into the TAPIR provider.
While setting up your TAPIR provider (or afterwards), be sure to include the LIMS extension as an additional schema (available at http://biocode.berkeley.edu/schema/lims_extension.xml). You will be asked to map your local database fields to the LIMS extension fields.
Step 3: Point the Geneious LIMS system to your TAPIR provider link.
Right click on the biocode icon in the service tree on the left-hand side of the main Geneious window, and click “login.” Choose “TAPIR” from the field database drop-down at the top of the connection dialog, and enter the address of your TAPIR server. Choose a LIMS database (see Remote LIMS, Note 5, and Local LIMS, Note 6), and click OK.
3. Google Fusion Tables FIMS
Google Fusion Tables is ideal for groups that want to have a collaborative shared FIMS database, but do not want to set up their own server. You may enter data directly into Fusion
Tables, or upload Excel spreadsheets (see http://www.google.com/fusiontables/public/tour/index.html). As with the Excel FIMS, it is assumed that all molecular processes start with tissue and this is the entry point into the LIMS. Right click on the biocode icon in the Geneious service panel (on the left-hand side of the main window), and click login. Choose Google Fusion Tables in the uppermost drop-down. When viewing a Fusion Table on the Web, the URL will contain the phrase dsrcid=XXXX (with XXXX being a number). This number is your table’s ID. Enter the ID in the space provided, and click Update. Choose the columns that contain the specimen and tissue IDs from the drop-downs (if you use only one ID for specimens and tissues, enter the same column in both drop-downs). Also enter the taxonomy fields, in order of highest to lowest. You may press the autodetect button to do this for you, or use the + and − buttons to add and remove fields. You can also specify plate name and well columns in your table if you keep your tissues in plates. Just check the “The FIMS database contains plate information” checkbox, and enter your plate and well fields. You will then be able to make a direct copy of the tissue plate when making new extraction plates.
4. MySQL FIMS
If you are already using a MySQL database for your FIMS but do not want to set up a TAPIR server (see Note 2), then you can connect directly to the MySQL database. Right click on the biocode icon in the Geneious service panel (on the left-hand side of the main window), and click login. Choose MySQL Database in the uppermost drop-down. Enter your server URL, port, username, password, and database in the fields provided, and click Update. Choose the columns that contain the specimen and tissue IDs from the drop-downs (if you use only one ID for specimens and tissues, enter the same column in both drop-downs). Also enter the taxonomy fields, in order of highest to lowest.
You may press the autodetect button to do this for you, or use the + and − buttons to add and remove fields. You can also specify plate name and well columns in your table if you keep your tissues in plates. Just check the “The FIMS database contains plate information” checkbox, and enter your plate and well fields. You will then be able to make a direct copy of the tissue plate when making new extraction plates.
5. Remote MySQL LIMS
The remote LIMS is intended for labs and research groups where the lab data needs to be shared between and edited by a large number of users. To set up a remote LIMS server, you will need MySQL, available from http://www.mysql.com. Download and install the MySQL server software on a server
M. Parker et al.
which is accessible to everyone who needs to use the LIMS. The next step is to create a blank schema. You can store multiple LIMS databases on one server, but each database must have its own schema. Create a schema, and then run the following script to create a blank LIMS database: http://www.biomatters.com/assets/plugins/biocode/labbench_latest_mysql.sql. You will need to create at least one user account with read/write access to this schema. To connect to the LIMS database, you need to choose “Remote Server” in the biocode login screen within Geneious.
6. Local LIMS
This is intended for single or small-scale users, and is a database that exists within your local copy of Geneious. To create a new database, click on the “Add Database” button, enter a name for your new database, and click “Ok.” A new, empty database will be created for you. If you have already created a database, select it from the drop-down (other users of Geneious will not be able to connect to any LIMS databases that you create as a local database). To create a LIMS database that you can use to share data with other people, please see Remote LIMS (Note 5). The Local LIMS databases are stored within your local Geneious user directory, so they will be backed up if you choose to do a Geneious backup (choose the “Back Up” button from the main Geneious toolbar). It is suggested that you back up your Geneious data regularly to avoid losing data.
7. Workflow considerations
One extraction can belong to many workflows (as extractions are often used as stock for many reactions). A workflow can contain any number of failed reactions and any number of passed reactions. The passed or failed status of a workflow for a given reaction type is taken from the most recent reaction of that type.
While each workflow can only contain reactions of a single locus, each locus can have any number of workflows (useful if multiple people are working independently on the same locus for the same extraction). Workflows are created with reactions. Any reaction (apart from extractions) that has an empty workflow field when saved will have a new workflow created for it. That means that it is particularly important that you fill in the workflow field correctly for all reactions that you save. Fortunately, this is easily accomplished in the “Bulk editor” (see Subheading 3.1.1 and Fig. 7). Clicking autodetect workflows in the tools drop-down will automatically fill in the workflow field for any reactions that have an available workflow (i.e., one with a matching extraction and locus).
Laboratory Information Management Systems for DNA Barcoding
Fig. 32. A graphical illustration representing how four 96-well plates are converted into a single 384-well plate.
If more than one matching workflow exists, the most recent one will be chosen. If no matching workflow exists, the workflow field will remain blank, and a new workflow will be created when you save the reaction. Reactions are entered into the Lab Bench Database by plates. Plates come in several sizes (48, 96, or 384 wells) or as a grouping of any number of reactions. Creating a plate opens that plate in the plate viewer.
8. Converting between 96- and 384-well plates
It is possible to create a new 384-well plate from a group of 96-well plates, and to create a group of 96-well plates from a 384-well plate (Fig. 32). Each 96-well plate corresponds to one quadrant in the 384-well plate. To create a 384-well plate, select up to four 96-well plates in Geneious, and click “New Reaction.” Select the “Create plate from existing document” checkbox, and choose 384-well plate. A panel will appear at the bottom of the dialog which will allow you to choose the quadrant to which each 96-well plate corresponds.
9. Creating custom cocktails
A cocktail is a recipe for the ingredients that will go into a reaction (excluding the primer). You can choose from a list of existing cocktails, or create your own. To create your own cocktail, click “Edit Cocktails,” then click new in the dialog, and enter the volumes and concentrations in the fields provided (Fig. 33). There is space for you to store one extra ingredient (both concentration and volume). Any additional information about your cocktail can be stored in the notes field. For safety reasons, you cannot modify or delete cocktails once they are created. You can create a copy of an existing cocktail by selecting it in the view, and then clicking “Add.”
Fig. 33. “Edit Cocktails” window.
The new cocktail will have all the same volumes and concentrations as the one you selected.
10. Creating thermocycler profiles
To create custom thermocycler profiles, click “View/Add Thermocycles” in the New PCR plate toolbar. When the “Edit Thermocycles” window opens, click the “Add” button in the lower left-hand corner of the window (Fig. 34). A New Thermocycle window will open, in which you can customize temperatures and cycles using the dialog boxes and “Edit Cycles” buttons (Fig. 35).
11. Creating Primers
To create a new primer in Geneious, click “Sequence” along the top of the main toolbar. In the Sequence drop-down menu, click “New Sequence.” The New Sequence window will open; at the bottom of the window, choose “Primer” from the Type drop-down menu. Then, enter your sequence and identifying information in the dialog boxes and click ok (Fig. 36). Primers set on reactions will be saved to the lab bench database so that they can be viewed by others without you needing to send them your primer library. If you have a large number of primers, you may want to organize your primers by type. You can do this by storing your primers in a folder structure in the Geneious service tree (for example, you could store all your primers for a particular locus in the same folder, or store your primers by taxonomy). This
Fig. 34. “Edit Thermocycles” window from a new PCR plate.
Fig. 35. A “New thermocycle” profile entry.
Fig. 36. “New Sequence” window for adding custom primers.
Fig. 37. Primer database accessed from the “Choose” button in the “Edit Wells” window.
folder structure will be displayed when you choose your primers when editing wells in your plate. To choose a primer (or primers) for your wells in the plate editor, select the wells you want to edit, and click “edit selected wells.” Select the primer you want to add to the reaction (Primer fields display a list of primers in your local database), and click the “Choose” button (Fig. 37). You can choose any primer from your database, and it will be applied to all the selected wells. Only primers you have set on wells are stored in the LIMS database. Primers that have not been set on wells exist only in your local copy of Geneious and cannot be seen by others accessing the LIMS.
12. Importing .ab1 files from disk, setting read directions, and batch renaming
To import traces from a disk, locate the .ab1 or .scf files on disk and then click and drag them from the file manager onto the new folder in Geneious. Alternatively, you can import from inside Geneious using File > Import > From File… in the menu. Once you have imported the raw trace files, it is currently necessary to tell Geneious which reads are in the forward or reverse direction. To set read directions, select all of either the forward or reverse reads from the ones you have imported and select Biocode > Set Read Direction in the toolbar. Choose either Forward or Reverse for the read direction and click OK. It is only necessary to mark either the forward or reverse reads; Geneious will work out the rest by process of elimination (this is so that the correct read is reversed during assembly and downstream steps are able to identify the direction of the reads). After performing this task, an extra column named “Is Forward Read” will be added to the reads, with a value of true or false. If your forward and reverse reads are in different folders, it is easiest to import all of the reads from one folder, then set the read direction for those, and then import the second folder.
If you want to change the names of your reads to reflect some aspect of the FIMS data, from the main toolbar select Edit > Batch Rename to copy your choice of fields into the name column. This feature is also available in renaming assemblies. You can also use Edit > Batch Rename… to add _F or _R to the names of your reads if the names do not have any indication of direction (not required). If you have imported both forward and reverse reads into Geneious before setting read direction, you can use Search or Filter in the top right corner of the Geneious window to locate a particular direction of read based on names.
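The "process of elimination" described in Note 12 can be illustrated with a small sketch. This is not Geneious code or its API: the function names `infer_read_directions` and `add_direction_suffix` and the example read names are hypothetical, and the sketch only mirrors the logic that marking one direction implies the other, followed by the optional _F/_R renaming.

```python
def infer_read_directions(all_reads, marked_reads, marked_direction="forward"):
    """Return {read_name: is_forward} for every read.

    Hypothetical helper: reads listed in `marked_reads` get `marked_direction`;
    every other read is assigned the opposite direction by elimination.
    """
    if marked_direction not in ("forward", "reverse"):
        raise ValueError("marked_direction must be 'forward' or 'reverse'")
    marked = set(marked_reads)
    unknown = marked - set(all_reads)
    if unknown:
        raise ValueError(f"marked reads not among imported reads: {sorted(unknown)}")
    marked_is_forward = marked_direction == "forward"
    # A read is forward if it was marked as forward, or left unmarked
    # while the marked set was declared reverse.
    return {name: (name in marked) == marked_is_forward for name in all_reads}


def add_direction_suffix(name, is_forward):
    """Append _F or _R unless the name already carries a direction suffix."""
    if name.endswith(("_F", "_R")):
        return name
    return name + ("_F" if is_forward else "_R")


# Example read names are invented for illustration.
reads = ["plate1_A01_a", "plate1_A01_b", "plate1_A02_a"]
directions = infer_read_directions(reads, ["plate1_A01_b"], marked_direction="reverse")
renamed = [add_direction_suffix(name, directions[name]) for name in reads]
```

The resulting dictionary plays the role of the “Is Forward Read” column: one boolean per read, derived from a single marking step.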
Fig. 38. “Annotate with FIMS/LIMS data” screen.
13. Annotating with FIMS/LIMS data
You can either enter a forward and reverse plate or use the annotated plate and well if you are updating sequences you have previously annotated. To aid downstream analysis and submission, it is extremely useful to annotate sequences with the associated data from the FIMS. This must be done pre-assembly (with the reads) because forward and reverse reads can come from different sequencing plates. Annotating is the first step in the assembly pipeline that utilizes the FIMS/LIMS database, so you will need to connect to the Biocode service before proceeding. To do this, right click on Biocode in the source panel on the left-hand side and select login. Select all of the reads which you imported and go to Biocode > Annotate with FIMS/LIMS Data in the toolbar. If you have plate data in your FIMS database and you do not wish to enter reaction information for your data in the LIMS, choose “Biocode > Annotate with FIMS data only…” (see below), and enter the name of your FIMS plate. You need to enter the forward and reverse sequencing plate names (from the LIMS) which correspond to your reads and indicate which part of the sequence names identifies the well location. If both forward and reverse reads are on a single plate, then you can leave the reverse plate field blank or enter the same name twice. Click OK and the operation will add many new columns to the table for each of the reads (Fig. 38). These include things like Specimen ID, Taxonomy, and Collector. The values should be identical for each forward and reverse pair of reads. Often, there will be reads which do not have entries in the FIMS due to sequencing results coming through from wells
Fig. 39. Example of Mean Coverage.
which were essentially empty. This operation will tell you about any of these and the extra columns will be left blank.
Annotating with FIMS data only: Please note that you will not get primer information for your sequences using this method, so you may have to annotate those yourself if you want to use the sequences to generate a GenBank submission.
Tip: To get the empty well reads out of your way, you can easily select them all by sorting the table by one of the FIMS attributes (e.g., Tissue ID) and then selecting the ones with no value. You can then either delete them or create a new subfolder called “empties” and move them into there. If you want to change the names of your reads to reflect some aspect of the FIMS data, you can use Edit > Batch Rename… to copy your choice of fields into the name column.
14. Mean Coverage
Mean coverage is one of the binning criteria for assemblies and is also available as a column in the table. It is also the least intuitive value, so a description follows. Coverage is the number of sequences that cover a given position in an alignment/assembly. Mean coverage is, therefore, the mean of this value across all positions in the alignment/assembly (Fig. 39). In the alignment shown in Fig. 39, the first two positions have a coverage of 1, the next five positions have a coverage of 2, and the last three have a coverage of 1 again. Mean coverage is, therefore, (2 × 1 + 5 × 2 + 3 × 1)/10 = 1.5. The mean coverage will be between 1 and the number of sequences in the alignment/assembly. For a pairwise assembly, a value of 2 means that the two reads overlap along their full length, and a value of 1 means that they do not overlap at all.
15. Taxonomic Verification Binning
Similar to the bin column that has been used for reads and assemblies, Bin columns in the Verify Taxonomy Results window summarize properties of the verification process by assigning each result a High, Medium, or Low value (in the form of a smiley).
Query: The name of the query assembly.
Query Taxon: The taxonomy of the query from the FIMS.
The verify operation fills in higher taxonomy by searching
NCBI taxonomy. If the taxon could not be found in NCBI, this will be noted and the result will be marked as Low bin.
Hit Taxon: The taxonomy of the top hit from BLAST. Levels in the taxonomies are marked as green or red depending on whether they match with the query.
Keywords: A user-defined list of keywords which are expected in the hit definition from BLAST. These are highlighted red or green depending on whether they are found in the definition.
Hit Definition: The definition of the top hit returned from BLAST with matching keywords highlighted.
Hit Length: Length of the hit alignment from BLAST, highlighted according to binning parameters (red, orange, or green).
Hit Identity: Identity of the hit alignment from BLAST, highlighted according to binning parameters (red, orange, or green).
Assembly Bin: The bin that was assigned to the assembly according to the previously mentioned binning parameters.
You can sort by any of the columns as usual and rearrange/resize them.
16. Useful example searches
Last Modified (LIMS) | Greater Than | 01 May 2010: all work done after the beginning of May.
Plate Name (LIMS) | Contains | “Plate1”: all plates which have the phrase “Plate1” somewhere in their name.
Locus | Contains | “COI”: all COI workflows and plates.
17. Most users should use the Geneious Bankit FTP account when submitting sequences. Larger research groups or sequencing centers may wish to create their own submission account, which can be done by contacting
[email protected].
Fig. 40. GenBank Submission Account.
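The mean coverage calculation of Note 14 can be reproduced with a short sketch. The function names and the half-open (start, end) interval convention are assumptions made for illustration; this is not how Geneious stores assemblies. The two intervals are chosen to match the worked example in Note 14 (ten positions, with two reads overlapping on the middle five).

```python
def coverage_per_position(read_intervals, alignment_length):
    """Count how many reads cover each alignment position.

    `read_intervals` holds (start, end) pairs, 0-based and half-open;
    this convention is an assumption of the sketch.
    """
    coverage = [0] * alignment_length
    for start, end in read_intervals:
        for position in range(start, end):
            coverage[position] += 1
    return coverage


def mean_coverage(coverage):
    """Mean of the per-position coverage values."""
    if not coverage:
        raise ValueError("alignment has no positions")
    return sum(coverage) / len(coverage)


# One read spans positions 0-6, the other positions 2-9.
per_position = coverage_per_position([(0, 7), (2, 10)], alignment_length=10)
value = mean_coverage(per_position)  # (2*1 + 5*2 + 3*1) / 10 = 1.5
```

For a pairwise assembly the result runs from 1 (reads placed end to end with no overlap) to 2 (complete overlap), which is why the value is useful as a binning criterion.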
Chapter 14
DNA Extraction, Preservation, and Amplification
Thomas Knebelsberger and Isabella Stöger
Abstract
The effectiveness of DNA barcoding as a routine practice in biodiversity research is strongly dependent on the quality of the source material, DNA extraction method, and selection of adequate primers in combination with optimized polymerase chain reaction (PCR) conditions. For the isolation of nucleic acids, silica-gel membrane methods are to be favored because they are easy to handle, applicable for high sample throughput, relatively inexpensive, and provide high DNA quality, quantity, and purity which are prerequisites for successful PCR amplification and long-term storage of nucleic acids in biorepositories, such as DNA banks. In this section, standard protocols and workflow schemes for sample preparation, DNA isolation, DNA storage, PCR amplification, PCR product quality control, and PCR product cleanup are proposed and described in detail. A PCR troubleshooting and primer design section may help to solve problems that hinder successful amplification of the desired barcoding gene region.
Key words: DNA barcoding, DNA extraction, DNA preservation, PCR amplification, Agarose gel electrophoresis, PCR cleanup
1. Introduction The extraction of genomic DNA requires careful sample preparation, followed by tissue lysis and isolation of the nucleic acids. The lysis of the tissue samples is performed by applying enzymatic digestion, commonly with Proteinase K, which degrades proteins and rapidly inactivates nucleases that might otherwise degrade DNA during isolation and purification. After digestion, nucleic acids are separated from all other remaining cellular components. The condition of the biological source material plays a pivotal role for the quality, quantity, and purity of the extracted DNA. Therefore, appropriate tissue or sample storage after collecting the biological source material in the field is required. Besides the use of recently collected and preserved organisms, museum
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_14, © Springer Science+Business Media, LLC 2012
specimens represent a convenient source for species-wide samplings. But nucleic acids might be highly degraded due to either extensive exposure to killing agents, like acetate, ethyl alcohol, or cyanide (1–3), or sample storage under inappropriate conditions; fixatives, like formaldehyde or other aldehyde mixtures often used in museums to preserve biological material, degrade DNA, which makes the extraction of utilizable nucleic acids quite challenging (4–8). For routine DNA extraction, the methods based on silica-gel membrane technology have proven to yield DNA of high quantity and quality (9). In comparison to other extraction technologies, these methods are easy to handle, relatively inexpensive, and allow high sample throughput (96-well format). After enzymatic digestion of the samples, nucleic acids are adsorbed to a silica-gel membrane in the presence of highly concentrated chaotropic salts. Fragment lengths up to 20 kb can be recovered in high purity usable for downstream applications as well as for long-term storage. Other DNA extraction technologies, like salting out precipitation or anion exchange methods, yield DNA fragment lengths up to 150 kb but are very expensive and time consuming. Fast and easy DNA extractions can be performed by inhibitor binding by sorbent technology or by the use of chelating resin, but both methods deliver DNA of poor quality and further applications might be hindered due to compounds still present in the DNA solution. Even though for DNA barcoding standardized DNA extraction protocols can be used for a broad range of taxa, some groups are still left problematic. Especially for taxa containing high quantities of polysaccharides, mucopolysaccharides, polyphenols, resins, or other secondary metabolites, substances known for binding firmly to nucleic acids during DNA extraction procedure and/or interfering with subsequent reactions, specialized protocols were suggested (10–15). 
DNA isolation methods for small taxa, for example nematodes (16), tardigrades (17, 18), copepods (19), collembolans (20), or mites (21–23), as well as DNA isolation methods for fungi (24–26) or plants (12, 27–32) may help to recover DNA of sufficient quantity and quality. Besides the extraction of DNA, high-quality long-term storage of the DNA samples, in order to conserve the genetic resources, verify the already existing results, or conduct further analyses, is challenging. Although in biological research millions of DNA samples are currently being processed and DNA and tissue banks have been founded all around the world, the process of DNA degradation during storage has barely been investigated and only a few studies dealing with this subject are available (e.g., 33–38). Currently, there is no consensus on the optimal DNA storage conditions, but several commercial products are already available which allow storage of small amounts of dehydrated DNA at room temperature. These products are based on the natural principle of anhydrobiosis, which can be found in tardigrades, using a mixture of dissolvable
compounds, e.g., trehalose, that stabilize DNA for storage at room temperature (35). For the long-term storage of higher amounts of DNA, a combination of both appropriate preserving agents and low storage temperatures (−20°C or lower) is needed to minimize the loss of DNA quality during storage. In DNA barcoding, DNA extracts are used for amplification of a specific predefined gene region by using the technique of the polymerase chain reaction (PCR), which utilizes short, user-defined DNA sequences called oligonucleotide primers. In the first step, the DNA double helix is denatured by heating into single-stranded template DNA, to which the primers can then bind. The thermostable enzyme DNA polymerase starts to extend the primers by adding single deoxynucleotide triphosphates (dNTPs), producing new double-stranded DNA. This process is performed in a thermocycler and has to be repeated several times to increase the number of the target fragments exponentially. The quality of the PCR products is commonly checked by agarose gel electrophoresis. Before sequencing, PCR products have to be purified to eliminate the remaining PCR ingredients. The use of suitable primers is essential for amplification success. Barcoding primers should correspond to conserved sites with low substitution rates so that they can be applied to a broad range of taxa. Such “universal” primers amplifying an approximately 650-bp-long fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene were first defined by Folmer et al. (39) and then suggested as barcoding primers for the whole animal kingdom (40). In the course of time, it turned out that these primers are not applicable for all animal taxa, and more and more group-specific ones were additionally suggested. In case of sponges (www.spongebarcoding.org), plants (41), or fungi (42), even new or additional barcoding regions were defined to get a resolution on species level.
Although a tremendous variety of primers is now available, the design of new primers or the adjustment of existing primers remains necessary for successful and effective DNA barcoding.
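The exponential increase of target fragments mentioned in the Introduction can be made concrete with the usual idealized model, in which each cycle multiplies the copy number by (1 + E), with E the per-cycle efficiency between 0 and 1. The sketch below is only an illustration under that assumption: the function name `pcr_copies` and the example numbers are hypothetical, and real reactions reach a plateau that this model ignores.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized amplicon count after a number of PCR cycles.

    Assumes a constant per-cycle efficiency (1.0 means perfect doubling)
    and no plateau phase; both are simplifications.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Hypothetical example: 100 template copies, 30 cycles.
ideal = pcr_copies(100, 30)              # perfect doubling each cycle
realistic = pcr_copies(100, 30, 0.9)     # assumed 90% efficiency
```

Even a modest drop in per-cycle efficiency reduces the final yield severalfold, which is one reason primer fit and template quality matter so much for amplification success.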
2. Materials
2.1. DNA Extraction
1. DNeasy Blood and Tissue Kit single columns and DNeasy 96 Blood and Tissue Kit (Qiagen): Buffers ATL, AL, AW1, AW2, AE, and Proteinase K are included in the kit. 2. NucleoSpin® Tissue Kit single columns and NucleoSpin® 96 Tissue Kit (Macherey-Nagel): Buffers T1, B1, B2, BW, B5, BE, PB, BQ1, and Proteinase K are included in the kit. 3. NucleoSpin® Plant II Kit single columns and NucleoSpin® 96 Plant II Kit (Macherey-Nagel): Buffers PL1, PC, PW1, PW2, PE, and RNase A are included in the kit (see Note 1).
4. CTAB buffer: 100 ml of 1 M Tris–HCl (pH 8.0), 280 ml of 5 M NaCl, 40 ml of 0.5 M ethylenediaminetetraacetic acid (EDTA), 20 g cetyltrimethylammonium bromide (CTAB).
5. TE buffer: 10 mM Tris–HCl, 1 mM EDTA pH 8.0.
6. Chloroform-isoamyl-alcohol: Chloroform:Isoamyl-alcohol, 24:1 (can be ordered from different companies).
2.2. DNA Preservation
1. 2 M trehalose stock solution: Dissolve 7.6 g trehalose (D-(+)-trehalose dihydrate (Sigma Aldrich)) in 10 ml molecular-grade water.
2. Qiasafe tubes and plates (Qiagen).
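The stated mass for the 2 M trehalose stock can be checked with a short calculation. The sketch below assumes a molar mass of 378.33 g/mol for trehalose dihydrate and treats the 10 ml as the final solution volume; both are assumptions of this example rather than statements from the protocol, and the function names are hypothetical.

```python
TREHALOSE_DIHYDRATE_G_PER_MOL = 378.33  # assumed molar mass of the dihydrate


def molarity(mass_g, molar_mass_g_per_mol, volume_ml):
    """Molar concentration (mol/l) of a solute in a given final volume."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    moles = mass_g / molar_mass_g_per_mol
    return moles / (volume_ml / 1000.0)


def mass_for_molarity(target_molar, molar_mass_g_per_mol, volume_ml):
    """Mass (g) of solute needed for a target molarity in a final volume."""
    return target_molar * molar_mass_g_per_mol * volume_ml / 1000.0


stock_molar = molarity(7.6, TREHALOSE_DIHYDRATE_G_PER_MOL, 10.0)      # about 2.0
grams_for_50_ml = mass_for_molarity(2.0, TREHALOSE_DIHYDRATE_G_PER_MOL, 50.0)
```

The same two helpers can be used to scale the stock to a different volume, as in the second call.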
2.3. DNA Amplification
1× TBE buffer: TBE is used as running buffer and for dissolving the agarose powder. To prepare the buffer, 10× TBE can be ordered (Rotiphorese 10× TBE buffer, Roth) and diluted to a 1× TBE buffer solution. Alternatively, prepare 1 l of 10× TBE buffer by mixing 108 g Tris, 55 g boric acid, and 7.4 g Na-EDTA in a beaker together with 500 ml demineralized water and heating for 20 min at 60°C (use a magnetic stir bar). Filter the buffer, transfer the solution to a bottle, and fill up to 1 l.
2.4. PCR Cleanup
1. NucleoSpin® Extract II Kit (Macherey-Nagel): Buffers NT3, NT, and NE are included in the kit.
2. Ethanol precipitation: Ethanol 100% and 70%, 3 M sodium acetate.
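Scaling the 10× TBE recipe to another batch size, and diluting the stock to 1× working buffer, are simple proportional calculations. The sketch below uses the per-liter masses given for 10× TBE in Subheading 2.3; the function names and the example volumes are hypothetical.

```python
# Masses per liter of 10x TBE, as listed in Subheading 2.3.
TBE_10X_G_PER_L = {"Tris": 108.0, "boric acid": 55.0, "Na-EDTA": 7.4}


def scale_recipe(recipe_g_per_l, final_volume_ml):
    """Masses (g) of each component for a batch of the given final volume."""
    factor = final_volume_ml / 1000.0
    return {name: grams * factor for name, grams in recipe_g_per_l.items()}


def dilution_volumes(stock_x, working_x, final_volume_ml):
    """Volumes (ml) of stock and water for a dilution, from C1*V1 = C2*V2."""
    if working_x > stock_x:
        raise ValueError("working strength cannot exceed stock strength")
    stock_ml = final_volume_ml * working_x / stock_x
    return stock_ml, final_volume_ml - stock_ml


half_batch = scale_recipe(TBE_10X_G_PER_L, 500.0)    # 500 ml of 10x TBE
stock_ml, water_ml = dilution_volumes(10, 1, 1000.0)  # 1 l of 1x TBE
```

For 1 l of 1× running buffer this gives 100 ml of 10× stock made up with 900 ml of demineralized water.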
3. Methods
3.1. DNA Extraction: Source Material
Fresh material, if not immediately used for DNA extraction, should be directly frozen (at −20°C or lower) or fixed and preserved in 96% pure ethanol (large specimens can be subsampled) and stored at −20°C or below. Preserving agents, like DESS (43) or RNAlater (RNAlater RNA Stabilization Reagent, Qiagen), have also been proven to prevent DNA degradation during sample storage (see Note 2). Vascular plants, algae, and fungi may be better dried rapidly on silica gel and then stored in a dry, cool, and dark place. Old material from museum collections exhibits dramatic DNA degradation (Fig. 1). Whenever possible, use recent material for DNA extraction.
3.2. DNA Extraction: Sample Preparation
1. Decontaminate workbench from any DNA using agents, like DNA Exitus Plus (BioChem) (see Note 3). 2. Wear new gloves. 3. Decontaminate all other tools (forceps, scalpels, etc.) from any DNA by flame (Bunsen burner) between processing each sample.
Fig. 1. DNA extracts of Phalera bucephala (Lepidoptera) performed with Qiagen Blood and Tissue Kit on agarose gel. Dried specimens were taken from a museum collection collected in 2007 (1), 1971 (2), and 1935 (3). Number (1) contains DNA fragments up to 20 kb, whereas numbers (2) and (3) show dramatic DNA degradation with fragments between 100 and 300 bp.
4. Use 1.5-ml microcentrifuge tubes (e.g., Eppendorf ) for lysis (not included in commercially available kits); in case of plate extractions (96-well format), special lysis plates (deep-well plates) are provided (see Note 4). 5. Transfer a predefined (see DNA extraction protocols) amount of tissue into lysis tubes or deep-well plates (see Note 5). 6. Air dry EtOH-preserved tissue until alcohol is evaporated completely by placing the tubes with open caps in a Thermomixer
at 40°C. Dried, fresh, or frozen material can be used directly for DNA extraction (see Note 6).
7. Clean workbench again before extraction procedure and wear new gloves.
8. Use filter tips and change pipette tips between each reagent. Do not touch the surface with the tip. Always keep the pipette in an upright position.
3.3. DNA Extraction: Protocol Overview
For DNA isolation from animal and fungal samples, DNeasy Blood and Tissue Kit (Qiagen) (Subheadings 3.4 and 3.5) or NucleoSpin® Tissue Kit (Macherey-Nagel) (Subheadings 3.6 and 3.7) is recommended. The latter might be preferred especially for DNA extractions of arthropod samples. For plant samples, NucleoSpin® Plant II Kit (Macherey-Nagel) (Subheadings 3.8 and 3.9) achieves optimal DNA yields. All these kits are available in convenient single preparation as well as in 96-well plate format for high-throughput extractions. Alternatively, in case of mollusc, fungal, and algal taxa containing high amounts of mucopolysaccharides, better results might be achieved using the proposed CTAB protocol (Subheading 3.10) (see Note 7).
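The recommendations in this overview can be summarized as a small lookup. The sketch below is only a reading aid: the sample-type keys, the function name `suggest_protocol`, and the decision to raise an error for sample types that the overview does not cover are all choices made for this example, while the kit names and subheading numbers are taken from the paragraph above.

```python
TISSUE_KITS = (
    "DNeasy Blood and Tissue Kit (Qiagen) or NucleoSpin Tissue Kit (Macherey-Nagel)",
    ("3.4", "3.5", "3.6", "3.7"),
)

# Hypothetical sample-type keys mapped to (kit, subheadings).
RECOMMENDATIONS = {
    "animal": TISSUE_KITS,
    "mollusc": TISSUE_KITS,
    "fungus": TISSUE_KITS,
    "arthropod": ("NucleoSpin Tissue Kit (Macherey-Nagel)", ("3.6", "3.7")),
    "plant": ("NucleoSpin Plant II Kit (Macherey-Nagel)", ("3.8", "3.9")),
}

# Groups for which the overview proposes CTAB when mucopolysaccharides are high.
CTAB_CANDIDATES = {"mollusc", "fungus", "alga"}


def suggest_protocol(sample_type, mucopolysaccharide_rich=False):
    """Return (kit or method, subheadings) following the protocol overview."""
    if mucopolysaccharide_rich and sample_type in CTAB_CANDIDATES:
        return ("CTAB protocol", ("3.10",))
    if sample_type not in RECOMMENDATIONS:
        raise ValueError(f"no recommendation in the overview for {sample_type!r}")
    return RECOMMENDATIONS[sample_type]


arthropod_choice = suggest_protocol("arthropod")
slimy_mollusc_choice = suggest_protocol("mollusc", mucopolysaccharide_rich=True)
```

Keeping the mapping in one table makes it easy to see which groups share a protocol and where the CTAB alternative applies.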
3.4. DNA Extraction: Protocol 1
DNA isolation from animal and fungal tissue with DNeasy Blood and Tissue Kit (Qiagen) single columns (see Note 8).
1. Adjust water bath or Thermomixer to 56°C (see Note 9). Add ethanol (96–100%; not provided with the kit) to Buffers AW1 and AW2 as indicated on the bottles.
2. Place up to 25 mg of tissue in a 1.5-ml microcentrifuge tube (not provided).
3. Add 180 µl Buffer ATL and 20 µl Proteinase K and mix by inverting or vortexing. Briefly centrifuge samples at 3,000 × g. Incubate samples at 56°C for a few hours or overnight until tissue is lysed (see Note 10).
4. Vortex for 15 s. Briefly centrifuge samples at 3,000 × g. Add 200 µl Buffer AL to the lysate.
5. Immediately add 200 µl ethanol (96–100%), and mix by vortexing. Briefly centrifuge samples at 3,000 × g.
6. Transfer the mixture from step 5 into the DNeasy Mini spin column placed in a 2-ml collection tube (provided). Centrifuge at 6,000 × g for 1 min. Discard flow through and collection tube.
7. Place the DNeasy Mini spin column in a new 2-ml collection tube. Add 500 µl Buffer AW1 and centrifuge for 1 min at 6,000 × g. Discard flow through and collection tube.
8. Place the DNeasy Mini spin column in a new 2-ml collection tube. Add 500 µl Buffer AW2 and centrifuge for 3 min at 20,000 × g. Discard flow through and collection tube.
9. Place the DNeasy Mini spin column in a clean 1.5-ml microcentrifuge tube (not provided) and add 100 µl Buffer AE directly onto the DNeasy membrane (see Note 11). Incubate at room temperature for 1 min, and then centrifuge for 1 min at 6,000 × g to elute DNA.
10. Repeat step 9 with the same microcentrifuge tube. Alternatively, a new microcentrifuge tube can be used for the second elution step to prevent dilution of the first eluate.
11. DNA should be immediately stored at −20°C.
3.5. DNA Extraction: Protocol 2
DNA isolation from animal and fungal tissue with DNeasy 96 Blood and Tissue Kit (Qiagen) (see Note 8).
1. Adjust water bath or Thermomixer to 56°C (see Note 9). Add ethanol (96–100%; not provided with the kit) to Buffers AL, AW1, and AW2 as indicated. If multichannel pipettes are used (recommended), sterilized reservoirs are required.
2. Place up to 20 mg of tissue in each collection microtube (96-well format, provided).
3. Add 180 µl Buffer ATL and 20 µl Proteinase K to each sample. Seal the collection microtubes properly with the provided cap strips. Mix by inversion. Briefly centrifuge up to 1,500 × g. Incubate samples at 56°C for a few hours or overnight until tissue is lysed (see Note 10).
4. Ensure that the microtubes are still properly sealed and mix by inversion for 15 s. Briefly centrifuge racks at 1,500 × g. Carefully remove the caps. Add 410 µl Buffer AL–ethanol to each sample. Seal collection microtubes with new cap strips and mix by inversion for 15 s.
5. Briefly centrifuge racks up to 1,500 × g. Place two DNeasy 96 plates on top of S-Blocks (provided).
6. Remove the first cap strip from the collection microtubes and carefully transfer the lysate to the DNeasy 96 plates. Continue with the next eight samples and so on until all samples are transferred. Seal DNeasy 96 plates with AirPore Tape Sheets (provided). Centrifuge for 10 min at 6,000 × g.
7. Remove the tape, and check that all of the lysate has passed through the membrane in each well of the DNeasy 96 plates. If lysate remains in any of the wells, centrifuge for a further 10 min.
8. Remove the tape and carefully add 500 µl Buffer AW1 to each well. Seal each DNeasy 96 plate with a new AirPore Tape Sheet. Centrifuge for 5 min at 6,000 × g.
9. Remove the tape. Carefully add 500 µl Buffer AW2 to each well. Centrifuge for 15 min at 6,000 × g. Do not seal the plate
with AirPore Tape Sheet in this step to allow evaporation of residual ethanol.
10. Place each DNeasy 96 plate in the correct orientation on a rack of Elution Microtubes RS (provided).
11. To elute the DNA, add 150 µl Buffer AE to each sample (see Note 11), and seal the DNeasy 96 plate with a new AirPore Tape Sheet. Incubate for 1 min at room temperature (15–25°C). Centrifuge for 2 min at 6,000 × g.
12. Repeat step 11 with another 150 µl Buffer AE. Use appropriate cap strips (provided) to seal the Elution Microtubes RS for storage.
13. DNA should be immediately stored at −20°C.
3.6. DNA Extraction: Protocol 3
DNA isolation from animal and fungal tissue with NucleoSpin® Tissue Kit (Macherey-Nagel) single columns (see Note 8).
1. Prepare Buffer B3 by transferring Buffer B1 into Buffer B2 (see Note 12). Dissolve Proteinase K (lyophilized) by adding the volume of Proteinase Buffer (PB) that is indicated on the Proteinase K label (see Note 13). Add the appropriate volume (see label on Buffer B5 bottle) of ethanol (96–100%; not provided with the kit) to Buffer B5 before use. Adjust water bath or Thermomixer to 56°C (see Note 9). After tissue lysis, prepare a 70°C water bath and warm Buffer BE (elution buffer) to 70°C before use.
2. Place up to 25 mg of tissue in a 1.5-ml microcentrifuge tube (not provided).
3. Add 180 µl of Buffer T1 and 25 µl of Proteinase K to each sample and mix by inverting or vortexing. Briefly centrifuge samples at 3,000 × g. Incubate samples at 56°C for a few hours or overnight until tissue is lysed (see Note 10).
4. Vortex for 15 s. Briefly centrifuge samples at 3,000 × g. Add 200 µl of Buffer B3. Vortex and incubate at 70°C for 10 min.
5. Briefly centrifuge samples at 3,000 × g. Add 210 µl of 96–100% ethanol and vortex immediately.
6. Briefly centrifuge samples at 3,000 × g. Transfer the mixture from step 5 into the NucleoSpin column placed in a 2-ml collection tube (provided). Centrifuge at 11,000 × g for 1 min. Discard flow through.
7. Add 500 µl of Buffer BW to the spin column. Centrifuge at 11,000 × g for 1 min. Discard flow through.
8. Add 600 µl of Buffer B5 to the spin column. Centrifuge at 11,000 × g for 1 min. Discard the flow through. Centrifuge again at 11,000 × g for 1 min to remove the residual Buffer B5.
9. Place the NucleoSpin column in a clean 1.5-ml microcentrifuge tube (not provided) and pipette 100 µl of Buffer BE (warmed to 70°C) onto the NucleoSpin membrane (do not touch the membrane) (see Note 11). Incubate at room temperature for 1 min, and then centrifuge at 11,000 × g for 1 min to elute the DNA.
10. Repeat step 9 with the same microcentrifuge tube. Alternatively, use a new microcentrifuge tube for the second elution step to prevent dilution of the first eluate.
11. DNA should be immediately stored at −20°C.
3.7. DNA Extraction: Protocol 4
DNA isolation from animal and fungal tissue with NucleoSpin® 96 Tissue Kit (Macherey-Nagel) (see Note 8).
1. Dissolve Proteinase K (lyophilized) by adding the volume of Proteinase Buffer (PB) that is indicated on the Proteinase K label and store at −20°C (see Note 13). Add the appropriate volume (see label on Buffer B5 bottle) of ethanol (96–100%; not provided with the kit) to Buffer B5 before use. Set the water bath or Thermomixer to 56°C (see Note 9). After tissue lysis, preheat the incubator to 70°C and warm Buffer BE (elution buffer) to 70°C before use.
2. Place up to 20 mg of tissue into each well of a Round-well Block (provided).
3. Add 180 µl Buffer T1 and 25 µl of Proteinase K to each sample. Seal wells properly with cap strips and mix by inverting for 10–15 s. Spin briefly for 15 s at 1,500 × g. Incubate samples at 56°C for a few hours or overnight until the tissue is lysed (see Note 10).
4. Ensure that the microtubes are still properly sealed and centrifuge the Round-well Block for 15 s at 1,500 × g. Carefully remove cap strips. Add 200 µl of Buffer BQ1 and 200 µl of 96–100% ethanol to each sample. Close the wells with new cap strips and mix by inversion for 10–15 s. Centrifuge racks for 10 s at 1,500 × g.
5. Place each NucleoSpin plate onto an MN Square-well Block (provided).
6. Remove the first cap strip from the first eight wells and carefully transfer the lysate to the NucleoSpin plates. Continue with the next eight samples and so on until all samples are transferred. Seal each NucleoSpin plate with adhesive PE foil (provided). Centrifuge for 10 min at 5,600 × g.
7. Remove the foil, and check that all of the lysate has passed through the membrane in each well of the NucleoSpin plates. If lysate remains in any of the wells, centrifuge for a further 10 min.
8. Carefully add 500 µl of Buffer BW to each well. Seal each plate with a new adhesive PE foil. Centrifuge for 2 min at 5,600 × g.
9. Remove the adhesive PE foil. Carefully add 700 µl of Buffer B5 to each well. Seal the plate with a new adhesive PE foil. Centrifuge for 4 min at 5,600 × g.
10. Remove the adhesive PE foil. Place NucleoSpin plates on an opened rack with tube strips and incubate for 10 min at 70°C in an incubator to evaporate residual ethanol.
11. To elute DNA, dispense 100 µl of prewarmed (70°C) Buffer BE (elution buffer) into each well directly onto the membrane of the NucleoSpin plates (see Note 11). Incubate at room temperature for 1 min. Centrifuge for 2 min at 5,600 × g.
12. Repeat step 11 with another 100 µl Buffer BE.
13. Remove the NucleoSpin plate and seal the tube strips. DNA should be immediately stored at −20°C.
3.8. DNA Extraction: Protocol 5
DNA isolation from plant tissue with NucleoSpin® Plant II Kit (Macherey-Nagel) single columns (see Note 8).
1. Add the appropriate volume (see label on Buffer PW2 bottle) of ethanol (96–100%; not provided with the kit) to Buffer PW2 before use. Dissolve RNase A (lyophilized) by adding the volume of molecular water that is indicated on the RNase A label and store at −20°C (see Note 14). Set the water bath or Thermomixer to 65°C (see Note 9). After tissue lysis, preheat the incubator to 70°C and warm Buffer PE (elution buffer) to 70°C before use.
2. Homogenize 50 mg (up to 100 mg) wet-weight or 10 mg (up to 20 mg) dry-weight (lyophilized) plant material (see Note 15).
3. Transfer the resulting powder to a new 1.5-ml microcentrifuge tube (not provided) and add 400 µl Buffer PL1. Vortex the mixture thoroughly. Add 10 µl RNase A solution and mix the sample thoroughly. Incubate the suspension for 10 min at 65°C.
4. Briefly centrifuge samples at 3,000 × g. Place a NucleoSpin® Filter column (violet ring) into a 2-ml collection tube and load the lysate onto the column. Centrifuge for 2 min at 11,000 × g, collect the clear flow through (see Note 16), and discard the NucleoSpin® Filter.
5. Add 450 µl Buffer PC and mix thoroughly by vortexing.
6. Briefly centrifuge samples at 3,000 × g. Place a NucleoSpin® Plant II Column (green ring) into a new 2-ml collection tube and load a maximum of 700 µl of the sample. Centrifuge for 1 min at 11,000 × g and discard the flow through (see Note 17).
7. Add 400 µl Buffer PW1 to the NucleoSpin® Plant II Column. Centrifuge for 1 min at 11,000 × g and discard the flow through.
8. Add 700 µl Buffer PW2 to the NucleoSpin® Plant II Column. Centrifuge for 1 min at 11,000 × g and discard the flow through.
9. Add another 200 µl Buffer PW2 to the NucleoSpin® Plant II Column. Centrifuge for 2 min at 11,000 × g to remove the wash buffer and dry the silica membrane completely.
10. Place the NucleoSpin® Plant II Column into a new 1.5-ml microcentrifuge tube (not provided). Pipette 50 µl Buffer PE (preheated to 70°C) onto the membrane. Incubate the NucleoSpin® Plant II Column for 5 min at 70°C. Centrifuge for 1 min at 11,000 × g to elute the DNA.
11. Repeat step 10 with another 50 µl Buffer PE (preheated to 70°C) and elute into the same tube.
12. DNA should be immediately stored at −20°C.
3.9. DNA Extraction: Protocol 6
DNA isolation from plant tissue with NucleoSpin® 96 Plant II Kit (Macherey-Nagel) (see Note 8).
1. Add the appropriate volume (see label on Buffer PW2 bottle) of ethanol (96–100%; not provided with the kit) to Buffer PW2 before use. Dissolve RNase A (lyophilized) by adding the volume of molecular water that is indicated on the RNase A label and store at −20°C (see Note 14). Set the water bath or Thermomixer to 65°C (see Note 9). After tissue lysis, preheat the incubator to 70°C and warm Buffer PE (elution buffer) to 70°C before use.
2. Homogenize 50 mg (up to 100 mg) wet-weight or 10 mg (up to 20 mg) dry-weight (lyophilized) plant material (see Note 15) in each tube of the tube strips.
3. Add 500 µl Buffer PL1 and 10 µl RNase A to each sample. Close tubes using cap strips (provided). Mix vigorously by shaking for 15–30 s. Centrifuge briefly for 30 s at 1,500 × g. Incubate samples at 65°C for 30 min.
4. Centrifuge the samples for 20 min at 5,600 × g. Remove cap strips.
5. Predispense 450 µl Binding Buffer PC into each well of an MN Square-well Block. Add 400 µl cleared lysate of each sample and mix by pipetting up and down at least three times.
6. Place the NucleoSpin® Plant II Binding Plate on an MN Square-well Block. Transfer samples from the previous step into the wells of the NucleoSpin® Plant II Binding Plate. Do not moisten the rims of the individual wells while dispensing the samples.
7. Place the NucleoSpin® Plant II Binding Plate stacked on an MN Square-well Block in the rotor buckets. Centrifuge at 5,600 × g for 5 min.
8. Add 400 µl PW1 to each well of the NucleoSpin® Plant II Binding Plate. Optional: Seal plate with a gas-permeable foil. Centrifuge again at 5,600 × g for 2 min. Place the NucleoSpin® Plant II Binding Plate on a new MN Square-well Block.
9. Add 700 µl PW2 to each well of the NucleoSpin® Plant II Binding Plate. Optional: Seal plate with a gas-permeable foil. Centrifuge again at 5,600 × g for 2 min.
10. Add 700 µl PW2 to each well of the NucleoSpin® Plant II Binding Plate. Optional: Seal plate with a gas-permeable foil. Centrifuge again at 5,600 × g for 10 min for complete removal of residual Buffer PW2.
11. Place the NucleoSpin® Plant II Binding Plate on the rack with tube strips. Dispense 100 µl Buffer PE (preheated to 70°C) directly onto the membrane of each well of the NucleoSpin® Plant II Binding Plate. Incubate at room temperature for 2 min. Centrifuge at 5,600 × g for 2 min.
12. Repeat step 11 with another 100 µl Buffer PE (preheated to 70°C) and elute into the same rack with tube strips.
13. DNA should be immediately stored at −20°C.
3.10. DNA Extraction: Protocol 7
DNA isolation from tissue containing high amounts of mucopolysaccharides with the CTAB method.
1. Set the water bath or Thermomixer to 55°C (see Note 9). Mark 1.5-ml microcentrifuge tubes containing 25 µl 3 M ammonium acetate + 600 µl 70% ethanol per sample and precool them at 4°C. Mark another two sets of tubes according to the number of samples. Precool 70% ethanol.
2. Place the tissue sample into one set of marked tubes. Grind the sample (5–20 mg tissue) if necessary or let the residual ethanol evaporate.
3. Add 300 µl CTAB buffer and 0.6 µl β-mercaptoethanol per sample (see Note 18). Perform this step under a fume hood.
4. Add 10 µl Proteinase K (20 mg/ml, Qiagen) to each sample and mix carefully.
5. Incubate samples at 55°C for a few hours or overnight until the tissue is lysed (see Note 10).
6. Add 300 µl chloroform-isoamyl-alcohol (24:1) and mix well by shaking tubes for 2 min. Perform this step under a fume hood. Proteins are precipitated now.
7. Centrifuge for 10 min at 11,000 × g. Pipette the supernatant into another set of clean tubes.
8. Add another 300 µl chloroform-isoamyl-alcohol (24:1) to the new set of tubes including the supernatant from step 7 and mix
well by shaking tubes for 2 min. Perform this step under a fume hood. Remaining proteins are precipitated now.
9. Centrifuge for 10 min at 12,000 × g. Pipette the supernatant into the set of clean tubes containing cold 25 µl 3 M ammonium acetate + 600 µl 70% ethanol. The supernatant contains the DNA, which is precipitated in this step.
10. Centrifuge for 10 min at 12,000 × g. Pour or pipette off the liquid, being careful not to touch or lose the DNA pellet (see Note 19).
11. Add 250 µl cold 70% ethanol and mix to wash the DNA pellet.
12. Centrifuge for 10 min at 12,000 × g.
13. Pour or pipette off the liquid. Dry the pellet in the incubator for 5–10 min (at 60°C).
14. Dissolve the pellet in 50 µl TE buffer.
15. DNA should be immediately stored at −20°C.
3.11. DNA Preservation: Protocol 1
As in the DNA extraction protocols, use only buffers to elute or dissolve DNA (see Note 20). For storage, DNA isolates should be divided into two (or more) aliquots. One aliquot serves as a backup for long-term storage at −80°C. The other aliquot(s) can be kept as working solution at −20°C to be used for PCR amplification (see Note 21). The best method to preserve high DNA quality of the backup aliquot is the use of QIAsafe DNA Tubes (Qiagen), which contain a mixture of dissolvable compounds that stabilize DNA (see Note 22). For sample storage, proceed according to the following steps.
1. Pipette up to 50 µl of the DNA solution (not more than 30 µg DNA) onto the colored DNA-protecting matrix of the QIAsafe DNA Tubes.
2. Dry samples with a vacuum concentrator (1 h at 55°C for 20 µl DNA solution) or under a laminar flow hood at room temperature (12 h for 20 µl of DNA solution).
3. Seal completely dried samples and preferably store QIAsafe DNA Tubes at −80°C (see Note 23).
4. To recover DNA, dissolve the pellet of dried protection matrix including DNA in an appropriate volume of molecular water. The DNA solution can immediately be used for PCR or other applications.
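The loading limits in step 1 (up to 50 µl and not more than 30 µg DNA per tube) can be checked before pipetting. The following is a minimal sketch; the function name and the idea of deriving the DNA mass from a measured concentration in ng/µl are illustrative assumptions, not part of the manufacturer's protocol.

```python
# Illustrative helper (hypothetical name): checks a sample against the
# loading limits quoted in step 1 above.
MAX_VOLUME_UL = 50.0   # maximum volume per tube, in microliters
MAX_DNA_UG = 30.0      # maximum DNA mass per tube, in micrograms


def fits_storage_tube(volume_ul, conc_ng_per_ul):
    """Return True if volume and total DNA mass are within both limits."""
    dna_ug = volume_ul * conc_ng_per_ul / 1000.0  # ng -> micrograms
    return volume_ul <= MAX_VOLUME_UL and dna_ug <= MAX_DNA_UG


if __name__ == "__main__":
    # 20 microliters at 100 ng/microliter contains 2 micrograms of DNA.
    print(fits_storage_tube(20, 100))
```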
3.12. DNA Preservation: Protocol 2
QIAsafe DNA Tubes are relatively expensive. Therefore, we suggest a further professional method to store DNA using the preserving agent trehalose.
1. Transfer 90 µl DNA extract to sterile and temperature-stable storage tubes (e.g., Rotilabo®-microcentrifuge tubes, Roth).
2. Add 10 µl of trehalose stock solution (2 M) to obtain a final concentration of 200 mM in the DNA sample (see Note 24).
3. Dry samples with a vacuum concentrator at 55°C (this may take a few hours).
4. Store sealed samples at −80°C (see Note 25).
5. To recover DNA, dissolve the pellet in an appropriate volume of molecular water. The DNA solution can immediately be used for PCR or other applications.
3.13. DNA Amplification: PCR Ingredients
1. DNA Polymerase: Recombinant Taq DNA Polymerase (e.g., Qiagen) is commonly used for standard PCR. It is a thermostable enzyme of the thermophilic bacterium Thermus aquaticus and is, therefore, able to synthesize DNA at high temperatures (see Note 26). Usually, 0.025 U of Taq DNA Polymerase is used per µl of the PCR reaction. Hot Start Polymerases can be used to prevent the amplification of nonspecific PCR products before the PCR program is started. They are inactive at lower temperatures and are activated during the first heating step of the PCR program.
2. PCR buffer: For optimal DNA Polymerase activity, PCR buffers containing Tris–HCl, KCl, and, optionally, MgCl2 are used. Buffers are provided by the supplier together with the Taq Polymerase. It is important to use Polymerase and PCR buffer from the same manufacturer.
3. Oligonucleotide primers: PCR primers are short, single-stranded DNA fragments (usually, 20–30 nucleotides). PCR requires one forward and one reverse primer to delimit the target fragment of the DNA. Primers are usually delivered desalted. Resuspend primers in molecular water to a stock concentration of 100 pmol/µl; prepare aliquots of working solutions with a concentration of 10 pmol/µl ready to use for PCR. There should be a surplus of primers in the reaction mix, but too much of them may lead to nonspecific reactions. For standard PCR, 0.5 µl of each primer working solution (10 pmol/µl) is enough. For primer design, see Note 27.
4. Deoxynucleotide triphosphates (dNTPs): dNTPs (dATP, dTTP, dGTP, and dCTP) are the nucleotides added by the DNA Polymerase during synthesis of the new strand complementary to the template. They are available as single ingredients or as a dNTP mix (e.g., Fermentas). There should always be a slight surplus of dNTPs in the reaction mix. For PCR, a dNTP working solution with a final concentration of 2 mM (which means 2 mM of each type of nucleotide!) is suitable. In case of the nucleotide premix (all nucleotides in a total concentration of 10 mM), the solution just has to be diluted to the desired concentration. In case of single nucleotides, add 20 µl (100 mM stocks) of each dNTP to 920 µl molecular water.
5. Additives: Additives, like MgCl2, trehalose, DMSO, Q-solution, etc., can enhance PCR efficiency (see Note 28). Use additives only if standard protocols do not work. Too high concentrations of MgCl2, for instance, increase the amount of nonspecific products because the Polymerase amplifies less specifically.
6. Molecular water: Use only ultrapure and nuclease-free water for PCR. Water is used to fill the mix of ingredients up to the desired volume, which is normally 10–25 µl.
7. Template DNA: This is the original genomic DNA material. Use 1–2 µl DNA solution (obtained from extraction) with a concentration between 20 and 100 ng/µl. Usually, PCR also works well with lower concentrations (below 2 ng/µl) (see Note 29).
3.14. DNA Amplification: Principle Steps
PCR amplification is carried out in a Thermocycler using a specific temperature profile. It involves initial denaturation of the template DNA, followed by a specific number of cycles, including denaturation, primer annealing, and elongation steps. The program is finished by an extended final elongation step (see Note 30).
1. Initial denaturation: Melting of double-stranded DNA into two single-stranded templates by disrupting the hydrogen bonds between complementary nucleotides. This step is usually performed at a temperature of 94°C for about 5 min. If the template DNA is GC rich, the interval should be extended up to 10 min. Heating the lid is recommended and is normally an option on every Thermocycler.
2. Denaturation: Similar to the initial denaturation, this step leads to melting of the double-stranded DNA into single strands for primer annealing. Amplified DNA with high GC content needs increased denaturation time (3–4 min).
3. Annealing: In most cases, temperatures between 50 and 65°C allow successful annealing of primers to the single-stranded template DNA. Typically, the optimal annealing temperature (Ta) is 3–5°C below the melting temperature of the primers (Tm). Tm can be calculated by several computer programs (see primer design in Note 27). If nonspecific PCR products are produced in addition to the desired product, the temperature can be optimized by raising it stepwise in increments of 1–2°C.
4. Elongation: In this step, the DNA Polymerase synthesizes a new DNA strand complementary to the template strand by adding dNTPs. The optimal elongation temperature depends on the Polymerase itself, and the elongation time depends on the length of the desired fragment. In case of Taq DNA Polymerase, the highest synthesis rates are achieved at 70–75°C. For fragments up to 1,000 bp, the optimal elongation time is between 1 and 2 min.
For longer fragments, more elongation time is needed (and less for shorter fragments).
5. Number of cycles: Steps 2–4 are repeated several times (cycles). The number of cycles depends on the amount of template DNA. If the initial DNA quantity is low, up to 40 cycles can be performed. For higher amounts of template, 30–35 cycles may suffice.
6. Final elongation: After the last PCR cycle, a final elongation is performed to ensure that all remaining single DNA strands are fully extended. It is usually performed at 72°C for 5–10 min.
7. Cooling (optional): After the final elongation step, samples can remain in the Thermocycler if reactions are performed overnight. For cooling overnight, use a temperature of 15°C. This temperature neither damages PCR products nor strains the heating block too much. After amplification, the PCR products can be stored for a while in the fridge (4°C) until further processing.
3.15. DNA Amplification: PCR Performance
It is recommended to prepare a so-called master mix for all samples to be processed. The master mix contains all required ingredients, except the template DNA (Subheadings 3.16 and 3.17, see also Note 31). Volumes of ingredients are calculated according to the number of processed samples including a positive and a negative control (see Note 32) plus about 5–10% more to make sure that it is enough. (It is very annoying if there is no master mix left for the last few samples!) For PCR preparation, follow these steps:
1. Select samples for PCR. Calculate volumes of required ingredients according to the protocol.
2. Wear new gloves and use filter tips for pipetting steps to avoid contamination.
3. Thaw required ingredients and DNA samples. Vortex all ingredients, except DNA samples. Briefly centrifuge all ingredients and DNA samples before opening. Keep everything on ice.
4. Prepare the master mix with all ingredients, except Taq DNA Polymerase. Start with water or buffer first. Use the PCR form as a checklist (make check marks). Protocols for master mix preparation are given below.
5. Transfer 1–2 µl of DNA template to 0.2-ml PCR reaction tubes (single tubes, 8-tube strips, or 96-well PCR plates, depending on the number of samples).
6. Take the Taq DNA Polymerase out of the freezer and centrifuge briefly. Add Polymerase to the master mix and pipette up and down carefully to mix (put the Polymerase back into the freezer immediately!). Dispense the master mix into the PCR reaction tubes (see Note 33).
7. Cap PCR tubes properly (in case of plates, seal with foil) and mark for identification (use thermostable markers, e.g., Stabilo OH Pen).
8. Briefly centrifuge the PCR tubes, place them into the Thermocycler, and start the required program. After PCR, check products on agarose gel (see Subheading 3.20).
3.16. DNA Amplification: PCR Master Mix Protocol 1
Protocol for one sample using Taq DNA Polymerase (e.g., Qiagen) with a 25 µl PCR reaction volume; dispense 24 µl of the master mix to 1 µl of the template DNA (see Note 34):
1. Molecular-grade water: 15.875 µl.
2. 10× PCR buffer: 2.5 µl.
3. MgCl2: 2.0 µl.
4. dNTPs, 2 mM each: 2.5 µl.
5. Primer forward, 10 pmol/µl: 0.5 µl.
6. Primer reverse, 10 pmol/µl: 0.5 µl.
7. Taq Polymerase 5 U/µl: 0.125 µl.
8. DNA: 1.0 µl.
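Scaling these per-reaction volumes to a batch, as described in Subheading 3.15, can be sketched as follows. This is an illustration only: the function name, the default of two controls (one positive, one negative), and the 10% surplus (the upper end of the 5–10% range given above) are assumptions chosen for the example.

```python
# Per-reaction volumes in microliters ("ul") from Master Mix Protocol 1.
# Template DNA (1.0 ul) is excluded because it is pipetted separately.
PER_REACTION_UL = {
    "molecular-grade water": 15.875,
    "10x PCR buffer": 2.5,
    "MgCl2": 2.0,
    "dNTPs (2 mM each)": 2.5,
    "primer forward (10 pmol/ul)": 0.5,
    "primer reverse (10 pmol/ul)": 0.5,
    "Taq polymerase (5 U/ul)": 0.125,
}


def master_mix(n_samples, n_controls=2, surplus=0.10):
    """Return total volume (ul) of each ingredient for the whole batch.

    n_controls and surplus are illustrative defaults, not fixed values
    from the protocol.
    """
    reactions = (n_samples + n_controls) * (1.0 + surplus)
    return {name: round(vol * reactions, 3)
            for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    for name, volume in master_mix(22).items():
        print(f"{name}: {volume} ul")
```

Each reaction tube then receives 24 µl of this mix plus 1 µl of template DNA.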
3.17. DNA Amplification: PCR Master Mix Protocol 2
Protocol for one sample using Hot Start DNA Polymerase (e.g., Phire-Polymerase, New England BioLabs) with a 20 µl PCR reaction volume; dispense 19 µl of the master mix to 1 µl of the template DNA (DMSO is provided by the supplier of the Polymerase).
1. Molecular water: 11.2 µl.
2. 5× PCR buffer: 4.0 µl.
3. DMSO: 1.0 µl.
4. dNTPs, 10 mM each: 0.4 µl.
5. Primer forward, 10 pmol/µl: 0.5 µl.
6. Primer reverse, 10 pmol/µl: 0.5 µl.
7. Phire-Polymerase: 3.6 µl.
8. DNA: 1.0 µl.
3.18. DNA Amplification: PCR Temperature Scheme Protocol 1
Standard temperature profile using Taq DNA Polymerase.
1. Initial step: 94°C for 2 min.
2. Denaturation: 94°C for 30 s.
3. Annealing: 50°C for 30 s.
4. Elongation: 72°C for 1 min.
5. Cycles: Repeat steps 2–4 35 times.
6. Final elongation: 72°C for 10 min.
7. Cooling: 15°C for as long as required (often called “forever” in programs of Thermocyclers).
3.19. DNA Amplification: PCR Temperature Scheme Protocol 2
Temperature profile using Hot Start Polymerase (see Note 35).
1. Initial step: 98°C for 30 s.
2. Denaturation: 98°C for 5 s.
3. Annealing: 65°C for 5 s.
4. Elongation: 72°C for 35 s.
5. Cycles: Repeat steps 2–4 35 times.
6. Final elongation: 72°C for 1 min.
7. Cooling: 15°C for as long as required (often called “forever” in programs of Thermocyclers).
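To compare the two temperature schemes (Subheadings 3.18 and 3.19), the programmed hold time can be added up as in the sketch below. The function name is hypothetical, and the result is a lower bound only: heating and cooling ramps, which differ between Thermocyclers (see Note 30), and the final cooling step are not included.

```python
def hold_time_minutes(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum of programmed hold times in minutes (ramping not included)."""
    total_s = initial_s + n_cycles * sum(cycle_steps_s) + final_s
    return total_s / 60.0


# Temperature Scheme Protocol 1 (Taq): 2 min, 35 x (30 s, 30 s, 1 min), 10 min
taq_minutes = hold_time_minutes(120, [30, 30, 60], 35, 600)

# Temperature Scheme Protocol 2 (Hot Start): 30 s, 35 x (5 s, 5 s, 35 s), 1 min
hot_start_minutes = hold_time_minutes(30, [5, 5, 35], 35, 60)

if __name__ == "__main__":
    print(taq_minutes, hot_start_minutes)  # 82.0 27.75
```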
3.20. PCR Product Quality Control by Agarose Gel Electrophoresis (see Note 36)
1. Prepare loading dye and Molecular Size Marker (100 bp DNA Ladder Plus, Fermentas) according to the manufacturer’s instructions. Loading dye is used in 1× concentration in this protocol.
2. Prepare the tray with appropriate combs.
3. For a typical 100-ml gel, weigh 1.0 g agarose powder and add 100 ml 1× TBE buffer. Boil the mixture in a microwave until the agarose powder is completely dissolved (see Note 37). Add 2 µl (or one drop) ethidium bromide or 10 µl GelRed and shake carefully. Immediately pour the mix into the prepared tray and wait until the agarose gel is solid, which takes about half an hour.
4. Place the gel in a suitable electrophoresis chamber filled with 1× TBE buffer. The gel should be completely submerged. Remove the combs. Mix 2 µl of each PCR product with 2 µl loading dye (prepared in a microtiter plate according to the number of samples) and pipette up and down a few times to mix. Load the PCR samples into the pockets (see Note 38).
5. Apply voltage (90 V) and let the samples run for about 30 min. Afterwards, the double-stranded PCR products can be viewed in ultraviolet light. Take a photo to select samples for the cleanup. Sharp bands indicate successful amplification of the desired DNA fragment (Fig. 2). When PCR fails, no bands are present. In case of suspicious PCR results, see PCR troubleshooting in Note 39.
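The recipe in step 3 (1.0 g agarose in 100 ml buffer) corresponds to a 1% (w/v) gel. For other gel volumes, the amount of agarose can be scaled as in this small sketch; the function name and the adjustable percentage are assumptions for illustration, since the protocol itself only specifies the 1.0 g per 100 ml case.

```python
def agarose_grams(gel_volume_ml, percent_w_v=1.0):
    """Agarose mass in grams for a gel of the given volume.

    percent_w_v is grams per 100 ml; 1.0 reproduces the recipe above.
    """
    return gel_volume_ml * percent_w_v / 100.0


if __name__ == "__main__":
    print(agarose_grams(100))  # 1.0 g for the 100-ml gel in step 3
    print(agarose_grams(50))   # 0.5 g for a smaller gel at the same percentage
```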
3.21. PCR Cleanup: ExoSAP-It
PCR purification eliminates the remaining PCR ingredients and single-stranded DNA fragments, which may inhibit the subsequent sequencing reaction. Cleanup can be carried out in three different ways: enzymatic digestion (this section), commercial kits (Subheading 3.22) that are based on column methods (similar to the extraction methods of Qiagen and Macherey-Nagel), or ethanol precipitation (Subheading 3.23). After the cleanup, the PCR products are ready to use for sequencing reactions.
Fig. 2. Amplified COI fragments on agarose gel. Lanes 1–4 show very intense and sharp PCR products. In lanes 5–8, DNA amplification failed, and only unconsumed primers are visible.
Enzymatic digestion using ExoSAP-It (GE Healthcare, formerly Amersham Biosciences) (see Note 40):
1. Mark the appropriate number of tubes. Clean up only the samples that worked well in the PCR reaction. Transfer 5 µl of each of the selected PCR products to new PCR tubes (single tubes, 8-tube strips, or plates).
2. To clean up 5 µl of PCR product, 1.0 µl molecular water and 0.5 µl ExoSAP-It are needed. Prepare a master mix of water and ExoSAP-It according to the number of samples plus about 5–10% more to make sure that it is enough.
3. Mix 5 µl PCR product with 1.5 µl master mix, place it in the Thermocycler, and start the program with the following temperature scheme: 37°C for 40 min; 80°C for 15 min; 15°C for as long as required.
In the first step, Exonuclease I degrades residual single-stranded DNA and primers, and Shrimp Alkaline Phosphatase hydrolyzes remaining dNTPs. Those ingredients would otherwise interfere with the sequencing reaction. In the second step, ExoSAP-It itself is inactivated. After this procedure, the cleaned up PCR product can be used directly for the sequencing reaction (see Note 41).
3.22. PCR Cleanup: NucleoSpin® Extract II Kit
Column-based method using NucleoSpin® Extract II Kit (Macherey-Nagel):
1. Dilute Buffer NT3 with the correct amount of ethanol (96–100%). Mark the appropriate number of 1.5-ml microcentrifuge tubes (not provided).
2. For sample volumes of <100 µl, adjust the volume of the reaction mix to 100 µl using Buffer NT.
3. Mix one volume of sample with two volumes of Buffer NT.
4. Place a NucleoSpin Extract II Column into a Collection Tube (2 ml), load the sample-buffer mix, and centrifuge for 1 min at 11,000 × g. Discard the flow through and place the column back into the collection tube. The PCR fragments are now bound to the membrane.
5. To wash the membrane, add 700 µl Buffer NT3 to the NucleoSpin Extract II Column and centrifuge for 1 min at 11,000 × g. Discard the flow through and place the column back into the collection tube.
6. Centrifuge for 2 min at 11,000 × g to remove Buffer NT3 completely (especially ethanol residues). Make sure that the spin column does not come in contact with the flow through while removing it from the centrifuge. Discard the collection tube.
7. Place the NucleoSpin Extract II Column into a new 1.5-ml microcentrifuge tube (previously marked). Add 30 µl Buffer NE and incubate at room temperature for 1 min. Centrifuge for 1 min at 11,000 × g. The cleaned up PCR product is now eluted and can be used for the sequencing reaction.
3.23. PCR Cleanup: Ethanol Precipitation
1. Put ethanol (100% and 70%) into the freezer: both need a temperature of −20°C when applied. Mark the appropriate number of microcentrifuge tubes and cool down the centrifuge to 4°C.
2. Add 3 M sodium acetate to the PCR product at one-tenth of the PCR product volume (e.g., if you have 10 µl PCR product, add 1 µl 3 M sodium acetate to the complete PCR product).
3. Add two volumes of 100% ethanol (−20°C) to one volume of the PCR product (e.g., if you use 10 µl PCR product, add 20 µl 100% ethanol (−20°C)).
4. Centrifuge for 15 min at 4°C and 11,000 × g.
5. Discard the supernatant without discarding the pellet, which may or may not be visible (see Note 19).
6. Add 200 µl 70% ethanol (−20°C) onto the pellet to wash it.
7. Centrifuge for 5 min at 4°C and 11,000 × g.
8. Discard the supernatant.
9. Dry the pellet at room temperature or in a Thermomixer at 37°C to remove the residual ethanol.
10. Dissolve the pellet in 30 µl molecular water; the cleaned up PCR product is then ready to use for the sequencing reaction.
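The volume ratios in steps 2 and 3 can be written as a short calculation. This is a sketch with a hypothetical function name; it follows the worked example in the text, where both reagent volumes are calculated from the PCR product volume alone.

```python
def precipitation_volumes(pcr_product_ul):
    """Reagent volumes (microliters) for ethanol precipitation.

    Both volumes are relative to the PCR product volume, as in the
    worked example of steps 2 and 3 (10 ul product -> 1 ul and 20 ul).
    """
    return {
        "sodium_acetate_3M_ul": pcr_product_ul / 10.0,
        "ethanol_100_percent_ul": 2.0 * pcr_product_ul,
    }


if __name__ == "__main__":
    print(precipitation_volumes(10))
```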
4. Notes
1. Buffers PL1, PC, and PW1 contain toxic components. Wear gloves and goggles!
2. Do not use buffers like DESS or RNAlater for samples containing calcium carbonate structures, like shells, etc. They might be dissolved completely during storage.
3. Cleaning with alcohol does not destroy or eliminate nucleic acids.
4. To avoid mix-up of samples, label tubes or plates carefully with a permanent marker. It is helpful to choose appropriate numbers of samples for DNA extraction so that the centrifuge can be balanced. In case of plate extraction, equal numbers of plates should be processed at the same time; most plate centrifuges accommodate two plates.
5. Do not pool individuals in a single lysis tube. Extractions also work with very small samples, e.g., arthropods with a body size below 500 µm or legs of small insects. Increasing the amount of tissue does not necessarily lead to higher DNA yields (limited binding capacity of the silica-gel membrane) but can markedly decrease the purity of the DNA extract. If dry material is used (e.g., insect legs), special care must be taken when filling the tubes. Because of electrostatic forces, the samples may pop out of the tubes.
6. In most cases, there is no need for mechanical disruption of the samples, which reduces hands-on time considerably, especially if high-throughput sample processing is planned. In case of fibrous plant material or fungi with rigid cell walls, DNA extraction might be problematic. To increase the efficiency of the Proteinase K digestion, samples can be ground with pestles or beads.
7. The CTAB protocol is not suitable for high-throughput DNA extractions.
8. Usually, kits can be stored at room temperature (15–25°C). Some ingredients may need storage at lower temperatures.
9. For sample incubation, Thermomixers are to be preferred. Water baths can also be used, but samples should then be inverted several times during lysis.
10. In most cases, lysis can be performed overnight. In case of small samples, lysis might be completed after a few hours. For nondestructive DNA extraction and voucher (hard body parts) retrieval, special care must be taken regarding the lysis treatment (e.g., 20, 44).
11. Do not use water for DNA elution. DNA degrades faster if stored in pure water; buffers are to be preferred (elution buffers are supplied with DNA extraction kits).
12. Buffer B3 is stable for 5 months when stored in the dark at room temperature.
13. Dissolved Proteinase K should be kept at −20°C when it is not in use and is then stable for up to 6 months.
14. Dissolved RNase A should be kept at −20°C when it is not in use and is then stable for up to 1 year.
15. In case of algae and higher plants, grind tissue with metal beads (e.g., 3-mm metal beads using a RETSCH MM301 shaking mill at a mill frequency of 30/s for 45 s).
16. If not all liquid has passed the filter, repeat the centrifugation step. If a pellet is visible in the flow through, transfer the clear supernatant to a new 1.5-ml microcentrifuge tube.
17. The maximum loading capacity of the NucleoSpin® Plant II Column is 700 µl. For higher sample volumes, repeat the loading step.
18. β-mercaptoethanol has an extremely strong odor! Keep the stock solution closed as far as possible and immediately discard pipette tips and everything else that came in contact with the solution into a bottle or can that can be firmly closed.
19. There might be no pellet visible in the tube, which does not mean that there is no DNA. It might be helpful to place the microcentrifuge tubes in the centrifuge in a defined orientation, so that you know where the pellet must be even if you do not see it.
20. DNA storage in water leads to accelerated DNA degradation, especially at higher temperatures.
21. Do not keep the working solution at 4°C. Repeated freezing and thawing do not affect DNA quality at all (own results), but storage at higher temperatures does.
22. Storage tubes are offered as single tubes as well as in 96-well format.
23. Although the manufacturer recommends storage at room temperature, lower storage temperatures are to be favored, especially for long-term storage. Own results (Knebelsberger T, Raupach M, Zetzsche H and Klenk HP) have shown that long DNA fragments are better preserved in QIAsafe DNA Tubes at lower temperatures than at room temperature.
24. Trehalose does not interfere with downstream applications, such as PCR or sequencing, so no further DNA cleanup is necessary when using the master sample in the future.
25. There is no need to store samples at temperatures lower than −80°C, for example in liquid nitrogen; shear forces are not effective at temperatures lower than −80°C.
26. Keep in mind that Taq DNA Polymerase has no 3′→5′ exonuclease (proofreading) activity and additionally adds an extra nucleotide (dATP) at the 3′ end of the strand.
27. Primer design: If new primers have to be designed, or if existing primers are not suitable and have to be adjusted, please note the following crucial issues.
– Primer length should be between 17 and 30 nucleotides.
– GC content should be between 40% and 60%.
– Avoid self- or pairwise-complementary sequences at the 3′ ends to prevent the formation of primer dimers.
– Do not repeat three or more Gs or Cs at the 3′ end.
– Avoid a T at the 3′ end because it may lead to mismatches.
– Avoid complementary regions within primers and between them.
– Primers should not form stem-loop or hairpin structures at temperatures over 40°C.
– Choose primers with similar melting temperatures (Tm).
– Start with an annealing temperature ca. 5°C below the melting temperature (Tm).
– Simple calculation of Tm: Tm = 2°C × n(A + T) + 4°C × n(G + C), where n(A + T) and n(G + C) are the numbers of A/T and G/C bases in the primer.
– Software for primer design: http://frodo.wi.mit.edu/, Primer3, ClustalW2, GeneFisher, or Universal ProbeLibrary (Roche).
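The simple rules in Note 27 can be checked quickly in a few lines of code. The following is a minimal sketch, not part of the original protocol: the function names and the example primer are hypothetical, and the checks cover only primer length, GC content, the 3′ end, and the simple Tm formula given above. Dedicated software such as Primer3 remains the better choice for real primer design.

```python
def gc_content(primer):
    """Return the GC content of a primer in percent."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def simple_tm(primer):
    """Tm = 2 degC x n(A + T) + 4 degC x n(G + C), the simple rule of Note 27."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def check_primer(primer):
    """Return a list of warnings for the length, GC, and 3' end rules."""
    seq = primer.upper()
    warnings = []
    if not 17 <= len(seq) <= 30:
        warnings.append("length outside 17-30 nt")
    if not 40.0 <= gc_content(seq) <= 60.0:
        warnings.append("GC content outside 40-60%")
    if all(base in "GC" for base in seq[-3:]):
        warnings.append("three or more G/C at the 3' end")
    if seq.endswith("T"):
        warnings.append("T at the 3' end")
    return warnings


# Hypothetical 20-nt primer, used for illustration only.
example = "ATGCGTACGTTAGCCTAGGA"
tm = simple_tm(example)
print(f"GC = {gc_content(example):.1f}%, Tm = {tm} degC, "
      f"start annealing at ca. {tm - 5} degC")
print(check_primer(example) or "no warnings")
```

The starting annealing temperature printed here is simply Tm minus 5°C, following the rule of thumb in the list; it still has to be verified empirically, for example by gradient PCR.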
28. The use of additives is especially important in case of GC-rich templates, where PCR amplifications frequently yield little and nonspecific product or even no product at all.
29. Higher amounts of template DNA can inhibit PCR reactions or yield higher amounts of nonspecific products.
30. It is necessary to adapt the profile to the set of primers used and to the Thermocycler because cyclers differ considerably in heating rates. In general, several test runs are needed before the profile is optimized and works reliably.
31. In ready-to-use PCR kits, all essential ingredients, except primers and template DNA, are premixed. In case of very low template DNA quantity or quality, very good results can be obtained with illustra™ puReTaq Ready-To-Go PCR Beads (GE Healthcare).
32. For the positive control, use template DNA that has already worked very well under the same conditions. The positive control is important to monitor the PCR performance. If the positive control fails, PCR conditions or ingredients have to be checked. As negative control, use water instead of DNA template in your PCR mix. Usually, no PCR product appears in the negative control. A band in the negative control indicates contamination, and the whole PCR has to be repeated.
33. If pipette tips have to be saved, the master mix can be dispensed first using a single tip, and the DNA added afterwards, changing tips from sample to sample. Work fast to reduce the time until the PCR program is started.
34. Do not increase the PCR reaction volume over 25 μl, to guarantee rapid and homogeneous heating of the sample during the PCR procedure.
35. Hot Start DNA Polymerases work very fast and are effective at higher temperatures.
36. Please note that since the protocol includes toxic (DNA-binding) agents, it is essential to wear appropriate clothing and special gloves, e.g., nitrile gloves. Do not use latex gloves!
37. Be careful: Boiling retardation (superheating) could occur during heating.
38. Leave the first and/or the last well (pocket) of each row blank for the size marker. Do not forget to change tips after each sample and to apply the marker!
39. PCR troubleshooting
No or very weak PCR products:
– Pipetting error or lack of one of the ingredients of the master mix → repeat the PCR.
– Problems with the Thermocycler → check cycler and PCR report.
– Insufficient number of cycles → add 5 more cycles (not more than 40 cycles in total).
– Annealing temperatures for primers are not optimal → be sure that both primers exhibit similar annealing temperature profiles. Reduce the annealing temperature in the PCR program (start with 2–3°C less) or perform a gradient PCR to find the optimal annealing temperature. Be careful: If the annealing temperature is too low, primers bind nonspecifically. (DNA is everywhere! In that case, it is mostly Homo sapiens or some kind of fungal DNA that is amplified.) If the annealing temperature is too high, primers do not anneal at all.
– Time for elongation too short → extend the elongation step.
– Too low or too high DNA concentration → measure the DNA concentration and then increase (by vacuum concentration) or reduce (by dilution) the DNA template concentration.
– Poor DNA quality → estimate DNA quality by measuring the absorbance ratio at 260 and 280 nm (A260/280). The optimal value of A260/280 is about 1.8.
– Concentration of primers is not optimal → test different concentrations.
– Primers are not suitable → adjust primers by new primer design (see Note 27).
– Primers are already degraded → order new primers, especially if the positive control fails.
– Concentration of MgCl2 is not optimized → increase the MgCl2 concentration to get more product. Be careful: If the concentration of MgCl2 is too high, the amount of nonspecific products increases due to nonspecific amplification by the polymerase.
Double or multiple bands or a smear are visible on the agarose gel (Fig. 3):
– Annealing temperatures for primers are not optimal → increase the annealing temperature in the PCR program (start with 2–3°C more) so that primers bind more specifically, or perform a gradient PCR to find the optimal annealing temperature. Alternatively, touchdown PCR can be performed, which prevents the accumulation of nonspecific products by decreasing the annealing temperature stepwise, from high to low temperatures, per set of cycles.
– Too many cycles → decrease the number of cycles stepwise (three cycles per run).
– Nonspecific priming before cycling starts → use a Hot Start DNA Polymerase to avoid the formation of nonspecific products prior to amplification.
– Concentration of primers is not optimal → test different concentrations.
– Concentration of MgCl2 is not optimized → test different MgCl2 concentrations.
– Too much DNA template → measure the DNA concentration and then dilute the DNA template.
– Too much Polymerase → reduce the Polymerase concentration.
– Primers are already degraded → order new primers.
– Contamination → a combination of smear and product indicates contamination. Replace all reagents.
Fig. 3. Amplified COI fragments on agarose gel (DNA ladder, far left). PCR products with multiple bands.
Positive negative control (a band appears in the negative control):
– Contamination → replace all reagents. To minimize contamination, perform DNA extraction and PCR amplification in separate laboratories. Separate pre- and post-PCR manipulations.
40. The mix of the two enzymes (Exonuclease I, Shrimp Alkaline Phosphatase) is not thermostable and loses activity at room temperature. Keep it on ice while handling it and put it back into the freezer immediately after use.
41. In case of very bright bands on the agarose gel, it is recommended to dilute the cleaned-up product with ca. 10 μl molecular-grade water.
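The touchdown PCR strategy mentioned in Note 39 (stepwise decrease of the annealing temperature per set of cycles) can be laid out as a simple schedule before it is programmed into the Thermocycler. The following is a minimal sketch under stated assumptions: the function name, the step size of 1°C, the two cycles per step, and the example temperatures are illustrative choices and are not values given in the protocol.

```python
def touchdown_schedule(start_temp, final_temp, step=1.0,
                       cycles_per_step=2, final_cycles=25):
    """Return a list of (annealing temperature in degC, number of cycles).

    The annealing temperature is lowered by `step` after every
    `cycles_per_step` cycles until `final_temp` is reached; the remaining
    `final_cycles` are run at `final_temp`.
    """
    if start_temp < final_temp or step <= 0:
        raise ValueError("start_temp must be >= final_temp and step > 0")
    schedule = []
    temp = start_temp
    while temp > final_temp:
        schedule.append((temp, cycles_per_step))
        temp = round(temp - step, 1)
    schedule.append((final_temp, final_cycles))
    return schedule


def total_cycles(schedule):
    """Sum the cycles of a schedule (Note 39 advises at most 40 in total)."""
    return sum(cycles for _, cycles in schedule)


# Illustrative values only: start 5 degC above the intended annealing temperature.
plan = touchdown_schedule(60, 55)
for temp, cycles in plan:
    print(f"{cycles:2d} cycles at {temp} degC annealing")
print("total cycles:", total_cycles(plan))
```

Keeping the total number of cycles visible makes it easy to stay within the limit of 40 cycles recommended above.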
Acknowledgment
This work was supported by the German Science Foundation DFG as part of the DNA Bank Network project (http://www.dnabanknetwork.org).
References
1. Rohland N, Siedel H, Hofreiter M (2004) Nondestructive DNA extraction method for mitochondrial DNA analyses of museum specimens. Biotechniques 36:814–821
2. Chakraborty A, Sakai M, Iwatsuki Y (2006) Museum fish specimens and molecular taxonomy: a comparative study on DNA extraction protocols and preservation techniques. J Appl Ichthyol 22:160–166
3. Gilbert TMP, Moore W, Melchior L, Worobey M (2007) DNA extraction from dry museum beetles without conferring external morphological damage. PLoS ONE 2:e272
4. France SC, Kocher TD (1996) DNA sequencing of formalin-fixed crustaceans from archival research collections. Mol Mar Biol Biotech 5:304–313
5. Chase MR, Etter RJ, Rex MA, Quattro JM (1998) Extraction and amplification of mitochondrial DNA from formalin-fixed deep-sea molluscs. Biotechniques 24:243–247
6. Chatigny ME (2000) The extraction of DNA from formalin-fixed, ethanol-preserved reptile and amphibian tissues. Herpetol Rev 31:86–87
7. Schander C, Halanych KM (2003) DNA, PCR and formalinized animal tissue – a short review and protocols. Org Divers Evol 3:195–205
8. Coura R, Prolla JC, Meurer L, Ashton-Prolla P (2008) An alternative protocol for DNA extraction from formalin fixed and paraffin wax embedded tissue. J Clin Pathol 58:894–895
9. Zetzsche H, Klenk H-P, Raupach MJ, Knebelsberger T, Gemeinholzer B (2008) Comparison of methods and protocols for routine DNA extraction in the DNA Bank Network. In: Gradstein R, Klatt S, Normann F, Weigelt P, Willmann R, Wilson R (eds) Systematics. Universitätsverlag Göttingen, Göttingen, p 354
10. Winnepenninckx B, Backeljau T, De Wachter R (1993) Extraction of high molecular weight DNA from molluscs. Trends Genet 9:409
11. Van Moorsel CHM, Van Nes WJ, Megens HJ (2000) A quick, simple, and inexpensive mollusc DNA extraction protocol for PCR-based techniques. Malacologia 42:203–206
12. Pirttilä AM, Hisikorpi M, Kämäräinen T et al (2001) DNA isolation methods for medicinal and aromatic plants. Plant Mol Biol Rep 19:273a–f
13. Nishiguchi MK, Doukakis P, Egan M et al (2002) DNA isolation procedures. In: DeSalle R, Giribet G, Wheeler WC (eds) Methods and tools in biosciences and medicine: techniques in molecular systematics and evolution. Birkhäuser Verlag, Basel, pp 249–287
14. Thomson JA (2002) An improved non-cryogenic transport and storage preservative facilitating DNA extraction from ‘difficult’ plants collected at remote sites. Telopea 9:755–760
15. Skujienė G, Soroka M (2003) A comparison of different DNA extraction methods for slugs (Mollusca: Pulmonata). Ekologija 1:12–16
16. Bhadury P, Austen MC, Bilton BT et al (2006) Exploitation of archived marine nematodes – a hot lysis DNA extraction protocol for molecular studies. Zool Scr 36:93–98
17. Schill RO (2007) Comparison of different protocols for DNA preparation and PCR amplification of mitochondrial genes of tardigrades. J Limnol 66:164–170
18. Sands CJ, Convey P, Linse K, McInnes SJ (2008) Assessing meiofaunal variation among individuals utilising morphological and molecular approaches: an example using Tardigrada. BMC Ecol 8:7
19. Schizas NV, Street GT, Coull BC, Chandler GT, Quattro JM (1997) An effective DNA extraction method for small metazoans. Mol Mar Biol Biotech 6:381–383
20. Porco D, Rougerie R, Deharveng L, Hebert P (2010) Coupling non-destructive DNA extraction and voucher retrieval for small soft-bodied Arthropods in a high-throughput context: the example of Collembola. Mol Ecol Res 10:942–945
21. Hill CA, Gutierrez JA (2003) A method for extraction and analysis of high quality genomic DNA from ixodid ticks. Med Vet Entomol 17:224–227
22. Halos L, Jamal T, Vial L et al (2004) Determination of an efficient and reliable method for DNA extraction from ticks. Vet Res 35:709–713
23. Mtambo J, van Bortel W, Madder M et al (2006) Comparison of preservation methods of Rhipicephalus appendiculatus (Acari: Ixodidae) for reliable DNA amplification by PCR. Exp Appl Acar 38:189–199
24. Zhang D, Yang Y, Castlebury LA, Cerniglia CE (1996) A method for the large scale isolation of high transformation efficiency fungal genomic DNA. FEMS Microbiol Lett 145:261–265
25. Fredricks DN, Smith C, Meier A (2005) Comparison of six DNA extraction methods for recovery of fungal DNA assessed by quantitative PCR. J Clin Microbiol 43:5122–5128
26. Muller FM, Werner KE, Kasai M et al (1998) Rapid extraction of genomic DNA from medically important yeasts and filamentous fungi by high-speed cell disruption. J Clin Microbiol 36:1625–1629
27. Csaikl UM, Bastian H, Brettschneider R et al (1998) Comparative analysis of different DNA extraction protocols: a fast, universal maxipreparation of high quality plant DNA for genetic evaluation and phylogenetic studies. Plant Mol Biol Report 16:69–86
28. Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull 19:11–15
29. Drabkowa L, Kirschner J, Vlcek C (2002) Comparison of seven DNA extraction and amplification protocols in historical herbarium specimen of Juncaceae. Plant Mol Biol Report 20:161–175
30. Shepherd M, Cross M, Stokoe RL et al (2002) High-throughput DNA extraction from forest trees. Plant Mol Biol Rep 20:425a–425j
31. Haymes KM, Ibrahim IA, Mischke S et al (2004) Rapid isolation of DNA from chocolate and date palm tree crops. J Agric Food Chem 52:5456–5462
32. Ribeiro RA, Lovato MB (2007) Comparative analysis of different DNA extraction protocols in fresh and herbarium specimens of the genus Dalbergia. Gen Mol Res 6:173–187
33. Lindahl T (1993) Instability and decay of the primary structure of DNA. Nature 362:709–715
34. Mitchell D, Willerslev E, Hansen AJ (2005) Damage and repair of ancient DNA. Mutat Res 571:265–276
35. Smith S, Morin PA (2005) Optimal storage conditions for highly dilute DNA samples: a role for trehalose as a preserving agent. J Forensic Sci 50:1101–1108
36. Murray S, Butler RC, Hardacre A, Timmerman-Vaughan G (2007) Use of quantitative real-time PCR to estimate maize endogenous DNA degradation after cooking or extrusion and in food products. J Agric Food Chem 55:2231–2239
37. Anchordoquy TJ, Molina MC (2007) Preservation of DNA. Cell Preserv Technol 5:180–188
38. Zimmermann J, Hajibabaei M, Blackburn DC et al (2008) DNA damage in preserved specimens and tissue samples: a molecular assessment. Front Zool 5:1–18
39. Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotech 3:294–299
40. Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proc R Soc Lond B 270:313–321
41. CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106:12794–12797
42. Seifert KA (2009) Progress towards DNA barcoding of fungi. Mol Ecol Res 9:83–89
43. Yoder M, De Ley IT, Wm King I et al (2006) DESS: a versatile solution for preserving morphology and extractable DNA of nematodes. Nematology 8:367–376
44. Knölke S, Erlacher S, Hausmann A et al (2005) A procedure for combined genitalia extraction and DNA extraction in Lepidoptera. Insect Syst Evol 35:401–409
Chapter 15
DNA Mini-barcodes
Mehrdad Hajibabaei and Charly McKenna
Abstract
Conventional DNA barcoding uses an approximately 650 bp DNA barcode of the mitochondrial gene COI for species identification in animal groups. Similar size fragments from chloroplast genes have been proposed as barcode markers for plants. While PCR amplification and sequencing of a 650 bp fragment is consistent in freshly collected and well-preserved specimens, it is difficult to obtain a full-length barcode in older museum specimens and in samples that have been preserved in formalin or similar DNA-unfriendly preservatives. A comparable issue may prevent effective DNA-based authentication and testing in processed biological materials, such as food products, pharmaceuticals, and nutraceuticals. In these cases, shorter DNA sequences (mini-barcodes) have been robustly recovered and shown to be effective in identifying the majority of specimens to species level. Furthermore, short DNA regions can be utilized via high-throughput sequencing platforms, providing an inexpensive and comprehensive means of large-scale species identification. These properties of mini-barcodes, coupled with the availability of standardized and universal primers, make mini-barcodes a feasible option for DNA barcode analysis in museum samples and in applied diagnostic and environmental biodiversity analysis.
Key words: DNA barcoding, Museum specimens, Biodiversity, COI, Formalin, Molecular diagnostics
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_15, © Springer Science+Business Media, LLC 2012
1. Introduction
DNA barcoding is a cost-effective genomics tool for species identification and a valuable approach to aid the discovery of new and cryptic species (1, 2). Since its inception, the Barcode of Life initiative has revolutionized biodiversity assessment and has significantly aided fields such as biosecurity, cryptic species identification, paleoecological studies, forensics, and diet analysis (3–6). The 650 bp animal DNA barcode, located near the 5′ end of the mitochondrial cytochrome c oxidase 1 (COI, cox1) gene (1), is readily sequenced and provides species-level resolution of approximately 98% for large taxonomic assemblages, such as birds, mammals, fishes, and various arthropods (7). Current barcode library construction has mainly focused on the analysis of recently collected specimens (i.e., up to 5 years old) or samples that have been preserved in a DNA-friendly manner (3). While these samples provide an efficient means to obtain cost-effective barcode sequences, it is crucial to obtain genetic information from millions of older museum specimens. There are a number of advantages associated with the use of museum collections: the avoidance of costly field collections, the availability of rare or extinct taxa (5), and a means to build broad geographical barcode libraries (3). Most importantly, museum samples represent the vast and valuable taxonomic knowledge that has been generated by analyzing these specimens. Hence, obtaining genetic information from these well-studied specimens will enable an efficient way of linking available taxonomic knowledge to recently collected specimens through the use of DNA barcodes. For example, museum specimens include type specimens, which are considered to be the gold standard for the taxonomic information linked to a species. Therefore, recently collected and barcode-identified individuals from presumably the same species can only be verified if a DNA barcode sequence from the type specimen is available (4).
Conventional barcoding methodology is often limited by its failure to amplify and sequence degraded DNA, which is often found in museum specimens and in preserved and processed biological material (e.g., food products and decayed tissues) (4). Approaches to repair DNA in vitro are inefficient and not cost-effective, as the DNA damage and degradation in museum samples is complex and difficult to characterize (6, 8). In comparison, short sequences (e.g., 100 bp) are usually stable in museum specimens (6). The use of a short or minimalist barcode (100–300 bp, referred to herein as “mini-barcode”) greatly expands the applications of DNA barcoding (3) (Table 1).
Table 1 Comparison of full-length DNA barcode and mini-barcode
Feature | Full-length barcode (650 bp) | Mini-barcode (100–300 bp)
Specimen sequence success relative to age | >90%, 5–10 years | >90%, up to 200 years
Species resolution | 95–98% | 91–95%
Technology | Sanger (ABI) | Sanger (ABI); NextGen sequencing (i.e., 454); Single pyrosequencing (PSQ)
Applicability | Barcode library construction; Routine barcoding | Museum and preserved samples; Processed material (i.e., food products, pharmaceuticals); Environmental barcoding
Most specimens in archive collections have not been assembled for the purpose of genetic studies. The variance in DNA quality among similarly aged samples can be largely accounted for by the variation in preservation methods. Many specimens, particularly insects, are pinned for storage, allowing the soft tissue to desiccate and decompose (9). This dehydration, combined with exposure to the environment, can lead to a reduction of DNA quality and increased incidence of DNA fragmentation (6). Field-collected specimens and pathology tissue samples are commonly exposed to formaldehyde, which has severe consequences for the DNA (6). In a formalin-exposure study utilizing amphibian specimens, a clear negative correlation was found between exposure time and PCR success (6). Furthermore, a recent study by Baird et al. (10) examined the effect of formalin preservation on DNA barcoding. This work used tissue samples of four invertebrate species commonly used in freshwater biomonitoring programs as well as archival specimens of macroinvertebrates. The authors concluded that exposure to formalin followed by long-term storage can dramatically reduce the ability to obtain a full-length barcode; however, some mini-barcodes can still be recovered from these samples (10). In many cases, damaged DNA can be extracted from a sample, but the DNA is broken into small fragments due to hydrolysis of the DNA backbone (6, 11). This fragmented DNA cannot be amplified using standard barcoding primers. Models derived from preserved specimen and tissue samples by capillary electrophoresis predict a rapid initial decline in average DNA fragment size in the first 5 years, followed by a more gradual change in the following time period (6). The same problems that plague preserved samples affect fossils and ancient DNA. Our understanding of evolutionary processes is hindered by DNA fragmentation, cross-linking due to condensation (12), and pyrimidine oxidation, which prevents extension during the amplification process (11).
Furthermore, these ancient records often produce DNA extracts that are a combination of bacterial, fungal, and human contaminants (11), complicating the ability to obtain a usable standard barcode. Mini-barcodes (e.g., 100–300 bp) have been found effective for species-level identification in DNA-damaged samples and in situations where it is difficult to obtain a full-length barcode (Table 2). Additionally, components such as average nucleotide composition, patterns of strand asymmetry, and a high frequency of hydrophobic amino acid encoding codons can be accurately predicted from a short barcode sequence (13). Furthermore, it has been shown that mini-barcodes may provide measures of sequence variability and divergence at both the intra-specific and intra-generic levels that are comparable, in some cases, to those obtained from full-length barcodes (3). Full-length 650 bp COI barcodes can exhibit up to 98% species resolution, with smaller regions of 100 bp and 250 bp producing correspondingly lower rates of identification success (3, 4) (Fig. 1), but when employed in ecological or environmental contexts where the number of species per genus is often low, they can produce rates of identification that are very high (4). In silico studies have been utilized to corroborate the empirical tests of the rates of identification success for DNA barcodes, but also point to the need to carefully design experiments in environmental contexts where primer bias may affect the results (3, 14).
Table 2 Different research projects that used DNA mini-barcodes for biodiversity analysis and species identification
Taxonomic group | Gene | Sample type | Age of sample (years, unless specified) | Mini-barcode size (bp) | References
Insecta: Lepidoptera | COI | Oven-dried (museum) | 2–21 | 134 and 221 | (3)
Insecta: Hymenoptera | COI | Ethanol-preserved (museum) | 1–14 | 135 | (3)
Insecta: Trichoptera | COI | Formalin-preservation | >1–23 | 130 | (10)
Aves: 17 orders | COI | Arsenic/borax-preserved and air-dried (museum) | Average 80 | 190 and 310 | (8)
Reptilia | COI | Dried and formalin-fixed tissue (forensic samples) | N/A | 175–245 | (18)
Insecta: Diptera | COI | Museum | Average 15 | 227–469 | (20)
Plantae | P6 loop of trnH intron | Silica-dried (museum) | N/A | 13–158 | (15)
Plantae | P6 loop of trnH intron | Permafrost | 22 960 ± 120 (OxA-15348) and 15 810 ± 75 (OxA-14930) uncalibrated radiocarbon years | 13–158; average 43.2 | (15)
Actinopterygii: Salmoniformes | Ots213 | Decayed carcass | N/A | 170–375 | (17)
Fig. 1. Comparison of DNA barcode size versus proportion of species identified reveals the efficiency of mini-barcodes in resolving species (adapted from ref. 4). Sequence read lengths typically obtained from three commonly used next-generation sequencing technologies as well as Sanger sequencing are shown on the graph. It is clear that 454 pyrosequencing and Sanger are currently optimal technologies for mini-barcode and full-barcode recovery.
This discovery has led to an increase in the use of mini-barcodes. The consistency of mini-barcodes in distinguishing between species has been explored in plants (15), fish (3, 4, 14, 16, 17), reptiles (18), birds (4, 8, 19), arthropods (3, 4, 9, 20–23), fungi (4, 20), and mammals (4, 14). Additionally, multiple overlapping mini-barcodes have been used to reconstruct the full COI barcode (9, 19). Applications of mini-barcodes include food web analysis (21), distinction of cryptic species (16), biodiversity studies (3, 11, 15, 17, 22, 23), and effective law enforcement for the conservation of wildlife (18).
Short DNA regions can also be utilized via new parallel high-throughput sequencing platforms (aka next-generation sequencing), such as the pyrosequencing-based Roche 454 sequencer, allowing a comprehensive and inexpensive means for barcoding applications (15, 23) (Fig. 1). With these technologies, the need for traditional cloning is eliminated, as simultaneous amplification of several thousands to millions of 10–400 bp DNA molecules is achieved during the emulsion PCR process (7).
2. Materials
We recommend using molecular biology laboratory material, including gloves, disposable pipette tips, PCR-grade tubes/strips, or 96-well microtiter plates.
2.1. Silica-Based DNA Extraction
1. NucleoSpin 96 Tissue Kit (Macherey-Nagel).
2. Ethanol (99.9%).
3. Matrix ImpactII P1250 pipette (Matrix Technologies).
4. Centrifuge with deep-well plate rotor (25R, Beckman Coulter).
5. Incubator (Fisher Scientific).
2.2. PCR Amplification
1. 10× PCR Buffer for Platinum Taq DNA polymerase (Invitrogen).
2. 10 mM deoxynucleotide (dNTP) mix (New England Biolabs).
3. Oligonucleotide primers (Forward and Reverse), 100 μM stock and 10 μM working solutions (Integrated DNA Technologies).
4. Platinum Taq DNA Polymerase (Invitrogen).
5. 50 mM Magnesium chloride (MgCl2) (Sigma-Aldrich).
6. Molecular biology grade distilled water.
7. Thermocycler (Mastercycler EP Gradient, Eppendorf).
2.3. PCR Amplification Check Using E-gel 96 Gel
1. Mother E-base (Invitrogen).
2. 2% E-gel 96 gels (Invitrogen).
3. Gel documentation system (AlphaImager 3400, Alpha Innotech Corporation).
2.4. PCR Amplification Check Using Handmade Agarose Gel
1. Molecular biology grade Agarose (Sigma-Aldrich).
2. 1× TBE buffer: 0.9 M Tris base, 0.89 M Boric acid, 0.02 M Na-EDTA.
3. Ethidium Bromide (10 mg/ml).
4. DNA size standard (“ladder”; New England Biolabs).
5. Submarine electrophoresis apparatus and power supply (Thermo EC, Fisher Scientific).
6. Gel documentation system (AlphaImager 3400, Alpha Innotech Corporation).
2.5. Sanger Sequencing Reaction
1. BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems).
2. 5× Sequencing Buffer: 400 mM Tris–HCl pH 9.0 and 10 mM MgCl2.
3. Sequencing Oligonucleotide Primers. For bi-directional sequencing, two sequencing reactions are performed, each requiring a single Forward or Reverse primer. See Table 3 for a list of primers.
Table 3 PCR primer sets commonly used for mini-barcode amplification
Name | Primer sequence (5′–3′) | Target | Amplicon size (bp) | References
Uni-MinibarF1 | TCCACTAATCACAARGATATTGGTAC | COI; Universal eukaryotes | 130 | (4)
Uni-MinibarR1 | GAAAATCATAATGAAGGCATGAGC | COI; Universal eukaryotes | 130 | (4)
Minibar-F1 | TGATTYTTTGGHCACCCRGAAGT | COI; Reptile, snake | 175 | (18)
Minibar-R1 | AATATRTGRTGGGCYCADAC | COI; Reptile, snake | 175 | (18)
Minibar-F2 | GGTAGYGATCAAATCTTTAAYGT | COI; Reptile, snake | 245 | (18)
Minibar-R2 | GGGTAGACDGTTCAVCCTGTTCC | COI; Reptile, snake | 245 | (18)
Primerpair-1 | AGCATTAGCTCTCCCTGA / AGCCATACGGCGGTGAAT | 16S; plant | 166 | (15)
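Several primers in Table 3 contain IUPAC degeneracy codes (e.g., R, Y, H, D, V), so each of these entries actually stands for a pool of distinct oligonucleotides. The following minimal sketch shows how the pool size can be computed and the individual sequences listed; the function names are hypothetical and the code is an illustration only, not part of the published protocol.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def clean(primer):
    """Remove spaces (as used for triplet grouping) and convert to upper case."""
    return primer.replace(" ", "").upper()


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in clean(primer):
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every non-degenerate sequence encoded by the primer."""
    choices = (IUPAC[base] for base in clean(primer))
    return ["".join(combo) for combo in product(*choices)]


# Minibar-F1 from Table 3: Y (2 bases) x H (3 bases) x R (2 bases).
minibar_f1 = "TGA TTY TTT GGH CAC CCR GAA GT"
print(degeneracy(minibar_f1))
for seq in expand(minibar_f1)[:3]:
    print(seq)
```

A high degeneracy lowers the effective concentration of each individual oligonucleotide in the reaction, which is one reason to test primer concentrations empirically.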
3. In Silico Approaches for Mini-barcode Analysis
3.1. Mini-barcode Primer Selection
Primers should be designed based on an alignment of sequences from a reference barcode library source, such as GenBank or BOLD (http://www.barcodinglife.com/views/idrequest.php). Keeping in mind physical and structural properties, such as G + C content, annealing temperature, and self-complementarity, oligos should target highly conserved regions flanking stretches of 100–300 bp in the barcode region. To facilitate high-throughput sequencing applications, M13 tails may be attached to the forward and reverse primers. Although the addition of these tails generally does not bias PCR results, it is best to verify this empirically before using tailed primers in real applications (4). Primer suitability can be confirmed using a number of freely available programs, such as Primer3 (24) and IDT OligoAnalyzer (25). Most applications of mini-barcodes allow for the selection of a target taxonomic group and subsequent primer design. For example, Hajibabaei et al. (3) developed primers for mini-barcoding specific Lepidoptera species. In this case, the availability of reference sequences from taxa closely related to the target species will increase the chances of developing robust primers. For protein-coding genes, such as COI, conservation of primer binding sites at the amino acid level will aid the selection of target primers. However, when dealing with unknown specimens, or in cases where identification speed is required, universal primers for mini-barcodes can be utilized. Such a universal mini-barcode system has been developed for COI and tested on a number of taxonomic groups (4). Additionally, universal primers are important for sequencing mini-barcodes from environmental samples containing different organisms (23). Table 3 summarizes different commonly used primers for mini-barcoding.
3.2. Bioinformatics Analysis to Estimate Mini-barcode Performance
Before using a mini-barcode in empirical tests, it is important to determine whether sequence information obtained from a given mini-barcode sequence can provide species-level resolution in a given taxonomic group. Available taxon-specific barcode data can be downloaded from barcode libraries, such as BOLD (http://www.barcodinglife.com/views/idrequest.php) or GenBank. Alternatively, sequences obtained in full-length barcode analysis in a given taxonomic group can be used as template for primer design for mini-barcoding. The full-length region can then be divided into short subsets from the 5′ to the 3′ end (for an example, see Subheading 3.3). Software such as MEGA (26) provides a simple bioinformatics tool to partition data for this purpose. By comparing each putative mini-barcode segment (e.g., the first 100 bp of the 5′ region) to the full-length barcode through simple statistics, such as the number of variable and parsimony-informative sites and intra-specific and intra-generic divergences (3), one can determine the mini-barcode fragment with optimal information. Subsequent neighbor-joining (NJ) analysis can measure resolution and ultimately test the accuracy of the short DNA fragments for species identification in the target taxonomic group (3, 7).
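The simple statistics mentioned above can also be computed outside MEGA. The following is a minimal sketch, assuming an alignment held as a list of equal-length strings; the function names and the toy alignment are hypothetical. A site is counted as variable if it shows at least two different nucleotides, and as parsimony-informative if at least two nucleotides each occur in at least two sequences; gaps and ambiguous characters are ignored.

```python
from collections import Counter


def site_stats(alignment, start, end):
    """Count (variable, parsimony-informative) sites in columns [start, end)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    variable = informative = 0
    for col in range(start, end):
        counts = Counter(seq[col].upper() for seq in alignment)
        # Keep unambiguous nucleotides only; gaps and N are ignored.
        bases = {b: n for b, n in counts.items() if b in "ACGT"}
        if len(bases) >= 2:
            variable += 1
            if sum(1 for n in bases.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


def scan_windows(alignment, size, step):
    """Return (start, variable, informative) for each candidate fragment."""
    length = len(alignment[0])
    return [(s, *site_stats(alignment, s, s + size))
            for s in range(0, length - size + 1, step)]


# Toy alignment, for illustration only (real barcodes are ca. 650 bp).
toy = [
    "ATGCATGCAT",
    "ATGCATGAAT",
    "ATGTATGAAT",
    "ACGTATGCAT",
]
print(site_stats(toy, 0, len(toy[0])))   # whole region
print(scan_windows(toy, size=5, step=5)) # two non-overlapping fragments
```

For real data, a window size in the 100–300 bp range and the comparison against the full-length values would follow the procedure described in the text; divergence estimates and NJ trees are better left to dedicated software such as MEGA.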
3.3. Testing the Performance of a Putative COI Mini-barcode for Species Identification
1. Select a target sequence library from BOLD or GenBank or use a set of taxonomically identified sequences. Typically, 100–500 taxa are optimal for this analysis (see Note 1). 2. Align sequences using automated alignment tools (ClustalW, MUSCLE). These tools are available stand-alone or embedded in sequence analysis software, such as MEGA. 3. Inspect alignment visually using a program, such as MEGA. Look for obvious signs of misaligned sequences, such as indels (insertions/deletions). Use amino acid translation to guide the alignment verification. 4. Save the aligned sequences in Mega format (.meg). If you use MEGA for alignment, this can be done from alignment viewer. If you use other programs for alignment, save your alignment in FASTA format (.fasta, .fas) and import the file in MEGA.
5. Open the file for phylogenetic analysis in MEGA. Use the “Sequence Data Explorer” tool in MEGA to select mini-barcode fragments for analysis. This can be done in “Data” using “Select & Edit Genes/Domains”.
6. Select your putative mini-barcode(s) from the full-length library by specifying the beginning and the ending nucleotides. Make sure these positions do not include the positions of your forward and reverse primers. You can select multiple mini-barcodes as long as the positions do not overlap. Overlapping mini-barcodes require a separate analysis for each mini-barcode.
7. Select your target mini-barcode by checking the box next to it in the same menu (Select & Edit Genes/Domains). Once done, check the alignment in “Sequence Data Explorer” to ensure that only your target mini-barcode is highlighted.
8. Use the “Highlight” menu in “Sequence Data Explorer” to highlight and count Variable and Parsimony Informative sites as simple statistics. These numbers can be compared with numbers obtained from a full-length barcode or other mini-barcodes.
9. Use the “Phylogeny” tools to assemble an NJ tree for your target mini-barcodes. Use this tree to inspect species-level resolution. Compare this NJ tree to the one obtained from the full-length barcode library. Evaluate cases where the mini-barcode does not provide species-level resolution. If these cases are among your putative target taxa for mini-barcoding, it is important to consider a longer or alternative fragment for mini-barcoding. Note that DNA degradation in museum samples often dramatically decreases the chances of obtaining fragments longer than 150–250 bp.
10. Once your optimal mini-barcode fragment(s) is selected, proceed to design forward and reverse PCR primers using the 20–30 bases flanking the mini-barcode fragment(s).
3.4. DNA Extraction
Many protocols are available for DNA extraction from a number of different tissue types (see Chapter III. B. on DNA extraction (III) for details). Several of these procedures have been incorporated into relatively inexpensive and effective commercial kits, such as the silica-based Nucleospin Tissue Kit (Macherey-Nagel, Düren, Germany) (see Note 2). Most of our mini-barcode tests have utilized this kit to obtain DNA from specimens (including museum samples). Additionally, a recent study by Shokralla et al. (27) has shown that modern PCR enzymes are capable of amplifying genetic information from preservative ethanol for noninvasive sampling or when no tissue specimen is available. This suggests that DNA extraction may be unnecessary in many protocols and in many cases DNA can be obtained from valuable museum specimens
noninvasively. Because mini-barcodes are much smaller than full-length barcodes, simple DNA extraction and release approaches that may not provide intact full-length barcode DNA are suitable for mini-barcoding.
3.5. PCR Amplification from Dried Museum Samples
Mini-barcodes can be amplified in a simple PCR reaction containing dNTPs, primers (forward and reverse), MgCl2, 10× PCR buffer, Taq polymerase, and template DNA. A master mix (excluding template DNA) should be prepared for all samples that will be processed. The volume of each ingredient is calculated from the number of samples to be amplified, with an additional 5% to account for pipetting error. No special condition or additive is required for amplification of mini-barcodes, and different versions of Taq polymerase should be able to amplify mini-barcode sequences (see Note 3). The thermocycler program will vary depending on the annealing temperature and extension time. For the universal primer set (4), a touch-up PCR program was used: 95°C for 2 min, followed by five cycles of 95°C for 1 min, 46°C for 1 min, and 72°C for 30 s, followed by 35 cycles of 95°C for 1 min, 53°C for 1 min, and 72°C for 30 s, and a final extension at 72°C for 5 min (see Note 5). Negative control reactions with no DNA template as well as a positive control reaction should always be included (see Note 4). To determine PCR success, it is necessary to visualize PCR products on an agarose gel. For high-throughput analysis, a precast 2% E-gel 96 agarose (Invitrogen, Burlington, ON, Canada) can be used. Alternatively, casting one’s own gel and using a molecular size marker (e.g., 100 bp DNA Ladder, New England Biolabs) can give more comprehensive results. Positive amplifications (clean bands) can then be bidirectionally sequenced using standard BigDye chemistry (Applied Biosystems, Foster City, CA) on sequencers such as a 3730xl DNA analyzer (Applied Biosystems, Foster City, CA, USA). The sequence reads are then trimmed and edited, resulting in clean contigs for phylogenetic analysis, using tools such as CodonCode (CodonCode Corporation), BioEdit (28), and MEGA (26).
1. Select DNA samples for PCR amplification (see Note 2). Calculate the volume of each necessary reagent (see Table 4 for a detailed PCR recipe).
2. Prepare the master mix with the calculated volumes. It is often helpful to use the calculation as a checklist to ensure that all reagents have been added. Briefly vortex and centrifuge.
3. Briefly spin down the Taq polymerase before adding it to the master mix. Do not vortex; mix by gently pipetting up and down. Return the Taq polymerase to the freezer.
4. Dispense the master mix into PCR reaction tubes (0.2 ml single tubes, strips, or 96-well plates).
Table 4
A typical PCR recipe for amplification of mini-barcodes

Reagent                    Initial concentration   Final concentration   Volume per reaction (μl)
H2O                        –                       –                     17.5
PCR buffer (Platinum)      10×                     1×                    2.5
MgCl2                      50 mM                   2.0 mM                1
dNTPs                      10 mM                   0.2 mM                0.5
Primer (Forward)           10 μM                   0.2 μM                0.5
Primer (Reverse)           10 μM                   0.2 μM                0.5
Platinum Taq polymerase    5 U/μl                  2.5 U                 0.5
Template DNA                                                             2
Final volume                                                             25
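The master mix calculation described for this protocol (per-reaction volumes multiplied by the number of reactions, plus 5% for pipetting error) can be sketched as follows. The per-reaction volumes are those of Table 4 without the template; the function name `master_mix` and the default of two control reactions (one negative, one positive) are assumptions made for the example.

```python
# Per-reaction volumes in microliters, taken from Table 4 (template DNA excluded).
PER_REACTION_UL = {
    "H2O": 17.5,
    "PCR buffer": 2.5,
    "MgCl2": 1.0,
    "dNTPs": 0.5,
    "Forward primer": 0.5,
    "Reverse primer": 0.5,
    "Platinum Taq polymerase": 0.5,
}

def master_mix(n_samples, n_controls=2, overage=0.05):
    """Return reagent volumes (microliters) for a master mix.

    Hypothetical helper: n_controls and overage are adjustable assumptions.
    """
    n_reactions = (n_samples + n_controls) * (1 + overage)
    return {name: round(vol * n_reactions, 2) for name, vol in PER_REACTION_UL.items()}

# Example: 94 samples plus two controls fill one 96-well plate.
mix = master_mix(94)
for name, vol in mix.items():
    print(f"{name}: {vol} ul")
```

The printed list can double as the checklist suggested in step 2 of the protocol.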
5. Add 1–2 μl DNA template to each reaction tube. Ensure that the pipette tip is changed each time.
6. Cover PCR tubes securely (caps in the case of strips, or foil seals for plates) and label using a permanent marker.
7. Centrifuge the plate (or PCR tubes/strips in a plate holder) for about 1 min at 1,000 × g, ensuring that the centrifuge is well balanced.
8. Place tubes, strips, or plate firmly into the thermocycler. Double-check the covers to minimize evaporation before beginning the required program. Ensure that the thermocycler lid is secure and that the program begins.
9. Check PCR amplification by gel electrophoresis (see Subheadings 3.6 and 3.7) to determine positive samples.
10. (Optional) Use a PCR purification method to clean samples prior to the sequencing reaction (e.g., QIAquick (Qiagen, Duesseldorf, Germany)).
3.6. PCR Amplification Check Using Pre-cast 2% E-gel 96 Agarose (Invitrogen, Burlington, ON, Canada)
1. Remove the gel from its package and remove the comb using your thumb and reasonable force. Use the packaging as a base for the gel while applying samples (see Note 6).
2. Load 14 μl of ddH2O into the wells, holding a 12-channel pipette at a slight angle.
3. Load 4 μl of PCR product into the wells using the 12-channel pipette.
4. Slide the gel into the electrode connections on the E-Base™. Ensure that the E-gel display screen says “EG” and change the time to
appropriate amount (~4 min). Press and release the pwr/prg button; the red light should turn green.
5. Remove the gel from the base and acquire an image using a UV transilluminator and digital camera if necessary.
6. Discard gloves, tips, and gel in hazardous waste.
3.7. PCR Amplification Check Using 1.5% Hand-Made Agarose Gel
1. Prepare the gel-casting tray with combs appropriate to the number of samples and secure the open edges using the gasket system or tape.
2. Weigh 1.5 g agarose powder and add it to 100 ml 1× TBE buffer in a glass container, preferably with a loosely fitted lid (this calculation may vary for different gel sizes; check your electrophoresis apparatus to verify).
3. Boil the mixture in a microwave (~2 min) until the powder has completely dissolved and the solution is uniform.
4. Allow to cool slightly before adding 3 μl ethidium bromide. Swirl gently to mix (see Note 6).
5. Allow further cooling; you should be able to touch the bottom of the container without burning your hand.
6. Steadily pour the agarose solution into the gel tray; do not move the tray while pouring, as this may create bubbles. Ensure that there are no bubbles around the combs; if there are, gently remove them using a clean pipette tip.
7. Place the gel in an electrophoresis chamber filled with 1× TBE buffer and add buffer until the gel is fully submerged. Gently remove the combs.
8. Load 3–5 μl ladder in the first lane of the agarose gel. Mix 4 μl PCR product with 2–3 μl loading dye by pipetting on parafilm or in another plate.
9. Apply PCR samples to the gel wells.
10. Connect the voltage (100 V) and allow samples to run for 20–30 min. Visualize the gel and acquire an image using a UV transilluminator or Geldoc system.
3.8. Sanger Sequencing Reaction
1. Prepare BigDye terminator (Applied Biosystems, Foster City, CA) master mix at the dilution appropriate for the size of the product (e.g., 1/16 dilution) and according to sample number (see Table 5 for details).
2. Aliquot the BigDye mix into PCR tubes, strips, or a plate. Add 1.5–2 μl of PCR product as template. If using a plate, seal it securely with foil or strip caps to prevent evaporation. Perform a cycle-sequencing reaction for each primer direction (forward and reverse). An optimized thermocycler protocol can be found on the Canadian Centre for DNA Barcoding (CCDB) Web site (http://www.ccdb.ca/) in Protocols: Sequencing.
Table 5
A typical Sanger sequencing recipe with BigDye (1/16 dilution)

Reagent                      One reaction (μl)
Dye terminator mix v3.1      0.25
5× ABI sequencing buffer     1.875
10% trehalose                5
10 μM primer                 1
H2O                          0.875
Total volume                 9
PCR product                  1–2
Total volume                 ~10
3. Perform cycle-sequencing clean-up using a method such as AutoDTR™ 96 (EdgeBio, MD, USA). A detailed protocol can be found on the Canadian Centre for DNA Barcoding (CCDB) Web site (http://www.ccdb.ca/) in Protocols: Sequencing.
4. After clean-up, submit reactions for sequence analysis using an automated DNA sequencer (e.g., Applied Biosystems 3730xl DNA Analyzer).
4. Notes
1. Sequence selection is critical as it influences the analysis of the utility of a putative mini-barcode. For barcoding purposes, species-level discrimination is most important. Hence, sequences used for mini-barcode selection should include the maximum number of species. Congeneric species are good targets for this analysis. Additionally, when possible, multiple sequences from each species should be included so that conspecific variation is taken into consideration in the calculations.
2. Always perform sampling and DNA extraction procedures in a dedicated pre-PCR area. Clean the work surface with ethanol or a product such as ELIMINase Decontaminant. All tissue-handling instruments should be sterilized (preferably by flaming) between samples.
3. A high-fidelity polymerase, such as Platinum Taq (Invitrogen, Burlington, ON, Canada), is recommended as it requires less optimization and works better with small quantities of template DNA.
4. Always include at least one PCR reaction without template as a negative control to check for DNA contamination of the reagents. Consider using a positive control (a previously amplified DNA sample) to test the efficiency of the PCR reagents.
5. The PCR protocols listed are for a thermocycler with rapid thermal ramping (e.g., Eppendorf MasterCycler EP). This allows for more efficient annealing and quicker completion of PCR amplification. Optimization will be needed if a model with slower ramping is used.
6. Ethidium bromide is toxic. Always wear nitrile gloves when handling ethidium bromide. Discard gloves, tips, and used gels in an appropriate hazardous waste container after use. Consult the MSDS and a laboratory health and safety manual for safe handling and disposal before use.
Acknowledgements
This work was supported by grants from Genome Canada through the Ontario Genomics Institute, Environment Canada and NSERC to MH.
References
1. Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci 270:313–321
2. Hajibabaei M, Singer GAC, Hebert PDN, Hickey DA (2007) DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends Genet 23:167–172
3. Hajibabaei M, Smith MA, Janzen DH et al (2006) A minimalist barcode can identify a specimen whose DNA is degraded. Mol Ecol Notes 6:959–964
4. Meusnier I, Singer GAC, Landry JF et al (2008) A universal DNA mini-barcode for biodiversity analysis. BMC Genomics 9:214
5. Wandeler P, Hoeck PEA, Keller LF (2007) Back to the future: museum specimens in population genetics. Trends Ecol Evol 22:634–642
6. Zimmermann J, Hajibabaei M, Blackburn DC et al (2008) DNA damage in preserved specimens and tissue samples: a molecular assessment. Front Zool 5:18
7. Hajibabaei M, Singer GAC, Clare EL, Hebert PDN (2007) Design and applicability of DNA arrays and DNA barcodes in biodiversity monitoring. BMC Biol 5:24
8. Patel S, Waugh J, Millar CD, Lambert DM (2009) Conserved primers for DNA barcoding historical and modern samples from New Zealand and Antarctic birds. Mol Ecol Resour 10:431–438
9. Dean MD, Ballard JW (2001) Factors affecting mitochondrial DNA quality from museum preserved Drosophila simulans. Entomol Exp Appl 98:279–283
10. Baird DJ, Pascoe TJ, Zhou X, Hajibabaei M (2011) Building freshwater macroinvertebrate DNA barcode libraries from reference collection material: formalin preservation versus specimen age. J North Am Benthol Soc 30:125–130
11. Poinar HN, Schwarz C, Qi J, Shapiro B et al (2006) Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311:392–394
12. Evans T (2007) DNA damage. NEB Expressions 2(1):1–3
13. Min XJ, Hickey DA (2007) DNA barcodes provide a quick preview of mitochondrial genome composition. PLoS One 2:e325
14. Ficetola GF, Coissac E, Zundel S et al (2010) An in silico approach for the evaluation of DNA barcodes. BMC Genomics 11:434
15. Sonstebo JH, Gielly K, Brysting AK et al (2010) Using next-generation sequencing for molecular reconstruction of past arctic vegetation and climate. Mol Ecol Resour 10:1009–1018
16. Saitoh K, Uehara S, Tega T (2008) Genetic identification of fish eggs collected in Sendai Bay and off Johban, Japan. Ichthyol Res 56:200–203
17. Baumsteiger J, Kerby JL (2009) Effectiveness of salmon carcass tissue for use in DNA extraction and amplification in conservation genetic studies. N Am J Fish Manag 29:40–49
18. Dubey B, Meganathan PR, Haque I (2010) DNA mini-barcoding: an approach for forensic identification of some endangered snake species. Forensic Sci Int Genet 5:181–184
19. Lee PLM, Prys-Jones RP (2008) Extracting DNA from museum bird eggs, and whole genome amplification of archive DNA. Mol Ecol Resour 8:551–560
20. Houdt JKJ, Breman FC, Virgilio M, Meyer MD (2009) Recovering full DNA barcodes from natural history collections of Tephritid fruitflies (Tephritidae, Diptera) using mini-barcodes. Mol Ecol Resour 10:459–465
21. Rougerie R, Smith AM, Fernandez-Triana J et al (2010) Molecular analysis of parasitoid linkages (MAPL): gut contents of adult parasitoid wasps reveal larval hosts. Mol Ecol 20:179–186
22. Smith MA, Fisher BL (2009) Invasions, DNA barcodes and rapid biodiversity assessment using ants of Mauritius. Front Zool 6:31
23. Hajibabaei M, Shokralla S, Zhou X, Singer GAC, Baird DJ (2011) Environmental barcoding: a next-generation sequencing approach for biomonitoring applications using river benthos. PLoS One 6:e17497. doi:10.1371/journal.pone.0017497
24. Rozen S, Skaletsky H (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132:365–386
25. Owczarzy R, Tataurov AV, Wu Y et al (2008) IDT SciTools: a suite for analysis and design of nucleic acid oligomers. Nucleic Acids Res 36 (web server issue)
26. Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596–1599
27. Shokralla S, Singer GAC, Hajibabaei M (2010) Direct PCR amplification and sequencing of specimens’ DNA from preservative ethanol. Biotechniques 48:233–234
28. Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 41:95–98
Chapter 16
Ways to Mix Multiple PCR Amplicons into Single 454 Run for DNA Barcoding
Ryuji J. Machida and Nancy Knowlton
Abstract
Metagenetic analysis using second-generation sequencing offers a novel methodology for measuring the diversity of metazoan communities. Among commercially available second-generation sequencers, the 454 GS FLX Titanium (Roche Diagnostics) offers by far the longest read length and can produce one million sequences from a single run. Compared to the large number of sequences produced from a single run, however, the number of samples these machines can process is rather low. In this chapter, we describe the use of MID adapters to mix multiple PCR amplicons into a single 454 run. This strategy is rather easy to use, and up to 132 samples can be multiplexed into a single 454 run. If a large number of samples are going to be mixed into a single 454 run, however, high cost might be the next bottleneck. In this context, we also discuss other ways of multiplexing, including the use of fusion primers and Parallel Tagged Sequencing, and weigh their advantages and disadvantages.
Key words: Metagenetics, Amplicon sequencing, Multiplexing, Second-generation sequencer
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_16, © Springer Science+Business Media, LLC 2012
1. Introduction
In the aquatic environment, metagenomics, metagenetics, and metatranscriptomics are increasingly used to compare, monitor, and assess the diversity of communities and their dynamics (e.g., 1–3). Among these strategies, metagenetic analysis (sequencing of PCR-amplified genes) is becoming a standard and feasible strategy in moderate-scale laboratories. In metagenetic analysis, after a target gene is amplified from DNA extracted from environmental samples, sequences of those amplicons are determined by second-generation sequencing technologies. Currently, the 454 GS FLX Titanium machine (Roche Diagnostics) offers by far the longest read length (average 400 bp) among commercially available second-generation sequencers and can produce one million sequences from a single run. However, because of structural differences
from Sanger sequencing, the number of samples that these machines can process in a single run is rather low (a maximum of 16 samples using a gasket to subdivide a picotiter plate). For many applications, multiple samples are needed for statistical comparisons, and subdivision of the plate reduces the number of sequences obtained (to about half in the case of 16 subdivisions). To increase sample throughput, several alternative strategies have been introduced, all of which use sample-specific nucleotide tags to distinguish the source bioinformatically after sequencing. In this chapter, we describe one of the multiplexing protocols, the MID adapter, in detail and compare its advantages and disadvantages with two additional multiplexing strategies: fusion primers and Parallel Tagged Sequencing (PTS) (4–6). Fusion primers contain sample-specific tags and 454 sequencing primers A and B at the 5′ portion of the oligonucleotide in addition to target-specific primers (Fig. 1-1). Therefore, after amplification by PCR, amplicons can be pooled without further manipulation. In contrast, with the MID adapter strategy, PCR is performed using ordinary primers. Then the MID adapters, which contain sample-specific tags and 454 sequencing primers A and B, are ligated to the PCR amplicons (Fig. 1-2). In PTS, hand-made, sample-specific adapters are ligated to the PCR amplicons. Then adapters that contain 454 sequencing primers A and B are ligated to the PCR amplicons carrying the sample-specific adapters (Fig. 1-3). These methods differ in the time required for library preparation, multiplexing scalability, possibility of PCR amplification bias, and capacity for directional sequencing, all of which play important roles in designing an experiment. Note that prices and sequencing capacities described in this chapter are based on the information available as of December 2010 for machines in the USA.
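All three strategies rely on sample-specific tags to sort reads bioinformatically after sequencing. The following minimal sketch shows the idea; it is not the software shipped with the sequencer. The tag sequences, sample names, and function name `demultiplex` are hypothetical, and the sketch assumes exact matching of equal-length tags at the start of each read.

```python
from collections import defaultdict

def demultiplex(reads, tag_to_sample):
    """Bin reads by their leading tag and strip the tag from assigned reads.

    Hypothetical helper: assumes all tags have the same length and that
    only exact matches are accepted (no error tolerance).
    """
    tag_len = len(next(iter(tag_to_sample)))
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            bins["unassigned"].append(read)  # keep the full read for inspection
        else:
            bins[sample].append(insert)
    return dict(bins)

# Made-up tags and reads, for illustration only.
tags = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = ["ACGTACGGGTTT", "TGCATGCCCAAA", "ACGTACAAAA", "NNNNNNGGGG"]
binned = demultiplex(reads, tags)
print(binned)
```

Real tag sets are usually designed so that sequencing errors do not convert one tag into another; a production pipeline would also need to tolerate or reject such errors explicitly.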
2. Materials
2.1. PCR Products and Their Purification
1. PCR products (see Note 1).
2. MinElute Gel Extraction Kit (Qiagen), or Agencourt AMPure XP (Beckman Coulter Genomics).
3. TE buffer.
2.2. MID Adapters
1. GS FLX Titanium Rapid Library MID Adaptors Kit (454 Life Sciences). This kit includes 12 different MID adapters, each supplied in sufficient quantity for six reactions.
2.3. Library Construction
1. NEBNext Quick DNA Sample Prep Reagent Set2 (New England Biolabs).
Fig. 1. Multiplexed 454 library preparation workflow for fusion primers, MID adapters, and Parallel Tagged Sequencing (4–6). Capital letters A and T in the figure indicate the extended adenine and thymidine in the 5′ end of amplicons and adapters, respectively. Capital letter P together with a bar indicates the phosphorylated 3′ end. Fusion primers contain a sample-specific tag and 454 sequencing primer A or B at the 5′ portion of the oligonucleotide in addition to the target-specific primer. Therefore, amplification results using fusion primers are not always the same as those obtained with target-specific primers. On the other hand, the advantage of fusion primers is directional sequencing. In a library made with fusion primers, only one strand of the library has 454 sequencing primer A, where 454 sequencing starts. In contrast, the 3′ ends of both strands have 454 sequencing primer A in libraries made with MID adapters and PTS. This is why directional sequencing can be performed only with fusion primers. (After library preparation, all of the libraries are denatured to single strands and carried on to the next step of 454 sequencing, emulsion PCR.)
2.4. Library Purification
1. Agencourt AMPure XP (Beckman Coulter Genomics).
2.5. Library Quantification
1. TBS-380 Fluorometer (Turner BioSystems).
2. RL Standard, which is included in the GS FLX Titanium Rapid Library Preparation Kit (454 Life Sciences).
3. TE buffer.
2.6. Library Pooling
1. MinElute PCR Purification Kit (Qiagen).
3. Methods
3.1. PCR Amplicon Preparation
1. If an undesired fragment is coamplified by PCR, excise the desired band from the gel and purify it using the MinElute Gel Extraction Kit (Qiagen). Otherwise, purify PCR products with Agencourt AMPure XP (Beckman Coulter Genomics). Elute in 17 μl of TE buffer. An amount of 500 ng DNA or less in 17 μl of TE buffer is recommended for the following procedure. If the amount exceeds 500 ng in 17 μl, dilute the product with TE buffer and use only 500 ng for the following reactions (see Notes 2 and 3). Keep some of the PCR product for verification of MID adapter ligation efficiency (see Subheading 3.4).
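The 500 ng limit in this step amounts to a small dilution calculation, sketched below. The function name `input_volumes` and the example concentrations are assumptions; the 500 ng ceiling and 17 μl final volume come from the step above.

```python
def input_volumes(conc_ng_per_ul, max_ng=500.0, final_ul=17.0):
    """Return (microliters of purified product, microliters of TE buffer).

    Hypothetical helper: if the full eluate holds no more than max_ng, it is
    used undiluted; otherwise only max_ng is taken and topped up with TE.
    """
    total_ng = conc_ng_per_ul * final_ul
    if total_ng <= max_ng:
        return final_ul, 0.0
    product_ul = max_ng / conc_ng_per_ul
    return product_ul, final_ul - product_ul

print(input_volumes(20.0))  # 340 ng in 17 ul: use as is
print(input_volumes(50.0))  # 850 ng in 17 ul: take 500 ng and add TE
```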
3.2. Phosphorylation and dA-Tailing
1. Add 2.5 μl NEBuffer 2 (10×), 2.5 μl ATP, 1.0 μl dNTP Mix, 1.0 μl PNK, and 1.0 μl Taq DNA Polymerase (all included in NEBNext Quick DNA Sample Prep Reagent Set2) to a microcentrifuge tube. Mix by pipetting (see Note 4).
2. Add 8.0 μl of the mixture to the 17 μl of purified PCR amplicon. Vortex briefly and spin down.
3. Incubate the sample in a thermal cycler with the following program: 25°C for 20 min, 72°C for 20 min, and hold at 4°C.
3.3. MID Adapter Ligation
1. Add 1.0 μl of MID adapter to the reaction tube. A different MID adapter should be used for each PCR amplicon. Vortex briefly and spin down. Add 1.0 μl of Quick T4 DNA Ligase (included in NEBNext Quick DNA Sample Prep Reagent Set2) to the reaction tube. Vortex briefly and spin down.
2. Incubate the reaction tube for 10 min at 25°C.
3.4. Library Purification
1. Purify the product with Agencourt AMPure XP (Beckman Coulter Genomics) following the manual provided by the manufacturer (see Note 5). Elute with 52 μl TE buffer. Use 50 μl of the eluate for the following sample preparation and
2 μl for verification of MID adapter ligation efficiency by running it on a gel together with untagged PCR products.
3.5. Library Quantification
1. Quantify the library using the TBS-380 Fluorometer (Turner BioSystems) following the 454 Rapid Library Preparation Method Manual (6).
3.6. Library Pooling
1. Pool the prepared libraries in molar ratios reflecting the proportion of sequence reads desired from each sample. In general, more than 500 ng of pooled library in no more than 100 μl is required for the subsequent emPCR. If the volume exceeds 100 μl, concentrate the library using the MinElute PCR Purification Kit (Qiagen).
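Pooling in molar ratios requires converting each library's mass concentration to molarity. The sketch below is one way to do this, not the procedure in the manufacturer's manual. It assumes an average mass of 660 g/mol per base pair of double-stranded DNA (a common approximation); the function names, library names, concentrations, lengths, and the total of 1,000 fmol are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair; approximation for double-stranded DNA

def molarity_nM(conc_ng_per_ul, length_bp):
    """Convert ng/ul to nM (equivalently fmol/ul) for a dsDNA fragment."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)

def pooling_volumes(libraries, read_share, total_fmol=1000.0):
    """Return microliters of each library to pool.

    libraries:  name -> (concentration in ng/ul, library length in bp)
    read_share: name -> relative proportion of reads desired
    """
    total_share = sum(read_share.values())
    volumes = {}
    for name, (conc, length) in libraries.items():
        fmol_wanted = total_fmol * read_share[name] / total_share
        volumes[name] = fmol_wanted / molarity_nM(conc, length)
    return volumes

# Made-up libraries: lib_C should receive twice the reads of lib_A or lib_B.
libraries = {"lib_A": (10.0, 500), "lib_B": (20.0, 500), "lib_C": (10.0, 250)}
volumes = pooling_volumes(libraries, {"lib_A": 1, "lib_B": 1, "lib_C": 2})
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} ul")
```

The library length used here should include the ligated adapters, since they contribute to the mass measured by fluorometry.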
4. Comparison of MID Adapters, Fusion Primers and Parallel Tagged Sequencing
4.1. Comparison of Time Required for Library Preparation
Figure 1 illustrates the workflow of 454 amplicon library preparation using fusion primers, MID adapters, and PTS (4–6). One major factor controlling the time needed to prepare 454 libraries is the number of samples to be multiplexed. Holding the number of samples constant, fusion primers require the shortest and PTS the longest time for preparing multiplexed libraries for 454 sequencing. In general, preparation of MID adapter libraries requires an additional day and preparation of PTS libraries an additional 2–7 days compared to fusion primers (Table 1).

Table 1
Comparison of multiplexing strategies for 454 library preparation

Multiplexing strategy         Time required for library preparation   Scalability   Amplification bias   Directional sequencing
Fusion primers                Short                                   Low           Yes                  Yes
MID adapters                  Middle                                  Low           No                   No
Parallel Tagged Sequencing    Long                                    High          No                   No

4.2. Scalability for Multiplexing of Large Number of Samples
In contrast to its time requirement for library preparation, PTS has large advantages in scalability over fusion primers and MID adapters (Table 1). With fusion primers, the sample-specific tag constitutes a part of the primer. Therefore, a separate fusion primer must be prepared for each sample to be mixed in a single 454 run. Generally, fusion primers contain 50 or more bases, and the cost is roughly $64.5 USD per primer (45 cents per base plus $42 USD for HPLC purification). Therefore, fusion primers are not always ideal from the perspective of cost when there is a need to multiplex many samples into a single 454 run. MID adapters require less time to prepare libraries compared to PTS, but again the cost of multiplexing a large number of samples can be high. Twelve kinds of MID adapters are available from 454 Life Sciences, which together cost $1,500 USD. Alternatively, 120 kinds of MID adapters are available from Integrated DNA Technologies (http://www.idtdna.com), and each MID adapter costs $235 USD. Therefore, preparing many kinds of MID adapters might require a large initial investment. With the MID adapter strategy, the major cost of preparing the library is not only the MID adapter itself but also the library preparation kit. In this chapter, we described use of the NEBNext Quick DNA Sample Prep Reagent Set2 (New England Biolabs); the kit costs $400 USD for 10 reactions (or $1,600 USD for 50 reactions). This library preparation cost might account for a large share of the total if many samples are going to be multiplexed. In contrast to fusion primers and MID adapters, it is easy to set up 96-well plate reactions using PTS. The sample-specific tags used in PTS are oligonucleotides that can be purchased as ordinary primers, and most of the other required chemicals are relatively low cost. Together, this makes the scalability of PTS much higher than that of fusion primers or MID adapters.
4.3. Possibility of Amplification Bias
MID adapters and PTS both use target-specific primers, which are ordinary primers, to amplify the target sequence by PCR (Fig. 1). In contrast, fusion primers contain a sample-specific tag and 454 sequencing primer A or B at the 5′ portion of the oligonucleotide in addition to the target-specific primer (Fig. 1). Therefore, amplification results using fusion primers are not always the same as those obtained with target-specific primers. Additionally, different tag sequences are used to amplify different samples in order to distinguish the source of sequences. Therefore, those tag regions also have the potential to produce amplification bias between fusion primers.
4.4. Directional Versus Nondirectional Sequencing
With fusion primers, the 5′ end of the amplicon is sequenced directionally when 454 sequencing primer A is used for the sequencing reaction (Fig. 1). In contrast, with MID adapters and PTS, either the 5′ end or the reverse complement of the 3′ end of the amplicon is sequenced at random using 454 sequencing primer A. For short amplicons, the sequences cover most of the amplicon region; therefore, sequences obtained from either end in nondirectional sequencing can be compiled as a single data set because of the large overlap between sequences obtained from the two ends. However, for long amplicons, when only a partial sequence of the amplicon is determined, two sets of sequence data, one from the 5′ end and one from the 3′ end, will be obtained, and these are not comparable with each other.
From this standpoint, directional sequencing is a clear advantage of using fusion primers, although this problem might diminish when the sequence length capacity is extended in 2011, as announced by 454 Life Sciences (http://www.454.com).
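When reads from a nondirectional library (MID adapters or PTS) do overlap enough to be combined, they first have to be brought into one orientation. A minimal sketch of that step follows, assuming the target-specific primer is still present at the start of each read and matches exactly; the primer sequences, the toy amplicon, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def orient(read, forward_primer, reverse_primer):
    """Return the read in forward orientation, or None if no primer matches.

    Hypothetical helper: exact primer match at the read start is assumed.
    """
    if read.startswith(forward_primer):
        return read
    if read.startswith(reverse_primer):
        return reverse_complement(read)
    return None

# Made-up primers and amplicon, for illustration only.
FWD, REV = "GGTCA", "CTTAG"
amplicon = FWD + "AAACCC" + reverse_complement(REV)
reads = [amplicon, reverse_complement(amplicon), "TTTTTTTT"]
oriented = [orient(r, FWD, REV) for r in reads]
print(oriented)
```

Orientation alone does not solve the problem described above for long amplicons, where reads from the two ends cover different regions and remain two separate data sets.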
5. Notes
1. The length of PCR products, including primers and adapters, needs to be shorter than 500 bp, although the sequence length capacity might be extended in 2011 as announced by 454 Life Sciences (http://www.454.com).
2. The original Rapid Library Preparation Method Manual (6) specifies an elution volume of 16 μl instead of 17 μl. We have increased the volume to compensate for the changed reaction volume, which is reduced by omitting T4 DNA polymerase in the next step.
3. Amounts higher than 500 ng may facilitate chimera formation during the subsequent ligation step (4).
4. The target is not fragmented DNA; therefore, we omit T4 DNA polymerase.
5. The target is not fragmented DNA; therefore, we omit small fragment removal, which is described in the original Rapid Library Preparation Method Manual (6).
Acknowledgments
We thank David Erickson and W. John Kress for inviting this submission.
References
1. Venter JC, Remington K, Heidelberg JF et al (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66–74
2. DeLong EF, Preston CM, Mincer T et al (2006) Community genomics among stratified microbial assemblages in the ocean’s interior. Science 311:496–503
3. Sogin ML, Morrison HG, Huber JA et al (2006) Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proc Natl Acad Sci USA 103:12115–12120
4. Meyer M, Stenzel U, Hofreiter M (2008) Parallel tagged sequencing on the 454 platform. Nat Protoc 3:267–278
5. 454 Life Sciences (2009) Amplicon fusion primer design guidelines for GS FLX titanium series Lib-A chemistry. 454 Technical Bulletin TCB No.013-2009
6. 454 Life Sciences (2010) Rapid library preparation method manual: GS FLX Titanium Series
Part IV Applications of DNA Barcode Data
Chapter 17
The Practical Evaluation of DNA Barcode Efficacy*
John L. Spouge and Leonardo Mariño-Ramírez
Abstract
This chapter describes a workflow for measuring the efficacy of a barcode in identifying species. First, assemble individual sequence databases corresponding to each barcode marker. A controlled collection of taxonomic data is preferable to GenBank data, because GenBank data can be problematic, particularly when comparing barcodes based on more than one marker. To ensure proper controls when evaluating species identification, specimens not having a sequence in every marker database should be discarded. Second, select a computer algorithm for assigning species to barcode sequences. No algorithm has yet improved notably on assigning a specimen to the species of its nearest neighbor within a barcode database. Because global sequence alignments (e.g., with the Needleman–Wunsch algorithm, or some related algorithm) examine entire barcode sequences, they generally produce better species assignments than local sequence alignments (e.g., with BLAST). No neighboring method (e.g., global sequence similarity, global sequence distance, or evolutionary distance based on a global alignment) has yet shown a notable superiority in identifying species. Finally, “the probability of correct identification” (PCI) provides an appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. This chapter states explicitly how to calculate PCI, how to estimate its statistical sampling error, and how to use data on PCR failure to set limits on how much improvements in PCR technology can improve species identification.
Key words: Barcode efficacy in species identification, Probability of correct identification, DNA barcode
1. Introduction
Species are becoming extinct, making conservation of biodiversity a major challenge. The first step to preserving biodiversity is assessment, but there are not enough taxonomists to catalog species
*For software relevant to this chapter, see http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html.ncbi/barcode/
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_17, © Springer Science+Business Media, LLC 2012
throughout the world. DNA barcodes therefore provide the basis of a promising alternative strategy because they require only collection of DNA and not the immediate taxonomic identification of specimens. Although barcodes have many other uses, e.g., identification of novel species, taxonomic classification, and phylogeny, their application to cataloguing biodiversity justifies restricting this chapter to the measurement of a barcode’s efficacy in identifying known species.

In its essence, a barcode is any standardized subset of DNA from a taxonomic specimen (1, 2). The subset may vary, depending on readily recognizable features of a specimen (e.g., is the specimen a vertebrate? a plant? an insect? etc.). If computers could identify the species of a specimen from its barcode, then the barcode would provide a database key for retrieving taxonomic information pertinent to the specimen. A computer catalog of species on Earth then becomes a technical possibility.

Early studies indicated that the sequence of the cytochrome c oxidase 1 (CO1) gene could correctly identify many species (3), so selection of CO1 as a primary barcode followed naturally (4–10). Although the selection of a DNA barcode has been natural for some species, it has been problematic for others, particularly plants (11–14) and insects (15, 16). The lack of a clear consensus for a barcode in those species has stimulated interest in the objective, quantitative measurement of the efficacy of a barcode in identifying species. Consensus on an actual barcode for some species remains tentative, but nonetheless, a consensus on measuring barcode efficacy has emerged (14, 15, 17). This chapter summarizes the consensus and indicates how to construct studies to evaluate the relative merits of competing barcodes. For practical methods, the reader is invited to view http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html.ncbi/barcode/, a Web site providing information on computer programs pertinent to barcodes.
Web pages are supposed to be self-explanatory, so to avoid undue brevity, the second section in this chapter provides some rationale for the computer programs for evaluating barcodes. The third section provides a practical summary of the entire chapter.
2. The Measurement of the Efficacy of Species Identification
To fix our terminology, the term “marker” connotes any contiguous region of DNA (coding or non-coding), whereas the term “barcode” connotes the aggregate of the one or more markers in the “standardized subset of DNA” referred to in the Introduction. Presently, all barcode markers are marker genes like CO1, matK, etc.
In slowly evolving organisms like plants, however, intergenic spacers (DNA regions flanked by two genes) are still worthy of consideration as potential markers, because they usually diverge faster than genes, while their ends are still conserved, providing primers for PCR (17, 18). As described below, however, multiple sequence alignments (MSAs) of intergenic markers might complicate the workflow in a barcode database.

To have practical meaning, any measurement of the efficacy of species identification must mirror the performance of a database based on the prospective barcode. In practice, users query the database with a barcode retrieved from a specimen; the database returns the species identification as output, with the assignment “unknown” for any species apparently not yet in the database. Because this chapter restricts itself to discussing the identification of known species, it assumes that each query to the barcode database represents a specimen belonging to a species already in the database.

2.1. The Database
The first step in estimating the efficacy of several prospective barcodes is to assemble the corresponding databases. To ensure the proper controls, specimens not having sequences in every marker database should be eliminated from consideration (14), because if the databases do not contain exactly the same specimens, there might be unappreciated but influential biases. Consider, e.g., a hypothetical experiment that extracts from GenBank all sequences corresponding to two prospective markers, Marker A and Marker B. If Marker A has been the default marker of choice, whereas Marker B has been considered as the last hope for resolving species after Marker A has failed, the GenBank entries for Marker B might be biased toward a subset of particularly difficult specimens. Thus, on GenBank data, Marker B might have fewer correct species assignments than Marker A, even though Marker B is in fact better at resolving species than Marker A. Moreover, relative to a barcode database, GenBank taxonomy is undependable, and undependable taxonomy improperly influences conclusions by occasionally penalizing correct species identification. In addition, GenBank entries do not usually identify individual taxonomic specimens. GenBank data are therefore particularly unsuited to studying barcodes based on more than one marker, because the sequences from different markers cannot be associated with a single specimen. Although studies based on GenBank data have obvious scientific interest, they do not have the same status as a controlled taxonomic study. In summary, the choice of database affects conclusions, so care must be taken that the database reflects the scientific aims of a study. Figure 1 shows some pertinent results for trnH-psbA, a potential barcode marker in plants. By using pairwise alignment and various evolutionary distances in the procedures described below, the best overall probability of correct identification (PCI) in Fig. 
1 is about 0.50, which is noticeably lower than the overall PCI of
Fig. 1. Overall PCIs for trnH-psbA. Figure 1 graphs the overall PCI (on the X-axis) from assigning plant species with trnH-psbA sequences collected from GenBank. (The corresponding FASTA file can be obtained at http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html_ncbi/html/bib/116.html). Assignment used a nearest neighbor algorithm and one of six separations (on the Y-axis). The six separations were: (1) Global Distance; (2) Global Similarity; and four evolutionary distances: (3) Jukes-Cantor (38); (4) Kimura (2-Parameter) (39); (5) Jin (using a gamma distribution with parameter 1) (40); and (6) Tamura (41). The pairwise sequence alignment used either the HOX70 scoring matrix

$$
\begin{array}{c|rrrr}
  & A & C & G & T \\ \hline
A & 91 & -114 & -31 & -123 \\
C & -114 & 100 & -125 & -31 \\
G & -31 & -125 & 100 & -114 \\
T & -123 & -31 & -114 & 91
\end{array}
$$

with a gap of length $k$ receiving a penalty $D(k) = 400 + 30k$, or the NCBI DNA scoring system (1 for a match, −3 for a mismatch, with a gap of length $k$ receiving a penalty $D(k) = 5 + 2k$). Perhaps surprisingly, the overall PCIs for the two scoring systems were visually indistinguishable. Global Distance is the global alignment score; Global Similarity is the actual global alignment score divided by the maximum possible global alignment score for sequences of the same length (42). The green part of the horizontal bars gives the unambiguously correct fraction of species assignments, where every specimen had as nearest neighbors only specimens from the same species; the yellow part, the ambiguously correct fraction, where every specimen had as nearest neighbors a mix of specimens from both the same and other species (with the red border indicating the average fraction of the ambiguously correct fraction matching specimens from different species); and the red part, the unambiguously incorrect fraction, where every specimen had only nearest neighbor specimens from other species.
0.69 from a controlled taxonomic study (14), suggesting that the GenBank entries for trnH-psbA might contain biases, relative to a controlled taxonomic study. The corresponding FASTA sequence file (see the Supplementary Materials) in fact contained genetic crosses (denoted by “x”) and tentative species assignments (denoted by “sp.”, “cf.”, “aff.”), which went unnoticed until the Web tools mentioned above found them.

2.2. Species Assignment Algorithm
Once an appropriate database has been selected, the computer must assign a species to each barcode query (or declare its failure to assign). The next step, therefore, is to select a computer algorithm for assigning each specimen and its barcode sequence to a species. No algorithm seems to improve noticeably on assigning to a specimen the species of its nearest neighbor within a barcode database (19, 20). Thus, many algorithms begin by estimating a “separation” between the barcode sequences in two specimens. (The term “separation” is preferable to “distance”, which connotes some specific mathematical properties not necessary to barcodes.) Separation can be based on: (1) sequence alignment similarities, (2) sequence alignment distances, (3) evolutionary distances (which usually require prior alignment of the barcode sequences), or (4) alignment-free distances. Studies have compared different measures of separation, but they are too limited to draw definitive conclusions about which separation provides the best species assignments. There are, however, some distinctly bad measures of separation. Like any assignment method, species assignment should use all available information. BLAST is a popular sequence comparison tool (21, 22), but as a measure of separation it can mislead, because it compares two sequences with local alignment, which matches and scores only the two most similar subsequences within two sequences (see Fig. 2, which diagrams some of the differences between local and global alignments). Global alignment, which matches the entire length of sequences, is better for measuring the separation of barcode marker sequences. In intergenic markers particularly, BLAST has the possible weakness of matching only small subsequences, because alignments within intergenic spacers often contain large gaps. Short subsequences can exhibit convergent evolution (homoplasy) (23), so on the one hand a BLAST local alignment might make distant species appear spuriously close. 
On the other hand, a global alignment might resolve the species by highlighting dissimilarities across the whole marker. In the context of barcodes, therefore, a global alignment (e.g., with some close relative of the Needleman–Wunsch Algorithm (24)) is generally preferable to a local alignment (e.g., with the Smith–Waterman Algorithm (25) or BLAST). Other types of alignments exist, but there is little reason to expect them to assign species notably better than global alignment.
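As a concrete illustration of the global alignment recommended above, the following is a minimal sketch of a Needleman–Wunsch alignment followed by a p-distance calculation. It is not the chapter's software: the function names `global_align` and `p_distance` are hypothetical, the match and mismatch scores are borrowed from the NCBI DNA scoring system quoted in Fig. 1, and the linear gap penalty is a simplifying assumption (the scoring systems in Fig. 1 use a gap-opening cost as well).

```python
def global_align(a, b, match=1, mismatch=-3, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Scores are illustrative assumptions, not the chapter's exact settings.
    Returns (aligned_a, aligned_b, score), with '-' marking gap positions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # (mis)match
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner, so the alignment always
    # spans the full length of both sequences (unlike a local alignment).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def p_distance(aligned_a, aligned_b):
    """Proportion of aligned nucleotide pairs that differ (gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned nucleotide pairs")
    return sum(x != y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    aligned_a, aligned_b, s = global_align("ACGTACGT", "ACGTTCGT")
    print(aligned_a, aligned_b, s, p_distance(aligned_a, aligned_b))
```

Skipping gap columns in `p_distance` is one possible convention; a study might equally count gaps as differences, and the choice should be stated when results are reported.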
Fig. 2. Two types of alignment, global and local. (a) shows a global alignment of two sequences (black lines). Global alignment is an alignment along the complete length of the sequences, so it bridges a gap in the second sequence (white space), to include all pairs of similar subsequences (red rectangles). (b) shows a local alignment of the same two sequences. Local alignment aligns only the pair of most similar subsequences in the sequences, so it does not bridge the gap in the second sequence and does not include the smaller subsequence alignment (now shown in gray). Local alignment can be misleading when identifying species with barcodes because it does not incorporate all available sequence information.
MSAs might be more problematic for intergenic markers than for marker genes like CO1, because intergenic MSAs usually contain many gaps, disrupting the alignment columns representing evolutionary relationships. In practice, the Barcode of Life Database (http://www.boldsystems.org) stores sequences in a global MSA, by using the program HMMer (26) to align sequences before comparing the corresponding barcode marker genes. In fact, many publicly available tools (e.g., MUSCLE (27) or MAFFT (28)) could create barcode MSAs interchangeably with HMMer. The point of using MSAs in a large barcode database, however, is that MSA can be much faster than pairwise sequence alignment. (If there are $N$ barcodes in a database, pairwise alignment requires time proportional to $N^2$.) Although bioinformatics should adapt to the needs of biology and not vice versa, the selection of an intergenic marker as a barcode might exclude MSAs in the workflow of large barcode databases, causing awkward (but probably not insuperable) difficulties.

As separations, the relative merits of global alignment similarity, global alignment distances, or evolutionary distances based on a global alignment have not yet been clearly established, although the differences in species assignment are probably small. Alignment distances and similarities model insertions and deletions in sequences, which are not as well understood as the nucleotide substitutions used in evolutionary distances. As a separation, p-distance (the proportion p of alignment pairs containing differing nucleotides) is particularly simple and well-known to taxonomists (20), but in fact no separation based on global alignment has shown any clear superiority in species assignment over the others.

Other species assignment algorithms should be mentioned (29, 30). Many probabilistic algorithms, in particular those producing phylogenetic trees (31, 32), are now commonplace in taxonomy.
Unfortunately, most probabilistic computations are much slower than the nearest neighbor algorithms above. Because they do not noticeably improve identification, they have not found a place in automatic species identification. Alignment-free algorithms are simple and provide faster computation than alignment-based methods (20, 33), but they have not yet been widely adopted in species identification.

2.3. Probability of Correct Identification
With an appropriate database and species assignment algorithm in hand, a scientist interested in barcode efficacy must measure the algorithm’s success in identifying species. Any reasonable measure of barcode efficacy should reflect the probability that a database based on the prospective barcode identifies a specimen’s species correctly. Consensus has therefore emerged on “the probability of correct identification” (PCI) as the appropriate measurement of barcode efficacy (14, 15, 17). The ambiguities in the definition of PCI accommodate legitimate scientific disagreement about success in species identification, so the concept of PCI actually embraces a broad class of measures. Consider a particular data set, and assume that PCI can be defined for each species within the data set. The overall PCI for the data set is the average of the species PCIs, taken over all species in the data set. If a few data subsets are particularly important (e.g., angiosperm, basal, and gymnosperm subsets within a plant data set), the PCI for the subsets can be reported separately. In principle, the PCI for each species could be weighted to reflect the species’ importance or the number of specimens representing it in the data set. In practice, however, scientists have not weighted averages when calculating overall PCI. Thus, to calculate the overall PCI of a data set, we now require only a species PCI, a probability to quantify success in identifying each fixed species. To calculate a species PCI, one can perform a leave-one-out procedure, sometimes called “the jackknife” in statistics (34). Remove each specimen in a species in turn from the database, and consider the separation of the removed specimen from the specimens of the same species remaining in the database. (The leave-one-out procedure cannot sensibly be applied if a species has only a single specimen in the database. 
Because a singleton species must therefore be omitted from the average in the overall PCI, it usually represents wasted experimental effort. It does, however, provide a “decoy,” which provides a realistic impediment to correct species assignment.) Scientists legitimately disagree over the definition of “success” in species identification. Some scientists might consider “success” theoretically, as a monophyly, where every specimen in the species is closer to all specimens in the species than to any other specimen (14). On success, the species PCI is 1; on failure, it is 0. Other scientists might consider success more pragmatically, as a correct assignment of the species, where each specimen in the species
has as its nearest neighbor(s) only specimens in the species (15). Again, if so, the species PCI is 1; if not, it is 0. The following additional conditions can contribute to success or failure, as desired: ties outside the species for a nearest neighbor, assignment of specimens from other species to the species in question, etc. Some authors have advanced less stringent criteria for success (e.g., for k > 1, the specimen’s nearest k neighbors must contain at least one other specimen from the same species) (33). The species PCI has also been calculated as the fraction of specimens within a species whose nearest neighbor gives the correct assignment (17). Any specific choice might be appropriate in different circumstances, depending on the scientific aim.

Some authors experimented with placing additional conditions on “success” as defined above, e.g., sequence difference (p-distance) thresholds, such as 2% or 3% (15). Detection of unknown species with sequence identity thresholds seems artificial, however (35). The notion of “species” could be redefined by DNA thresholds (1, 2, 36, 37), but such redefinitions generate many conflicts with traditional taxonomy (15).

2.4. PCR Failure
PCI should estimate the success in correctly identifying a known species. Under present technology, species identification with a DNA barcode requires the following criteria:
1. At least part of the barcode sequence must be present in the specimen.
2. Laboratory procedures must physically extract it from the specimen.
3. PCR primers must amplify it.
4. It must be sequenced.
5. It must diverge sufficiently, to distinguish species.
6. It must not diverge excessively, so specimens from a single species remain similar and identifiable.

Thus, PCI must account for PCR failure, if it is to estimate identification success under present technology. Recall that the overall PCI is the average of the PCI for each individual species. The Appendix discusses PCR failure for a barcode based on several markers. For simplicity, this subsection considers only a barcode based on a single marker. We revise the species PCI to account for PCR failure, as follows. According to the procedures in the preceding subsection (which ignore PCR failure), let the species have PCI $p$; and let $s$ be the fraction of specimens from the species with a successful PCR. (Note that $s$ is estimated from all specimens, whereas $p$ is estimated solely from specimens with a successful PCR.) A reasonable procedure might average the “PCR-adjusted species PCI” $p' = ps$ over all species to produce a
“PCR-adjusted overall PCI.” The PCR-adjusted overall PCI faithfully reflects the efficacy of species identification with present technology, whereas the overall PCI (which ignores specimens where PCR failed) reflects the efficacy of species identification with a perfect PCR technology. Technology reduces PCR failure rates, so arguments have been advanced that PCR failure should be ignored (14). The PCI after any technological advance, however, is bounded below by the PCR-adjusted overall PCI (which reflects present PCR technology); similarly, it is bounded above by the overall PCI (which ignores specimens with failed PCR). The bounds demonstrate that technological advance by itself does not preclude a sober assessment of future prospects. Like any numerical result from a definite procedure with a sensible meaning, the PCR-adjusted overall PCI is useful, and its deliberate omission merely undermines rational discussion about the relative merits of potential barcodes.

2.5. Statistical Sampling Error
The overall PCI is the (unweighted) average of the species PCIs. Let us make a reasonable approximation that species PCIs are mutually independent across all species. Any database is a sample of all possible species, so the overall PCI from the database is an estimate of the “true” overall PCI $p$. As such, it has a sampling error, calculable with the binomial distribution. Let $n$ be the number of species contributing to the overall PCI. Under mild assumptions (given below), a binomial estimate $\hat{p}$ is approximately normally distributed with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$. Thus, the confidence interval

$$
\left[\, \hat{p} - z\sqrt{\hat{p}(1-\hat{p})/n},\;\; \hat{p} + z\sqrt{\hat{p}(1-\hat{p})/n} \,\right]
$$

contains the true overall PCI $p$ with a confidence determined by $z$ in conjunction with the normal distribution. The larger $z$ is, the broader the interval becomes, and the greater the probability that the interval contains the true value of $p$. As approximate examples, $z = 2$ yields a 95% confidence interval; $z = 2.6$, 99%, etc. (As a useful rule of thumb, the normal approximation holds if $n \ge 20$ and the confidence interval does not include 0.0 or 1.0.) Confidence intervals are worth calculating, because they are often surprisingly broad.

As an aside, the confidence intervals for the overall PCI are crucial to evaluating the relative merits of tentative barcodes, but they have little direct bearing on one’s confidence in the species assignment of a specific specimen, for the following reason. Most taxonomists probably prefer a barcode for which assignment errors are confined to a few species, rather than to have the same errors spread across many species. (If nothing else, alternative strategies might be available for assigning a small number of problematic species.) Overall PCI faithfully reflects taxonomists’ barcode preferences, but the evaluation of a specific species assignment poses a different problem, requiring a different solution.
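The confidence interval for the overall PCI described in this subsection can be computed in a few lines. The sketch below is illustrative only: the function name `pci_confidence_interval` is hypothetical, and the species count n = 100 in the usage example is an invented value chosen to show how broad the interval can be, not a figure from any study.

```python
import math


def pci_confidence_interval(p_hat, n, z=2.0):
    """Normal-approximation confidence interval for the overall PCI.

    p_hat: estimated overall PCI (average of the species PCIs)
    n:     number of species contributing to the overall PCI
    z:     normal quantile (about 2 for 95%, about 2.6 for 99%)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def normal_approximation_ok(p_hat, n, z=2.0):
    """Rule of thumb from the text: n >= 20 and the interval excludes 0.0 and 1.0."""
    low, high = pci_confidence_interval(p_hat, n, z)
    return n >= 20 and low > 0.0 and high < 1.0


if __name__ == "__main__":
    # Hypothetical example: overall PCI of 0.69 estimated from 100 species.
    low, high = pci_confidence_interval(0.69, 100)
    print(f"approx. 95% interval: [{low:.3f}, {high:.3f}]")
    print("normal approximation plausible:", normal_approximation_ok(0.69, 100))
```

With these hypothetical inputs the interval spans roughly 0.60 to 0.78, which illustrates why apparent differences between competing barcodes should be checked against their sampling error.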
3. The Summary of the Workflow
Selection of a DNA barcode has been problematic for some species, but there is now a general consensus on the measurement of barcode efficacy. The procedure for measuring barcode efficacy can be broken into several steps.

First, assemble databases corresponding to the prospective barcodes. The choice of database must be given careful consideration because it can noticeably influence a study’s conclusions. To ensure proper controls, specimens not having a sequence in every marker database should be eliminated from consideration. Because GenBank taxonomy might be undependable, and because most GenBank sequences do not specify a corresponding taxonomic specimen, studies based on GenBank data do not have the same status as a controlled taxonomic study, particularly for barcodes based on more than one marker.

Second, select a computer algorithm for assigning species to barcode sequences. No algorithm seems to improve noticeably on assigning to a specimen the species of its nearest neighbor within a barcode database. A global alignment (e.g., with the Needleman–Wunsch algorithm, or some similar algorithm) is recommended, to take advantage of all the information in a barcode sequence. By contrast, BLAST is a local alignment program, which might match only small subsequences within two sequences. Thus, the use of BLAST runs an unnecessary risk when evaluating any prospective barcode, particularly one with an intergenic marker. As long as alignments are in essence global, alignment similarities, alignment distances, and evolutionary distances like p-distance, Kimura 2-Parameter Distance, etc., seem to have approximately equal efficacies in identifying species.

Consensus has emerged on “the probability of correct identification” (PCI) as the appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set.
If a few data subsets are particularly important (e.g., angiosperm, basal, and gymnosperm subsets within a plant data set), the PCI for the subsets can be reported separately. To calculate a species PCI, remove in turn each specimen in the species from the database, and consider its separation from the remaining specimens (under, e.g., p-distance). Various definitions of identification success within a species are possible: (1) every specimen in the species is closer to all other specimens in the species than to any other specimen; (2) each specimen in the species has another specimen in the species as its nearest neighbor; (3) more stringent versions of the two foregoing definitions, where ties outside the species for a nearest neighbor, or assignment of other species to the species in question, also connote failure; (4) less stringent criteria for success (e.g., for k > 1 , the specimen’s nearest
k neighbors must contain at least one other specimen from the same species); or (5) probabilistic measures of success, like the fraction of specimens within a species displaying one of the foregoing definitions of success. Scientific purpose makes different definitions of “successful assignment” appropriate to different circumstances.

To estimate success under present technology, PCI must account for PCR failure. Although the case of a barcode with several markers has been relegated to the Appendix, the case of a barcode with only one marker poses no difficulties. Simply estimate the rate of PCR failure within each species by using all specimens, not just the ones with completely successful PCRs. Multiplication of a species PCI by the PCR success rate within the species yields a “PCR-adjusted” species PCI, which can then be averaged over species to yield a PCR-adjusted overall PCI. The overall PCI after technological advance is bounded below by the PCR-adjusted overall PCI; similarly, it is bounded above by the overall PCI (which derives from PCR successes only). Thus, present technology bounds prospects for an overall PCI.

A database provides a statistical sample of all possible data. The overall PCI calculated from a database is therefore a statistical estimate of the true overall PCI, and as such, it yields an estimate with a statistical error. The errors are sometimes surprisingly large, and the differences in barcode efficacy correspondingly small.

For software relevant to this chapter, see http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html.ncbi/barcode/.
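The leave-one-out calculation of species PCI and the single-marker PCR adjustment summarized above can be sketched as follows. This is an illustrative sketch, not the software at the Web site cited in this chapter. It assumes that sequences have already been globally aligned to equal length, uses p-distance as the separation, and adopts one of the stricter definitions of success (every specimen's nearest neighbors, ties included, belong only to its own species). All function names and the toy data are hypothetical.

```python
from collections import defaultdict


def aligned_p_distance(a, b):
    """p-distance for two pre-aligned sequences of equal length (assumed gap-free)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def species_pcis(specimens, separation=aligned_p_distance):
    """Leave-one-out species PCIs.

    specimens: list of (species_name, aligned_sequence) pairs.
    Returns {species: 1 or 0}. Singleton species are omitted from the
    result but still act as decoys for the other species.
    """
    members = defaultdict(list)
    for index, (species, _) in enumerate(specimens):
        members[species].append(index)
    pcis = {}
    for species, indices in members.items():
        if len(indices) < 2:
            continue  # leave-one-out is not meaningful for a singleton
        success = True
        for q in indices:
            # Remove specimen q and measure its separation from all others.
            seps = [(separation(specimens[q][1], seq), other)
                    for k, (other, seq) in enumerate(specimens) if k != q]
            best = min(d for d, _ in seps)
            nearest_species = {other for d, other in seps if d == best}
            if nearest_species != {species}:
                success = False  # a nearest neighbor lies outside the species
                break
        pcis[species] = 1 if success else 0
    return pcis


def overall_pci(pcis, pcr_success=None):
    """Unweighted average of species PCIs; PCR-adjusted (p * s) if rates are given."""
    if not pcis:
        raise ValueError("no species with at least two specimens")
    if pcr_success is None:
        return sum(pcis.values()) / len(pcis)
    return sum(p * pcr_success[sp] for sp, p in pcis.items()) / len(pcis)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    data = [("sp1", "AAAA"), ("sp1", "AAAT"),
            ("sp2", "CCCC"), ("sp2", "CCCG"),
            ("sp3", "AAAT"), ("sp3", "GGGG"),
            ("sp4", "TTTT")]  # singleton decoy
    pcis = species_pcis(data)
    rates = {"sp1": 1.0, "sp2": 0.8, "sp3": 0.5}  # hypothetical PCR success rates
    print(pcis, overall_pci(pcis), overall_pci(pcis, rates))
```

The PCR-adjusted value can never exceed the unadjusted one, which mirrors the lower and upper bounds on the overall PCI described in the text. Other definitions of success (for example, the fraction of specimens correctly assigned) would need only a change to the inner loop.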
Acknowledgment
This research was supported in part by the Intramural Research Program of the NIH, NLM, NCBI.

Appendix
For a barcode with several markers, each of which can have a failed PCR, specimen identification ultimately relies on the markers with a successful PCR. To quantify the identification process, number the markers $\{1, 2, \ldots, m\}$, and consider any subset $M$ of $\{1, 2, \ldots, m\}$. For a particular specimen, let the probability that $M$ is the subset of markers with PCR success be denoted by $s_M$, and let the PCI for the barcode based on the marker subset $M$ be $p_M$. A species PCI $p$ can then be calculated from the values of $s_M$ and $p_M$ (although the calculation depends on the definition of species PCI: see Section 2.3 for various definitions).
One very reasonable definition of the PCR-adjusted species PCI is the average $p = \sum_{M} p_M s_M$. For the case of a barcode based on a single marker, e.g., $M$ is a subset of $\{1\}$, i.e., the empty set $\{\,\}$ or $\{1\}$. Because the empty set $\{\,\}$ corresponds to a complete absence of information about a specimen, the corresponding PCI is $p_{\{\,\}} = 0$, so $p = p_{\{\,\}} s_{\{\,\}} + p_{\{1\}} s_{\{1\}} = p_{\{1\}} s_{\{1\}}$, which agrees with the formula for the PCR-adjusted PCI in the main text, for a barcode based on a single marker.

References
1. Hebert PD, Cywinska A, Ball SL, Dewaard JR (2003) Biological identifications through DNA barcodes. Proc Biol Sci 270:313–321
2. Floyd R, Abebe E, Papert A, Blaxter M (2002) Molecular barcodes for soil nematode identification. Mol Ecol 11:839–850
3. Hebert PD, Ratnasingham S, Dewaard JR (2003) Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc Biol Sci 270:S96–S99
4. Hajibabaei M, Janzen DH, Burns JM et al (2006) DNA barcodes distinguish species of tropical Lepidoptera. Proc Natl Acad Sci U S A 103:968–971
5. Hogg ID, Hebert PDN (2004) Biological identification of springtails (Hexapoda: Collembola) from the Canadian Arctic, using mitochondrial DNA barcodes. Can J Zool 82:749–754
6. Lorenz JG, Jackson WE, Beck JC, Hanner R (2005) The problems and promise of DNA barcodes for species diagnosis of primate biomaterials. Philos Trans R Soc Lond B Biol Sci 360:1869–1877
7. Meyer CP, Paulay G (2005) DNA barcoding: error rates based on comprehensive sampling. PLoS Biol 3:e422
8. Saunders GW (2005) Applying DNA barcoding to red macroalgae: a preliminary appraisal holds promise for future applications. Philos Trans R Soc Lond B Biol Sci 360:1879–1888
9. Smith MA, Fisher BL, Hebert PDN (2005) DNA barcoding for effective biodiversity assessment of a hyperdiverse arthropod group: the ants of Madagascar. Philos Trans R Soc Lond B Biol Sci 360:1825–1834
10. Smith MA, Woodley NE, Janzen DH et al (2006) DNA barcodes reveal cryptic host-specificity within the presumed polyphagous members of a genus of parasitoid flies (Diptera: Tachinidae).
Proc Natl Acad Sci U S A 103:3657–3662
11. Chase MW, Salamin N, Wilkinson M et al (2005) Land plants and DNA barcodes: short-term and long-term goals. Philos Trans R Soc Lond B Biol Sci 360:1889–1895
12. Cowan RS, Chase MW, Kress WJ, Savolainen V (2006) 300,000 species to identify: problems, progress, and prospects in DNA barcoding of land plants. Taxon 55:611–616
13. Kress WJ, Erickson DL (2008) DNA barcodes: genes, genomics, and bioinformatics. Proc Natl Acad Sci U S A 105:2761–2762
14. CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci U S A 106:12794–12797
15. Meier R, Shiyang K, Vaidya G, Ng PK (2006) DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Syst Biol 55:715–728
16. Huang D, Meier R, Todd PA, Chou LM (2008) Slow mitochondrial COI sequence evolution at the base of the metazoan tree and its implications for DNA barcoding. J Mol Evol 66:167–174
17. Erickson DL, Spouge JL, Resch A et al (2008) DNA barcoding in land plants: developing standards to quantify and maximize success. Taxon 13:1304–1316
18. Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One 2:e508
19. Austerlitz F (2007) Comparing phylogenetic and statistical classification methods for DNA barcoding. Paper presented at the second international barcode of life conference, Taipei, Taiwan, 2007
20. Little DP, Stevenson DW (2007) A comparison of algorithms for the identification of specimens using DNA barcodes: examples from gymnosperms. Cladistics 23:1–27
21. Altschul SF, Madden TL, Schaffer AA et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
22. Altschul S (1999) Hot papers – bioinformatics – Gapped BLAST and PSI-BLAST: a new generation of protein database search programs by S.F. Altschul, T.L. Madden, A.A. Schaffer, J.H. Zhang, Z. Zhang, W. Miller, D.J. Lipman – comments. Scientist 13:15
23. Wouters MA, Husain A (2001) Changes in zinc ligation promote remodeling of the active site in the zinc hydrolase superfamily. J Mol Biol 314:1191–1207
24. Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
25. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
26. Eddy SR (1995) Multiple alignment using hidden Markov models. Proc Int Conf Intell Syst Mol Biol 3:114–120
27. Edgar RC (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113
28. Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066
29. Matz MV, Nielsen R (2005) A likelihood ratio test for species membership based on DNA sequence data. Philos Trans R Soc Lond B Biol Sci 360:1969–1974
30. Nielsen R, Matz M (2006) Statistical approaches for DNA barcoding. Syst Biol 55:162–169
31. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
32. Felsenstein J (1988) Phylogenies from molecular sequences – inference and reliability. Annu Rev Genet 22:521–565
33. Kuksa P, Pavlovic V (2009) Efficient alignment-free DNA barcode analytics. BMC Bioinformatics 10:S9
34. Efron B, Stein C (1981) The jackknife estimate of variance. Ann Stat 9:586–596
35. Ferguson JWH (2002) On the use of genetic divergence for identifying species. Biol J Linnean Soc 75:509–516
36. Blaxter M, Mann J, Chapman T et al (2005) Defining operational taxonomic units using DNA barcode data. Philos Trans R Soc Lond B Biol Sci 360:1935–1943
37. Lambert DM, Baker A, Huynen L et al (2005) Is a large-scale DNA-based inventory of ancient life possible? J Hered 96(3):279–284
38. Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism. Academic, New York, pp 21–123
39. Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120
40. Jin L, Nei M (1990) Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol Biol Evol 7:82–102
41. Tamura K (1994) Model selection in the estimation of the number of nucleotide substitutions. Mol Biol Evol 11:154–157
42. Waterman MS, Smith TF, Beyer WA (1976) Some biological sequence metrics. Adv Math 20:367–387
Chapter 18
Plant DNA Barcodes, Taxonomic Management, and Species Discovery in Tropical Forests
Christopher W. Dick and Campbell O. Webb

Abstract
DNA barcodes have great potential for species identification and taxonomic discovery in tropical forests. This use of DNA barcodes requires a reference DNA library of known taxa with which to match DNA from unidentified specimens. At an even more basic level, it presupposes that the species in the regional species pool have Latin binomials. This is not the case in species-rich tropical forests in which many species are new to science or members of poorly circumscribed species complexes. This chapter describes a workflow geared toward taxonomic discovery, which includes the discovery of new species, distribution records, and hybrid forms, and to management of taxonomic entities in forest inventory plots. It outlines the roles of laboratory technicians, field workers and herbarium-based taxonomists, and concludes with a discussion of potential multilocus nuclear DNA approaches for identifying species in recently evolved clades.

Key words: Tropical trees, Metadata, Vouchers, Taxonomy, Herbarium, DNA barcode, Discovery
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_18, © Springer Science+Business Media, LLC 2012

1. Introduction
Tropical forests contain over 90% of the world’s tree diversity (1). Single hectares of highly diverse Asian or South American forests contain more tree species than are found in the whole of eastern North America, or in the vast circumboreal forests. Yet our knowledge of these tropical plant species is poor, and the recorded species are only a fraction of the true species pool in most areas. For example, the Catalog of the Vascular Plants of Ecuador (2) lists only ca. 4,000 vascular plant species for the Ecuadorian Amazon (a low intensity sampling of seven million hectares), whereas exhaustive sampling of a single hectare of forest in the Ecuadorian Amazon yielded over 900 vascular plant species (3). Based on simple area versus species richness relationships, it is thus likely that the real tree richness of the region is many times 4,000 species (4), and that many of the additional species will be new to science; Prance (5) has estimated that 1 in 100 plant specimens collected from remote tropical forests, such as those in the Amazon basin and Papua New Guinea, comes from an as yet undescribed species.

Both the high species diversity and the high probability of encountering undescribed or at least poorly delineated taxa create challenges for biologists working on the inventory of tropical forest plants. One of the major challenges of setting up an inventory plot in tropical forest is simply keeping track of the morphotype identity of each tree, which must then be followed by the lengthy process of matching the morphotypes to known species. DNA barcodes can assist at both stages, accelerating the matching of morphotypes across a plot (via matching DNA haplotypes), and then matching taxonomic entities in the plot to named taxa.

Many ecologists involved with tropical forest inventory are excited by the prospect of using DNA barcodes to identify species (including new species), but they may not be familiar with (1) collection methods for molecular samples, (2) standard methods for making herbarium vouchers, or (3) the kinds of metadata that are needed to create DNA barcode reference libraries and describe new species. For a relatively low additional cost, an inventory program not primarily focused on plant taxonomy (e.g., inventory of carbon or forest dynamics studies; ref. 6) can make high-quality DNA and physical vouchers which can be used to address urgent biodiversity questions.

1.1. DNA Barcoding Links to Systematics and Ecology
While traditional taxonomic work (e.g., collecting, matching, describing, publishing) will eventually increase global estimates of tropical tree diversity and yield new taxa, the specimens from botanical inventories often spend years in storage before they can be identified or described as new species (7). The “taxonomic bottleneck” is more pronounced now than ever due to the global decline in numbers of taxonomic specialists working on tropical groups. New species descriptions are also limited by inadequate representation of taxa in herbarium collections: herbarium-based specialists cannot describe the many new species that are not extremely distinctive because data pertaining to geographic range and morphological variation is frequently unavailable. DNA barcodes can assist in this process of “taxonomic discovery,” which can take the form of expanding species range information, delimiting species in closely related taxa, standardizing the nomenclature of species with multiple names (synonymy), or even recognizing species that are new to science. The effectiveness of DNA barcodes for taxonomic discovery depends on the preexistence of DNA records for many of the taxa under study, the so-called DNA reference library, but these libraries for tropical plant taxa are currently being built rapidly, primarily by ecologists working on forest inventories.
Tropical forest inventory plots provide several advantages for taxonomic discovery. Because the trees are tagged, the forest plots serve as living museums in which individual trees or their conspecific populations may be revisited to obtain additional data. Tropical forest plots delimit enough of the local flora to facilitate development of DNA barcode reference libraries for use in broader regional studies. Finally, the plot networks already have many of the human resources and institutional ties to universities and herbaria that are needed to sustain long-term research. An end goal of DNA-based forest inventory should be to standardize taxonomy across regional networks of forest plots, and thereby advance botanical knowledge of these relatively unexplored regions (6). For example, to date, very few of the sets of vouchers from any Center for Tropical Forest Science (CTFS) Forest Dynamics Plot have been compared to a set from another plot because of the cost and logistical difficulty of such cross-plot matching; DNA barcode matching among plots, on the other hand, could be done nearly instantaneously if the data were available.

The transfer of physical specimens and data between field workers, lab technicians, and systematists can be organized as a “workflow.” Field workers collect the specimens and metadata, passing the plant tissue to a molecular laboratory and the pressed specimens to a herbarium (Fig. 1). The DNA barcode data then can assist both the field ecologists with basic management of taxonomic entities (morphotyping and matching vouchers), and the herbarium workers with matching to named species, and with range extensions, new species discovery, etc. There should be established lines of communication between laboratory technicians, systematists, and the field workers (or their local supervisor). For example, the lab workers may need additional plant tissue (e.g., from the vascular cambium instead of leaves) if PCR repeatedly fails. The plot workers should receive training from systematists prior to the collections, if possible, because taxonomic specialists often have tips for collecting taxonomically diagnostic field information for their particular groups (e.g., ref. 8).

The components of the total workflow that we focus on in this chapter are those geared toward the field ecologists, especially graduate students and postdocs, who work in forest inventory plots, such as the large (25–50 ha) tree inventory plots managed by CTFS (see Chapter 22) or the RAINFOR network of smaller (e.g., 1 ha) forest inventory plots scattered across the Amazon basin (9). We touch briefly on field methods, but make reference to additional sources. We do not emphasize molecular methods, which are described in detail by Fazekas et al. (Chapter 11) in this volume. We end with a discussion of multilocus nuclear DNA markers, which, in addition to the standard chloroplast DNA barcodes, will be necessary to rigorously test taxonomic hypotheses in forest inventory plots.
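The hand-off described in the workflow above (one field collection yielding tissue for the laboratory, a pressed voucher for the herbarium, and photographs) can be tracked with a simple record keyed by the collection number. The following is only a minimal sketch: the class name, field names, and identifier formats are hypothetical and are not part of any CTFS or herbarium data standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectionEvent:
    """One field collection from a tagged tree (all field names are illustrative)."""
    tree_tag: str                              # plot tag number of the tree
    collector: str                             # collector name
    collection_number: str                     # shared key written on every derived item
    tissue_sample_id: Optional[str] = None     # tissue passed to the molecular laboratory
    voucher_id: Optional[str] = None           # pressed specimen passed to the herbarium
    photo_files: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """List the workflow outputs that have not yet been recorded for this collection."""
        missing = []
        if self.tissue_sample_id is None:
            missing.append("tissue")
        if self.voucher_id is None:
            missing.append("voucher")
        if not self.photo_files:
            missing.append("photos")
        return missing


# Hypothetical example: tissue has been logged, voucher and photographs have not.
event = CollectionEvent(
    tree_tag="5000",
    collector="A. Collector",
    collection_number="AC-0001",
    tissue_sample_id="T-0001",
)
print(event.missing_items())
```

Keeping a single collection number on the tissue, the voucher, and the photographs is the design choice that lets laboratory and herbarium staff trace any item back to the tagged tree.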
Fig. 1. Summary of the workflow that uses DNA barcodes for the purpose of taxonomic discovery in tropical forest inventory plots.
2. Field Materials
The materials for a DNA barcode project will depend on the magnitude of the project, for example, whether the goal is complete taxonomic inventory of a large forest plot, or collection and identification of focal groups by graduate students. The following materials would be especially useful for a graduate student getting involved in a taxonomic inventory project.

1. Collecting equipment: CWD uses a Jameson 2.43 m (8 ft) fiberglass pole clipper (Sherrill Inc.) with heavy-duty head for cutting branches up to 4.5 cm and four additional poles for a reach of 9.75 m (32 ft). Additional rope (approximately 40 ft) is needed with extensions. Additionally, local botanists need hand clippers and machetes with leather sheaths, wrist-mounted slingshots and replacement rubber tubing, and climbing equipment to access the forest canopy. The kinds of climbing equipment vary from canvas belts used to shimmy up tree trunks to more elaborate single rope climbing techniques. Rope-climbing techniques are beyond the scope of this chapter. Safety issues are a prime concern, and should be considered and implemented, even if field workers are willing to take risks, and an insurance policy should be explicitly established for village assistants before they climb. Ref. 10 outlines safety considerations. A rubber mallet and >1 in. gasket hole punch may be needed to obtain vascular cambium tissue for DNA extraction (11).

2. Specimen drying equipment: A portable plant dryer can be made using a space heater and canvas cloth (12), and propane-fueled dryers with plywood frames are available in many large research stations. Plant presses for a drying oven should include the wooden mounting boards, tightening straps, blotting paper or cardboard, newspapers, and corrugated aluminum sheets to spread heat into the plant bundles (available at Forestry Suppliers Inc.). Enough newspapers should be available for layering between individual specimens.

3. Camera equipment: An optimal setup for making photographic vouchers would be a 35-mm digital camera with zoom lens and macro capacity, a ring flash for close-up shots, a tripod and black or gray cloth to use as standardized background. However, excellent results can be obtained with a high-end compact camera with built-in flash and macro (we recommend the Panasonic Lumix LX or GF series, or Canon G series). If in the field in the wet tropics for more than a few weeks, the camera and lens should be stored in an airtight container containing a desiccant, such as silica gel, as the humidity is conducive to the growth of fungi that destroy lens coatings.

4. Miscellaneous items: A large amount of relatively inexpensive equipment is needed to make field collections. For a more extensive list of items, refer to “Field Techniques used by Missouri Botanical Garden” (13). The items should include field notebooks that are small enough to contain relatively little information if lost (e.g., Rite in the Rain brand); garbage bags and Ziploc bags of varied sizes; alcohol and specimen jars to preserve small flowers; a hand-held Global Positioning System (GPS) unit capable of fast reception under the forest canopy (e.g., Garmin CSx series); and fine silica gel with indicator beads to preserve leaf tissue for DNA extraction. Although fine silica gel is available from scientific supply companies, this high-grade variety is expensive and can be substituted with silica gel from florists.
3. Methods
3.1. Field Collections: What, When, and How
1. Replicate sampling: The collections for each morphospecies should include multiple individuals (n = 3–5) representing the full local habitat range (e.g., wetland and upland) and any
morphological variation that has been noted by workers in the plot, such as variants in bark color or texture or leaf shape or color. These variants may turn out to be cryptic species (refs. 6, 14; Figs. 2 and 3). Because forest inventory plots may contain many species represented by single or just a few large trees, additional collections may be needed from outside of the plot in habitats in which the rare plot species may be more abundant.
2. Phenology: Because many tree populations do not reproduce annually, one would ideally make collections over the course of the year and for more than one year in order to obtain representative fruit and floral collections. The collections should be concentrated during seasons in which fruits or flowers are locally most likely.
3. Field observations: Some information about the tagged tree may be obtained from prior inventory data (e.g., DBH, preliminary species assignment, coordinates within plot, tag number, and habitat). The following additional information should be noted for inclusion in the herbarium label: GPS coordinates; date of collection; collector name and collection number; presence of trunk buttresses; and bark texture (13). If time permits, noting 5–10 leaf and bark characters for each species can be used to develop basic identification keys for a local flora (15), and can serve to organize photographic resources.
4. Photographic metadata: The field collection presents an opportunity to obtain photographs of fresh flowers and fruits, which contribute valuable information for future species identifications with or without the use of DNA barcodes. In the Gunung Palung flora project in Borneo (16), workers take 10–20 images of each fresh plant, including bark slash, whole twig, twig tip, twig surface, stipules, whole leaf (above and below), close-up of leaf base underside and petiole, inflorescence, flower (or fruit) at different angles, and longitudinal and transverse sections.
Slashing trunks to expose the inner bark is not recommended in forest dynamics plots as it may influence mortality. Each photograph should include a ruler for size scale, and a paper tag with the collector name and collection number (or plot tag number) to avoid confusion about the association of photos and specimens.
5. DNA sampling: From the fresh material collected for the herbarium voucher, select a single young leaf that is neither too tender (the DNA will degrade rapidly in a wilting young leaf) nor excessively damaged by herbivores or covered with epiphylls. Clean the leaf with a dry cloth and cut a 2 × 2 cm square using scissors. Very little plant tissue (e.g., 20 mg dry tissue) is needed for DNA extraction. A common mistake is to collect too much leaf tissue for DNA sampling using the silica-gel approach. If too much tissue (e.g., an entire leaf) is collected,
the leaf will dry slowly or incompletely, resulting in DNA degradation. Place the tissue sample immediately into a sealed Ziploc sandwich bag or 50-mL Falcon centrifuge tube containing 20 mL of dried silica gel and colored indicator beads. Alternatively, place the sample in a permeable fiber bag (e.g., tea-bag) in a larger box filled with fine silica gel; this prevents fragments of brittle or tender leaves from contaminating the silica gel. Having an excess of silica gel is important for maximum rate of drying. Wipe off scissors with alcohol after each use. Check leaf tissue after one day. If it is brittle and breaks when bent, the silica may be removed and reused if still dry (see color of indicator beads) or baked and reused. Care must be taken not to contaminate samples when reusing silica gel. The dried leaf may be stored indefinitely in a labeled coin envelope inside of an airtight plastic container kept dry with silica gel. There are as yet no standardized protocols for long-term storage of plant tissue for subsequent DNA work. Anecdotal evidence suggests that freezing best preserves DNA in silica-dried leaves. One alternative to silica gel is to flash freeze the leaves in liquid nitrogen (N). Liquid N is available in many developing countries because it is used to preserve semen for animal breeding. Flash freezing yields more genomic DNA than silica drying and can maintain RNA for transcriptome sequencing. The disadvantages are the difficulty of handling liquid nitrogen tanks in the field and the expense of long-term storage of frozen material. Liquid N is typically not permitted on flights, so the samples will need to be transported in dry ice, or in a dry shipper. Other alternatives to using silica gel include placing samples in CTAB buffer solution (11), or using Whatman FTA cards (http://www.whatman.com/). An alternative to using leaf material is obtaining DNA from the vascular cambium (17). Because there may be less need for a plant to invest in defensive secondary compounds in vascular cambium than in leaves, DNA extractions from vascular tissue may be more successful for PCR in some taxa. Pound the gasket-hole puncher into the trunk to wood level. Carefully separate the cambium tissue from the inner bark and place in silica gel. Wipe the gasket punch opening with alcohol or bleach to prevent contamination of the next sample.
6. Specimen preservation: The specimen vouchers should be dried on the day of collection when possible, in an arrangement that best demonstrates all of the salient taxonomic characters (e.g., leaf tips, base and underside; stipules, fruits, etc.) (18). Some difficult groups, such as palms, require more specialized arrangement techniques (13). The plant press must be kept tight to prevent wrinkling of material, and retightened through
the course of drying as the material shrinks. The dryer should provide even airflow and temperatures of 35–45°C (12). Rapid drying retains color of specimens but overly high temperatures can produce darkened and brittle specimens. When collecting in remote areas outside of the field stations, one can layer the fresh collections in newspaper and soak with 90% ethanol (or even methanol, used for lighting lamps, in a pinch). This method will keep the plant parts together until they can be dried, but it produces darkly colored specimens and degrades DNA. For alcohol-preserved specimens, fresh leaves should be separately dried with silica gel for use in DNA extraction.
7. Taxonomic sorting: For very large forest inventory plots (e.g., ≥25 ha), sorting all of the designated morphospecies into higher taxonomic ranks can take years, especially if the initial inventory utilized sterile vouchers (19). Key steps in the determination of trees are: (1) collecting “daily vouchers” (either fallen leaves or sterile twigs) for all morphotypes encountered each day, while doing “within-day” matching for trees examined (i.e., “tree 1234 = tree 1245”); (2) matching the daily vouchers to a growing field herbarium collection, assigning field morphotype codes, splitting types where uncertain, and “synonymizing” identical morphotypes with different morphotype codes; (3) determining which taxa can be identified reliably by field crews without further voucher collections (there are always a few common, well-known taxa that “anyone” can spot). This process is time-consuming and tends to slow down because an increasing number of morphotypes have to be checked. If the period of sampling is long enough that DNA work can be carried out at the same time, and if enough stems can be sampled for DNA, then sequence data can be used to speed up the matching process (Fig. 2).
If the sequence of a new tree can be queried against GenBank (or another available DNA reference library), or placed in a dynamic, community “guide phylogeny” (automatically rebuilt; see Chapter 19), to find a closely related taxon, then the number of vouchers in the field herbarium to which the new tree’s voucher must be compared can be reduced. If there is an exact match of the new tree’s sequence to the sequence of a previously collected tree, then the first voucher to compare with the new tree’s voucher is that of the previously collected tree. Because the discriminating power of DNA barcodes in some groups is low (17), we unfortunately cannot expect a direct match of DNA sequence to indicate an exact match of all morphotypes (Fig. 2).

3.2. DNA Barcode Reference Library
The difference between an entry in a DNA barcode reference library and a standard DNA sequence database entry (e.g., a standard GenBank entry) is that, whereas the standard database publishes sequence information at face value, a DNA barcode entry bundles together two hypotheses that must be supported with metadata: (1) that the DNA sequence is accurate, and (2) that the species identification is accurate. The DNA sequence in a DNA barcode reference library must be accompanied by the raw data (chromatogram) so that other researchers can verify that differences in nucleotide sequence between species are robust and not merely sequencing artifacts. The metadata needed to address the taxonomic hypothesis are the herbarium voucher data and accompanying collection information, the most important of which are geographic location, photographs, and collection date. Several data platforms accommodate DNA barcode sequences and metadata. These include the Barcode of Life Database (BOLD) (20) and the DNA barcode entry option of GenBank called “BarSTool”. Taxonomic metadata should also be registered in the Global Biodiversity Information Facility (GBIF; http://www.gbif.org), which serves as a repository for biodiversity information, including species ranges.

Fig. 2 panels (query barcode, followed by the matching barcode and its source):
(i) Intra-plot (plot A)
Tree 5000 GTGTACGT: GTGTACGT Tree 4000 (plot morphotype 070)
Tree 6000 ACGTACGT: ACGTACGT Tree 1000 (plot morphotype 005), Tree 2000 (plot morphotype 005), Tree 3000 (plot morphotype 011)
Tree 7000 CCTTCCTT: no match (becomes morphotype 100)
(ii) Regional/among plots
Plot A morphotype 070 GTGTACGT: GTGTACGT Plot B morphotype 060
Plot A morphotypes 005 and 011 ACGTACGT: ACGTACGT Plot B morphotype 010, Plot C morphotype 003
Plot A morphotype 100 CCTTCCTT: no match
(iii) Herbarium/global database
Plot-wide morphotype 270 GTGTACGT: GTGTACGT GenBank/BoLD Shorea parvifolia
Plot-wide morphotypes 205, 311, and 403 ACGTACGT: ACGTACGT GenBank/BoLD Santiria tomentosa; ACGTACGT GenBank/BoLD Santiria indica; ACGTACGA* GenBank/BoLD Santiria sumatrana
Plot-wide morphotype 500 CCTTCCTT: no close match in GenBank/BoLD

Fig. 2. Hypothetical examples of the use of DNA barcodes for taxonomic management and discovery.
(i) Intra-plot matching. DNA from tree 5000 matches only DNA from tree 4000: it is likely that tree 5000 and tree 4000 are the same morphotype and the same species, but a physical comparison is recommended in case two closely related species have identical DNA barcodes. Time saved by using DNA barcodes: only one physical comparison is needed, versus many if no barcodes are available. DNA from tree 6000 matches a DNA sequence from three trees, which have already been found to come from two distinct morphotypes (probably in the same genus): physical comparison is mandatory, to determine the morphotype of tree 6000. Time saved: only two morphotypes need to be compared with tree 6000. DNA from tree 7000 does not match DNA from any other tree: it is possible, but unlikely, that a physical comparison would find an identical morphotype and reveal a cryptic species. Time saved: physical comparison of tree 7000 is a low priority and could be skipped in some cases.
(ii) Inter-plot matching. DNA from plot A morphotype 070 matches only DNA from plot B morphotype 060: it is likely that these morphotypes are the same, and are the same species, but a physical comparison is recommended, in case (a) two closely related but morphologically distinct species have identical DNA barcodes, or (b) there is geographical variation in morphology in one species. In the case of the latter, a taxonomic decision (one species or two) may require herbarium work (see below). Time saved: only one comparison is needed. Identical DNA from plot A distinct morphotypes 005 and 011 matches DNA from plot B morphotype 010 and plot C morphotype 003: thorough physical matching is needed among all four source morphotypes, to determine if there are two, three, or four plot-network-wide morphotypes. Time saved: only these four morphotypes need to be compared, rather than all members of a tentative genus. DNA from plot A morphotype 100 does not match DNA from any other plot morphotype: probably a unique morphotype and species. Time saved: physical comparison of plot A morphotype 100 is a low priority. A final physical review of all morphotypes should be completed, among morphotypes clustered by similar DNA (or by tentative genera, if these have been assigned by field botanists), to determine if there are potentially cryptic species, revealed by different DNA, but having identical morphology.
(iii) Herbarium and DNA database matching. DNA from plot-wide morphotype 270 BLASTs to an identical match with Shorea parvifolia: it is likely that morphotype 270 is indeed S. parvifolia, but with relatively few Shorea having ever been sequenced, other Shorea may have identical barcodes, hence all Shorea in the same section rank should be compared morphologically with vouchers of morphotype 270. If the match is indeed to S. parvifolia, then a taxonomic discovery may be made (range expansion, minor morphological variation, etc.). Identical DNA from plot-wide distinct morphotypes 205, 311, and 403 BLASTs to an identical match with Santiria tomentosa and S. indica, and a close match to S. sumatrana. Thorough physical matching in the herbarium (and in monographs) is needed for the three morphotypes, focused on the three possible Santiria species, but including all likely Santiria, if possible. DNA from plot-wide morphotype 500 does not BLAST to any sequence in any database: the morphotype may be a new species, but more likely it is a species that has been collected before but not sequenced. A herbarium and book search should follow, directed either by a taxonomist’s recognition of genus, or starting with taxa with similar DNA sequences.

3.3. Role of the Herbarium
A significant added value of DNA barcode surveys is the associated specimens and genomic DNA, which, if properly curated (21), can be used by future generations of biodiversity researchers. It is essential to provide the best quality voucher material (i.e., fertile material) for permanent herbarium curation (most herbaria will not accept sterile or poor-quality specimens). The herbarium provides the infrastructure for exchanging specimens with other institutions so that specialists can make taxonomic determinations and incorporate the specimen information into floras or species descriptions. Herbarium-based curators and systematists can recognize rare or novel taxa, and flag these for additional field collections or observations. Costs for herbarium curation need to be incorporated into research budgets, and collaboration agreements should be established prior to the initiation of a large-scale DNA barcode project. Herbarium staff members are often involved in acquiring research and collection permits, for example, which can be a time-consuming and laborious procedure that should be dealt with as early as possible.
3.4. Taxonomic Discovery
The discovery of new species, site records, variants or hybrids involves a comparison of morphological data (morphospecies designation) based on field observations and herbarium vouchers, and the DNA barcode haplotypes (Fig. 2). There are two deviations from the ideal one-to-one relationship between the DNA barcode and the locally defined morphospecies: (1) DNA barcodes are identical across multiple morphospecies, or (2) multiple DNA barcode haplotypes are found within a single putative morphospecies. Since each scenario can arise from different biological causes, these cases require further evaluation (Fig. 2).
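The two deviations named above can be screened for automatically once each sampled tree has a morphospecies code and a barcode. The sketch below is illustrative only: the function name and record layout are hypothetical, barcodes are compared as exact strings (real data would need alignment and handling of ambiguous bases), and tree 8000 is an invented record added to the Fig. 2 example so that both deviations appear.

```python
from collections import defaultdict


def barcode_morphospecies_conflicts(records):
    """Screen (tree_id, morphospecies, barcode) records for the two deviations.

    Returns a pair (shared, split):
      shared: barcode -> sorted morphospecies codes, for barcodes found in more
              than one morphospecies (deviation 1)
      split:  morphospecies -> sorted barcodes, for morphospecies carrying more
              than one barcode haplotype (deviation 2)
    """
    by_barcode = defaultdict(set)
    by_morpho = defaultdict(set)
    for _tree_id, morpho, barcode in records:
        by_barcode[barcode].add(morpho)
        by_morpho[morpho].add(barcode)
    shared = {b: sorted(m) for b, m in by_barcode.items() if len(m) > 1}
    split = {m: sorted(b) for m, b in by_morpho.items() if len(b) > 1}
    return shared, split


records = [
    ("1000", "005", "ACGTACGT"),
    ("2000", "005", "ACGTACGT"),
    ("3000", "011", "ACGTACGT"),
    ("4000", "070", "GTGTACGT"),
    ("8000", "070", "GTGTACGA"),  # hypothetical tree, not in Fig. 2
]
shared, split = barcode_morphospecies_conflicts(records)
print(shared)
print(split)
```

Any entry in either dictionary is only a flag for the follow-up work described in the two cases below, not a taxonomic conclusion.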
Case 1: One DNA barcode for multiple morphospecies. When identical DNA barcodes are found in different morphospecies, it likely reflects a recent speciation history in which mutational differences among species have not yet accrued and sorted. Such is the case in species-rich tree genera, such as Inga and sections of the genus Ficus (22). The genetic discrimination of such taxa will require more variable DNA markers (ref. 23; see Discussion). These cases underscore the need to maintain archived DNA for future genotyping. Shared cpDNA haplotypes may also be explained by hybridization. Hybridization can be detected in several ways, including: within a phylogenetic context as incongruence between nuclear and plastid phylogenies, by geographical associations of haplotypes shared across species (24), by levels of genetic admixture with nuclear loci (23), or by morphological intermediates between the putative species in the field. Taxonomic specialists often have a prior idea of the importance of hybridization in their taxa, based on their examination of morphological discontinuities among species.

Case 2: One morphospecies with multiple DNA barcodes. Variant DNA barcodes can be found within species across the geographic range, or even locally in some species (25). This can indicate the existence of morphologically cryptic or semicryptic species, which might have been lumped together as a single taxon by field workers (14). In this case, the field workers should revisit the individuals with divergent haplotypes, and carefully examine adult individuals along with nearby seedlings and saplings, and collect samples from individuals representing the full range of morphological and ecological variation. If the DNA variation is consistently associated with certain morphological or ecological types, this can provide good evidence of multiple species.
These cryptic species can be flagged for further study focused on potential reproductive barriers, such as nonoverlapping phenology and habitat segregation (26). If the two cryptic species are not sister species, then they should also segregate in different nodes within a broadly sampled phylogeny (see Fig. 3).
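The two deviations described above can be screened for automatically once each specimen has a morphospecies designation and a barcode haplotype. The following is a minimal sketch, not part of the authors' protocol: it assumes that identical barcode sequences have already been collapsed to haplotype labels, and the function name, record layout, and specimen data are all hypothetical.

```python
from collections import defaultdict

def flag_deviations(records):
    """records: iterable of (specimen_id, morphospecies, haplotype) tuples.

    Hypothetical layout; haplotype labels stand for identical barcode sequences.
    """
    species_by_haplotype = defaultdict(set)
    haplotypes_by_species = defaultdict(set)
    for _specimen, morphospecies, haplotype in records:
        species_by_haplotype[haplotype].add(morphospecies)
        haplotypes_by_species[morphospecies].add(haplotype)
    # Case 1: one barcode haplotype shared by several morphospecies
    case1 = {h: sorted(s) for h, s in species_by_haplotype.items() if len(s) > 1}
    # Case 2: one morphospecies carrying several barcode haplotypes
    case2 = {m: sorted(h) for m, h in haplotypes_by_species.items() if len(h) > 1}
    return case1, case2

# Invented example records, for illustration only
records = [
    ("t001", "Inga sp. A", "H1"),
    ("t002", "Inga sp. B", "H1"),
    ("t003", "Trema micrantha", "H2"),
    ("t004", "Trema micrantha", "H3"),
    ("t005", "Ficus sp. A", "H4"),
]
case1, case2 = flag_deviations(records)
print("Case 1 (shared haplotypes):", case1)
print("Case 2 (variable morphospecies):", case2)
```

Flagged cases are only candidates for the follow-up described in the text (revisiting trees, more variable markers); the screen itself says nothing about the biological cause.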
4. Discussion
Standard cpDNA barcodes will be useful for discriminating species across distant clades and within relatively old clades (e.g., with sister species divergences older than the Pleistocene). We provide the Trema micrantha species complex (Fig. 3) as an example in which DNA barcodes could be used to discriminate a cryptic species in a long-term forest inventory plot (28).
Fig. 3. Example of using DNA barcodes to diagnose cryptic species. On Barro Colorado Island (BCI), Panama, there was thought to be a single species of Trema, the common pioneer tree species Trema micrantha (27). Molecular studies in the 50 ha plot on BCI revealed highly divergent cpDNA and ITS haplotypes (Dick C, unpublished) among samples, which corresponded with two ecotypes that differ in light requirement and can be morphologically distinguished by the color of the endocarp (26). Yesson et al. (2004) showed that T. micrantha is a species complex, and that the two BCI morphotypes (T. micrantha 1 and 2) are not even sister species. Each morphotype is widespread, as indicated by sampling from Ecuador (EC), and forms clades with other species with high bootstrap support (*). This phylogeny was adapted from Fig. 2 in Yesson et al. (2004).
The more recently evolved species-rich groups (e.g., Inga and Ficus sections) may contain enough morphological variation for discrimination in the field, and yet be invariant using the standard plant DNA barcodes. When morphology is not useful for discriminating these species, alternative sets of DNA markers may be used. The nuclear Internal Transcribed Spacer provides more nucleotide substitution variation than most chloroplast DNA and may be amplified using universal primers. Closely related species with recent common ancestors are expected to share many alleles and haplotypes, but their reproductive isolation should be apparent in the form of distinct allele frequencies among syntopic (co-occurring in the same habitat) populations of the putative species. This requires a population genetics approach. The forest inventory plots provide an excellent system in which to detect reproductive barriers based on genetic differentiation using tools of population genetics because (1) populations of the target species are already mapped and available for analyses and (2) because the species occur in the same locale, the genetic differentiation analysis will not be confounded by differentiation due to isolation by distance processes.
Microsatellite DNA markers (also known as simple sequence repeats or SSRs) are the most commonly used DNA markers for such analyses because of their high rate of mutation and allele richness within populations. Microsatellites are typically isolated from anonymous regions of the nuclear genome. However, because the flanking nucleotide sequences from which the primers are designed are also variable, the microsatellite markers are often species-specific or transferable only to very closely related species. It is not feasible to develop novel microsatellite DNA markers for every potential cryptic species pair. When working within families, an alternative method is to develop microsatellite DNA markers from Expressed Sequence Tags (ESTs). ESTs are short DNA fragments of expressed genes obtained from messenger RNA (mRNA). Although the mRNAs code for proteins, they contain untranslated regions (UTRs) at the 3′ and 5′ ends with SSRs at a frequency of 1–2% (29). Because EST-SSR loci are adjacent to coding sequences, highly conserved PCR primers can be designed, which are transferable across species, genera, and even higher-level taxa (29). EST-SSRs have an additional advantage over anonymous nSSRs in that they generally do not produce null alleles (unamplified alleles) because of their highly conserved priming sites. EST-SSRs can be mined from online EST databases using Web-based bioinformatics search engines. There are currently more than 52 million ESTs in GenBank, including thousands from important and species-rich tropical tree families, such as Fabaceae, Rubiaceae, and Lauraceae. The multilocus dataset for multiple species can be analyzed using Bayesian clustering approaches that estimate the most likely number of genetic demes (K) in the sample (this can be done using the program STRUCTURE) (23, 30). If, for example, five morphospecies represent five distinct species, the analysis should infer K = 5 demes and assign all individuals to their morphospecies-defined deme.
The existence of demes in forest plots is indicative of reproductive isolation (i.e., true species under a biological species concept) because the distances within a plot are too small for geographic distance alone to impede gene flow. The sample size often used for population genetic analyses is approximately 30 individuals per species (to obtain allele frequencies), using ca. 10 SSR loci (for multiple independent estimates of demes). Individuals should be sampled at spaced intervals (e.g., 50 m) throughout the plot to avoid sampling of close relatives. ESTs can also be a source of phylogenetically informative introns. Although introns are spliced from the mRNA, the EST can be compared to known genomes (e.g., Arabidopsis thaliana or Populus trichocarpa) to determine which ESTs span introns. From these, Exon Primed Intron Crossing (EPIC) markers can be developed (31). EPIC markers are expected to amplify nuclear introns broadly across higher-level taxa because of their highly conserved priming regions. Markers such as these will be useful for distinguishing among closely related species, and for developing phylogenies for establishing species relationships.
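The mining of SSRs from EST sequences mentioned above amounts to a search for short motifs repeated in tandem. As a rough illustration only, the sketch below finds perfect di- and trinucleotide repeats with a regular expression. The minimum of six repeats, the function name, and the example sequence are assumptions made for this sketch; dedicated SSR-mining tools apply their own thresholds and also handle imperfect repeats.

```python
import re

def find_ssrs(seq, min_repeats=6, motif_lengths=(2, 3)):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq.

    start is a 0-based offset. Thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # A k-base motif followed by at least (min_repeats - 1) more copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Invented EST-like sequence containing an (AG)8 and a (CTT)6 repeat
est = "GGCATTC" + "AG" * 8 + "TTGACC" + "CTT" * 6 + "GATC"
for start, motif, count in find_ssrs(est):
    print(f"offset {start}: ({motif}){count}")
```

Primers would then be designed in the conserved sequence flanking each hit, which is the property that makes EST-SSRs transferable across taxa.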
In summary, we see great potential for DNA analyses to assist in the management of taxonomic entities in species-rich forest inventory plots, and in the discovery of new species. We can imagine a time when a multilocus library of DNA sequences exists for all named species, with an estimate of sequence variation within each species, and when we can affordably sequence millions of base pairs for each individual in the plot using pyrosequencing. We could then match trees to local plot taxa, and to named species, without ever consulting physical vouchers (not that such a DNA-only approach would necessarily be desirable). Significantly original sequences would then almost certainly indicate species new to science. However, we are of course far from having these data available, and so DNA barcodes must be considered an additional valuable source of data in our taxonomic work, to be used in dialog with physical vouchers, rather than a goal in themselves.
Acknowledgments
Some of the methods were derived from research supported by the National Science Foundation (DEB awards 0640379 to CD, and 1020868 to CW) and the Center for Tropical Forest Sciences. We thank John Kress and David Erickson for the invitation and for useful ideas for the paper.
References
1. Fine PVA, Ree RH (2006) Evidence for a time-integrated species-area effect on the latitudinal gradient in tree diversity. Am Nat 168:796–804
2. Jørgensen PM, León-Yánez S (1999) Catalogue of the vascular plants of Ecuador. Missouri Botanical Garden, St. Louis, MO
3. Balslev H, Valencia R, Paz y Miño G, Christensen H, Nielsen I (1998) Species count of vascular plants in one hectare of humid lowland forest in Amazonian Ecuador. Forest biodiversity in North, Central and South America, and the Caribbean, research and monitoring. In: Dallmeier F, Comiskey JA (eds) Man and the biosphere series, vol 21. UNESCO, Paris, pp 585–594
4. Ruokolainen K, Tuomisto H, Kalliola R (2005) Landscape heterogeneity and species diversity in Amazonia. In: Bermingham E, Dick CW, Moritz C (eds) Tropical Rainforests, Past, Present and Future. University of Chicago Press, Chicago, pp 251–270
5. Prance GT, Beentje H, Dransfield J, Johns R (2000) The tropical flora remains undercollected. Ann Mo Bot Gard 87:67–71
6. Dick CW, Kress WJ (2009) Dissecting tropical plant diversity with forest plots and a molecular toolkit. Bioscience 59:745–755
7. Bebber DP, Carine MA, Wood JRI et al (2010) Herbaria are a major frontier for species discovery. Proc Natl Acad Sci U S A. doi:10.1073/pnas.1011841108
8. Mori SA, Prance GT (1987) A guide to collecting Lecythidaceae. Ann Mo Bot Gard 74:321–330
9. RAINFOR (Amazon Forest Inventory Network) http://www.geog.leeds.ac.uk/projects/rainfor/pages/project_eng.html. Last accessed on 28 Feb 2011
10. Laman TG (1995) Safety recommendations for climbing rain-forest trees with single rope technique. Biotropica 27:406–409
11. Colpaert N et al (2005) Sampling tissue for DNA analysis of trees: trunk cambium as an alternative to canopy leaves. Silvae Genetica 54:265–269
12. Blanco MA et al (2006) A simple and safe method for rapid drying of plant specimens using forced-air space heaters. Selbyana 27:83–87
13. Leisner R (Field Techniques used by the Missouri Botanical Garden) http://www.mobot.org/mobot/molib/fieldtechbook/welcome.shtml. Last accessed on 28 Feb 2012 (Missouri Botanical Garden, St. Louis, MO)
14. Janzen DH et al (2009) Integration of DNA barcoding into an ongoing inventory of complex tropical biodiversity. Mol Ecol Resour 9:1–26
15. Kress WJ (2004) Plant floras: how long will they last? A review of flowering plants of the Neotropics. Am J Bot 91:2124–2127
16. Webb CO, Slik JWF, Triono T (2010) Biodiversity inventory and informatics in Southeast Asia. Biodivers Conserv 19:955–972
17. Gonzalez M-A, Baraloto C, Engel J et al (2009) Identification of Amazonian trees with DNA barcodes. PLoS ONE 4:e7483
18. Herbarium University of Florida (Preparation of plant specimens for deposit as herbarium vouchers) http://www.flmnh.ufl.edu/herbarium/voucher.htm#Identification. Last accessed on 28 Feb 2012
19. Condit R (1998) Tropical forest census plots: methods and results from Barro Colorado Island, Panama and a comparison with other plots. Springer-Verlag, Berlin
20. Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Mol Ecol Notes 7:355–364
21. Savolainen V, Reeves G (2004) A plea for DNA banking. Science 304:1445
22. Kress WJ et al (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci U S A 106:18621–18626
23. Duminil J, Caron H, Scotti I, Cazal SO, Petit RJ (2006) Blind population genetics survey of tropical rainforest trees. Mol Ecol 15:3505–3513
24. Saeki I, Dick CW, Barnes BV, Murakami N (2011) Comparative phylogeography of red maple (Acer rubrum L.) and silver maple (A. saccharinum L.): impacts of habitat specialization, hybridization and glacial history. J Biogeogr 38:992–1005
25. Dick CW, Heuertz M (2008) The complex biogeographic history of a widespread tropical tree species. Evolution 62:2760–2774
26. Silvera K, Skillman JB, Dalling JW (2003) Seed germination, seedling growth and habitat partitioning in two morphotypes of the tropical pioneer tree Trema micrantha in a seasonal forest in Panama. J Tropical Ecol 19:27–34
27. Croat TB (1978) Flora of Barro Colorado Island. Stanford University Press, Stanford, CA
28. Yesson C, Russell SJ, Parrish T et al (2004) Phylogenetic framework for Trema (Celtidaceae). Plant Syst Evol 248:85–109
29. Ellis JR, Burke JM (2007) EST-SSRs as a resource for population genetic analyses. Heredity 99:125–132
30. Pritchard JK, Stephens JC, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
31. Li C, Riethoven JM, Ma L (2010) Exon-primed intron-crossing (EPIC) markers for non-teleost fishes. BMC Evol Biol 10:90
Chapter 19
Construction and Analysis of Phylogenetic Trees Using DNA Barcode Data
David L. Erickson and Amy C. Driskell
Abstract
The assembly of sequence data obtained from DNA barcodes into phylogenies or NJ trees has proven highly useful in estimating relatedness among species as well as providing a framework in which hypotheses regarding the evolution of traits or species distributions may be investigated. In this chapter, we outline the process by which DNA sequence data is assembled into a phylogenetically informative matrix, and then provide details on the methods to reconstruct NJ or phylogenetic trees that employ DNA barcode data, using only barcode data or combining barcodes with other data.
Key words: Nucleotide, Homology, Alignment, Parsimony, Likelihood, DNA barcode, Community phylogeny
1. Introduction All molecular systematics is based on inferring relationships among species based on patterns of substitution at homologous nucleotide (and/or amino acid) bases that vary among taxonomic groups. The result of these analyses is a phylogenetic hypothesis, commonly expressed as a “phylogenetic tree” (1). A phylogenetic tree describes the evolutionary relationships among species, which can also provide inference of the relative degree of divergence (in time or other units) that separate taxa. In a phylogenetic tree, the topology is the pattern of which taxa are grouped with each other, a clade is a grouping of two or more taxa, and the term distance is the length of the branches that connect taxa or clades, which can represent the time since those taxa or clades diverged from a common ancestor. The ability of DNA barcode data to wholly or in part contribute to the reconstruction of well-resolved molecular phylogenies offers tremendous value to the entire community
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_19, © Springer Science+Business Media, LLC 2012
of ecologists and evolutionary biologists who seek to elucidate and understand biological diversity, as well as benefiting those who employ phylogenetic data to address the ecological and evolutionary mechanisms that promote and maintain species diversity (2, 3). The construction of these trees can help researchers doing basic DNA barcode research by providing a way to assign unknown samples to a species or other taxonomic clade. For example, when sequences are submitted to BLAST for identification, the query sequence can be shown within a distance tree with other, similar sequences to help represent to what species the query sequence belongs. The sequence identification search engine at BOLD uses the same concept and many LIMS systems like WAISABI and Geneious use distance trees to help assign and verify the identity of sequences. Consequently, building aligned sequence matrices into which unknown or unverified sequences may be added promotes best practices and workflow processes in managing DNA barcode data. Likewise, the identification of novel genetic barcode sequences that may belong to new species may be initiated through incorporation into phylogenetic trees containing verified DNA barcode sequences. The delineation of novel species can be challenging and involve much more data than just DNA barcode sequences alone, but use of DNA barcode data to explore and quantify the magnitude of support for discrete genetic groupings that correspond to new species is greatly enhanced by use of molecular phylogenetic trees (see Chapter 18 and ref. 4). In addition to facilitating taxonomic investigations, molecular phylogenies using DNA barcode data are also being implemented by ecologists to explore the ecological mechanisms that affect community structure and function (see Chapter 20; also refs. 5–7). 
Lastly, phylogenies themselves represent a metric for the measurement of genetic diversity (8), which then allows the unambiguous quantification of genetic diversity for entire communities. This in turn promotes comparative analysis of genetic diversity in both time and space (9). Consequently, the ability to reconstruct phylogenetic trees has many uses in ecology and evolution, and using DNA barcode data to correctly assemble community or taxonomic phylogenies provides a tremendous benefit to those who collect and use DNA barcode data. This chapter explicitly focuses on the methods we use to construct molecular phylogenies, starting from verified DNA barcode sequences in FASTA format, through to the generation of phylogenetic trees (Fig. 1). We emphasize that the methods used here are by no means the only ones available; for example, there are many methods of sequence alignment, and we have chosen those described here based on our own experience. Many of the examples of different genes come from plants, but the processes outlined can be applied across organisms and genes. The workflow pattern is what will remain constant even as the exact sequence analysis programs implemented may evolve.

Fig. 1. A workflow of data from sequence to phylogeny is outlined. The programs we use at each step are given. [Figure: edited sequences for each marker (rbcL; matK/CO1; trnH/ITS) are stored as FASTA files; sequence alignment uses Sequencher/transAlign, transAlign/MAFFT, and Muscle/MAFFT, respectively; matrix construction (Nexus file) uses SequenceMatrix/SeqCat; phylogenetic reconstruction uses Garli/RAxML (maximum likelihood) or PAUP/TNT (parsimony).]
2. Materials
2.1. Data Input Files
1. Coding DNA: Sequences of COI (animals), rbcL, and matK (plants) are maintained as concatenated FASTA files. Each file is in .txt format and contains sequences from all species intended for use in the community phylogeny. Sequences are fully edited and comply with the DNA barcode standards established by CBOL (see Chapter 13 on LIMS). Alignments are done independently for each gene, thus each file should have only sequences for that gene present. Outgroup sequences must be included at this point. All sequences should be in the same orientation (typically 5′ → 3′), but length may vary.
2. Noncoding DNA: close relatives. Rapidly evolving noncoding sequences may vary in length in many clades due to the insertion or deletion of nucleotides, which can confound many alignment algorithms. However, if the phylogeny comprises a single plant Order or Family, noncoding sequences, such as the nuclear ribosomal internal transcribed spacer (ITS) or the chloroplast intergenic spacer trnH-psbA, may be aligned globally in the same manner as coding genes. Thus, the input can be formatted as a single FASTA file with all sequences in the same orientation.
3. Noncoding DNA: nested alignments. With more divergent taxa, or when performing analyses for entire communities or very divergent clades, we subdivide the alignment for rapidly evolving noncoding markers like trnH-psbA and ITS based on taxonomy (see Note 1). Input for alignment programs requires individual concatenated FASTA files for each Order or Family. Similar treatment may also be required for CO1 when combining very divergent taxa into a single community phylogeny (e.g., echinoderms, annelids, and platyhelminths; see Note 2). Algorithms that partition sequences based on genetic distance (i.e., Mega-phylogeny (10)) rather than taxonomy may also yield improved alignments.
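For the nested alignments in item 3, one concatenated FASTA file has to be split into one file per Order or Family. The sketch below shows one way this could be done. It assumes, purely for illustration, that each FASTA header begins with the Order name followed by a "|" delimiter; that header convention, the function names, and the sequences are hypothetical and not part of the protocol.

```python
from collections import defaultdict

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def partition_by_order(text):
    """Group records by the first '|'-delimited header field (assumed to be the Order)."""
    groups = defaultdict(list)
    for header, seq in parse_fasta(text):
        groups[header.split("|")[0]].append((header, seq))
    return dict(groups)

def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text, one line per sequence."""
    return "".join(f">{h}\n{s}\n" for h, s in records)

# Invented trnH-psbA-like records; sequences are placeholders
concatenated = """>Fabales|Inga_laurina|s1
TTGACTAA
TTCC
>Fabales|Inga_nobilis|s2
TTGACTAATTCA
>Malpighiales|Casearia_arborea|s3
TAGGCTAAT
>Solanales|Solanum_hayesii|s4
TTGACCAA
"""
groups = partition_by_order(concatenated)
for order, records in sorted(groups.items()):
    print(order, len(records))
```

Each group can then be written out with `to_fasta` and submitted to the alignment program as its own input file.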
2.2. Sequence Alignment Programs (see Note 3)
1. rbcL alignments: We use Sequencher 4.8 (GeneCodes Corp.) to perform alignments for rbcL across all taxa. Other programs, such as Geneious 1.07 (Biomatters Ltd.) may also be used, but we have more experience with Sequencher. Global alignments from Sequencher may contain many erroneously inserted gaps due to the divergence of sequences—Sequencher was not designed as a cross-taxon alignment tool and does not handle very divergent sequences in the manner one might expect. However, it is still a useful tool. Sequencher alignments must be edited to remove the inserted gaps—the command “collect all gaps right/left” under the “Sequence” menu greatly facilitates this.
2. matK alignments: We use transAlign (ref. 11; freely downloadable at http://www.molekularesystematik.uni-oldenburg.de/en/34011.html) for global alignment of matK data. The program transAlign translates the DNA sequence data of protein coding genes into amino acid (aa) sequences, sends the aa sequences to a separate program for alignment, and lastly “back” translates the resulting aa alignment into a nucleotide sequence alignment, preserving the original DNA sequences. Thus, transAlign does not perform the alignment, but rather interacts with an alignment program and maintains the integrity of the original DNA sequences while using an aa translation for the actual inference of sequence homology. As noted, transAlign requires an interface with an alignment program also installed on your computer; we use ClustalX (12), although other alignment programs can be substituted (e.g., Muscle, MAFFT, T-Coffee).
3. COI alignments: COI alignments can be treated as matK above, given that both are rapidly evolving coding genes. Back-translation is frequently used for assembling aligned databases of COI, and thus the use of transAlign works well. MAFFT (ref. 13; http://mafft.cbrc.jp/alignment/software/) is also very effective.
4. trnH-psbA and ITS alignments: We use Muscle 3.6 (ref. 14; http://www.drive5.com/muscle) for alignments of trnH-psbA sequences (which would also apply to ITS and any other rapidly evolving noncoding sequence data). We use default parameters for the program when aligning.
5. Matrix construction: To combine separate aligned files into a matrix with sequences for multiple genes concatenated (end-to-end) (see Fig. 2), we currently use the Java-based program SequenceMatrix 12.7.0 (ref. 15; freeware http://taxondna.sourceforge.net/). We have also used both MacClade4 (16) and, more recently, a perl script called SeqCat (http://www.molekularesystematik.uni-oldenburg.de/en/34011.html).
Taxon labels exported from SequenceMatrix in nexus format will be truncated to 32 characters, so input labels need to be adjusted accordingly. MacClade4 operates only in Mac OS, while SequenceMatrix runs on any operating system where the Java binary is installed. The SeqCat program allows much longer taxon labels, but its output is an interleaved nexus file. The output file may need to be converted to a noninterleaved file (e.g., via PAUP* 4.0) before use in some downstream programs.
2.3. Phylogeny Reconstruction Programs
1. Parsimony: We use PAUP* 4.0 (17) for phylogenetic inference using the parsimony criterion. PAUP* runs under the Mac OS or UNIX. An alternative for parsimony searches is TNT (18), http://www.zmuc.dk/public/phylogeny/tnt/, which can be
run under Windows or UNIX. We have also implemented parsimony analyses via PAUP on the CIPRES Web server (http://www.phylo.org/sub_sections/portal/), which runs large jobs very quickly.
2. Maximum likelihood: We have used both Garli 0.951 and RAxML for maximum likelihood (ML) phylogenetic inference. Garli (ref. 19; freeware http://garli.googlecode.com) and RAxML (ref. 20; freeware http://icwww.epfl.ch/~stamatak/) may also be run on the CIPRES Web server for very large data sets (see Note 4).
3. Neighbor Joining trees: We use PAUP for calculating neighbor joining trees from DNA sequence alignments; however, alternative programs include clearcut 1.0.9 (21), which is also available via CIPRES (see Note 5).

Fig. 2. Outline of Supermatrix (or nested matrix) design. Coding genes can be aligned globally, across highly divergent clades, whereas the most rapidly evolving sequences are partitioned into smaller alignment blocks to improve the likelihood of correctly assessing homology among aligned nucleotides. In this model of a matrix for a community containing four divergent lineages, the intergenic spacer trnH-psbA is aligned within orders, with different orders nested into discrete partitions of the matrix. A nested design may be implemented when using any hypervariable sequence region. [Figure panels: Sequence Matrix (rbcL, matK, trnH-psbA) and Tree Topology; taxa shown are species of Inga, Casearia, Solanum, and Bactris, together with Ginkgo biloba.]
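Neighbor joining, listed among the programs in this subheading, builds a tree from a matrix of pairwise distances between aligned sequences. As a hedged illustration of that input, the sketch below computes uncorrected p-distances (the proportion of differing sites) while skipping gapped or ambiguous positions. It is not how PAUP or clearcut compute distances internally, which typically involve substitution-model corrections; the function names and the toy alignment are assumptions.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguity code are skipped.
    """
    valid = set("ACGT")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            diffs += x != y
    if compared == 0:
        return float("nan")  # no comparable sites
    return diffs / compared

def distance_matrix(alignment):
    """alignment: dict of taxon label -> aligned sequence (equal lengths assumed)."""
    names = list(alignment)
    rows = [[p_distance(alignment[i], alignment[j]) for j in names] for i in names]
    return names, rows

# Toy alignment, invented for illustration
alignment = {
    "sp1": "ACGTACGTAC",
    "sp2": "ACGTACGTTC",
    "sp3": "AC-TACGAAC",
}
names, rows = distance_matrix(alignment)
for name, row in zip(names, rows):
    print(name, " ".join(f"{d:.3f}" for d in row))
```

In a nested supermatrix many taxon pairs share few or no comparable sites in the hypervariable partitions, which is one reason to check how a distance-based program treats missing data before relying on its tree.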
3. Methods
3.1. Conducting Alignments
1. transAlign. We use transAlign in conjunction with ClustalX for global alignment of the coding genes. After the program is called, it will prompt for file name and location of the
concatenated FASTA file (from Subheading 2.1 above). Default location is the same directory as the executable. Subsequent information is requested in a stepwise order, with an important choice being the type of translation table to be used (for plants we use the bacterial/plastid table; for animals the appropriate mtDNA table can be selected). We typically choose to output an aligned FASTA file, as this format is most easily read by the programs we use for matrix construction. We choose to have the program check all six possible reading frames to cope with any sequences accidentally submitted in reverse orientation. The program will also screen for and report possible insertion–deletion (indel) errors that create pseudogene-like stop codons.
2. Muscle: We use both command line and Web-based versions of Muscle for alignment of noncoding DNA sequences. The online version is available at: http://www.ebi.ac.uk/Tools/msa/muscle/. Again, FASTA formatted sequences (from Subheading 2.1 above) are the source input. For nested alignments of noncoding sequences, we use a script (from above at Subheading 2.2, item 4) that automates batch submissions to a command line version of Muscle; this avoids having to run Muscle separately for each set of nested sequences.
3. MAFFT: MAFFT has proven to be highly accurate in alignment of complex sequences. As with Muscle, we use both online and local command line versions. Local command-line versions handle larger numbers of sequences than online implementations, and can be implemented in LIMS type programs like Geneious. Using the FFT-NS-i option, which assumes there are blocks of concordance among sequences punctuated by gaps, has produced the best alignments for rapidly evolving genes like matK and CO1.
4. Manual verification of alignment: Each alignment produced by the alignment methods listed above is checked manually using SeAl (freeware http://tree.bio.ed.ac.uk/software/seal/) (see Note 3).
Typically, the rbcL alignment is then exported as a nexus formatted file (FASTA is also ok; see Note 6), and the matK and trnH-psbA files are exported as aligned FASTA files. Output files for each gene are then used to construct a combined supermatrix, or nested matrix as below (see Note 7). Alignments of the coding genes can be screened for one and two base gaps which typically correspond to sequence error and which arise from insertion of a gap from a single sequence. We cross check all one and two base gaps found in the aligned sequences with the raw data from the individuals causing the gaps and delete the base causing the gaps unless strongly supported by raw data. Similarly, the consensus sequence of all aligned coding genes can be exported and checked for reading
frame errors via translation. Any stop codon in the consensus sequence is evaluated by checking the individual sequences responsible for it.
3.2. Matrix Construction
1. Concatenation and supermatrix design. To combine multiple aligned DNA sequence files into a single matrix for use with phylogenetic analysis programs, we use the java program SequenceMatrix. SequenceMatrix combines alignment files (in aligned FASTA or Nexus format) so that each gene is added sequentially, and sequence data from different genes with the exact same taxon label will be concatenated. In addition, each specimen for which the data is to be concatenated must have a unique label and these must be consistent among input files. Alignment files can be dragged and dropped directly into the SequenceMatrix window and all files can be selected and added simultaneously. This is particularly useful when working with nested alignments—in large community phylogenies there may be dozens of Family or Ordinal-specific alignment files (see Fig. 2). We do not have the program code external gaps as question marks (however, see Note 9). Failure to identically label the same sample in different input files results in failure to concatenate sequences for a taxon. The resulting matrix is then exported as a nexus file with the option of “one single (potentially very long) line”.
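The concatenation logic described above (match on exact taxon label, append genes end-to-end) can be sketched in a few lines. This is not SequenceMatrix's implementation: the function name and data are hypothetical, and the choice to pad taxa that are absent from a gene or nested block with "?" is an assumption of this sketch. Note how a label that differs between files would simply produce two separate, mostly empty rows, which is the labeling failure the text warns about.

```python
def concatenate(alignments, missing="?"):
    """Concatenate aligned blocks end-to-end by exact taxon label.

    alignments: list of (block_name, {taxon_label: aligned_sequence}).
    Returns (matrix, partitions), where partitions holds 1-based inclusive
    (block_name, start, end) column ranges.
    """
    taxa = sorted({t for _name, aln in alignments for t in aln})
    matrix = {t: "" for t in taxa}
    partitions = []
    start = 1
    for name, aln in alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not all the same length")
        width = lengths.pop()
        for t in taxa:
            # Taxa absent from this block are padded with missing-data symbols
            matrix[t] += aln.get(t, missing * width)
        partitions.append((name, start, start + width - 1))
        start += width
    return matrix, partitions

# Invented blocks: one global rbcL alignment and two order-specific trnH-psbA blocks
rbcL = {"Inga_laurina": "ATGTCA", "Inga_nobilis": "ATGTCA", "Casearia_arborea": "ATGTCG"}
trnH_fabales = {"Inga_laurina": "TT-A", "Inga_nobilis": "TTGA"}
trnH_malpighiales = {"Casearia_arborea": "CCA"}

matrix, partitions = concatenate([
    ("rbcL", rbcL),
    ("trnH_Fabales", trnH_fabales),
    ("trnH_Malpighiales", trnH_malpighiales),
])
for taxon, seq in matrix.items():
    print(f"{taxon:20s} {seq}")
print(partitions)
```

The partition ranges are the kind of information a downstream program needs in order to treat each gene or nested block separately.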
3.3. Phylogenetic Reconstruction
1. Parsimony. We have employed PAUP* on local servers as well as via the CIPRES portal. The CIPRES portal implements the parsimony ratchet (22), as well as searching for a single most parsimonious tree using PAUP*. When analyzing nested supermatrices, it is critical to correctly specify the fraction of bases for the algorithm to “deform” during the ratchet runs (use the “set parameters” command). The ratchet procedure uses a hill-climbing algorithm to search for a best tree; to avoid fixation on a locally optimal, but globally suboptimal, tree island, the algorithm “deforms” or alters base composition of the data to compel exploration of alternate, potentially superior tree space. In a highly nested supermatrix (e.g., Fig. 2) where a large fraction of the matrix will consist of missing data (e.g., >95% missing data, Chapter 22), the percentage of bases deformed during the ratchet iteration must be relatively high (≥50%) to ensure that the data is sufficiently deformed to dislodge the search from local optima. Because so many of the characters in a nested matrix have a high proportion of missing data, a large percentage of the matrix must be manipulated by the program in order to change enough informative data. For a data matrix that is globally aligned for all sequences (e.g., when using only CO1, or only rbcL + matK) the default setting
of deforming 20% of bases should be adequate. We initiate five separate runs of PAUP* on CIPRES, and then combine the ratchet trees from each run into a single file for use in constructing a consensus tree. Using PAUP* on a local cluster has proven more useful, particularly when implementing a constraint tree (see Note 8). Constraint trees are readily defined, either through selection of a nexus formatted tree under GUI versions of PAUP*, or through inclusion of a constraint command in PAUP* command-blocks (see Note 9). In general, we do not include insertion–deletion characters as part of the data matrix (see Note 10).
2. Maximum likelihood on local cluster. We have used both RAxML and GARLI for ML-based phylogenetic reconstruction. Both are available via the CIPRES server, and both can readily use a newick-defined constraint tree (however, see Note 5). Both RAxML and GARLI are implemented on local clusters (see Note 11 for the PAUP* command-block we use for both programs), which is equivalent to use of the GTR + I + Gamma model, which is broadly employed for many data sets.
3.4. Assessment of Topology and Support
1. For both MP and ML methods, we recommend the use of standard bootstrapping procedures. This is particularly true when the data matrix consists entirely of barcode data, which is, by definition, a minimal quantity of data. For parsimony, the parsimony ratchet is a good alternative, and we combine the trees from at least five separate ratchet runs into a consensus tree. For ML, we suggest initiating 100 separate runs, each begun with a random-addition starting tree. All trees are then assembled into a single nexus file with a trees block, and a 50% majority rule tree is constructed. Use of a constraint tree may render estimation of topology irrelevant, but a phylogeny produced without that constraint should be evaluated based on congruence with known phylogenies, and with respect to the fraction of taxonomic ranks that form monophyletic groups.
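The 50% majority rule step above can be illustrated with a short, language-neutral sketch in Python. It is not the PAUP* implementation: it assumes rooted, topology-only Newick strings (no branch lengths, no quoted labels, no whitespace before parentheses), counts rooted clades rather than unrooted bipartitions, and the function names `parse_newick`, `clades`, and `majority_rule` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of tip names."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].strip()

    return node()


def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of tip names."""
    found = set()

    def tips(n):
        if isinstance(n, str):
            return frozenset([n])
        members = frozenset().union(*(tips(child) for child in n))
        found.add(members)
        return members

    found.discard(tips(tree))  # the clade holding every tip carries no information
    return found


def majority_rule(newicks, threshold=0.5):
    """Map each clade found in more than `threshold` of the trees to its frequency."""
    counts = Counter()
    for nwk in newicks:
        counts.update(clades(parse_newick(nwk)))
    n = len(newicks)
    return {clade: k / n for clade, k in counts.items() if k / n > threshold}
```

Given the pooled trees from several ratchet or ML runs as a list of strings, `majority_rule(trees)` returns only the clades that a 50% majority rule tree would retain, together with the fraction of trees supporting each.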
4. Notes
1. The partitioning of sequences that cannot be aligned globally is important since the accuracy of the alignment is dependent on the genetic distance among the sequences being aligned. This is especially true for the most rapidly evolving sequences, including any of the noncoding sequences, as well as for CO1, which may saturate when alignments are among all members of a community. In plants, we evaluate the scale of alignment by comparing alignments using rbcL plus trnH-psbA (or ITS) aligned
at ordinal level with topologies based on rbcL only. We note if the phylogeny produced using both genes alters the topology from that produced with rbcL only. We assume trnH-psbA should not change the topology produced by rbcL, except in resolving polytomies within the rbcL-only tree. We partition the trnH-psbA alignment into successively lower taxonomic groups until the topology of the combined matrix is consistent with the rbcL-only data matrix. That is, when the trnH-psbA alignment becomes ambiguous due to alignment at too high a taxonomic scale, it will cause erroneous topological rearrangements at higher scales that conflict with topologies observed when using rbcL only; thus, when trnH-psbA is aligned correctly, it should not contradict the rbcL (or rbcL + matK) alignments and will instead just increase resolution in poorly resolved clades.
2. COI can be aligned between highly divergent species, often through back translation. However, the phylogenetic information content may be limited, even if the alignment is legitimate, because the rapid nucleotide substitution rate leads to saturation of changes at a given position. Accordingly, when one seeks to align many highly divergent clades with COI, as for a community phylogeny, it may be prudent to subdivide the alignment in a nested format, such as that used with noncoding genes.
3. When manually checking sequence alignments, we only make certain types of modifications, and otherwise leave the computer-generated alignments as is. Specifically, for rbcL alignments, there must be no gaps of any kind. SeAl provides a rapid tool for screening for gaps, which are usually the result of incorrect sequence editing. Further, the aligned sequence matrix for rbcL must contain no stop codons in the translation. For matK, there will be a substantial number of gaps, but nearly all gaps will be in multiples of three, corresponding to differences in the number of complete amino acids in the mature protein. 
Any one- or two-base gaps are likely the result of errors in sequence editing and must be confirmed in the source sequences. Further, as with rbcL, individual contigs as well as the aligned sequence matrix for all samples should contain no stop codons. We have observed pseudogenes in matK in a few families (particularly Melastomataceae) such that correct sequences contain one- or two-base gaps, which then affect inference of stop codons in an aligned matrix. Thus, one cannot definitively say that no stop codons may be present in an aligned matK sequence matrix. For trnH-psbA, because it is noncoding, the use of stop codons to evaluate sequence alignment is not applicable. Typically, we trim trnH-psbA sequences with an
internal primer that is different from the primer used for PCR (Hamilton 1999 psbA sequence). For CO1, the presence of stop codons should alert one to the possibility of having retrieved a nuclear mitochondrial pseudogene (NUMT), rather than a true mitochondrial copy.
4. RAxML appears to interpret congruence of the ML-generated tree with the constraint tree differently than does GARLI. In RAxML, a clade that is a polytomy in the ML tree is not regarded as conflicting with a clade that is resolved in the constraint tree; thus, it is possible for the constraint tree to have better resolution in some clades than an ML tree produced by RAxML that employs that constraint. In contrast, GARLI appears to enforce the constraint tree topology more strictly, such that the ML tree produced by GARLI will always mirror the resolution of the constraint tree at a minimum, and often will resolve clades in the ML tree that are unresolved in the constraint tree.
5. Neighbor Joining is a distinct type of tree-building algorithm in that it uses genetic distances obtained from an alignment matrix, but it produces the tree by ever finer subdivision of unresolved clades, as opposed to objectively evaluating relationships among species. The order in which species are listed in the alignment file may also affect the NJ tree topology because of the way ties are dealt with. Typically, ties are broken at random; thus, a tie that includes >3 species can produce different topologies when the order in which those species are listed in the matrix differs.
6. Export of sequences out of Sequencher in a nexus format sometimes leads to inclusion of erroneous characters appended to the sequence reads that interfere with alignment using most alignment programs, including Muscle. Export of sequences as aligned FASTA solves this problem, and we paste sequences directly into the Web interface rather than upload text files to avoid incompatibilities between Mac, PC, and Unix machines.
7. 
We distinguish between a supermatrix and a nested matrix. A supermatrix may contain a large number of different genes for a set of specimens where the number of samples that have data for any one gene in the matrix may be low; the very large number of genes that are present unite the matrix (23). A nested matrix is a way to ensure that rapidly evolving genes are aligned correctly when they cannot be aligned across all species in the matrix (i.e., when a global alignment is not possible). In a nested alignment, one or more genes may be present and aligned for all samples, whereas other genes may
be aligned only within families to ensure estimation of homology during alignment (see Fig. 2).
8. A constraint tree can be implemented when part of the topology is known, and the user wants to enforce that known topology on the phylogeny being constructed with the barcode data. For example, in plants we can use a master tree from the Angiosperm Phylogeny Group (APGIII), which specifies the relationships at the level of Order. When we run the analysis with an APGIII constraint tree, the phylogeny produced from barcode data must be concordant with APGIII at the Ordinal level. Since APGIII does not specify the relationships at lower taxonomic levels, the barcode data will resolve family- and species-level relationships. Because barcode data are necessarily minimal, use of a constraint tree allows the barcode data to resolve lower-level phylogenetic relationships, while leveraging existing phylogenetic data sets to correctly constrain deeper topological relationships. See Kress and coworkers (5) for further discussion and examples.
9. Example PAUP block for MP using a constraint tree:
Begin Paup;
Defaults hsearch;
constraints =;
set autoclose=yes;
set criterion=parsimony;
set root=outgroup;
set storebrlens=yes;
set increase=auto;
outgroup ;
hsearch addseq=random nreps=100 swap=tbr hold=5 enforce=yes;
savetrees file= format=altnex;
end;
10. When one wishes to use gaps and indel variation in phylogenetic computation, it may be desirable to have SequenceMatrix code the external gaps as question marks; this will allow programs like PAUP* to treat internal gaps as a fifth character. Failure to code external gaps as missing data while using indel variation in phylogeny estimation may allow the completeness of a sequence to be interpreted as natural indel variation.
11. The following PAUP command block can be read and implemented by both GARLI and RAxML:
begin paup;
set criterion=likelihood;
constraints =;
lset nst=6 basefreq=empirical;
lset pinvar=estimate;
lset rates=gamma ncat=4 shape=estimate;
hsearch nreps=10 addseq=random swap=tbr enforce=yes;
savetrees file=My_tree.tre brlens=yes replace=yes;
end;
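The manual checks in Notes 3 and 10 (gap lengths in coding genes, stop codons, and recoding of external gaps as missing data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of SeAl, SequenceMatrix, or PAUP*: the helper names are hypothetical, sequences are assumed to start in reading frame at the first base, and the stop codon set is the standard one, which does not apply to mitochondrial genes such as CO1.

```python
import re

# Standard-code stop codons (an assumption: mitochondrial genetic codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def mask_external_gaps(seq, gap="-", missing="?"):
    """Recode leading and trailing gaps as missing data; internal gaps are kept."""
    core = seq.strip(gap)
    if not core:
        return missing * len(seq)
    lead = len(seq) - len(seq.lstrip(gap))
    trail = len(seq) - lead - len(core)
    return missing * lead + core + missing * trail


def suspicious_gaps(seq, gap="-"):
    """Return (start, length) for internal gap runs that are not multiples of three."""
    lead = len(seq) - len(seq.lstrip(gap))
    core = seq.strip(gap)
    return [
        (lead + match.start(), len(match.group()))
        for match in re.finditer(re.escape(gap) + "+", core)
        if len(match.group()) % 3 != 0
    ]


def internal_stops(seq, gap="-"):
    """Return indices of in-frame stop codons, ignoring the final codon."""
    bases = seq.replace(gap, "").replace("?", "").upper()
    codons = [bases[i:i + 3] for i in range(0, len(bases) - 2, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
```

Any hit from `suspicious_gaps` or `internal_stops` is a prompt to re-inspect the source trace files, not proof of an error; as noted above for matK, genuine pseudogenes can produce the same pattern.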
References
1. Swofford DL, Olsen GJ, Waddell PJ, Hillis DM (1996) Phylogenetic inference. In: Hillis DM, Moritz C, Mable BK (eds) Molecular systematics. Sinauer Associates, Boston 2. Webb CO (2000) Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am Nat 156:145–155 3. Harvey PH, Leigh Brown AJ, Maynard Smith J, Nee S (2006) New uses for new phylogenies. Oxford University Press, Oxford 4. Smith MA, Rodriguez JJ, Whitfield JB et al (2008) Extreme diversity of tropical parasitoid wasps exposed by iterative integration of natural history, DNA barcoding, morphology, and collections. Proc Nat Acad Sci USA 105:12359–12364 5. Kress WJ, Erickson DL, Jones FA et al (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Nat Acad Sci USA 106:18621–18626 6. Schreeg LA, Kress WJ, Erickson DL, Swenson NG (2010) Phylogenetic analysis of local-scale tree soil associations in a lowland moist tropical forest. PLoS One 5:e13685. doi:10.1371/journal.pone.0013685 7. Uriarte M, Swenson N, Chazdon R et al (2010) Trait similarity, shared ancestry, and the structure of neighborhood interactions in a subtropical forest: implications for community assembly. Ecol Lett 13:1503–1514
8. Forest F, Grenyer R, Rouget M et al (2007) Preserving the evolutionary potential of floras in biodiversity hotspots. Nature 445:757–760 9. Hardy OJ, Jost L (2008) Interpreting and estimating measures of community phylogenetic structuring. J Ecol 96:849–852. doi:10.1111/j.1365-2745.2008.01423.x 10. Smith S, Beaulieu JM, Donoghue MJ (2009) Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches. BMC Evol Biol 9:37. doi:10.1186/1471-2148-9-37 11. Bininda-Emonds ORP (2005) transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 6:156. doi:10.1186/1471-2105-6-156 12. Larkin MA, Blackshields G, Brown NP et al (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948. doi:10.1093/bioinformatics/btm404 13. Katoh K, Misawa K, Kuma K-I, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066. doi:10.1093/nar/gkf436 14. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797. doi:10.1093/nar/gkh340
15. Meier R, Shiyang K, Vaidya G, Ng PKL (2006) DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Syst Biol 55:715–728. doi:10.1080/10635150600969864 16. Maddison DR, Maddison WP (2000) MacClade 4: analysis of phylogeny and character evolution, version 4.0. Sinauer Associates, Sunderland, MA 17. Swofford DL (2002) PAUP*. Phylogenetic analysis using parsimony (* and other methods) version 4. Sinauer Associates, Sunderland, MA 18. Goloboff PA, Farris JS, Nixon KC (2008) TNT, a free program for phylogenetic analysis. Cladistics 24:774–786 19. Zwickl DJ (2006) Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. Dissertation, The University of Texas at Austin 20. Stamatakis A, Ott M, Ludwig T (2005) RAxML-OMP: an efficient program for phylogenetic inference on SMPs. In: Proceedings of 8th international conference on parallel computing technologies (PaCT2005). Lect Notes Comput Sci 3506:288–302. Springer Verlag, Berlin 21. Evans J, Sheneman L, Foster JA (2006) Relaxed neighbor-joining: a fast distance-based phylogenetic tree construction method. J Mol Evol 62:785–792 22. Nixon KC (1999) The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15:407–414 23. Driskell AC, Ané C, Burleigh JG et al (2004) Prospects for building the tree of life from large sequence databases. Science 306:1172–1174
Chapter 20
Phylogenetic Analyses of Ecological Communities Using DNA Barcode Data
Nathan G. Swenson
Abstract
Ecologists and conservation biologists are increasingly focusing on quantifying the phylogenetic component of biodiversity in order to inform basic and applied research. A major obstacle to this approach in tropical ecosystems has been the difficulty of generating high-quality phylogenetic trees for the vast numbers of species in these systems. Phylogenetic trees inferred from DNA barcodes hold the potential to overcome this obstacle. Here, I present a methodological framework for analyzing the phylogenetic alpha and beta diversity of ecological communities using a phylogenetic tree. The analytical approach is presented using the freely available and widely used software platform “R”.
Key words: Biodiversity, Community ecology, Community phylogenetics, Phylogenetic beta diversity, Phylogenetic diversity, DNA barcode
1. Introduction
Ecologists interested in studying and conserving biodiversity are tasked with quantifying that diversity through space and time. Typically, this has been done using a measure of species diversity. Other dimensions of biodiversity, such as phylogenetic and functional diversity, are less often quantified, but these forgotten dimensions may be equally or more important (1–3). Conservation biologists are interested in phylogenetic measures of biodiversity as a way to provide a more robust estimate of the overall evolutionary history currently being preserved in protected lands or the potential amount of biodiversity that could be lost in threatened regions (1, 3–5). Basic community ecology research, on the other hand, has focused on the phylogenetic structure of communities in order to gain insights into their historical assembly (6–8). Despite their differing aims, both research programs generate estimates of the phylogenetic diversity within and between species assemblages across scales.
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_20, © Springer Science+Business Media, LLC 2012
The phylogenetic diversity of communities has been of interest in community ecology for almost 100 years. Early studies analyzed the ratio of species to genera in communities as a way to understand whether biotic or abiotic interactions are important in community assembly (9, 10). Specifically, a low species to genus ratio indicates the coexistence of distantly related species, which is termed today phylogenetic overdispersion (6). A high species to genus ratio indicates coexistence of closely related species, which is termed today phylogenetic underdispersion or clustering (6). This species to genus ratio approach continued for decades, culminating with the famous community assembly rules and null model debates of the 1960s and 1970s (10). The foundation of the species to genus ratio approach is the assumption that closely related species are more likely to have similar niches, often termed phylogenetic niche conservatism. If closely related species tend to share similar niches, then community assembly via abiotic filtering should result in phylogenetic clustering, whereas community assembly mediated by biotic interactions should result in phylogenetic overdispersion (6). Charles Darwin originally alluded to niche conservatism when he considered the implications of common descent. Specifically, species that share a recent common ancestor should, on average, tend to be more similar to one another than they are to more distant relatives. If this assumption is supported, then phylogenetic diversity will adequately estimate the functional diversity of an assemblage. A problem with the species to genus ratio approach, beyond the assumption of niche conservatism, is that taxonomic ranks do not convey detailed information regarding the time since two species diverged. A solution to this problem is to use phylogenetic trees with branch lengths. The branch lengths can be used to provide a more refined measure of relatedness between taxa. 
Until the early 2000s, however, generating phylogenetic trees representing communities (i.e., community phylogenies) was considered infeasible. Pioneering work by Cam Webb and colleagues (6, 11), who developed software tools such as Phylomatic (11) for estimating phylogenetic trees for plant communities, largely removed this obstacle. This innovation sparked a large number of investigations into the phylogenetic relatedness of coexisting plant species, primarily in the tropics, where measurements of species function or niches in tens to hundreds of locally coexisting species are difficult to achieve at best (6–8). This work has primarily sought to quantify the phylogenetic diversity in a community and to ask whether that phylogenetic diversity is higher or lower than that randomly expected given the species diversity of the community. These results are then often used to determine to what degree abiotic or biotic interactions govern community assembly (6–8). As noted above, the development of Phylomatic by Webb and Donoghue (11), which made the estimation of community
phylogenies for diverse botanical communities feasible, enabled a substantial and continually growing community phylogenetics research program. This provides evidence of how quickly the development of a new tool can spawn an entirely new literature within a matter of years. The generation of community phylogenies from DNA barcodes presents what may be considered the next substantial development in community phylogenetics (12, 13). There are two primary reasons for this prediction. First, the ability of Phylomatic to generate phylogenies for diverse communities was and is a powerful tool for ecologists, but the phylogenies generally lack much resolution within families and certainly within genera. Thus, fine-scale phylogenetic structuring in communities, particularly communities with many congeners, cannot be detected (12, 14). Second, the generation of a three-locus barcode and a barcode community phylogeny for the species in hyper-diverse communities is entirely feasible (12, 13). Thus, the stage is set for DNA barcode community phylogenies to provide the next revolutionary tool in the emerging field of community phylogenetics. In the next two sections, the commonly used metrics of phylogenetic alpha and beta diversity are presented.
1.1. Phylogenetic Alpha Diversity and Dispersion
Central to the community phylogenetics research program is the quantification of the phylogenetic diversity within a community, termed phylogenetic alpha diversity. One of the first phylogenetic alpha diversity metrics ever generated was “Faith’s Index” (1). Faith’s Index calculates the total length of the branches connecting the taxa in a community. A second commonly used metric is the mean pairwise phylogenetic distance (MPD) designed by Webb (6). This metric calculates the pairwise phylogenetic distance between all species in a community and then reports the mean value. This provides an overall picture of the “deep” or “basal” phylogenetic diversity in a community. It is calculated as follows:
$$\mathrm{MPD} = \sum_{i=1}^{S} f_i \, d_{ij},$$
where $S$ is the number of species in the community, $f_i$ is the relative abundance of species $i$, and $d_{ij}$ is the mean phylogenetic distance between species $i$ and all other species in the community. A third commonly used metric developed by Webb (6) is the mean nearest taxon distance (MNTD). This metric calculates the distance from each species in a community to its nearest phylogenetic neighbor and then reports the mean. This provides a “shallow” or “terminal” measure of phylogenetic diversity in a community. It is calculated as follows:
$$\mathrm{MNTD} = \sum_{i=1}^{S} f_i \, \min d_{ij},$$
where $S$ is the number of species in the community, $f_i$ is the relative abundance of species $i$, and $\min d_{ij}$ is the phylogenetic distance between species $i$ and its nearest relative among all other species in the community. The majority of methods used to quantify the phylogenetic alpha diversity in communities are highly dependent upon the species alpha diversity. That is, there is often a strong correlation between phylogenetic and species alpha diversity. Therefore, it is difficult to determine whether the observed level of phylogenetic diversity is different from what would be expected at random given the species diversity, and we cannot determine the significance of the results with respect to mechanisms of community assembly. In order to determine the significance of the results, the researcher needs to utilize a null model. The concept of a null model is to hold constant all of the observed patterns except the one pattern in which you are interested. In this scenario, we are interested in the phylogenetic diversity of the community, so we need to construct a null model that keeps the species diversity, species relative abundances, and species occupancy rates across communities constant. Not keeping these constant may result in inflated type I or type II statistical error rates. A preferred null modeling method for determining if the observed phylogenetic diversity is higher or lower than expected is to randomize the names of the taxa across the tips of the phylogeny X times and calculate the phylogenetic alpha diversity with each random dataset. This provides a null distribution to which one can compare the observed phylogenetic alpha diversity. This null model does not randomize the community data. Therefore, all spatial patterns, abundance distributions, and species richness values are held constant. If the observed phylogenetic alpha diversity is higher than expected, then the community is considered phylogenetically overdispersed. 
If it is lower than expected, it is considered phylogenetically underdispersed or clustered.
1.2. Phylogenetic Beta Diversity and Dispersion
Plant community ecologists have long been interested in the compositional dissimilarity of communities—termed beta diversity. Generally, compositional dissimilarity analyses have been conducted using lists of species in the communities. While this method has produced many important results and studies, ideally we would also like to know how evolutionarily dissimilar are the species in the communities being compared. Knowing such information would allow for stronger inferences regarding the ecological and evolutionary mechanisms that promote the observed distributions of plant species. Comparing the phylogenetic dissimilarity between two communities, or the phylogenetic beta diversity, is one rather new way of enhancing traditional species list based analyses of compositional
dissimilarity (15). Here, I present two phylogenetic beta diversity metrics that are increasingly implemented in community ecology. The first metric is the pairwise phylogenetic dissimilarity ($D_{pw}$) between two communities, which is based on the pairwise phylogenetic distances between all species in one community and all species in another community:
$$D_{pw} = \frac{\sum_{i=1}^{n_{k_1}} f_i \, d_{ik_2} + \sum_{j=1}^{n_{k_2}} f_j \, d_{jk_1}}{2},$$
where $d_{ik_2}$ is the MPD between species $i$ in community $k_1$ and all species in community $k_2$, $d_{jk_1}$ is the MPD between species $j$ in community $k_2$ and all species in community $k_1$, and $f_i$ and $f_j$ are the relative abundances of species $i$ and species $j$. The second metric is a nearest phylogenetic neighbor dissimilarity ($D_{nn}$) between two communities:
$$D_{nn} = \frac{\sum_{i=1}^{n_{k_1}} f_i \, \min d_{ik_2} + \sum_{j=1}^{n_{k_2}} f_j \, \min d_{jk_1}}{2},$$
where $\min d_{ik_2}$ is the distance from species $i$ in community $k_1$ to its nearest phylogenetic neighbor in community $k_2$, $\min d_{jk_1}$ is the distance from species $j$ in community $k_2$ to its nearest phylogenetic neighbor in community $k_1$, and $f_i$ and $f_j$ are the relative abundances of species $i$ and species $j$. Similar to phylogenetic alpha diversity metrics, phylogenetic beta diversity metrics may be correlated with the underlying species richness and species beta diversity. Thus, null model analyses can also be implemented in order to determine if the observed phylogenetic beta diversity is higher or lower than that expected given the observed species beta diversity. The null model used for this approach is identical to the one used for phylogenetic alpha diversity. In the following sections, the data required to quantify the phylogenetic alpha and beta diversity of communities are described. Then, how to calculate these metrics using the statistical software “R” is demonstrated.
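To make the four definitions above concrete, the following is a minimal Python sketch that evaluates MPD, MNTD, Dpw, and Dnn directly from a table of pairwise phylogenetic distances. It is independent of the R workflow described below and is not the picante implementation. The data structures are assumptions made for illustration: a community is a dictionary mapping species name to abundance, distances are stored in a dictionary of dictionaries, and species shared between two communities contribute a distance of zero. The alpha metrics require at least two species per community.

```python
def make_dist(pairs):
    """Build a symmetric distance lookup from {(species_a, species_b): distance}."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {a: 0.0})[b] = d
        dist.setdefault(b, {b: 0.0})[a] = d
    return dist


def rel_abund(abund):
    """Convert raw abundances to relative abundances f_i that sum to one."""
    total = sum(abund.values())
    return {sp: n / total for sp, n in abund.items()}


def mpd(abund, dist):
    """MPD: sum over species of f_i times the mean distance to all other species."""
    f = rel_abund(abund)
    return sum(
        f[i] * sum(dist[i][j] for j in f if j != i) / (len(f) - 1) for i in f
    )


def mntd(abund, dist):
    """MNTD: sum over species of f_i times the distance to the nearest other species."""
    f = rel_abund(abund)
    return sum(f[i] * min(dist[i][j] for j in f if j != i) for i in f)


def d_pw(abund1, abund2, dist):
    """Dpw: average of the two abundance-weighted mean cross-community distances."""
    f1, f2 = rel_abund(abund1), rel_abund(abund2)
    part1 = sum(f1[i] * sum(dist[i][j] for j in f2) / len(f2) for i in f1)
    part2 = sum(f2[j] * sum(dist[j][i] for i in f1) / len(f1) for j in f2)
    return (part1 + part2) / 2


def d_nn(abund1, abund2, dist):
    """Dnn: average of the two abundance-weighted nearest-neighbor distances."""
    f1, f2 = rel_abund(abund1), rel_abund(abund2)
    part1 = sum(f1[i] * min(dist[i][j] for j in f2) for i in f1)
    part2 = sum(f2[j] * min(dist[j][i] for i in f1) for j in f2)
    return (part1 + part2) / 2
```

Setting every abundance to one gives the presence/absence versions of the metrics, because each $f_i$ then equals one divided by the number of species in the community.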
2. Materials
2.1. DNA Barcode Community Phylogeny
1. Generate a DNA barcode community phylogeny following the methods described in Chapter 19.
2. Save the DNA barcode community phylogeny as a newick file. An example newick file lacking branch lengths is shown here:
((speciesA,speciesB),((speciesC,speciesD),speciesE));
2.2. Community Data
Organize community data into a three column tab delimited text file (.txt) where the first column is the name of the community,
the second column is the abundance of a species in that community if it is present (i.e., no absences are represented), and the third column is the name of the taxon. If there are no abundance data, presence can be represented as a one in the abundance column (see Note 1). An example community data file for three communities is shown here:
CommunityA	12	speciesA
CommunityA	4	speciesC
CommunityA	1	speciesE
CommunityB	8	speciesA
CommunityB	9	speciesB
CommunityC	5	speciesB
CommunityC	2	speciesC
CommunityC	19	speciesD
CommunityC	14	speciesE
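Because most errors arise from file formatting (see Note 1), it can help to validate the two input files before loading them into R. The sketch below, in Python and independent of picante, reads the three-column community file and confirms that every taxon also appears as a tip in the newick file. The function names are hypothetical, and the tip extraction assumes simple unquoted labels.

```python
import csv
import re


def read_sample(handle):
    """Read community, abundance, taxon rows into {community: {taxon: abundance}}."""
    communities = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 tab-delimited columns, got {row!r}")
        community, abundance, taxon = (cell.strip() for cell in row)
        taxa = communities.setdefault(community, {})
        if taxon in taxa:
            raise ValueError(f"{taxon} is listed twice in {community}")
        # float() fails on a header row, which the format does not allow.
        taxa[taxon] = float(abundance)
    return communities


def newick_tips(newick):
    """Return the set of tip labels in a Newick string with unquoted labels."""
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def unmatched_taxa(communities, tips):
    """Return taxa present in the community data but absent from the phylogeny."""
    listed = {taxon for taxa in communities.values() for taxon in taxa}
    return listed - tips
```

A typical check would open the community file, pass the handle to `read_sample`, and confirm that `unmatched_taxa` returns an empty set before moving on to the analyses in R.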
3. Methods
3.1. Reading Phylogenetic and Community Data into R Software
1. Open R.
2. Set the working directory to the folder containing your phylogeny and community data files (see Note 2):
setwd("your.working.directory.path")
3. Load the R package Picante (16), which will be used for the community phylogenetics calculations (see Note 3):
library(picante)
4. Read in your newick phylogeny file:
your.phylo.file = read.tree("your.newick.tree.txt")
5. Read in your community data file:
your.community.file = readsample("your.community.data.txt")
3.2. Phylogenetic Alpha Diversity of Communities (see Notes 4 and 5)
1. Calculate Faith’s Index:
faiths.index.output = pd(your.community.file, your.phylo.file)
2. Calculate the mean pairwise phylogenetic distance (MPD) for each community, not weighting the result for abundance:
mpd.output = mpd(your.community.file, cophenetic(your.phylo.file), abundance.weighted = FALSE)
3. Calculate the mean nearest taxon distance (MNTD) for each community, not weighting the result for abundance:
mntd.output = mntd(your.community.file, cophenetic(your.phylo.file), abundance.weighted = FALSE)
3.3. Phylogenetic Alpha Dispersion of Communities
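1. Quantify whether the observed Faith’s Index value is higher or lower than expected by generating a null model. The null model randomizes the names of taxa along the tips of the phylogeny and recalculates the Faith’s Index. The null distribution is then used to calculate a standardized effect size (SES) and a P value (see Note 6):
ses.pd.output = ses.pd(your.community.file, your.phylo.file, null.model = "taxa.labels", runs = 999)
2. Quantify whether the observed MPD value is higher or lower than expected by generating a null model (see Notes 7 and 8):
ses.mpd.output = ses.mpd(your.community.file, cophenetic(your.phylo.file), null.model = "taxa.labels", abundance.weighted = FALSE, runs = 999)
3. Quantify whether the observed MNTD value is higher or lower than expected by generating a null model:
ses.mntd.output = ses.mntd(your.community.file, cophenetic(your.phylo.file), null.model = "taxa.labels", abundance.weighted = FALSE, runs = 999)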
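The logic of the tip-shuffling null model in the steps above can be sketched outside R as follows. This Python example is only an illustration under stated assumptions and is not the picante code: the community is a list of species names, distances are held in a dictionary of dictionaries covering every tip in the phylogeny, MPD is unweighted, and the function names are hypothetical. The P value is the rank of the observed value among the observed plus null values, divided by the number of runs plus one, with ties given their average rank.

```python
import random
from statistics import mean, stdev


def mpd_unweighted(species, dist):
    """Mean of the pairwise distances among the listed species (at least two)."""
    pairs = [(a, b) for k, a in enumerate(species) for b in species[k + 1:]]
    return mean(dist[a][b] for a, b in pairs)


def tip_shuffle_null(species, dist, runs=999, seed=None):
    """Compare the observed MPD with MPD values obtained after shuffling tip names."""
    rng = random.Random(seed)
    tips = sorted(dist)
    observed = mpd_unweighted(species, dist)
    nulls = []
    for _ in range(runs):
        shuffled = tips[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(tips, shuffled))  # each name moves to a random tip
        nulls.append(mpd_unweighted([relabel[s] for s in species], dist))
    # Standardized effect size; undefined if every null value is identical.
    ses = (observed - mean(nulls)) / stdev(nulls)
    lower = sum(n < observed for n in nulls)
    ties = sum(n == observed for n in nulls)
    rank = lower + (ties + 2) / 2  # average rank of the observed value
    return {"observed": observed, "nulls": nulls, "ses": ses,
            "p": rank / (runs + 1)}
```

A low rank-based P value together with a negative SES points toward phylogenetic clustering, and a high P value with a positive SES points toward overdispersion.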
3.4. Phylogenetic Beta Diversity of Communities
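1. Quantify the MPD (Dpw) between communities:
Dpw.output = comdist(your.community.file, cophenetic(your.phylo.file), abundance.weighted = FALSE)
2. Quantify the mean nearest phylogenetic neighbor distance (Dnn) between communities:
Dnn.output = comdistnt(your.community.file, cophenetic(your.phylo.file), abundance.weighted = FALSE)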
3.5. Phylogenetic Beta Dispersion of Communities (see Notes 9 and 10)
1. Generate an empty three-dimensional array with the x and y dimensions equaling the number of communities and the z dimension equaling 999 randomizations plus one for the observed Dpw values:
Dpw.nulls = array(NA, c(dim(as.matrix(Dpw.output)), 1000))
2. Assign the observed Dpw values to the first layer of the array:
Dpw.nulls[,,1] = as.matrix(Dpw.output)
3. Run a null model that randomizes the names of the species on the phylogeny 999 times. During each iteration, place the random Dpw values for your communities into an empty layer of the array:
for(i in 2:1000){ random.phylo = tipShuffle(your.phylo.file); Dpw.nulls[,,i] = as.matrix(comdist(your.community.file, cophenetic(random.phylo), abundance.weighted = FALSE)) }
4. Generate an empty matrix that will be populated with P values indicating whether your observed Dpw value is higher or lower than that expected given the null distribution generated in step 3:
Dpw.pvalue = matrix(NA, nrow = dim(Dpw.nulls)[1], ncol = dim(Dpw.nulls)[2])
5. Calculate P values for the Dpw metric:
for(i in 1:dim(Dpw.pvalue)[1]){ for(j in 1:dim(Dpw.pvalue)[2]){ Dpw.pvalue[i,j] = (rank(Dpw.nulls[i,j,])[1])/1000 } }
6. Generate an empty three-dimensional array with the x and y dimensions equaling the number of communities and the z dimension equaling 999 randomizations plus one for the observed Dnn values:
Dnn.nulls = array(NA, c(dim(as.matrix(Dnn.output)), 1000))
7. Assign the observed Dnn values to the first layer of the array:
Dnn.nulls[,,1] = as.matrix(Dnn.output)
8. Run a null model that randomizes the names of the species on the phylogeny 999 times. During each iteration, place the random Dnn values for your communities into an empty layer of the array:
for(i in 2:1000){ random.phylo = tipShuffle(your.phylo.file); Dnn.nulls[,,i] = as.matrix(comdistnt(your.community.file, cophenetic(random.phylo), abundance.weighted = FALSE)) }
9. Generate an empty matrix that will be populated with P values indicating whether your observed Dnn value is higher or lower than that expected given the null distribution generated in step 8:
Dnn.pvalue = matrix(NA, nrow = dim(Dnn.nulls)[1], ncol = dim(Dnn.nulls)[2])
10. Calculate P values for the Dnn metric and store them in the Dnn.pvalue matrix:
for(i in 1:dim(Dnn.pvalue)[1]){ for(j in 1:dim(Dnn.pvalue)[2]){ Dnn.pvalue[i,j] = (rank(Dnn.nulls[i,j,])[1])/1000 } }
4. Notes
1. Errors often occur due to file formatting issues. A common problem encountered is the delimitation of the community data file. Make sure that it is tab delimited, that it has hard returns at the end of each line, and that there are no column names in the file. Also make sure that species names match those in the phylogeny and that no two communities or species have the same name.
2. An alternative easy way to set the working directory in R is to use the “MISC” drop-down menu when using R on Macintosh operating systems and the “FILE” drop-down menu when using Windows.
3. Help interpreting or implementing any R command can be accessed by typing a question mark followed by the command name.
4. The phylogenetic alpha diversity algorithms can take a fair amount of time to run if the null model is implemented. The phylogenetic beta diversity algorithms, on the other hand, are memory intensive due to the pairwise community distance matrices being stored and analyzed. The calculation of the observed results may take seconds to tens of minutes depending on the size of the phylogeny and the number of communities analyzed. The null model will take 999 times longer.
5. The species richness of the communities is reported in the phylogenetic alpha diversity outputs under the “ntaxa” column.
6. The SESs in the phylogenetic alpha diversity output are in the columns with headers ending with “.z”, and the P values are in the columns with headers ending with “.p”. Remember that the P value being reported represents a rank. Thus, when conducting a two-tailed test at the 0.05 level, a P value of 0.025 or lower, or of 0.975 or higher, is significant.
7. P values may be preferred over SES results. An SES of greater than 1.96 or lower than −1.96 may be considered significant only if the null distribution is normal. This is often not the case with community phylogenetics null model distributions. The P value provides the rank of the observed value in the null distribution and is therefore more directly interpretable.
8. 
The above code can be modified in several instances by changing the abundance.weighted = FALSE to abundance. weighted = TRUE, but this must be done for both the observed and null calculations. Abundance weighting can provide substantially different results depending on the evenness of the communities and it can often be a useful additional piece of information.
N.G. Swenson
9. Inferring the mechanisms governing community assembly from the above tests alone can be problematic for two reasons. First, species niches may or may not be phylogenetically conserved. Second, if some species traits are overdispersed in a community while others are underdispersed, this may give a random phylogenetic signal (17). Thus, analyses that marry the above phylogenetic analyses with analyses of phylogenetic signal in trait data and trait dispersion are particularly powerful and more likely to provide robust inferences.
10. Functional trait dendrograms that are often used in functional ecology take the same data structure as a phylogeny. They can therefore be used in the above code to provide measures of functional alpha and beta diversity. Thus, one can compare phylogenetic and functional alpha and beta diversities using exactly the same metrics and statistical tools.
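The relationship between the SES and the rank-based P value described in Notes 6 and 7 can be illustrated outside R. The Python sketch below is only an illustration under stated assumptions and is not the code of any R package: the function names are hypothetical, the SES is taken as the observed value minus the null mean, divided by the null standard deviation, and the P value is assumed to be the rank of the observed value among the observed and null values, divided by the number of randomizations plus one. Ties are ignored for simplicity.

```python
from statistics import mean, stdev


def ses_and_rank_p(observed, null_values):
    """Return (SES, rank-based P) for one observed metric and its null values."""
    ses = (observed - mean(null_values)) / stdev(null_values)
    # Rank of the observed value among observed + null values (1 = smallest).
    rank = 1 + sum(1 for v in null_values if v < observed)
    return ses, rank / (len(null_values) + 1)


def two_tailed_significant(p_rank, alpha=0.05):
    """A rank-based P is significant if it falls in either tail."""
    return p_rank <= alpha / 2 or p_rank >= 1 - alpha / 2


# Made-up example: 999 null values and an observed value above all of them.
null = [float(v) for v in range(1, 1000)]
ses, p = ses_and_rank_p(1000.0, null)
print(round(ses, 2), p, two_tailed_significant(p))
```

Because the rank does not depend on the shape of the null distribution, the second function applies whether or not that distribution is normal, which is the reason Note 7 gives for preferring P values.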
Acknowledgments
I would like to thank John Kress and Dave Erickson for their collaboration and invitation to contribute to this volume. N.G.S. is supported by Michigan State University.
References
1. Faith DP (1992) Conservation evaluation and phylogenetic diversity. Biol Conserv 61:1–10
2. Webb CO, Ackerly DD, McPeek MA, Donoghue MJ (2002) Phylogenies and community ecology. Annu Rev Ecol Syst 33:475–505
3. McGill BJ, Enquist BJ, Weiher E, Westoby M (2006) Rebuilding community ecology from functional traits. Trends Ecol Evol 21:178–185
4. Faith DP (1994) Genetic diversity and taxonomic priorities for conservation. Biol Conserv 68:69–74
5. Faith DP (2002) Quantifying biodiversity: a phylogenetic perspective. Conserv Biol 16:248–252
6. Webb CO (2000) Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am Nat 156:145–155
7. Swenson NG, Enquist BJ, Pither J, Thompson J, Zimmerman JK (2006) The problem and promise of scale dependency in community phylogenetics. Ecology 87:2418–2424
8. Swenson NG, Enquist BJ, Thompson J, Zimmerman JK (2007) The influence of spatial and size scales on phylogenetic relatedness in tropical forest communities. Ecology 88:1770–1780
9. Elton C (1946) Competition and the structure of ecological communities. J Anim Ecol 15:54–68
10. Jarvinen O (1982) Species-to-genus ratios in biogeography: a historical note. J Biogeogr 9:363–370
11. Webb CO, Donoghue MJ (2005) Phylomatic: tree assembly for applied phylogenetics. Mol Ecol Notes 5:181–183
12. Kress WJ, Erickson DL, Jones FA, Swenson NG, Perez R, Sanjur O, Bermingham E (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci USA 106:18621–18626
13. Kress WJ, Erickson DL, Swenson NG, Thompson J, Uriarte M, Zimmerman JK (2010) Improvements in the application of DNA barcodes in building a community phylogeny for tropical trees in a Puerto Rican forest dynamics plot. PLoS One 5:e15409. doi:10.1371/journal.pone.0015409
14. Swenson NG (2009) Phylogenetic resolution and quantifying the phylogenetic diversity and dispersion of communities. PLoS One 4:e4390
15. Graham CH, Fine PVA (2008) Phylogenetic beta diversity: linking ecological and evolutionary processes across space in time. Ecol Lett 11:1265–1277
16. Kembel SW, Ackerly DD, Blomberg SP et al (2010) Picante: R tools for integrating phylogenies and ecology. Bioinformatics 26:1463–1464
17. Swenson NG, Enquist BJ (2009) Opposing assembly mechanisms in a Neotropical dry forest: implications for phylogenetic and functional community ecology. Ecology 90:2161–2170
Part V Case Studies Using DNA Barcodes
Chapter 21
FISH-BOL, A Case Study for DNA Barcodes
Robert D. Ward
Abstract
The FISH-BOL campaign was initiated in 2005 and has so far barcoded about 8,000 of the 31,000 currently recognised fish species for the cytochrome c oxidase subunit I (COI) gene. This includes the great majority of the world’s most important commercial species. Results thus far show that about 98% and 93% of marine and freshwater species, respectively, are barcode distinguishable. One important issue that needs to be more fully addressed in FISH-BOL concerns the initial misidentification of a small number of barcode reference specimens. This is unsurprising considering the large number of fish species, some of which are morphologically very similar and others as yet unrecognised, but constant vigilance and ongoing attention by the FISH-BOL community are required to eliminate such errors. Once the reference library has been established, barcoding enables the identification of unknown fishes at any life history stage or from their fragmentary remains. The many uses of the FISH-BOL barcode library include detecting consumer fraud, aiding fisheries management, improving ecological analyses including food web syntheses, and assisting with taxonomic revisions.
Key words: COI, Cytochrome oxidase, Species identification, Fish, Elasmobranchii, Actinopterygii, DNA barcode
1. Introduction
In 2005, a campaign, FISH-BOL, was launched to DNA barcode all the fish species of the planet (1). These currently number about 31,000. Approximately half are marine species, with an estimated 5,000 further marine species awaiting description (2). In total, there are likely to be around 40,000 extant fish species. Some 8,000 (August 2010) of the currently recognised fish species have been barcoded as part of this campaign, with an average of about seven specimens per species. Since the inception of FISH-BOL, progress has been steady (Fig. 1). About 7,500 of the 30,000 known actinopterygians have been barcoded, and about 500 of
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_21, © Springer Science+Business Media, LLC 2012
R.D. Ward
Fig. 1. Progress of FISH-BOL, showing numbers of species barcoded by date.
the 1,000 or so elasmobranchs. Numbers of fish species barcoded according to the taxonomy browser of the Barcode of Life Database (BOLD, www.barcodinglife.org) are appreciably higher, at over 10,000 actinopterygians and 1,000 elasmobranchs. The difference between the FISH-BOL and BOLD tallies comes from two sources: (1) BOLD captures barcodes that are lodged in GenBank but not in BOLD itself, and (2) BOLD tallies include as discrete species those specimens not yet scientifically named, such as Cynoglossus cf. arel, Cynoglossus sp. E or Cynoglossus sp. Individual barcoding studies of 50 or more species include marine Australian fishes (3), Australian sharks and rays (4), Canadian freshwater fishes (5), North American marine fishes (6), coral reef fish (7), Central American freshwater fishes (8), Indian marine fishes (9) and Antarctic fishes (10).
Most important commercial species have now been barcoded. For example, of the 60 principal fish species that constitute the bulk of capture production (FAO Capture Production 2008, see ftp.fao.org/fi/stat/summary/default.htm#capture), 56 have been barcoded (August 2010, mean sample size per species = 28.1) and the 57th is sampled and awaiting barcoding. However, the intent of FISH-BOL is to barcode at least five specimens of every fish species on the planet, and achieving that goal will clearly be difficult.
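The coverage figures quoted in this Introduction can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the function name is hypothetical, and the counts are the approximate values given in the text (August 2010), not exact database tallies.

```python
def coverage_percent(barcoded, total):
    """Percentage of species barcoded, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * barcoded / total, 1)


# Approximate (barcoded, total) counts quoted in the Introduction.
figures = {
    "all recognised fish species": (8000, 31000),
    "actinopterygians": (7500, 30000),
    "elasmobranchs": (500, 1000),
    "principal commercial species": (56, 60),
}
for group, (barcoded, total) in figures.items():
    print(f"{group}: {coverage_percent(barcoded, total)}%")
```

On these rounded inputs, about a quarter of all recognised species and of actinopterygians, half of elasmobranchs, and over nine-tenths of the principal commercial species have been barcoded.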
2. Materials
1. Any part of the fish may be used for extracting DNA. White muscle is probably used most frequently, but liver or fin tissues are also commonly used. If larval fish are to be retained and vouchered, DNA may be extracted from a single eyeball. Single fish eggs also yield suitable DNA.
2. Tissue storage in 95% alcohol is recommended. If this is not practical, for example during fieldwork and/or transport, DMSO may be used (see Note 1). Where possible, we also try to retain a tissue portion frozen at −80°C.
3. During fieldwork, and in laboratory processing and sample cataloguing, we try to maintain the following sequence: (a) collection, (b) preliminary identification and labelling, (c) tissue extraction for DNA barcoding (usually white muscle taken from under a scalpel-cut skin flap on the right side of the fish), (d) photography of the left side of the fish (see Note 2), and (e) storage of the whole specimen for later museum vouchering and/or identification verification.
3. Methods
3.1. Fish DNA Barcoding Methodology
1. Fish barcoding is based on sequencing the standard 655 bp fragment of cytochrome c oxidase I.
2. A range of fish primers is provided in Table 1 (also see Notes 3 and 4). The Biodiversity Institute of Ontario (BIO) has standardised protocols for DNA extraction, PCR and sequencing (see Note 4).
3. It is recommended that for reference barcodes, both forward and reverse sequences are read and that the consensus sequence be posted on BOLD. For matching of unknown specimens against the reference library, sequencing of the unknown in a single direction is likely to be sufficient.
Table 1 PCR primers for fish DNA barcoding

Name        5′-3′ Sequence                                 Reference
Primers without M13 tails
FishF1      TCAACCAACCACAAAGACATTGGCAC                     (3)
FishF2      TCGACTAATCATAAAGATATCGGCAC                     (3)
FishR1      TAGACTTCTGGGTGGCCAAAGAATCA                     (3)
FishR2      ACTTCAGGGTGACCGAAGAATCAGAA                     (3)
Fish-BCH    ACTTCYGGGTGRCCRAARAATCA                        (68)
Fish-BCL    TCAACYAATCAYAAAGATATYGGCAC                     (68)
TelF1       TCGACTAATCAYAAAGAYATYGGCAC                     (10)
TelR1       ACTTCTGGGTGNCCAAARAATCARAA                     (10)
M13-tailed primers
Fish cocktail C_FishF1t1-C_FishR1t1 (ratio 1:1:1:1)        (69)
VF2_t1      TGTAAAACGACGGCCAGTCAACCAACCACAAAGACATTGGCAC
FishF2_t1   TGTAAAACGACGGCCAGTCGACTAATCATAAAGATATCGGCAC
FishR2_t1   CAGGAAACAGCTATGACACTTCAGGGTGACCGAAGAATCAGAA
FR1d_t1     CAGGAAACAGCTATGACACCTCAGGGTGTCCGAARAAYCARAA
Mammal cocktail C_VF1LFt1-C_VR1LRt1 (ratio 1:1:1:3:1:1:1:3) (69)
LepF1_t1    TGTAAAACGACGGCCAGTATTCAACCAATCATAAAGATATTGG
VF1_t1      TGTAAAACGACGGCCAGTTCTCAACCAACCACAAAGACATTGG
VF1d_t1     TGTAAAACGACGGCCAGTTCTCAACCAACCACAARGAYATYGG
VF1i_t1     TGTAAAACGACGGCCAGTTCTCAACCAACCAIAAIGAIATIGG
LepR1_t1    CAGGAAACAGCTATGACTAAACTTCTGGATGTCCAAAAAATCA
VR1d_t1     CAGGAAACAGCTATGACTAGACTTCTGGGTGGCCRAARAAYCA
VR1_t1      CAGGAAACAGCTATGACTAGACTTCTGGGTGGCCAAAGAATCA
VR1i_t1     CAGGAAACAGCTATGACTAGACTTCTGGGTGICCIAAIAAICA
Sequencing primers for M13-tailed PCR products
M13F        TGTAAAACGACGGCCAGT                             (70)
M13R        CAGGAAACAGCTATGAC                              (70)
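Several primers in Table 1 are degenerate (they contain IUPAC codes such as R, Y and N), and the cocktail primers carry an M13 tail ahead of the gene-specific sequence. The Python sketch below, offered only as an illustration, expands a degenerate primer into the concrete sequences it encodes and separates an M13 tail from the rest of a primer. The function names are hypothetical, and inosine (written I in some cocktail primers) is deliberately not handled.

```python
from itertools import product

# IUPAC codes that occur in the non-inosine primers of Table 1.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}

# M13 sequencing primers as listed in Table 1.
M13_TAILS = {"M13F": "TGTAAAACGACGGCCAGT", "M13R": "CAGGAAACAGCTATGAC"}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def split_m13_tail(primer):
    """Return (tail_name, gene_specific_part); tail_name is None if untailed."""
    for name, tail in M13_TAILS.items():
        if primer.startswith(tail):
            return name, primer[len(tail):]
    return None, primer


variants = expand_degenerate("ACTTCYGGGTGRCCRAARAATCA")  # Fish-BCH
print(len(variants))
print(split_m13_tail("CAGGAAACAGCTATGACACTTCAGGGTGACCGAAGAATCAGAA"))  # FishR2_t1
```

Fish-BCH has four two-fold degenerate positions and therefore encodes 16 sequences. Removing the M13R tail from FishR2_t1 leaves the FishR2 sequence listed in the first part of the table.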
4. For PCR-recalcitrant samples, mini-barcodes of just 100–200 bases may suffice for identification purposes (11, 12).
5. Tissues from fish preserved in formalin have long been considered too refractory for sequencing (see Note 5).
6. All participants in fish barcoding are strongly urged to use the Barcode of Life Database (BOLD, www.barcodinglife.org; see ref. 13) as the repository for their data (see Note 6).
7. The process of developing a reference library of fish barcodes has highlighted a number of issues that have impeded the completion of this library, and these must be considered. They include: a failure to PCR amplify and sequence some specimens (see Note 7), the possibility that mitochondrial DNA inserts into nuclear DNA (NUMTs) are being sequenced rather
than the true mtDNA COI gene (see Note 8), and the inability of COI to distinguish some species (see Note 9).
8. More details on fish barcoding methodologies are given elsewhere in this volume.
3.2. Gaps in Species Coverage
1. Some 8,000 (FISH-BOL) to 10,000 (BOLD plus GenBank) fish species have been barcoded, leaving some 20,000 or more species awaiting examination. The barcoded species are not distributed randomly, either taxonomically (Table 2) or regionally (Table 3). Approximately one-half of all elasmobranchs have been barcoded, but only one-quarter of the much more numerous actinopterygians. A detailed analysis of species coverage by family, as of mid-2010, has been published (14). At least one species has been barcoded from about 90% of all families. Some quite large families are well represented, such as the shark family Carcharhinidae with 47 of 52 species barcoded (90.4%). However, only 381 of 2,770 species of the largest family, Cyprinidae, have been barcoded (13.8%). Most of the unrepresented families have few species, the largest being the rice fishes of the family Adrianichthyidae with 29 as yet unbarcoded species. Regional coverage is similarly varied, ranging from about 30% of all fish species in the Australian and North American regions to only 10% of North East Asian fishes.
2. The FISH-BOL goal of a reference barcode for every living fish species is also one of the goals of iBOL (www.iBOL.org), but will not be easily attained. A major impediment is the lack of sufficient dedicated funding for collecting trips and subsequent taxonomic identification and vouchering; the sequencing protocols themselves are relatively inexpensive to implement. One way to use limited funds in an efficient manner is to target a
Table 2 Breakdown of FISH-BOL progress by taxonomic class

Class                 Species number   Barcoded number   % Progress
Actinopterygii        29933            7266              24
Cephalaspidomorphi    42               24                57
Elasmobranchii        1114             549               49
Holocephali           46               27                59
Myxini                74               13                18
Sarcopterygii         11               2                 18
Table 3 Breakdown of FISH-BOL progress by regional Working Group (defined by FAO regions as given)

Working group        FAO regions              Species number   Barcoded number   % Progress
Africa               1, 34, 47, 51            8980             1247              14
Australia            6, 57, 71, 81            8623             2521              29
Europe               5, 27, 37                2028             396               20
India                4, 51, 57                11023            1997              18
Meso America         2, 31, 77                7677             1750              23
North America        2, 18, 21, 31, 67, 77    8112             2274              28
North East Asia      4, 5, 18, 61             10414            924               9
Oceania/Antarctica   8, 48, 58, 77, 81, 88    5702             1403              25
South America        3, 31, 41, 87            8981             1043              12
South East Asia      4, 57, 71                12140            2103              17
hyper-diverse region over a short period of time (a “barcoding blitz”; see Note 10).
3. Other cost-effective strategies for increasing coverage can also be devised. There are many museums around the world with barcode-friendly tissues (i.e., not formalin stored) from vouchered specimens that still await barcoding. Perhaps more effort needs to be made to attract specialists in particular taxa or geographic regions to the campaign. Greater use could be made of piggy-backing on existing research expeditions or on the catches of commercial and artisanal fishers. It is notable that the sampling effort provided by vessels and researchers engaged in the International Polar Year (2007–2009) resulted in the high coverage of 74% of fish species for the Arctic and 50% for the Antarctic (14), although fish diversity in these regions is limited compared with the tropics.
4. The majority (about 73%) of fish species barcoded thus far are marine (14), and in future much more attention needs to be paid to the large freshwater faunas of South America, Africa and Asia.
5. Finally, all researchers with collections of fish barcodes are urged to deposit these barcodes in BOLD as soon as possible.
3.3. Identification of Reference Specimens
1. Perhaps the most significant issue that has arisen in the development of the fish barcode library concerns mislabelling or misidentification of some reference specimens. The former can
arise from sample contamination or sample confusion, and most such cases are obvious after inspection of the sequence data and can be rectified. More important is the issue of specimen misidentification. There are many contributors to FISH-BOL and not all identifications are made by trained taxonomists. With more than 30,000 fish species, including many morphologically similar species complexes, such errors are not unexpected. This issue is compounded by the uncertain or incomplete taxonomies of many fish groups, and by a lack of knowledge of the true extent of the range of many species and likely degrees of endemism. Elimination of errors and reconciliation of Linnean names across reference specimens is imperative and needs further effort by the scientific community of FISH-BOL. This is a complex and demanding task, and one that is therefore expensive to implement fully, but it needs increased attention.
2. Putative barcode errors can now be flagged in BOLD, either by moving them to a problematic sample project or by flagging them individually while leaving them in the original project. Flagged records are removed from BOLD’s identification engine. When verified or corrected, they can be moved back into their original project or the flag can be removed.
3. Effort put into correct initial diagnosis of species saves confusion and time later. Where specimens can be reliably identified to a known species, that diagnosis should of course be made. Where there is uncertainty, this should be recognised. Specimens can be identified just to genus (e.g. Arius sp.), or use might be made of the prefix “cf” (e.g. Arius cf. venosus, meaning that the specimen appeared to be closest to Arius venosus but that the identification is provisional and it may in fact be another, perhaps unrecognised, species). Other similar notations can be used (see Note 11).
4. Retention of whole specimen vouchers, whenever possible, is highly desirable. 
Sometimes this might not be possible, for example where specimens are very large. In such situations, digital images should be retained, both of the whole fish and of any diagnostic characters. If the permanent vouchering of all barcoded specimens is not possible, it might sometimes be possible to retain temporary vouchers. These can be discarded when identifications and barcodes have been verified.
5. It is now recognised that the inclusion of a precision index to gauge levels of confidence in identifications is highly desirable. Since July 1993, specimens in the Australian National Fish Collection database at CSIRO have been identified to one of five levels of reliability according to the taxonomic expertise of the identifier (15). These were discussed at the inaugural FISH-BOL meeting in 2005 and published in the
Table 4 Reliability of identification: the system used by the CSIRO Australian National Fish Collection

Identification scale
Level 1: Highly reliable identification. Specimen identified by (a) an internationally recognised authority of the group, or (b) a specialist who is presently studying or has reviewed the group in the Australian region
Level 2: Identification made with a high degree of confidence at all levels. Specimen identified by a trained identifier who had prior knowledge of the group in the Australian region or used available literature to identify the specimen
Level 3: Identification made with high confidence to genus but less so to species. Specimen identified by (a) a trained identifier who was confident of its generic placement but did not substantiate its species identification using the literature, or (b) a trained identifier who used the literature but still could not make a positive identification to species, or (c) an untrained identifier who used most of the available literature to make the identification
Level 4: Identification made with limited confidence. Specimen identified by (a) a trained identifier who was confident of its family placement but unsure of generic or species identification (no literature used apart from illustrations), or (b) an untrained identifier who had/used limited literature to make the identification
Level 5: Identification superficial. Specimen identified by (a) a trained identifier who is uncertain of the family placement of the species (cataloguing identification only), (b) an untrained identifier using, at best, figures in a guide, or (c) an identifier whose status and expertise are unknown
workshop proceedings; they are summarised here in Table 4 (see Note 12).
6. Taxonomy is an ongoing endeavour (2). New fish species are continually described, and existing species may have their generic or species placements changed. Keeping track of these taxonomic revisions in databases such as BOLD is not a trivial matter and requires constant attention.
3.4. Uses of the Fish Reference DNA Barcode Library
1. Once a full barcode reference library is in place, identifying the great majority of unknown specimens (of any life history stage) or samples is straightforward. Most commercial fish species are now represented in the BOLD database and have distinguishable barcodes. Exactly how sequences from unknown specimens are best matched against reference sequences is still a matter of debate (see Note 13).
2. Barcoding can be applied to ensure food safety and to protect against consumer fraud (see Note 14).
3. Processed samples, including cooked, grilled and deep-fried fillets, can be successfully barcoded (16, 17). Mini-barcodes
have been proposed for species discrimination of canned products (18), where the combination of high temperatures and pressure degrades DNA.
4. The reference library can be used to check or provide identifications for fisheries management purposes. Finned, headed or gutted specimens can be identified to ensure that quota regulations are not being breached (see Note 15).
5. Biological sciences will benefit from having accurate identifications of all fish specimens, from eggs to adults (see Note 16).
6. Prey items of sharks have been barcoded and identified (19, 20).
7. Environmental barcoding offers the hope of identifying species in bulk samples taken from, for example, ichthyoplankton tows. These may be analysed using massively parallel sequencing platforms.
8. Barcoding can also be used to verify the identity of cell lines from fish species (21, 22).
9. Finally, barcoding is already making important contributions to the science of fish taxonomy (see Note 17). Whenever possible, it would be useful if describers of new species included a DNA barcode as part of that description (23, 24).
3.5. Structure of FISH-BOL
1. The fish barcode of life campaign (FISH-BOL) was initiated at a meeting at the University of Guelph in June 2005. This was organised shortly after initial results showed that species of fish could indeed be reliably discriminated by DNA barcoding using COI (3) and was attended by about 50 fisheries geneticists and taxonomists, together with sponsors and supporters of the DNA barcoding approach to specimen identification. The ultimate goal of FISH-BOL is to barcode all the fish species of the planet.
2. The meeting agreed that global coverage would be best facilitated by establishing ten regional working groups defined by FAO regions (Table 3). These groups would take responsibility for overseeing collections, identifications and barcoding of the fish faunas of their areas. The working groups are therefore based on geography rather than taxonomy. They are expected to raise the profile of fish barcoding in their region in various ways, including holding or participating in barcoding workshops and conferences.
3. The meeting agreed that BOLD should be used as the workbench for assembly of fish barcode sequences, and a linked Web site (www.fishbol.org) was established to further the aims of FISH-BOL.
4. FISH-BOL is administered through two co-chairs and a campaign coordinator (see Note 18). Each working group also has
a chair and usually a deputy chair. Members of FISH-BOL and their contact details are listed on the Web site by working group, although not all contributors to the campaign are listed.
5. FISH-BOL is an informal collaboration of fish taxonomists and geneticists. It has no dedicated funds, although its establishment and some initial meetings were assisted by funds from the Consortium for the Barcode of Life (CBOL).
6. The authoritative list of fish species is the Catalogue of Fishes (25). FishBase (www.fishbase.org) uses this list and maintains a database of species with distributions by FAO region, country and habitat type. FishBase worked with FISH-BOL to provide a list of fish species for each FAO region and thus each FISH-BOL working group. Barcoding progress of working groups can thereby be monitored (Table 3).
7. For each specimen, the standard data fields of the BOLD submission are completed wherever possible: identification (genus and species), identifier (with email and institution), sample number, voucher number, institution storing, sample donor (with email), collector, collection date, locality (with GPS coordinates), elevation/depth (in metres), sex and life stage. The FAO region of a barcoded fish is stated in the “Extra Info” field of BOLD, and soon identification reliability levels will also be trialled in this field (see Note 12). Additional information can be recorded in the “Notes” field.
8. An initial target of five specimens per species per FAO region was set. It was recognised from the start that this will often be insufficient to encompass all the COI barcodes of a species (see Note 19) and that varying degrees of genetic isolation between populations (especially likely for freshwater fishes (26)) might mean that barcodes collected in one population differ from barcodes collected in other populations. Wherever possible, therefore, replicate specimens should be collected from different populations. 
Larger sample sizes will often be desirable, especially if species are widespread or genetically heterogeneous.
9. COI divergences within a fish species are usually less than 2% (27). Intraspecies variability exceeding that level (and thereby falling into multiple bins in BOLD) might reflect the existence of undescribed cryptic species. Resolution of such instances will likely require the barcoding of additional specimens (preferably at least five per putative species) and detailed taxonomic examination.
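The 2% guideline for within-species COI divergence can be screened for with a simple pairwise comparison of aligned barcodes. The Python sketch below is a minimal illustration under stated assumptions and is not the method used by BOLD: it computes an uncorrected proportion of differing sites (model-corrected distances may differ slightly), it assumes the sequences are already aligned to equal length, and the function names, specimen names and toy sequences are all hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID and b in VALID:
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def flag_divergent_pairs(aligned, threshold=0.02):
    """Return (id_a, id_b, distance) for pairs whose distance exceeds threshold."""
    flagged = []
    for id_a, id_b in combinations(sorted(aligned), 2):
        d = p_distance(aligned[id_a], aligned[id_b])
        if d > threshold:
            flagged.append((id_a, id_b, d))
    return flagged


# Toy 100-base alignment; sequences and specimen names are made up.
ref = "ACGT" * 25
toy = {"spec1": ref, "spec2": "T" + ref[1:], "spec3": "TTT" + ref[3:]}
print(flag_divergent_pairs(toy))
```

Pairs flagged in this way are only candidates for further attention. As the text notes, resolving them requires additional specimens and detailed taxonomic examination.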
4. Conclusions The FISH-BOL campaign has barcoded about 8,000 named fish species in the last 5 years, and it will clearly take several more years to approach its goal of barcoding all the world’s fishes. Yet the many uses of a validated and comprehensive fish barcode reference library in the diverse fields of food commerce and safety, taxonomy, and biological and ecological sciences suggest unequivocally that the goal is one worth attaining. FISH-BOL warmly thanks all those in the community who are already participating, and calls for the involvement of further scientists, collection managers and taxonomists to enable its campaign goal to be reached.
5. Notes
1. Large pieces of tissue should be cut into small pieces (<5–7 mm) to permit adequate fluid penetration. Note that mixing alcohol and DMSO has a detrimental effect on DNA quality (A. Borisenko, personal communication), so a sample originally stored in DMSO should not be subsequently moved into alcohol.
2. Photographs of whole specimens and, if useful, close-ups of diagnostic features, may be readily taken with a digital camera. Photographs should be taken as soon as possible after collection and before colours have faded. Small- to medium-sized fishes (size range of about 1–25 cm) can be imaged quickly and inexpensively using a standard flat-bed photo scanner (28).
3. The combination of forward primer FishF1 and reverse primer FishR1 works well for a wide range of species, although for elasmobranchs the FishF2/FishR2 combination often works better (Bronwyn Holmes, personal communication). TelF1 and FishR1 also work well (Agnes Dettai, personal communication).
4. BIO uses fish primer cocktails (Table 1). Fish primer cocktails reduce the need to experiment with different primer combinations and increase efficiency for high-throughput applications. At BIO, DNA is usually extracted from muscle tissue using an automated glass fibre protocol (29), and PCR conditions use low primer and dNTP concentrations to negate the need for PCR clean-up before sequencing. The current BIO routine for fish uses a fish primer cocktail for first-round amplifications, and a mammal cocktail (Table 1) for second-round amplifications of those samples that fail to amplify with the fish cocktail.
5. There are several reports of the successful amplification of DNA from formalin-preserved tissues (30, 31), including a
semi-nested PCR protocol for amplifying a 400 bp fragment of COI from fishes preserved in formalin for up to 23 years (32).
6. BOLD facilitates the compilation and examination of all specimen and barcode data and provides a quick and easy connection for data transfer to GenBank. It also provides tools for initial characterisation and inspection of data, together with identification of unknown sequences.
7. An overview of the number of fish species that have thus far failed to barcode successfully has been published (14). This estimated a failure rate of 3% for those species that have been processed for two or more specimens. In other words, 97% of tested species have yielded at least one “BARCODE” compliant sequence (see BARCODE data standard at www.barcoding.si.edu/PDF/DWG_data_standards-Final.pdf). There is no strong systematic pattern to those orders or families with failed species, although six orders (each with low species numbers) currently have failure rates between 10% and 33%. If specimen degradation is not considered to be the cause of these failures and primer hybridization problems are suspected, new primers can be designed and tested. Note that primers used for each specimen should be recorded on BOLD. However, as far as possible, “universal” primers are preferred as they simplify the amplification of unknown or query samples.
8. NUMTs are insertions of mitochondrial DNA sequences into the nuclear genome. Once such an insertion occurs, the inserted sequence is released from selection, becomes a pseudogene, and can rapidly mutate. It has been suggested that NUMTs could provoke misidentifications. Fortunately, at least in fishes, NUMTs appear to be very rare. Initial reports of NUMTs in the fugu Takifugu rubripes (33, 34) were rejected (35) and ascribed to artificial incorporation of mtDNA sequences into the consensus nuclear sequence. We have not observed any NUMTs in some 8,000 COI barcodes from about 2,000 species of Australian fishes. 
However, there have been recent reports of putative NUMTs in a few teleosts (36, 37). Most NUMTs are short (<200 bp (33)) and unlikely to amplify with the standard COI primers that target a 650 bp region. Further, relaxation from selection means that NUMTs are likely to contain radical amino acid changes and mutations to stop codons. BOLD has an automatic function that flags any sequences with stop codons, and these sequences can then be re-inspected. In our experience, the few such flagged sequences have arisen from errors in sequence assembly rather than the true existence of stop codons.
9. Some pairs of fish species share COI haplotypes and cannot be distinguished with COI barcodes. However, about 98% and
93% of marine and freshwater species, respectively, are separable (1, 5). Those that cannot be separated likely include instances of very recent radiation or hybridization (with or without introgression), or perhaps in some instances undue splitting by taxonomists. If the genetic identity is considered to reflect recent speciation, then a more rapidly evolving DNA sequence might be examined to see if this reveals divergence between the species. The control region of mtDNA might be a suitable candidate. The mtDNA ND2 gene shows deeper divergence between some closely related shark species than COI (38). However, there is as yet no consensus on what a second barcoding gene might be for animals. If shared haplotypes among species are thought to have arisen by hybridization or introgression, then further genetic resolution of taxonomic status is only attainable through nuclear DNA analysis of genes diagnostic for the two parental species. Allozyme markers are suitable for such investigations if tissues are preserved in a suitable state (frozen or fresh), but for alcohol-stored samples nDNA sequences will have to be screened. Hybridization has been recorded among about 1% of teleosts but not among chondrichthyans (39, 40). In general, hybridization is not sufficiently widespread to have much effect on the overall ability of barcoding to discriminate fish species.
10. A group of about a dozen people (divers, taxonomists, photographers, geneticists) mounted a fish “barcoding blitz” at the Lizard Island Research Station on Australia’s Great Barrier Reef, and in less than 2 weeks had accumulated about 1,000 specimens of nearly 400 species. Such blitzes at other biodiverse sites around the world will likely yield similar results.
11. Recommendations for nomenclature in situations of taxonomic uncertainty are available (41). The “Extra info” field of BOLD can be used to give supplementary taxonomic information if required.
12. 
There is as yet no formal system in BOLD for recognition and recording of identification reliability levels, but such levels could be recorded (as, for example L1 to L5) in the current “Extra info” category of BOLD. This would enable them to be printed out in BOLD Taxon ID Trees by clicking on the “Extra info” option box. These identification levels are necessarily to some extent subjective, and cannot in themselves eliminate the possibility of errors. Nevertheless, adoption of this or a similar method in BOLD would assist in reconciling apparent conflicts. Ultimately, full resolution may well have to await re-inspection of vouchers by experts—an expensive and timeconsuming process. Sometimes it might be necessary to inspect holotypes to get accurate identifications of barcoded specimens. However, for some species, holotypes are missing or
R.D. Ward
degraded, so this is not possible. In an ideal world, all holotypes would themselves be barcoded, but many are very old and/or have been stored in formalin for varying amounts of time; obtaining full-length barcodes from them would be time-consuming and expensive, but probably worth the investment.
13. BOLD deploys a phenetic approach to identification and aligns the query sequence to reference sequences using a Hidden Markov Model, delivering percent matches (and species designations) for the highest matches (13). Character-based methods have been trialled on tunas (42) and sharks (43). Other proposed methods include Bayesian, decision theory, artificial intelligence, and support vector machine approaches (44–47).
14. 20–30% of fish samples from various markets and restaurants in North America and markets in Italy and Amazonia were found to be mislabelled (17, 48, 49).
15. Barcoding has revealed, for example, widespread substitutions in shark products (50), and has allowed shark fins confiscated from illegal fishing activities to be identified to species (51).
16. A survey of the spawning dynamics of the commercially important blue mackerel (Scomber australasicus) used barcoding to identify eggs collected in ichthyoplankton surveys (52). Identification of larval and juvenile stages has been made both specifically (Cubera snapper, Lutjanus cyanopterus (53)) and generally (marine fishes from Australia, Florida, Mexico and Pacific Society Islands (54–57)).
17. Many likely new species have been flagged by barcoding and await full taxonomic investigation (see, e.g. refs. 58–62). Barcoding has also helped validate many new species (e.g. refs. 63–66) and assisted in taxonomic revisions (e.g. ref. 67).
18. The co-chairs are Paul Hebert and Bob Ward, and the campaign coordinator is Bob Hanner.
19. In practice, the existence of unsampled reference barcodes will often not diminish the ability of barcoding to correctly identify unknown specimens, as barcode variation within a species is generally much less than that between species.
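The point made in Note 19, that variation within a species is generally much smaller than variation between species, can be checked on any aligned set of barcodes. The following is only a minimal sketch under stated assumptions: sequences are already aligned to equal length, the distance used is a simple uncorrected p-distance (not the distance model used by BOLD), and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: both sequences are aligned to the same length; sites where
    either sequence has a gap or ambiguity code are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def barcode_gap(records):
    """Return (largest within-species distance, smallest between-species distance).

    records: list of (species_name, aligned_sequence) tuples.
    """
    within, between = [], []
    for (sp1, seq1), (sp2, seq2) in combinations(records, 2):
        (within if sp1 == sp2 else between).append(p_distance(seq1, seq2))
    return (max(within) if within else None,
            min(between) if between else None)


# Toy, invented sequences for illustration only.
toy_records = [
    ("Species A", "ACGTACGTAC"),
    ("Species A", "ACGTACGTAT"),
    ("Species B", "ACGAACCTGC"),
    ("Species B", "ACGAACCTGC"),
]
max_within, min_between = barcode_gap(toy_records)
print(max_within, min_between)
```

When the largest within-species distance is well below the smallest between-species distance, an unknown specimen will usually match its own species even if its exact haplotype is absent from the reference set.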
Acknowledgements
I thank William White and Peter Last (CSIRO Marine and Atmospheric Research) for discussions on aspects of this paper, and Bronwyn Holmes and Dan Gledhill (also of CSIRO) for comments on an early draft. I also thank Paul Hebert (co-chair of FISH-BOL) and Bob Hanner (campaign coordinator of FISH-BOL) for their great work in helping to make FISH-BOL a success, and the many FISH-BOL participants who have collected, identified and barcoded many thousands of specimens. A special thanks goes to the team at the Canadian Centre for DNA Barcoding for barcoding the majority of fish specimens sequenced to date.
References
1. Ward RD, Hanner R, Hebert PDN (2009) The campaign to DNA barcode all fishes, FISH-BOL. J Fish Biol 74:329–356
2. Eschmeyer WN, Fricke R, Fong JD, Polack DA (2010) Marine fish diversity: history of knowledge and discovery (Pisces). Zootaxa 2525:19–50
3. Ward RD, Zemlak TS, Innes BH et al (2005) Barcoding Australia's fish species. Phil Trans R Soc London B 360:1847–1857
4. Ward RD, Holmes BH, White WT, Last PR (2008) DNA barcoding Australasian chondrichthyans: results and potential uses in conservation. Mar Freshw Res 59:57–71
5. Hubert N, Hanner R, Holm E et al (2008) Identifying Canadian freshwater fishes through DNA barcodes. PLoS One 3:e2490
6. Steinke D, Zemlak TS, Boutillier JA, Hebert PDN (2009) DNA barcoding Pacific Canada's fishes. Mar Biol 156:2641–2647
7. Steinke D, Zemlak TS, Hebert PDN (2009) Barcoding Nemo: DNA-based identifications for the ornamental fish trade. PLoS One 4:e6300
8. Valdez-Moreno M, Ivanova NV, Elias-Gutierrez M et al (2009) Probing diversity in freshwater fishes from Mexico and Guatemala with DNA barcodes. J Fish Biol 74:377–402
9. Lakra WS, Verma WS, Goswami M et al (2010) DNA barcoding Indian marine fishes. Mol Ecol Resour 11:60–71
10. Dettai A, Lautredou AC, Bonillo C et al (2011) The actinopterygian diversity of the CEAMARC cruises: barcoding and molecular taxonomy as a multi-level tool for new findings. Deep-Sea Res II 58(Suppl 1):250–263
11. Hajibabaei M, Smith MA, Janzen DH, Rodriguez JJ, Whitfield JB, Hebert PDN (2006) A minimalist barcode can identify a specimen whose DNA is degraded. Mol Ecol Notes 6:959–964
12. Meusnier I, Singer GAC, Landry J et al (2008) A universal DNA minibarcode for biodiversity analysis. BMC Genomics 9:214
13. Ratnasingham S, Hebert PDN (2007) BOLD: the Barcode of Life Data System (www.barcodinglife.org). Mol Ecol Notes 7:355–364
14. Becker S, Hanner R, Steinke D (2011) Five years FISH-BOL – a brief status report. Mitochondrial DNA 22(Suppl 1):3–9
15. Williams A, Last PR, Gomon MF, Paxton JR (1996) Species composition and checklist of the demersal ichthyofauna of the continental slope off Western Australia (20–35°S). Record West Aust Mus 18:135–155
16. Smith PJ, McVeagh SM, Steinke D (2008) DNA barcoding for the identification of smoked fish products. J Fish Biol 72:464–471
17. Wong EHK, Hanner R (2008) DNA barcoding detects market substitution in North American seafood. Food Res Int 41:828–837
18. Rasmussen RS, Morrissey MT, Hebert PDN (2009) DNA barcoding of commercially important salmon and trout species (Oncorhynchus and Salmo) from North America. J Agric Food Chem 57:8379–8385
19. Dunn MR, Szabo A, McVeagh MS, Smith PJ (2010) The diet of deepwater sharks and the benefits of using DNA identification of prey. Deep-Sea Res I 57:923–930
20. Barnett A, Redd KS, Frusher SD et al (2010) Non-lethal method to obtain stomach samples from a large marine predator and the use of DNA analysis to improve dietary information. J Exp Mar Biol Ecol 393:188–192
21. Cooper JK, Sykes G, King S et al (2007) Species identification in cell culture: a two-pronged molecular approach. In Vitro Cell Dev Biol Anim 43:344–351
22. Lakra WS, Swaminathan TR, Rathore G et al (2010) Development and characterization of three new diploid cell lines from Labeo rohita (Ham.). Biotechnol Prog 26:1008–1013
23. Victor BC (2007) Coryphopterus kuna, a new goby (Perciformes: Gobiidae: Gobinae) from the western Caribbean, with the identification of the late larval stage and an estimate of the pelagic larval duration. Zootaxa 1526:51–61
24. De Astarloa JMD, Mabragana E, Hanner R, Figueroa DE (2008) Morphological and molecular evidence for a new species of longnose skate (Rajiformes: Rajidae: Dipturus) from Argentinean waters based on DNA barcoding. Zootaxa 1921:35–46
25. Eschmeyer WN (2010) Catalog of fishes electronic version. http://research.calacademy.org/ichthyology/catalog/fishcatmain.asp. Accessed 25 Oct 2010
26. Ward RD, Woodwark M, Skibinski DOF (1994) A comparison of genetic diversity levels in marine, freshwater and anadromous fish. J Fish Biol 44:213–232
27. Ward RD (2009) DNA barcode divergence among species and genera of birds and fishes. Mol Ecol Resour 9:1077–1085
28. Steinke D, Hanner R, Hebert PDN (2009) Rapid high-quality imaging of fishes using a flat-bed scanner. Ichthyol Res 56:210–211
29. Ivanova NV, Dewaard JR, Hebert PDN (2006) An inexpensive automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes 6:998–1002
30. Klanten SO, van Herwerden L, Choat JH (2003) Acquiring reef fish DNA sequences from formalin-fixed museum specimens. Bull Mar Sci 73:771–776
31. Bucklin A, Allen LD (2004) MtDNA sequencing from zooplankton after long-term preservation in buffered formalin. Mol Phylogenet Evol 30:879–882
32. Zhang J (2010) Exploiting formalin-preserved fish specimens for resources of DNA barcoding. Mol Ecol Resour 10:935–941
33. Richly E, Leister D (2004) NUMTs in sequenced eukaryotic genomes. Mol Biol Evol 21:1081–1084
34. Antunes A, Ramos MJ (2005) Discovery of a large number of previously unrecognized mitochondrial pseudogenes in fish genomes. Genomics 86:708–717
35. Venkatesh B, Dandona N, Brenner S (2006) Fugu genome does not contain mitochondrial pseudogenes. Genomics 87:307–310
36. Teletchea F, Laudet V, Hanni C (2006) Phylogeny of the Gadidae (sensu Svetovidov, 1948) based on their morphology and two mitochondrial genes. Mol Phylogenet Evol 38:189–199
37. Knudsen SW, Moller PR, Gravlund P (2007) Phylogeny of the snailfishes (Teleostei: Liparidae) based on molecular and morphological data. Mol Phylogenet Evol 44:649–666
38. Moore ABM, White WT, Ward RD et al (2011) Rediscovery and redescription of the smoothtooth blacktip shark Carcharhinus leiodon (Carcharhinidae) from Kuwait, with notes on its ecology and conservation. Mar Freshw Res 62:528–539
39. Scribner KT, Page KS, Bartron ML (2000) Hybridization in freshwater fishes: a review of case studies and cytonuclear methods of biological inference. Rev Fish Biol Fish 10:293–323
40. Gardner JPA (1997) Hybridization in the sea. In: Blaxter JHS, Southward AJ (eds) Advances in marine biology, vol 31. Academic, New York, pp 2–78
41. Bengtson P (1988) Open nomenclature. Palaeontology 31:223–227
42. Lowenstein JH, Amato G, Kolokotronis SO (2009) The real maccoyii: identifying tuna sushi with DNA barcodes – contrasting character attributes and genetic distances. PLoS One 4:e7866
43. Wong EHK, Shivji MS, Hanner RH (2009) Identifying sharks with DNA barcodes: assessing the utility of a nucleotide diagnostic approach. Mol Ecol Resour 9(Suppl 1):243–256
44. Elias M, Hill RI, Willmott KR, Dasmahapatra KK, Brower AVZ, Mallet J, Jiggins CD (2007) Limited performance of DNA barcoding in a diverse community of tropical butterflies. Proc R Soc B Biol Sci 274:2881–2889
45. Abdo Z, Golding GB (2007) A step towards barcoding life: a model-based, decision-theoretic method to assign genes to pre-existing species groups. Syst Biol 56:44–56
46. Zhang AB, Sikes DS, Muster C, Li SQ (2008) Inferring species membership using DNA sequences with back propagation neural networks. Syst Biol 57:202–215
47. Seo T-K (2010) Classification of nucleotide sequences using support vector machines. J Mol Evol 71:250–267
48. Filonzi L, Chiesa S, Vaghi M, Marzano FN (2010) Molecular barcoding reveals mislabelling of commercial fish products in Italy. Food Res Int 43:1383–1388
49. Ardura A, Pola IG, Ginuino I, Gomes V, Garcia-Vasquez E (2010) Application of barcoding to Amazonian commercial fish labelling. Food Res Int 43:1549–1552
50. Barbuto M, Galimberti A, Ferri E, Labra M, Malandra R, Galli P, Casiraghi M (2010) DNA barcoding reveals fraudulent substitutions in shark seafood products: the Italian case of "palombo" (Mustelus spp.). Food Res Int 43:376–381
51. Holmes BH, Steinke D, Ward RD (2009) Identification of shark and ray fins using DNA barcoding. Fish Res 95:280–288
52. Neira FJ, Keane JP (2008) Ichthyoplankton-based spawning dynamics of blue mackerel (Scomber australasicus) in south-eastern Australia: links to the East Australian Current. Fisheries Oceanogr 17:281–298
53. Victor BC, Hanner R, Shivji M, Hyde J, Caldow C (2009) Identification of the larval and juvenile stages of the Cubera snapper, Lutjanus cyanopterus, using DNA barcoding. Zootaxa 2215:24–36
54. Pegg GG, Sinclair B, Briskey L, Aspden WJ (2006) MtDNA barcode identification of fish larvae in the southern Great Barrier Reef, Australia. Sci Mar 70(Suppl 2):7–12
55. Richardson DE, Vanwye JD, Exum AM et al (2007) High-throughput species identifications: from DNA isolation to bioinformatics. Mol Ecol Notes 7:199–207
56. Valdez-Moreno M, Vasquez-Yeomans L, Elias-Gutierrez M et al (2010) Using DNA barcodes to connect adults and early life stages of marine fishes from the Yucatan Peninsula, Mexico: potential in fisheries management. Mar Freshw Res 61:665–671
57. Hubert N, Deirieu-Trottin E, Irisson JO et al (2010) Identifying coral reef fish larvae through DNA barcoding: a test case with the families Acanthuridae and Holocentridae. Mol Phylogenet Evol 55:1195–1203
58. Ward RD, Holmes BH, Yearsley GK (2008) DNA barcoding reveals a likely second species of Asian seabass (barramundi) (Lates calcarifer). J Fish Biol 72:458–463
59. Ward RD, Costa FO, Holmes BH, Steinke D (2008) DNA barcoding shared fish species from the North Atlantic and Australasia: minimal divergence for most taxa but a likely two species for both Zeus faber (John dory) and Lepidopus caudatus (silver scabbardfish). Aquat Biol 3:71–78
60. Lara A, de Leon JLP, Rodriguez R, Casnae D, Cote G, Bernatchez L, Garcia-Machado E (2010) DNA barcoding of Cuban freshwater fishes: evidence for cryptic species and taxonomic conflicts. Mol Ecol Resour 10:421–430
61. Zemlak TS, Ward RD, Connell AD et al (2009) DNA barcoding reveals overlooked marine fishes. Mol Ecol Resour 9(Suppl 1):237–242
62. Sriwattanarothai N, Steinke D, Ruenwongsa P et al (2010) Molecular and morphological evidence supports the species status of the Mahachai fighter Betta sp. Mahachai and reveals new species of Betta from Thailand. J Fish Biol 77:414–424
63. Ward RD, Holmes BH, Zemlak TS, Smith PJ (2007) DNA barcoding discriminates spurdogs of the genus Squalus. In: Last PR, White WT, Pogonoski JJ (eds) Descriptions of new dogfishes of the genus Squalus (Squaloidea: Squalidae). CSIRO Marine and Atmospheric Research Paper 014, Hobart, Australia, pp 117–130
64. Victor BC (2008) Redescription of Coryphopterus tortugae (Jordan) and a new allied species Coryphopterus bol (Perciformes: Gobiidae: Gobiinae) from the tropical western Atlantic Ocean. J Ocean Sci Found 1:1–19
65. Pyle RL, Earle JL, Greene BD (2008) Five new species of the damselfish genus Chromis (Perciformes: Labroidei: Pomacentridae) from deep coral reefs in the tropical western Pacific. Zootaxa 1671:3–31
66. Lin HC, Galland GR (2010) Molecular analysis of Acanthemblemaria macrospilus (Teleostei: Chaenopsidae) with description of a new species from the Gulf of California, Mexico. Zootaxa 2525:51–62
67. Ebert DA, White WT, Goldman KJ et al (2010) Resurrection and redescription of Squalus suckleyi (Girard, 1854) from the North Pacific, with comments on the Squalus acanthias subgroup (Squaliformes: Squalidae). Zootaxa 2612:22–40
68. Baldwin CC, Mounts JH, Smith DG, Weight LA (2009) Genetic identification and color descriptions of early life-history stages of Belizean Phaeoptyx and Astrapogon (Teleostei: Apogonidae) with comments on identification of adult Phaeoptyx. Zootaxa 2008:1–22
69. Ivanova NV, Zemlak TS, Hanner R, Hebert PDN (2007) Universal primer cocktails for fish DNA barcoding. Mol Ecol Notes 7:544–548
70. Messing J (1983) New M13 vectors for cloning. Methods Enzymol 101:20–78
Chapter 22
Generating Plant DNA Barcodes for Trees in Long-Term Forest Dynamics Plots
W. John Kress, Ida C. Lopez, and David L. Erickson
Abstract
Long-term forest dynamics plots, such as those maintained and coordinated by the Center for Tropical Forest Science and Smithsonian Institution Global Earth Observatories (CTFS/SIGEO), are a rich source of biological data that describe the demographics, ecology, and evolution of pristine and disturbed forest habitats across ecosystems. As molecular techniques for plant systematic and ecological studies, including DNA barcodes, have improved, so have the methods for collecting tissue samples, generating DNA sequences, and managing genetic data. Tissue samples can be processed at the point of collection and stored in silica gel for extended periods of time, or samples can be taken from historical museum collections with sufficient DNA yields for study. In this chapter, we provide a workflow that includes the tracking of data from field collection of tissue samples to the DNA barcode sequence laboratory to final analyses for forensic and phylogenetic investigations.
Key words: DNA barcode, Phylogenetics, SIGEO, Community ecology, Forensics, Conservation
1. Introduction
The Center for Tropical Forest Science (CTFS) and Smithsonian Institution Global Earth Observatories (SIGEO), with headquarters at the Smithsonian Tropical Research Institute (STRI), coordinate, as of 2012, a network of 46 large-scale (≤52 ha) tropical forest dynamics plots with counterparts in 21 tropical countries (Fig. 1) (http://www.ctfs.si.edu). Each CTFS/SIGEO plot contains hundreds of thousands of mapped trees (≥1 cm DBH, trunk diameter at breast height) that are censused at 5-year intervals for growth, recruitment, and mortality. The Barro Colorado Island (BCI) 50-ha plot in Panama, for example, contains over 300,000 individual stems and has been monitored over a 30-year period (1, 2). The 46 plots include
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_22, © Springer Science+Business Media, LLC 2012
W.J. Kress et al.
Fig. 1. Map indicating the location of current and planned CTFS/SIGEO forest dynamics plots.
over four million individual trees and over 8,500 tree species. CTFS/SIGEO is expanding this monitoring program by adding large-scale plots in the temperate zone to quantify the response of trees and forest ecosystems to the Earth's changing climate. SIGEO is a member of a consortium called The Group on Earth Observations (GEO) launched in 2002, which recently established a Global Earth Observation System of Systems (GEOSS; http://www.earthobservations.org; ref. 3) to provide access to data, services, analytical tools, and modeling capabilities for environmental decision making in response to biodiversity loss and climate change.
In the field of community ecology, investigations at the CTFS/SIGEO forest dynamics plots have focused on the factors responsible for assembling species within a specific ecological community (e.g., ref. 4). To understand the basis of these species assemblages, three core components are taken into consideration: species diversity, functional diversity, and evolutionary (or phylogenetic) diversity (5). A phylogenetic framework allows one to test hypotheses that co-occurring species are (1) more closely related than by chance (phylogenetic clustering), (2) more distantly related than by chance (phylogenetic over-dispersion), or (3) randomly distributed (5, 6). However, in most cases, estimates of the phylogenetic relationships among species in a community assemblage are the least resolved. Ideally, characters used to generate a phylogeny of a community, e.g., DNA sequence data, should be independent of the functional morphological characters under investigation. Unfortunately, most
community-based studies lack DNA sequence data for all of the taxa and rely on previously published evolutionary trees to infer community phylogenies (7). Because these phylogenetic trees usually lack species-specific data, such community phylogenies are usually only resolved to the generic or family levels. The use of sequence data from DNA barcode libraries improves the statistical power to reconstruct community phylogenies at the species level and has been a significant advance in testing hypotheses regarding the assembly of these communities, the mechanisms of species coexistence, and the role of trait conservatism in determining the community structure (5, 8–13).
A multilocus (rbcL, matK, and trnH-psbA) DNA barcode library for woody plant species present in forest dynamics plots is also an important tool for rapid and reliable taxonomic identification in ecological studies conducted in these plots. The DNA barcode sequence libraries can facilitate more diverse biological research from ecological forensics to new species discovery. As examples, investigations of plant–herbivore interactions as well as below-ground species distributions will be greatly enhanced with DNA barcodes.
Investigations associated with DNA barcodes for these forest communities require that a significant amount of data generated in these studies be effectively tracked and stored. These ecological and evolutionary studies utilize diverse information on both the taxa and specimens sampled in the forest dynamics plots as well as the subsequent DNA barcode sequences, which must be readily available and accessible to researchers. In anticipation of a potentially vast increase of data generated by DNA barcode studies at CTFS/SIGEO plots, it is important that researchers take into consideration a number of key factors in the process of tissue sampling, database organization, generating DNA barcode sequences, and community analyses as detailed below.
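The test of phylogenetic clustering versus over-dispersion described in this Introduction can be illustrated with a small randomization. The sketch below is not the procedure used in the cited studies; it assumes that pairwise phylogenetic distances among all species in the plot are already available (for example, from a barcode-based community phylogeny), uses mean pairwise distance as the statistic, and draws null communities at random from the species pool. All names and the toy distances are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def mean_pairwise_distance(species, dist):
    """Mean phylogenetic distance over all pairs of species in a community."""
    return mean(dist[frozenset(pair)] for pair in combinations(species, 2))


def standardized_mpd(community, pool, dist, n_null=999, seed=0):
    """Standardized effect size of mean pairwise distance (MPD).

    Negative values suggest co-occurring species are more closely related
    than random draws from the pool (clustering); positive values suggest
    over-dispersion. The null model (random draws of equal richness) is an
    assumption of this sketch.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(community, dist)
    null = [mean_pairwise_distance(rng.sample(pool, len(community)), dist)
            for _ in range(n_null)]
    spread = pstdev(null)
    if spread == 0:
        return 0.0
    return (observed - mean(null)) / spread


# Toy pool: two clades of three species; distances are invented.
clade = {"sp1": "X", "sp2": "X", "sp3": "X", "sp4": "Y", "sp5": "Y", "sp6": "Y"}
pool = sorted(clade)
dist = {frozenset((a, b)): (2.0 if clade[a] == clade[b] else 10.0)
        for a, b in combinations(pool, 2)}

ses = standardized_mpd(["sp1", "sp2", "sp3"], pool, dist)
print(ses)  # negative for this single-clade community
```

A species-level phylogeny matters here because distances resolved only to genus or family collapse many pairs to the same value and weaken the contrast between the observed and null communities.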
The workflow for generating DNA barcodes for forest dynamics plots entails five basic tasks (Fig. 2).
1.1. Making Field Voucher Specimens and Collecting Tissue Samples
To adequately document and verify DNA barcodes, multiple tissue samples of each species should be collected in the forest plot. As with all botanical specimens, explicit taxonomic, locality, and habitat data should be recorded for each collection.
1.2. Establishing Field Collection and Tissue Databases
A simple but complete database should be used to track sample information from the initial field collection to the DNA barcode sequence and finally to the ecological analyses. A standard sample template file is used for all accessions with mandatory and optional fields completed at each stage of processing. A Microsoft Excel format works well; separate worksheet tabs can be used for the different types of data, including field, laboratory, and DNA barcode library.
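A template with mandatory and optional fields per processing stage can be mirrored in a few lines of code, which is useful when worksheets are exported for checking. The following is a sketch only: the three stages follow the worksheet tabs mentioned above, but every field name is a hypothetical placeholder, not the authors' template.

```python
# Hypothetical field names; only the three stages come from the text.
TEMPLATE = {
    "field": {
        "mandatory": ["collector_number", "taxon", "tree_tag", "plot",
                      "collection_date"],
        "optional": ["habitat_notes"],
    },
    "laboratory": {
        "mandatory": ["plate_id", "well"],
        "optional": ["extraction_date"],
    },
    "barcode_library": {
        "mandatory": ["locus", "sequence"],
        "optional": ["accession"],
    },
}


def missing_fields(record, stage):
    """List the mandatory fields of `stage` that are absent or empty."""
    values = record.get(stage, {})
    return [name for name in TEMPLATE[stage]["mandatory"]
            if not values.get(name)]


# One accession, complete for the field stage but not yet for the laboratory.
accession = {
    "field": {
        "collector_number": "K-0001",
        "taxon": "Genus species",
        "tree_tag": "012345",
        "plot": "Example plot",
        "collection_date": "2011-06-01",
    },
    "laboratory": {"plate_id": "P01"},
}

print(missing_fields(accession, "field"))       # []
print(missing_fields(accession, "laboratory"))  # ['well']
```

Checking completeness stage by stage keeps an accession from moving to sequencing with, for instance, no well position recorded.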
Fig. 2. Workflow for collecting specimens and tissue samples, generating DNA barcodes, and applying these barcodes in analyses to understand the community ecology and evolution of trees in forest dynamics plots. Taxonomists may also obtain diagnostic herbarium material and DNA sequence data for use in the identification and description of new species. This workflow results in the establishment of a DNA barcode reference library for the forest plot (from CW Dick, WJ Kress (2009) Bioscience 59:745–755).
1.3. Maintaining Sample Repositories and DNA Barcode Databases
Consideration must be given to the storage of sample derivatives as well as the data on those samples. A database for tracking short-term and long-term storage locations is necessary to determine: (1) how and where DNA samples are archived; (2) how PCR products, purified DNA, and/or tissue are retained; and (3) how DNA is stored for archival purposes and for immediate use. Answers to these questions are in part dependent on the resources available for a specific project.
1.4. Generating DNA Barcode Sequences
DNA sequencing follows standard Sanger procedures, but may be carried out in either low- or high-throughput fashion depending on the laboratory facilities available. High-throughput procedures, e.g., 96-well plate format, are much more economical and greatly bring down the cost of generating a multilocus barcode, but not all laboratories are equipped for this level of sequence production (14).
1.5. Analyses of DNA Barcode Data
The application of DNA barcodes to investigations in forest dynamics plots is in the early stages of development and only a few examples are available as models (see refs. 11, 15). DNA barcodes serve as both simple species identifiers for forensic ecological studies and as molecular markers for generating community phylogenetic trees. Other applications will certainly be developed in the future (see Chapter 23).
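The use of barcodes as simple species identifiers amounts to comparing a query sequence with the reference library for the plot. The sketch below is a deliberately simplified stand-in and is not BLAST: it assumes the query and reference sequences are already aligned to equal length, scores ungapped identity, and applies an arbitrary threshold. Function names, the threshold, and the toy sequences are assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of identical sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identify(query, library, min_identity=0.95):
    """Return (species, score) for the best match, or (None, score).

    library: list of (species_name, aligned_reference_sequence).
    min_identity is an illustrative cut-off, not a recommended value.
    """
    best_species, best_score = None, 0.0
    for species, reference in library:
        score = identity(query, reference)
        if score > best_score:
            best_species, best_score = species, score
    if best_score < min_identity:
        return None, best_score
    return best_species, best_score


# Toy 20-bp references, invented for this example.
library = [
    ("Species A", "ATGTCACCACAAACAGAGAC"),
    ("Species B", "ATGTCACCACAAACTGAAAC"),
]

print(identify("ATGTCACCACAAACAGAGAC", library))  # exact match to Species A
print(identify("TTTTCACCACAAACAGAGAC", library))  # best match below threshold
```

Returning no name when the best score falls below the threshold reflects the case of a forensic sample whose species is missing from the plot library.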
2. Materials
Ecological investigations utilizing DNA barcodes require materials for making voucher specimens, archiving tissue samples and DNA extractions, as well as databases and analytic programs for managing and analyzing the sequence data. If possible, all specimen collections should be stored for use in future investigations and not destroyed at the completion of a given study. All materials used for this long-term storage should be archival, low-acid to acid-free, and of durable quality.
2.1. Field Voucher Specimens
Standard plant presses with ventilators and tightening straps, and newsprint or unprinted paper for pressing voucher specimens (Fig. 3a). Plant pressing supplies are from Herbarium Supply Co. (http://www.herbariumsupply.com), Bozeman, MT, USA.
2.2. Field Tissue Samples
Low-acid coin envelopes, 2.25″ × 3.50″ (Fig. 3b), catalog number S-11485, for storing tissue are from Uline (http://www.uline.com), Pleasant Prairie, WI, USA.
Fig. 3. Materials used for collecting voucher specimens, tissue samples, and DNA samples. (a) Plant presses are used to create the field voucher reference collections to be deposited in an herbarium, (b) coin envelopes are used to dry and store plant tissue collections, (c) indicating silica gel is used in the field to desiccate plant tissue, (d) Lock and Lock boxes which have a gasket on the lid for an air-tight seal can store many coin envelopes, (e) DNA extraction plate, with sealing caps, top and side views (arrow indicates notched corner for correct orientation of plate), and (f) 2D barcode tubes used to store DNA in freezers.
1. Indicating silica gel, 3–5 mm beads (Fig. 3c), from Poly Lam Products, Corp. (http://www.polylam.com), Williamsville, NY, USA, is used to dry the tissue. Heavy-duty, one-gallon, resealable bags, found in most grocery stores, are used to store envelopes in silica gel while in the field.
2. Samples in coin envelopes are stored in airtight Lock and Lock boxes, 8.9″ × 11.5″ × 4.7″ (Fig. 3d), catalog number ZHPL836, from Heritage Mint, Ltd. (http://www.heritagemint.com), Scottsdale, AZ, USA.
2.3. DNA Extraction Samples
1. Tissue samples for the DNA extraction are placed into 1 ml round bottom, deep-well blocks (Fig. 3e) by Matrix catalog # 4212 (http://www.matrixtechcorp.com), Thermo Fisher Scientific, Hudson, New Hampshire, USA, 1-800-345-0206.
2. Block wells are covered with Collection Microtube Caps (120 × 8), 8-well strip caps (Fig. 3e), catalog # 19566, available from Qiagen (http://www.qiagen.com), Valencia, CA, USA.
3. Resulting DNA is stored in Matrix 0.5 ml 2D barcoded v-bottom tubes # 3735 (Fig. 3f) (http://www.matrixtechcorp.com). Store DNA at −80 °C or in liquid nitrogen.
2.4. Databases
1. Excel spreadsheet application is available through Microsoft, http://office.microsoft.com/en-us/excel/.
2. Geneious, http://www.geneious.com/, is a suite of DNA sequence analysis applications developed by Biomatters, Ltd. It includes both laboratory information management system (LIMS) and field collections information management system (FIMS) components as well as sequence analysis software and tools for collaborations.
2.5. Data Analyses
1. Initial assessment software includes:
(a) Sequencher 4.8 from Gene Codes Corporation (http://www.genecodes.com), 775 Technology Drive, Suite 100A, Ann Arbor, MI 48108, USA, is used to obtain DNA sequence assemblies from generated sequences.
(b) transAlign (16), http://www.molekularesystematik.uni-oldenburg.de/33997.html, is used to align DNA sequences by amino acid translations of protein-coding sequences.
(c) MacClade (17) provides phylogenetic analysis tools and is available through Sinauer Associates, Inc. Publishers (http://www.sinauer.com).
(d) MUSCLE (18), http://www.drive5.com/muscle, is used for multiple alignments.
(e) RAxML 7.0.0 (19), http://wwwkramer.in.tum.de/exelixis/software.html, is software used for analysis of large phylogenetic trees.
(f) Basic Local Alignment Search Tool (BLAST (20); http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome) is used for nucleotide searches and for matching unidentified (forensic) samples to the DNA barcode library for the forest plot.
2. Phylogeny reconstruction software used for analysis includes:
(a) PAUP* 4.0 (21), software for phylogenetic inference, available for purchase through Sinauer at http://paup.csit.fsu.edu/.
(b) GARLI version 0.951 (22), free software that performs phylogenetic inference using the maximum-likelihood criterion, https://www.nescent.org/wg_garli/Main_Page.
(c) CIPRES, Cyberinfrastructure for Phylogenetic Research (http://www.phylo.org), a free web interface that implements a suite of phylogenetic analysis programs running at the San Diego Supercomputer Center, San Diego, CA.
3. Methods
A standardized database should be established at the onset of the project to track field specimens from initial collection through all laboratory procedures. Once initial data are collected, data can be easily captured for use in other databases or worksheets as the scope of the project changes and expands.
3.1. Field Voucher Specimens
1. Ideally, a herbarium voucher specimen with fertile plant material (flowers and/or fruits) should be made for each individual tree sampled for a DNA barcode in a forest dynamics plot. In practice, it is often very difficult to find all sampled trees in flower or fruit. As a substitute, a single herbarium voucher (with two duplicate specimens) (Fig. 4) may be collected to represent the multiple individuals of a species sampled in a particular plot; a notation giving the tree tag number indicates which of the four or more individuals was selected for the voucher (see Note 1). These specimens are pressed and dried in preparation for mounting and storage in a herbarium. Appropriate data on collection locality and habitat are provided on a label attached to the specimen. Sterile specimens are acceptable if fertile material is not available.
2. The mounted specimens should be deposited in recognized herbaria for proper storage and reference. One specimen is deposited in the host country herbarium, and the second should preferably be submitted for inclusion in the United States National Herbarium (US) at the Smithsonian Institution in Washington, DC, USA.
3. For individual trees of a species that are not specifically vouchered by a herbarium specimen, the DNA barcode sequence is eventually compared to that of the vouchered individual and used to confirm species identity. If the DNA barcodes do not match, additional individuals should be tested.
4. If four distinct individual trees cannot be sampled for a given species, tissue from herbarium specimens can be used to augment the sample size. Ensure that destructive sampling of herbarium specimens is allowed by the host herbarium before removing such tissue samples. Many herbaria place strict restrictions on such sampling.
Fig. 4. Schematic of plant collection activities in DNA barcoding of forest plots. From the same plot, a single tissue sample is collected from each of four individual trees of the same species. One of those individuals is sampled for material to create two plant voucher specimens: one specimen is deposited in the host herbarium, the second one is sent to the US National Herbarium. The voucher label indicates the sampled individual tree tag number. Tissue samples from historical herbarium vouchers are roughly 6 mm in diameter.
3.2. Tissue Samples
1. If possible (and this is highly desirable), four individuals per species should be sampled per forest plot. Tender, new-growth leaf tissue is preferable for drying in silica and for the DNA extraction. The four samples can receive the same collector number (corresponding to the herbarium voucher specimen) and should be numbered individually and distinctly as 1 of 4, 2 of 4, 3 of 4, 4 of 4, or with a similar numbering system.
2. Leaf tissue is placed directly into a coin envelope that has been marked in pencil with the taxonomic name, collection date, collector name, and collector number. The envelopes are placed into resealable, thick-gauge plastic bags containing a layer of about 5 cm of silica gel. Replace silica gel as needed; leaf tissue should dry in about 72 h.
3. Once the plant tissue is dry, multiple envelopes can be placed into a Lock and Lock box containing silica gel to maintain a dry environment (Fig. 3d). Keep the boxes in a cool, dark cabinet for long-term archival storage.
4. If leaf tissue is too thick or succulent, it can be placed directly into a small, airtight, resealable plastic bag containing silica gel. After the tissue is dry, it will need to be transferred into a coin envelope and stored as above.
5. Once tissue is dry, a sample from each leaf is taken and placed in an individual deep well of an extraction plate. A handheld paper punch provides a convenient sample; a 6 mm punch provides enough tissue to yield a sufficient amount of DNA.
6. Note the orientation of the extraction plate. The notched corner should always be on the upper left when working with the plate (Fig. 3e, see arrow). The well designation is determined by an alphanumeric code: the row (A–H) followed by the column (1–12).
7. Place eight samples at a time to complete one column, then cover that column with a strip cap that covers the eight wells. Make sure the caps are level with the block surface. The caps provide a tight seal and prevent spillage and contamination. Care should be taken to keep samples of the same taxon far apart on the plate. In the event of contamination from one well to another, it can be easily noted on the sequence results.

3.3. Sequencing Samples
3.3.1. DNA Extraction, Amplification, and Sequencing
In our lab, about 50 mg of dried tissue sample (6 mm diameter) is disrupted in a TissueLyser (Qiagen Cat. # 85210), after which tissues are incubated overnight at 55°C using a CTAB-based extraction buffer from AutoGen (Holliston, MA, USA). Following incubation, the supernatant is removed and placed in a clean 2 ml 96-well plate for submission to an AutoGen 960 DNA extraction robot (see also Chapter 11). Completed DNA extractions are hydrated in 200 µl of 100 mM Tris–HCl (pH 8.0) and then transferred to Matrix barcode tubes and stored at −80°C. Working stocks of DNA are transferred to a microtiter plate, diluted 5× with water, and then taken to the PCR laboratory. For more details, see Chapters 11 and 14.
3.3.2. PCR and Sequencing
Three barcode loci are generated for each individual tree sample: the coding genes rbcLa and matK, and the noncoding intergenic spacer trnH-psbA (11). Routine PCR is used, with no more than three attempts per sample to recover a PCR amplicon. The PCR cycling conditions are exactly the same for rbcLa and trnH-psbA [95°C 3 min (94°C 30 s, 55°C 30 s, 72°C 1 min) × 35 cycles, 72°C 10 min], following procedures outlined in ref. 11 (see also Chapter 11). matK requires a lower annealing temperature and more cycles [95°C 3 min (94°C 30 s, 49°C 30 s, 72°C 1 min) × 40 cycles, 72°C 10 min], and the matK reaction always includes DMSO at a final concentration of 5%. Primer pairs for each of the gene regions are listed in Table 1. Successful PCR reactions are purified using a 5× diluted mixture of ExoSap (USB). For sequencing, see Chapters 8 and 11. Sequencing of matK PCR products includes DMSO at a final concentration of 4% in the reaction mixture.
3.3.3. Sequence Editing and Alignment
1. Recovered trace files for each of the three markers are imported into Sequencher 4.8, trimmed, and assembled into contigs.
Table 1
Primer pairs for barcode loci rbcLa, matK, and trnH-psbA

Marker      Primer    Sequence (5′ → 3′)             Size          Reference (a)
rbcLa       SI_For    ATGTCACCACAAACAGAGACTAAAGC     554 bp        1
            SI_Rev    GTAAAATCAAGTCCACCRCG
matK        KIM 3F    CGTACAGTACTTTTGTGTTTACGAG      Avg 850 bp    2
            KIM 1R    ACCCAGTCCATCTGGAAATCTTGGTTC
            1329      TCTAGCACACGAAAGTCGAAGT         Avg ~880 bp   3, 4, 5
            320       CGATCTATTCATTCAATATTTC
            5R        GTTCTAGCACAAGAAAGTCG           Avg ~900 bp
            XF        TAATTTACGATCAATTCATTC
trnH-psbA   psbA3′f   GTTATGCATGAACGTAATGCTC         Avg ~450 bp   6
            trnH      CGCGCATGGTGGATTCACAATCC                      7

(a) 1. New primers developed at the Smithsonian; 2. Ki-Joong Kim, School of Life Sciences and Biotechnology, Korea University, Seoul, Korea, unpublished primers; 3. Cuenoud P, Savolainen V, Chatrou LV, Powell M, Grayer RJ, Chase MW (2002) Molecular phylogenetics of Caryophyllales based on nuclear 18S rDNA and plastid rbcL, atpB, and matK DNA sequences. Am J Bot 89:132–144; 4. http://www.kew.org/barcoding/protocols.html; 5. Soltis DE, Tago-Nakazawa M, Xiang Q-Y, Kawano S, Murata J, Wakabayashi M, Hibsch-Jetter C (2001) Phylogenetic relationships and evolution in Chrysosplenium (Saxifragaceae) based on matK sequence data. Am J Bot 88:883–893; 6. Sang T, Crawford DJ, Stuessy TF (1997) Chloroplast DNA phylogeny, reticulate evolution, and biogeography of Paeonia (Paeoniaceae). Am J Bot 84:1120–1136; 7. Tate JA, Simpson BB (2003) Paraphyly of Tarasa (Malvaceae) and diverse origins of the polyploid species. Syst Bot 28:723–737
Each of the three markers is handled differently in alignment. The rbcLa marker is aligned in Sequencher 4.8. Alignment is always unambiguous due to the absence of indel variation and all rbcLa sequences are readily aligned with each other in a global alignment. The global rbcLa alignment is then exported from Sequencher as an aligned nexus file for analyses (see below). 2. For alignment of matK, sequences are exported individually (i.e., unaligned) in FASTA file format from Sequencher. transAlign (16) is used to perform alignment via back-translation. The matK sequences (with one per species as available) are aligned simultaneously with each other in this manner and saved as an aligned FASTA file. That aligned matK FASTA file is then concatenated onto the rbcLa alignment using MacClade (17) to produce a two-gene alignment for all taxa. 3. For trnH-psbA, contigs are exported from Sequencher as unaligned FASTA files. FASTA sequences are partitioned taxonomically or by genetic distance for alignment, primarily by family. In cases where only one species per family or order is present in the plot, the trnH-psbA sequence of that species is not included in the phylogenetic alignment. Each set of taxonomically structured sequences is then aligned using Muscle (18). See Chapter 19 for more details on DNA sequence editing and alignment.
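The concatenation described in step 2, in which the matK alignment is joined onto the rbcLa alignment, can be sketched as follows. This is only an illustration of the idea and not a description of how MacClade works internally: the dictionary-based data layout, the function name, and the choice of "?" for a taxon missing a locus are all assumptions.

```python
def concatenate_alignments(alignments):
    """Join per-locus alignments into one multi-locus matrix.

    alignments: list of dicts mapping taxon name -> aligned sequence,
    one dict per locus (hypothetical layout). A taxon that lacks a locus
    is padded with '?' (missing data) across that locus.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    combined = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences within a locus must share one aligned length")
        width = lengths.pop()
        for taxon in taxa:
            combined[taxon] += aln.get(taxon, "?" * width)
    return combined


# Toy example: sp2 has no matK sequence, so that block is filled with '?'.
rbcla = {"sp1": "ATGC", "sp2": "ATGA"}
matk = {"sp1": "CC-G"}
matrix = concatenate_alignments([rbcla, matk])
print(matrix)
```

The same padding approach extends to family-level trnH-psbA blocks, where most taxa have no sequence in any given block.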
3.4. Field and Tissue Database
1. Voucher specimen information should include the following data (asterisk indicates critical data fields):
*Accession number—a unique number assigned to the specimen by the herbarium or research plot.
*Tree tag number—if a tagged and mapped tree is collected, then the unique number of that individual must be recorded.
Order—taxonomic rank above plant family rank.
*Family—taxonomic rank of the specimen below order and above genus.
*Genus—taxonomic rank of the specimen below family and above species.
*Species—taxonomic rank of the specimen below genus.
Below species rank—the rank below species name if known, i.e., subspecies, variety.
Below species epithet—the name below the specific epithet.
*Author—the authority who described the species.
*Collector number—the primary collector’s or team leader’s tracking number assigned to a specimen. For multiple samples of the same species, only one collector number should be assigned.
*Sample number—a unique number assigned to multiple collections of the same species, i.e., 1 of 4, 2 of 4, …, or 0104, 0204…
*Collector name—the primary collector or team leader whose numbering system is used.
Team collectors—other members of the collecting party.
*Collection date—date the sample was collected; establish a consistent data entry format for dates.
Species field code—a shortened taxonomic name specific for a species, usually composed of four letters from the genus name and two letters of the species name.
Herbarium accession number—a number assigned to the mounted voucher by the herbarium holding the collection.
Herbarium acronym—code assigned to the host herbarium by Index Herbariorum, published by the International Association for Plant Taxonomy.
Other ID numbers—any other identification number assigned that can be used for additional specimen identification.
Taxonomic determiner—name of person who made the taxonomic determination of the specimen.
Date of determination—date that the taxonomic determination was made.
Determiner’s address—email or physical address of determiner.
Common name—local name(s) for the specimen.
Cultivated—is the plant cultivated or not?
Cultivated name—the local name for the plant.
*Habitat—a description of the type of environment where the specimen was collected.
*Habit—a description of the general appearance of the plant.
*Plant description—a description of the plant, inflorescence, flowers, and other parts that can aid in species identification.
Taxonomic/specimen notes—any other information related to the specimen.
*Database editor—name of person entering or editing data.
*Entry date—date data is entered or changed.
2. Tissue sample information should include (asterisk indicates critical data fields):
*Accession number—same as entered for the collection information.
*Tree tag number—same as entered for the collection information.
*Genus—same as entered for the collection information.
*Species—same as entered for the collection information.
*Collector—same as entered for the collection information.
*Collector number—same as entered for the collection information.
*Sample number—same as entered for the collection information.
*Voucher specimen verification—is the tissue taken from a voucher, yes or no? If yes, record voucher number and pertinent information.
*Plate number (in reference to multi-well plate)—DNA extraction plate number.
*Row letter—the row letter printed on the DNA extraction plate where the sample is deposited.
*Column number—the column number printed on the DNA extraction plate where the sample is deposited.
*Type of tissue—is the tissue from a fresh source or from a herbarium specimen?
Notes on the tissue—this field is for any other comments that may be important for other studies.
*Database editor—name of person entering or editing data.
*Entry date—date data is entered or changed.
3. Geographic data should include (asterisk indicates critical data fields):
*Accession number—same as entered for the collection information.
*Tree tag number—same as entered for the collection information.
*Family—same as entered for the collection information.
*Genus—same as entered for the collection information.
*Species—same as entered for the collection information.
*Collector—same as entered for the collection information.
*Collector number—same as entered for the collection information.
*Sample number—same as entered for the collection information.
*Country—provenance.
*State/province—political subdivision of the region where the collection took place.
*Locality—city or neighborhood where collection occurred.
Elevation—altitude.
Latitude—GPS coordinate at collection site.
Longitude—GPS coordinate at collection site.
Geographic notes—any additional geographic information related to the collection site.
*Database editor—name of person entering or editing data.
*Entry date—date data is entered or changed.
4. Tissue repository data (asterisk indicates critical data fields):
*Accession number—same as entered for the collection information.
*Tree tag number—same as entered for the collection information.
Order—same as entered for the collection information.
*Family—same as entered for the collection information.
*Genus—same as entered for the collection information.
*Species—same as entered for the collection information.
Below species rank—same as entered for the collection information.
Below species epithet—same as entered for the collection information.
*Author—same as entered for the collection information.
*Collector number—same as entered for the collection information. For multiple samples of the same species, only one collector number should be assigned.
*Sample number—same as entered for the collection information.
*Collector name—same as entered for the collection information.
*Collection date—date the sample was collected; establish a consistent data entry format for dates.
*Plate number (in reference to multi-well plate)—storage plate number.
*Row letter—the row letter printed on the storage plate where the sample is deposited.
*Column number—the column number printed on the storage plate where the sample is deposited.
Date of storage—date tissue was submitted for storage.
*Database editor—name of person entering or editing data.
*Entry date—date data is entered or changed.

3.5. DNA Barcode Analyses
1. Identification—BLAST
Rates of sequence assignment to species. The rate at which each barcode marker is assigned to the correct species may be determined using BLAST (20). All recovered sequences at each of the three markers are formatted as both database and query; all query sequences are compared against the entire database library of sequences. For barcode sequences that vary within a species, all variant haplotypes are included within the database and queried in the BLAST searches. For species lacking intraspecific variation, only a single individual is included. A sequence is counted as being correctly assigned when that species has the highest Bit-Score among all candidates; a sequence is not counted as correctly assigned when the correct species is either tied with another species or receives a lower score. All barcodes are tested singly and in combination.
2. Community Phylogeny (see Note 2)
Assembly of barcode sequence data into a supermatrix. Assembly of the different sets of aligned trnH-psbA sequences into a supermatrix is achieved by sequentially concatenating them with the rbcLa + matK alignment in a supermatrix format as described below using MacClade. The resulting matrix is very sparse, often with more than 90% of the matrix consisting of missing data or gaps. Gaps are not coded and are treated as missing data in phylogenetic reconstruction.
Phylogenetic reconstruction. The community phylogeny is constructed using maximum likelihood (ML) and maximum parsimony (MP) algorithms. Three different marker combinations are normally examined for performance in phylogenetic reconstruction: rbcL + matK, rbcL + trnH-psbA, and rbcL + matK + trnH-psbA. For all combinations of markers, all species in the plot are included, with some sequences obtained from GenBank and used in conjunction with the other barcode sequences. ML analyses are conducted using RAxML (19) via the CIPRES supercomputer cluster. The different locus combinations are partitioned for independent model assessment at each marker. For all combinations of markers, a single most likely tree is estimated, and 200–250 bootstrap replicates are run depending on the marker set. The same gene combinations are used in MP analyses with PAUP v.4.0 (21), run through a local cluster in which a modification of the parsimony ratchet (23; following 24) is implemented. This method usually results in a very large number of equally parsimonious trees for the two-locus combinations. The three-gene matrix produces many fewer trees. For both ML and MP trees, a 50% majority consensus tree is constructed and used to quantify overall levels of support for each node within the trees, the rates of well-supported monophyly for taxonomic hierarchies (genus, family, and order), and concordance with expected topologies. Analyses can be conducted both with and without the use of a constraint tree, as described below.
Application of constraint trees. Constraint trees can be built such that all taxa are present, but within each order, species topology is not resolved. Thus, the topology of the plant orders in the plot is specified in accordance with APG III (25), but within each order, species are arrayed as a polytomy. This approach allows the topology of the species within each order to be resolved with the barcode sequence data, while the ordinal backbone of the tree is defined a priori (26). Constraint trees are implemented in both PAUP and RAxML such that only trees that conform to the APG III ordinal constraint tree are retained for analysis.

3.6. Notes
1. Tagged tree determinations are not always correct! Collecting samples from multiple individual trees allows the barcodes to serve as verification of conspecific identifications. Similarly, sequencing multiple genes is a good check on identifications and sample veracity. 2. The generation of community phylogenetic analyses is only in its infancy and new, improved methodologies are expected in the future.
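The scoring rule for BLAST-based identification in Subheading 3.5, step 1 (highest Bit-Score wins, and ties between species count as failures) can be written down as a short sketch. It assumes the BLAST hits for each query have already been parsed into (species, bit score) pairs; that input layout and both function names are hypothetical, and running BLAST itself is not shown.

```python
def correctly_assigned(true_species, hits):
    """Apply the assignment rule to one query.

    hits: iterable of (species, bit_score) pairs for the query
    (hypothetical layout, e.g. parsed from tabular BLAST output).
    Returns True only if the true species has a strictly higher best
    bit score than every other species; ties count as incorrect.
    """
    best = {}
    for species, bit_score in hits:
        best[species] = max(bit_score, best.get(species, float("-inf")))
    if true_species not in best:
        return False
    return all(best[true_species] > score
               for species, score in best.items()
               if species != true_species)


def assignment_rate(queries):
    """Fraction of queries assigned to the correct species.

    queries: list of (true_species, hits) pairs.
    """
    if not queries:
        raise ValueError("no queries supplied")
    return sum(correctly_assigned(sp, hits) for sp, hits in queries) / len(queries)


# Toy data: the first query is resolved, the second is tied, the third has no hit to its own species.
queries = [
    ("A", [("A", 900.0), ("B", 850.0)]),
    ("B", [("A", 700.0), ("B", 700.0)]),
    ("C", [("A", 500.0)]),
]
print(assignment_rate(queries))
```

Testing barcodes in combination would need a rule for merging scores across markers, which the text does not specify, so it is left out of this sketch.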
Acknowledgments
We thank Stuart Davies, Oris Sanjur, Eldredge Bermingham, and Nathan Swenson for help in developing the methodologies described here. The Smithsonian Institution and CTFS provided funding support.

References
1. Hubbell SP, Foster RB (1983) Diversity of canopy trees in a neotropical forest and implications for the conservation of tropical trees. In: Sutton SJ, Whitmore TC, Chadwick AC (eds) Tropical rain forest. Ecology and management. Blackwell Science, Oxford, pp 25–41
2. Condit R (1998) Tropical forest census plots: methods and results from Barro Colorado Island, Panama, and a comparison with other plots. Blackwell Scientific, New York
3. Scholes RJ, Mace GM, Turner W et al (2008) Toward a global biodiversity observing system. Science 321:1044–1045
4. Hubbell SP (2001) The unified neutral theory of biodiversity and biogeography. Princeton University Press, Princeton
5. Webb CO, Ackerly DD, McPeek MA, Donoghue MJ (2002) Phylogenies and community ecology. Annu Rev Ecol Syst 33:475–505
6. Westoby M (2006) Phylogenetic ecology at world scale, a new fusion between ecology and evolution. Ecology 87:S163–S165
7. Webb CO, Donoghue MJ (2005) Phylomatic, tree assembly for applied phylogenetics. Mol Ecol Notes 5:181–183
8. Webb CO (2000) Exploring the phylogenetic structure of ecological communities. An example for rain forest trees. Am Nat 156:145–155
9. Cavender-Bares J, Kozak K, Fine P, Kembel S (2009) The merging of community ecology and phylogenetic biology. Ecol Letters 12:693–715
10. Cavender-Bares J, Ackerly DD, Baum D, Bazzaz FA (2004) Phylogenetic overdispersion in Floridian oak communities. Am Nat 163:823–843
11. Kress WJ, Erickson DL, Jones FA et al (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Nat Acad Sci 106:18621–18626
12. Swenson NG, Enquist BJ, Thompson J, Zimmerman JK (2007) The influence of spatial and size scales on phylogenetic relatedness in tropical forest communities. Ecology 88:1770–1780
13. Wright SJ, Ackerly DD, Bongers F et al (2007) Relationships among ecologically important dimensions of plant trait variation in seven neotropical forests. Ann Bot 99:1003–1015
14. Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN (2007) Universal primer cocktails for fish DNA barcoding. Mol Ecol Notes 7:544–548
15. Gonzalez MA, Baraloto C, Engel J et al (2009) Identification of Amazonian trees with DNA Barcodes. PLoS One 4:e7483. doi:10.1371/journal.pone.0007483
16. Bininda-Emonds ORP (2005) transAlign, using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 6:156
17. Maddison DR, Maddison WP (2000) MacClade 4: analysis of phylogeny and character evolution, version 4.0. Sinauer Associates, Sunderland, MA
18. Edgar R (2004) MUSCLE, multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
19. Stamatakis A, Hoover P, Rougemont J (2008) A rapid bootstrap algorithm for the RAxML web-servers. Syst Biol 57:758–771
20. Altschul SF, Madden TL, Schäffer AA et al (1997) Gapped BLAST and PSI-BLAST, a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
21. Swofford DL (2003) PAUP* phylogenetic analysis using parsimony (* and other methods), version 4. Sinauer, Sunderland, MA
22. Zwickl DJ (2006) Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. PhD Dissertation, The University of Texas at Austin
23. Nixon KC (1999) The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15:407–414
24. Carolan JC, Hook ILI, Chase MW et al (2006) Phylogenetics of Papaver and related genera based on DNA sequences from ITS nuclear ribosomal and plastid trnL intron and trnL-F intergenic spacers. Ann Bot 98:141–155. doi:10.1093/aob/mcl079
25. APG III (2009) An update of the angiosperm phylogeny group classification for the orders and families of flowering plants, APG III. Bot J Linnean Soc 161:105–121
26. Kress WJ, Erickson DL, Swenson NG et al (2010) Improvements in the application of DNA barcodes in building a community phylogeny for tropical trees in a Puerto Rican forest dynamics plot. PLoS One 5:e15409. doi:10.1371/journal.pone.0015409
Chapter 23
Future Directions
David L. Erickson and W. John Kress

Abstract
It is a risky task to attempt to predict the direction that DNA barcoding and its applications may take in the future. In a very short time, the endeavor of DNA barcoding has gone from being a tool to facilitate taxonomy in difficult-to-identify species to an ambitious, global initiative that seeks to tackle such pertinent and challenging issues as quantifying global biodiversity, revolutionizing the forensic identification of species, advancing the study of interactions among species, and promoting the reconstruction of evolutionary relationships within communities. The core element of DNA barcoding will always remain the same: the generation of a set of well-identified samples collected and genotyped at one or more genetic barcode markers and assembled into a properly curated database. But the application of this body of data will depend on the creativity and need of the research community in using a “gold standard” of annotated DNA sequence data at the species level. We foresee several areas where the application of DNA barcode data is likely to yield important evolutionary, ecological, and societal insights; although the list is far from exhaustive, these areas illustrate how DNA barcode data will continue to empower scientists to address hypothesis-driven research. Three areas of immediate and obvious concern are (1) biodiversity inventories, (2) phylogenetic applications, and (3) species interactions.

Key words: DNA barcode, Biodiversity, Phylogenetics, Ecology, Trophic, CO1, Next-generation sequencing
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6_23, © Springer Science+Business Media, LLC 2012

1. Biodiversity Inventories
One of the original goals of DNA barcoding was to advance our ability to identify and quantify biodiversity (1–4). This goal is particularly relevant in a time when many hypothesize that anthropogenic changes to climate and land use may precipitate a wave of species extinctions (5, 6), mass changes in the distributions of species (7), and a widespread invasion of exotic species into new habitats (8, 9). The ability of DNA barcode data to quickly and relatively cheaply provide diagnostic identifications of species that are present in specific locations has immediate conservation and
environmental management implications (10, 11). Governmental agencies, which have long employed the concept of “indicator” species, are now experimenting with DNA barcoding to screen for native and invasive species (12, 13). For example, the Environmental Protection Agency in the USA has used DNA barcoding to screen for the presence or absence of diagnostic invertebrates in stream habitats as an indicator of water quality (14). Similarly, DNA barcoding will aid the diagnosis of the mixture of species that may exist in complex natural community structures, such as coral reefs (15), soils (16, 17), and limnological (lake and stream) environments. These applications of DNA barcoding rely on the existence of a robust reference library that will allow DNA sequences recovered from these studies to be assigned to known taxonomic groups. They may also rely on the use of DNA minibarcodes, which may better recover DNA sequence data from mixtures of samples (18; see Chapter 15, this volume) but which require very careful design in order to minimize errors in the fraction of samples recovered (e.g., 19). The large barcode campaigns for targeted taxa, such as FishBOL for fishes (e.g., Ward, this volume; http://ibol.org/thecampaign-to-dna-barcode-all-fishes-fishbol/) and the All Birds Barcode Initiative for birds (http://www.barcodingbirds.org/), and community-level campaigns, such as SIGEO (Kress et al. in this volume; http://www.sigeo.si.edu/), BIOCODE (http://www.mooreabiocode.org/), and the ACG lepidopteran project (20), are compiling large sets of taxa that contribute to biodiversity inventories. In both the taxonomic campaigns and the community studies, the reference DNA barcode library is being assembled while at the same time these campaigns fuel the discovery of novel diversity.
The role of DNA barcoding in new species discovery and the genetic characterization of previously recognized taxa are where the greatest promise and peril of DNA barcoding reside (21, 22). Estimates of global species diversity vary widely, although all agree that a large percentage of global biodiversity remains un-described, and DNA barcoding can help expedite the discovery and description of new species. Exactly how DNA barcode data are used to describe and quantify this unknown diversity will mark the long-term contribution of this methodology to science. The traditional taxonomic identification of new species is notoriously slow. Faced with the threat of widespread species extinctions, existing methods of describing new taxa are certainly inadequate. However, the assignment of species status to organisms based entirely on genetic divergence at a single gene so far has not been embraced by the broader taxonomic community (and probably never will be). Several substantial methodological issues are inherent in the concept of species defined by DNA distinction only, including the use of genetic distance thresholds, the
use of a single gene which has been shown to be readily transmitted via reticulation, and confounding genetic distance with phylogenetically informative characters. Just as importantly, the use of a DNA species concept will undercut the central theme of the biological species concept, which is that speciation is a process and by assembling the parts of that process (including morphological, geographic, behavioral, and genetic data) it is possible to understand what is a species and how it may arise. For DNA barcoding to fulfill its promise, a concerted effort must be made to ensure that novel genetic groups, which are delineated by DNA barcodes (e.g., BINs sensu BOLD; which are BOLD ID numbers that are assigned to sets of DNA sequences that are observed to cluster together when employing a clustering algorithm), are linked to morphological, reproductive, and geographic data, and perhaps most importantly, that the campaigns of DNA barcoding inspire a generation of taxonomists who will use DNA barcodes as a tool in species discovery alongside many of the more traditional methods of species identification. If we consider that DNA barcode records deposited in repositories like GenBank or BOLD need to be identified only to family level, it is not clear how useful those records will be. Those sequences submitted as BINs may form discrete groups in some clustering algorithms and may correspond to legitimate species. However, a BIN is not equivalent to a species, and our focus must be on the identification of biological diversity that corresponds to species (22, 23). Thus, the challenge of DNA barcoding is not to collect all possible BINs, but to meld the process by which DNA barcode sequences, and accompanying BIN categories, can be transformed into what are called species (24).
2. Phylogenetic Applications
The various DNA barcode campaigns (e.g., FishBOL, TreeBOL, iBOL, etc.) differ dramatically from Tree of Life (ToL) projects despite some initial similarities. Many proponents of DNA barcoding draw a bright line between the goal of DNA barcoding and phylogenetics. Most ToL projects are concerned with the deepest phylogenetic relationships among taxonomic orders and families using many genes, and now entire genomes, but in general are not concerned with resolving relationships at the species level. However, the fact that many large-scale DNA barcode campaigns are underway to collect genetic data from numerous species implies an application of these sequence data to phylogenetics (25). A conceptual tension will always exist between the minimalist goals of
DNA barcoding, to use the smallest amount of data possible to facilitate species identifications, and phylogenetic analysis, to use numerous genes for robust reconstruction of complex evolutionary relationships. Yet the application of sequence data from coding genes when collected from properly identified species to phylogenetic investigations is inevitable. In some cases, barcode data may comprise the first phylogenetic analysis that compares closely related species, which in turn is supplemented with additional loci as needed. The rapid divergence of CO1 among taxa has made sequence alignment sometimes challenging. This challenge may require modifications like RY coding of third position sites, or even translation into amino acids for estimation of homology among distant clades, but the problem is tractable and will likely begin to be applied more broadly. The use of supertrees is another application in which trees produced from alignments among the most closely related species are combined making the need for CO1 to be aligned among the widest divergences unnecessary. The more robust, lower scale alignments then can easily be combined to reconstruct phylogenetic relationships among distant clades, especially in community phylogenetic analysis. In plants the situation for phylogenetic application of DNA barcode sequence data is easier because several loci are normally employed and the barcode markers include genes that are commonly employed in phylogenetic analysis in the first place. Assessing homology in the most rapidly evolving markers (e.g., the intergenic spacer trnH-psbA) is facilitated through nested alignments ((26), see chapter 19, this volume) that fix alignments among more closely related groups with these loci, then unite these groups via a more conservative barcode region (e.g., rbcL) that can be aligned across all samples. 
The supertree methods mentioned above for animals are routinely employed in plant community phylogenetics, with online tools for pruning and grafting existing trees into a phylogeny that contains a desired set of species. Often this phylogenetic tree can be used as a constraint to specify deeper relationships among clades while leveraging DNA barcode data to resolve species level relationships. The applications of these DNA barcode-based phylogenies are diverse. Studies are already routinely employed that use traditional phylogenies to test community scale processes, such as the role of competition and environmental filtering in determining community membership as well as phylogeographic relationships (27, 28). A bigger goal is to fold DNA barcode data into ToL-type studies. This approach may argue for a second gene region to be added to animal barcode studies. An easily sequenced gene that is readily aligned across all animal species would facilitate the type of nested sequence matrix employed in plants enabling phylogenies across more taxa than CO1 alone could generate. When the same gene regions are used in subtrees of a supertree-type analysis (29),
the reconstruction of a supertree is much more robust. As sequencing becomes less expensive, and the technology for aligning and analyzing very large numbers of taxa improves, the global phylogenetic utility of DNA barcode data may be more thoroughly exploited, which will greatly empower ecologists to apply these sequence data to explicit hypothesis-driven questions in ecology and evolution (see chapter 20, this volume).
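The supertree step referred to above can be sketched as a matrix representation of the source trees, in the spirit of the matrix representation with parsimony approach discussed in (29). The sketch assumes the coding usually attributed to Baum and Ragan (one binary column per clade of each source tree, with "?" for taxa absent from that tree); the chapter does not state which coding was used, and the nested-tuple tree representation and all function names are hypothetical.

```python
# Sketch only: build a clade-coding matrix from rooted source trees.
# A tree is a nested tuple of taxon names, e.g. (("A", "B"), ("C", "D")).

def leaves(node):
    """Return the set of taxon names below a node (str leaf or tuple of children)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(tree):
    """Return leaf sets of all internal nodes except the root, in traversal order."""
    found = []

    def walk(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            walk(child, False)

    walk(tree, True)
    return found


def mrp_matrix(trees):
    """Map each taxon to a string of 1/0/? characters, one per source-tree clade."""
    all_taxa = sorted(set().union(*(leaves(t) for t in trees)))
    matrix = {taxon: "" for taxon in all_taxa}
    for tree in trees:
        in_tree = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in in_tree:
                    matrix[taxon] += "?"   # taxon not sampled in this source tree
                elif taxon in clade:
                    matrix[taxon] += "1"   # taxon belongs to this clade
                else:
                    matrix[taxon] += "0"   # taxon sampled but outside this clade
    return matrix


# Two overlapping source trees that share taxa C and D.
source_trees = [(("A", "B"), ("C", "D")), (("C", "E"), "D")]
matrix = mrp_matrix(source_trees)
```

The resulting matrix would then be analyzed with a parsimony program to obtain the supertree; that search is outside the scope of this sketch. The more taxa the source trees share, the fewer "?" entries the matrix contains.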
3. Species Interactions
The effects of the interactions among species on the ecology and evolution of organisms in a community are large, but at the same time difficult to quantify (30). Whether the interactions are between parasite and host, plant and herbivore, or mutualists, the central requirement for investigating these interactions is the ability to accurately identify the species involved. DNA barcodes provide these determinations. A classic example where DNA barcoding can critically assist such investigations is plant–herbivore interactions. Models of species formation, particularly in the hugely diverse group of herbivorous insects, often make implicit assumptions about the rate of specialization in the herbivore (e.g., insects in the tropics are more specialized, allowing for increased speciation) (31–33). The ability to directly diagnose the diets of these herbivores, while not entirely novel, is vastly improved through the assembly of DNA barcode reference libraries of the food plants. In addition, the suggestion that a generalist species is in fact a single species that feeds on many plants rather than a set of cryptic specialist species can now be tested. A second example is the effects of parasitism on animal behavior. If we can accurately determine which animals are carrying which parasites, we can then explore environmental correlates that may suggest how certain behaviors are affecting the rates of parasitism. As above, DNA barcoding will provide accurate determinations of the interacting species. Such investigations of interacting species may be furthered through the use of next-generation sequencing (NGS). NGS is a term loosely applied to the set of technologies used for genome-scale sequencing, such as Roche 454, Illumina, Ion Torrent, and others.
These technologies generate far more sequence reads than traditional Sanger sequencing and also allow for the pooling of many samples amplified with CO1 in which each sample represents a mixture of species. Machida and Knowlton (chapter 16, this volume) outline methods for employing 454 technology to examine species diversity in coral reef communities, which in many cases are so complex that the physical separation of component species for analyses is impossible. In a pair of related studies on animal diets in
piscivorous fish (34) and herbivorous voles (18), NGS combined with DNA barcoding was used both to capture the entire set of species involved in the interaction and to quantify how much of each species was consumed. The methodology was far more accurate, and likely more cost-effective, than traditional methods that try to extract component species with unique primers or that attempt to massively clone PCR products from conserved genes. The race to lower costs and improve the quality of NGS technologies will push studies of species interaction further into the realm of NGS (35) and better enable researchers to apply libraries of DNA barcodes to improve our understanding of not only what species occur in which environments, but also how such interactions contribute to community ecology.
In conclusion, the applications of DNA barcodes to test hypotheses in evolutionary biology and ecology are in their infancy. Significant challenges remain in using these short diagnostic sequences to answer scientific questions, particularly in bridging the divide between DNA barcode BINs and species designations for the discovery and inventory of diversity. The rewards of meeting these challenges are immense, and the benefit to the entire community of scientists will continue to expand.
References
1. Hebert PDN, Cywinska A, Ball S, deWaard J (2003) Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci 270:313–321
2. Vernooy R, Haribabu E, Muller MR et al (2010) Barcoding life to conserve biological diversity: beyond the taxonomic imperative. PLoS Biol 8:e1000417
3. Radulovici AE, Archambault P, Dufresne F (2010) DNA barcodes for marine biodiversity: moving fast forward? Diversity 2:450–472
4. Dinca V, Zakharov EV, Hebert PD, Vila R (2010) Complete DNA barcode reference library for a country’s butterfly fauna reveals high performance for temperate Europe. Proc R Soc Lond B Biol Sci 278:347–355. doi:10.1098/rspb.2010.1089
5. McLaughlin JF, Hellmann JL, Boggs CL, Ehrlich PL (2002) Climate change hastens population extinctions. Proc Natl Acad Sci USA 99:6070–6074. doi:10.1073/pnas.052131199
6. Ezard THG, Aze T, Pearson PN et al (2011) Interplay between changing climate and species’ ecology drives macroevolutionary dynamics. Science 332:349–351
7. Kelly AE, Goulden ML (2008) Rapid shifts in plant distribution with recent climate change. Proc Natl Acad Sci USA 105:11823–11826
8. Dukes JS, Mooney HA (1999) Does global change increase the success of biological invaders? TREE 4:135–139
9. Simberloff D (2000) Global climate change and introduced species in United States forests. Sci Total Environ 262:253–261
10. Armstrong KF, Ball SL (2005) DNA barcodes for biosecurity: invasive species identification. Philos Trans R Soc Lond B Biol Sci 360:1813–1823. doi:10.1098/rstb.2005.171
11. Dawnay N, Ogden R, McEwing R et al (2007) Validation of the barcoding gene COI for use in forensic genetic species identification. Forensic Sci Int 173:1–6
12. Pfrender ME, Hawkins CP, Bagley M et al (2010) Assessing macroinvertebrate biodiversity in freshwater ecosystems: advances and challenges in DNA-based approaches. Q Rev Biol 85:319–340
13. Stribling J (2006) Environmental protection using DNA barcodes or taxa? Bioscience 56:878–879
14. Pilgrim EM, Jackson SA, Swenson S et al (2011) Incorporation of DNA barcoding into a large-scale biomonitoring program: opportunities and pitfalls. J N Am Benthol Soc 30:217–231
15. Barber P, Boyce SL (2006) Estimating diversity of Indo-Pacific coral reef stomatopods through DNA barcoding of stomatopod larvae. Proc R Soc Lond B Biol Sci 273:2053–2061
16. Kesanakurti PR, Fazekas AJ, Burgess KS (2011) Spatial patterns of plant diversity below-ground as revealed by DNA barcoding. Mol Ecol 20:1289–1302
17. Floyd R, Abebe E, Papert A, Blaxter M (2002) Molecular barcodes for soil nematode identification. Mol Ecol 11:839–850
18. Soininen EM, Valentini A, Coissac E et al (2009) Analysing diet of small herbivores: the efficiency of DNA barcoding coupled with high-throughput pyrosequencing for deciphering the composition of complex plant mixtures. Front Zool 6:16. doi:10.1186/1742-9994-6-16
19. Ficetola GF, Coissac E, Zundel S et al (2010) An in silico approach for the evaluation of DNA barcodes. BMC Genomics 11:434
20. Janzen DH, Hajibabaei M, Burns JM et al (2005) Wedding biodiversity inventory of a large and complex Lepidoptera fauna with DNA barcoding. Proc R Soc Lond B Biol Sci 360:1835–1845
21. Hebert PDN, Penton EH, Burns JM et al (2004) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proc Natl Acad Sci USA 101:14812–14817
22. DeSalle R, Egan MG, Siddall M (2005) The unholy trinity: taxonomy, species delimitation and DNA barcoding. Philos Trans R Soc Lond B Biol Sci 360:1905–1916. doi:10.1098/rstb.2005.1722
23. Seberg O, Humphries CJ, Knapp S, Stevenson DW, Peterson G, Scharff N et al (2003) Shortcuts in systematics? A commentary on DNA-based taxonomy. TREE 18:63–65
24. Miller SE (2007) DNA barcoding and the renaissance of taxonomy. Proc Natl Acad Sci USA 104:4775–4776. doi:10.1073/pnas.0700466104
25. Chase MW, Fay MF (2009) Barcoding of plants and fungi. Science 325:682–683
26. Kress WJ, Erickson DL, Jones FA et al (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci USA 106:18621–18626
27. Schreeg LA, Erickson DL, Kress WJ, Swenson NG (2011) Phylogenetic analysis of local-scale tree soil associations in a lowland moist tropical forest. PLoS One 5:e13685. doi:10.1371/journal.pone.0013685
28. Uriarte M, Swenson NG, Robin L, Chazdon RL et al (2011) Trait similarity, shared ancestry and the structure of neighbourhood interactions in a subtropical wet forest: implications for community assembly. Ecol Lett 13:1503–1514
29. Pisani D, Wilkinson M (2002) Matrix representation with parsimony, taxonomic congruence, and total evidence. Syst Biol 51:151–155
30. Thompson JN (1999) The evolution of species interactions. Science 284:2116–2118. doi:10.1126/science.284.5423.2116
31. Novotny V et al (2002) Low host specificity of herbivorous insects in a tropical forest. Nature 416:841–844
32. Novotny V, Drozd P, Miller SE et al (2007) Why are there so many species of herbivorous insects in tropical rainforests? Science 313:1115–1118
33. Norton DA, Didham RK (2007) Comment on “Why are there so many species of herbivorous insects in tropical rainforests?”. Science 315:1666b
34. Leray M, Agudelo CN, Mills CM, Meyer CP (2011, submitted) Trophic interactions from COI fragment amplification of gut contents: methodological guidelines and case studies of two omnivorous reef fish species. PLoS ONE
35. Glenn TC (2011) Field guide to purchasing and using next-generation DNA sequencers. Mol Ecol Notes 11:759–769
W. John Kress and David L. Erickson (eds.), DNA Barcodes: Methods and Protocols, Methods in Molecular Biology, vol. 858, DOI 10.1007/978-1-61779-591-6, © Springer Science+Business Media, LLC 2012