Methods in Molecular Biology™
Series Editor John M. Walker School of Life Sciences University of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK
For other titles published in this series, go to www.springer.com/series/7651
Plant Epigenetics Methods and Protocols
Edited by
Igor Kovalchuk Department of Biological Sciences, University of Lethbridge, Lethbridge, Alberta, Canada
Franz J. Zemp Department of Biological Sciences, University of Lethbridge, Lethbridge, Alberta, Canada
Editors Igor Kovalchuk, MD, Ph.D. Department of Biological Sciences University of Lethbridge Lethbridge, Alberta Canada
Franz J. Zemp Department of Biological Sciences University of Lethbridge Lethbridge, Alberta Canada
ISSN 1064-3745 e-ISSN 1940-6029 ISBN 978-1-60761-645-0 e-ISBN 978-1-60761-646-7 DOI 10.1007/978-1-60761-646-7 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2010922364 © Springer Science+Business Media, LLC 2010 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, c/o Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Humana Press is a part of Springer Science+Business Media (www.springer.com)
Preface
The discovery of DNA as the genetic material brought great hope to scientists all over the world. It was believed that many of the lingering questions in genetics and the mechanisms of heredity would finally be answered. However, as is often the case in science, more questions arose out of this discovery. What defines a gene? What are the mechanisms of gene regulation? Further discovery and technological innovations brought about sequencing techniques that allowed the study of complete genomes from many organisms, including Arabidopsis and humans. Despite all the excitement surrounding these technologies, many features of the genome remained unclear. Peculiar characteristics in genome composition such as significant redundancy consisting of many repetitive elements and noncoding sequences, active transcriptional units with no protein product, and unusual sequences in promoter regions added to the mysteries of genetic make-up and gene regulation. Indeed, the more we discovered about the genome, the more difficult it became to understand the complexity of cellular function and regulation.
Out of the study of the intricacies of the genome and gene regulation arose a new science that was independent of actual DNA changes, but critical in maintaining gene regulation and genetic stability. Epigenetics, literally translated as “above genetics,” is the science that describes the mechanisms of heritable changes in gene regulation that do not involve modifications of the DNA sequence. These changes may last through somatic cell division and, in some cases, throughout multiple generations.
Epigenetics is perhaps one of the most popular and quickly evolving fields of modern science. Although the ideas behind epigenetics had already been developing in the late nineteenth and early twentieth centuries, major advances have only occurred within the last 10–15 years as the mechanisms surrounding epigenetic regulation began to be uncovered. It was hoped by many that the mysteries of gene regulation and inheritance that remained unanswered would finally be elucidated with the help of this new science. Since then, the understanding of the contribution of epigenetic regulation to cell function has helped scientists from many distinct fields of research such as molecular biology, population genetics, microbiology, ecology, developmental biology, and evolution.
Gene silencing as an epigenetic mechanism to control gene expression was first described in plants. This occurred with the beginning of the era of plant transgenesis, and almost undermined the new paradigm of improvement of plant performance via transgenic techniques. Silencing was a serendipitous discovery, as this finding revitalized the field of epigenetics. Phenomena such as plant acclimation and adaptation to stress, hybrid and heterozygote vigor (heterosis), plant tolerance to viral infection, transgenerational changes in genome stability, and paramutations, among others, are now considered excellent candidates for regulation via epigenetic mechanisms. Future studies involving various protocols for the analysis of methylation patterns, histone modifications, chromatin structure, and small RNA expression, the hallmarks of epigenetic regulation, will undoubtedly help to explain these phenomena. It will be exciting to discover how plants utilize these mechanisms to adapt to stress, and how we can manipulate these characters for the generation of better and hardier crops.
In this book we have collected a variety of protocols for the study of the function of small noncoding RNAs, DNA methylation, and histone modifications in plants. Where possible and appropriate, we present several protocols with different degrees of complexity. We also include protocols for plant transgenesis and the analysis of genome stability, with a discussion of their applications to epigenetic studies. It was our aim to put together a single manual that researchers in the field of plant epigenetics can turn to in the hope of explaining the many as yet unexplained phenomena in plant biology.
Lethbridge, AB, Canada
Igor Kovalchuk
Franz J. Zemp
Contents
Preface
Contributors
1 Analysis of DNA Methylation in Plants by Bisulfite Sequencing
Andrea M. Foerster and Ortrun Mittelsten Scheid
2 Analysis of Bisulfite Sequencing Data from Plant DNA Using CyMATE
Andrea M. Foerster, Jennifer Hetzl, Christoph Müllner, and Ortrun Mittelsten Scheid
3 Analysis of Locus-Specific Changes in Methylation Patterns Using a COBRA (Combined Bisulfite Restriction Analysis) Assay
Alex Boyko and Igor Kovalchuk
4 Detection of Changes in Global Genome Methylation Using the Cytosine-Extension Assay
Alex Boyko and Igor Kovalchuk
5 In Situ Analysis of DNA Methylation in Plants
Palak Kathiria and Igor Kovalchuk
6 Analysis of Mutation/Rearrangement Frequencies and Methylation Patterns at a Given DNA Locus Using Restriction Fragment Length Polymorphism
Alex Boyko and Igor Kovalchuk
7 Isoschizomers and Amplified Fragment Length Polymorphism for the Detection of Specific Cytosine Methylation Changes
Leonor Ruiz-García, Jose Antonio Cabezas, Nuria de María, and María-Teresa Cervera
8 Analysis of Small RNA Populations Using Hybridization to DNA Tiling Arrays
Martine Boccara, Alexis Sarazin, Bernard Billoud, Agnes Bulski, Louise Chapell, David Baulcombe, and Vincent Colot
9 Northern Blotting Techniques for Small RNAs
Todd Blevins
10 qRT-PCR of Small RNAs
Erika Varkonyi-Gasic and Roger P. Hellens
11 Cloning New Small RNA Sequences
Yuko Tagami, Naoko Inaba, and Yuichiro Watanabe
12 Genome-Wide Mapping of Protein–DNA Interaction by Chromatin Immunoprecipitation and DNA Microarray Hybridization (ChIP-chip). Part A: ChIP-chip Molecular Methods
Julia J. Reimer and Franziska Turck
13 Genome-Wide Mapping of Protein–DNA Interaction by Chromatin Immunoprecipitation and DNA Microarray Hybridization (ChIP-chip). Part B: ChIP-chip Data Analysis
Ulrike Göbel, Julia Reimer, and Franziska Turck
14 Metaanalysis of ChIP-chip Data
Julia Engelhorn and Franziska Turck
15 Chromatin Immunoprecipitation Protocol for Histone Modifications and Protein–DNA Binding Analyses in Arabidopsis
Stéphane Pien and Ueli Grossniklaus
16 cDNA Libraries for Virus-Induced Gene Silencing
Andrea T. Todd, Enwu Liu, and Jonathan E. Page
17 Detection and Quantification of DNA Strand Breaks Using the ROPS (Random Oligonucleotide Primed Synthesis) Assay
Alex Boyko and Igor Kovalchuk
18 Reporter Gene-Based Recombination Lines for Studies of Genome Stability
Palak Kathiria and Igor Kovalchuk
19 Plant Transgenesis
Alicja Ziemienowicz
Index
Contributors David Baulcombe • The Sainsbury Laboratory, John Innes Centre, Norwich, UK Bernard Billoud • Atelier de bioinformatique, Université Pierre et Marie Curie, Paris, France Todd Blevins • Pikaard Laboratory, Biology Department, Washington University, MO, USA Martine Boccara • Unité de Recherche en Génomique Végétale (URGV), INRA/CNRS/UEVE, Evry cedex, France Alex Boyko • Department of Biological Sciences, University of Lethbridge, Lethbridge, AB, Canada Agnes Bulski • CNRS UMR8186, Département de biologie, Ecole Normale Supérieure, Paris cedex, France Jose Antonio Cabezas • Departamento de Investigación Agroalimentaria, Instituto Madrileño de Investigación y Desarrollo Rural, Agrario y Alimentario, Alcalá de Henares, Spain María-Teresa Cervera • Departamento de Sistemas y Recursos Forestales, CIFOR, Madrid, Spain Louise Chapell • The Sainsbury Laboratory, John Innes Centre, Norwich, UK Vincent Colot • Unité de Recherche en Génomique Végétale (URGV), INRA/CNRS/UEVE, Evry cedex, France Nuria de María • Departamento de Sistemas y Recursos Forestales, CIFOR, Madrid, Spain Julia Engelhorn • Max Planck Institute for Plant Breeding Research, Köln, Germany Andrea M. Foerster • Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna, Austria Ulrike Göbel • Max Planck Institute for Plant Breeding Research, Köln, Germany Ueli Grossniklaus • Institute of Plant Biology & Zürich-Basel Plant Science Center, University of Zürich, Zürich, Switzerland Roger P. 
Hellens • HortResearch, Mt Albert Research Centre, Auckland, New Zealand Jennifer Hetzl • Institute for Computer Graphics and Algorithms, Vienna University of Technology, Vienna, Austria Naoko Inaba • Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan Palak Kathiria • Department of Biological Sciences, University of Lethbridge, Lethbridge, AB, Canada Igor Kovalchuk • Department of Biological Sciences, University of Lethbridge, Lethbridge, AB, Canada Enwu Liu • NRC Plant Biotechnology Institute, Saskatoon, SK, Canada
Ortrun Mittelsten Scheid • Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna, Austria Christoph Müllner • Institute for Computer Graphics and Algorithms, Vienna University of Technology, Vienna, Austria Jonathan E. Page • NRC Plant Biotechnology Institute, Saskatoon, SK, Canada Stéphane Pien • Institute of Plant Biology & Zürich-Basel Plant Science Center, University of Zürich, Zürich, Switzerland Julia J. Reimer • Max Planck Institute for Plant Breeding Research, Köln, Germany Leonor Ruiz-García • Departamento de Biotecnología y Protección de Cultivos, Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario (IMIDA), Murcia, Spain Alexis Sarazin • CNRS UMR8186, Département de biologie, Ecole Normale Supérieure, Paris cedex 05, France Yuko Tagami • Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan Andrea T. Todd • NRC Plant Biotechnology Institute, Saskatoon, SK, Canada Franziska Turck • Max Planck Institute for Plant Breeding Research, Köln, Germany Erika Varkonyi-Gasic • HortResearch, Mt Albert Research Centre, Auckland, New Zealand Yuichiro Watanabe • Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan Alicja Ziemienowicz • Department of Biological Sciences, University of Lethbridge, Lethbridge, AB, Canada
Chapter 1
Analysis of DNA Methylation in Plants by Bisulfite Sequencing
Andrea M. Foerster and Ortrun Mittelsten Scheid
Abstract
Methylation of cytosines is a very important epigenetic modification of genomic DNA in many different eukaryotes, and it is frequently involved in transcriptional regulation of genes. In plants, DNA methylation is regulated by a complex interplay between several methylating and demethylating enzymes. Analysis of the resulting cytosine methylation patterns with the highest resolution is achieved after sodium bisulfite treatment, deaminating nonmethylated cytosines to uracil. Subsequent PCR and sequence analysis of individual amplicons displays the degree, position, and sequence context of methylation of every cytosine residue in individual genomic sequences. We describe the application of bisulfite sequencing for the analysis of DNA methylation at defined individual sequences of plant genomic DNA.
Key words: DNA methylation, 5-methylcytosine (5mC), Bisulfite sequencing, Bisulfite primer design, Bisulfite conversion control
1. Introduction
Methylation at position 5 of cytosines is a major epigenetic modification in eukaryotes and the only known covalent change of plant genomic DNA itself. Alterations in DNA methylation are frequently involved in transcriptional gene regulation (1, 2). Therefore, there is a great interest in analyzing cytosine methylation levels and distribution within the genome. Methylated and unmethylated cytosines can be distinguished by bisulfite genomic sequencing at single-base resolution. Treating genomic DNA with sodium bisulfite converts unmethylated cytosine to uracil, while 5-methylcytosine remains unchanged (Fig. 1). Amplification by the polymerase chain reaction (PCR) of converted DNA followed by sequencing reveals positions of 5-methylcytosine in the sequence of interest.
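The conversion principle can be illustrated with a minimal in silico sketch. It assumes a single strand given 5′ to 3′ and a hypothetical list of methylated positions supplied by the user; the function name and arguments are illustrative and not part of the protocol. The example sequence is the nonconverted control primer region given in Note 5.

```python
def bisulfite_convert(seq, methylated=()):
    """Return the sequence read after bisulfite treatment and PCR.

    seq        -- one DNA strand, 5'->3', letters A/C/G/T
    methylated -- 0-based positions of 5-methylcytosines (hypothetical input)
    """
    protected = set(methylated)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            out.append("T")   # unmethylated C -> U, read as T after PCR
        else:
            out.append(base)  # 5mC and all other bases stay unchanged
    return "".join(out)


# Control primer region from Note 5 (At5g66750), assumed fully unmethylated.
unconverted = "CGTCTGGTGATTCACCCACTTCTGTTCTCAACG"
print(bisulfite_convert(unconverted))
# TGTTTGGTGATTTATTTATTTTTGTTTTTAATG (identical to primer BScontrol2F)

# A methylated cytosine at position 0 would be retained as C:
print(bisulfite_convert(unconverted, methylated=[0]))
```

Only one strand is modeled here; after conversion the two strands are no longer complementary, so the bottom strand has to be treated as a separate input.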
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_1, © Springer Science + Business Media, LLC 2010
Fig. 1. Bisulfite conversion. DNA is denatured and then treated with sodium bisulfite, causing deamination of unmethylated cytosine to uracil, which is converted to thymine by PCR. [Scheme: genomic DNA fragments (C:G, mC:G, C:G) → denaturation and sodium bisulfite treatment (U, mC, U) → PCR, 1st cycle (U:A, mC:G, U:A) → PCR amplification (T:A, C:G, T:A)]
This principle, first described by Frommer et al. (3) and Clark et al. (4), has since undergone several experimental simplifications and refinements and is widely applied to DNA from many different organisms. It is also applied for genome-wide analysis of DNA methylation (5–7). Here, we describe a simple and reliable protocol for DNA methylation detection by bisulfite sequencing of a specific target sequence. To succeed in generating meaningful data, complete conversion of unmethylated cytosines is the most important step. This is achieved by incubating genomic DNA in a high bisulfite concentration at high temperature and low pH. The conversion procedure and subsequent purification lead to DNA fragmentation and DNA loss, respectively, requiring a balance between conversion efficiency and DNA stability. Based on our experience and in the interest of reproducible experiments, we recommend commercially available kits for the conversion procedure, and here we will focus on crucial pre- and postconversion steps. Among these are the DNA preparation, conversion control, the design of bisulfite primers, and cloning of amplified sequences. Tools for data analysis and comments on interpretation are described in the chapter “Analysis of bisulfite sequencing data from plant DNA using CyMATE.”
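Because complete conversion is the most important step, it can help to quantify it from sequenced clones of a control region known to be unmethylated. The following is a minimal sketch under stated assumptions: the clones are already aligned to the unconverted reference (same length, no gaps), the function name and the example sequences are hypothetical, and the 95% threshold is the one given in Subheading 3.2.

```python
def conversion_rate(reference, clones):
    """Fraction of reference cytosines read as T across all clones.

    reference -- unconverted sequence of a region known to be unmethylated
    clones    -- sequenced clones aligned to the reference
                 (same length, no gaps; an assumption of this sketch)
    """
    converted = total = 0
    for clone in clones:
        if len(clone) != len(reference):
            raise ValueError("clone is not aligned to the reference")
        for ref_base, read_base in zip(reference.upper(), clone.upper()):
            if ref_base != "C":
                continue
            if read_base == "T":
                converted += 1
                total += 1
            elif read_base == "C":
                total += 1
            # any other base is treated as a sequencing error and ignored
    if total == 0:
        raise ValueError("no informative cytosines")
    return converted / total


reference = "ACCGTCA"                       # hypothetical control region
clones = ["ATTGTTA", "ATTGTTA", "ATCGTTA"]  # third clone keeps one C
rate = conversion_rate(reference, clones)
print(f"{rate:.1%}", "usable" if rate > 0.95 else "repeat conversion")
```

In this toy example 8 of 9 cytosines are converted, so the sample would fall below the threshold.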
2. Materials
2.1. Extraction and Pretreatment of Genomic DNA
1. Nucleon PhytoPure (Amersham Biosciences) (see Note 1).
2. RNase A, DNase- and protease-free (10 mg/ml, Fermentas) (see Note 2).
3. Restriction endonuclease and appropriate buffer (see Note 3).
4. 3 M sodium acetate.
5. Ethanol, absolute and 70%.
2.2. Sodium Bisulfite Conversion and PCR Amplification
1. EpiTect Bisulfite Kit (Qiagen) (see Note 4).
2. Primers for conversion control (see Note 5).
3. Primers for the region under investigation (see Note 6).
4. TrueStart Taq DNA Polymerase (Fermentas) (see Note 7).
5. dNTP set: 100 mM aqueous solutions at pH 7.0 of each of dATP, dCTP, dGTP, and dTTP (Fermentas or equivalent product).
6. QIAquick Gel Extraction Kit (Qiagen or equivalent product).
2.3. Cloning and Sequencing of PCR Products
1. pGEM-T Easy Vector System (Promega or equivalent product).
2. Competent E. coli (DH5α).
3. LB solution containing 50 µg/ml ampicillin.
4. LB plates containing 50 µg/ml ampicillin, 0.5 mM IPTG, and 80 µg/ml X-Gal.
3. Methods
3.1. Extraction and Pretreatment of Genomic DNA
1. Extract genomic DNA from the plant material under investigation according to the manufacturer's instructions. To achieve optimal bisulfite conversion rates, genomic DNA needs to be clean and intact (see Note 1) and should come from young and healthy tissue. As an example, 100 mg of 3-week-old Arabidopsis seedlings gives good and reproducible results. Additional RNase treatment is recommended (see Note 2) and can be applied during cell lysis (30 min at 37°C). After the preparation, resuspend the genomic DNA in 50 µl sterile water, heat it to 55°C for 30 min with constant gentle shaking (600 rpm), and keep it on ice until use. Alternatively, the dissolved DNA can be kept at 4°C overnight.
2. Measure the DNA concentration photometrically and check DNA integrity by gel electrophoresis of a 1 µl aliquot. The DNA should appear as a single band. Digest 2 µg genomic DNA with an appropriate restriction enzyme (see Note 3) (5–10 U/µg genomic DNA in the recommended buffer) and incubate overnight.
3. Precipitate the digested DNA with 1/10 volume of 3 M sodium acetate and three volumes of absolute ethanol (−20°C, >2 h). Centrifuge and remove the supernatant, and wash the pellet with 500 µl 70% ethanol. Dry the pellet and resuspend it in 20 µl sterile water. Keep the digested DNA at 4°C until use, but for no longer than a month. The sample is now ready for bisulfite conversion.
3.2. Sodium Bisulfite Conversion and PCR Amplification
1. One of the most important and critical issues for successful bisulfite sequencing is accurate primer design. This is challenging because the degree of methylation, and thereby the expected sequence after conversion, is the experimental question and is not known beforehand. To ensure unbiased results, cytosine residues at primer binding sites should be matched by degenerate bases in the primers, but the number of degenerate positions should be kept small. This places special constraints on the primers and their location on the DNA template. In addition, the two DNA strands need to be analyzed separately, since they are no longer complementary after conversion. With some experience, manual selection of bisulfite primer sets has worked well for us, but there is also a software primer design tool for bisulfite-converted plant genomic DNA (8) (for more details about bisulfite primer design, see Note 6).
2. As stated in the Introduction (see also Note 4), we recommend using a commercially available bisulfite conversion kit to ensure complete and reproducible conversion. Perform the procedure with the desired amount of DNA and according to the protocol supplied with the kit. We have improved the results by extending the denaturation step at 99°C by an extra 5 min, and by adding an additional 2 h conversion step at 60°C before the final hold step at 20°C. We use a PCR machine to control the temperature and duration of the denaturation and incubation steps.
3. Check the completeness of conversion by PCR with primers matching a fully converted or a nonconverted site in a region known to be unmethylated (see Note 5 and Fig. 2).
Fig. 2. The expected pattern after conversion control PCR, using two different primer sets distinguishing between unconverted DNA (BScontrol1) and converted DNA (BScontrol2)
Typical PCR conditions for conversion control primer sets are:
Reaction set-up (see Note 7):
Sterile water                     11.3 µl
10× TrueStart Taq buffer           2.5 µl
dNTP mix, 2 mM each                2.5 µl
Forward primer, 10 µM              2.0 µl
Reverse primer, 10 µM              2.0 µl
MgCl2, 25 mM                       1.5 µl
TrueStart Taq DNA Polymerase       0.2 µl
Converted DNA                      3.0 µl
Total volume                      25.0 µl
Thermal cycling conditions:
95°C, 2 min (1 cycle)
95°C, 30 s; 50°C, 30 s; 68°C, 30 s (35 cycles)
68°C, 2 min (1 cycle)
Optional: To determine the conversion efficiency, the amplicon from the conversion control PCR (primer set: BScontrol2F and BScontrolR) can be sequenced, or cloned and sequenced (see below). Only samples with high conversion rates (>95% of all C converted to T) should be used for amplification of the experimental region.
4. If the control indicates full conversion of the DNA, start PCR of the target region with bisulfite primers. We recommend using a hot-start Taq polymerase with good performance (see Note 7). Typical PCR conditions for the experimental target primer sets (see Note 8):
Reaction set-up:
Sterile water                     11.3 µl
10× TrueStart Taq buffer           2.5 µl
dNTP mix, 2 mM each                2.5 µl
Forward primer, 10 µM              2.0 µl
Reverse primer, 10 µM              2.0 µl
MgCl2, 25 mM                       1.5 µl
TrueStart Taq DNA Polymerase       0.2 µl
Converted DNA                      3.0 µl
Total volume                      25.0 µl
Thermal cycling conditions (see Note 8):
95°C, 2 min (1 cycle)
95°C, 30 s; X°C, 30 s; 68°C, Y s (35–45 cycles)
68°C, 2 min (1 cycle)
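When several samples are amplified in parallel, the reaction set-up can be scaled into a master mix. The sketch below uses the per-reaction volumes from the table; the 10% pipetting excess and the function name are assumptions made for illustration and are not part of the protocol.

```python
# Per-reaction volumes (µl) from the reaction set-up; template is added per tube.
PER_REACTION_UL = {
    "Sterile water": 11.3,
    "10x TrueStart Taq buffer": 2.5,
    "dNTP mix, 2 mM each": 2.5,
    "Forward primer, 10 µM": 2.0,
    "Reverse primer, 10 µM": 2.0,
    "MgCl2, 25 mM": 1.5,
    "TrueStart Taq DNA Polymerase": 0.2,
}
TEMPLATE_UL = 3.0  # converted DNA, added separately to each tube


def master_mix(n_reactions, excess=0.10):
    """Scale the mix for n reactions plus a pipetting excess (assumed 10%)."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


mix = master_mix(8)
for name, vol in mix.items():
    print(f"{name:32s}{vol:7.2f} µl")

aliquot = round(sum(PER_REACTION_UL.values()), 1)
print(f"Aliquot {aliquot} µl per tube, then add {TEMPLATE_UL} µl converted DNA")
```

Keeping the template out of the master mix lets the same mix serve different converted samples and a no-template control.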
5. Analyze a 5 µl aliquot of the PCR reaction by electrophoresis on an agarose gel. If the amplicon is a single sharp band of the expected size, the PCR product can be used directly for cloning. If the expected product is visible but accompanied by some unspecific bands or a smear, we recommend gel extraction of the specific amplicon using an appropriate column, e.g., from the QIAquick Gel Extraction Kit (Qiagen), prior to cloning. If the band is not visible or hardly visible, load the whole PCR reaction and perform gel extraction (see Note 9).
3.3. Cloning and Sequencing of PCR Products
1. If there is a sufficient amount of converted DNA at the start of the PCR, the amplicon represents different copies of genomic DNA. Therefore, comparison of several individual plasmid clones obtained via PCR allows a statistical analysis. It also reveals the variability of methylation patterns among genomic copies. Efficient cloning can be achieved using T/A cloning systems that include blue/white screening for recombinant plasmids (see Note 10). Add 3 µl of the purified PCR product to a 10 µl ligation mix, and use the pGEM-T Easy Vector System for optimal results. Ligation is performed at 4°C overnight. Five microliters of the ligation mix are used to transform competent E. coli. The transformed bacteria are plated on LB-Amp/IPTG/X-Gal agar plates and incubated overnight at 37°C.
2. White colonies (15–20) are picked from these plates and dipped briefly into a PCR tube containing colony PCR mix prior to inoculation into 2 ml LB-Amp and incubation at 37°C. Colony PCR mix using standard PCR solutions (see Note 11):
Reaction set-up:
Sterile water                                 14.4 µl
10× standard PCR buffer (containing MgCl2)     2.0 µl
dNTP mix, 2.5 mM each                          1.5 µl
Primer M13 forward, 10 µM                      1.0 µl
Primer M13 reverse, 10 µM                      1.0 µl
Taq DNA Polymerase (5 U/µl)                    0.1 µl
Total volume                                  20.0 µl
Thermal cycling conditions:
95°C, 2 min (1 cycle)
95°C, 30 s; 60°C, 30 s; 72°C, 30 s (25–30 cycles)
72°C, 2 min (1 cycle)
3. Run the PCR products on agarose gels, identify those with the expected size, and prepare plasmid DNA from the corresponding liquid cultures. These plasmids can be checked further for the correct insert size by EcoRI digestion or prepared directly for sequence analysis. We recommend analyzing at least ten independent clones per amplicon by sequencing. Sequencing is done by standard methods, priming with M13 forward or M13 reverse primers (see Note 12).
4. The sequence analysis of individual clones provides useful information about the degree of methylation in each clone and at each cytosine residue, as well as the sequence context of methylated cytosines. The alignment of individual sequence files is not trivial due to different reading starts and sequence heterogeneity after conversion. A manual comparison of sequences can be facilitated by aligning the whole set of sequence files and creating a blunt-ended multiple sequence alignment, excluding primer and vector sequences (see Note 13). There are several publicly available software tools supporting the analysis of bisulfite data using algorithms that consider the plant-specific diversity of DNA methylation (8–10).
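The evaluation of a blunt-ended alignment can be sketched in a few lines. This is a simplified illustration, not the algorithm of any of the cited tools: it assumes top-strand clones of the same length as the unconverted reference, with primer and vector sequences already removed, removes identical clones as suggested in Note 13, and reports each cytosine with its sequence context (CG, CHG, or CHH, where H is A, C, or T). Function names and example sequences are hypothetical.

```python
def context(ref, i):
    """Sequence context of the cytosine at index i: CG, CHG or CHH."""
    nxt = ref[i + 1:i + 3]
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) == 2 and nxt[1] == "G":
        return "CHG"
    if len(nxt) == 2:
        return "CHH"
    return "unknown"  # too close to the 3' end to classify


def methylation_table(ref, clones):
    """Per-cytosine methylation (%) from aligned top-strand clones."""
    ref = ref.upper()
    # Drop identical clones, which likely reflect PCR redundancy (Note 13).
    unique = list(dict.fromkeys(c.upper() for c in clones))
    if any(len(c) != len(ref) for c in unique):
        raise ValueError("all clones must be aligned to the reference")
    rows = []
    for i, base in enumerate(ref):
        if base != "C":
            continue
        calls = [c[i] for c in unique if c[i] in "CT"]  # C = methylated, T = not
        if calls:
            pct = 100.0 * calls.count("C") / len(calls)
            rows.append((i + 1, context(ref, i), pct))
    return rows


ref = "TCGACAGTCTTA"        # hypothetical unconverted reference
clones = ["TCGACAGTTTTA",   # CG and CHG cytosines methylated
          "TCGATAGTTTTA",   # only the CG cytosine methylated
          "TTGATAGTTTTA",   # unmethylated
          "TCGACAGTTTTA"]   # duplicate of the first clone, removed
rows = methylation_table(ref, clones)
for pos, ctx, pct in rows:
    print(f"position {pos:3d}  {ctx:3s}  {pct:5.1f}% methylated")
```

Bottom-strand amplicons would need the complementary logic (G read as A), which is omitted here for brevity.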
4. Notes
1. Bisulfite sequencing requires clean and high molecular weight genomic DNA. In our hands, DNA preparations using standard lab protocols, even if they are sufficient for routine PCR, have not been successful for good bisulfite conversion. However, the positive experience with PhytoPure does not exclude using other procedures from other suppliers that provide good quality genomic DNA. This is also true for all other recommended chemicals, enzymes, and kits.
2. RNase A treatment should always be performed during or subsequent to genomic DNA extraction. If conversion problems occur, we recommend an additional Proteinase K treatment of genomic DNA upon extraction to eliminate possible protein contamination.
3. Cleavage of genomic DNA by restriction endonucleases is recommended prior to conversion to ensure the release of secondary structures and allow for full denaturation. This is important, as sodium bisulfite can only react with cytosine residues in single-stranded DNA. The enzyme should be chosen to yield sufficiently large fragments (a six-base recognition site) and not to be inhibited by DNA methylation. For example, suitable enzymes are BamHI, DraI, or SspI, which all give an approximate fragment size of 4 kb. Care has to be taken not to cut in the sequence between the bisulfite primers.
4. Most commercially available bisulfite conversion kits are applicable to a very small amount of DNA and guarantee reproducible conversion rates and DNA integrity. They support DNA denaturation and include DNA protection buffers to prevent DNA fragmentation due to depurination caused by harsh conversion conditions. They also simplify and improve tedious purification procedures after conversion, and speed up the analysis. The following bisulfite conversion kits, in alphabetical order of supplier and without any claim to completeness, function equally well: MethylDetector Bisulfite Modification Kit (Active Motif), Methyl SEQr Bisulfite Conversion Kit (Applied Biosystems), Methyl Easy Kit (Diagenode), MethylCode Bisulfite Conversion Kit (Invitrogen), EZ DNA Methylation Kit (Zymo Research).
5. A decisive factor for the analysis is the degree of conversion of unmethylated cytosines in a DNA sample. If unmethylated cytosines are not completely modified, all the following results are meaningless. There are different options to verify completeness of the reaction. A genomic DNA sample can be spiked with unmethylated DNA (3), e.g., with plasmid DNA amplified in an appropriate bacterial strain or by PCR. Alternatively, bisulfite-converted DNA is analyzed by two PCR reactions using primers that match either the fully converted or the nonconverted sequence in the same C-rich region of a genomic target known to be unmethylated (7, 9). We recommend the latter option, as this control is more accurate and closer to the conditions of the experimental region. As an example, for the analysis of Arabidopsis DNA, the following upstream primers specific for either nonconverted or converted DNA (At5g66750) have yielded good results (Fig. 2) in combination with the same downstream primer in a C-free region.
BScontrol1F (nonconverted DNA): 5′-CGTCTGGTGATTCACCCACTTCTGTTCTCAACG-3′
BScontrol2F (converted DNA): 5′-TGTTTGGTGATTTATTTATTTTTGTTTTTAATG-3′
BScontrolR (unbiased): 5′-CTCTCACTTTCTATCCCATTCTA-3′
6. PCR products derived from bisulfite-treated DNA should not exceed 500 bp. We have obtained the best results with primer sets generating amplicons of 200–300 bp. Primers should contain degenerate nucleotides (R for A/G in one primer, Y for C/T in the other primer) to allow unbiased amplification of methylated and unmethylated DNA. Not more than two to three degenerate sites should be present in one primer (see below). Try to find regions relatively rich in Gs and poor in Cs on the strand you are interested in. Due to conversion, the top and bottom strand are no longer complementary, which may lead to strand-specific amplification. The 3′ primer complementary to the sequenced strand should contain degenerate nucleotides for A/G (R), as it should potentially bind to uracil converted from unmethylated cytosines and to unchanged methylated cytosines in the first round of amplification. The 5′ primer homologous to the sequenced strand should contain Y, standing for C/T, to anneal to the already amplified strand generated with the 3′ primer. Repeats of dinucleotides (e.g., ATATATAT) or primers with long runs (>4 bases) of a single base should generally be avoided, as they can misprime. As in other standard PCR reactions, primer design should avoid regions of homology outside of the target. Therefore, we recommend running a BLAST search with the designed primers, if genome information for the organism is available. In addition, primers should be designed to avoid secondary structure and primer dimer formation. The primer length can be varied to adjust annealing temperatures; primers of 21 to 28 nucleotides have worked well in our hands. The melting temperature can range between 48 and 60°C, but the primers of a pair should differ by no more than 4°C in melting temperature. We recommend running an initial gradient PCR with any new bisulfite primer set to find the most appropriate annealing temperature. If enough bisulfite-treated sample is available to use converted DNA for primer testing, we strongly advise doing so.
7. The use of a hot-start polymerase from this or other suppliers is recommended to avoid nonspecific primer amplification.
8. Reaction conditions like the annealing temperature X and the extension time Y need to be adjusted for each amplicon and primer set depending on the melting temperature, distance, and base composition (see also Note 6). If one primer contains more degenerate sites than the other, we recommend an adjustment of primer concentration in the PCR set-up, i.e., a higher amount of the primer with the higher number of wobble bases. The equilibrium between molar ratios for each primer pair is very important, as the formation of hybrid PCR products can occur during PCR. An incomplete extension product is able to act as a highly efficient primer in a subsequent PCR cycle, resulting in the formation of a hybrid product containing information from the bottom and top strand (11).
9. We use 1.5% agarose gels, but the concentration should be adjusted to the size of the PCR product. In case of low efficiency of the bisulfite PCR, it is advisable to load the entire PCR reaction and gel-purify the band. Column purification of gel-extracted PCR products is very efficient. A sample should not be diluted too much; elution in 20–30 µl sterile water from the column works best. The elution step should be repeated with the first eluate. We do not recommend any nested PCR approaches, since they can cause some bias toward either methylated or unmethylated targets and may increase redundancy rather than enhancing sensitivity or specificity.
10. We compared blunt end cloning and T/A cloning systems. T/A cloning was much more efficient and gave more transformants with an expected insert. Although ligation should be unbiased with regard to direction, we observed a preferential insert orientation of the cloned fragment, which could be due to a particular sequence composition of bisulfite-converted DNA.
11. Sequence for colony PCR primers: M13 forward
5′ GTA AAA CGA CGG CCA G 3′
M13 reverse
5′ CAG GAA ACA GCT ATG AC 3′
12. The amplified converted target sequence is often repetitive and A/T-rich. Some sequencing systems may have difficulty producing sequencing runs of sufficient length and quality. In our hands, the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) is very useful.
13. For significantly methylated genomic regions, patterns of individual clones are usually so diverse that clones with identical sequences indicate PCR-generated redundancy rather than identical genomic templates. It is advisable to exclude duplicates from quantitative analysis.
References
1. Chan SW, Henderson IR, Jacobsen SE (2005) Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat Rev Genet 6(5):351–360
2. Zilberman D, Gehring M, Tran RK, Ballinger T, Henikoff S (2007) Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat Genet 39(1):61–69
3. Frommer M, McDonald LE, Millar DS, Collis CM, Watt F, Grigg GW et al (1992) A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci U S A 89(5):1827–1831
4. Clark SJ, Harrison J, Paul CL, Frommer M (1994) High sensitivity mapping of methylated cytosines. Nucleic Acids Res 22(15):2990–2997
5. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD et al (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452(7184):215–219
6. Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH et al (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133(3):523–536
7. Reinders J, Delucinge Vivier C, Theiler G, Chollet D, Descombes P, Paszkowski J (2008) Genome-wide, high-resolution DNA methylation profiling using bisulfite-mediated cytosine conversion. Genome Res 18(3):469–476
8. Gruntman E, Qi Y, Slotkin RK, Roeder T, Martienssen RA, Sachidanandam R (2008) Kismeth: analyzer of plant methylation states through bisulfite sequencing. BMC Bioinformatics 9:371–384
9. Hetzl J, Foerster AM, Raidl G, Mittelsten Scheid O (2007) CyMATE: a new tool for methylation analysis of plant genomic DNA after bisulphite sequencing. Plant J 51(3):526–536
10. Grunau C, Schattevoy R, Mache N, Rosenthal A (2000) MethTools – a toolbox to visualize and analyze DNA methylation data. Nucleic Acids Res 28(5):1053–1058
11. Warnecke PM, Stirzaker C, Song J, Grunau C, Melki JR, Clark SJ (2002) Identification and resolution of artifacts in bisulfite sequencing. Methods 27(2):101–107
Chapter 2
Analysis of Bisulfite Sequencing Data from Plant DNA Using CyMATE
Andrea M. Foerster, Jennifer Hetzl, Christoph Müllner, and Ortrun Mittelsten Scheid
Abstract
Amplifying and sequencing DNA after bisulfite treatment of genomic DNA reveals the methylation state of cytosine residues at the highest resolution possible. However, a thorough analysis is required for statistical evaluation of methylation at all sites in each genomic region. Several software tools were developed to assist in quantitative evaluation of bisulfite sequencing data from complex methylation patterns occurring in plants. This chapter describes the application of Cytosine Methylation Analysis Tool for Everyone (CyMATE). From aligned sequences, CyMATE quantifies and illustrates general and pattern-specific methylation at CG, CHG, and CHH (H = A, C, or T) sites, both per sequence and per position. CyMATE is also able to perform a quality control of sequences and to detect redundancy among individual clones. The software is able to reveal methylation patterns on complementary strands by handling data from hairpin bisulfite sequencing. The tool is freely available for non-commercial use at http://www.cymate.org.
Key words: DNA methylation, 5-methylcytosine (5mC), Bisulfite sequencing, Hairpin sequencing, Symmetric/asymmetric DNA methylation, Methylation context, CyMATE
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_2, © Springer Science + Business Media, LLC 2010
1. Introduction
DNA methylation is an important component of epigenetic gene regulation. It is a more complex process in plants than in other eukaryotes, since it can modify cytosines in every sequence context. Patterns of 5-methylcytosine (5mC) can be detected by sodium bisulfite treatment, which leads to the conversion of nonmethylated cytosine to uracil, whereas 5mC remains unchanged. Following PCR amplification from bisulfite-treated genomic DNA, the converted positions appear as thymine in the amplified sequences. Individual clones are sequenced and compared to the
original, unmodified genomic template. Changes from C to T residues indicate cytosines that were not methylated, whereas remaining Cs specify methylated cytosines in the genomic template. Multiple clonal sequences obtained from the same biological sample are usually compared to get a statistical representation of DNA methylation at the genomic site under investigation. Manual evaluation is laborious and error-prone due to large data sets. Although there are several tools for computer-assisted evaluation of methylation patterns in mammalian DNA, their analysis is restricted to CG sites. Only a few software tools are available to analyse plant-specific DNA methylation patterns (1–3). This chapter focuses on the application of Cytosine Methylation Analysis Tool for Everyone (CyMATE), which is designed to distinguish and quantify different DNA methylation classes occurring in plants in the context of CGN, CHG, or CHH. CyMATE allows for analysis of cytosine methylation by quantifying total and class-specific methylation as well as patterns of individual templates or patterns at specific positions within the master sequence. A color- and shape-coded graphical output in pattern matrix view is supplied together with a detailed statistical analysis and histograms representing the quantitative evaluation (Fig. 1). CyMATE is freely available for non-commercial use at http://www.cymate.org.
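The comparison of a clone with its master sequence can be illustrated with a minimal Python sketch. It assumes a top-strand clone that is already aligned to the master, without gaps and of equal length; the function names `classify_context` and `call_methylation` are hypothetical and are not part of CyMATE, whose internal implementation is not described here.

```python
# Illustrative sketch only; not CyMATE code. Assumes gap-free, equal-length,
# top-strand sequences (C in the clone = methylated, T = not methylated).
H = set("ACT")  # H stands for A, C, or T


def classify_context(master, i):
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at 0-based index i.

    Returns None if the base is not C or the context cannot be determined
    (for example, too close to the 3' end of the master).
    """
    seq = master.upper()
    if seq[i] != "C":
        return None
    first = seq[i + 1:i + 2]   # base directly downstream ('' at the end)
    second = seq[i + 2:i + 3]  # second base downstream ('' at the end)
    if first == "G":
        return "CG"
    if first in H and second == "G":
        return "CHG"
    if first in H and second in H:
        return "CHH"
    return None


def call_methylation(master, clone):
    """List (1-based position, context, state) for each cytosine of the master.

    state is 'M' (clone keeps C), 'NM' (clone shows T) or '?' (anything else,
    e.g. a sequencing error).
    """
    calls = []
    clone = clone.upper()
    for i in range(len(master)):
        context = classify_context(master, i)
        if context is None:
            continue
        if clone[i] == "C":
            state = "M"
        elif clone[i] == "T":
            state = "NM"
        else:
            state = "?"
        calls.append((i + 1, context, state))
    return calls


if __name__ == "__main__":
    for call in call_methylation("ACGTCAGTCTAA", "ACGTTAGTCTAA"):
        print(call)
```

The '?' state is kept separate from 'M' and 'NM' so that positions affected by sequencing errors are not silently counted as either.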
2. Program Input
CyMATE analyses cytosine methylation patterns based on bisulfite sequencing and evaluation of transitions from C to T nucleotides in sequences representing individual genomic templates. The protocol for the bench work is described in the chapter “Analysis of DNA Methylation in Plants by Bisulfite Sequencing”. The following section describes how to prepare input data for analysis with CyMATE.
2.1. Organization of Input
CyMATE expects pre-aligned sequence data, i.e. multiple sequence alignment (MSA) files, in sequential (standard FASTA), interleaved (standard CLUSTAL) or NEXUS format. Steps 1–3 describe how to create an MSA from experimental raw data.
1. Define the reference sequence (the master genomic sequence without any conversion). We recommend using genomic DNA data as a reference sequence, e.g. from NCBI’s Nucleotide database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar). The reference sequence should not encompass flanking PCR primers, as they do not represent genomic DNA that has undergone conversion.
Fig. 1. A workflow in CyMATE analysis. Stages shown: bisulfite conversion, PCR, cloning and sequencing; generation of multiple sequence alignment; CyMATE backend; output (sample output shown: Pos 43, 61; Class I (CGN): M 31 (81.58%), 27 (71.05%))
2. Define the sample sequence(s) (clone(s)) and the master sequence with unambiguous file names to support later identification and sorting. The sequences of experimental clones should not be manually edited or clipped, since trimming regions outside of the master sequence and detecting sequencing errors are much easier after alignment.
3. Combine the data by aligning the master and clonal sequences with appropriate software, e.g. ClustalW (http://www.ebi.ac.uk/Tools/clustalw2/index.html) or the desktop version ClustalX 1.83. The region of interest can be manually selected using the option “Save Sequences as...” from the “Edit” menu in ClustalX. This feature of ClustalX is also useful to remove any remaining primer/vector sequences from the alignment and to generate the blunt-ended MSAs needed for CyMATE (see Note 1 and Fig. 2). Save the MSA with the master on top (see Note 2) in FASTA, CLUSTAL or NEXUS format. Do not save the file in any other format, e.g. binary DOC or DOCX format.
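Before submission, it can be useful to confirm locally that an alignment saved in FASTA format is blunt-ended, i.e. that every aligned sequence has the same length. The following Python sketch is one possible check; the helper names `read_aligned_fasta` and `is_blunt_ended` are hypothetical, the parser handles only plain FASTA (not CLUSTAL or NEXUS), and it is not a substitute for the consistency checks CyMATE performs itself.

```python
# Illustrative pre-submission check; helper names are hypothetical.
def read_aligned_fasta(text):
    """Parse FASTA text into a list of (label, sequence) in file order."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def is_blunt_ended(records):
    """True if all aligned sequences (gaps included) have the same length."""
    if not records:
        raise ValueError("no sequences found")
    return len({len(seq) for _, seq in records}) == 1


if __name__ == "__main__":
    example = ">master\nACGTACGT\n>clone1\nATGTATGT\n>clone2\nACGTAT--\n"
    records = read_aligned_fasta(example)
    print("first record (expected master):", records[0][0])
    print("blunt-ended:", is_blunt_ended(records))
```

The first record returned should be the master sequence, since the master is expected on top of the MSA (see Note 2).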
Fig. 2. Generation of a multiple sequence alignment (MSA)
Fig. 3. A CyMATE analysis web form for single-strand analysis
2.2. Submission for Evaluation
The MSA files prepared in the previous section can now be submitted to CyMATE through the website http://www.cymate.org. The program allows an unlimited number of requests.
1. Open the website and in the “Perform Analysis” section, select “start analysis”. Select “single strand” for the standard analysis (Fig. 3); alternatively, choose “double strand”
(if sequences were generated by hairpin bisulfite sequencing) or other parameters as appropriate (see Note 3). 2. Complete the form field with your email address and use the “Choose file” button to select and upload your MSA file. 3. Click the “Analyse this!” button. CyMATE will process your request (see Note 4), write the results into separate text and graphical files, and deliver them to your email address. Within a short period of time (see Note 5), the files will be available for further evaluation.
3. Program Output
To analyse the data, open the email entitled “CyMATE Analysis Request – Analysis Results”, which contains the analysis results as attachments. A successful run of the program will produce up to four different files with the name of your MSA and the extensions .pdf, .txt, .fasta and .afa. The PDF file contains graphic results of the bisulfite analysis with filled symbols for methylated and blank symbols for nonmethylated cytosines, with red circles for CG, blue squares for CHG, and green triangles for CHH sites (see Note 6). The plain text file includes the complete methylation analysis of the uploaded data file, mostly in tabular form (see Note 7). The FASTA file corresponds to the original input file, the AFA file to the converted MSA file.
4. Additional Features of CyMATE
4.1. Options for Single Strand Mode
Most routine applications will require only the described simple and straightforward basic queries. For specific applications, however, a number of additional features can be selected through the CyMATE web interface. This section describes optional features of CyMATE at a glance. In the “Single strand” mode, pre-selection of only specific (any one, any two, or all three) methylation classes is possible. This option is available after selecting the analysis mode under “Enter Parameters for the single strand analysis”. For example, de-selecting the “Class 2” and “Class 3” checkboxes restricts the analysis to “Class 1” (CG) only (Fig. 3). Another option is “Mutation search”. By selecting this option, the text output will be extended by an additional part entitled “rvdiff”, providing a detailed mismatch analysis for each individual sample sequence indicating every heterogeneous position apart from a C-to-T transition with reference to the master sequence.
4.2. Redundancy Check
CyMATE offers useful features for analyzing sequences apart from differences in their methylation state. Selecting the analysis mode “Redundancy” (together with selecting the group-output option and excluding the master sequence) can be used to detect identical clones. In the case of methylated sequences, these clones indicate redundancy produced most likely by PCR rather than representing identical genomic templates and thereby reducing the significance of the results obtained. In the group-wise analysis of CyMATE, identical clones turn up first and appear together in a group. All but one member of this group should be removed from the data set. The analysis mode “Mismatch” operates similarly, revealing differences with respect to a master sequence. As for the single strand mode, the reference sequence must be on top of the MSA file. This feature can either be used independently or in addition to the single-strand data analysis mode by selecting “Mutation search” (see Subheading 4.1). If the feature is used independently, a detailed text file will be created, showing all mismatches in MSA. The mismatch analysis will include C-to-T conversions in each sequence for each position.
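The idea behind the redundancy check can be sketched in a few lines of Python. This is only an illustration under the assumption that records are (label, aligned sequence) pairs with the master first; the function name `group_identical_clones` is hypothetical and does not reproduce CyMATE's group-wise output.

```python
# Illustrative sketch of a redundancy check; not CyMATE code.
from collections import defaultdict


def group_identical_clones(records):
    """Return lists of clone labels that share an identical aligned sequence.

    records: list of (label, aligned_sequence) with the master in first place.
    The master is excluded from the comparison, as in the "Redundancy" mode.
    """
    groups = defaultdict(list)
    for label, seq in records[1:]:
        groups[seq.upper()].append(label)
    # Keep only sequences seen more than once, in order of first appearance.
    return [labels for labels in groups.values() if len(labels) > 1]


if __name__ == "__main__":
    records = [
        ("master", "ACGTCAGT"),
        ("clone1", "ATGTCAGT"),
        ("clone2", "ACGTTAGT"),
        ("clone3", "ATGTCAGT"),
    ]
    for group in group_identical_clones(records):
        print("identical:", ", ".join(group), "-> keep one, remove the rest")
```

Whether identical clones should be removed remains a judgment for the user: as stated above, the argument applies to methylated sequences, where identical patterns are unlikely to arise from independent templates.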
4.3. Double Stranded Analysis
While it is usually sufficient to analyze methylation patterns at one DNA strand (especially for symmetric methylation sites), sometimes it may be interesting to gain information about modification at the anti-parallel strand. The elegant method of hairpin bisulfite sequencing (4, 5), in which two strands are ligated prior to denaturation and bisulfite conversion via a linker with a unique sequence fingerprint, allows the analysis of complementary strands from the same genomic template (see Note 8). CyMATE can process sequence information obtained by double strand analysis. A module called CyMATEads has been implemented and described recently (6). It is available in the “Perform analysis” section under “double strand data” and the analysis mode “Double strand”. It requires entering the hairpin linker (HPL) sequence in the field “Hair-Pin-Linker” in the “Enter Parameters for the double strand-analysis” section. CyMATE will automatically discriminate between the top and the bottom strand. The HPL, single-stranded overhang regions, and regions of pairing between HPL and genomic complementary sequence will be excluded from the analysis. CyMATE will deliver detailed results in PDF and plain text format by email.
4.4. Analysis of Two Complementary Strands
CyMATE can further handle sequences generated by different primer sets, which amplify specifically the top or bottom strand. These do not necessarily represent strands from the same genomic template but are complementary. The analysis mode entitled “Two strand” will analyse the forward and reverse strand and deliver a detailed analysis in PDF and TXT format, similar to the double strand mode described earlier.
4.5. CyMATE Updates
Like other software, CyMATE may be developed further and adapted to new needs. Please therefore also consult the current information on the website (see Note 9).
5. Notes
1. “Blunt-ended” means that each sequence in the alignment has the same length. If required, leading or trailing gaps will be inserted at the start or the end of the sequence during the alignment procedure.
2. The master is expected to be the first sequence in the alignment. ClustalX and ClustalW offer the possibility to conserve the input order by checking the “input” option in the “Alignment – Output Format Options” menu. There are no restrictions on the length of the sequences or on the total number of sequences following the master sequence.
3. A basic analysis can be done using default parameters. A detailed description of alternative settings is available (see Subheading 4).
4. CyMATE operates in three major phases. (1) For input reading and error detection, CyMATE reads aligned data and identifies each object by its label and its sequence data. CyMATE also differentiates between the “master” and “clone” type of the sequences. Furthermore, CyMATE considers data objects either as “single strand” (default for most analyses), “double strand” (for hairpin-bisulfite data) or “two strand” (complementary single strand data). It performs a number of consistency checks, e.g. for the file format. (2) During data analysis, CyMATE first determines all cytosine residues in the “master” sequence as potential methylation sites with their location (position index) and sequence context (methylation class). Subsequently, each clone is analysed separately with reference to the master. All clone profiles of one MSA are used to create statistics, e.g., the average number of methylated CHG sites at a specific position or the relative number of methylated CG sites in a specific sample. Multiple error checks are performed simultaneously with the evaluation procedures described above.
(3) For the production of output files, methylation profiles of individual clones are summarized and written into a text file, including frequency and specificity of methylation per site, per sequence, per methylation class, and globally. Individual profiles are also written into a graphics file to yield the colored matrix-like plot. 5. While it usually takes only a few seconds, the actual time depends on the internet connection and the number of other simultaneous CyMATE operations.
Fig. 4. The CyMATE pattern matrix output
6. This output file (Fig. 4) can be opened using any image processing software and can be edited and inserted into other documents. Besides the matrix-like plot with shape- and colour-coded symbols, it shows a “ruler” on the top indicating the relative location and eventual clustering of cytosine residues in the sequence. At the bottom, numbers specify a position index of each cytosine residue within MSA. 7. The text output delivered by CyMATE is divided in three parts. (1) The first part refers to the master sequence and lists the sum of all possible methylation sites in absolute and relative numbers. In addition, cytosines are assigned to class I for CGN, class II for CHG and class III for CHH sequence context based on two nucleotides following cytosine in the sequence. For every class, methylation sites within the master sequence as well as pattern frequency within the master are indicated in absolute and relative values. (2) The second part of the text output contains information about the position of methylation. It specifies the occurrence of methylated (M) vs. nonmethylated (NM) cytosines at each potential methylation site, separately for each class and site in absolute and relative values. Furthermore, it states the average methylation degree per class and in total. The OK column provides an additional quality control. A value less than 100% at a specific position indicates e.g. a sequencing error in MSA. (3) The third part of the analysis report represents the examination of each individual cloned sequence, divided into relative values as a percentage and absolute values for methylated (M) and non-methylated (NM) sites. Relative values indicate the degree of methylation as a percentage for every single methylation class and in total (AVG). As described for the position-wise analysis, an OK column is included as a quality control. 
In an additional table, absolute and relative values indicate how many of all methylated residues of each individual clone are found in each methylation class. The plain text data output can be easily transferred into spreadsheets, e.g., Microsoft Excel, to generate histograms for an overview over the degree of methylation at a specific position (Fig. 5) or in total (Fig. 6).
Fig. 5. A histogram of position-based methylation analysis (x-axis: position of each cytosine in the sequence; y-axis: %mC, 0–100)
Fig. 6. A histogram of global methylation analysis (bars for total, CGN, CHG, and CHH; y-axis: %mC, 0–100)
8. Hairpin bisulfite PCR (4) is performed after cutting genomic DNA with a restriction enzyme (which must not cut within the sequence to be analyzed) and ligating the complementary strands to each other through a stem-loop hairpin linker. During bisulfite treatment of the ligated DNA, the double-stranded target is denatured and can be amplified by PCR with primers specific for the top and bottom strand. The PCR products contain both complementary strands in linear but inverted orientation, from which the methylation status of the original double strand template can be deduced. A refinement of the technique (5) was achieved by inserting a degenerate sequence in the hairpin region, thereby distinguishing each genomic DNA template by its individual barcode tag and allowing redundancy reduction.
9. CyMATE is currently being modified to produce additional graphical output for statistical data and additional numerical data (in CSV format) for import into spreadsheet programs like Excel. For more information, updates and feedback, a detailed user guide, several example files, contact details and a “Frequently Asked Questions” section are available on www.cymate.org.
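The position-wise percentages described in Notes 4 and 7 (methylated vs. nonmethylated cytosines at each potential site) can be reproduced for a small alignment with the following Python sketch. It is an illustration only: the function name `position_methylation` is hypothetical, the sequences are assumed to be gap-free top-strand reads of equal length, and reads showing neither C nor T at a site are simply left out of the denominator, which may differ from how CyMATE reports its OK column.

```python
# Illustrative sketch of a position-wise methylation summary; not CyMATE code.
def position_methylation(master, clones):
    """Return {1-based position: percent methylated} for each C in the master.

    A clone counts as methylated (M) if it keeps C at the position and as
    nonmethylated (NM) if it shows T; other bases are ignored.
    """
    result = {}
    for i, base in enumerate(master.upper()):
        if base != "C":
            continue
        m = sum(1 for clone in clones if clone[i].upper() == "C")
        nm = sum(1 for clone in clones if clone[i].upper() == "T")
        if m + nm > 0:
            result[i + 1] = 100.0 * m / (m + nm)
    return result


if __name__ == "__main__":
    master = "ACGTC"
    clones = ["ACGTT", "ATGTT", "ACGTC", "ACGTT"]
    for position, percent in position_methylation(master, clones).items():
        print(f"position {position}: {percent:.2f}% methylated")
```

The resulting dictionary can be written to a spreadsheet or plotted directly to obtain a histogram of the kind shown in Fig. 5.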
References
1. Hetzl J, Foerster AM, Raidl G, Mittelsten Scheid O (2007) CyMATE: a new tool for methylation analysis of plant genomic DNA after bisulphite sequencing. Plant J 51(3):526–536
2. Gruntman E, Qi Y, Slotkin RK, Roeder T, Martienssen RA, Sachidanandam R (2008) Kismeth: analyzer of plant methylation states through bisulphite sequencing. BMC Bioinformatics 9:371–384
3. Grunau C, Schattevoy R, Mache N, Rosenthal A (2000) MethTools – a toolbox to visualize and analyze DNA methylation data. Nucleic Acids Res 28(5):1053–1058
4. Laird CD, Pleasant ND, Clark AD, Sneeden JL, Hassan KM, Manley NC et al (2004) Hairpin-bisulphite PCR: assessing epigenetic methylation patterns on complementary strands of individual DNA molecules. Proc Natl Acad Sci USA 101(1):204–209
5. Miner BE, Stoger RJ, Burden AF, Laird CD, Hansen RS (2004) Molecular barcodes detect redundancy and contamination in hairpin-bisulphite PCR. Nucleic Acids Res 32(17):e135
6. Muellner C, Hetzl J (2008) CyMATEads: Reliable analysis of cytosine methylation in plant and animal DNA using bisulphite sequence data. Schriftenreihe Informatik 26:43–52
Chapter 3
Analysis of Locus-Specific Changes in Methylation Patterns Using a COBRA (Combined Bisulfite Restriction Analysis) Assay
Alex Boyko and Igor Kovalchuk
Abstract
DNA methylation is a major mechanism for the reversible control of gene expression, chromatin structure, and genome stability. Methylation analysis at a given locus allows one to evaluate levels of chromatin packaging, gene expression, and even homologous recombination. We have shown that the combined bisulfite restriction analysis (COBRA) assay makes it possible to analyze methylation levels at a defined locus. The major steps are bisulfite conversion of nonmethylated cytosines to uracils, locus-specific PCR amplification of the converted DNA, restriction digestion, and analysis of restriction patterns on a gel. Due to the availability of various restriction enzymes that have cytosines in the restriction recognition sequence, the assay allows analysis of various cytosines, including those potentially targeted for symmetrical and nonsymmetrical methylation.
Key words: Locus-specific methylation, Combined bisulfite restriction analysis (COBRA), Bisulfite conversion
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_3, © Springer Science + Business Media, LLC 2010
1. Introduction
DNA methylation represents an important cellular mechanism that controls gene transcription, transposon activity, and inheritance of epigenetic traits (1–5). The loss of methyl groups commonly results in upregulation of gene transcription (6) and transposon activation (2). Notably, the DNA methylation pattern is not uniform throughout the genome, and it differs between gene coding sequences and transposons. While genes usually have several discrete methylated regions, transposons are methylated uniformly (6). Extensive methylation of certain genes and differential methylation of alleles result in imprinting, which may create new heritable epialleles and paramutations (7–10).
Studying the effects of epigenetic modifications on genome maintenance and regulation requires efficient techniques capable of quantitative detection of locus-specific DNA methylation patterns. Here, we discuss a method that combines three experimental techniques to reveal site-specific differences in methylation patterns: bisulfite conversion of DNA, PCR amplification of selected DNA fragments, and restriction digestion of PCR products with endonucleases. This method yields reliable quantitative results, and its performance is not affected by the original density of DNA methylation at the analyzed DNA locus (11). The method is known as combined bisulfite restriction analysis, or COBRA. The assay consists of three major steps: treatment of genomic DNA with sodium bisulfite, PCR amplification, and restriction digestion (Fig. 1). Sodium bisulfite treatment converts all unmethylated cytosine residues to uracil residues. This alters the DNA sequence and leads to the methylation-dependent creation of new restriction enzyme recognition sites. In contrast, methylcytosine residues are not modified by bisulfite, which may result in the retention of preexisting restriction sites in a methylation-dependent manner. Overall, bisulfite conversion leads to the formation of a mixed population of DNA fragments that reflects existing differences in the methylation pattern. The PCR reaction permits amplification of each of these sequence variants without affecting the relative ratio between them. Moreover, PCR amplifies the different sequences with equal efficiency, thus preventing bias in the comparison of different methylation patterns. Hence, PCR products usually represent a collection of DNA sequences that have the same length and differ in sequence composition at the sites of potential DNA methylation.
These differences can be revealed by using restriction endonucleases that recognize DNA sequences that are affected by methylation. Following restriction digestion, the cleaved PCR products can be separated by gel electrophoresis, and DNA band intensity can be determined using image processing software. Alternatively, if the amount of DNA is low and high sensitivity is required, then the cleaved PCR products can be hybridized with a specific probe. Similarly, a signal resulting from probe hybridization can be quantified. The ratio between the cleaved and remaining PCR products corresponds to the ratio between methylated and unmethylated cytosine residues originally present in genomic DNA before bisulfite conversion. The COBRA assay is characterized by high sensitivity and, in contrast to the other site-specific methylation analysis techniques such as methylation-specific PCR (MSP), has a very low possibility of false-positive results (11). It can efficiently work with low amounts of input DNA (11), and it also permits analysis of cytosine methylation in two DNA strands separately (12). Overall, the assay is not labor-intensive and provides a high degree of quantitative accuracy (11, 13).
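The quantification step can be written out as a short calculation. The Python sketch below assumes, as in the example of Fig. 1, that the restriction site is present only when the cytosine was methylated, so that cleaved product corresponds to methylated templates; for an enzyme whose site is created from unmethylated cytosines the interpretation is reversed. The function name `percent_methylation` and the intensity values are hypothetical, and background subtraction is assumed to have been done in the image analysis software.

```python
# Illustrative calculation; the function name and input values are hypothetical.
def percent_methylation(uncut, cut_fragments):
    """Return 100 * cut / (uncut + cut) from background-corrected intensities.

    uncut:         intensity of the remaining full-length PCR product
    cut_fragments: intensities of the cleavage products that are visible
                   (one value if only one fragment is detected, e.g. by a probe)
    """
    cut = sum(cut_fragments)
    total = uncut + cut
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * cut / total


if __name__ == "__main__":
    # Made-up intensities: one uncut band and two cleavage fragments.
    print(percent_methylation(uncut=50.0, cut_fragments=[30.0, 20.0]))
```

Summing the cleavage fragments is appropriate when band intensity is proportional to DNA mass, as with intercalating stains; with probe hybridization only the fragment covered by the probe contributes.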
[Figure 1 image: panels A–C with sub-panels a–h, showing the example sequence 5′ GACGCATA 3′ / 3′ CTGCGTAT 5′ in unmethylated and methylated (CH3) form, its bisulfite-converted and PCR-amplified top and bottom strands, the HpyCH4IV site, the probe, lanes labelled 0%, 50% and 100% methylation, and the formula % mC = 100 × B/(A + B).]
Fig. 1. General outline of the COBRA assay. (A) A general mechanism of methylated cytosine detection. The bisulfite treatment converts all unmethylated cytosine residues to uracil residues. Next, PCR amplification substitutes uracil for thymine. In contrast, methylcytosine residues remain unchanged. (B) Generation of new restriction sites upon bisulfite conversion and PCR amplification. The original DNA sequence that was chosen for a COBRA analysis contains a precursor of the recognition site for HpyCH4IV restriction endonucleases (ACGC). The original DNA sequence can exist in two forms: (a) the unmethylated (ACGC) cytosine nucleotide, and (b) the methylated (AmCGC) cytosine nucleotide in a CpG sequence context. Denaturation separates the top and bottom DNA strands (c). The native DNA sequence is modified in a methylation-dependent manner upon bisulfite conversion of single-stranded DNA (d). The top (e, g) and bottom (f, h) DNA strands are amplified in different PCR reactions using distinct sets of PCR primers. In our example, only amplification of the top DNA strand is informative. It leads to the generation of HpyCH4IV recognition site (ACGT) in a methylation-dependent manner (AmCGC → AmCGU → ACGT versus ACGC → AUGU → ATGT). (C) Quantification of cytosine methylation by probe hybridization. Note that the probe sequence does not cover the restriction enzyme recognition site. The ratio of the cleaved PCR product (B) and the total amount of the PCR product (A + B) shows a percentage of methylated HpyCH4IV recognition site precursors in original DNA
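The sequence changes in panel B can be checked in silico. The following Python sketch converts the top strand of the example sequence as it would read after bisulfite treatment and PCR, and tests for the HpyCH4IV recognition sequence (ACGT, as given in the caption). The function names are hypothetical, only the top strand is modelled, and conversion is assumed to be complete.

```python
# Illustrative in-silico conversion of the top strand; names are hypothetical.
HPYCH4IV_SITE = "ACGT"  # recognition sequence as given in the Fig. 1 caption


def bisulfite_convert(seq, methylated_positions=()):
    """Return the top strand as read after bisulfite treatment and PCR.

    Unmethylated C becomes T (via U); C at a 0-based index listed in
    methylated_positions is left unchanged. Complete conversion is assumed.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def has_site(seq, site=HPYCH4IV_SITE):
    """True if the recognition sequence occurs in seq."""
    return site in seq.upper()


if __name__ == "__main__":
    top = "GACGCATA"
    unmethylated = bisulfite_convert(top)               # both cytosines convert
    methylated = bisulfite_convert(top, {2})            # C of ACGC is methylated
    print(unmethylated, has_site(unmethylated))
    print(methylated, has_site(methylated))
```

In this example the site appears only in the product derived from the methylated template, which is why the cleaved fraction reports on methylation.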
2. Materials
2.1. Sodium Bisulfite Treatment
1. Autoclaved distilled water.
2. 3 M NaOH. CAUTION: 3 M sodium hydroxide solution is toxic, corrosive, and can cause burns. Prevent eye, skin, and clothing contact.
3. Sodium bisulfite.
4. Hydroquinone. CAUTION: hydroquinone is a skin bleaching agent and irritates the lungs and skin.
5. Wizard DNA Clean-Up System (Promega).
6. 7.5 M ammonium acetate, pH 7.0.
7. 100% and 70% ethanol.
8. 10 M NaOH. CAUTION: 10 M sodium hydroxide solution is extremely toxic, corrosive, and can cause burns. Prevent eye, skin, and clothing contact.
9. Glycogen.
2.2. PCR Amplification
1. Nuclease-free water.
2. Primers.
3. 10× Ex Taq™ Buffer (Takara Bio USA).
4. dNTP Mixture (Takara Bio USA).
5. Takara Ex Taq™ DNA Polymerase (Takara Bio USA).
6. 100% and 70% ethanol.
2.3. Restriction Enzyme Digestion
1. Restriction enzyme with suitable 10× reaction buffer.
2. Nuclease-free water.
2.4. Gel Electrophoresis and Methylation Analysis
1. Agarose, electrophoresis grade.
2. 1× TBE: 90 mM Tris, pH 8.0, 90 mM boric acid, 2 mM EDTA.
3. 6× DNA gel loading buffer.
4. DNA ladder (Fermentas).
5. Gel image analysis software (ImageJ, National Institutes of Health, USA).
3. Methods 3.1. Sodium Bisulfite Treatment
Sodium bisulfite treatment results in conversion of unmethylated cytosine residues to uracil residues. In contrast, reactivity of 5-methylcytosine is much lower, thus methylated cytosine residues remain unchanged (14). The reaction proceeds through several steps and requires that DNA remains in a single-stranded form. Incomplete DNA denaturation prevents sulfonation of a cytosine at the C-6 position and results in incomplete conversion (15). Importantly, since the sulfonation reaction competes with depurination of DNA, it is important to find the best conditions allowing maximum cytosine conversion with minimal DNA
Analysis of Locus-Specific Changes in Methylation Patterns Using a COBRA
degradation due to depurination (16). Severe DNA depurination may cause the subsequent PCR to fail completely.
1. Prepare 2 µg aliquots of genomic DNA in a final volume of 20 µl (see Notes 1–3).
2. Denature DNA by adding freshly prepared 3 M NaOH to a final concentration of 0.3 M and incubating samples for 15 min at 37°C.
3. Prepare fresh stock solutions of 3.6 M sodium bisulfite (Sigma), pH 5.0, and 0.1 M hydroquinone (Sigma). Adjust the pH of the sodium bisulfite solution with 10 M NaOH (see Note 4).
4. Add 208 µl of the 3.6 M sodium bisulfite stock and 12 µl of 10 mM hydroquinone solution (dilute the hydroquinone stock solution 1:10 to obtain a 10 mM solution) to the 20 µl of denatured DNA. Mix samples.
5. Incubate samples in the dark at 55°C for about 16 h (see Notes 5 and 6).
6. Remove free bisulfite by passing samples through a Wizard DNA Clean-Up System (Promega) desalting column and elute in 50 µl of sterile distilled water.
7. Add a freshly prepared 3 M NaOH solution to a final concentration of 0.3 M, and incubate samples for 15 min at 37°C.
8. Neutralize the solution by adding 7.5 M ammonium acetate, pH 7.0, to a final concentration of 3 M.
9. Ethanol-precipitate DNA with glycogen as a carrier. Resuspend DNA in 20 µl of sterile distilled water.
10. Samples can be stored at −20°C until needed.
3.2. PCR Amplification
PCR amplification of bisulfite-treated DNA is technically more challenging than PCR on native DNA. First, residual bisulfite may inhibit the PCR reaction; passing the sample through a desalting column helps solve this problem. The other difficulties are primer design and optimization of thermal-cycling parameters. Bisulfite conversion significantly alters the native DNA sequence and depletes it of cytosine nucleotides. Following bisulfite treatment, the two originally complementary DNA strands become noncomplementary single-stranded sequences. Thus, two different sets of PCR primers are needed to analyze DNA methylation on the sense and antisense DNA strands. Since bisulfite conversion may generate or retain restriction sites in one strand and not in the other, the right choice of DNA strand for analysis is very important. Once the DNA strand is selected, PCR primers can be designed. There are several important rules for primer design. Primers should recognize DNA sequences that contain no CpG dinucleotides and have low cytosine content. This ensures
that both originally methylated and unmethylated sequences are amplified equally by PCR. Next, it is advisable to design long primers of at least 24 bases, which help compensate for the reduced sequence complexity of PCR products generated from the bisulfite-treated DNA template. Finally, choosing small amplicons (less than 500 bp) may help improve PCR quality and reduce bias during sequence amplification. Similarly, thermal-cycling parameters may require optimization. An extended initial denaturation of up to 5 min in the first cycle is recommended. If the amount of template DNA is sufficient, 30 cycles of PCR should be enough to produce the product for subsequent restriction digestion. If the amount of input DNA is very low, then the number of cycles can be increased. Alternatively, a secondary or nested PCR can be performed. In a nested PCR, the product of the first PCR serves as the template for a second (or even a third) PCR with a different set of primers.
1. Use 2–3 µl of bisulfite-treated DNA per 25 µl PCR reaction.
2. Prepare PCR reactions. Each reaction should contain 0.63 units of Takara Ex Taq™ DNA Polymerase (Takara Bio USA), 1× Ex Taq™ Buffer (contains 2 mM MgCl2) (Takara Bio USA), dNTP Mixture (2.5 mM each dNTP) (Takara Bio USA), and 50 µM of each primer in a final volume of 25 µl. A master mix can be used if more than five samples have to be analyzed.
3. After PCR cycling is completed, keep samples on ice. If necessary, perform a secondary PCR.
4. Ethanol-precipitate the PCR product. Resuspend DNA in 20 µl of sterile distilled water. Quantify DNA (see Note 7).
5. Samples can be stored at −20°C until needed.
3.3. Restriction Enzyme Digestion
Since bisulfite conversion generates new restriction sites and retains original restriction sites in a methylation-dependent manner, it is possible to analyze cytosine methylation using restriction endonucleases (Table 1). Newly generated restriction sites are generally preferred for restriction analysis because they also allow verification of complete bisulfite conversion: if conversion is incomplete, no restriction sites will be created. It is also possible to use methylation-sensitive restriction endonucleases, since PCR products do not contain methylated cytosine residues. The application of different restriction enzymes permits analysis of cytosine methylation in different sequence contexts, including symmetrical CpG and CpNpG and nonsymmetrical CpHpH methylation.
1. Digest 1 µg of purified PCR-amplified DNA with a tenfold excess of restriction enzyme in a final volume of 20 µl according to the manufacturer’s protocol. Incubate samples overnight.
2. Digested DNA samples can be stored at −20°C until needed.
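The methylation-dependent creation of a restriction site can be checked in silico before choosing an enzyme. The following is a minimal sketch, assuming a top-strand sequence written 5′ to 3′ and complete conversion; the function name and the 8-base test fragment are hypothetical and serve only to reproduce the ACGC precursor example from Fig. 1.

```python
def bisulfite_pcr_top_strand(seq, methylated_positions=()):
    """Return the top-strand sequence after bisulfite conversion and PCR.

    Unmethylated C is read as T (C -> U -> T). A C whose 0-based index is
    listed in methylated_positions is treated as 5-methylcytosine and stays C.
    Complete conversion is assumed.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


SITE = "ACGT"            # HpyCH4IV recognition sequence
original = "GGACGCTT"    # hypothetical fragment containing the ACGC precursor
cpg_c = original.index("ACGC") + 1   # index of the cytosine in the CpG

methylated_product = bisulfite_pcr_top_strand(original, {cpg_c})
unmethylated_product = bisulfite_pcr_top_strand(original)

print(methylated_product, SITE in methylated_product)
print(unmethylated_product, SITE in unmethylated_product)
```

Under these assumptions, the methylated template yields ACGT (cleavable) and the unmethylated template yields ATGT (not cleavable), in line with the example in Fig. 1. The bottom strand would need to be modeled separately, since the two strands are no longer complementary after conversion.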
Table 1 List of restriction endonucleases commonly used for the COBRA assay
Restriction endonuclease    Restriction site
BsiWI                       C/GTACG
BspDI                       AT/CGAT
BstBI                       TT/CGAA
BstUI                       CG/CG
ClaI                        AT/CGAT
HpyCH4IV                    A/CGT
MluI                        A/CGCGT
NruI                        TCGCGA
PvuI                        CGAT/CG
TaqI                        T/CGA
The most commonly analyzed type of cytosine methylation is methylation in a CpG sequence context. To avoid difficulties during CpG methylation analysis, the restriction enzymes selected for the assay should have cytosine residues in their recognition sites only in the CpG sequence context.
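The selection rule in the table note can be expressed as a simple check on the recognition sequence. This is an illustrative sketch only: the dictionary copies the recognition sequences listed in Table 1 (cut positions omitted), the function name is hypothetical, and a cytosine at the last position of a site is conservatively treated as failing the rule because its 3′ neighbor is unknown.

```python
# Recognition sequences from Table 1, written 5'->3' without cut marks.
COBRA_ENZYMES = {
    "BsiWI": "CGTACG", "BspDI": "ATCGAT", "BstBI": "TTCGAA", "BstUI": "CGCG",
    "ClaI": "ATCGAT", "HpyCH4IV": "ACGT", "MluI": "ACGCGT", "NruI": "TCGCGA",
    "PvuI": "CGATCG", "TaqI": "TCGA",
}


def cytosines_only_in_cpg(site):
    """Return True if every C in the site is immediately followed by G."""
    site = site.upper()
    for i, base in enumerate(site):
        if base == "C":
            # A C at the end of the site has no known 3' neighbor: reject.
            if i + 1 >= len(site) or site[i + 1] != "G":
                return False
    return True


for name, site in COBRA_ENZYMES.items():
    print(name, site, cytosines_only_in_cpg(site))
```

All palindromic sites in Table 1 pass this check; a site such as CCGG would not, because its first cytosine lies outside a CpG dinucleotide.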
3.4. Gel Electrophoresis and Methylation Analysis
1. Separate the digested PCR products in an agarose gel (see Note 8).
2. Measure the intensity of DNA bands using available software tools (e.g. ImageJ, National Institutes of Health, USA) (see Note 9).
3. Calculate the percentage of cytosine methylation at a given locus from the intensities of the cleaved and the remaining undigested PCR products (see Note 10).
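The calculation in step 3 follows the formula given in Note 10, % = 100 × B/(A + B). A minimal sketch is shown here, assuming that A and B are background-corrected band intensities exported from the image analysis software and that digestion occurs only at originally methylated sites; the function name and the example intensities are hypothetical.

```python
def percent_methylation(undigested_a, digested_b):
    """Percentage of methylated sites: 100 * B / (A + B).

    undigested_a: intensity of the remaining undigested PCR product (A)
    digested_b:   intensity of the digested PCR product (B)
    """
    total = undigested_a + digested_b
    if undigested_a < 0 or digested_b < 0 or total == 0:
        raise ValueError("intensities must be non-negative and not both zero")
    return 100.0 * digested_b / total


# Hypothetical band intensities in arbitrary units.
print(percent_methylation(undigested_a=300, digested_b=100))
```

If the enzyme cuts the product derived from unmethylated DNA instead, the same ratio would report the unmethylated fraction, so the interpretation depends on the site chosen.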
4. Notes
1. Precise quantification of genomic DNA is essential for the assay. Samples can be quantified using a spectrophotometer. However, equal sample loading must be confirmed using gel electrophoresis and, if necessary, adjusted accordingly.
2. Sometimes, it is advisable to digest genomic DNA with restriction enzymes before bisulfite conversion. This helps ensure complete DNA denaturation and bisulfite conversion. The restriction enzymes used for digestion should not cut within the region selected for analysis (i.e.
for PCR amplification). Following restriction digestion, DNA should be ethanol-precipitated to prevent interference with the bisulfite reaction.
3. It is possible to use much less DNA for conversion. Successful results were reported with nanogram quantities of starting DNA (11).
4. Sodium bisulfite oxidizes easily. To prevent excessive aeration during dissolving, the solution should be mixed very gently. The pH adjustment is required to dissolve sodium bisulfite completely. It is recommended to prepare a mixed sodium bisulfite-hydroquinone stock solution to reduce sodium bisulfite oxidation.
5. Samples should be incubated in a PCR thermocycler to prevent evaporation. If a thermocycler is not available, samples can be overlaid with a few drops of mineral oil.
6. An excessive incubation time may result in progressive DNA degradation due to depurination. Conversely, a short incubation period may lead to incomplete conversion. It is recommended to experiment with the incubation temperature and time, and with the concentration of sodium bisulfite, in order to establish optimal reaction conditions. An easy and inexpensive way to check the completeness of bisulfite conversion is to digest converted PCR-amplified DNA with a restriction enzyme whose recognition sequence contains only adenine and thymine (e.g. DraI: TTT/AAA) (12). If conversion is successful, then new restriction sites will be created.
7. It is necessary to clean up a PCR reaction before restriction digestion. Residual salts from PCR buffers may inhibit the restriction enzyme activity. A commercial PCR clean-up kit can be used at this stage. If a PCR reaction produces a nonspecific product, then gel extraction of the main product is recommended.
8. If the size difference between the digested and undigested PCR products is too small, then DNA should be separated using a polyacrylamide gel.
9. If the initial amount of PCR product is low and cannot be seen in the gel directly, then probe hybridization (Southern blotting) should be performed. The probe should be designed using the same guidelines as those used for primers. The probe should not recognize potentially methylated sequences such as CpG dinucleotides or sequences with high cytosine content. Also, the probe should not overlap with restriction sites of the enzymes used for COBRA analysis (Fig. 1c).
10. If the intensity of PCR products after restriction digestion is visualized using probe hybridization, then the percentage of methylated cytosine residues can be calculated using the formula % = 100 × B/(A + B), where A and B are the intensities of the remaining undigested and digested PCR products, respectively (Fig. 1c). In this example, PCR product digestion can occur only if the cytosine residue in the CpG dinucleotide sequence was methylated before bisulfite conversion (Fig. 1c).
References
1. Finnegan EJ, Genger RK, Peacock WJ, Dennis ES (1998) DNA methylation in plants. Annu Rev Plant Physiol Plant Mol Biol 49:223–247
2. Kato M, Miura A, Bender J, Jacobsen SE, Kakutani T (2003) Role of CG and non-CG methylation in immobilization of transposons in Arabidopsis. Curr Biol 13:421–426
3. Bender J (2004) DNA methylation and epigenetics. Annu Rev Plant Biol 55:41–68
4. Goll MG, Bestor TH (2005) Eukaryotic cytosine methyltransferases. Annu Rev Biochem 74:481–514
5. Rassoulzadegan M, Grandjean V, Gounon P, Vincent S, Gillot I, Cuzin F (2006) RNA-mediated non-mendelian inheritance of an epigenetic change in the mouse. Nature 441:469–474
6. Zilberman D, Gehring M, Tran RK, Ballinger T, Henikoff S (2007) Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat Genet 39:61–69
7. Chandler VL, Eggleston WB, Dorweiler JE (2000) Paramutation in maize. Plant Mol Biol 43:121–145
8. Choi Y, Gehring M, Johnson L, Hannon M, Harada JJ, Goldberg RB et al (2002) DEMETER, a DNA glycosylase domain protein, is required for endosperm gene imprinting and seeds viability in Arabidopsis. Cell 110:33–42
9. Zilberman D, Henikoff S (2005) Epigenetic inheritance in Arabidopsis: selective silence. Curr Opin Genet Dev 15:557–562
10. Penterman J, Zilberman D, Huh JH, Ballinger T, Henikoff S, Fischer RL (2007) DNA demethylation in the Arabidopsis genome. Proc Natl Acad Sci USA 104:6752–6757
11. Xiong Z, Laird PW (1997) COBRA: a sensitive and quantitative DNA methylation assay. Nucleic Acids Res 25:2532–2534
12. Sadri R, Hornsby PJ (1996) Rapid analysis of DNA methylation using new restriction enzyme sites created by bisulfite modification. Nucleic Acids Res 24:5058–5059
13. Yang AS, Estécio MR, Doshi K, Kondo Y, Tajara EH, Issa JP (2004) A simple method for estimating global DNA methylation using bisulfite PCR of repetitive DNA elements. Nucleic Acids Res 32:e38
14. Wang RY, Gehrke CW, Ehrlich M (1980) Comparison of bisulfite modification of 5-methyldeoxycytidine and deoxycytidine residues. Nucleic Acids Res 8:4777–4790
15. Hayatsu H (1976) Bisulfite modification of nucleic acids and their constituents. Prog Nucleic Acid Res Mol Biol 16:75–124
16. Raizis AM, Schmitt F, Jost JP (1995) A bisulfite method of 5-methylcytosine mapping that minimizes template degradation. Anal Biochem 226:161–166
Chapter 4
Detection of Changes in Global Genome Methylation Using the Cytosine-Extension Assay
Alex Boyko and Igor Kovalchuk
Abstract
Methylation is a reversible covalent chemical modification of DNA involved in the regulation of gene expression, genome stability, and chromatin structure. Although there are various methods of methylation analysis, most of them are laborious, expensive, or both. Here, we describe a quick, inexpensive method for analysis of global genome methylation using a cytosine-extension assay. The assay can be used for analysis of the total level of CpG, CpNpG, and asymmetrical methylation in a given cell culture or plant tissue sample.
Key words: Global genome methylation, Cytosine-extension, CpG, CpNpG, Asymmetrical methylation
1. Introduction
DNA methylation plays a critical role in a variety of cell processes. These include regulation of transcription, control of transposable element activity, and defense against foreign DNA sequences (1–4). Recent reports showed that DNA methylation plays an important role in the inheritance of gene expression patterns (5). Therefore, the maintenance of DNA methylation is critical for genome stability (6). A decrease in the level of cytosine methylation is frequently associated with activation of transposons (2) and an increased frequency of chromosomal rearrangements (7–9). In higher eukaryotes, DNA methylation is primarily associated with a CpG sequence context (4). Methylated CpG dinucleotides are frequently found within the 5′ gene regulatory regions, in which they form the so-called CpG islands. The maintenance of a proper methylation status of CpG islands is critical for the
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_4, © Springer Science + Business Media, LLC 2010
prevention of malignant transformation in mammals (10) and suppression of transposon activity in plants (2). Additionally, CpG methylation is responsible for coordination of transgenerational stability of the plant epigenome (11). In contrast to animals, plants usually have a significantly higher content of methylated CpG sites in their genome (30% in plants versus 2–8% in animals). Moreover, CpNpG and asymmetrical cytosine methylation, which are usually absent in animals, occur frequently in plants (1, 3, 12, 13). Therefore, there is a need for efficient and sensitive protocols that permit quantitative measurement of changes in DNA methylation. To date, a number of methods allowing detection of DNA methylation changes have been developed. Here, we discuss a sensitive and rapid method for the detection of changes in global DNA methylation (14). The assay is based on using methylation-sensitive restriction enzymes that produce 5′ guanine overhangs upon cleavage. Restriction digestion is followed by single nucleotide primer extension with (3H)dCTP (Fig. 1). This allows the quantitative detection of unmethylated restriction sites available, as the number of (3H)dCTP incorporations should be proportional to the number of 5′ guanine overhangs
[Fig. 1 image. Panels: CpG sites and CpNpG sites. CCGG/GGCC sites, unmethylated or carrying CH3, are shown for HpaII and MspI: cleavage or blocked cleavage, followed by [3H]dCTP incorporation or no [3H]dCTP incorporation (no signal).]
Fig. 1. The mechanism of methylation pattern detection using the cytosine-extension assay. Digestion of plant genomic DNA with the methylation-sensitive restriction endonucleases (here HpaII and MspI) creates 5′ guanine overhangs. The single nucleotide primer extension reaction incorporates (3H)dCTP nucleotides into digested DNA. DNA methylation at a restriction site blocks cleavage, thereby preventing (3H)dCTP incorporation.
produced upon cleavage. Importantly, the cleavage efficiency of the enzyme is not affected by methylation density (14). The background readings that may arise from broken genomic DNA are accounted for by performing a single nucleotide primer extension assay with DNA that was not digested with a restriction endonuclease. The undigested DNA readings can then be subtracted from the digested DNA readings to obtain a value that corresponds to the number of unmethylated restriction sites available. The assay is highly sensitive, requires low amounts of input DNA, and can be used for methylation analysis of significantly damaged DNA templates that contain various DNA adducts, strand breaks, and abasic sites (14). The assay is work-efficient and permits analysis of several hundred samples in 2 days.
2. Materials
2.1. Restriction Enzyme Digestion
1. Restriction enzymes such as HpaII, MspI, and others (see Table 1) with suitable 10× reaction buffer.
2. Nuclease-free water.
2.2. Single Nucleotide Extension Reaction
1. Nuclease-free water.
2. Methylation-sensitive restriction enzyme with suitable 10× reaction buffer.
3. Agarose, electrophoresis grade.
4. 1× TBE (90 mM Tris, pH 8.0, 90 mM boric acid, 2 mM EDTA).
5. 6× DNA gel loading buffer.
6. 10× PCR buffer II w/o MgCl2.
7. 25 mM MgCl2.
8. AmpliTaq DNA polymerase (Perkin Elmer, Foster City, CA).
9. (3H)dCTP (NEN, Boston MA). CAUTION: Radiation protection measures must be taken for handling 3H and all derived materials. Store in a shielded container in a dedicated freezer at −20°C.
10. Whatman DE-81 ion-exchange filters.
11. 500 mM Na-phosphate buffer, pH 7.0.
12. Scintillation vials (PerkinElmer).
13. A scintillation cocktail (PerkinElmer).
14. A Beckman LS 5000 CE liquid scintillation counter (Beckman).
Table 1 List of the methylation-sensitive restriction endonucleases that can be used for the cytosine extension assay
Restriction endonuclease    Restriction site (blocked by cytosine methylation)    Methylation pattern analysis
AciI                        C/CGC                                                 Global methylation
AgeI                        A/CCGGT                                               Global methylation
AscI                        GG/CGCGCC                                             CpG islands
BssHII                      G/CGCGC                                               CpG islands
BstBI                       TT/CGAA                                               Global methylation
HpaII                       C/CGG                                                 Global methylation
HpyCH4IV                    A/CGT                                                 Global methylation
MluI                        A/CGCGT                                               Global and CpG islands
MspI                        C/CGG                                                 Global methylation
NarI                        GG/CGCC                                               CpG islands
The most frequently used enzymes are HpaII, MspI, AciI, and BssHII. If samples of animal DNA are analyzed, then HpaII and MspI can be used to measure the percentage of methylated CpG sites out of the total number of restriction sites available. While HpaII cleavage is blocked by methylation at the internal cytosine, its isoschizomer MspI can cleave the same site regardless of methylation at that cytosine. In plant DNA, however, MspI cleavage is blocked if the external cytosine at the restriction site is methylated. Hence, the combination of HpaII and MspI can be used efficiently for plant DNA methylation analysis to compare cytosine methylation in the CpG and CpNpG sequence contexts, respectively.
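The HpaII/MspI logic described in the table note can be summarized as a small decision function. This is a deliberately simplified sketch that encodes only the two rules stated above (HpaII blocked by methylation of the internal cytosine, MspI blocked by methylation of the external cytosine of CCGG); the function and argument names are hypothetical, and any further methylation sensitivities of the real enzymes are not modeled.

```python
def cleaves_ccgg(enzyme, outer_c_methylated, inner_c_methylated):
    """Return True if the enzyme is expected to cut a CCGG site.

    Simplified rules taken from the Table 1 note:
      HpaII: blocked when the internal (CpG) cytosine is methylated.
      MspI:  blocked when the external (CpNpG) cytosine is methylated.
    """
    if enzyme == "HpaII":
        return not inner_c_methylated
    if enzyme == "MspI":
        return not outer_c_methylated
    raise ValueError(f"unsupported enzyme: {enzyme}")


# Enumerate the four methylation states of a single CCGG site.
for outer in (False, True):
    for inner in (False, True):
        print(
            f"outer mC={outer!s:5} inner mC={inner!s:5} "
            f"HpaII cuts={cleaves_ccgg('HpaII', outer, inner)!s:5} "
            f"MspI cuts={cleaves_ccgg('MspI', outer, inner)}"
        )
```

In this model only cleaved sites expose overhangs for (3H)dCTP incorporation, so a drop in the HpaII signal points to CpG methylation and a drop in the MspI signal points to CpNpG methylation.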
3. Methods
3.1. Restriction Enzyme Digestion
1. Using nuclease-free water, prepare two 1.0 µg genomic DNA aliquots. One aliquot is incubated with a methylation-sensitive endonuclease. The second DNA aliquot is incubated without restriction enzyme and serves as a background control (see Notes 1–4).
2. Set up digestion of the first aliquot in a final volume of 20 µl using a tenfold excess of restriction enzyme according to the manufacturer’s protocol. Use nuclease-free water in place of enzyme for the second DNA aliquot. Incubate samples overnight at the temperature suggested by the manufacturer.
3.2. Single Nucleotide Extension Reactions
1. Use 10 µl (0.5 µg DNA) of each digestion reaction for the single nucleotide extension reaction.
2. Set up reactions in a final volume of 25 µl containing 0.5 µg DNA (10 µl), 1× PCR buffer II w/o MgCl2, 1.0 mM MgCl2, 0.5 units of AmpliTaq DNA polymerase (Perkin Elmer, Foster City, CA), and (3H)dCTP (42.9 Ci/mmol) (NEN, Boston MA) (see Note 5).
3. Incubate samples at 56°C for 1 h, and then place the samples on ice.
4. Apply the 25 µl reactions to Whatman DE-81 ion-exchange filters. Air-dry filters (see Note 6).
5. Wash filters in 500 mM Na-phosphate buffer, pH 7.0, at room temperature for 10 min.
6. Repeat the wash twice.
7. Air-dry filters and transfer them to scintillation vials (PerkinElmer) containing 5 ml of a scintillation cocktail (PerkinElmer). Ensure that filters are completely submerged in the scintillation cocktail.
8. Measure the radiolabel incorporation of the samples in a liquid scintillation counter (e.g. a Beckman LS 5000 CE liquid scintillation counter) using the settings suggested by the manufacturer. The readings taken from the enzyme-treated samples show the total radiolabel incorporation (RIT), which correlates negatively with the number of methylated restriction sites. The readings taken from the samples incubated without a restriction enzyme show the background radiolabel incorporation (RIB), which may reflect the quality and integrity of the input DNA.
9. Calculate the actual radiolabel incorporation (RIA), i.e. the incorporation due to unmethylated restriction sites, using the formula RIA = RIT − RIB, where RIT and RIB are the total and background radiolabel incorporation, respectively. Express the results as relative (3H)dCTP incorporation/0.5 µg DNA. Alternatively, the results can be expressed as a percentage change relative to control samples (see Note 4).
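The background correction in step 9 and the optional comparison with a control sample can be sketched as follows. This is an illustration under stated assumptions: readings are scintillation counts per 0.5 µg DNA, the function names are hypothetical, the example counts are invented, and "percentage change" is taken to mean the change of RIA relative to the control RIA.

```python
def actual_incorporation(ri_total, ri_background):
    """RIA = RIT - RIB (digested reading minus undigested reading)."""
    return ri_total - ri_background


def percent_change_vs_control(ria_sample, ria_control):
    """Change in RIA of a sample relative to a control sample, in percent."""
    if ria_control == 0:
        raise ValueError("control incorporation must be non-zero")
    return 100.0 * (ria_sample - ria_control) / ria_control


# Hypothetical scintillation readings (counts per 0.5 ug DNA).
ria_control = actual_incorporation(ri_total=5200, ri_background=200)
ria_treated = actual_incorporation(ri_total=6700, ri_background=200)

print(ria_control, ria_treated)
print(percent_change_vs_control(ria_treated, ria_control))
```

Because incorporation reflects unmethylated sites, an increase in RIA relative to the control corresponds to a loss of methylation at the sites recognized by the enzyme.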
4. Notes
1. Precise quantification of genomic DNA is essential for the assay. Samples can be quantified using a spectrophotometer. However, equal sample loading must be confirmed using gel electrophoresis and, if necessary, adjusted accordingly.
2. It is important to ensure the purity of the DNA preparation. We suggest using ethanol-precipitated DNA for the assay. This prevents chemicals used during DNA extraction (SDS, EDTA, proteinase K, phenol, etc.) from interfering with restriction digestion.
3. The choice of restriction enzyme determines the type of DNA methylation being analyzed: global genome methylation or CpG islands (Table 1). The enzymes that have their recognition sites distributed randomly throughout the genome are suitable for global methylation analysis. The enzymes that have multiple CpGs in their recognition sequences are usually used to study methylation of CpG islands. Similarly, choosing the right enzyme makes it possible to selectively analyze methylation in the CpNpG and CpG sequence contexts.
4. A pair of isoschizomers comprising one methylation-sensitive and one methylation-insensitive enzyme can be used to determine the percentage of restriction sites in the genome that contain methylated cytosine residues. Three cytosine-extension reactions should be performed for each sample: a background control reaction (no enzyme added), a digestion reaction with a methylation-sensitive enzyme, and a digestion reaction with a methylation-insensitive enzyme. Once the readings are corrected for background incorporation, the ratio of incorporation after methylation-sensitive enzyme digestion to that after methylation-insensitive enzyme digestion gives the percentage of unmethylated restriction sites.
5. Recent reports demonstrate that (3H)dCTP nucleotides can be efficiently replaced with biotinylated dCTP, thus eliminating the need to use radioactivity for the assay (15).
6. Using Whatman DE-81 ion-exchange filters is essential, as they drastically reduce DNA contamination with unincorporated nucleotides (16).
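The isoschizomer calculation described in Note 4 can be written out explicitly. The sketch below assumes three readings per sample (no enzyme, methylation-sensitive enzyme, methylation-insensitive enzyme); the function name and the example counts are hypothetical.

```python
def percent_unmethylated_sites(ri_sensitive, ri_insensitive, ri_background):
    """Percentage of unmethylated sites from an isoschizomer pair.

    Both enzyme readings are corrected for the no-enzyme background before
    the sensitive/insensitive ratio is taken.
    """
    sensitive = ri_sensitive - ri_background
    insensitive = ri_insensitive - ri_background
    if insensitive <= 0:
        raise ValueError("insensitive-enzyme signal must exceed background")
    return 100.0 * sensitive / insensitive


# Hypothetical scintillation readings for one sample.
unmethylated = percent_unmethylated_sites(
    ri_sensitive=1700, ri_insensitive=5200, ri_background=200
)
print(unmethylated)           # percentage of unmethylated sites
print(100.0 - unmethylated)   # percentage of methylated sites
```

The methylated fraction is taken here as the complement of the unmethylated fraction, which presumes that the methylation-insensitive enzyme cuts every site.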
References
1. Finnegan EJ, Genger RK, Peacock WJ, Dennis ES (1998) DNA methylation in plants. Annu Rev Plant Physiol Plant Mol Biol 49:223–247
2. Kato M, Miura A, Bender J, Jacobsen SE, Kakutani T (2003) Role of CG and non-CG methylation in immobilization of transposons in Arabidopsis. Curr Biol 13:421–426
3. Bender J (2004) DNA methylation and epigenetics. Annu Rev Plant Biol 55:41–68
4. Goll MG, Bestor TH (2005) Eukaryotic cytosine methyltransferases. Annu Rev Biochem 74:481–514
5. Rassoulzadegan M, Grandjean V, Gounon P, Vincent S, Gillot I, Cuzin F (2006) RNA-mediated non-mendelian inheritance of an epigenetic change in the mouse. Nature 441:469–474
6. Rizwana R, Hahn PJ (1999) CpG methylation reduces genomic instability. J Cell Sci 112:4513–4519
7. Engler P, Weng A, Storb U (1993) Influences of CpG methylation and target spacing on V(D)J recombination in a transgenic substrate. Mol Cell Biol 13:571–577
8. Bender J (1998) Cytosine methylation of repeated sequences in eukaryotes: the role of DNA pairing. Trends Biochem Sci 23:252–256
9. Bassing CH, Swat W, Alt FW (2002) The mechanism and regulation of chromosomal V(D)J recombination. Cell 109:S45–S55
10. Shi H, Wang MX, Caldwell CW (2007) CpG islands: their potential as biomarkers for cancer. Expert Rev Mol Diagn 7:519–531
11. Mathieu O, Reinders J, Caikovski M, Smathajitt C, Paszkowski J (2007) Transgenerational stability of the Arabidopsis epigenome is coordinated by CG methylation. Cell 130:851–862
12. Ingelbrecht I, Van Houdt H, Van Montagu M, Depicker A (1994) Posttranscriptional silencing of reporter transgenes in tobacco correlates with DNA methylation. Proc Natl Acad Sci U S A 91:10502–10526
13. Meyer P, Niedenhof I, ten Lohuis M (1994) Evidence for cytosine methylation of nonsymmetrical sequences in transgenic Petunia hybrida. EMBO J 13:2084–2088
14. Pogribny I, Yi P, James SJ (1999) A sensitive new method for rapid detection of abnormal methylation patterns in global DNA and within CpG islands. Biochem Biophys Res Commun 262:624–628
15. Fujiwara H, Ito M (2002) Nonisotopic cytosine extension assay: a highly sensitive method to evaluate CpG island methylation in the whole genome. Anal Biochem 307:386–389
16. Basnakian AG, James SJ (1996) Quantification of 3′OH DNA breaks by random oligonucleotide-primed synthesis (ROPS) assay. DNA Cell Biol 15:255–262
Chapter 5
In Situ Analysis of DNA Methylation in Plants
Palak Kathiria and Igor Kovalchuk
Abstract
Epigenetic changes in the plant genome are associated with differential genome methylation, histone modifications, and the binding of various chromatin-binding factors. Methylation of cytosine residues is one of the most versatile mechanisms of epigenetic regulation. The analysis of DNA methylation can be performed in different ways. However, most of these procedures involve the extraction of chromatin from cells with further isolation and analysis of DNA. Modest success has been achieved in DNA methylation analysis in plant tissues in situ. Here, we present an in situ method for DNA methylation analysis, which has high sensitivity and good reproducibility.
Key words: DNA methylation, Epigenetic regulation, In situ analysis, Immunohistochemistry
1. Introduction
The development of any organism depends on the composition of its genetic material and the regulation of its expression. Regulation of gene expression is achieved by many different molecular mechanisms at the transcriptional, posttranscriptional, translational, or posttranslational level. Transcriptional regulation includes cytosine methylation, histone modifications, and changes in chromatin structure. DNA methylation primarily takes the form of a methyl group added to cytosine residues. In plants, this occurs at symmetrical sites, such as CG and CNG sequences, as well as at nonsymmetrical sites, abbreviated as CNN (1). Previous reports show that epigenetic modifications in CG-rich regions lead to stable expression patterns, which are inherited for several generations (2, 3). In contrast, epigenetic modifications in other regions are more flexible in nature and can change substantially during plant development and upon exposure to environmental stresses (2).
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_5, © Springer Science + Business Media, LLC 2010
Kathiria and Kovalchuk
Several in vitro techniques are available for the analysis of DNA methylation, including bisulfite conversion-based PCRs, methylation-sensitive restriction fragment length polymorphism (RFLP) analysis, and chromatin immunoprecipitation (ChIP) assays (4–6). These techniques require either chromatin or DNA isolation; hence, they do not permit in situ analysis of DNA methylation. Initial trials of in situ DNA methylation analysis in tissues were carried out successfully in animals (7). The in situ technique has been used in various studies of plants (8, 9). Here, we present an improved technique based on immunological detection, which allows for in situ analysis of DNA methylation in plants. Experiments using this technique have shown that the distribution of euchromatic and heterochromatic regions is similar to that previously reported for nuclei of animal tissue (10, 11). The in situ technique can be used for different purposes, including analysis of various tissue types and of changes during different developmental stages (9). The technique has been used for tobacco and Arabidopsis and may also be used for other plant species. Experiments based on this technique were conducted on 5-week-old tobacco plants. However, the plant species and age may vary according to the experiment. The initial steps of the technique include fixation and sectioning of the plants. Tissue sections are further treated to remove all RNA and proteins from tissues. Then the DNA is denatured for optimal recognition by antibodies. At this stage, an anti-5MeC antibody is used for immunolabeling. A chromophore-conjugated secondary antibody is then applied, and DNA is counterstained with DAPI. The analysis is carried out using confocal microscopy.
2. Materials
2.1. Slide Coating with APES
1. 3-Aminopropyltriethoxysilane (APES).
2. 100% ethanol.
3. Acetone.
4. 60°C oven.
5. A slide holder.
2.2. Tissue Fixation and Cryosectioning
1. Fixative: 4% paraformaldehyde (PFA) solution in 1× phosphate-buffered saline (PBS) or 3:1 ethanol:acetic acid solution (see Note 1). To prepare a 4% PFA solution, heat 90 mL distilled water to 60°C. Add 4 g of PFA powder. Dissolve the powder by adding 1 N NaOH solution to adjust the pH to 11. After the powder has completely dissolved, add 1 N HCl to adjust the pH to 7.5. Bring the solution to room temperature, and add
10 mL of 10× PBS. CAUTION: The PFA solution should be prepared in a well-ventilated fume hood, as PFA is hazardous when inhaled.
2. 30% sucrose solution in distilled water.
3. Tissue-Tek® Optimal cutting temperature (OCT; Sakura Finetek, The Netherlands) solution for mounting cryopreserved specimens.
4. APES-coated glass slides.
5. Dry ice.
6. Cryomicrotome.
2.3. Immunodetection of DNA Methylation
1. 1× PBS and PBST: To prepare 10× PBS, dissolve NaCl (80 g), KCl (2 g), Na2HPO4 (14.4 g), and KH2PO4 (2.4 g) in 900 mL water. Adjust the pH to 7.5 and bring the total volume to 1 L with water. To make PBST, add Tween 20 to 0.05% in 1× PBS.
2. 2× SSC and 4× SSC: To prepare 20× SSC, dissolve NaCl (175.3 g) and sodium citrate (88.2 g) in 900 mL water. Adjust the pH to 7.0 and bring the volume to 1 L. Dilute accordingly in ddH2O to obtain 2× and 4× solutions.
3. 100, 80, 60, 40, and 20% ethanol.
4. 100 µg/mL RNAse A in 2× SSC solution.
5. 100 µg/mL Proteinase K solution or pepsin in 100 mM HCl solution.
6. 50% formamide in 4× SSC solution. CAUTION: Formamide is toxic. Perform all manipulations in a fume hood.
7. Blocking buffer: 5% BSA in 1× PBS solution (see Note 2).
8. Primary antibody solution: an anti-5-methylcytosine antibody diluted 1:200 in blocking buffer.
9. Secondary antibody solution: secondary antibody diluted 1:500 in blocking buffer (see Note 3).
10. Antifade solution: Dissolve 50 mg of p-phenylenediamine (Sigma-Aldrich) in 5 mL of 1× PBS solution. Adjust the pH to 9.0. Add 45 mL of glycerol. Mix well, aliquot in 1 mL tubes, and store at −80°C.
11. Counterstain: 1 µg/mL DAPI solution in water.
12. A hot plate.
13. A thermometer.
14. Beakers.
15. A confocal microscope.
16. Slide holders or glass/plastic Coplin jars.
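The working solutions in items 1 and 2 are simple dilutions of the 10× PBS and 20× SSC stocks, which follow C1 × V1 = C2 × V2. The helper below is a minimal sketch of that arithmetic; the function name and the final volumes are hypothetical and can be changed to suit the number of slides.

```python
def dilution_volumes(stock_x, final_x, final_volume_ml):
    """Return (stock_ml, diluent_ml) needed to dilute a concentrated stock.

    stock_x and final_x are fold-concentrations (e.g. 20 for 20x SSC),
    and the result follows C1 * V1 = C2 * V2.
    """
    if not 0 < final_x <= stock_x:
        raise ValueError("final concentration must be > 0 and <= stock")
    stock_ml = final_volume_ml * final_x / stock_x
    return stock_ml, final_volume_ml - stock_ml


# Hypothetical final volumes.
print("2x SSC:", dilution_volumes(stock_x=20, final_x=2, final_volume_ml=500))
print("4x SSC:", dilution_volumes(stock_x=20, final_x=4, final_volume_ml=500))
print("1x PBS:", dilution_volumes(stock_x=10, final_x=1, final_volume_ml=1000))
```

For example, 500 mL of 2× SSC would take 50 mL of 20× SSC and 450 mL of ddH2O under this calculation.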
3. Methods
3.1. Slide Coating with APES
Tissue retention on slides is one of the major problems in tissue section analysis. It can be enhanced by various techniques, of which APES coating of slides is one of the most commonly used.
1. Add 2 mL of APES to 100 mL of acetone in a beaker to prepare a 2% solution (see Note 4).
2. Add the 2% APES solution to a glass slide holder. Arrange the slides in the holder so that the entire glass surface of each slide is exposed to the solution. Make sure that the slides do not touch each other. Incubate the slides for 2 min. During incubation, APES reacts with the glass surface.
3. Carefully take out the slides using forceps and rinse them well in a beaker containing 100% ethanol to remove all unreacted APES.
4. Air-dry the slides in a dust-free ventilated area such as a clean fume hood. After air drying, incubate the slides at 60°C for at least 3 h. This "baking" step creates additional cross-links between the glass and APES molecules.
5. Return the slides to their original box and store at 4°C until further use. They can be stored for up to 2–3 months.
3.2. Tissue Fixation and Cryosectioning
1. Harvest tissues from healthy plants and prepare them for fixation. If mesophyll tissue is to be analyzed, dissect leaves into 1 cm × 1 cm pieces, as larger pieces are harder to fix. Submerge the tissues in the fixative solution, and vacuum-infiltrate for 20–30 min. If the ethanol:acetic acid fixative is used, store the tissues in the fixative for 24 h or longer without vacuum infiltration.
2. After fixation, rinse the tissue once with 1× PBS solution to remove excess 4% PFA. Submerge the tissue in the 30% sucrose solution and vacuum-infiltrate for 10 min (omit the vacuum step for delicate tissues). The 30% sucrose solution acts as a cryoprotectant and reduces freezing injury to the cells.
3. Store the tissue at 4°C until it sinks to the bottom. At this stage, replace the sucrose solution with a 1:1 mixture of 30% sucrose and OCT solution. Incubate the tissue at 60°C for 2–3 h, and then overnight at room temperature. This allows ample time for the sucrose:OCT solution to infiltrate the tissue.
4. The tissue is now ready for cryosectioning, which requires hardened tissue. To harden it, carefully place the tissue in 100% OCT solution and then on dry ice, where the OCT solution solidifies. Cut 10 µm thick sections of the tissue using a cryomicrotome.
5. Place the sections on APES-coated slides and allow them to air dry. The slides with sections can be stored at −80°C for long periods.
3.3. Immunodetection of DNA Methylation
1. Thaw the slides with tissue sections at room temperature. Bake the slides at 60°C for 20 min to induce additional cross-links between the tissue and the slides.
2. Fix the tissue with 4% paraformaldehyde for 10 min. Wash the slides twice with 1× PBS for 5 min each in a Coplin jar or a slide holder.
3. RNA and proteins in the cells may hinder antibody penetration and antigen recognition. Removing them enhances the antigen/antibody reaction; it also removes chromatin proteins and unwinds DNA, giving the antibodies better access to the DNA. Incubate the sections for 1 h at 37°C in 100 µg/mL RNase A in 2× SSC solution to remove RNA from the tissue. Subsequently, treat the sections with 100 µg/mL Proteinase K solution in 100 mM HCl solution for 30 min at room temperature (see Note 5). Then, wash twice with 1× PBS for 3 min each, changing the solution in the Coplin jar.
4. Dehydrate the tissue in progressively higher concentrations of ethanol: 20, 40, 60, 80, and 100% (incubation time: 10 min). Take the slides out of the Coplin jar, and let the excess 100% ethanol drip off. Lay the slides horizontally on a bench top and air dry for 10 min.
5. At this stage, the DNA is denatured for optimal recognition by the antibody. Submerge the dried slides in 50% formamide in 4× SSC solution that has been preheated to 80°C in a beaker. Take the beaker off the hot plate and let it cool down at room temperature. Wash the slides in two changes of 1× PBS for 5 min each in the Coplin jar without agitation.
6. Block the sections with an appropriate blocking solution (see Note 2) for 1–2 h. The blocking step is required to eliminate nonspecific binding between the antibody and other cellular components.
7. Apply 200 µL of primary antibody solution to each slide. Cover the slides with a piece of Parafilm to prevent evaporation and drying of the antibody solution during prolonged incubation. Incubate the slides for 5 h to overnight at 4°C without agitation (see Note 6).
8. To remove excess antibody, wash the slides in three changes of PBST solution for 15 min each in the Coplin jar without agitation.
9. Apply 200 µL of the secondary antibody solution to each slide. Then, cover the slides with a piece of Parafilm and incubate at room temperature for 3 h.
10. To remove unreacted secondary antibody, wash the slides three times with PBST solution (15 min per wash) in the Coplin jar.
11. Apply the DAPI counterstain to the sections for 10 min and destain in 1× PBS for 10 min.
12. Mount the sections in antifade solution and apply coverslips to the slides. Store the slides in the dark at 4°C. The antifade solution prevents photobleaching caused by strong light sources.
13. Observe the nuclei using confocal microscopy. For our studies, we used a Nikon ECLIPSE TE2000-U (Japan) microscope with EZ-C1 3.60 software. Check the DAPI signal using a 408 nm laser (CVI Melles Griot, USA) and analyze the Alexa 546 signal using a 543 nm laser (CVI Melles Griot, USA). Analyze the samples using a 60× water immersion objective lens. Use the multichannel pseudocolor mode in the software. Obtain the final output as one red channel image, one blue channel image, and one superimposed image of both channels (Fig. 1).
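The superimposed image described in step 13 is produced by the microscope software. Purely as an illustration of what such an overlay does, the sketch below combines two grayscale channels into red/blue pseudocolor pixels. The function name and the pixel values are hypothetical, and real confocal data would normally be handled by the acquisition software or an image analysis package rather than by plain Python lists.

```python
def merge_channels(red, blue):
    """Overlay two equally sized grayscale images as (R, G, B) pixels.

    red  : 2D list of 0-255 intensities (e.g. the 5-methylcytosine signal)
    blue : 2D list of 0-255 intensities (e.g. the DAPI signal)
    """
    if len(red) != len(blue) or any(len(r) != len(b) for r, b in zip(red, blue)):
        raise ValueError("channel images must have the same dimensions")
    # The green channel is left empty, so the overlay shows only red and blue.
    return [[(r, 0, b) for r, b in zip(red_row, blue_row)]
            for red_row, blue_row in zip(red, blue)]


# Hypothetical 2 x 2 pixel images
methylation_signal = [[0, 200], [50, 255]]
dapi_signal = [[180, 40], [90, 10]]

merged = merge_channels(methylation_signal, dapi_signal)
print(merged[0][1])  # (200, 0, 40)
```

Pixels that are bright in both channels appear magenta in such an overlay, whereas regions with DNA but little methylation signal remain predominantly blue.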
4. Notes
1. The PFA powder remains insoluble in water until the pH is adjusted to 7.0 with 1 M NaOH. The solution should be prepared fresh for optimal tissue fixation. Alternatively, the solution can be stored at −20°C for approximately a month. Never heat the solution above 60°C.
2. The composition of the blocking buffer can vary. It is better to use serum from the animal in which the secondary antibody was raised as the blocking reagent. For example, if the secondary antibody is goat anti-rabbit, use 5% goat serum instead of the BSA solution for blocking.
3. The dilution of the secondary antibody has to be determined experimentally. It may vary depending on the tissue type used. A 1:500 dilution can be used as the initial reference point.
4. The APES solution has to be diluted just before use. Once prepared, it can be reused to coat a large number of slides and stored for 24 h at room temperature.
5. The Proteinase K solution makes the tissue more delicate to handle. Hence, overdigestion of soft tissues may lead to excessive tissue damage. The time of incubation at 37°C has to be standardized according to the tissue type used.
6. The time of incubation with the primary antibody depends on the type and thickness of the tissue. An overnight incubation generally gives better results. To avoid drying of the solution during incubation, slides can be covered with a piece of Parafilm.
Fig. 1. In situ analysis of DNA methylation. (a) The plant nucleus showing DAPI-stained DNA in blue. (b) The same plant nucleus with 5-MC in red. (c) The superimposed image of DAPI and 5-MC. The euchromatic regions of the nucleus appear more blue because DNA methylation is relatively scarce there. The heterochromatic regions show a strong red signal, which indicates a high level of DNA methylation
References
1. Cao X, Jacobsen S (2002) Locus-specific control of asymmetric and CpNpG methylation by the DRM and CMT3 methyltransferase genes. Proc Natl Acad Sci U S A 99:16491–16498
2. Mathieu O, Reinders J, Caikovski M, Smathajitt C, Paszkowski J (2007) Transgenerational stability of the Arabidopsis epigenome is coordinated by CG methylation. Cell 130:851–862
3. Widman N, Jacobsen S, Pellegrini M (2009) Determining the conservation of DNA methylation in Arabidopsis. Epigenetics 4(2):119–124
4. Vaillant I, Paszkowski J (2007) Role of histone and DNA methylation in gene regulation. Curr Opin Plant Biol 10:528–533
5. Dahl C, Guldberg P (2003) DNA methylation analysis techniques. Biogerontology 4:233–250
6. Thu KL, Vucic EA, Kennett JY, Heryet C, Brown CJ, Lam WL et al (2009) Methylated DNA immunoprecipitation. J Vis Exp 23, doi: 10.3791/935
7. Mayer W, Niveleau A, Walter J, Fundele R, Haaf T (2000) Demethylation of the zygotic paternal genome. Nature 403:501–502
8. Naumann K, Fischer A, Hofmann I, Krauss V, Phalke S, Irmler K et al (2005) Pivotal role of AtSUVH2 in heterochromatic histone methylation and gene silencing in Arabidopsis. EMBO J 24:1418–1429
9. Oakeley E, Podesta A, Jost JP (1997) Developmental changes in DNA methylation of the two tobacco pollen nuclei during maturation. Proc Natl Acad Sci U S A 94:11721–11725
10. Manak JR, Wen H, Van T, Andrejka L, Lipsick JS (2007) Loss of Drosophila myb interrupts the progression of chromosome condensation. Nature Cell Biol 9:581–587
11. Zink D, Fischer AH, Nickerson JA (2004) Nuclear structure in cancer cells. Nat Rev Cancer 4:677–687
Chapter 6
Analysis of Mutation/Rearrangement Frequencies and Methylation Patterns at a Given DNA Locus Using Restriction Fragment Length Polymorphism
Alex Boyko and Igor Kovalchuk
Abstract
Restriction fragment length polymorphism (RFLP) is a difference in DNA sequences of organisms belonging to the same species. RFLPs are typically detected as DNA fragments of different lengths after digestion with various restriction endonucleases. The comparison of RFLPs allows investigators to analyze the frequency of occurrence of mutations, such as point mutations, deletions, insertions, and gross chromosomal rearrangements, in the progeny of stressed plants. The assay involves restriction enzyme digestion of DNA followed by hybridization of digested DNA using a radioactively or enzymatically labeled probe. Since DNA can be digested with methylation-sensitive enzymes, the assay can also be used to analyze a methylation pattern of a particular locus. Here, we describe RFLP analysis using methylation-insensitive and methylation-sensitive enzymes.
Key words: Restriction fragment length polymorphism (RFLP), Genome stability, Mutation frequency, Locus-specific methylation pattern, Methylation-sensitive enzymes
1. Introduction
Restriction fragment length polymorphism (RFLP) is a difference in homologous DNA sequences that can be detected by the presence of fragments of different lengths after digestion of DNA with specific restriction endonucleases. These variations in restriction fragment lengths can then be detected by Southern blotting. All differences observed result from naturally occurring sequence variation between individual organisms. RFLP analysis is highly locus-specific and offers a possibility to detect polymorphisms in both alleles in a heterozygous organism. This allows one to use the RFLP method for revealing sequence variations specific for a single clone or individual organism.
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_6, © Springer Science + Business Media, LLC 2010
RFLP analysis has a wide range of successful applications, from disease diagnostics in humans to plant breeding (1–3). RFLP also finds application in a number of ecological, evolutionary, taxonomical, phylogenetic, and genetic studies, in which it is used for genotyping and genetic mapping, hereditary disease diagnostics, paternity tests, forensics, plant breeding, and analysis of complex traits (1–8). The high sensitivity of RFLP to changes in DNA sequences makes it an invaluable tool for analysis of locus-specific changes in the plant genome under the influence of stress. Profiling stress-treated plants and their progeny by RFLPs at the loci involved in stress response can help us understand genetic and epigenetic mechanisms of plant adaptive responses (9). RFLP analysis can be used in combination with other methods, such as a cytosine extension assay and COBRA or methylation-sensitive RFLP (see below), to reveal mechanisms behind stress-induced genetic and epigenetic variation and to describe its role in genome evolution and development of acquired stress tolerance (9).
RFLP analysis is commonly used for detection of sequence differences between closely related organisms in a single population. The method is based on the detection of variation in DNA fragment lengths generated by restriction digestion of total genomic DNA. Polymorphisms are most frequently represented as single-nucleotide variations. Spontaneous or stress-induced point mutations, insertions, deletions, and other types of sequence rearrangements create new restriction sites or eliminate existing ones. The frequency of such changes at a given locus can be measured using Southern blot hybridization with a probe that is specifically designed to recognize a polymorphic locus. Alternatively, the RFLP analysis protocol can be used for quantitative detection of locus-specific differences in methylation patterns between organisms (methylation-sensitive RFLP).
This is achieved by using methylation-sensitive restriction endonucleases during the first step of the protocol, restriction digestion (Table 1) (10, 11). Since cleavage of restriction sites is dependent on cytosine methylation, the comparison of a restriction pattern and DNA band intensity between samples allows the quantification of locus-specific differences in cytosine methylation.
The choice of restriction endonuclease is the most critical factor affecting the efficiency of RFLP analysis. This derives from the fact that some restriction enzymes expose more sequence variants than others, depending on the frequency of their recognition sites in the analyzed genome. Another factor is the sequence context of the recognition site itself. Recognition sequences that contain CpG dinucleotides yield more sequence variants than restriction sites lacking them. This is due to high rates of C to T transitions at CpG sites, especially if the cytosine is methylated (12).
Table 1 List of restriction endonucleases that are commonly used for RFLP analysis
Restriction endonuclease    Restriction site    Sensitivity to 5-methylcytosine
AvaII                       G/GWCC              Blocked by overlapping
BamHI                       G/GATCC             Not sensitive
BglII                       A/GATCT             Not sensitive
DraI                        TTT/AAA             Not sensitive
EcoRI                       G/AATTC             Blocked by overlapping
HindIII                     A/AGCTT             Not sensitive
HpaII                       C/CGG               Blocked
MspI                        C/CGG               Blocked (at external cytosine)
PstI                        CTGCA/G             Blocked (at external cytosine)
PvuII                       CAG/CTG             Not sensitive
TaqI                        T/CGA               Not sensitive
Based on various assumptions,
mathematical models were produced that determined how often each enzyme should recognize sequence variants (13).
The major steps of RFLP analysis include: restriction digestion of genomic DNA with selected restriction endonucleases (Table 1), separation of the resulting DNA fragments by gel electrophoresis, transfer of DNA to a membrane, probe hybridization, and detection. The presence of polymorphic sites results in a difference between the restriction patterns of samples. Analysis of these patterns allows one to measure the frequency of rearrangements at a given locus (9). RFLP analysis is time-consuming: it requires 4–5 days to complete. The general schedule can be as follows (see Fig. 1):
Day 1: Restriction digestion (set overnight).
Day 2: Ethanol precipitation of digested DNA (if needed, DNA can be stored at 4°C). Gel electrophoresis. Denaturation and neutralization of the gel. Capillary transfer of DNA to a membrane (set overnight).
Day 3: UV crosslinking of DNA to the membrane (if needed, the membrane can be stored at 4°C). Membrane hybridization with a probe (set overnight).
Day 4: Washing the membrane. Detection of probe-binding sites. Exposure of the membrane to the film (overnight exposure may be required).
Day 5: Analysis of RFLP results.
Overall, RFLP analysis provides great sensitivity; it is highly locus-specific (but can be set to detect sequences with low
homology (~60%)); it allows the detection of both alleles in a heterozygous sample and permits multiple hybridizations on the same membrane with different probes to study polymorphism at different loci. It can be easily modified for detection of locus-specific changes in methylation patterns by using methylation-sensitive restriction endonucleases.
[Figure 1: HindIII restriction maps of the A, Apm, Ad, Ai, and Ar alleles with the probe position, and the corresponding band patterns of AApm, AAd, AAi, and AAr plants; band sizes shown are 8, 6.5, 6, 5, 2, and 0.5 kb.]
Fig. 1. The general principle of RFLP analysis. RFLP analysis of DNA from several plants was conducted with a HindIII restriction endonuclease. The maps represent a pair of chromosomes on which RFLP is located (A gene). The wild type allele of the A gene yields two restriction fragments, 6 and 2 kb long. However, various internal and external stimuli may change the DNA sequence at the given gene locus, thereby introducing novel alleles of this gene to the plant population. Presence of new alleles may result in RFLP at this gene locus. The resulting RFLP can be detected using gel electrophoresis, Southern blotting and probe hybridization. The RFLP shown here demonstrates the presence of the novel alleles of gene A. These alleles arose from a point mutation event that eliminated the restriction site (Apm allele), from a deletion of the part of the A gene sequence (Ad allele), from an insertion in the A gene (Ai allele), and from an intrachromosomal recombination event that resulted in duplication of the A gene region containing the restriction site (Ar allele). Bands that appear due to the polymorphism at the analyzed A gene locus are marked with an asterisk
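The way a point mutation changes a restriction pattern, as in the Apm allele of Fig. 1, can be illustrated with a small in silico digest. The sketch below is only a toy model under stated assumptions: the sequences are hypothetical and scaled down to tens of base pairs, the digest is assumed to be complete, and the function name `fragment_lengths` is invented for this example. It handles only unambiguous recognition sequences such as the HindIII site.

```python
import re

HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence (A/AGCTT in Table 1)
CUT_OFFSET = 1           # the enzyme cuts after the first base of the site


def fragment_lengths(sequence, site=HINDIII_SITE, cut_offset=CUT_OFFSET):
    """Fragment lengths from a complete digest of a linear DNA sequence."""
    # A lookahead pattern reports every site position, including overlaps.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy locus: three HindIII sites spaced 60 bp and 20 bp apart,
# a scaled-down stand-in for the 6 kb and 2 kb wild-type fragments of Fig. 1.
wild_type = HINDIII_SITE + "G" * 54 + HINDIII_SITE + "G" * 14 + HINDIII_SITE

# A single T-to-C substitution destroys the middle site (AAGCTT -> AAGCTC).
point_mutant = wild_type[:65] + "C" + wild_type[66:]

print(fragment_lengths(wild_type))     # [1, 60, 20, 5]
print(fragment_lengths(point_mutant))  # [1, 80, 5]
```

Loss of the middle site fuses the 60 bp and 20 bp fragments into a single 80 bp fragment, which mirrors the appearance of the 8 kb band in place of the 6 kb and 2 kb bands in the figure.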
2. Materials
2.1. Restriction Enzyme Digestion and DNA Precipitation
1. Restriction enzyme with suitable 10× reaction buffer.
2. Nuclease-free water.
3. 100 and 70% ethanol.
4. 5 M NaCl (autoclaved).
2.2. Gel Electrophoresis
1. Sterile double-distilled water.
2. Agarose, electrophoresis grade.
3. 1× TBE: 90 mM Tris, pH 8.0, 90 mM boric acid, 2 mM EDTA.
4. 6× DNA gel loading buffer (Fermentas).
5. DNA ladder (Fermentas).
2.3. Transfer DNA to Membrane
1. Sterile double-distilled water.
2. Denaturation solution: 0.5 M NaOH, 1.5 M NaCl.
3. Neutralization solution: 0.5 M Tris–HCl, pH 7.5; 1.5 M NaCl.
4. 20× SSC solution: 300 mM tri-sodium citrate dihydrate, pH 7.0, 3 M NaCl.
5. 2× SSC solution: 30 mM tri-sodium citrate dihydrate, pH 7.0, 300 mM NaCl.
6. Positively charged nylon membrane (Roche).
7. Whatman 3MM paper.
8. Paper towels.
9. Glass plates and trays to assemble a "sandwich."
10. UV-crosslinker.
2.4. Probe Preparation and Hybridization
1. Sterile double-distilled water.
2. PCR DIG Probe Synthesis Kit (Roche).
3. DIG Easy Hyb Granules (Roche).
4. PCR-labeled probe.
5. Glass hybridization tubes (Fisher).
6. Hybridization oven with a rotator.
7. 20× SSC solution: 300 mM tri-sodium citrate dihydrate, pH 7.0, 3 M NaCl.
8. 10% SDS (filter-sterilized).
2.5. Detection Procedure
1. Sterile double-distilled water.
2. Washing buffer: 0.1 M maleic acid, pH 7.5, 0.15 M NaCl, 0.3% (v/v) Tween 20.
3. Blocking reagent (Roche) (see Note 1).
4. Anti-digoxigenin-AP, Fab fragments (anti-DIG-AP conjugate) (Roche) (see Note 2).
5. Detection buffer: 0.1 M Tris–HCl, pH 9.5, 0.1 M NaCl.
6. CDP-Star™ solution (Roche).
7. Kodak MR and XAR films (Kodak).
2.6. RFLP Analysis
1. Image analysis software (ImageJ, National Institutes of Health, USA).
2.7. Membrane Stripping Procedure and Rehybridization
1. Sterile double-distilled water.
2. Stripping solution: 0.2 M NaOH, 0.1% SDS (freshly prepared).
3. 2× SSC: 30 mM tri-sodium citrate dihydrate, pH 7.0, 300 mM NaCl.
3. Methods
3.1. Restriction Enzyme Digestion and DNA Precipitation
1. Digest 5 µg of genomic DNA with a tenfold excess of restriction enzyme in a final volume of 100 µl according to the manufacturer's protocol. Incubate the samples overnight (see Notes 3–5).
2. Precipitate digested DNA by adding 2 µl of 5 M NaCl and 2.5 volumes of ice-cold 100% ethanol. Incubate the samples on ice for 30 min (see Note 6).
3. Spin the samples at 16,000 × g for 10 min in a centrifuge at 4°C. Discard the solution.
4. Add 1 ml of 70% ethanol.
5. Spin the samples at 16,000 × g for 10 min in a centrifuge at 4°C. Discard the solution.
6. Add 500 µl of 100% ethanol.
7. Spin the samples at 16,000 × g for 10 min in a centrifuge at 4°C. Discard the solution. Air-dry the DNA pellet.
8. Resuspend the samples in 10 µl of nuclease-free water (the final DNA concentration is about 0.5 µg/µl). Keep the samples on ice. If necessary, the samples can be stored at 4°C for several days.
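The quantities in steps 1, 2, and 8 can be checked with a few lines of arithmetic. The sketch below is an illustration only, and it rests on several assumptions that are not stated in the protocol: "tenfold excess" is read as 10 units of enzyme per µg of DNA, the 2.5 volumes of ethanol are calculated from the sample volume after NaCl addition, and DNA recovery after precipitation is taken as complete. The function and key names are hypothetical.

```python
def digest_and_precipitation_plan(dna_ug=5.0, reaction_ul=100.0, nacl_ul=2.0,
                                  nacl_stock_m=5.0, ethanol_volumes=2.5,
                                  resuspend_ul=10.0, units_per_ug=10.0):
    """Derived quantities for the digestion and ethanol precipitation steps.

    units_per_ug=10 encodes the assumed meaning of "tenfold excess";
    full DNA recovery is assumed for the final concentration.
    """
    sample_ul = reaction_ul + nacl_ul  # volume after adding NaCl
    return {
        "enzyme_units": dna_ug * units_per_ug,
        "nacl_final_m": nacl_stock_m * nacl_ul / sample_ul,
        "ethanol_ul": ethanol_volumes * sample_ul,
        "final_dna_ug_per_ul": dna_ug / resuspend_ul,
    }


plan = digest_and_precipitation_plan()
print(plan["enzyme_units"])         # 50.0
print(plan["ethanol_ul"])           # 255.0
print(plan["final_dna_ug_per_ul"])  # 0.5
```

The calculated 0.5 µg/µl matches the approximate final concentration given in step 8; actual yields will be somewhat lower because some DNA is lost during precipitation and washing.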
3.2. Gel Electrophoresis
1. Prepare a 1% agarose gel using 1× TBE buffer (see Notes 7–9).
2. Mix the digested and ethanol-precipitated DNA with loading dye and load the samples in the gel.
3. Run the gel at low voltage to separate restriction fragments (see Note 10).
3.3. Transfer DNA to Membrane
1. Denature DNA in the gel by incubating the gel in the denaturation solution on an orbital shaker for 15 min (see Note 11).
2. Discard the solution and repeat the denaturation step.
3. Rinse the gel with sterile double-distilled water for 5 min.
4. Neutralize the gel by incubating it in the neutralization solution on the orbital shaker for 15 min.
5. Discard the solution and repeat the neutralization step.
6. Equilibrate the gel for at least 10 min in the 20× SSC solution on the orbital shaker.
7. Set up a capillary blot transfer "sandwich." To do this, place the gel, DNA side facing up, on a glass bridge that rests in a reservoir of 20× SSC and is covered with Whatman paper submersed in the 20× SSC solution. Remove all air bubbles trapped between the gel and the Whatman paper by rolling a sterile pipette over the gel. Place a dry, positively charged nylon membrane (Roche) on the DNA-containing surface of the gel. Remove all air bubbles trapped between the gel and the membrane in the same manner. Cover the membrane with two sheets of dry Whatman paper. Place a stack of dry paper towels on top of the "sandwich." Complete the assembly by placing a glass plate on top of the paper towels and adding a weight of approximately 500 g (see Note 12).
8. Allow the blot to transfer overnight (see Note 13).
9. On the next day, disassemble the "sandwich" and fix DNA to the membrane by UV crosslinking. To do this, place the wet membrane, DNA side facing up, on Whatman paper soaked in the 2× SSC solution. Expose the membrane to UV light according to the protocol suggested by the UV-crosslinker manufacturer.
10. Rinse the membrane for 2–3 min in sterile double-distilled water.
11. Allow the membrane to air dry (see Note 14).
3.4. Probe Preparation and Hybridization
1. Prepare a digoxigenin (DIG)-labeled DNA probe using conventional PCR and the reagents supplied with the PCR DIG Probe Synthesis Kit (Roche). The prepared probe can be stored at −20°C until needed (see Note 15).
2. Determine the hybridization temperature for the probe using the guidelines provided in the DIG User's Manual (Roche).
3. Prewarm 10 ml of DIG Easy Hyb solution for every 100 cm2 of the membrane to the selected hybridization temperature.
4. Place the membrane in a glass hybridization tube (Fisher) with the DNA side facing inward. Remove air bubbles.
5. Immediately add the prewarmed DIG Easy Hyb solution to the hybridization tube and incubate the membrane on a rotator in the hybridization oven at the selected hybridization temperature for 30 min (see Notes 16–17).
6. Prewarm 5 ml of DIG Easy Hyb (Roche) solution for every 100 cm2 of the membrane to the selected hybridization temperature.
7. Take 2 µl of the PCR-labeled probe for each 1 ml of hybridization solution used. Add this amount of probe to 50 µl of sterile double-distilled water and denature the DNA in boiling water for 5 min. Immediately chill the probe on ice for 5 min.
8. Immediately add the denatured probe to the hybridization solution that has been prewarmed to the correct hybridization temperature. Mix by inversion.
9. Discard the prehybridization solution and immediately add the hybridization solution containing the denatured probe.
10. Incubate the membrane on a rotator in the hybridization oven at the selected hybridization temperature overnight (6–16 h). Make sure that the entire membrane surface is covered with the hybridization solution during incubation (see Note 17).
11. When hybridization is completed, place the membrane in a tray and immediately add low stringency buffer (2× SSC, 0.1% SDS) (see Note 17).
12. Incubate the tray at room temperature with shaking for 5 min.
13. Discard the solution and repeat the wash with low stringency buffer (2× SSC, 0.1% SDS).
14. Discard the solution and immediately add high stringency buffer (0.2× SSC, 0.1% SDS) preheated to 65°C (see Note 18).
15. Incubate the membrane at 65°C with shaking for 15 min (see Note 18).
16. Discard the solution and repeat the wash with preheated high stringency buffer (0.2× SSC, 0.1% SDS).
3.5. Detection Procedure
1. Discard the high stringency buffer and rinse the membrane in washing buffer for 5 min (see Note 17).
2. Discard the washing buffer. Immediately add the 1× blocking solution and incubate the membrane with shaking for 30 min (see Notes 19–20).
3. Discard the blocking solution and immediately add the antibody solution (the 1× blocking solution containing 1:20,000 diluted anti-DIG-AP conjugate (Roche)). Incubate the membrane with shaking for 30 min (see Notes 19–20).
4. Wash the membrane twice for 15 min in the washing buffer.
5. Equilibrate the membrane in the detection buffer for 5 min.
6. Dilute CDP-Star™ (Roche) 1:100 in the detection buffer. Usually, about 1 ml of diluted CDP-Star™ solution is needed for each 100 cm2 of the membrane.
7. Place the membrane in a zip-lock plastic bag and cover its surface with the diluted CDP-Star™ solution. Do not let the membrane dry. Seal the bag and incubate the membrane with shaking at room temperature for 5 min. Use a bag that is larger than the membrane by at least 1 cm on each side. Plastic bags substantially reduce the amount of CDP-Star™ solution needed and prevent the membrane from drying.
8. Discard the CDP-Star™ solution. Seal the wet membrane in a new bag.
9. Expose the membrane to the film (see Note 21).
10. After exposure, the membrane can be stored at 4°C for future rehybridization with a different probe.
3.6. RFLP Analysis
The film obtained can be scanned to produce a digital image (Figs. 2 and 3). The frequency of rearrangements at a given locus can be calculated according to the formula fr = nr/(p × N), where fr is the frequency of rearrangements, nr is the number of rearranged loci, p is the number of plants screened, and N is the total number of loci that carry homology to the probe (9). If needed, image analysis software (e.g., ImageJ, National Institutes of Health, USA) can be used to measure band intensity (see Note 22).
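The formula above translates directly into code. The sketch below is a minimal illustration; the function name and the example counts of rearranged loci and screened plants are hypothetical, whereas N = 30 corresponds to the number of N-gene-like loci targeted by the probe described in Note 15.

```python
def rearrangement_frequency(n_rearranged, plants_screened, loci_per_probe):
    """Frequency of rearrangements, fr = nr / (p x N).

    n_rearranged    : nr, number of rearranged loci observed
    plants_screened : p, number of plants screened
    loci_per_probe  : N, number of loci with homology to the probe
    """
    if plants_screened <= 0 or loci_per_probe <= 0:
        raise ValueError("p and N must be positive")
    return n_rearranged / (plants_screened * loci_per_probe)


# Hypothetical screen: 3 rearranged loci among 10 plants, 30 loci per plant
fr = rearrangement_frequency(3, 10, 30)
print(fr)  # 0.01
```

In this example, 1% of all loci screened (3 of 300) carried a detectable rearrangement.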
[Figure 2: Southern blot with lanes labeled alternately EcoRI and HindIII; an asterisk marks a polymorphic band.]
Fig. 2. An example of RFLP analysis of N-gene-like resistance genes using the HindIII or EcoRI restriction nuclease. RFLP analysis of N-gene-like loci was performed using total genomic DNA extracted from Nicotiana tabacum cv. SR1, digested with EcoRI or HindIII restriction endonucleases, and hybridized with the fourth exon of the N-gene. The hybridization was carried out at 35°C overnight. The asterisk shows an example of polymorphism (right above the asterisk)
[Figure 3: Southern blot of Group#1 and Group#2 samples; arrows 1–5 mark the bands discussed in the caption.]
Fig. 3. An example of RFLP analysis of N-gene-like resistance genes using the HpaII methylation-sensitive restriction nuclease. Methylation-sensitive RFLP analysis of N-gene-like loci was performed using total genomic DNA extracted from Nicotiana tabacum cv. SR1, digested with the methylation-sensitive HpaII restriction endonuclease, and hybridized with the fourth exon of the N-gene. The hybridization was carried out at 35°C overnight. Arrows 2 and 5 show a nearly equal level of methylation, whereas arrows 1, 3 and 4 show a different level of methylation. "Group#1" has higher methylation levels as shown by higher intensity of the heavier fragment (arrow 1) and the nearly complete absence of fragments 3 and 4
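The comparison described in the legend of Fig. 3 can be expressed numerically once band intensities have been measured with image analysis software. The sketch below is one possible way to do so, not the authors' procedure: it takes the share of the lane signal found in the heaviest (uncut) fragment as a relative indicator of methylation. The function name, the band labels, and all intensity values are hypothetical.

```python
def uncut_fraction(band_intensities, uncut_band):
    """Share of the total lane signal found in the uncut (heaviest) band.

    band_intensities : dict mapping a band label to its measured intensity
    uncut_band       : label of the band left uncut by the methylation-
                       sensitive enzyme
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return band_intensities[uncut_band] / total


# Hypothetical intensities for the five bands marked by arrows 1-5
group1 = {"band1": 600, "band2": 200, "band3": 0, "band4": 0, "band5": 200}
group2 = {"band1": 200, "band2": 200, "band3": 200, "band4": 200, "band5": 200}

print(uncut_fraction(group1, "band1"))  # 0.6
print(uncut_fraction(group2, "band1"))  # 0.2
```

A larger fraction in the heaviest band indicates that more HpaII sites were protected from cleavage, which is consistent with the higher methylation level described for "Group#1". Such ratios are only meaningful when equal amounts of DNA were loaded and the film is not saturated.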
3.7. Membrane Stripping Procedure and Rehybridization
1. Remove the membrane from the bag. Rinse the membrane for 5 min in sterile double-distilled water.
2. Remove the bound probe by washing the membrane twice in the stripping solution at 37°C for 15 min.
3. Rinse thoroughly in the 2× SSC solution for 5 min.
4. Follow the usual prehybridization, hybridization, and detection procedures using a new probe.
4. Notes
1. This solution is light sensitive. Prepare a 10× stock of blocking solution in maleic acid buffer and store it at 4°C; dilute the 10× stock to 1× just before use.
2. Centrifuge the original vial for 5 min at 16,000 × g before each use and pipette the required amount of antibody solution from the surface.
3. Precise quantification of genomic DNA is essential for the assay. Samples can be quantified using a spectrophotometer; however, equal sample loading must be confirmed by gel electrophoresis and adjusted if necessary.
4. It is important to ensure the purity of the DNA preparation. We suggest using ethanol-precipitated DNA for the assay. This prevents chemicals used during DNA extraction (SDS, EDTA, proteinase K, phenol, etc.) from interfering with restriction digestion.
5. The quality of the DNA preparation is another critical factor for RFLP analysis. Check the quality of DNA samples in a gel before using them for restriction digestion. Degraded and fragmented DNA is not suitable for analysis, as the resulting restriction fragments may not be of the predicted size, and new artificial fragments may appear. DNA degradation can usually be prevented by careful handling of samples during purification.
6. Ethanol precipitation of digested DNA before gel electrophoresis serves two main purposes: it reduces the sample volume before loading, and it removes residual restriction buffer salts that may interfere with gel electrophoresis.
7. In many cases, a wide range of restriction fragment sizes is expected; therefore, a 1% agarose gel is quite suitable for analysis. Lower-percentage agarose gels are not recommended, as they are difficult to handle and break easily. If fragments of less than 500 bp have to be resolved, the percentage of agarose in the gel can be increased to 2–2.5%.
8. We recommend preparing long gels, as they allow good separation of restriction fragments of various sizes. Using 20 cm long 1% agarose gels, we were able to separate more than 30 restriction fragments in each lane with high resolution (Fig. 2).
9. Avoid using ethidium bromide in the gel, as it may cause uneven background during long runs. If needed, the gel can be stained with ethidium bromide solution (0.5 µg/ml) after the run is completed. It is recommended to rinse the gel with sterile double-distilled water before denaturing the DNA to remove ethidium bromide.
10. Though high voltage allows faster DNA migration, it also significantly decreases fragment resolution. Hence, low voltage is strongly recommended.
We usually run our gels at 50 V for 7–8 h. 11. If the DNA target size exceeds 5 kb, it may be necessary to incubate the gel in the 0.25 M HCl solution for 10–20 min to depurinate DNA before the denaturation step. Depurination denatures large DNA fragments without changing their location in the gel and increases the efficiency of subsequent DNA transfer to the membrane. 12. Using positively charged membranes significantly improves transfer efficiency. The application of nylon membranes has several advantages as compared to using nitrocellulose. They
have better durability and allow multiple rehybridizations. Handle the membrane as little as possible, and only with gloved hands or blunt-ended forceps. Prints left on the membrane surface may be visible after detection and interfere with the main signal.
13. It is important to have enough 20× SSC solution and dry paper towels to ensure efficient DNA transfer. We recommend staining the gel with ethidium bromide after transfer is completed to check the transfer efficiency. CAUTION: ethidium bromide is a toxic substance; wear gloves.
14. At this stage, a DNA blot can be stored, if needed. To do this, place the membrane between two sheets of Whatman paper and seal it in a plastic bag. The sealed DNA blot can be kept at 4°C for 1–2 weeks.
15. DIG-labeled probes provide high sensitivity and eliminate the inconveniences related to the use of radioactivity. The quality of DNA labeling can be checked on an agarose gel containing ethidium bromide. PCR products produced using DIG-11-dUTP run slower and appear less stained than unlabeled control PCR products (DIG User’s Manual, Roche). The probes are usually designed as nucleotide sequences complementary to the target sequence(s) that contain restriction sites. Thus, the probes are able to hybridize with one or more restriction fragments produced by endonuclease cleavage and reveal DNA sequence polymorphism at a given locus. For example, to analyze the stress-induced sequence polymorphism at the N-gene-like loci, we used a probe that had homology to the LRR region of the N-gene. This LRR region DNA sequence is frequently found in many other plant R-genes. The probe sequence was chosen such that it carried homology to multiple N-gene-like loci. Overall, 30 N-gene-like loci were targeted by the probe, allowing polymorphism analysis. In fact, increasing the number of gene loci targeted by the probe may be seen as a good strategy for reducing the total number of individuals in a plant population that need to be screened.
16. If detection yields high background, then the prehybridization time should be increased.
17. It is very important to prevent the membrane from drying during the prehybridization, hybridization, and detection procedures.
18. By lowering the salt concentration in the high stringency buffer and increasing the wash temperature, it is possible to reduce probe hybridization to sequences with low homology to the probe. The appropriate salt concentration and wash temperature
depend on the purpose of the experiment and must be determined experimentally.
19. The amount of blocking solution and antibody solution should be sufficient to cover the membrane completely. Exposing the membrane to air will cause drying and will drastically affect the quality of detection.
20. The time of incubation in the blocking and antibody solutions should be adjusted if high background is present. Similarly, the final dilution of the anti-DIG-AP conjugate can be changed to increase sensitivity or decrease background.
21. Exposure time depends on the intensity of the signal. If the signal is low, then films with high sensitivity, such as Kodak XAR films (Kodak), are recommended. Alternatively, if the signal is high, then Kodak MR films (Kodak) can be used to decrease the background and increase the resolution.
22. If RFLP analysis was performed using a methylation-sensitive restriction endonuclease, then the percentage of methylated cytosine residues at a given recognition site can be compared between samples. Samples that display a stronger signal (more bands or higher band intensity) in the part of the gel that corresponds to low molecular weight DNA have fewer methylated cytosine residues at enzyme recognition sites. Conversely, the presence of bands corresponding to high molecular weight DNA indicates that restriction digestion was prevented by extensive cytosine methylation at restriction sites. A difference in the intensity of selected bands can be quantified using image processing software such as ImageJ (National Institutes of Health, USA). Quantification results can be used to report a difference in the methylation pattern at a given locus (Fig. 3).
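The comparison described in Note 22 can be illustrated with a minimal Python sketch. It assumes that band sizes and integrated band intensities have already been exported from image analysis software; the function name, the 4,000 bp cutoff and the intensity values are hypothetical and serve only to show the arithmetic.

```python
def low_mw_fraction(bands, size_cutoff_bp):
    """Fraction of the total lane signal found in bands below the size cutoff.

    bands: list of (fragment size in bp, integrated band intensity) pairs.
    A larger fraction suggests more complete digestion, i.e. less cytosine
    methylation at the recognition sites of the methylation-sensitive enzyme.
    """
    total = sum(intensity for _, intensity in bands)
    if total <= 0:
        raise ValueError("lane has no measurable signal")
    low = sum(intensity for size, intensity in bands if size < size_cutoff_bp)
    return low / total


# Hypothetical lanes: (size in bp, intensity); values are illustrative only.
control_lane = [(8000, 500.0), (3000, 300.0), (900, 200.0)]
treated_lane = [(8000, 200.0), (3000, 300.0), (900, 500.0)]

control_fraction = low_mw_fraction(control_lane, size_cutoff_bp=4000)
treated_fraction = low_mw_fraction(treated_lane, size_cutoff_bp=4000)
print(control_fraction, treated_fraction)
```

In this made-up example, the treated lane carries a larger share of its signal in low molecular weight bands, which under the reasoning of Note 22 would point to lower methylation at the recognition sites. Normalising to the total lane signal is one simple way to reduce the effect of unequal loading; other normalisations may be more appropriate for a given experiment.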
Chapter 7
Isoschizomers and Amplified Fragment Length Polymorphism for the Detection of Specific Cytosine Methylation Changes
Leonor Ruiz-García, Jose Antonio Cabezas, Nuria de María, and María-Teresa Cervera
Abstract
Different molecular techniques have been developed to study either the global level of methylated cytosines or methylation at specific gene sequences. One of them is a modification of the Amplified Fragment Length Polymorphism (AFLP) technique that has been used to study methylation of anonymous CCGG sequences in different fungal, plant and animal species. The main variation of this technique is the use of isoschizomers with different methylation sensitivity (such as HpaII and MspI) as the frequent cutter restriction enzyme. For each sample, AFLP analysis is performed using both EcoRI/HpaII and EcoRI/MspI digested samples. Comparative analysis between EcoRI/HpaII and EcoRI/MspI fragment patterns allows the identification of two types of polymorphisms: (1) “Methylation-insensitive polymorphisms” that show common EcoRI/HpaII and EcoRI/MspI patterns but are detected as polymorphic amplified fragments among samples; and (2) “Methylation-sensitive polymorphisms” that are associated with amplified fragments differing in their presence or absence or in their intensity between EcoRI/HpaII and EcoRI/MspI patterns. This chapter describes a detailed protocol of this technique and discusses modifications that can be applied to adjust the technology to different species of interest.
Key words: AFLP-based technique, Isoschizomers, Cytosine methylation, Anonymous CCGG sites, Methylation pattern
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_7, © Springer Science + Business Media, LLC 2010
1. Introduction
Nuclear plant DNA is highly methylated, containing 5-methylcytosine. Methylation of cytosine residues occurs predominantly in symmetrical CG and CNG sequences (where N is any nucleotide) and provides a mechanism of gene control. Different techniques have been developed to study variations in DNA methylation of nuclear genomes. Some of these techniques are based on the use of restriction enzyme isoschizomers that recognize the same restriction site but display differential sensitivity to cytosine methylation. HpaII and MspI are tetracutter isoschizomers frequently used to detect anonymous CCGG sites, whose flanking sequences are unknown and whose cytosines may be differentially methylated. Both restriction enzymes recognize the sequence 5′-CCGG. However, HpaII is inactive if one or both cytosines are methylated on both strands, whereas MspI cleaves 5′-CmCGG but not 5′-mCCGG sequences.
Amplified Fragment Length Polymorphism (AFLP) is a polymerase chain reaction (PCR)-based technique that allows a fast analysis of a large number of marker fragments for any organism without prior knowledge of its genomic sequence. It is based on the selective amplification of anonymous DNA fragments obtained after digestion of total DNA with two restriction enzymes (a hexacutter and a tetracutter) and ligation of oligonucleotide adapters (1). The AFLP technique can be adapted to study cytosine methylation by using one of the restriction enzyme isoschizomers as the frequent cutter enzyme. Two parallel analyses are therefore carried out for each sample, using EcoRI and either HpaII or MspI for digestion. After ligation of EcoRI and HpaII/MspI adapters, DNA fragments are subjected to two successive PCR amplification steps, pre-amplification and selective amplification. In both amplifications, EcoRI and HpaII/MspI primers, which comprise the adapter sequence, the restriction site sequence and several selective nucleotides, are used to amplify EcoRI/HpaII or EcoRI/MspI DNA fragments. The use of two PCR steps reduces the complexity of the DNA fragment population to a number of fragments that can be visualised and scored after separation on denaturing polyacrylamide gels (Fig. 1) or by capillary electrophoresis (see Note 1).
Different labels may be used to detect AFLP fragments. In most cases, only one of the primers (the EcoRI primer) is 5′ labelled and used in the selective amplification (the second PCR reaction). In this chapter, we describe a detailed protocol that can be applied to any DNA fragment detection system (see Note 1).
The final step of the analysis is scoring of DNA fragment profiles. Comparative analysis between EcoRI/HpaII and EcoRI/MspI AFLP fragment patterns reveals variability associated with “Methylation-insensitive polymorphisms” and “Methylation-sensitive polymorphisms”. Methylation-insensitive polymorphisms are associated with genetic variability: they show the same EcoRI/HpaII and EcoRI/MspI pattern within a sample but are polymorphic among samples. Methylation-sensitive polymorphisms are associated with epigenetic variability and are detected as amplified fragments differing in their presence or absence, or in their intensity, between the EcoRI/HpaII and EcoRI/MspI patterns of the same sample (Fig. 2). Thus, methylation of the internal cytosine would lead to the appearance of amplified fragments in EcoRI/MspI but not in EcoRI/HpaII profiles. Conversely, hemimethylation of the CCGG site, in which the external cytosine is methylated in only one strand, would lead to the appearance of fragments in the EcoRI/HpaII but not in the EcoRI/MspI profile.
The AFLP technique was initially modified by Reyna-López et al. (2) to analyse fungal DNA methylation and later adapted to study genome methylation in different plant species (3, 4). Currently, AFLP is broadly used to analyse DNA methylation in different plant genomes. This technique can be used to analyse cytosine methylation of plant species with genome sizes varying by more than 80-fold (see Note 2).
Fig. 1. Schematic representation of the methylation AFLP technique. (1) Restriction of the genomic DNA with EcoRI/HpaII or EcoRI/MspI. (2) Ligation of EcoRI and HpaII/MspI double-stranded adapters to the ends of the restriction fragments. (3) Amplification of a subset of restriction fragments using two primers complementary to the adapters, with one selective nucleotide (pre-amplification) and +2/+3 selective nucleotides (selective amplification) at their 3′ ends (N). N stands for A, T, C or G residues. The underlined selective nucleotide is common in pre-amplification and selective amplifications. The arrows indicate the direction of DNA polymerisation; * labelled primers. (4) Gel electrophoresis of the amplified restriction fragments and visualisation and scoring of the DNA fingerprint
[Fig. 1, panel 1 labels: HpaII does not cleave when both strands are methylated (5′-mCCGG or 5′-CmCGG) but cleaves hemimethylated DNA; MspI cleaves 5′-CmCGG (methylation of the internal C) but does not cleave 5′-mCCGG (methylation of the external C).]
Fig. 2. Detail of the AFLP fingerprint pattern of ten Arabidopsis ecotypes (5). The DNA fingerprint was generated with the primer combination EcoRI+AT/(HpaII-MspI)+ACT. The arrows labelled MeI correspond to methylation-insensitive polymorphisms, while MeS1 and MeS2 correspond to methylation-sensitive fragments found with both isoschizomers or with only one of them, respectively
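The cleavage rules given in the Introduction can be summarised as a small lookup table. The following Python sketch is only an illustration of those rules; the state names and the function are hypothetical labels introduced here, and only the four methylation states explicitly discussed in the text are covered.

```python
# True means the enzyme cuts the CCGG site, so an EcoRI/enzyme fragment
# ending at that site can be amplified. State names are hypothetical labels.
CLEAVAGE = {
    "unmethylated":                  {"HpaII": True,  "MspI": True},
    "internal_C_methylated":         {"HpaII": False, "MspI": True},
    "external_C_hemimethylated":     {"HpaII": True,  "MspI": False},
    "external_C_methylated":         {"HpaII": False, "MspI": False},
}


def expected_pattern(state):
    """Return the expected (EcoRI/HpaII, EcoRI/MspI) band scores as 0/1."""
    rule = CLEAVAGE[state]
    return int(rule["HpaII"]), int(rule["MspI"])


for state in CLEAVAGE:
    print(state, expected_pattern(state))
```

A fragment scored (0, 1) is thus consistent with methylation of the internal cytosine, and a fragment scored (1, 0) with hemimethylation of the external cytosine. A (0, 0) score cannot be assigned to methylation on this basis alone, since sequence variation at the site also abolishes the fragment.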
2. Materials
2.1. Equipment and Supplies
1. 1.5 mL Eppendorf tubes.
2. PCR tubes or plates.
3. A thermocycler (PCR machine).
4. An agarose gel electrophoresis system.
5. Automated sequencers (plate or capillary) (see Note 1), or
6. A sequencing gel electrophoresis system (e.g. Sequi-Gen GT Sequencing System, BioRad) and gel dryer.
7. High voltage power supply (e.g. BioRad PowerPac 3000).
8. X-ray films or a phosphorimaging device, if using manual systems (see Note 1).
2.2. Buffers and Reagents
1. 10× HpaII restriction buffer: 100 mM Bis Tris Propane-HCl, 100 mM MgCl2, 10 mM DTT, pH 7.0 (Buffer 1, New England Biolabs).
2. Digestion and ligation buffer (10× RL Buffer): 100 mM Tris-HAc, 100 mM MgAc, 500 mM KAc, 50 mM DTT, 500 ng/µL BSA, pH 7.5.
3. 10× PCR buffer: 100 mM Tris–HCl, pH 8.3, 25 mM MgCl2, 500 mM KCl.
4. Formamide buffer: 98% formamide (deionized and filtered), 10 mM EDTA, pH 8.0, 0.025% xylene cyanol.
5. Restriction enzymes: EcoRI, HpaII and MspI (New England Biolabs).
6. A double-stranded EcoRI-adapter (5 pmol/µL) (see Note 3). It is made of two oligonucleotides: 5′-CTCGTAGACTGCGTACC and 5′-AATTGGTACGCAGTC.
7. A double-stranded HpaII/MspI-adapter (50 pmol/µL) (5) (see Note 3). It consists of the combination of oligonucleotides 5′-GACGATGAGTCTCGAT and 5′-CGATCGAGACTCAT.
8. ATP 10 mM (Boehringer) (see Note 4).
9. ATP polynucleotide ligase (USB) (see Note 5).
10. EcoRI primer +1 (50 ng/µL): 5′-GACTGCGTACCAATTCN
11. EcoRI primer +3 (12 ng/µL): 5′-GACTGCGTACCAATTCNNN
12. HpaII/MspI primer +1 (50 ng/µL): 5′-GATGAGTCTCGATCGGN
13. HpaII/MspI primer +3 (50 ng/µL): 5′-GATGAGTCTCGATCGGNNN
14. dNTPs 10 mM (a mix of dATP, dTTP, dCTP and dGTP).
15. Taq DNA polymerase (5 U/µL).
16. Agarose gels: 0.8% agarose, 1× TBE, 0.5 µg/mL ethidium bromide (CAUTION: ethidium bromide is a mutagenic reagent; nitrile gloves and a laboratory coat should be worn when it is being handled).
17. Denaturing polyacrylamide gels are made of 8% Long Ranger polyacrylamide gel solution (Cambrex Bio Science Rockland), 7.0 M urea and 0.65× TBE. A total volume of 24 mL gel solution is used to prepare a 25 cm × 25 cm plate gel with 0.25 mm thick spacers (see Note 6). CAUTION: Polyacrylamide solution is carcinogenic, mutagenic, teratogenic and neurotoxic, and the use of nitrile gloves is required.
18. TBE 10× (pH 8): 1 M Tris base, 1 M Boric acid, 0.5 M EDTA (pH 8.0) (see Note 7).
19. N,N,N′,N′-Tetramethylethylenediamine (TEMED): for 24 mL of gel, add 15 µL TEMED.
20. Ammonium persulphate (APS; 100 mg/mL): for 24 mL of gel, add 150 µL APS (see Note 8).
21. A DNA ladder (see Note 9).
3. Methods
Just as in other AFLP-based technologies, the protocol consists of four major steps: digestion of the genomic DNA and ligation of adapters; pre-amplification of digested–ligated fragments; selective amplification of pre-amplified fragments; and fragment detection and scoring.
3.1. Digestion–Ligation
This step involves two digestions of genomic DNA with two different restriction enzymes and ligation of double-stranded AFLP adapters to the sticky ends generated (Fig. 1). The adapter and restriction site sequences will serve as primer binding sites in the subsequent amplification steps. In this step, complete DNA digestion is crucial to prevent later amplification of uncut fragments. Complete digestion is achieved by the use of high-quality DNA and an excess of restriction enzyme.
3.1.1. Digestion
Two different AFLP analyses have to be performed for each sample, using EcoRI and either HpaII or MspI for digestion. The EcoRI and HpaII digestions cannot be performed simultaneously, since each restriction enzyme has a different restriction buffer requirement. Thus, 250–500 ng genomic DNA (see Note 2) is first incubated in a final volume of 25 µL with 6 U HpaII and Buffer 1 (according to New England Biolabs recommendation) for 2 h at 37°C.
After digestion, DNA is precipitated by adding 0.1 volume of sodium acetate (3 M NaOAc, pH 5.2) and 2.5 volumes of ethanol and incubating at −20°C for 1 h. After precipitation, the pellet is dried at room temperature for 3 min and resuspended in 24 µL dH2O. The resuspended DNA is digested with 10 U EcoRI in 35 µL of 1× RL Buffer for 2 h at 37°C. For EcoRI/MspI DNA digestion, both restriction enzymes can be used together. The reaction is carried out in a final volume of 35 µL with 1× RL Buffer, 10 U EcoRI, 6 U MspI and 250–500 ng of genomic DNA for 3 h at 37°C.
3.1.2. Ligation
Two different adapters, one for the EcoRI sticky ends and one for the HpaII/MspI sticky ends, are ligated to DNA fragments after digestions by adding to each final digestion 5 µL of a mix containing 5 pmol EcoRI adapter, 50 pmol HpaII/MspI adapter, 8 mM ATP, 10× RL Buffer and 1.2 U T4 DNA ligase (see Note 5). The ligation is incubated for 3 h at 37°C and then overnight at 4°C (see Note 10).
3.1.3. Digested–Ligated DNA Fragments (DL-DNA Fragments) are Diluted Fivefold with Sterile dH2O and Stored at −20°C
DNA digestion generates thousands of fragments. The complexity of this fragment population is reduced by two successive PCR reactions using primers with an increasing number of selective nucleotides at their 3′ end, in order to accurately visualise a single subset of DL-DNA fragments at the end of each analysis. The use of one, two or three selective nucleotides at the 3′ end of one of the primers (i.e. EcoRI+1, EcoRI+2 or EcoRI+3, respectively) reduces the number of amplified fragments by factors of 4, 16 and 64, respectively. The use of a level of selection +2/+2 (i.e. EcoRI+AC/HpaII+CG) will decrease the number of amplified fragments to 1/256. The first PCR reaction is named pre-amplification. It is performed using a single selective nucleotide at the 3′ end of both EcoRI and HpaII/MspI primers. The second PCR, or selective amplification, is carried out using more than one selective nucleotide at the 3′ end of both EcoRI and HpaII/MspI primers. The number of selective nucleotides depends on the genome size (see Note 2).
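The reduction factors quoted above follow from each selective nucleotide retaining roughly one fragment in four. A minimal Python sketch of this arithmetic is given below; it assumes equal frequencies of the four bases next to the restriction sites, which is an idealisation, and the function name is introduced here for illustration only.

```python
from fractions import Fraction


def selected_fraction(n_ecori, n_hpaii_mspi):
    """Expected fraction of DL-DNA fragments amplified with the given
    numbers of selective nucleotides on the EcoRI and HpaII/MspI primers,
    assuming all four bases are equally likely at each selective position."""
    return Fraction(1, 4 ** (n_ecori + n_hpaii_mspi))


print(selected_fraction(1, 0))   # one selective nucleotide on one primer
print(selected_fraction(2, 2))   # +2/+2 selection
print(selected_fraction(1, 1))   # +1/+1 pre-amplification

# Additional reduction achieved by a +2/+3 selective amplification
# relative to the +1/+1 pre-amplification that precedes it.
relative = selected_fraction(2, 3) / selected_fraction(1, 1)
print(relative)
```

Real genomes deviate from equal base frequencies, so the observed number of fragments for a given primer combination can differ noticeably from this estimate.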
3.2. Pre-amplification
Pre-amplification consists of a PCR reaction using primers which are complementary to the EcoRI and HpaII/MspI adapters with an additional selective 3′ nucleotide (e.g. EcoRI +A and HpaII/MspI +C), thus selecting 1/16 of DL-DNA fragments. The PCR reactions are performed in a 20 µL volume of 1× PCR buffer, 0.2 mM of each dNTP, 30 ng of each primer EcoRI +1 and HpaII/MspI +1, 0.4 U Taq DNA polymerase and 3 µL of diluted DL-DNA fragments (see Note 11). PCR amplifications are carried out in a Perkin Elmer 9700 thermocycler using 16–28 cycles (see Note 2), each cycle consisting of 30 s at 94°C, 1 min at 60°C, and 1 min at 72°C.
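Because several components of the pre-amplification reaction are added in sub-microlitre amounts, the reaction is normally assembled as a master mix (see Note 11). The following Python sketch derives master-mix volumes from the final amounts stated above and the stock concentrations listed in Subheading 2.2. It assumes that the 10 mM dNTP stock contains 10 mM of each dNTP and uses a 10% pipetting overage; both are assumptions made for this illustration, as is the function name.

```python
def preamp_master_mix(n_reactions, overage=0.10):
    """Return master-mix volumes (uL) for n pre-amplification reactions.

    The template (3 uL of diluted DL-DNA per reaction) is added to each
    tube separately and is therefore not part of the master mix.
    """
    reaction_volume = 20.0
    template_volume = 3.0
    per_reaction = {
        "10x PCR buffer": reaction_volume / 10,                # 1x final
        "dNTP mix (10 mM each)": reaction_volume * 0.2 / 10,   # 0.2 mM each
        "EcoRI+1 primer (50 ng/uL)": 30 / 50,                  # 30 ng
        "HpaII/MspI+1 primer (50 ng/uL)": 30 / 50,             # 30 ng
        "Taq polymerase (5 U/uL)": 0.4 / 5,                    # 0.4 U
    }
    # Water makes up the remainder of the 20 uL reaction.
    per_reaction["dH2O"] = (
        reaction_volume - template_volume - sum(per_reaction.values())
    )
    scale = n_reactions * (1 + overage)
    return {name: round(vol * scale, 2) for name, vol in per_reaction.items()}


mix = preamp_master_mix(10)
for name, volume in mix.items():
    print(f"{name}: {volume} uL")
```

With these assumptions, 17 µL of master mix is dispensed per tube before adding 3 µL of template. The Taq polymerase volume per single reaction (0.08 µL) shows why preparing the mix for at least ten samples is advisable.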
Fig. 3. Evaluation of pre-amplifications on 0.8% agarose gels. To equalise concentrations, 200 µL dH2O was added to samples 4–6, 8–11, and 13–14; 160 µL to samples 2, 3, 7, and 12; and 100 µL to sample 1
To verify the efficiency of pre-amplification, 2 µL of the final products are electrophoresed on a 0.8% agarose gel in a short run (10–15 min) to compare intensities among amplified samples visually (Fig. 3). If longer runs are performed, smears will be too faint, thus hampering accurate comparisons. Pre-amplified DNA fragments (PR DNA fragments) are diluted five- to tenfold with dH2O to approximately equal concentrations, depending on the intensity of the smears visualised in the agarose gels. Diluted pre-amplifications can be stored at −20°C for more than 1 year.
3.3. Selective Amplification
Selective amplification consists of a PCR reaction using primers which are complementary to the EcoRI and HpaII/MspI adapters with two (or three) and three selective nucleotides at their 3′ ends, respectively, thus selecting 1/64 or 1/256 (if EcoRI primers with three selective nucleotides are used) of diluted pre-amplified fragments (see Note 2). It is important to point out that the selective nucleotides used in the pre-amplification have to be maintained in the selective amplification (Fig. 1). For the selective amplification, only EcoRI primers are fluorescently labelled or radiolabelled, depending on the detection method used (see Notes 1 and 12). The selective PCR reaction is performed in a 10 µL volume of 1× PCR Buffer, 0.1 mM of each dNTP, 6 ng IR800-EcoRI primer (see Note 11), 15 ng HpaII/MspI primer, 0.2 U Taq DNA polymerase, and 2.5 µL of diluted PR DNA fragments. The PCR is carried out using classical AFLP cycling parameters (1): 1 cycle of 30 s at 94°C, 30 s at 65°C, 1 min at 72°C followed by 12 cycles in which the annealing temperature decreases 0.7°C per cycle, followed by 23 cycles of 1 min at 94°C, 30 s at 56°C, and 1 min at 72°C (see Note 13).
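The touchdown profile described above can be written out cycle by cycle, which is convenient when programming a thermocycler. The short Python sketch below lists only the annealing temperatures; the function name and its parameters are illustrative, with default values taken from the cycling parameters in the text.

```python
def touchdown_annealing_temps(start=65.0, step=0.7, touchdown_cycles=12,
                              final=56.0, final_cycles=23):
    """Annealing temperature (deg C) for each cycle of the selective PCR:
    one cycle at the start temperature, then a stepwise decrease per cycle,
    then a fixed number of cycles at the final temperature."""
    temps = [start]
    temps += [round(start - step * i, 1) for i in range(1, touchdown_cycles + 1)]
    temps += [final] * final_cycles
    return temps


temps = touchdown_annealing_temps()
print(len(temps), "cycles in total")
print("touchdown phase:", temps[:13])
```

The touchdown phase ends at 56.6°C, just above the 56°C used for the remaining 23 cycles, giving 36 cycles in total.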
3.4. Fragment Detection and Score
The final step of the AFLP technique is separation and visualisation of amplified fragments followed by data interpretation. AFLP-PCR products can be separated and scored using a variety of systems (see Note 1). Polyacrylamide gel electrophoresis
(conventional or automated sequencers) and capillary electrophoresis provide maximum resolution of AFLP banding patterns. At the end of the selective PCR, samples are denatured by adding an equal volume of formamide buffer, heating for 2 min at 94°C, and then quickly cooling on ice. Before loading samples, polyacrylamide gels have to be pre-run for 15 min to warm up the gel, using the same settings as for the run. These settings depend on the size and thickness of the gel and the electrophoresis system used. With a Li-Cor 4300 DNA Analysis System, these settings are: 1,500 V, 35 W, 35 mA, and 45°C. After pre-running the gel, remove any urea precipitate or pieces of gel from the wells with a syringe before loading. A total of 0.8 µL or 2 µL of each sample is loaded on Li-Cor or conventional polyacrylamide gels, respectively. Fragments are scored visually or using AFLP scoring software (3) as 1 when the fragment is present or 0 when it is absent. A progressive fragment appearance or disappearance can also be illustrated in a table indicating the number and percentage of methylation-sensitive fragments showing a specific pattern. Although AFLP was not initially developed as a quantitative technique, methylation-sensitive fragments showing different intensities are usually observed (Fig. 2).
3.5. Data Interpretation
The analysis allows the identification of methylation-insensitive polymorphisms, amplified fragments that show similar digestibility in HpaII and MspI assays but differ in their presence or absence among different samples (MeI in Fig. 2). The previous characterisation of fragments detected by this technique (5) revealed that most of the visualised fragments (all of them of small size, ranging from 100 to 700 nt) appear to be generated by the lack of cytosine methylation. Thus, the presence of a fragment is associated with the existence of a non-methylated CCGG restriction site, while its absence could be due to the variation of its nucleotide sequence. Methylation-sensitive fragments found when comparing EcoRI/ HpaII and EcoRI/MspI patterns are associated with differences in the methylation state of the CCGG restriction sites (MeS in Fig. 2). In order to correctly interpret the AFLP profiles, it is important to point out that this technique does not allow us to distinguish non-methylated CCGG sequences from fully methylated (mCmCGG) sequences.
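The scoring logic described in this section can be expressed as a simple classification of each fragment from its 0/1 scores. The Python sketch below is one possible way to do this, not a published scoring algorithm; the function name, the category labels and the example scores are hypothetical, and intensity differences are not considered.

```python
def classify_fragment(scores):
    """Classify one AFLP fragment from its presence/absence scores.

    scores: list of (EcoRI/HpaII score, EcoRI/MspI score) pairs, one pair
    per sample, each score being 1 (present) or 0 (absent).
    """
    # Any difference between the two isoschizomer profiles of a sample
    # points to a methylation-sensitive fragment.
    if any(hpaii != mspi for hpaii, mspi in scores):
        return "methylation-sensitive"
    # Same result with both enzymes in every sample, but variable among
    # samples: methylation-insensitive polymorphism.
    if len({hpaii for hpaii, _ in scores}) > 1:
        return "methylation-insensitive polymorphism"
    return "monomorphic"


# Hypothetical scores for three fragments across four samples.
fragments = {
    "frag_A": [(1, 1), (0, 0), (1, 1), (0, 0)],
    "frag_B": [(1, 1), (0, 1), (1, 1), (1, 1)],
    "frag_C": [(1, 1), (1, 1), (1, 1), (1, 1)],
}
classes = {name: classify_fragment(s) for name, s in fragments.items()}
print(classes)
```

Counting the fragments in each class across primer combinations gives the numbers and percentages that are typically reported in a summary table.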
4. Notes
1. Different detection systems can be selected to visualise AFLP fragments, including systems that avoid the use of isotopes or silver staining. These detection systems differ in: (a) the use of labelled or non-labelled primers (non-labelled primers require a method such as silver staining). When using labelled primers, different chemistries can be used to label EcoRI primers, including radioactive isotopes or a fluorescent dye (such as IRD 700 or IRD 800 from LI-COR; FAM, HEX, ROX, TAMRA and TET from Applied Biosystems; Cy from Amersham Biosciences; Yakima Yellow from Epoch Biosciences) to visualise amplified products using different automatic fragment analysers; (b) the separation support, such as the gel or capillary instrument system. It is important to point out that if radioactive AFLP is carried out, 33P-labelled primers provide better resolution of amplified products than 32P-labelled primers. After completion of electrophoresis, radioactive gels can be directly dried without fixation and exposed to X-ray film for 24–72 h at room temperature.
2. Several parameters have been adjusted to use AFLP to analyse different plant species with genome sizes ranging from 0.50 to 40 pg/2C (6): (a) the amount of DNA, ranging from 250 ng for small genomes such as Arabidopsis to 500 ng for large genomes such as conifers (170-fold the Arabidopsis genome); (b) the number of cycles used in pre-amplification, ranging from 16 for small genomes to 28 for large genomes; and (c) the number of selective nucleotides used in both PCR steps. The protocol for selective amplification ranges from EcoRI+2/(HpaII/MspI)+3 (genome sizes smaller than 0.60 pg/2C), 2 EcoRI+3/(HpaII/MspI)+3 (genome sizes between 0.60 and 1.00 pg/2C), EcoRI+3/(HpaII/MspI)+3 (genome sizes over 1.00 pg/2C), always preceded by an EcoRI+1/(HpaII/MspI)+1 pre-amplification. For large genomes (i.e. a conifer genome containing from 20 to 38 pg/2C), if an EcoRI+3/(HpaII/MspI)+3 primer combination yields a complex pattern, an additional selective nucleotide should be used, giving an EcoRI+3/(HpaII/MspI)+4 primer combination; in this case, selective amplification has to be preceded by an EcoRI+1/(HpaII/MspI)+2 pre-amplification.
3. Double-stranded EcoRI and HpaII/MspI adapters are made of oligonucleotides of 17 and 15, and 14 and 16 nucleotides, respectively. The first time that the oligonucleotides of each adapter are mixed, they should be heated at 65°C for 5 min to denature them and then allowed to cool slowly so that the two strands of each adapter anneal completely. Adapters can be stored at −20°C. As non-phosphorylated adapters are used, a single strand of each adapter is ligated to DNA. The recessed 3′ ends of the template are filled in by the Taq polymerase in the presence of dNTPs during the first cycle.
4. 10 mM ATP aliquots must be prepared and stored at −20°C. Do not re-freeze the unused remainder of a thawed aliquot.
5. Highly concentrated ATP polynucleotide ligase (>6 U/µL) has to be used so that only small volumes are added to the ligation mix.
6. The AFLP reaction products are analysed on 4.5% denaturing polyacrylamide gels or 6–8% Long Ranger gels. The detection of radiolabelled products is performed using conventional gel electrophoresis systems and 4.5% denaturing polyacrylamide gels (acrylamide/bisacrylamide: 19:1) containing 7.5 M urea and 1× TBE. If a LI-COR automated DNA sequencer is used, 6–8% Long Ranger gels containing 7.0 M urea and 0.65–0.8× TBE are prepared. Once the urea is dissolved, the solution is filtered and kept at 4°C in the dark. Gels should be cast at least 2 h before use to ensure sufficient time for gel polymerisation and may be stored for 24 h at 4°C.
7. 10× TBE: Dissolve 108 g Tris base, 55 g boric acid and 40 mL EDTA (pH 8.0) in 700 mL distilled water, stir to dissolve, and finally add distilled water to bring the total volume up to 1 L. If only dry ingredients are used, boric acid should be added last, after the EDTA is dissolved.
8. To prepare a 100 mg/mL APS solution, it is important to be sure that the APS powder is dry. APS solutions are not stable at room temperature and should be stored at 4°C or at −20°C for a maximum of one week.
9. Different commercial DNA ladders may be used for AFLP analysis: IRD-labelled Li-Cor ladders (Li-Cor), ABI size standards (Applied Biosystems), labelled SEQUAMARK™ 10 bp standard (Research Genetics), labelled 30–330 bp DNA Ladder (Life Technologies), and labelled 100-bp Ladder (Gibco Life Technologies). Home-made DNA ladders made of a combination of labelled DNA fragments of known sizes may also be used.
10. The adapter design avoids reconstruction of the restriction sites. Thus, the presence of restriction enzymes in the ligation step results in almost complete adapter/fragment ligation, since fragment concatamers that may be generated by ligation are cut again.
11. AFLP reaction mixes should be prepared for a minimum of ten different DNA samples to minimise discrepancies due to inaccurate pipetting of small volumes.
12. The mobility of the two strands of a DNA fragment is slightly different. Since only one of the two primers is labelled, AFLP profiles should only be compared when the same primer has been labelled.
13. Starting the PCR at a very high annealing temperature allows optimal primer selectivity. Gradually decreasing the annealing temperature in the following cycles then increases the efficiency of primer binding.
References
1. Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T, Hornes M et al (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res 23:4407–4414
2. Reyna-López GE, Simpson J, Ruiz-Herrera J (1997) Differences in DNA methylation patterns are detectable during the dimorphic transition of fungi by amplification of restriction polymorphisms. Mol Gen Genet 253:703–710
3. Meudt HM, Clarke AC (2007) Almost forgotten or latest practice? AFLP applications, analyses and advances. Trends Plant Sci 12:106–117
4. Weising K, Nybom H, Wolff K, Kahl G (2005) DNA fingerprinting in plants: principles, methods and applications, 2nd edn. CRC Press, London, pp 66–68
5. Cervera MT, Ruiz-García L, Martínez-Zapater JM (2002) Analysis of DNA methylation-sensitive AFLP markers. Mol Genet Genomics 268:543–552
6. Cervera MT, Remington D, Frigerio JM, Storme V, Ivens B, Boerjan W et al (2000) Improved AFLP analysis of tree species. Can J For Res 30:1608–1616
Chapter 8
Analysis of Small RNA Populations Using Hybridization to DNA Tiling Arrays
Martine Boccara, Alexis Sarazin, Bernard Billoud, Agnes Bulski, Louise Chapell, David Baulcombe, and Vincent Colot
Abstract
Small RNA (sRNA) populations extracted from Arabidopsis plants that had or had not been subjected to biotic stress were reverse-transcribed into cDNAs, which were then labelled and hybridized to a custom-made DNA tiling array covering Arabidopsis chromosome 4. We first designed a control experiment with eight cDNA clones corresponding to sequences located on chromosome 4 and obtained robust and specific hybridization signals. Furthermore, hybridization signals along chromosome 4 were in good agreement with sRNA abundance as previously determined by Massively Parallel Signature Sequencing (MPSS) in the case of untreated plants, but differed substantially after stress treatment. These results demonstrate the utility of hybridization to DNA tiling arrays to detect major changes in small RNA populations.
Key words: Small RNA, cDNA libraries, cy-dye indirect labelling, Hypersensitive response, Microarray, Harpin
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_8, © Springer Science + Business Media, LLC 2010
1. Introduction
There are two predominant classes of small RNAs produced in plants (1, 2). The vast majority of sequenced sRNAs are 24-nucleotide (nt) siRNAs (short interfering RNAs) that correspond to transposable elements and other repeated sequences. These siRNAs are presumed to direct DNA methylation and histone modifications over repeated sequences of the genome. The other abundant class of small RNAs corresponds to microRNAs (miRNAs). MiRNAs are 21 nt long; they are involved in several developmental processes and, in some cases, accumulate in response to various biotic and abiotic stresses (3–6).
We are interested in the study of sRNA populations during the plant hypersensitive response (HR), a form of programmed cell death that occurs at the site of infection when plants are challenged by pathogens (7). We used the harpin protein from Erwinia amylovora, an elicitor of HR in several plant species (8), to infiltrate Arabidopsis thaliana leaves. Small RNAs were extracted from these leaves as well as from control leaves infiltrated with buffer to produce two cDNA libraries (9–11). The cloning procedure implies that these sRNAs carry a 5′ phosphate (5′-P) and are not degradation products of conventional ribonucleases, which release 5′-OH ends. The sRNAs were ligated sequentially to 5′ and 3′ RNA/DNA chimeric oligonucleotide adapters with T4 RNA ligase and reverse transcribed.
Here, we describe a method to label such cDNA libraries and to hybridize them to a custom-made DNA tiling array covering Arabidopsis chromosome 4 (12, 13). PCR amplification and purification are first required to obtain cDNAs. The labelling reaction can be divided into two steps: the first step involves the incorporation of amino-allyl modified deoxynucleotide (AA-dUTP) into PCR-amplified cDNAs of sRNAs; the second step is the chemical coupling of amine-reactive Cy-Dye. Although this procedure is longer and more labour-intensive than direct labelling, Cy3 or Cy5 is incorporated more evenly, and more Cy-Dye is incorporated into the DNA. Results are presented that demonstrate the validity of our method to characterise small RNA populations and to identify major differences in sRNA abundance between populations.
2. Materials
2.1. RNA Extraction and sRNA Isolation (see Note 1)
1. Trizol (Invitrogen) or Tri-reagent (Sigma-Aldrich), corresponding to 4 M guanidinium isothiocyanate and acidic phenol (pH 4.3). CAUTION: phenol is toxic and corrosive.
2. Chloroform. CAUTION: chloroform is toxic and a suspected carcinogen.
3. Isopropanol.
4. 75% Ethanol.
5. 15% denaturing polyacrylamide/urea gel mix: 21 g urea (7 M), 2.5 mL 10× TBE (0.5×), 18.75 mL 40% 19:1 acrylamide:bis-acrylamide (15%); make up to 50 mL with MQH2O (see Note 2). CAUTION: acrylamide monomer is a neurotoxin and a potential carcinogen. Wear gloves while handling acrylamide and clean any spillage thoroughly.
6. 10% ammonium persulfate (freshly prepared).
7. TEMED.
8. Ethidium bromide (10 mg/mL stock), diluted 10,000× in 1× TBE buffer for gel staining. CAUTION: ethidium bromide is an intercalating agent, a mutagen and thought to be carcinogenic. Handle with care, wearing nitrile gloves.
9. Formamide mix: 0.05% bromophenol blue, 0.05% xylene cyanol in formamide.
10. 10× TBE: 890 mM Tris, 890 mM boric acid, 20 mM EDTA.
11. Oligonucleotide markers: 20 and 30 nucleotides (nt) in size.
12. 0.3 M NaCl.
13. Phenol–chloroform (no buffer added, to keep the pH acidic) (Sigma).
14. Absolute ethanol.
2.2. Adapter Ligation and Reverse Transcription
1. Chimeric DNA/RNA oligonucleotide adapters: 5′ adapter: ACGGAATTCCTCACTaaa and 3′ adapter: uuuCTATCCATGGACTGTidT (idT: inverted deoxythymidine; lower case letters are RNA). The 3′ adapter is 5′ phosphorylated.
2. 50% Dimethyl sulfoxide (DMSO).
3. 10× PAN ligation buffer: 0.5 M Tris–HCl pH 7.6, 0.1 M MgCl2, 0.1 M β-mercaptoethanol, 2 mM ATP, 1 mg/mL acetylated BSA.
4. Acetylated BSA (Sigma).
5. T4 RNA ligase (Roche).
6. Primers for reverse transcriptase and first PCR: Forward primer: 5′ CAG CCA ACG GAA TTC CTC ACT AAA 3′; Reverse primer: 5′ CGA ACA TGT ACA GTC CAT GGA TAG 3′.
7. 100 mM dNTP Set (Promega).
8. RT mix: 20 µL 0.1 M DTT, 40 µL 5× first strand buffer (both supplied with SuperScript II Reverse Transcriptase), 56 µL 2 mM dNTPs.
9. SuperScript II Reverse Transcriptase (200 U/µL) (Invitrogen).
10. Alkali mix: 150 mM KOH, 20 mM Tris-base.
2.3. Amplification, Labelling, and Hybridization of cDNAs to Microarray
1. Taq polymerase (New England Biolabs).
2. 100 mM dNTP Set (Promega).
3. 20 bp low ladder 40 µg (Sigma).
4. 15% native polyacrylamide gel: 2.5 mL 10× TBE (0.5×), 18.75 mL 40% 19:1 acrylamide:bis-acrylamide, 28.4 mL MQH2O.
5. TE buffer pH 7.5.
6. Primers for labelling: sRNArev TGTACAGTCCATGGATA and sRNAfor ACGGAATTCCTCACTAA.
7. (3-aminoallyl)-2′-deoxyuridine-5′-triphosphate (AA-dUTP) (Sigma). For a final concentration of 20 mM, add 95.5 µL of TE pH 7.5 to a stock vial containing 1 mg of aa-dUTP. Gently vortex to mix and store at −20°C.
8. Labelling Mix (25×), dNTP (minus dTTP) with aa-dUTP: 2 µL dATP (final concentration, 10 mM), 2 µL dCTP (final concentration, 10 mM), 2 µL dGTP (final concentration, 10 mM) and 10 µL aa-dUTP (final concentration, 10 mM); make up to 20 µL with RNase-free H2O and store at −20°C.
9. QIAquick Nucleotide Removal Kit (Qiagen).
10. Sodium bicarbonate buffer (NaHCO3): 0.05 M, pH 9.0.
11. Cy-dye esters (Amersham-GE) (see Note 3).
12. 0.3 M sodium acetate pH 5.2.
13. Acrylamide (2.5 µg/µL).
14. Yeast RNA (10 mg/mL in RNase-free H2O) (Invitrogen).
15. 20× SSC (Sigma).
16. 10 or 20% SDS solution.
17. Formamide (Sigma).
18. Bovine serum albumin (BSA) 10% (filter the solution before use and store at −20°C).
19. Pre-hybridization solution: 1× SSC, 0.1% SDS, 1% BSA.
20. 2× Hybridization buffer: 50% formamide, 10× SSC, 0.2% SDS.
21. 22 × 60 mm Lifterslips (Electron Microscopy Sciences).
22. Corning® hybridization chambers (Sigma).
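The stock and working concentrations quoted for the aa-dUTP stock and the Labelling Mix (25×) follow from simple dilution arithmetic, which can be sketched as below. The 100 mM dNTP stock concentration is taken from the "100 mM dNTP Set" entry. The function names are illustrative only, and the molar mass printed by the sketch is merely the value implied by the quoted volumes, not a supplier specification.

```python
# Dilution arithmetic for the aa-dUTP stock and the 25x Labelling Mix.
# Function names are hypothetical helpers, not part of any library.

def implied_molar_mass(mass_mg, conc_mM, volume_uL):
    """Molar mass (g/mol) implied by dissolving mass_mg in volume_uL at conc_mM."""
    return mass_mg / (conc_mM * volume_uL) * 1e6


def final_conc_mM(stock_mM, added_uL, total_uL):
    """C1*V1 = C2*V2, solved for the final concentration."""
    return stock_mM * added_uL / total_uL


MIX_TOTAL_UL = 20.0
# (stock concentration in mM, volume added in uL)
labelling_mix = {
    "dATP": (100.0, 2.0),
    "dCTP": (100.0, 2.0),
    "dGTP": (100.0, 2.0),
    "aa-dUTP": (20.0, 10.0),
}

mix_conc = {name: final_conc_mM(stock, vol, MIX_TOTAL_UL)
            for name, (stock, vol) in labelling_mix.items()}
water_uL = MIX_TOTAL_UL - sum(vol for _, vol in labelling_mix.values())

if __name__ == "__main__":
    print(f"implied aa-dUTP molar mass: {implied_molar_mass(1.0, 20.0, 95.5):.1f} g/mol")
    print("Labelling Mix concentrations (mM):", mix_conc)
    print(f"RNase-free water to add: {water_uL:.1f} uL")
    # 1 uL of the 25x mix in a 25 uL PCR (Subheading 3.2.2)
    print(f"per-nucleotide concentration in PCR: {final_conc_mM(10.0, 1.0, 25.0):.2f} mM")
```

Each nucleotide comes out at 10 mM in the mix, with 4 µL of water needed to reach 20 µL.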
3. Methods
3.1. Small RNA Isolation, Adapter Ligation, and Reverse Transcription
3.1.1. RNA Extraction
1. Grind the tissue under liquid nitrogen using a pestle and mortar. Add 1 mL Trizol (per 50–100 mg tissue) and grind into a slurry. Pipette into a 2 mL microfuge tube and incubate at room temperature for 3 min.
2. Add 0.2 mL of chloroform and shake vigorously by hand for 15 s. Leave at room temperature for 2–3 min.
3. Centrifuge at 10,000 × g at 4°C for 15 min. Transfer the aqueous phase to a 1.5 mL microfuge tube. Add 0.5 mL of isopropanol and incubate for 10 min at room temperature.
4. Centrifuge at 16,000 × g at 4°C for 20 min. Remove the supernatant, wash the pellet with 1 mL of 75% ethanol (vortexing), and centrifuge at 16,000 × g at 4°C for 5 min.
5. Remove the supernatant and air dry the pellet.
6. Resuspend the pellet in 20 µL of RNase-free H2O, make up a 1/100 dilution, and quantify with a spectrophotometer (see Note 4).
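The spectrophotometric quantification in step 6 can be sketched as follows. The conversion factor (1 A260 unit = 40 ng/µL for RNA, 1 cm path length) is the usual convention and is not stated in the text; the absorbance reading is a made-up example, and the function names are illustrative.

```python
# RNA quantification from an A260 reading of a diluted sample.
# ASSUMPTION: 1 A260 unit corresponds to 40 ng/uL RNA at a 1 cm path length.
RNA_NG_PER_UL_PER_A260 = 40.0


def rna_conc_ng_per_uL(a260, dilution_factor):
    """Concentration of the undiluted RNA stock in ng/uL."""
    return a260 * RNA_NG_PER_UL_PER_A260 * dilution_factor


def volume_for_mass_uL(target_ug, conc_ng_per_uL):
    """Volume of stock (uL) that contains target_ug of RNA."""
    return target_ug * 1000.0 / conc_ng_per_uL


if __name__ == "__main__":
    a260 = 2.5  # hypothetical reading of the 1/100 dilution
    conc = rna_conc_ng_per_uL(a260, dilution_factor=100)
    print(f"stock concentration: {conc / 1000:.1f} ug/uL")
    # 200 ug of total RNA is loaded for sRNA isolation (Subheading 3.1.2)
    print(f"volume holding 200 ug: {volume_for_mass_uL(200, conc):.1f} uL")
```

With this hypothetical reading, the whole 20 µL resuspension would be needed to load 200 µg, so in practice several extractions may have to be pooled.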
3.1.2. sRNA Isolation
1. To 50 mL of denaturing 15% polyacrylamide/urea mix (enough for a 15 × 17 cm, 1.5 mm thick gel), add 350 µL of 10% ammonium persulfate and 17.5 µL TEMED, pour immediately, and let the gel set for 1 h.
2. Mix 200 µg total RNA with an equal volume of formamide mix, denature for 30 s at 90°C, and place on ice.
3. Load the samples (see Note 5), run the denaturing 15% polyacrylamide/urea gel in 0.5× TBE at 25 V/cm, and stop when the xylene cyanol has migrated to the middle of the gel.
4. Stain the gel in ethidium bromide (0.5 µg/mL), excise a gel slice encompassing 20–30 nt (using the oligonucleotide markers) under UV (360 nm), and determine the weight of the slice.
5. Cut the gel slice into small fragments, elute into 0.3 M NaCl (2–3 volumes v/w) at 4°C overnight with agitation, extract once with phenol:chloroform, and precipitate the aqueous phase with 3 volumes of absolute ethanol at −20°C for at least 2 h.
6. Collect the pellet of sRNAs by centrifugation (16,000 × g, 20 min, 4°C) and, after drying, resuspend it in 20 µL RNase-free H2O.
3.1.3. Adapter Ligation
1. Prepare a reaction mixture for ligation of the 5′ adapter by combining the following components: 20 µL of gel-eluted sRNAs, 3 µL of 100 µM 5′ adapter, 15 µL 50% DMSO, and 5 µL 10× PAN ligation buffer, for a final volume of 48 µL.
2. Denature for 30 s at 90°C and place on ice.
3. Add 2 µL of T4 RNA ligase (40 U/µL) and incubate at 37°C for 1 h.
4. Add an equal volume of formamide mix, denature for 30 s at 90°C, place on ice, and load on a 15% polyacrylamide/urea gel.
5. Run the denaturing 15% polyacrylamide/urea gel in 0.5× TBE at 25 V/cm and stop when the xylene cyanol has migrated to the middle of the gel.
6. Excise a gel slice encompassing 39–43 nt (just above and including the xylene cyanol loading dye and above the 30 nt marker).
7. Elute into 0.3 M NaCl at 4°C overnight with agitation, extract once with phenol:chloroform, and precipitate the aqueous phase with 3 volumes of ethanol at −20°C for at least 2 h (see Note 6).
8. Collect the pellet of sRNAs by centrifugation (16,000 × g, 20 min, 4°C), dry it, and resuspend it in 19 µL RNase-free H2O.
9. Prepare a reaction mixture for ligation of the 3′ adapter by combining the following components: 19 µL sRNAs ligated to the 5′ adapter, 3.8 µL 100 µM 3′ adapter, 12 µL 50% DMSO, and 4 µL 10× PAN ligation buffer. Mix all the reagents, denature for 30 s at 90°C, and place on ice.
10. Add 1.2 µL of T4 RNA ligase (40 U/µL) and incubate at 37°C for 1 h.
11. Add an equal volume of formamide mix, denature for 30 s at 90°C, place on ice, and load on a 15% polyacrylamide/urea gel.
12. Run the denaturing 15% polyacrylamide/urea gel in 0.5× TBE at 25 V/cm and stop when the xylene cyanol has migrated to the middle of the gel.
13. Stain the gel in ethidium bromide and excise under UV (360 nm) a gel slice encompassing 58–62 nt (just above, but not including, the xylene cyanol loading dye).
14. Elute into 0.3 M NaCl at 4°C overnight with agitation, extract once with phenol:chloroform, and precipitate the aqueous phase with 3 volumes of ethanol and 2 µL of 100 µM Reverse primer at −20°C for at least 2 h (see Note 7).
15. Collect the pellet by centrifugation (16,000 × g, 20 min, 4°C), dry it, and resuspend it in 11.1 µL RNase-free H2O.
3.1.4. Reverse Transcription
1. Denature the sRNAs ligated to the 5′ and 3′ adapters for 30 s at 90°C and place on ice.
2. Add 17.4 µL of RT mix and incubate at 42°C for 3 min.
3. Add 1.5 µL SuperScript II RT (200 U/µL) and incubate at 42°C for 30 min.
4. Hydrolyse the RNAs by adding 80 µL of alkali mix, incubate at 90°C for 10 min, and place on ice.
5. Neutralise the solution by adding 80 µL of 150 mM HCl and check the pH value with pH paper (it should be around 8–9). Store the cDNAs at −20°C.
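Before pipetting, it can help to tally the component volumes of the ligation and reverse-transcription reactions in Subheadings 3.1.3 and 3.1.4 against their final volumes. The sketch below uses only the volumes listed in the protocol; the dictionary and function names are illustrative. For the 5′ adapter ligation, the listed components do not add up to the stated 48 µL, and the sketch simply reports the difference without assuming what fills it.

```python
# Tally of reaction volumes (uL) from Subheadings 3.1.3 and 3.1.4.

def total_volume(components):
    """Sum of component volumes, rounded to avoid floating-point noise."""
    return round(sum(components.values()), 2)


LIGATION_5P = {"gel-eluted sRNAs": 20.0, "100 uM 5' adapter": 3.0,
               "50% DMSO": 15.0, "10x PAN ligation buffer": 5.0}
LIGATION_3P = {"5'-ligated sRNAs": 19.0, "100 uM 3' adapter": 3.8,
               "50% DMSO": 12.0, "10x PAN ligation buffer": 4.0,
               "T4 RNA ligase": 1.2}
RT_REACTION = {"adapter-ligated sRNAs": 11.1, "RT mix": 17.4,
               "SuperScript II": 1.5}
RT_MIX = {"0.1 M DTT": 20.0, "5x first strand buffer": 40.0,
          "2 mM dNTPs": 56.0}

# Volume not accounted for by the listed 5' ligation components
unlisted_5p_uL = round(48.0 - total_volume(LIGATION_5P), 2)

# Fold-concentration of first strand buffer in the final RT reaction
buffer_uL = RT_REACTION["RT mix"] * RT_MIX["5x first strand buffer"] / total_volume(RT_MIX)
buffer_fold = round(5.0 * buffer_uL / total_volume(RT_REACTION), 3)

if __name__ == "__main__":
    print("5' ligation, listed components:", total_volume(LIGATION_5P), "uL")
    print("5' ligation, unlisted volume:", unlisted_5p_uL, "uL")
    print("3' ligation total:", total_volume(LIGATION_3P), "uL")
    print("RT reaction total:", total_volume(RT_REACTION), "uL")
    print("first strand buffer in RT reaction:", buffer_fold, "x")
```

The 3′ ligation and RT reactions close at 40 µL and 30 µL, respectively, with the PAN ligation buffer and first strand buffer both ending up at 1×.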
3.2. Amplification, Labelling, and Hybridization of cDNAs to a Tiling Microarray
3.2.1. First PCR Amplification of cDNA
1. Amplify 10 µL of cDNAs with 10 µL of 2 mM dNTP, 10 µL of 10× PCR buffer (provided with the Taq polymerase), 1 µL of 100 µM Reverse primer, 1 µL of Forward primer, and 2 µL of Taq polymerase (5 U/µL) in a final volume of 100 µL.
2. The cycling programme is 45 s at 94°C, 1 min 25 s at 50°C, and 1 min at 72°C for 25 cycles.
3. After amplification, run the PCR products on a native 15% polyacrylamide gel alongside 10 µL of 20 bp ladder at 2 V/cm for 3 h.
4. Stain the gel in ethidium bromide and excise a gel slice encompassing 70–80 bp (see Note 8).
5. Elute into 0.3 M NaCl at 4°C overnight and purify by phenol/chloroform extraction.
6. Precipitate the aqueous phase in 3 volumes of ethanol at −20°C for at least 2 h.
7. Collect the pellet by centrifugation (16,000 × g, 20 min, 4°C), dry it, and resuspend it in 50 µL TE pH 7.5.
3.2.2. Amplification with AA-dUTP
1. For the second PCR, 1 µL of previously amplified DNAs is used in a reaction containing 1 µL Labelling Mix with aa-dUTP (25×), 2.5 µL 10× PCR buffer, 0.75 µL 100 µM sRNArev and 0.75 µL 100 µM sRNAfor primers, and 0.2 µL Taq polymerase (5 U/µL), in a final volume of 25 µL.
2. The cycling programme, after initial denaturation at 94°C for 3 min, is: 30 s at 94°C, 30 s at 55°C, and 30 s at 72°C for 30 cycles.
3. The PCR products are purified with the QIAquick Nucleotide Removal Kit to remove unincorporated nucleotides and primers, according to the supplier’s instructions. The samples can be kept at −20°C.
3.2.3. Coupling with Cy5-Dye (see Note 9)
1. The amplified DNAs are dried in a SpeedVac® and resuspended in 10 µL of 0.05 M sodium bicarbonate buffer (pH 9) at room temperature for 30 min (see Note 10).
2. Cy5-ester is provided as a dried product in 5 tubes (Cy5 Mono-Reactive Dye Pack (Amersham-GE)). Resuspend a tube of dye ester in 8 µL of DMSO, distribute 1.5 µL aliquots into microfuge tubes, and dry in a SpeedVac®. The tubes are stored at 4°C in the dark.
3. Transfer the 10 µL of DNA in sodium bicarbonate buffer to a tube containing the dried dye; after pipetting and brief centrifugation, incubate the tube at room temperature for 30 min in the dark.
4. The excess dye is eliminated by purification with the QIAquick Nucleotide Removal Kit, according to the supplier’s instructions. The recovered volume (after two elutions) is 60 µL in TE.
5. For each sample, measure the absorbance at 260 nm and at 650 nm (the absorbance maximum of Cy5-dye).
6. For each sample, calculate the total µg of DNA using: µg of DNA = OD260 × 50 ng/µL × volume (µL)/1,000 (1 OD260 = 50 ng/µL for DNA). Calculate the total picomoles of dye incorporated using: pmol Cy5 = OD650 × volume (µL)/0.25. Calculate the frequency of incorporation = pmol Cy-dye incorporated × 324.5/ng DNA (324.5 = average molar mass of dNTP) (see Note 11).
7. 30 pmoles of labelled DNA in 100 µL TE pH 7.5 are precipitated with 10 µL 0.3 M sodium acetate pH 5.2, 4 µL acrylamide (2.5 µg/µL) (see Note 2), 2 µL of yeast RNA (10 mg/mL), and 3 volumes of ethanol. Keep for 2 h at −20°C, then centrifuge and resuspend the pellet in 35 µL RNase-free H2O.
3.2.4. Pre-hybridization and Hybridization to the Arabidopsis thaliana Chromosome 4 Tiling Array (see Note 12)
1. Prepare 50 mL of pre-hybridization solution.
2. Pre-hybridize the array on the slide at 42°C for a minimum of 45 min.
3. Rinse in MilliQ water for 2 min and in isopropanol for 1 min, and centrifuge for 1 min at 800 × g to dry the array. Keep the slide out of light and use within 2 h.
4. Place the slide in a Corning® hybridization chamber with a 22 × 60 mm Lifterslip covering the array area.
5. 30 picomoles of labelled DNAs in 35 µL of RNase-free H2O are heated at 95°C for 1 min, immediately mixed with 35 µL of 2× hybridization buffer pre-heated to 42°C, and applied to the slide.
6. Hybridize overnight at 42°C (water bath) in the Corning® hybridization chamber.
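The labelling-efficiency calculation in Subheading 3.2.3 (step 6), together with the target ranges in Note 11, can be sketched as follows. This is a minimal illustration that takes volumes in µL and uses the standard conversion of 1 OD260 = 50 ng/µL for double-stranded DNA; the absorbance readings are made-up example values, and the function names are illustrative.

```python
# Labelling efficiency after Cy5 coupling (Subheading 3.2.3, step 6; Note 11).
# ASSUMPTIONS: volumes in uL, undiluted sample, 1 cm path length.

def dna_ng(od260, volume_uL):
    """Total DNA in ng, using 1 OD260 = 50 ng/uL."""
    return od260 * 50.0 * volume_uL


def cy5_pmol(od650, volume_uL):
    """Total Cy5 in pmol, using the factor 0.25 given in the protocol."""
    return od650 * volume_uL / 0.25


def frequency_of_incorporation(pmol_dye, ng_dna):
    """pmol dye x 324.5 / ng DNA, as defined in the protocol."""
    return pmol_dye * 324.5 / ng_dna


def within_note_11_ranges(pmol_dye, foi):
    """True if dye amount is 50-100 pmol and FOI is 15-30 (Note 11)."""
    return 50.0 <= pmol_dye <= 100.0 and 15.0 <= foi <= 30.0


if __name__ == "__main__":
    volume = 60.0              # recovered eluate, uL
    od260, od650 = 0.40, 0.25  # hypothetical readings
    ng = dna_ng(od260, volume)
    pmol = cy5_pmol(od650, volume)
    foi = frequency_of_incorporation(pmol, ng)
    print(f"DNA: {ng / 1000:.2f} ug, Cy5: {pmol:.0f} pmol, FOI: {foi:.1f}")
    print("within Note 11 ranges:", within_note_11_ranges(pmol, foi))
```

With these example readings, the sample contains 1.2 µg of DNA and 60 pmol of dye, giving a frequency of incorporation of about 16.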
3.2.5. Washing the Slides (see Note 13)
1. First wash: 2× SSC, 0.1% SDS at 42°C; the Lifterslip is removed during this step by gentle hand agitation.
2. Second wash: 4 min in fresh pre-heated buffer (2× SSC, 0.1% SDS) with agitation.
3. Third wash: 1× SSC at room temperature, for 4 min with agitation.
4. Fourth wash: 0.2× SSC at room temperature, for 4 min with agitation.
5. Fifth wash: 0.05× SSC at room temperature, for 4 min with agitation.
6. Spin for 2 min at 800 × g to dry the array (see Note 14).
7. Scan with the same PMT setting for Red (635 nm) and Green (532 nm) (around 600–650 V).
3.2.6. Data Treatment
1. Amplification, labelling, and hybridization were done in triplicate on the same cDNA preparation.
2. Hybridized probes were ranked according to the intensity of the hybridization signal (1 = the highest signal), and the mean ranking was plotted as a function of the standard deviation computed from the three experiments.
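The rank-based treatment described above can be sketched as follows. The tile names and signal values are invented for illustration, ties are not handled specially, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
# Rank tiles within each replicate (1 = highest signal), then summarise
# each tile by the mean and standard deviation of its ranks.
from statistics import mean, stdev


def rank_descending(signals):
    """Map each tile to its rank within one experiment (1 = highest signal)."""
    ordered = sorted(signals, key=signals.get, reverse=True)
    return {tile: position + 1 for position, tile in enumerate(ordered)}


def rank_summary(replicates):
    """Return {tile: (mean rank, SD of rank)} across replicate experiments."""
    ranks = [rank_descending(rep) for rep in replicates]
    summary = {}
    for tile in ranks[0]:
        tile_ranks = [r[tile] for r in ranks]
        # ASSUMPTION: sample standard deviation (n - 1 denominator)
        summary[tile] = (mean(tile_ranks), stdev(tile_ranks))
    return summary


# Hypothetical signal intensities from three hybridizations
replicates = [
    {"tile_A": 900, "tile_B": 300, "tile_C": 100},
    {"tile_A": 850, "tile_B": 120, "tile_C": 310},
    {"tile_A": 990, "tile_B": 400, "tile_C": 150},
]
summary = rank_summary(replicates)

if __name__ == "__main__":
    for tile, (mean_rank, sd_rank) in sorted(summary.items(), key=lambda kv: kv[1]):
        print(f"{tile}: mean rank {mean_rank:.2f}, SD {sd_rank:.2f}")
```

Tiles that hybridize reproducibly (here `tile_A`) combine a rank near 1 with a standard deviation near 0, which is the pattern used to select robust signals.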
3.3. Hybridization to Genomic Tiling Arrays: Validation Experiments
3.3.1. Hybridization with Known Sequences
The PCR products from the first amplification were ligated to pGEM®-T Easy Vector (pGEM®-T Easy Vector Systems, Promega Cat#A1360). Plasmids were prepared from clones and used for sequencing. From this sequencing, we extracted eight cDNA clones corresponding to sequences located on chromosome 4. They were labelled and hybridized to the microarray in three independent experiments. The tiles containing the exact sequence of the cDNAs were expected to rank highest and to show the lowest standard deviation. Indeed, we observed a clear-cut separation between two populations of tiles, with those expected to hybridize exhibiting the highest mean ranking and lowest standard deviation (Fig. 1) (see Note 15).
Fig. 1. Pilot hybridization to the chromosome 4 tiling array. The sRNAs used for this experiment are indicated, together with the ~1 kb DNA tiles they should hybridize to. Mean values of ranking and standard deviation are indicated in parentheses. Bold characters: the tiles with the highest hybridization rank and lowest standard deviation. Black squares: the tiles expected to hybridize; white squares: the tiles not expected to hybridize according to an approximate matching approach, using the eight cDNA sequences fused to the 5′ and 3′ adaptors as queries. Matches were considered whenever they covered 23 nucleotides or more, with less than 2 mismatches in any window of 12 consecutive nucleotides. (Reproduced from ref. (17) with permission from Elsevier Science)
3.3.2. Hybridization with sRNA Populations from Stressed and Unstressed Leaves
In a second step, the chromosome 4 tiling array was hybridized with labelled cDNAs derived from sRNAs that were extracted from buffer- or harpin-infiltrated leaves. The experiment was repeated three times, and the same statistical procedure was applied as before to select tiles giving robust hybridization signals. After elimination of the overlapping tiles and the tiles not located on chromosome 4, a set of 155 tiles was selected in this manner for buffer, while 164 tiles were obtained from the harpin-treated sample. The hybridized tiles from buffer-treated leaves were located mainly in the pericentromeric regions within the 3–5 Mb interval (Fig. 2). Significantly, the number of hybridized tiles in each region is in good agreement with the accumulation level of sRNAs in the same region as determined by MPSS (massively parallel signature sequencing) (14, 15) (Fig. 2). In contrast, the distribution of hybridized tiles from the harpin-treated sample was uniform along chromosome 4 (Fig. 2), suggesting major changes in the accumulation of small RNAs during stress.
Fig. 2. Distribution of tiles hybridized to sRNAs from buffer and harpin-infiltrated leaves, and a comparison with MPSS data. Values for hybridized tiles (closed or open circles) and MPSS expression levels (diamond symbols) were computed in nonoverlapping windows of 1 Mb along the Arabidopsis chromosome 4 sequence. Values are normalised to the total in each set, i.e., 100% = 155 tiles from harpin-treated samples (continuous line), 100% = 164 tiles from buffer-treated samples (striped line) and 100% = the sum of MPSS expression levels (the dashed line)
3.3.3. Conclusions
Hybridization of labelled cDNAs derived from sRNAs to a DNA tiling microarray can lead to robust and meaningful hybridization signals. This method can be considered cheap (provided a tiling array is available) and can be useful to evaluate rapidly major differences between small RNAs accumulated in different conditions. The use of genomic oligonucleotide tiling arrays (16) should be very valuable to improve these analyses.
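The window-based comparison summarised in Fig. 2 (counts or expression levels binned in non-overlapping 1 Mb windows and expressed as a percentage of each set) can be sketched as below. The tile positions are invented, and the function is an illustrative helper rather than the authors' analysis code.

```python
# Bin positions along a chromosome into non-overlapping 1 Mb windows and
# express each window as a percentage of the set total (as in Fig. 2).
from collections import defaultdict

WINDOW_BP = 1_000_000


def window_percentages(positions_bp, weights=None, window_bp=WINDOW_BP):
    """Return {window index: percent of total}.

    With weights=None every position counts once (hybridized tiles);
    weights can carry expression levels (e.g. MPSS abundances).
    """
    if weights is None:
        weights = [1.0] * len(positions_bp)
    totals = defaultdict(float)
    for position, weight in zip(positions_bp, weights):
        totals[position // window_bp] += weight
    grand_total = sum(totals.values())
    return {win: 100.0 * value / grand_total for win, value in sorted(totals.items())}


if __name__ == "__main__":
    # Hypothetical tile start coordinates (bp) on chromosome 4
    tiles = [3_200_000, 3_900_000, 4_100_000, 10_500_000]
    for window, percent in window_percentages(tiles).items():
        print(f"{window}-{window + 1} Mb: {percent:.1f}%")
```

Normalising each set to 100% makes tile counts and MPSS expression levels comparable on one axis even though their absolute scales differ.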
4. Notes
1. All solutions should be RNase-free. RNA can be stored at −20°C or below to minimise hydrolysis.
2. Acrylamide/bis-acrylamide 40% stock solution (19:1 ratio) (Sigma). Store at 4°C.
3. Wrap all reaction tubes with foil and keep them covered as much as possible to prevent photobleaching of the dyes. Any water introduced into the dye esters will lower the coupling efficiency through hydrolysis of the dye esters.
4. To resuspend RNA, we use RNase-free water; otherwise we use milliQ water (resistivity > 5 MΩ.cm at 25°C, with the organic content < 30 ppb). The RNA pellet should not be over-dried, otherwise it will be difficult to resuspend. Heating at 55–60°C may assist in resuspending the pellet.
5. When loading the gel, leave an empty lane between samples to avoid cross contamination.
6. To precipitate sRNAs after ligation steps, it is advisable to add glycogen (Invitrogen) at a final concentration of 1 mg/mL to make the pellet visible.
7. We precipitate in the presence of the reverse primer as a carrier to help recovery of sRNAs ligated to adapters.
8. After amplification, two bands are observed: a 50 bp band corresponding to self-ligated adapters and a 70–80 bp band, which is the one to be collected. Although better resolution is obtained with a polyacrylamide gel, a 3% agarose gel can be used as an alternative.
9. For labelling of sRNAs, we used the Cy5 dye, which appears red on scanning and allows unambiguous identification of hybridized cDNAs over the background of probe DNAs (green).
10. Bicarbonate buffer changes its composition over time; prepare a 1 M solution, aliquot it, and store frozen.
11. A dye incorporation of 50–100 pmoles per sample and a frequency of incorporation in the range of 15–30 are optimal for hybridizations.
12. Our experiments were performed with a custom-made Arabidopsis thaliana chromosome 4 DNA tiling microarray. PCR amplification using selected primers at 1 kb intervals was performed on Bacterial Artificial Chromosome (BAC) templates covering chromosome 4 of Arabidopsis thaliana. The generated fragments were printed on Ultragaps Coated Slides (Corning). All PCR products were checked on agarose gels before being printed onto glass slides.
13. Do not let the slides dry. Transfer them as quickly as possible between wash solutions and between wash and centrifugation steps.
14. It is very important to process no more than two slides at a time and to proceed very quickly, as any droplet that dries on the surface of the slide will leave spots.
15. Some tiles with an exact match to the labelled targets were absent from that group, since one of the three experiments failed to provide significant hybridization to the tile in question. Conversely, some tiles from that group did not show any clear match with the set of labelled cDNAs used as targets, thus denoting some limitations in the computational prediction of hybridization patterns that used small sequences as targets.
Acknowledgments
MB was supported by a Visiting Scientist Fellowship from INRA. VC and DB are members of the European Union Network of Excellence “The Epigenome”.
References
1. Baulcombe D (2004) RNA silencing in plants. Nature 431:356–363
2. Brodersen P, Voinnet O (2006) The diversity of RNA silencing pathways in plants. Trends Genet 22:268–280
3. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets including a stress-induced miRNA. Mol Cell 14:787–799
4. Sunkar R, Zhu JK (2004) Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16:2001–2019
5. Navarro L, Dunoyer P, Jay F, Arnold B, Dharmasiri N, Estelle M et al (2006) A plant miRNA contributes to antibacterial resistance by repressing auxin signaling. Science 312:436–439
6. Sunkar R, Chinnusamy V, Zhu J, Zhu JK (2007) Small RNAs as big players in plant abiotic stress responses and nutrient deprivation. Trends Plant Sci 12:301–309
7. Greenberg JT, Yao N (2004) The role and regulation of programmed cell death in plant–pathogen interactions. Cell Microbiol 6:201–211
8. Wei ZM, Laby RJ, Zumoff CH, Bauer DW, He SY, Collmer A et al (1992) Harpin, elicitor of the hypersensitive response produced by the plant pathogen Erwinia amylovora. Science 257:85–88
9. Llave C, Kasschau KD, Rector MA, Carrington JC (2002) Endogenous and silencing-associated small RNAs in plants. Plant Cell 14:1605–1619
10. Pfeffer S, Lagos-Quintana M, Tuschl T (2005) Cloning of small RNA molecules. Curr Protoc Mol Biol 26, Unit 26.4:26410–26418
11. Hafner M, Landgraf P, Ludwig J, Rice A, Ojo T, Lin C et al (2008) Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing. Methods 44:3–12
12. Martienssen RA, Doerge RW, Colot V (2005) Epigenomic mapping in Arabidopsis using tiling microarrays. Chromosome Res 13:299–308
13. Vaughn MW, Tanurdžić M, Lippman Z, Jiang H, Carrasquillo R, Rabinowicz PD et al (2007) Epigenetic natural variation in Arabidopsis thaliana. PLoS Biol 5:e174
14. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309:1567–1569
15. Nakano M, Nobuta K, Vemaraju K, Tej SS, Skogen JW, Meyers BC (2006) Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res 34:D731–D735
16. Thibaud-Nissen F, Wu H, Richmond T, Redman JC, Johnson C, Green R et al (2006) Development of Arabidopsis whole-genome microarrays and their application to the discovery of binding sites for the TGA2 transcription factor in salicylic acid-treated plants. Plant J 47:152–162
17. Boccara M, Sarazin A, Billoud B, Jolly V, Martienssen R, Baulcombe D et al (2007) New approaches for the analysis of Arabidopsis thaliana small RNAs. Biochimie 89:1252–1256
Chapter 9
Northern Blotting Techniques for Small RNAs
Todd Blevins
Abstract
In eukaryotes, RNA silencing encompasses a range of biochemical processes mediated by ~20–25 nt small RNAs (smRNAs). This chapter describes northern blot hybridization techniques optimized for detection of such smRNAs, whether extracted from plant or animal tissues. The basic protocol is described, and control blots illustrate the detection specificity and sensitivity of this method using DNA oligonucleotide probes. Known endogenous smRNAs are analyzed in samples prepared from several model plant species, including Arabidopsis thaliana, Nicotiana benthamiana, Oryza sativa, Zea mays, and Physcomitrella patens, as well as the animals Drosophila melanogaster and Mus musculus. Finally, the usefulness of northern blotting in dissecting smRNA biogenesis is shown for the particular case of DNA virus infection.
Key words: RNA silencing, Northern blot, RNA hybridization, Small RNA, siRNA, miRNA
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_9, © Springer Science + Business Media, LLC 2010
1. Introduction
Detecting specific sequences of nucleic acids extracted from biological samples is an essential task in molecular biology. One standard approach is the electrophoretic separation of nucleic acids by molecular weight, their blotting to a membrane substrate, and specific detection with radioactively labeled probes (1). Distinct methods are used for analysis of DNA and RNA by blot hybridization. DNA blot hybridization is called Southern blotting, an homage to its inventor, Edwin Mellor Southern (2, 3). Analogous RNA blotting methods were developed and are now commonly referred to as “northern” blotting (4, 5).
Blot hybridization has primarily been employed to study genomic DNA or high molecular weight RNA derived from protein-coding genes (1, 3, 6). However, populations of small RNA (smRNA) molecules were recently discovered and characterized in various eukaryotes (7–13). Small RNAs afford sequence specificity to RNA silencing, a process by which Argonaute proteins repress gene expression (14–16). RNA silencing generally involves the following mechanism: stem-loop hairpin RNAs or long double-stranded RNAs (dsRNAs) are processed into ~20–25 nt smRNA duplexes by a Dicer RNase III endoribonuclease enzyme. A single smRNA strand then guides Argonaute-containing effector complexes to cleave specific RNA transcripts, block productive mRNA translation, or presumably, direct repressive chromatin modifications to particular genomic regions (17–19). Both the evolutionarily conserved mechanism of RNA silencing and functionally distinct pathways in different model organisms were elucidated in part by northern blot detection of the underlying smRNAs (7, 18–22).
Endogenous smRNAs of one particular class, called microRNAs (miRNAs), are excised from stem-loop hairpin structures in RNA transcripts; miRNAs can regulate expression of mRNAs containing complementary miRNA-binding sites. Like the let-7 miRNA discovered in Caenorhabditis elegans, many miRNAs are highly conserved across either the animal or plant kingdoms and regulate development (22–24). However, plant family-specific, poorly conserved miRNAs have also been identified (25, 26). smRNAs of another class, called short-interfering RNAs (siRNAs), are processed from perfect dsRNA substrates and have multiple biological functions. These include viral defense, regulation of developmental timing, transposon silencing, and maintenance of heterochromatin. “Heterochromatic” siRNAs appear to guide covalent modifications to specific genomic regions, which can induce heritable, potentially reversible changes in gene expression (27–30). Such changes are called epigenetic modifications (31, 32), which explains the relevance of smRNA detection to epigenetic research.
The blot hybridization techniques presented here are optimized for detection of smRNAs in either plant or animal tissues, and their analysis to a resolution of one nucleotide. A smRNA northern blot experiment consists of several independent techniques: (1) isolation of total RNA from fresh or frozen biological materials, (2) size fractionation of the total RNA into high molecular weight (HMW) and low molecular weight (LMW) fractions, (3) denaturing polyacrylamide gel electrophoresis of LMW RNA (or total RNA) followed by its transfer to a nylon membrane, and (4) hybridization of membrane-bound RNA with a radioactively labeled DNA oligonucleotide probe to detect specific smRNAs. The last procedure can be repeated at least 10–15 times by stripping the membrane before each hybridization. Most experiments require at least two rounds of hybridization: one round to detect a smRNA whose signal is expected to vary between samples, and a second round to detect a loading control that should not vary significantly. These steps are summarized by the workflow diagram in Fig. 1.
Northern Blotting Techniques for Small RNAs
[Fig. 1 workflow diagram. (a) Tissue: fresh tissue is homogenized, or frozen tissue (stored at −80°C) is ground in liquid N2, followed by TRI Reagent extraction. (b) RNA: optional size fractionation of total RNA into LMW RNA (HMW RNA can be used for transcript analysis), then 18% PAGE separation of total or LMW RNA; RNA samples are stored at either −20°C or −80°C. (c) Nylon membrane: transfer to membrane, UV crosslinking, hybridization with a freshly prepared 32P-labeled oligo probe, washing three times in 2× SSC, 0.5% SDS, and detection by film exposure (2–7 days at −80°C) or phosphorimaging (1–48 h); stripping in 0.1% SDS precedes each new hybridization. The membrane is stored at room temperature, in a shielded container if still radioactive.]
Fig. 1. A workflow diagram summarizing the smRNA northern blot procedure. Blot hybridization analysis of Low Molecular Weight (LMW) RNA is broken into three major steps: (a) extraction of total RNA from fresh or frozen tissues; (b) size fractionation of total RNA to obtain enriched LMW RNA (optional), followed by size separation of RNA using polyacrylamide gel electrophoresis (PAGE); and (c) transfer of separated RNA to a nylon membrane, UV crosslinking of RNA to the membrane, hybridization with a radiolabeled probe, washing with high-salt/SDS buffer, and detection of a radioactive signal using a phosphorimaging screen or photographic film. Subsequent hybridizations require stripping of the membrane, fresh probe preparation, washing and detection. This last cycle of steps can be repeated at least 10–15 times without substantial loss in signal. Unprocessed tissue samples should be stored at −80°C, whereas RNA can be stored at either −20°C or −80°C; RNA crosslinked to a nylon membrane is stable at room temperature
2. Materials
2.1. RNA Isolation and Size Fractionation
1. A ceramic mortar (6–8 cm in diameter), fitted pestle and small metal spatula per tissue sample, all precooled with liquid nitrogen; 100–200 mL liquid nitrogen per tissue sample.
2. Round-bottomed polypropylene tubes (13 mL) with tight-sealing screw caps (Sarstedt, Nümbrecht, Germany).
Blevins
3. Standard centrifuge rotor types JA-20 (8-tube; Beckman Coulter, Fullerton, CA, USA) or SA-600 (12-tube; Sorvall, Asheville, NC, USA), with type 8441 rubber adapter inserts, 3/4 inch in diameter (Corning Inc., Corning, NY, USA).
4. TRI Reagent (Molecular Research Center, Cincinnati, OH, USA) or TRIzol (Invitrogen, Carlsbad, CA, USA). Caution: phenol is toxic and corrosive. Store at 4°C.
5. Serological pipettes, 10 mL single-use, 2 per tissue sample (Sarstedt).
6. Isopropanol and chloroform, kept cold on ice. Caution: chloroform is toxic and a suspected carcinogen.
7. RNase-free bidistilled water, e.g., diethylpyrocarbonate (DEPC)-treated. Caution: DEPC is a suspected carcinogen (see Note 1). Briefly, pipette 1 mL DEPC (Sigma-Aldrich, St. Louis, MO, USA) into 1 L bidistilled water in a glass bottle and then cap it. Shake well and incubate for 4 h at room temperature. Autoclave to remove DEPC and store at room temperature.
8. Ethanol, absolute or 96% as available, for use in RNA cleanup steps.
9. 75% Ethanol (in DEPC-treated water), kept cold on ice.
10. RNase-free microcentrifuge tubes, 1.5 mL, referred to as “microfuge” tubes in this chapter.
11. RNeasy Mini Spin columns and kit (Qiagen, Venlo, The Netherlands).
2.2. Polyacrylamide Gel Preparation
1. Glass gel plates, 14 cm tall × 16 cm wide (or similar), with 1 mm thick spacers.
2. Gel comb (1 mm thick) with ~15 mm deep × 6 mm wide teeth (13–15 teeth/comb). Gel thickness and slot dimensions are important parameters that affect the resolution of RNA during gel electrophoresis.
3. Four large binder clips, 2 inch size (sometimes called banker’s or dog clips).
4. TBE buffer stock solution (10×): 0.9 M Tris-borate, 20 mM EDTA, pH 8.0 (1). Store at room temperature.
5. Agarose (standard electrophoresis grade), dissolved in 1× TBE to 1% concentration (w/v) using a microwave oven, and cooled to 65°C before use.
6. Urea, ReagentPlus, ≥99.5% (Sigma-Aldrich).
7. Acrylamide, 30% solution (w/v), AccuGel 19:1 Acrylamide:Bis-Acrylamide, ultra pure sequencing grade (National Diagnostics, Atlanta, GA, USA). Caution: acrylamide monomer is a neurotoxin and a potential carcinogen. Store at 4°C.
8. Millipore Express Plus (0.22 µm) presterilized filter with a 250 mL capacity container (Millipore, Billerica, MA, USA).
9. N,N,N′,N′-Tetramethylethylenediamine (TEMED) for electrophoresis, ~99% (Sigma-Aldrich). Store at 4°C.
10. Ammonium persulfate, APS (USB Corp., Cleveland, OH, USA), prepared as 10% (w/v) in bidistilled water. Store 300 µL aliquots at −20°C; these can be used for at least 3 months.
11. A vertical gel apparatus (a gel rig) with upper and lower buffer reservoirs, for 14 × 16 cm polyacrylamide gels or close equivalent; power supply to generate 14–16 W or 450–550 V (protein minigel rigs are not recommended if smRNA resolution of 1 nt is required).
12. Luer-lok syringes (two), 30 mL each, and a 21G 1½″ needle (BD, Franklin Lakes, NJ, USA).
2.3. Small RNA Sample Preparation, Electrophoresis, and Electroblotting
1. A spectrophotometer for quantification of small volumes of RNA: e.g., NanoDrop ND-1000 (NanoDrop Technologies, Wilmington, DE, USA).
2. A Speed-Vac apparatus for drying down 20–40 µL volumes of aqueous RNA solution (Thermo Scientific, Waltham, MA, USA).
3. RNA size markers (20–25 nt range): e.g., the “microRNA Marker” 17, 21 and 25 nt RNA oligo mixture (New England Biolabs, Ipswich, MA, USA), or custom synthesized oligos (such as 21 and 24 nt RNA oligos shown in Fig. 2).
4. RNA gel-loading buffer: 95% (v/v) formamide, 0.025% (w/v) bromophenol blue, 0.025% (w/v) xylene cyanol FF, 5 mM EDTA, 0.025% (w/v) SDS, pH 8.5 (1). Store it in 500 µL aliquots at −20°C.
5. A thermoblock or a water bath set to 95°C.
6. Microcapillary pipette tips, 1–200 µL (United Laboratory Plastics, St. Louis, MO, USA).
7. Ethidium bromide (10 mg/mL stock), diluted 10,000× in 1× TBE buffer for gel staining. Caution: ethidium bromide is an intercalating agent and a mutagen, and is thought to be carcinogenic. Handle with care, wearing nitrile gloves.
8. An ultraviolet gel documentation system.
9. Whatman #1 filter paper (Whatman/GE Healthcare, Chalfont St. Giles, UK), or 3 MW paper (Midsci, St. Louis, MO, USA).
10. Hybond-N+ positively charged nylon membrane (Amersham/GE Healthcare) or similar positively charged nylon membrane (33).
11. A polyacrylamide gel-transfer apparatus: e.g., Trans-Blot semi-dry transfer cell (Bio-Rad, Hercules, CA, USA).
[Fig. 2 panels. (a) Oligonucleotide hybridization standards:

Oligo name      Standard sequence
RNA24           GUAAACGGCCACAAGUUCAGCGUG
RNA21           GUAAACGGCCACAAGUUCAGC
DNA24           GTAAACGGCCACAAGTTCAGCGTG
DNA-1middle     GTAAACGGCCACGAGTTCAGCGTG
DNA-1end        GTGAACGGCCACAAGTTCAGCGTG
DNA-2           GTGAACAGCCACAAGTTCAGCGTG
DNA-3           GTGAACAGCCGCAAGTTCAGCGTG
DNA-4           GTGAACAGCCGCAAATTCAGCGTG

(b) Detection specificity: blots of standards R24, R21, D24, D1m, D1e, D2, D3 and D4 hybridized with Probe I or Probe II at 35°C and 50°C, with 24 and 21 nt positions marked and U6 snRNA as a loading control. (c) Detection sensitivity: dilution series of 280, 140, 70, 35, 17 and 8 pg of standard, with 24 and 21 nt positions marked and U6 snRNA as a loading control.]
Fig. 2. Analysis of oligonucleotide hybridization specificity and sensitivity. (a) A series of DNA oligonucleotide (oligo) standards were designed such that each successive oligo diverged at an additional nucleotide (nt) position from the prototype, DNA24. Two RNA oligos were included (RNA21 and RNA24) with sequences essentially equivalent to DNA24. (b) Oligos were spiked into 4 µg aliquots of RNA from wild-type Arabidopsis. Samples were separated by PAGE and blotted to a nylon membrane. Hybridization at 35°C with Probe I (designed for RNA24, RNA21 and DNA24) detected standards that diverged from DNA24 at up to 2 nt positions; similar results were obtained at 50°C although the overall signal strength was reduced. In contrast, Probe II (designed for DNA-4) showed higher specificity at 50°C and minimal to no signal was detected from standards that differed from DNA-4 at two or more positions. The best specificity resulted when a probe possessed internally mismatched bases with respect to undesired hybrids. Thus, Probe I detected variant DNA-1end more strongly than DNA-1middle. DNA standards (24-nt long) migrated more rapidly than RNA24 because of the lower molecular weight of DNA oligos compared to equivalent RNA species. (c) To estimate sensitivity of oligo probe hybridization, equal amounts of RNA21 and RNA24 standards were mixed and then spiked as a dilution series into Arabidopsis RNA for PAGE separation. The concentration of the added standard decreases by a factor of 2 from left to right (280, 140, 70, 35, 17, and 8 pg). Clear detection required ~35 pg of RNA standard. U6 small nuclear RNA (snRNA) detected from the Arabidopsis RNA serves as a loading control
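The design of the standards in Fig. 2a can be checked by counting the positions at which each standard differs from the sequence a probe was designed for. The following is a minimal sketch in Python; the function name `mismatch_positions` is illustrative and not part of the chapter, and positions are counted from the 5′ end of the standard. It only tallies mismatches and makes no prediction of hybridization signal.

```python
# Oligo standards copied from Fig. 2a (24 nt DNA oligos, written 5' to 3').
STANDARDS = {
    "DNA24":       "GTAAACGGCCACAAGTTCAGCGTG",
    "DNA-1middle": "GTAAACGGCCACGAGTTCAGCGTG",
    "DNA-1end":    "GTGAACGGCCACAAGTTCAGCGTG",
    "DNA-2":       "GTGAACAGCCACAAGTTCAGCGTG",
    "DNA-3":       "GTGAACAGCCGCAAGTTCAGCGTG",
    "DNA-4":       "GTGAACAGCCGCAAATTCAGCGTG",
}


def mismatch_positions(reference, other):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(reference) != len(other):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(reference, other)) if a != b]


# Probe I was designed for DNA24 and Probe II for DNA-4 (see the caption).
for probe_target in ("DNA24", "DNA-4"):
    print(f"Probe designed for {probe_target}:")
    for name, seq in STANDARDS.items():
        positions = mismatch_positions(STANDARDS[probe_target], seq)
        print(f"  {name:12s} {len(positions)} mismatch(es) at {positions}")
```

Relative to DNA24, the single mismatch of DNA-1middle lies at an internal position, whereas that of DNA-1end lies near the 5′ end of the standard, which is the distinction the caption draws between internal and terminal mismatches.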
12. An ultraviolet crosslinking device: e.g., Stratalinker apparatus (Stratagene, La Jolla, CA, USA) or GS Gene Linker apparatus (Bio-Rad).
2.4. Radiolabeled Probe Preparation and Hybridization
1. PerfectHyb Plus buffer (Sigma-Aldrich) or, alternatively, MicroHyb buffer (Invitrogen). Store at room temperature.
2. A hybridization oven and tubes with a 20–30 mL buffer/wash capacity: tubes need to accommodate ~12 cm membranes.
3. Oligonucleotide (oligo) probes, 25 nmol scale, resuspended in bidistilled water to 100 µM. Oligo probes used in this chapter are listed in Table 1 and were synthesized by Integrated DNA Technologies (Coralville, IA, USA).
4. T4 Polynucleotide Kinase (PNK) and the supplied 10× Kinase Buffer (Promega). Store at −20°C.
5. [Gamma-32P] Adenosine-5′-triphosphate ([γ-32P]ATP) with specific activity of 3,000–6,000 Ci/mmol and radioactive concentration of 10 mCi/mL (PerkinElmer, Waltham, MA, USA). Caution: Radiation protection measures must be taken while handling γ-32P and all derived materials. Store probes in a shielded container within a freezer at −20°C.
6. Performa DTR Gel Filtration Cartridges (Edge Biosystems, Gaithersburg, MD, USA) or, alternatively, MicroSpin G-25 Columns (GE Healthcare). Store at 4°C.
2.5. Washing, Detection, Stripping, and Reprobing
1. SSC stock solution (20×): 3 M sodium chloride, 0.3 M sodium citrate, adjust pH to 7.0 with concentrated HCl (1). Store at room temperature.
2. Sodium dodecyl sulfate (SDS) stock solution, 10% (w/v) in bidistilled water. Store at room temperature.
3. Washing solution (1×): 2× SSC, 0.5% SDS. Store at room temperature; incubate at 65°C for 15–20 min to resuspend SDS if it precipitates during storage.
4. Heavy duty plastic kitchen wrap.
5. A phosphorimager screen with a cassette and an appropriate scanner (e.g., Personal Molecular Imager, Bio-Rad; or Typhoon Variable Mode Imager, Molecular Dynamics/GE Healthcare).
6. Kodak BioMax MR Film (Kodak, Rochester, NY, USA), an enhancer screen and an automated film developer (only required if the hybridization signal is very weak).
7. SDS stripping solution (0.1%), near boiling or 85°C.
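The washing and stripping solutions listed above are dilutions of the 20× SSC and 10% SDS stocks, so the component volumes follow from C(stock) × V(stock) = C(final) × V(final). The sketch below works through this arithmetic in Python; the function names and the 500 mL batch size are illustrative assumptions, not values prescribed by the chapter.

```python
def stock_volume(final_conc, stock_conc, final_volume_ml):
    """Volume of stock (mL) such that C_stock * V_stock = C_final * V_final."""
    if final_conc > stock_conc:
        raise ValueError("a stock cannot be diluted to a higher concentration")
    return final_conc * final_volume_ml / stock_conc


def wash_solution(final_volume_ml):
    """Component volumes (mL) for 2x SSC, 0.5% SDS made from 20x SSC and 10% SDS."""
    ssc = stock_volume(2, 20, final_volume_ml)     # 20x SSC diluted to 2x
    sds = stock_volume(0.5, 10, final_volume_ml)   # 10% SDS diluted to 0.5%
    water = final_volume_ml - ssc - sds            # bidistilled water to volume
    return {"20x SSC": ssc, "10% SDS": sds, "water": water}


# Example batch of 500 mL (an arbitrary volume chosen for illustration).
print(wash_solution(500))
# 10% SDS stock needed for 500 mL of 0.1% SDS stripping solution.
print(stock_volume(0.1, 10, 500))
```

Keeping the stock concentrations as explicit arguments makes it straightforward to adapt the calculation if a laboratory prepares its stocks at other concentrations.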
2.6. Biological Materials Used in Examples
1. Arabidopsis thaliana grown from seed and collected 4 weeks postgermination. Wild-type material is of Columbia-0 ecotype, and dicer-like mutant (dcl) lines originate from the SALK and GABI T-DNA insertion collections (34, 35).
2. Cabbage Leaf Curl Virus, CaLCuV; viral constructs provided by Dominique Robertson and infected Arabidopsis plants provided by Mikhail Pooggin and Thomas Hohn, described in Blevins et al. (36).
Table 1 Oligonucleotide probes used for northern blot detection in this chapter

Name                  Sequence                          Size [nt]
RNA24                 GUAAACGGCCACAAGUUCAGCGUG          24
RNA21                 GUAAACGGCCACAAGUUCAGC             21
DNA24                 GTAAACGGCCACAAGTTCAGCGTG          24
Standard Probe I      CACGCTGAACTTGTGGCCGTTTAC          24
DNA-4                 GTGAACAGCCGCAAATTCAGCGTG          24
Standard Probe II     CACGCTGAATTTGCGGCTGTTCAC          24
At-miR160a            UGCCUGGCUCCCUGUAUGCCA             21
Os-miR160a            UGCCUGGCUCCCUGUAUGCCA             21
Zm-miR160a            UGCCUGGCUCCCUGUAUGCCA             21
Pp-miR160a            UGCCUGGCUCCCUGUAUGCCA             21
miR160a_probe         TGGCATACAGGGAGCCAGGCA             21
At-miR165a            UCGGACCAGGCUUCAUCCCCC             21
At-miR166a            UCGGACCAGGCUUCAUUCCCC             21
Os-miR166a            UCGGACCAGGCUUCAUUCCCC             21
Zm-miR166a            UCGGACCAGGCUUCAUUCCCC             21
Pp-miR166a            UCGGACCAGGCUUCAUUCCCC             21
miR165_probe          GGGGGATGAAGCCTGGTCCGA             21
At-miR393a            UCCAAAGGGAUCGCAUUGAUCC            22
Bn-miR393             UCCAAAGGGAUCGCAUUGAUC             21
Os-miR393             UCCAAAGGGAUCGCAUUGAUC             21
Zm-miR393             UCCAAAGGGAUCGCAUUGAUCU            22
miR393a_probe         GGATCAATGCGATCCCTTTGGA            22
At-miR824             UAGACCAUUUGUGAGAAGGGA             21
Bn-miR824             UAGACCAUUUGUGAGAAGGGA             21
miR824_probe          TCCCTTCTCACAAATGGTCTA             21
At-siR1003            AGACCGUGAGGCCAACAUAGGCAU          24
siR1003_probe         ATGCCTATGTTGGCCTCACGGTCT          24
Ce-let-7              UGAGGUAGUAGGUUGUAUAGUU            22
Dm-let-7              UGAGGUAGUAGGUUGUAUAGU             21
Mm-let-7a             UGAGGUAGUAGGUUGUAUAGUU            22
Hs-let-7a             UGAGGUAGUAGGUUGUAUAGUU            22
let-7_probe           AACTATACAACCTACTACCTCA            22
Mm-miR-122a           UGGAGUGUGACAAUGGUGUUUGU           23
miR-122a_probe        ACAAACACCATTGTCACACTCCA           23
Dm-miR-124            UAAGGCACGCGGUGAAUGCCAAG           23
miR-124_probe         CTTGGCATTCACCGCGTGCCTTA           23
At-miR173             UUCGCUUGCAGAGAGAAAUCAC            22
miR173_probe          GTGATTTCTCTCTGCAAGCGAA            22

Probes for Cabbage Leaf Curl Virus (CaLCuV) smRNAs (each detects 21–24 nt species):
Viral reg. 1 sense    AATATGGTTGATCTTCCTTTGGGTGCAACAG
Viral reg. 1 a/sense  CTGTTGCACCCAAAGGAAGATCAACCATATT
Viral reg. 2 sense    TGGTGATGTAATTCTTGACGGCATTGGTGTCT
Viral reg. 2 a/sense  AGACACCAATGCCGTCAAGAATTACATCACCA

U6 snRNA sequences from three eukaryotic species:
A.t.  GUCCCUUCGGGGACAUCCGAUAAAAUUGGAACGAUACAGAGAAGAUUAGCAUGGCCCCUGCGCAAGGAUGACACGCAUAAAUCGAGAAAUGGUCCAAAUUUU
D.m.  GUUCUUGCUUCGGCAGAACAUAUACUAAAAUUGGAACGAUACAGAGAAGAUUAGCAUGGCCCCUGCGCAAGGAUGACACGCAAAAUCGUGAAGCGUUCCACAUUUU
M.m.  GUGCUCGCUUCGGCAGCACAUAUACUAAAAUUGGAACGAUACAGAGAAGAUUUAGCAUGGCCCCUGCGCAAGGAUGACACGCAAAUUCGUGAAGCGUUCCAUAUUUUU

Conserved U6 probes (both detect 102–108 nt species):
Cons_U6-probeI        AATCTTCTCTGTATCGTTCCAATTTTA
Cons_U6-probeII       TGCGTGTCATCCTTGCGCAGGGGCCATGCT
Italic sequences are RNA targets, while bold sequences are probes used to detect them in data summarized by Figs. 2–4; all sequences are displayed in the 5′ to 3′ orientation. U6 small nuclear RNA (snRNA) sequences are displayed for the species Arabidopsis thaliana (A.t.), Drosophila melanogaster (D.m.) and Mus musculus (M.m.). U6 snRNA regions shown in bold are highly conserved across plants and animals, and were thus used to design conserved probes for the purpose of RNA loading controls
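Each probe in Table 1 is a DNA oligo whose sequence is the reverse complement of its target. A short Python sketch such as the following can be used to derive a probe from a smRNA sequence and to check it against the table before ordering an oligo; the function name `dna_probe_for` is illustrative and not part of the chapter, and the sequences are copied from Table 1.

```python
# Watson-Crick complements, returning DNA bases for both RNA and DNA input.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def dna_probe_for(target):
    """Return the DNA reverse complement (5' to 3') of an RNA or DNA target."""
    return "".join(COMPLEMENT[base] for base in reversed(target.upper()))


# Target and probe pairs copied from Table 1.
TABLE_1_PAIRS = {
    "DNA24 / Standard Probe I": ("GTAAACGGCCACAAGTTCAGCGTG", "CACGCTGAACTTGTGGCCGTTTAC"),
    "At-miR160a / miR160a_probe": ("UGCCUGGCUCCCUGUAUGCCA", "TGGCATACAGGGAGCCAGGCA"),
    "At-siR1003 / siR1003_probe": ("AGACCGUGAGGCCAACAUAGGCAU", "ATGCCTATGTTGGCCTCACGGTCT"),
    "Ce-let-7 / let-7_probe": ("UGAGGUAGUAGGUUGUAUAGUU", "AACTATACAACCTACTACCTCA"),
}

for label, (target, probe) in TABLE_1_PAIRS.items():
    status = "matches" if dna_probe_for(target) == probe else "differs from"
    print(f"{label}: derived probe {status} Table 1")
```

A probe designed this way is fully complementary to only one family member; for example, miR165_probe is the reverse complement of At-miR165a and therefore carries a single mismatch to the miR166a sequences listed in the table.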
3. Physcomitrella patens subspecies patens (WT06) streaked onto plates and cultured for 7 days. This material is primarily protonemal tissue (provided by Pierre-François Perroud).
4. Other green plants grown from seed in a greenhouse and collected 9 days postgermination. The material includes both cotyledons and true leaves (grown by Mike Dyer).
5. Wild-type Drosophila melanogaster embryos, 12–17 h (provided by Kathryn Huisinga).
6. Healthy mouse liver tissues (provided by Tatiana Simon and Luciano Marpegan).
3. Methods
3.1. RNA Isolation and Size Fractionation
1. Total RNA is isolated from plants or other biological tissues using TRI Reagent, a phenol-based reagent ideal for large-scale RNA extraction (37, 38). Column or glass fibre-based purifications are not appropriate at this stage in the protocol, because they exclude low molecular weight (LMW) RNA. About 1 g tissue is ground to a fine powder in liquid nitrogen using a mortar and pestle (see Note 2). All implements and tubes are precooled in liquid nitrogen before contact with the frozen powder. Each ~1.5 mL aliquot of powder is transferred to a 13 mL centrifuge tube. 10 mL TRI Reagent is added to each tube, and the capped tubes are vortexed until the powder melts and is evenly suspended. Caution: phenol is toxic and corrosive. Wear gloves while handling TRI Reagent and during subsequent steps. Work in a fume hood for pipetting TRI Reagent and further sample manipulations until RNA pellets are obtained. These mixtures are incubated for 5 min at room temperature. Then, 2 mL of cold chloroform is added, and the tubes are vortexed for 20 s and incubated for 3 min at room temperature. Caution: chloroform is toxic and a suspected carcinogen.
2. Samples are centrifuged for 15 min (at 8,000 × g and 4°C). Warning: excessive centrifugation speeds (>10,000 × g) may rupture polypropylene tubes. The aqueous phase (~6 mL) is transferred to a new centrifuge tube, carefully avoiding contamination by the protein-rich interphase, and 6 mL cold isopropanol is added. The capped tube is gently inverted to mix and incubated for 15–30 min on ice. RNA is sedimented by centrifugation for 30 min (at 8,000 × g and 4°C). After decanting isopropanol into a waste container, the pellets are washed with 75% ethanol (prepared from DEPC-treated water). RNA pellets can be stored in 75% ethanol overnight at −20°C, or several days if necessary.
3. The 75% ethanol is discarded, and the tubes are air-dried for 15 min at room temperature. Then, 60 µL DEPC-treated water, which has been preheated to 65°C, is pipetted into each tube and agitated across the entire interior surface. The tubes are centrifuged at low speed to collect the resuspended RNA. This is transferred to 1.5 mL microfuge tubes and kept on ice, while an additional 60 µL of DEPC-treated water is added to the larger tubes, repeating agitation and centrifugation steps. The second aliquots are combined with those already in the microfuge tubes. Nucleic acid concentrations are estimated using absorbance at a wavelength of 260 nm in a spectrophotometer (see Note 3). RNA samples can be stored at −20°C for 1 month, or −80°C for at least 6 months.
4. Total RNA is size-fractionated by means of RNeasy Mini Spin columns (optional; see Note 4), roughly following the manufacturer’s “RNA cleanup” protocol. This improves sensitivity and resolution during polyacrylamide gel electrophoresis by removing higher molecular weight RNA and concentrating smRNA within the loaded samples (see Note 5). 80–100 µg total RNA is brought to a volume of 100 µL with DEPC-treated water, mixed with 350 µL RLT buffer (provided with columns), and then with 250 µL absolute ethanol. The mixtures are pipetted onto RNeasy Spin columns and centrifuged for 30 s (at 10,000 × g and room temperature). Flow-through fractions contain LMW RNA, as do the two subsequent washes with RPE buffer (supplied with columns).
5. Concentrated LMW RNA is recovered by combining the column flow-through and washes (~1.5 mL), and mixing this with an equal volume of isopropanol. This step is easily accomplished by dividing each combined sample into two microfuge tubes and adding 700 µL cold isopropanol to both. The tubes are inverted to mix and incubated for 2 h on ice (or, to maximize recovery, overnight at −20°C). LMW RNA is sedimented by centrifugation for 30 min (at 16,000 × g and 4°C).
Isopropanol is discarded and the pellets are washed with 75% ethanol in DEPC-treated water. Ethanol is discarded and the pellets are air-dried for 15 min. The residual liquid is evaporated by placing open microfuge tubes for 10 min in a 65°C thermoblock. LMW RNA can be resuspended in 30–60 µL of DEPC-treated water, with the final volume depending on the pellet size. Store LMW RNA at −20°C if not used immediately; it can be stored there for at least 1 month, or at −80°C for at least 6 months.
3.2. Polyacrylamide Gel Preparation
1. Stock solution for 18% polyacrylamide urea gels is prepared as follows: 42 g urea, 60 mL AccuGel 30% acrylamide (19:1) solution and 10 mL of 10× TBE are combined in a 250 mL flask for a final volume of ~100 mL. The solution is stirred until the urea dissolves completely, which is accelerated by placing it over low heat. It is then vacuum-filtered into a Millipore Express container and stored at room temperature. Caution: acrylamide monomer is a neurotoxin and a potential carcinogen. Wear gloves while handling acrylamide and clean any spillage thoroughly.
2. Gel plates are prepared by cleaning inner surfaces with ethanol and wiping away excess moisture. Plastic spacers are sandwiched between left and right edges of these plates, and the whole assembly is clamped together by binder clips. With the assembly standing upright in a plastic receptacle (e.g., a pipette tip box cover), 1% agarose is poured into the receptacle until it seeps 5–6 mm into the assembly from below. Then, 700 µL agarose is pipetted along both spacer edges. This agarose-sealed assembly is allowed to cool for 15 min before proceeding to pour a polyacrylamide gel.
3. For a single gel, 25 mL of acrylamide/urea stock solution is transferred into a small beaker. Then, 25 µL TEMED and 250 µL 10% APS are added in quick succession (a fresh APS aliquot is used for each gel). A 30 mL syringe (without a needle) is used to mix the solution, drawing it up and back into the beaker and then up again. Without hesitation, the liquid is steadily injected into the assembly from above until 3–4 mm space remains. The comb is delicately inserted, taking care to avoid trapping bubbles around the teeth. The comb can be removed and reinserted 2–3 times to exclude such bubbles. Numerous attempts should be avoided, however, since they cause distortions as polymerization advances. The gel typically solidifies within 15–30 min.
4. The gel plate assembly is locked into its vertical gel rig using screws or binder clips, depending on the apparatus used. Both upper and lower reservoirs are filled with 1× TBE buffer.
Once buffer submerges the gel top, the comb is removed by slowly pulling up to avoid distorting or tearing wells. The gel rig is connected to the power supply, which is set to ~15 W (this requires around 450–550 V). Small bubbles should be seen emerging from the electrodes. The gel is prerun in this manner for 30 min.
3.3. RNA Sample Preparation, Electrophoresis and Electroblotting
1. Samples to be loaded are thawed for 2 min at 65°C to fully resuspend RNA, and then they are placed on ice. Nucleic acid concentrations are estimated via absorbance at the 260 nm wavelength in a spectrophotometer. Equal amounts of RNA must be loaded in the next step: if any sample contains less than 8 µg total, then the smallest total mass will define the amount to be aliquoted from each sample.
2. Volumes for 8 µg aliquots of LMW RNA are calculated based on sample concentrations and each aliquot is transferred to a new microfuge tube; alternatively, total RNA can be used (see Note 4). In addition, 1 µL each of 21 nt and 24 nt RNA oligos (100 µM stocks) are mixed in a microfuge tube to serve as size standards. All samples and markers are completely dried using the Speed-Vac (medium heating), and resuspended in 8 µL of RNA loading buffer. The samples are incubated for 3 min at 95°C to minimise secondary structure folding, and placed on ice until loading.
3. After the 30 min gel prerun, the power supply is disconnected, and gel slots are thoroughly rinsed with 1× TBE using a needled syringe: residual acrylamide in wells can lead to uneven migration and must therefore be completely washed into the buffer reservoir. RNA is loaded using microcapillary pipette tips, slowly layering each sample at the bottom of its well, avoiding the generation of air bubbles (see Note 4).
4. The power supply is reconnected and gel electrophoresis is performed for 1–2 h at 450–550 V so as to maintain 15 W; this keeps the gel hot and enhances size resolution. The bromophenol blue marker will run off into the lower buffer reservoir; electrophoresis is only complete, however, when the xylene cyanol FF marker has migrated to 4 cm from the gel bottom. The power supply is disconnected, and then the plate assembly is removed and placed on a square of paper towel.
5. A thin metal spatula is used to wedge the plates apart. Whichever plate the gel adheres to is used as a support. Together, the gel and plate are transferred to a Pyrex dish containing 300 mL ethidium bromide stain (30 µL of 10 mg/mL ethidium bromide in 300 mL 1× TBE). Caution: ethidium bromide is an irritant, a mutagen, and a suspected carcinogen. Wear nitrile gloves for handling this chemical because latex gloves are too porous. Staining is conducted on an orbital shaker for 15–20 min.
The gel and plate are removed, draining excess stain back into the dish, and RNA migration is documented under UV transillumination. Strong 5S rRNA and tRNA bands should be visible in test sample lanes (see Fig. 3a). A nondistinct smear would indicate degradation of RNA samples. Migration of 21 and 24 nt standards in the size marker lane is important for subsequent comparison to blot hybridization results.
6. Remaining polyacrylamide well dividers and the agarose gel bottom are cut away using a razor blade. One square of a nylon membrane and two identically sized pieces of 3 MW paper (~12 × 14 cm) are cut to fit the gel. The membrane’s upper edge is labeled in pencil to indicate the experiment name and sample order, and to identify the side to which RNA will be blotted. All blot components are briefly soaked in 1× TBE. After removing the top electrode of the semidry transfer cell (here, the Bio-Rad Trans-Blot system), one 3 MW square is laid on the bottom electrode and doused with 1× TBE. Air bubbles are smoothed out by rolling a serological pipette (broken to fit) across the 3 MW paper surface. The membrane is carefully laid atop the paper, followed by the gel and a second square of 3 MW paper, smoothing out air bubbles between steps. Finally, the top electrode is locked in place, and electroblotting is carried out for 3 h at 10 V. Warning: orientation of the membrane-gel stack depends on the specific apparatus. Consult the manufacturer’s instructions to avoid RNA loss from transfer in the wrong direction.

[Fig. 3 panels. (a) Plant samples A.t., A.s., B.o., N.b., S.l., Z.m., O.s. and P.p. probed for miR160, miR165/6, miR393, miR824 and siR1003, with 21 and 24 nt positions marked; U6 snRNA and ethidium bromide staining shown as loading controls. (b) D.m. embryos and M.m. liver probed for let-7, miR-122 and miR-124, with 21 and 24 nt positions marked; U6 snRNA shown as a loading control. Abbreviations: A.t., Arabidopsis thaliana; A.s., Arabidopsis suecica; B.o., Brassica oleracea; N.b., Nicotiana benthamiana; S.l., Solanum lycopersicum; Z.m., Zea mays; O.s., Oryza sativa; P.p., Physcomitrella patens; D.m., Drosophila melanogaster; M.m., Mus musculus.]
Fig. 3. Blot hybridization analysis of smRNA isolated from different model organisms. (a) Low molecular weight (LMW) RNA was isolated from a panel of plant species: these included three members of the Brassicaceae family (Arabidopsis thaliana, Arabidopsis suecica and Brassica oleracea), two Solanaceous plants (Nicotiana benthamiana and Solanum lycopersicum), two monocots (Zea mays and Oryza sativa) and the moss Physcomitrella patens. miR160 is expressed from an evolutionarily ancient MIR gene family conserved from Brassicaceae to moss (23). In contrast, miR824 is Brassicaceae-specific (25), and the heterochromatic siR1003 is only detectable in Arabidopsis species. (b) LMW RNA was isolated from fruit fly (Drosophila melanogaster, 12–17 h embryos) and adult mouse (Mus musculus, liver). let-7 is a prototypical animal miRNA, evolutionarily conserved across much of that kingdom. let-7 is not yet expressed at the embryo stage in Drosophila, but is abundant in mouse liver (46). miR-122 is not conserved in Drosophila but is expressed in vertebrate liver. Finally, miR-124 expression peaks at the Drosophila 12–17 h embryo stage but is not encoded in mammalian genomes (47). U6 snRNA detection and ethidium bromide staining serve as loading controls
7. While still damp, the membrane is UV crosslinked with 140 mJ of energy (see Note 6). Over-crosslinking should be avoided, because this leads to decreased hybridization efficiency (33, 39). The membrane can now be stored in a plastic sleeve at room temperature until use (or between hybridizations).
3.4. Probe Preparation and Hybridization
1. The membrane is slid into a hybridization tube (RNA-side facing the interior), and 7–10 mL of PerfectHyb Plus buffer is added. Prehybridization is carried out for 2–6 h at 35°C.
2. A DNA oligo (the reverse complement of the smRNA to be detected) is resuspended in bidistilled water to a stock concentration of 100 µM. Table 1 lists sequences of probes used to detect smRNA species in Figs. 2–4 (see Note 7). The oligo end-labeling reaction is assembled in a microfuge tube as follows:
(a) 12 µL bidistilled water
(b) 2 µL kinase buffer (10×, provided with PNK)
(c) 0.2 µL oligo (i.e., 20 pmol)
(d) 1 µL polynucleotide kinase (PNK, 10 U/µL)
This partial reaction is mixed by tapping the tube gently.
3. Caution: Radiation protection measures must be taken for probe preparation, hybridization, detection, and stripping steps. Wear gloves and a lab coat, and monitor the work area regularly for contamination. In a properly shielded radioisotope work area, 5 µL of [γ-32P]ATP (3,000 or 6,000 Ci/mmol; 10 mCi/mL) is added to the partial reaction (see Note 8), pipetting up and down to mix. The complete mixture is incubated for 30 min at 37°C.
4. A Performa DTR gel filtration cartridge (or MicroSpin G-25 Column) is placed in a microfuge tube and centrifuged for 2 min at 850 × g (~3,000 rpm in microcentrifuges). The cartridge is transferred to a new microfuge tube and the entire end-labeling reaction mixture is pipetted onto the packed matrix. The cartridge and tube are centrifuged for 2 min at 850 × g. Unincorporated 32P is retained with Adenosine-5′-triphosphate in the matrix, while both labeled and unlabeled oligos pass into the eluate. A quick verification of 32P-incorporation can be made using a Geiger-Müller counter: a count per min reading with the detector pointed at the cartridge alone should be less than or equal to the reading when pointed at the eluate (distances held constant).
The cartridge is then disposed of in a solid radioactive waste container and the eluate (i.e., the probe) is retained.
5. An eluted end-labeling reaction (20–30 µL) contains sufficient probe for hybridization with one to four membranes.
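[Fig. 4 panels. (a) Northern blots of uninfected WT (−) and CaLCuV-infected (+) WT, dcl2, dcl3 and dcl4 plants, probed for viral region 1 (sense and antisense), viral region 2 (sense and antisense) and miR173, with 21 and 24 nt positions marked; U6 snRNA and ethidium bromide staining shown as loading controls. (b) Model: CaLCuV DNA gives rise to overlapping viral RNA transcripts and dsRNAs, which are processed by DCL2, DCL3 and DCL4 into smRNAs of 22 nt, ~24 nt and 21 nt, respectively, leading to downstream silencing effects.]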
Fig. 4. Functions of Arabidopsis Dicer-like (DCL) proteins in viral smRNA biogenesis. (a) Wild-type (WT), dcl2, dcl3 and dcl4 mutant plants were inoculated (+) with a DNA virus, Cabbage Leaf Curl Virus (CaLCuV). An uninfected pool of WT plants (−) was used as a negative control. RNA from these samples was analyzed by northern blot hybridization, using four DNA oligonucleotide probes to detect viral smRNAs from two regions of the viral genome. Three size-classes of viral species (21, 22, and 24 nt in length) accumulated in the infected WT sample and were detected in both sense and antisense polarities. In contrast, each individual dcl-mutant showed deficiency for the accumulation of a specific size-class of viral smRNA: dcl2 for 22 nt smRNA, dcl3 for 24 nt smRNA, and dcl4 for 21 nt smRNA. Biogenesis of miR173 is known to require DCL1, but does not require DCLs mutated in these lines; it was thus included as a positive control. U6 snRNA detection and ethidium bromide staining serve as loading controls. (Reproduced from Blevins et al. (36) with permission from Oxford University Press.) (b) This data supports a model wherein three Arabidopsis DCL proteins each process viral dsRNA into a distinct size-class of smRNA. Double-stranded RNA substrates for DCL processing appear to be overlapping viral transcripts in this particular system
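For the end-labeling reaction assembled in Subheading 3.4 (steps 2 and 3), it can be useful to compare the molar amounts of oligo and [γ-32P]ATP. The Python sketch below does this from the volumes and concentrations given in the protocol, with all volumes in microliters; the function names are illustrative, and radioactive decay since the reference date of the isotope is ignored, so the ATP amounts are nominal.

```python
def oligo_pmol(volume_ul, stock_um):
    """pmol of oligo in a given volume (1 uM equals 1 pmol/uL)."""
    return volume_ul * stock_um


def atp_pmol(volume_ul, mci_per_ml, ci_per_mmol):
    """pmol of ATP from radioactive concentration and specific activity."""
    microcuries = volume_ul * mci_per_ml        # mCi/mL equals uCi/uL
    nmol = microcuries / ci_per_mmol            # uCi / (Ci/mmol) = 1e-6 mmol = nmol
    return nmol * 1000.0                        # nmol to pmol


oligo = oligo_pmol(0.2, 100)                    # 0.2 uL of a 100 uM stock
print(f"oligo: {oligo:.1f} pmol")
for specific_activity in (3000, 6000):          # Ci/mmol, as listed in the protocol
    atp = atp_pmol(5, 10, specific_activity)    # 5 uL at 10 mCi/mL
    print(f"ATP at {specific_activity} Ci/mmol: {atp:.1f} pmol "
          f"(oligo:ATP ratio {oligo / atp:.1f})")
```

Under these nominal values the oligo is in molar excess over ATP at either specific activity, which is consistent with the statement that both labeled and unlabeled oligos pass into the eluate.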
To facilitate its transfer to hybridization tubes, 20 µL of bidistilled water is added to the eluted probe for each additional membrane. Then, 20 µL of probe is added to each hybridization tube; concentrated probe droplets should land in the prehybridization buffer, rather than directly on the membrane. Hybridization is performed at 35°C for 10–18 h (or at 50°C for higher specificity; see Fig. 2 and Note 9).
3.5. Washing, Detection, Stripping, and Reprobing
1. The membrane is washed in the hybridization tube three times with 2× SSC, 0.5% SDS for 30 min at 35°C (or 50°C for higher stringency; see Note 9). Each time, the contents of the hybridization tube are carefully poured off into liquid radioactive waste, and 15–20 mL of wash buffer is added. Then, the membrane is removed from the tube using forceps, allowing excess wash buffer to drip back into the tube.
Northern Blotting Techniques for Small RNAs
2. The membrane is placed onto a rectangle of plastic wrap just over twice its size. The excess plastic is folded over and wrinkles are smoothed out. The plastic-sealed membrane is taped into the phosphorimager cassette, and a cleared detection screen is placed on top. The screen is removed after 1–3 h, or up to 2 days later, and scanned. Exposure duration must be optimized for the particular smRNA and probe activity. Extremely weak signals may require detection by exposure to Kodak MR film for 2–7 days at −80°C (see Note 10). Figure 3 documents hybridization results for a panel of plant and animal species using different miRNA probes; variation in signal intensity for individual miRNAs reflects their species- and tissue-specific expression patterns.
3. Before hybridization with a new probe, the membrane is removed from the plastic wrap, placed in a Pyrex dish on an orbital shaker, and stripped with 0.1% SDS previously heated to 85°C. The stripping step is complete once the solution returns to room temperature or after ~30 min. Residual radioactivity on the membrane should be checked using a Geiger-Müller counter or by film exposure overnight at −80°C. If significant signal is detectable in the size-range 20–30 nt, then a second stripping with 0.1% SDS needs to be performed. Caution: Used stripping solution contains the probe and must be disposed of in radioactive waste.
4. After stripping, the membrane is rinsed for 5 min with 2× SSC at room temperature to remove excess SDS, transferred to a hybridization tube, and Subheadings 3.4 and 3.5 are repeated. In addition to probing for endogenous or viral smRNAs, hybridization with probes for the highly conserved U6 small nuclear RNA (snRNA) is generally performed. Because these species (102–108 nt) are produced independently of RNA silencing pathways, they serve as an RNA loading control.
4. Notes
1. RNase-free water: Dimethylpyrocarbonate (DMPC) is a suitable replacement for DEPC and is thought to be less carcinogenic; use the same procedure as for DEPC.
2. Tissue homogenization: Different tissues may require alternative homogenization techniques before or during TRI Reagent extraction. Although plant leaf, Drosophila embryo, and mouse liver samples were sufficiently homogenized by grinding in liquid nitrogen, some tissue types benefit from passage through a 15 mL dounce homogenizer after suspension in TRI Reagent but before chloroform addition. This procedure is performed on ice to reduce RNA degradation prior to TRI Reagent penetration of tissue fragments.
3. Spectrophotometric measurements: Absorbance at a wavelength of 260 nm is used to estimate RNA concentration as follows: c [mg/mL] = (OD260 × d × 40)/1000, where d is the fold dilution with respect to the original RNA sample. Nucleic acid purity can be roughly assessed using the OD260/OD280 ratio, which is 1.8–2.0 for good RNA preps. Ratios below 1.7 indicate poor sample quality (contamination by protein and/or other impurities). Such samples often require additional phenol:chloroform extraction and isopropanol precipitation before proceeding to the northern blot.
4. Loading RNA: Total RNA can be loaded directly onto 18% polyacrylamide gels, avoiding the need for size-fractionation, although gel resolution may suffer as a result. About 5–10 µg is adequate for the detection of high titer smRNAs (e.g., many miRNAs). To detect low titer smRNAs, load 20–30 µg total RNA. Such large amounts of RNA, or low purity samples, become viscous when resuspended in loading buffer. Cutting 3–5 mm off the microcapillary pipette tip with a razor will facilitate loading these samples.
5. An alternative size-fractionation method: Polyethylene glycol (PEG) precipitation, described by Hamilton and Baulcombe (7) and modified in Vazquez et al. (40), is more scalable than the column-based method and produces similar results. Prepare a solution of 20% PEG8000 (Promega, Madison, WI, USA) and 3 M sodium chloride in bidistilled water, and treat it with DEPC. Approximately 200 µL total RNA and 200 µL of 20% PEG/3 M NaCl are mixed gently, incubated on ice for 20–30 min, and then centrifuged for 10 min (at 12,000 × g and 4°C). This results in selective precipitation of high molecular weight RNA. Transfer the supernatant (containing enriched smRNA) to new microfuge tubes, add 3 volumes of cold ethanol, incubate for 1–2 h at −80°C, and centrifuge for 20 min (at 14,000 × g and 4°C).
Wash pellet with 70% ethanol, air-dry pellet, and resuspend in DEPC-treated water.
6. An improved RNA crosslinking method: Pall and Hamilton (39) found that carbodiimide-mediated chemical crosslinking enhances smRNA detection by up to 50-fold over the standard UV crosslinking method. To use this alternative procedure, PAGE should be performed using MOPS–NaOH (pH 7) buffer rather than TBE.
7. Alternative radioactive labeling methods:
(a) The mirVana Probe Construction Kit (Ambion) uses T7 polymerase transcription of oligo templates in the presence of [α-32P]CTP; see, e.g., Onodera et al. (41) and Pontes et al. (42). This method will aid the detection of low titer smRNAs by incorporating multiple radiolabeled phosphates into each probe molecule.
(b) Single-stranded RNA probes can be generated by in vitro transcription (1) of linearized plasmid templates in the presence of [α-32P]UTP or [α-32P]CTP. These transcripts are hydrolyzed to an average of 50 nt before use, following Hamilton and Baulcombe (7). This method is best suited for detecting smRNAs from 100 to 1,000 bp regions, as opposed to individual smRNAs already characterized by sequencing.
8. Specific activity: For end-labeled probes, γ-32P with 6,000 Ci/mmol is preferable. Each labeled oligo incorporates only a single radioactive atom, so higher specific activity γ-32P maximizes the activity per molecule. Additionally, probes for low titer smRNAs should be synthesized and used immediately after γ-32P arrives from the supplier.
9. Hybridization time and temperature: If fast turnover times are required for successive probing, hybridization can be shortened to 6 h for conventional DNA oligo probes, or 2 h for locked nucleic acid probes (43) (see Note 11). Using the protocol described in this chapter, hybridization and washing at 35°C yielded the strongest signal from oligo standards but did not distinguish 2–3 nt variants thereof, whereas hybridization and washing at 50°C improved probe specificity but reduced signal strength somewhat (see Fig. 2).
10. Phosphorimaging versus film: All data shown in this chapter were collected using exposures to a phosphorimager screen for 1–48 h. However, very low titer smRNAs may require up to a 7-day film exposure using conventional DNA oligo probes. Locked nucleic acid probes improve hybridization sensitivity and reduce the time necessary for the overall protocol (43), which could eliminate the need for lengthy film exposures (see Note 11).
11. Locked nucleic acid (LNA) oligo probes: LNA oligo probes contain high-affinity RNA analogues (e.g., at every third nt position) that possess modified ribose moieties. Hybridization with LNA-modified oligos enhances detection sensitivity and specificity. Applications were demonstrated for northern blot detection of miRNAs (43) and heterochromatic siRNAs (44), amongst others (45).
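The arithmetic in Note 3 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the "unclassified" label for ratios that Note 3 does not explicitly cover (1.7 up to 1.8, or above 2.0) is an assumption made here, not part of the protocol.

```python
def rna_concentration_mg_per_ml(od260: float, dilution_fold: float) -> float:
    """Note 3: c [mg/mL] = (OD260 x d x 40) / 1000, with d the fold dilution."""
    return od260 * dilution_fold * 40.0 / 1000.0


def purity_flag(od260: float, od280: float) -> str:
    """Rough purity call from the OD260/OD280 ratio, using the Note 3 thresholds."""
    ratio = od260 / od280
    if ratio < 1.7:
        return "poor"          # Note 3: protein and/or other impurities likely
    if 1.8 <= ratio <= 2.0:
        return "good"          # Note 3: typical of good RNA preps
    return "unclassified"      # assumption: range not addressed in Note 3


if __name__ == "__main__":
    # Hypothetical reading: OD260 = 0.25 measured on a 100-fold dilution.
    print(rna_concentration_mg_per_ml(0.25, 100))
    print(purity_flag(0.25, 0.13))
```

Because 1 mg/mL equals 1 µg/µL, the returned value can be used directly to work out the volume needed for the 5–30 µg loads mentioned in Note 4.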
Acknowledgments
Many thanks to Azeddine Si-Ammour and Hanspeter Schöb for refining techniques described here, and to Frederick Meins, Jr., and Craig Pikaard for providing support and facilities for experiments shown in this chapter. Thanks to Mikhail Pooggin, Thomas Hohn, and Dominique Robertson for generating materials and ideas behind the viral experiments. Franck Vazquez, Mikhail Pooggin, and Andrzej Wierzbicki provided critical comments on the manuscript. Mike Dyer cared for leafy plants, while Pierre-François Perroud provided moss tissue. Kathryn Huisinga supplied Drosophila embryos. Tatiana Simon and Luciano Marpegan provided mouse liver. This work was supported by a Friedrich Miescher Institute student fellowship, and postdoctoral fellowships from the Swiss National Foundation and Novartis Foundation.
References
1. Sambrook J, Russell DW (2001) Molecular cloning. A laboratory manual, 3rd edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor
2. Southern EM (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98:503–517
3. Southern E (2006) Southern blotting. Nat Protoc 1:518–525
4. Alwine JC, Kemp DJ, Stark GR (1977) Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes. Proc Natl Acad Sci USA 74:5350–5354
5. Thomas PS (1980) Hybridization of denatured RNA and small DNA fragments transferred to nitrocellulose. Proc Natl Acad Sci USA 77:5201–5205
6. Brown T, Mackey K, Du T (2004) Analysis of RNA by northern and slot blot hybridization. Curr Protoc Mol Biol Chapter 4:Unit 4.9
7. Hamilton AJ, Baulcombe DC (1999) A species of small antisense RNA in posttranscriptional gene silencing in plants. Science 286:950–952
8. Hutvagner G, Mlynarova L, Nap JP (2000) Detailed characterization of the posttranscriptional gene-silencing-related small RNA in a GUS gene-silenced tobacco. RNA 6:1445–1454
9. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16:1616–1626
10. Lau NC, Lim LP, Weinstein EG, Bartel DP (2001) An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294:858–862
11. Llave C, Kasschau KD, Rector MA, Carrington JC (2002) Endogenous and silencing-associated small RNAs in plants. Plant Cell 14:1605–1619
12. Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T (2001) Identification of novel genes coding for small expressed RNAs. Science 294:853–858
13. Pfeffer S, Zavolan M, Grasser FA, Chien M, Russo JJ, Ju J et al (2004) Identification of virus-encoded microRNAs. Science 304:734–736
14. Parker JS, Barford D (2006) Argonaute: a scaffold for the function of short regulatory RNAs. Trends Biochem Sci 31:622–630
15. Hutvagner G, Simard MJ (2008) Argonaute proteins: key players in RNA silencing. Nat Rev Mol Cell Biol 9:22–32
16. Vaucheret H (2008) Plant ARGONAUTE. Trends Plant Sci 13:350–358
17. Meins F Jr, Si-Ammour A, Blevins T (2005) RNA silencing systems and their relevance to plant development. Annu Rev Cell Dev Biol 21:297–318
18. Chapman EJ, Carrington JC (2007) Specialization and evolution of endogenous small RNA pathways. Nat Rev Genet 8:884–896
19. Baulcombe D (2004) RNA silencing in plants. Nature 431:356–363
20. Grosshans H, Slack FJ (2002) Micro-RNAs: small is plentiful. J Cell Biol 156:17–21
21. Vazquez F (2006) Arabidopsis endogenous small RNAs: highways and byways. Trends Plant Sci 11:460–468
22. Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297
23. Axtell MJ (2008) Evolution of microRNAs and their targets: are all microRNAs biologically relevant? Biochim Biophys Acta 1779:725–734
24. Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53
25. Kutter C, Schob H, Stadler M, Meins F Jr, Si-Ammour A (2007) MicroRNA-mediated regulation of stomatal development in Arabidopsis. Plant Cell 19:2417–2429
26. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP (2006) A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev 20:3407–3425
27. Mette MF, Aufsatz W, van der Winden J, Matzke MA, Matzke AJ (2000) Transcriptional silencing and promoter methylation triggered by double-stranded RNA. EMBO J 19:5194–5201
28. Matzke MA, Birchler JA (2005) RNAi-mediated pathways in the nucleus. Nat Rev Genet 6:24–35
29. Pikaard CS (2006) Cell biology of the Arabidopsis nuclear siRNA pathway for RNA-directed chromatin modification. Cold Spring Harb Symp Quant Biol 71:473–480
30. Henderson IR, Jacobsen SE (2007) Epigenetic inheritance in plants. Nature 447:418–424
31. Meins F Jr (1996) Epigenetic modifications and gene silencing in plants. In: Russo V, Martienssen R, Riggs A (eds) Epigenetic mechanisms of gene regulation. Cold Spring Harbor Press, Cold Spring Harbor, NY, pp 415–442
32. Pikaard CS (2000) The epigenetics of nucleolar dominance. Trends Genet 16:495–500
33. Reed KC, Mann DA (1985) Rapid transfer of DNA from agarose gels to nylon membranes. Nucleic Acids Res 13:7207–7221
34. Alonso JM, Stepanova AN, Leisse TJ, Kim CJ, Chen H, Shinn P et al (2003) Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301:653–657
35. Sessions A, Burke E, Presting G, Aux G, McElver J, Patton D et al (2002) A high-throughput Arabidopsis reverse genetics system. Plant Cell 14:2985–2994
36. Blevins T, Rajeswaran R, Shivaprasad PV, Beknazariants D, Si-Ammour A, Park HS et al (2006) Four plant Dicers mediate viral small RNA biogenesis and DNA virus induced silencing. Nucleic Acids Res 34:6233–6246
37. Chomczynski P, Sacchi N (1987) Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 162:156–159
38. Chomczynski P, Sacchi N (2006) The single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction: twenty-something years on. Nat Protoc 1:581–585
39. Pall GS, Hamilton AJ (2008) Improved northern blot method for enhanced detection of small RNA. Nat Protoc 3:1077–1084
40. Vazquez F, Gasciolli V, Crete P, Vaucheret H (2004) The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14:346–351
41. Onodera Y, Haag JR, Ream T, Nunes PC, Pontes O, Pikaard CS (2005) Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120:613–622
42. Pontes O, Li CF, Nunes PC, Haag J, Ream T, Vitins A et al (2006) The Arabidopsis chromatin-modifying nuclear siRNA pathway involves a nucleolar RNA processing center. Cell 126:79–92
43. Varallyay E, Burgyan J, Havelda Z (2008) MicroRNA detection by northern blotting using locked nucleic acid probes. Nat Protoc 3:190–196
44. Henderson IR, Jacobsen SE (2008) Tandem repeats upstream of the Arabidopsis endogene SDC recruit non-CG DNA methylation and initiate siRNA spreading. Genes Dev 22:1597–1606
45. Castoldi M, Schmidt S, Benes V, Noerholm M, Kulozik AE, Hentze MW et al (2006) A sensitive array for microRNA expression profiling (miChip) based on locked nucleic acids (LNA). RNA 12:913–920
46. Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, Maller B et al (2000) Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408:86–89
47. Aravin AA, Lagos-Quintana M, Yalcin A, Zavolan M, Marks D, Snyder B et al (2003) The small RNA profile during Drosophila melanogaster development. Dev Cell 5:337–350
Chapter 10
qRT-PCR of Small RNAs
Erika Varkonyi-Gasic and Roger P. Hellens
Abstract
Plant small RNAs are a class of 19- to 25-nucleotide (nt) RNA molecules that are essential for genome stability, development and differentiation, disease, cellular communication, signaling, and adaptive responses to biotic and abiotic stress. Small RNAs comprise two major RNA classes, short interfering RNAs (siRNAs) and microRNAs (miRNAs). Efficient and reliable detection and quantification of small RNA expression has become an essential step in understanding their roles in specific cells and tissues. Here we provide protocols for the detection of miRNAs by stem-loop RT-PCR. This method enables fast and reliable miRNA expression profiling from as little as 20 pg of total RNA extracted from plant tissue and is suitable for high-throughput miRNA expression analysis. In addition, this method can be used to detect other classes of small RNAs, provided the sequence is known and their GC contents are similar to those specific for miRNAs.
Key words: Small RNA, miRNA, RT, Stem-loop RT, qPCR, SYBR Green I assay, UPL probe assay
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_10, © Springer Science + Business Media, LLC 2010
1. Introduction
Small RNAs are 19- to 25-nucleotide-long noncoding RNA molecules that include short interfering RNAs (siRNAs), implicated in posttranscriptional and transcriptional gene silencing (1), and microRNAs (miRNAs), implicated in processes ranging from developmental patterning to stress responses (2–5). While siRNAs arise from long double-stranded RNA precursors, miRNAs are derived from larger precursors with a characteristic hairpin secondary structure. Similar to siRNAs that target perfect complementary sequences, plant miRNAs repress gene expression by acting on near-perfect complementary sequences in target mRNAs to guide cleavage and translational repression (6–9), or on DNA to guide chromatin modification (10). The majority of plant miRNA targets are developmentally important transcription factors (11, 12) and stress-regulated genes (13, 14).
Through miRNA action, these targets are either eliminated completely during cell-fate changes (12, 15, 16), or are reduced to appropriate levels of expression in tissues where both the miRNA and the target mRNA are co-expressed (17, 18). In addition, a possible long-distance signaling role was proposed for some miRNAs (19, 20), in contrast to miRNAs with demonstrated cell-autonomous expression and effects (21, 22). This complexity in miRNA modes of action demonstrates that reliable detection and quantification of miRNA expression in specific tissues is an essential first step toward a better understanding of miRNA-mediated gene regulation. Although miRNAs represent a relatively abundant class of transcripts, their expression levels can vary dramatically between cells and tissues, and they often escape detection by conventional technologies such as cloning, northern hybridization, and microarray analysis because of low abundance combined with the high complexity of the small RNA population in plants (11, 23). The high sensitivity and specificity of reverse transcription-polymerase chain reaction (RT-PCR) make it a superior detection and quantification method compared with these conventional technologies. Stem-loop reverse transcription primers were shown to provide better specificity and sensitivity than linear primers (24), and a pulsed reverse transcription (RT) reaction further increases the sensitivity of miRNA detection (25). These features were utilized to derive a two-step miRNA detection method. First, the stem-loop RT primer is hybridized to the miRNA molecule and then reverse transcribed in a pulsed RT reaction. Next, the RT product is amplified using a miRNA-specific forward primer and the universal reverse primer. The product can be visualized by gel electrophoresis after a set number of PCR cycles or monitored in real time using a SYBR Green I assay or a UPL probe assay that involves a dual-labeled hydrolysis probe to increase specificity (Fig. 1). In addition to expression analysis of endogenous miRNAs, this method is amenable to the detection and quantification of other small RNAs, including artificial miRNAs and synthetic siRNAs.
2. Materials
2.1. Plant Material
1. Plant tissue collected into liquid nitrogen and handled according to standard practices to prevent degradation of RNA.
2.2. Isolation and Gel-Electrophoresis of RNA
1. TRIzol reagent for isolation of total RNA (Invitrogen, Carlsbad, CA) (see Note 1).
2. Solutions listed in the TRIzol protocol: chloroform, isopropanol, 75% ethanol, water to resuspend the RNA pellet (see Note 2).
[Figure 1. Schematic of stem-loop RT-qPCR for miR156 (5′-UGACAGAAGAGAGUGAGCAC-3′), steps 1–10: annealing of the stem-loop RT primer and pulsed RT; PCR with the forward primer (5′-GCGGCGGTGACAGAAGAGAGT-3′) and the universal reverse primer (5′-GTGCAGGGTCCGAGGT-3′); detection by SYBR Green I or by UPL probe #21 (5′-tggctctg-3′) carrying a fluorophore (F) and a quencher (Q), whose fluorescent signal is suppressed by the quenching label until the probe is cleaved.]
Fig. 1. Schematic showing the primer design and RT-qPCR process using the example of miR156. A stem-loop RT primer binds to the 3′ portion of the miRNA, initiating reverse transcription. Then, the RT product is amplified using a miRNA-specific forward primer and the universal reverse primer. Quantification is achieved either through SYBR Green I incorporation during amplification, or by the fluorescence generated upon cleavage of the UPL probe. Sequences related to miR156 are presented in grey. Sequences related to UPL probe #21 are in lower case. (1) Annealing, (2) Pulsed RT, (3) Denaturation, (4) Annealing, (5) Extension, (6) Denaturation, (7) Annealing, (8) Extension, (9) Hybridisation, (10) Cleavage.
3. 1% agarose gel containing formaldehyde, prepared from a 37% (12.3 M) formaldehyde stock. CAUTION: Formaldehyde is toxic through skin contact and inhalation of vapours. Manipulations involving formaldehyde should be done in a chemical fume hood.
4. 10× MOPS buffer: 0.4 M MOPS, pH 7.0, 0.1 M sodium acetate, 0.01 M EDTA.
5. Formaldehyde Load Dye (Ambion, Austin, TX).
6. Ethidium bromide, to a final concentration of 10 µg/ml. CAUTION: Ethidium bromide is a strong mutagen and should be handled with extreme care.
7. Molecular weight markers, e.g. 0.5–10 kb RNA Ladder (Invitrogen).
2.3. Stem-Loop Pulsed Reverse Transcription
1. Stem-loop RT primers. Prepare 100 µM stocks for long-term storage and 1 µM dilutions for immediate use.
2. 10 mM dNTP mix. Prepare by mixing dATP, dCTP, dGTP, and dTTP stock solutions, aliquot, and store at −20°C.
3. Reverse transcriptase, e.g. SuperScript III RT, 200 units/µl, which is supplied with the First-Strand buffer for cDNA synthesis and 0.1 M DTT (Invitrogen).
4. RNase inhibitor such as RNaseOUT, 40 units/µl (Invitrogen).
5. Nuclease-free water, e.g. UltraPure DEPC-treated Water (Invitrogen).
2.4. qPCR
2.4.1. miRNA SYBR Green I Assay
1. LightCycler FastStart SYBR Green I master mix (Roche Diagnostics, Mannheim, Germany), prepared according to manufacturer’s instructions.
2. Universal reverse primer. Prepare 100 µM stock for long-term storage and 10 µM dilution for immediate use.
3. Forward miRNA-specific primer. Prepare 100 µM stock for long-term storage and 10 µM dilution for immediate use.
4. 10 mM dNTP mix as above.
5. Nuclease-free water.
2.4.2. miRNA UPL Probe Assay
1. LightCycler TaqMan master mix (Roche Diagnostics), prepared according to manufacturer’s instructions.
2. UPL probe #21, prepared as 10 µM stock (Roche Diagnostics).
3. Universal reverse primer. Prepare 100 µM stock for long-term storage and 10 µM dilution for immediate use.
4. Forward miRNA-specific primer. Prepare 100 µM stock for long-term storage and 10 µM dilution for immediate use.
5. 10 mM dNTP mix as above.
6. Nuclease-free water.
2.5. Equipment
1. Standard laboratory equipment for isolation of RNA (fume hood, centrifuge, tubes, pipettes, and tips).
2. A spectrophotometer for quantification of RNA, e.g. NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE) (see Note 3).
3. Standard gel electrophoresis equipment (casting trays, gel tanks, power supply, UV transilluminator).
4. A thermal cycler for pulsed reverse transcription. Our reverse transcription reactions and end-point PCR analyses were performed on the Mastercycler (Eppendorf, Hamburg, Germany).
5. A real-time thermal cycler for qPCRs. All our real-time PCR analyses were performed on the LightCycler 1.5 (Roche Diagnostics).
3. Methods
3.1. Primer Design
The primers are designed according to Chen et al. (24) with some modifications (26) (Fig. 1). The stem-loop RT primers have a universal backbone and a specific extension. The universal backbone sequence is as follows:
5′-GTTGGCTCTGGTGCAGGGTCCGAGGTATTCGCACcagagccaAC-3′.
This backbone sequence can form a stem-loop structure because of the complementarity between the nucleotides at the 5′ and 3′ ends; it includes the reverse complement of the UPL probe #21 (in lower case) and the universal reverse primer site in the loop region (GTGCAGGGTCCGAGGT). The specificity of a stem-loop RT primer to an individual miRNA is conferred by a six-nucleotide extension at the 3′ end; this extension is the reverse complement of the last six nucleotides at the 3′ end of the miRNA. In the miR156 example, the miRNA sequence is as follows (the last six nucleotides are GAGCAC):
5′-UGACAGAAGAGAGUGAGCAC-3′.
Thus, the miR156 stem-loop RT primer sequence is as follows (the last six nucleotides, GTGCTC, provide specificity):
5′-GTTGGCTCTGGTGCAGGGTCCGAGGTATTCGCACcagagccaACGTGCTC-3′.
Forward primers are specific to the miRNA sequence but exclude the last six nucleotides at the 3′ end of the miRNA. A 5′ extension of 5–7 nucleotides is added to each forward primer to increase the length and the melting temperature; these sequences were chosen randomly and are relatively GC-rich, bringing the GC content of the forward primer to 50–60%. In the miR156 example, the forward primer sequence is as follows (the GC-rich 5′ extension is GCGGCGG):
5′-GCGGCGGTGACAGAAGAGAGT-3′.
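The design rules above can be expressed as a short Python sketch. The function names are hypothetical, and the default 5′ extension (GCGGCGG) is simply the one used in the miR156 example; the text states that extensions are chosen per primer, so treat that default as an assumption to be adjusted for other miRNAs.

```python
# Universal stem-loop backbone from Subheading 3.1, written 5'->3' in upper case.
STEM_LOOP_BACKBONE = "GTTGGCTCTGGTGCAGGGTCCGAGGTATTCGCACCAGAGCCAAC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(mirna: str) -> str:
    """Convert an RNA sequence (5'->3') to its DNA alphabet equivalent."""
    return mirna.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def stem_loop_rt_primer(mirna: str, n_specific: int = 6) -> str:
    """Backbone plus the reverse complement of the miRNA's last n_specific nt."""
    return STEM_LOOP_BACKBONE + reverse_complement(to_dna(mirna)[-n_specific:])


def forward_primer(mirna: str, extension: str = "GCGGCGG", n_specific: int = 6) -> str:
    """GC-rich 5' extension plus the miRNA sequence minus its last n_specific nt."""
    return extension + to_dna(mirna)[:-n_specific]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    mir156 = "UGACAGAAGAGAGUGAGCAC"
    print(stem_loop_rt_primer(mir156))
    print(forward_primer(mir156))
    print(round(gc_fraction(forward_primer(mir156)), 3))
```

Running the sketch on miR156 reproduces the two primer sequences given in the text, which is a convenient sanity check before applying it to other miRNAs. The GC fraction helper can be used to choose an extension that brings the forward primer close to the 50–60% range.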
3.2. Isolation and Gel-Electrophoresis of RNA
We provide an example of a method for isolation, quantification, and evaluation of RNA. Other methods may be used (see Notes 1–3).
1. Isolate RNA from the plant tissue snap-frozen in liquid nitrogen using the TRIzol reagent, according to manufacturer’s instructions.
2. Determine concentration by spectrophotometric analysis. Use an aliquot (200 ng–1 µg) to assess quality by gel electrophoresis. Store the remaining RNA on ice or at −20°C.
3. Determine RNA quality by gel electrophoresis. Prepare the gel by heating 1 g agarose in 72 ml water until dissolved, and then cool slightly. Add 10 ml 10× MOPS running buffer and mix. Add 18 ml 37% formaldehyde (12.3 M). If required, top up with water to 100 ml. Pour the gel and wait until set.
4. Assemble the gel in the tank. Add 1× MOPS running buffer to cover the gel by a few millimetres.
5. Prepare the RNA sample by adding 3 volumes of Formaldehyde Load Dye to 200 ng–1 µg RNA. Add ethidium bromide to the Formaldehyde Load Dye at a final concentration of 10 µg/ml.
6. Prepare the molecular weight marker in the same manner.
7. Heat denature samples at 65°C for 5–15 min. Load the gel and electrophorese at 5–6 V/cm.
8. Stop the run when the bromophenol blue dye has migrated as far as 70% of the length of the gel.
9. Visualize the RNA on a UV transilluminator. High quality RNA will have clearly visible rRNA bands.
10. Adjust RNA concentration with nuclease-free water to 20 ng/µl.
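Two small calculations are implied by the steps above: the volume of water needed to bring an RNA stock to the working concentration in step 10, and the final formaldehyde concentration of the gel in step 3. The sketch below is illustrative only; the function names are hypothetical, and the 20 ng/µl default assumes that working concentration.

```python
def water_to_add_ul(stock_ng_per_ul: float, stock_volume_ul: float,
                    target_ng_per_ul: float = 20.0) -> float:
    """Water (µl) to add so that the RNA ends up at target_ng_per_ul (C1*V1 = C2*V2)."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    return stock_volume_ul * (stock_ng_per_ul / target_ng_per_ul - 1.0)


def final_molarity(stock_molar: float, stock_volume_ml: float,
                   total_volume_ml: float) -> float:
    """Final molarity after diluting stock_volume_ml of stock to total_volume_ml."""
    return stock_molar * stock_volume_ml / total_volume_ml


if __name__ == "__main__":
    # Hypothetical sample: 5 µl of RNA at 500 ng/µl.
    print(water_to_add_ul(500.0, 5.0))
    # Gel from step 3: 18 ml of 12.3 M formaldehyde in 100 ml total.
    print(round(final_molarity(12.3, 18.0, 100.0), 3))
```

The second calculation shows that the gel described in step 3 contains roughly 2.2 M formaldehyde; 12.3 M refers to the 37% stock.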
3.3. Stem-Loop Pulsed Reverse Transcription Protocol
The most reproducible results are obtained with 2–20 ng of total RNA per reaction, but abundant miRNAs can be detected from as little as 20 pg of total RNA. The protocol is designed to evaluate expression of a specific miRNA in a large number of samples or expression of a large number of miRNAs in one sample. If testing many RNA samples for one miRNA, prepare a “no RNA” master mix; if testing for many different miRNAs in one sample, prepare a “no RT primer” master mix. Include 10% excess to cover pipetting errors. At least three replicates per RT reaction are recommended. Also prepare “minus RT” controls by omitting reverse transcriptase from the reactions and “no template” controls by adding nuclease-free water in place of RNA. It is important to keep the reactions on ice and to work in the cold room if handling a large number of samples.
3.3.1. RT Reaction When Testing Many RNA Samples for One miRNA
1. Prepare the “no RNA” master mix by scaling the volumes for an individual RT reaction to the desired number of RT reactions. Prepare an individual reaction by adding the following components to a nuclease-free microcentrifuge tube: 0.5 µl 10 mM dNTP mix, 11.15 µl nuclease-free water, and 1 µl of appropriate stem-loop RT primer (1 µM).
2. Heat the mixture to 65°C for 5 min and incubate on ice for 2 min.
3. Centrifuge briefly to bring solution to the bottom of the tube.
4. Add the following: 4 µl 5× First-Strand buffer, 2 µl 0.1 M DTT, 0.1 µl RNaseOUT (40 units/µl), and 0.25 µl SuperScript III RT (200 units/µl).
5. Mix gently and centrifuge to bring solution to the bottom of the tube.
6. Assemble the RT reaction by aliquoting 19 µl of the “no RNA” master mix and adding 1 µl RNA template (see Note 4).
7. Mix gently and centrifuge to bring solution to the bottom of the tube.
3.3.2. RT Reaction When Testing One RNA Sample for Many miRNAs
1. Prepare the “no RT primer” master mix by scaling the volumes for an individual RT reaction to the desired number of RT reactions. Prepare an individual reaction by adding the following components to a nuclease-free microcentrifuge tube: 0.5 µl 10 mM dNTP mix, 11.15 µl nuclease-free water, and 1 µl of appropriate RNA template (see Note 4).
2. Add the following: 4 µl 5× First-Strand buffer, 2 µl 0.1 M DTT, 0.1 µl RNaseOUT (40 units/µl), and 0.25 µl SuperScript III RT (200 units/µl).
3. Mix gently and centrifuge to bring solution to the bottom of the tube.
4. Assemble the RT reaction by aliquoting 19 µl of the “no RT primer” master mix and adding 1 µl of appropriate stem-loop RT primer (1 µM) previously denatured by heating to 65°C for 5 min.
5. Mix gently and centrifuge to bring solution to the bottom of the tube.
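Scaling the per-reaction volumes with the recommended 10% excess is easy to get wrong by hand when many reactions are set up. The following Python sketch illustrates one way to do it for the “no RNA” master mix; the dictionary and function names are hypothetical, volumes are in µl, and the sketch only scales volumes. It does not encode the order of addition or the 65°C denaturation step.

```python
# Per-reaction volumes (µl) for the "no RNA" master mix (Subheading 3.3.1).
RT_NO_RNA_MIX_UL = {
    "10 mM dNTP mix": 0.5,
    "nuclease-free water": 11.15,
    "stem-loop RT primer (1 µM)": 1.0,
    "5x First-Strand buffer": 4.0,
    "0.1 M DTT": 2.0,
    "RNaseOUT (40 units/µl)": 0.1,
    "SuperScript III RT (200 units/µl)": 0.25,
}


def scale_master_mix(per_reaction_ul: dict, n_reactions: int,
                     excess: float = 0.10) -> dict:
    """Scale each component to n_reactions plus a fractional excess (default 10%)."""
    factor = n_reactions * (1.0 + excess)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


if __name__ == "__main__":
    # Hypothetical run: 8 samples in triplicate = 24 RT reactions.
    for name, vol in scale_master_mix(RT_NO_RNA_MIX_UL, 24).items():
        print(f"{name}: {vol} µl")
```

The per-reaction volumes sum to 19 µl, matching the 19 µl aliquot to which 1 µl of RNA template is added. For the “no RT primer” mix, the primer entry would be replaced by 1 µl of RNA template.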
3.3.3. Pulsed RT Reaction
1. Load thermal cycler and incubate for 30 min at 16°C, followed by pulsed RT of 60 cycles at 30°C for 30 s, 42°C for 30 s and 50°C for 1 s.
2. Incubate at 85°C for 5 min to inactivate the reverse transcriptase.
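When planning a run, it can help to write the pulsed RT program as data and add up the programmed hold times. The Python sketch below does this; the structure and names are hypothetical, and the total excludes ramping between temperatures, which depends on the instrument.

```python
# Each stage: (label, number of repeats, [(temperature in °C, hold in seconds), ...])
PULSED_RT_PROGRAM = [
    ("initial incubation", 1, [(16, 30 * 60)]),
    ("pulsed RT", 60, [(30, 30), (42, 30), (50, 1)]),
    ("RT inactivation", 1, [(85, 5 * 60)]),
]


def total_hold_seconds(program) -> int:
    """Sum of programmed hold times, ignoring ramp times between steps."""
    return sum(repeats * sum(hold for _, hold in steps)
               for _, repeats, steps in program)


if __name__ == "__main__":
    seconds = total_hold_seconds(PULSED_RT_PROGRAM)
    print(f"{seconds} s = {seconds / 60:.0f} min of programmed holds")
```

The programmed holds add up to 5,760 s (96 min), so a complete RT run should be expected to take somewhat longer than an hour and a half once ramping is included.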
3.4. qPCR
Protocols are provided for the SYBR Green I assay and the UPL probe assay. The SYBR Green I assay provides good specificity if the number of PCR cycles is limited to a maximum of 35 to minimize nonspecific amplification. At this number of cycles, highly and moderately abundant miRNAs can be easily quantified (Fig. 2). For miRNA sequences that are expressed at low levels, or when a particular set of primers produces background amplification, the UPL probe assay provides higher specificity (Fig. 3).
Fig. 2. The sensitivity of the stem-loop RT-PCR assay. (a) RT-PCR analysis of miR159 expression visualized on agarose gel stained with ethidium bromide. Very little nonspecific amplification was detected with negative control reactions (−RT, minus RT, and NTC, “no template” control) at 35 cycles. The amounts of RNA used for reverse transcription reactions are indicated on the top. PCR cycle numbers are indicated on the left. Size markers are indicated on the right. (b) qPCR analysis of the same sample using the SYBR Green I assay at 35 cycles.
Fig. 3. Improved specificity of the miRNA UPL probe assay. (a) SYBR Green I assay PCR for miR166. Negative control reactions (−RT, minus RT, and NTC, “no template” control) produced detectable amplicons after 40 cycles. (b) UPL probe assay PCR for miR166. No fluorescence was detected in the negative control reactions after 45 cycles. (c) UPL probe assay amplification products for miR166 separated by gel electrophoresis on 4% agarose, showing specific and nonspecific amplification products above, below and in the size-range of specific products, obtained after 45 cycles of PCR. Arrowhead indicates the expected size of specific amplicons
3.4.1. miRNA SYBR Green I Assay
1. Prepare 5× LightCycler FastStart SYBR Green I master mix (Roche Diagnostics) according to the manufacturer’s instructions. 2. Prepare a PCR master mix by scaling the volumes listed below to the desired number of amplification reactions. Include 10% excess to cover pipetting errors. For a single reaction, add the following components to a nuclease-free microcentrifuge tube: 12 µL nuclease-free water, 4 µL SYBR Green I master mix, 1 µL forward (miRNA-specific) primer (10 µM) and 1 µL reverse (universal) primer (10 µM). 3. Mix gently and centrifuge to bring the solution to the bottom of the tube. 4. Store in a cooling block or on ice. 5. Place the required number of LightCycler capillaries in precooled centrifuge adapters. 6. Pipette 18 µL master mix into each LightCycler capillary. 7. Add 2 µL RT product. 8. Seal each capillary with a stopper. 9. Place the capillaries into the LightCycler carousel and spin in the carousel centrifuge. 10. Incubate the samples at 95°C for 5 min, followed by 35–40 cycles of 95°C for 5 s and 60°C for 10 s. 11. For melting curve analysis, denature the samples at 95°C, then cool to 65°C at 20°C per second. Collect fluorescence signals at 530 nm continuously from 65°C to 95°C at 0.2°C per second.
3.4.2. miRNA UPL Probe Assay
1. Prepare 5× LightCycler TaqMan master mix (Roche Diagnostics) according to the manufacturer’s instructions. 2. Prepare a PCR master mix by scaling the volumes listed below to the desired number of amplification reactions. Include 10% excess to cover pipetting errors. For a single reaction, add the following components to a nuclease-free microcentrifuge tube: 11.8 µL nuclease-free water, 4 µL TaqMan master mix, 1 µL forward (miRNA-specific) primer (10 µM), 1 µL reverse (universal) primer (10 µM) and 0.2 µL UPL probe #21 (10 µM). 3. Mix gently and centrifuge to bring the solution to the bottom of the tube. 4. Store in a cooling block or on ice. 5. Place the required number of LightCycler capillaries in precooled centrifuge adapters. 6. Pipette 18 µL master mix into each LightCycler capillary. 7. Add 2 µL RT product. 8. Seal each capillary with a stopper. 9. Place the capillaries into the LightCycler carousel and spin in the carousel centrifuge. 10. Incubate the samples at 95°C for 5 min, followed by 35–45 cycles of 95°C for 5 s and 60°C for 10 s.
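Scaling either qPCR master mix with the 10% excess is simple arithmetic, sketched below. The per-reaction volumes are taken from the two assays above and read as microlitres; the function name `scale_master_mix` and the dictionary keys are hypothetical and chosen only for this illustration.

```python
# Per-reaction volumes in µL for the two assays (RT product is added separately).
SYBR_MIX_UL = {
    "nuclease-free water": 12.0,
    "SYBR Green I master mix": 4.0,
    "forward primer (10 µM)": 1.0,
    "reverse primer (10 µM)": 1.0,
}
UPL_MIX_UL = {
    "nuclease-free water": 11.8,
    "TaqMan master mix": 4.0,
    "forward primer (10 µM)": 1.0,
    "reverse primer (10 µM)": 1.0,
    "UPL probe #21 (10 µM)": 0.2,
}


def scale_master_mix(per_reaction_ul, n_reactions, excess=0.10):
    """Scale each component to n reactions plus a fractional excess."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


upl_mix = scale_master_mix(UPL_MIX_UL, n_reactions=10)
for name, vol in upl_mix.items():
    print(f"{name}: {vol} µL")
print(f"total: {round(sum(upl_mix.values()), 2)} µL")
```

Both recipes sum to 18 µL per reaction, which matches the volume pipetted into each capillary before 2 µL of RT product is added.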
3.5. Data Analysis
The qPCR data can be analysed and presented as absolute or relative values. Relative quantification is the preferred method because it takes into account potential errors due to variation in RNA input and RT efficiency. The most accurate way to correct for these errors is normalization to endogenous control genes. An ideal endogenous control shows relatively constant and highly abundant expression across tissues and cell types. In addition, a suitable control for normalization of miRNA expression would have properties similar to those of miRNAs in terms of size and stability and would be amenable to the miRNA assay design. Some classes of small noncoding RNAs (ncRNAs) other than miRNAs are often expressed in an abundant and stable manner. Several human and mouse snRNAs and snoRNAs were tested across a range of tissues and experimental conditions and confirmed as suitable endogenous controls for quantification of miRNA expression levels (Applied Biosystems). No such analysis has yet been performed with plant tissues, and a large-scale study is required to evaluate the suitability of different plant ncRNAs for miRNA quantification. Therefore, plant researchers have to select a set of candidate controls individually and screen them under appropriate conditions, or select a specific miRNA that shows the least variability across the tissues or experimental conditions under consideration. Either way, the consistency of expression should be confirmed under the specific conditions of the experiment (see Note 5). Here, we provide general instructions for data analysis using the LightCycler Software 4.05. If using a different instrument or software, refer to the appropriate instrument user manual for instructions on how to analyse data.
3.5.1. Melting Curve Analysis
1. This analysis is done after the SYBR Green I Assay to determine that each of the primer pairs amplified a single predominant product with a distinct melting temperature (Tm). 2. Follow the instrument user manual for instructions for Melting Curve Analysis and Tm calling. 3. If a single melting peak is observed for a particular primer pair, it is likely that a single product with a distinct Tm was amplified. 4. Evaluate by gel-electrophoresis (see Note 6).
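The idea behind Tm calling can be illustrated by taking the negative first derivative of fluorescence with respect to temperature and locating its maxima. The sketch below is not the algorithm used by the LightCycler software; the function `melting_peaks`, its threshold parameter and the synthetic sigmoid curve are all assumptions made for illustration.

```python
import math


def melting_peaks(temps, fluor, min_height_frac=0.2):
    """Return (Tm, height) for local maxima of -dF/dT above a relative threshold."""
    mid_t, neg_dfdt = [], []
    # Central-difference derivative at every interior data point.
    for i in range(1, len(temps) - 1):
        slope = (fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        mid_t.append(temps[i])
        neg_dfdt.append(-slope)
    top = max(neg_dfdt)
    peaks = []
    for i in range(1, len(neg_dfdt) - 1):
        is_local_max = neg_dfdt[i] > neg_dfdt[i - 1] and neg_dfdt[i] >= neg_dfdt[i + 1]
        if is_local_max and neg_dfdt[i] >= min_height_frac * top:
            peaks.append((mid_t[i], neg_dfdt[i]))
    return peaks


# Synthetic single-product melt: 65-95°C in 0.2°C steps, midpoint at 80°C.
temps = [65 + i / 5 for i in range(151)]
fluor = [1 / (1 + math.exp((t - 80.0) / 0.8)) for t in temps]
peaks = melting_peaks(temps, fluor)
print(peaks)
```

A single entry in `peaks` corresponds to a single melting peak; more than one entry would suggest additional products, which should then be checked by gel electrophoresis as described in step 4.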
3.5.2. Relative Quantification
1. Relative Quantification analysis compares two ratios: the ratio of the target gene to a reference gene sequence in an unknown sample is compared with the ratio of the same two sequences in a standard sample called a Calibrator. 2. To perform relative quantification with an external standard, prepare standard curves for the target and reference genes by serial dilutions of external standards with a known copy number (see Note 7). Use at least three points or one point per log of concentration, whichever is greater. Always use a “no template” control.
3. Prepare master mix and perform qPCR as described above. Use at least three replicates per standard dilution and “no template” control. 4. Follow the instrument user manual for instructions for the Standard Method that will automatically calculate and display the amplification curves and the standard curve, crossing points, calculated concentrations, and statistics for replicates. 5. Save as an external standard curve object. 6. Perform Relative quantification, Calibrator normalized, without efficiency correction: select Relative Quantification – Monocolor Analysis, assign a “Target Calibrator” and a “Reference Calibrator” sample, assign appropriate pairs of target and reference samples and perform analysis following the instrument user manual (see Note 8). 7. Perform efficiency correction by applying an external standard curve. 8. Download data and present as graphs or tables.
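The calculation behind calibrator-normalized relative quantification can be sketched as follows. This uses a commonly applied efficiency-corrected ratio model and the standard relation between standard-curve slope and efficiency; it is offered as an assumption about the arithmetic, not as a description of what the LightCycler software does internally, and the function names are hypothetical.

```python
def efficiency_from_slope(slope):
    """Efficiency from the slope of a standard curve (crossing point vs log10 amount).

    A slope of about -3.32 corresponds to an efficiency of 2 (perfect doubling).
    """
    return 10 ** (-1.0 / slope)


def calibrator_normalized_ratio(cp_target_sample, cp_ref_sample,
                                cp_target_cal, cp_ref_cal,
                                e_target=2.0, e_ref=2.0):
    """Target/reference ratio in the sample relative to the same ratio in the calibrator.

    With the default efficiencies of 2 this is the calculation without
    efficiency correction; pass measured efficiencies to apply the correction.
    """
    target = e_target ** (cp_target_cal - cp_target_sample)
    reference = e_ref ** (cp_ref_cal - cp_ref_sample)
    return target / reference


# Hypothetical crossing points: target appears 2 cycles earlier than in the calibrator.
uncorrected = calibrator_normalized_ratio(22.0, 18.0, 24.0, 18.0)
corrected = calibrator_normalized_ratio(
    22.0, 18.0, 24.0, 18.0,
    e_target=efficiency_from_slope(-3.5),  # hypothetical standard-curve slope
    e_ref=efficiency_from_slope(-3.4),     # hypothetical standard-curve slope
)
print(uncorrected, round(corrected, 2))
```

The example crossing points and slopes are invented; the point is that an efficiency below 2 lowers the fold change inferred from the same crossing-point difference, which is why step 7 applies the external standard curve.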
4. Notes 1. Whenever possible, we used the TRIzol Reagent for isolation of RNA because of its convenience, good RNA quality and speed. Some plant tissues may not be amenable to isolation of RNA by this method. Other methods for isolation of RNA may be used; however, avoid RNA purification methods that use RNA-binding glass-fibre filters that do not recover small RNA species quantitatively (e.g. Qiagen RNeasy mini and midi kits). If unfamiliar with the method for isolation of RNA, subsequent isolation, quantification, and polyacrylamide gel electrophoresis of the low molecular weight RNA fraction can be used to evaluate its quantity and quality. 2. RNA should be handled according to standard laboratory practices to avoid RNase contamination. All buffers and solutions should be nuclease-free. 3. Spectrophotometry followed by gel electrophoresis is still the most widely used method for assessing the RNA yield, purity, and quality. Fluorometry (e.g. RiboGreen, Molecular Probes) can also be used to determine yield, and microfluidic systems (such as Agilent’s bioanalyzer chips) can be used to determine yield and quality. 4. In our hands, both nondenatured RNA and RNA denatured by incubation at 65°C for 5 min produced similar results. However, it has been suggested that denaturation of RNA may reduce the yield of cDNA for some miRNAs.
5. In general, evaluation of endogenous controls involves demonstration of relatively abundant and relatively constant expression levels across the tissues and environmental conditions, compared with the RNA input and expression of other housekeeping genes. 6. Melting curve analysis needs to be combined with gel electrophoresis. Because of the small size of the fragment, a primer-dimer product generated from the “minus RT” and “no template” controls often has a Tm very similar to that of the genuine miRNA amplification fragment. This becomes an issue with low-abundance miRNAs that require a large number of PCR amplification cycles. In that case, the UPL assay is recommended. 7. Alternatively, use a cDNA sample with the highest level of target expression and prepare serial dilutions. 8. This method assumes that the efficiency of target and reference gene amplification is identical and equal to 2 (the amount of PCR product doubles during each cycle). In reality, the efficiency is often lower because of a number of different factors. Efficiency correction is required for more reliable data. The software calculates the efficiency from the slope of the standard curve.
Chapter 11
Cloning New Small RNA Sequences
Yuko Tagami, Naoko Inaba, and Yuichiro Watanabe
Abstract
Small RNAs are key molecules in RNA silencing pathways that exert sequence-specific regulation of gene expression and chromatin modifications in many eukaryotes. In plants, endogenous small RNAs, including microRNAs (miRNAs) and trans-acting small interfering RNAs (tasiRNAs), play an important role in biological processes such as development and stress responses. In addition, viral genome-derived siRNAs are produced during viral infection, and they mediate anti-viral defense through an RNA silencing pathway. These endogenous and exogenous small RNAs are mainly 21–24 nucleotides in length. Here, we describe a method to identify small RNA sequences from plant tissues. Small RNAs are purified from total RNA by column fractionation and gel excision. These small RNAs are ligated at both termini to DNA/RNA chimeric adapters and reverse-transcribed to produce cDNAs. Subsequent PCR amplification adds BanI restriction sites to the cDNAs, which enables directional concatamerization. Concatamerized fragments are cloned and sequenced. This method can be applied to identify small RNA sequences from many sources, e.g., mutant plants, plants in various stress environments, and virus-infected plants.
Key words: Cloning, Small RNA, siRNA, miRNA, Virus-derived siRNA, Sequencing
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_11, © Springer Science + Business Media, LLC 2010
1. Introduction
It has recently been demonstrated that many kinds of noncoding RNAs are expressed and play essential roles in organisms. In particular, the functions of small RNAs (20–30 nucleotides in length) have been analysed. These are key molecules in RNA silencing pathways that regulate gene expression transcriptionally or post-transcriptionally in eukaryotes. Small regulatory RNAs are classified into several groups according to their biogenesis and functions. They include microRNAs (miRNAs), trans-acting small interfering RNAs (tasiRNAs), natural cis-antisense transcript-associated siRNAs (nat-siRNAs), and heterochromatic siRNAs (hc-siRNAs) (1). Target mRNAs of miRNAs include many transcription factors that are important for plant development in Arabidopsis (2); proper biogenesis and function of miRNAs are therefore essential for normal development and are tightly regulated in various organs. In addition to the endogenous small RNAs, viral genome-derived siRNAs are produced during viral infection (3). The viral genome is targeted by these siRNAs and silenced through the RNA silencing pathway. Thus, RNA silencing also functions as an anti-viral defense. Recently, new endogenous siRNAs were identified in animals such as mouse and Drosophila (4–7). Likewise, small RNA cloning methods can be expected to identify as-yet-undiscovered small RNA molecules in plants grown under different environmental conditions or in mutants.
In this chapter, we describe a method for cloning small RNA molecules from plant tissues such as leaves and flowers. Using this method, we have obtained small RNA sequences from leaves of virus-infected plants (8, 9). Our protocol is based on a previously published protocol (10), except that we do not use a radioisotope in the procedure. The procedure is outlined in Fig. 1. Total RNA is isolated from plant tissue. Small RNA is purified by an anion-exchange chromatography column and gel-purification. Because small RNAs do not have polyA tails, adapters are ligated to the 5′ and 3′ ends of each small RNA for the subsequent reverse transcription and PCR amplification. In two consecutive PCR reactions, BanI restriction sites are added to both ends of the cDNA. The fragments are then digested with BanI endonuclease and directionally concatamerized. These fragments are cloned into a vector and sequenced (see Note 1).
Fig. 1. A scheme for cloning small RNA. Purified small RNA samples are ligated to a 3′ adapter followed by ligation to a 5′ adapter. Small RNAs ligated to both adapters are reverse transcribed to cDNAs, and then BanI sites are added by PCR reactions. BanI-restricted fragments are concatamerized and cloned into a vector followed by identification of sequences.
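Once concatamer clones have been sequenced, the individual small RNA inserts have to be recovered from between the adapter sequences in each read. The following is a minimal sketch of that step, not part of the published protocol. It assumes the flanking sequences are the 3′ end of the 5′ adapter and the 5′ end of the 3′ adapter as printed in Table 1 (written as DNA), an insert length window of 18–30 nt, and a read that may be in either orientation; the names `extract_inserts` and `INSERT_RE` are hypothetical.

```python
import re

FIVE_PRIME_FLANK = "CTCACTAAA"    # last 9 nt of the 5' adapter (Table 1), as DNA
THREE_PRIME_FLANK = "TTTAACCGCG"  # first 10 nt of the 3' adapter (Table 1), as DNA

# Non-greedy capture so the shortest insert between the two flanks is taken.
INSERT_RE = re.compile(
    f"{FIVE_PRIME_FLANK}([ACGT]{{18,30}}?){THREE_PRIME_FLANK}"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (characters other than ACGT are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_inserts(read):
    """Return small RNA inserts (as DNA) from one concatamer read.

    Both strands are searched because the clone may be sequenced in either
    orientation; the strand yielding more inserts is reported.
    """
    read = read.upper()
    forward = INSERT_RE.findall(read)
    reverse = INSERT_RE.findall(reverse_complement(read))
    return forward if len(forward) >= len(reverse) else reverse


# Synthetic read with two invented 21-nt inserts, for illustration only.
demo_inserts = ["TTTGGATTGAAGGGAGCTCTA", "TCGGACCAGGCTTCATTCCCC"]
demo_read = "".join(
    "GAATTCCTCACTAAA" + ins + "TTTAACCGCGAATTC" for ins in demo_inserts
)
print(extract_inserts(demo_read))
```

Inserts that begin with a run of A or end with a run of T make the adapter boundary ambiguous, so extracted sequences are best confirmed against the genome before they are interpreted.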
2. Materials
2.1. RNA Isolation and Size Fractionation
1. 1 g of plant tissue. 2. A mortar and pestle. 3. TRIzol® Reagent (Invitrogen, Carlsbad, CA). CAUTION: phenol is toxic and corrosive. Store at 4°C. 4. Liquid nitrogen. 5. Chloroform. CAUTION: this is a probable carcinogen. Store at room temperature. 6. Isopropanol. CAUTION: this is flammable and harmful. Store at room temperature. 7. 70% (v/v) ethanol. 8. RNase-free water (see Note 2). 9. RNA/DNA Midi Kit (QIAGEN, Valencia, CA) containing QRL1, QRV2, QRE and QRW buffers and a QIAGEN-tip column. 10. QRW2 buffer: 750 mM NaCl, 50 mM MOPS, pH 7.0, 15% ethanol. Store at room temperature. 11. Formamide. CAUTION: this is a probable carcinogen. Store at −20°C.
2.2. Polyacrylamide Gel Electrophoresis (PAGE)
1. 5× TBE buffer: 445 mM Tris, 445 mM boric acid, 10 mM EDTA, pH 8.0. Store at room temperature. 2. 40% (w/v) acrylamide/bis solution (19:1). CAUTION: this is a neurotoxin when unpolymerized. 3. 20% ammonium persulfate (APS). CAUTION: this is harmful. Store at −20°C and use within 1 month. 4. N,N,N′,N′-Tetramethylethylenediamine (TEMED). CAUTION: this is corrosive. 5. Urea. 6. Glass plates (11 × 10 cm), 0.1 cm spacers and 14-well combs.
7. 4× loading dye: 50% glycerol, 0.03% bromophenol blue (BPB), 50 mM Tris-HCl, pH 7.7, 5 mM EDTA, pH 8.0. Store at room temperature. 8. 21 and 24 nt RNA size markers (see Note 3). 9. 5′ and 3′ adapters: modified chimeric DNA/RNA oligonucleotides described in Table 1 (see Note 4). They are purchased from an oligonucleotide synthesis company (e.g. SIGMA, St. Louis, MO).
2.3. Gel-Purification and Precipitation of Oligonucleotides
1. Ethidium bromide (EtBr) solution: prepare 1 µg/mL solution. CAUTION: This is a mutagen, handle with Nitrile gloves at all times. 2. RNA elution buffer: 0.5 M ammonium acetate, 1 mM EDTA, pH 8.0. 3. Homogenization pestle for a 1.5 mL tube. 4. Ethanol, 3 M sodium acetate and 20 µg/µL glycogen for ethanol precipitation. 5. 5 M NaCl. 6. TE buffer: 10 mM Tris, 1 mM EDTA, pH 8.0. 7. TE-saturated phenol and chloroform. CAUTION: these are toxic. 8. 50× TAE: 2 M Tris–acetate, 50 mM EDTA. Store at room temperature. 9. Standard agarose.
10. 100 bp DNA ladder marker (Takara Bio, Madison, WI). 11. Wizard SV Gel and PCR Clean-Up System (Promega, Madison, WI).
Table 1
Sequences of adapters and primers for cloning small RNA
Adapter/primer                    Sequence
3′ adapter (a, b)                 p-rUrUrUAACCGCGAATTCCAG-L
5′ adapter (a, c)                 OH-ACGGAATTCCTCACTrArArA-OH
First PCR 5′ primer               CAGCCAACGGAATTCTCACTAAA
RT primer, first PCR 3′ primer    GACTAGCTGGAATTCGCGGTTAAA
Second PCR 5′ primer              GAGCCAACAGGCACCGAATTCCTCACTAAA
Second PCR 3′ primer              GACTAGCTTGGTGCCGAATTCGCGGTTAAA
M13 Reverse primer                CAGGAAACAGCTATGAC
(a) rU and rA are ribonucleotides, not deoxyribonucleotides.
(b) The 5′ end of the 3′ adapter is phosphorylated, and the 3′ end of the 3′ adapter is modified with a hydroxyl blocking group (L) (see Note 4).
(c) The 5′ end of the 5′ adapter is not phosphorylated.
2.4. Adapter Ligation
1. T4 RNA ligase (20 U/µL) and 10× reaction buffer (New England BioLabs, Ipswich, MA). 2. Dimethyl sulfoxide (DMSO).
2.5. Reverse Transcription
1. SuperScript III Reverse Transcriptase (200 U/µL) and 5× First Strand Buffer (Invitrogen). 2. dNTP mixture (25 mM each). 3. 0.1 M DTT. 4. RNase inhibitor (40 U/µL) (Takara Bio). 5. 100 µM RT primer (Table 1).
2.6. PCR Amplification
1. First PCR 5′ and 3′ primers and second PCR 5′ and 3′ primers (100 µM each) (Table 1). 2. Ex Taq (5 U/µL) and 10× Ex Taq buffer (Takara Bio). 3. dNTP mixture (2.5 mM each).
2.7. BanI Restriction and Concatamerization
1. BanI restriction endonuclease (6 U/µL) and 10× L buffer (TOYOBO, Osaka, Japan). 2. T4 DNA ligase (350 U/µL) and 10× reaction buffer (Takara Bio).
2.8. Cloning of Concatamerized Fragments and Sequencing
1. TOPO TA cloning kit (Invitrogen) containing pCR4-TOPO vector solution and salt solution. 2. Escherichia coli XL1-Blue or JM109 competent cells. 3. 5-Bromo-4-chloro-3-indolyl β-D-galactopyranoside (X-gal): prepare a 20 mg/mL solution in DMSO. Wrap in foil to protect from light and store at −20°C. 4. Isopropyl thiogalactoside (IPTG): prepare a 0.1 M solution in water. Sterilize by filtration and store at −20°C. 5. LB plates with Km, IPTG and X-gal: spread 50 µL each of X-gal and IPTG solution onto standard LB plates containing 30 µg/mL kanamycin (Km). 6. LB liquid medium. 7. FastPlasmid Mini (Eppendorf, Westbury, NY) or other equivalent plasmid extraction kit. 8. M13 reverse primer (Table 1). 9. BigDye Terminator v3.1 Cycle Sequencing Kit and ABI PRISM 310 Genetic Analyser (Applied Biosystems, Foster City, CA).
3. Methods
3.1. RNA Isolation and Size Fractionation
To prepare small RNA samples for cloning, crude total RNA is extracted from plant tissues using TRIzol reagent. Low-molecular-weight (LMW) RNAs are then recovered using the anion-exchange chromatography RNA/DNA Midi kit. After this fractionation, RNAs of less than 500 nt in length are obtained. Perform the steps in this section in a chemical fume hood when handling TRIzol and chloroform. 10 µg of LMW RNA is enough to proceed to the next procedure.
3.1.1. Total RNA Extraction
1. Grind 1 g of plant tissue to a fine powder in liquid nitrogen with a clean mortar and pestle. 2. Add 10 mL of TRIzol reagent to the frozen powder and mix thoroughly. Once the mixture has thawed, transfer the sample to a 15 mL tube and incubate for 5 min at room temperature. 3. Add 2 mL of chloroform, vortex the tube and incubate for 3 min at room temperature. 4. Centrifuge the sample at 4,500 × g for 20 min at 4°C. Transfer the upper aqueous phase, which contains the RNA, into a new 15 mL tube (see Note 5). 5. Add an equal volume (~5 mL) of isopropanol, vortex briefly and keep at −20°C for 30 min. Centrifuge the sample at 4,500 × g for 30 min at 4°C. 6. Discard the supernatant and add 3 mL of 70% ethanol to the pellet. Centrifuge at 4,500 × g for 10 min at 4°C. 7. Discard the supernatant and air-dry the pellet. Do not vacuum-dry the RNA pellet because RNA becomes hard to dissolve if it is too dry. 8. Completely dissolve the pellet in 200 µL of RNase-free water.
3.1.2. Low-Molecular-Weight (LMW) RNA Isolation
1. Pour 3 mL of Buffer QRE into the QIAGEN-tip column to equilibrate, and allow the buffer to enter the column by gravity flow. 2. Add 1 mL of Buffer QRL1 to total RNA (Subheading 3.1.1). Mix thoroughly by vortexing. Then add 9 mL of Buffer QRV2 and mix by vortexing. 3. Apply the sample to the column and allow it to enter the resin by gravity flow. 4. Add 12 mL of Buffer QRW into the column and allow it to enter the resin by gravity flow to wash out unbound materials.
5. Set the column on a new 15 mL tube, and pour 6 mL of Buffer QRW2 into the column to elute LMW RNA. 6. Precipitate the eluted RNA with an equal volume (6 mL) of isopropanol by centrifugation at 4,500 × g for 30 min at 4°C. Discard the supernatant and wash the RNA pellet with 70% ethanol by centrifugation at 4,500 × g for 10 min at 4°C. Dissolve the pellet in 25 µL formamide.
3.2. Small RNA Purification and Adapter Ligation
In this section, small RNAs from plant tissue and the 5′ and 3′ adapters are gel-purified and ligated. To show where to excise the gel, RNA size markers (21 and 24 nt in length) are ligated to the adapters in parallel with the small RNAs and used as markers. Before ligation, the adapter and the samples are excised from the gel, and their RNA eluates are mixed and co-precipitated to facilitate precipitation and simplify the ligation step.
3.2.1. Gel-Purification of Small RNA and 3′ Adapter
1. Add 4× loading dye to 25 µL of LMW RNA (Subheading 3.1.2) (sample #2 in Fig. 2a). Also add 4× loading dye to the 24 nt RNA size marker (sample #1), the 21 nt RNA size marker (sample #3) and the 3′ adapter (sample #4) in separate tubes (2–3 nmol each). Denature by incubating for 15 min at 65°C. Place the tubes on ice. 2. Load these samples as shown in Fig. 2a onto a denaturing 15% polyacrylamide gel (7 M urea; see Note 6). 3. Run the gel for 1–1.5 h at a constant voltage of 100–150 V using 0.5× TBE buffer until the BPB dye reaches one-third of the gel length from the bottom. 4. Stain the gel in EtBr solution (see Note 7). Using razor blades and the mobility of the RNA size markers as a guide, excise the region of the LMW RNA lane that corresponds to 21–24 nt small RNAs (Fig. 2a-(a); distinct bands are not visible). In parallel, excise the 18 nt band of the 3′ adapter (b), the 21 nt RNA size marker band (c) and the 24 nt RNA size marker band (d). Transfer each gel slice to a separate preweighed 1.5 mL tube and determine the weight of each gel slice. 5. Add 2 volumes (v/w) of RNA elution buffer to each gel slice and then crush with pestles. Incubate the tubes for 1 h at 65°C (mix by occasional vortexing). 6. Transfer the eluate to new 1.5 mL tubes without carrying over any small gel pieces. Centrifuge at 15,000 × g for 5 min at 4°C to remove any small gel remnants. 7. Add one-third of the 3′ adapter eluate (b) to each of the total eluates of the 24 nt RNA size marker (d), small RNAs (a) and the 21 nt RNA size marker (c) to obtain samples #5, 6 and 7, respectively.
Fig. 2. Procedures in Subheading 3.2. Small RNA Purification and Adapter Ligation. Samples are loaded leaving two lanes empty between samples. Excise the gel regions indicated by rectangles. (a) The procedure in Subheadings 3.2.1–3.2.2. The figure shows the gel for purification of small RNAs and 3′ adapter. Sample #1: 24 nt RNA size marker, #2: LMW RNA (if the well is too small, load in two lanes), #3: 21 nt RNA size marker, #4: 3′ adapter. (b) The procedure in Subheadings 3.2.3–3.2.4, step 1. The figure shows the gel for purification of 3′ adapter-ligated small RNAs and 5′ adapter. Sample #5: 3′ adapter-ligated 24 nt RNA size marker, #6: 3′ adapter-ligated small RNAs, #7: 3′ adapter-ligated 21 nt RNA size marker, #8: 21 nt RNA size marker, #9: 5′ adapter. (c) The procedure in Subheading 3.2.4. The figure shows the gel for purification of 5′ and 3′ adapter-ligated small RNAs. Sample #10: 5′ and 3′ adapter-ligated 21 nt RNA size marker, #11: 5′ and 3′ adapter-ligated small RNAs.
8. Add 2.5 volumes of ethanol, 0.1 volume of 3 M sodium acetate and 1 µL glycogen and mix by vortexing. Incubate the tubes at −80°C for more than 1 h. 9. Co-precipitate by centrifugation at 15,000 × g for 30 min at 4°C. Remove the supernatant completely. Do not wash the pellet with 70% ethanol.
3.2.2. 3′ Adapter Ligation
1. Dissolve each RNA pellet (Subheading 3.2.1) in the mixture of 1 µL of 10× T4 RNA ligase buffer, 1 µL of DMSO and 7 µL of RNase free water. Denature RNA samples by incubating at 90°C for 30 s and immediately place on ice. 2. Add 1 µL of T4 RNA ligase, mix gently and incubate for 2 h at room temperature (ligation in a 10 µL volume).
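The volume rules used for gel elution and ethanol precipitation in Subheading 3.2.1 lend themselves to a small helper. The sketch below is only an illustration: it treats 1 mg of gel as 1 µL when applying the "2 volumes (v/w)" rule, which is an assumption, and the function names are hypothetical.

```python
def elution_buffer_ul(gel_slice_mg, volumes_per_weight=2.0):
    """RNA elution buffer for a gel slice, assuming 1 mg of gel is about 1 µL."""
    return round(gel_slice_mg * volumes_per_weight, 1)


def ethanol_precipitation_ul(sample_ul):
    """Reagent volumes (µL) for precipitating a sample of the given volume."""
    mix = {
        "ethanol": round(2.5 * sample_ul, 1),
        "3 M sodium acetate": round(0.1 * sample_ul, 1),
        "glycogen": 1.0,
    }
    mix["total in tube"] = round(sample_ul + sum(mix.values()), 1)
    return mix


buffer_ul = elution_buffer_ul(150)           # hypothetical 150 mg gel slice
print(buffer_ul)
print(ethanol_precipitation_ul(buffer_ul))   # assumes the full eluate is recovered
```

Checking the total against the capacity of a 1.5 mL tube before adding ethanol helps decide whether a sample has to be split across two tubes.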
3.2.3. Gel-Purification of 3′ Adapter-Ligated Small RNA and 5′ Adapter
1. Add 4× loading dye to the ligated products (samples #5–7; see Subheading 3.2.2). Also add 4× loading dye to the 21 nt RNA size marker (sample #8) and the 5′ adapter (sample #9) (2–3 nmol each). Denature by incubating for 15 min at 65°C. Place the tubes on ice. Load samples #5–9 as shown in Fig. 2b onto a denaturing 15% polyacrylamide gel (7 M urea; see Note 6). Run the gel (as in Subheading 3.2.1, step 3). 2. Stain the gel in EtBr solution (see Note 7). Excise the gel region containing 3′ adapter-ligated 21–24 nt small RNAs (Fig. 2b-(e); distinct bands are not visible). Also excise the band of the 3′ adapter-ligated 21 nt RNA size marker (f) and the 18 nt band of the 5′ adapter (g), using the mobility of the 21 nt RNA size marker as a guide. Transfer each gel slice to a separate preweighed 1.5 mL tube and determine the weight of each gel slice. 3. Elute RNA from each gel slice (as in Subheading 3.2.1, steps 5 and 6). 4. In new tubes, add half of the 5′ adapter eluate (g) to each of the total eluates of the 3′ adapter-ligated 21 nt RNA size marker (f) and the 3′ adapter-ligated small RNAs (e) to obtain samples #10 and 11. Precipitate the RNAs with ethanol (as in Subheading 3.2.1, steps 8 and 9).
3.2.4. Ligation of 5′ Adapter and Gel-Purification of 5′ and 3′ Adapter-Ligated Small RNA
1. Perform ligation in samples #10 and 11 as in Subheading 3.2.2. 2. Add 4× loading dye to the ligated products and denature by incubating for 15 min at 65°C. Place the tubes on ice. Load samples #10 and 11 as shown in Fig. 2c onto a denaturing 15% polyacrylamide gel (7 M urea; see Note 6). Run the gel (as in Subheading 3.2.1, step 3). 3. Stain the gel with EtBr solution (see Note 7). Excise the gel region containing RNA of higher molecular weight than the 5′ and 3′ adapter-ligated 21 nt RNA size marker (bands of direct ligation products between the 5′ and 3′ adapters can be seen (Fig. 2c-(i)), but no bands are visible in the excised region (h)). Transfer the gel slice to a preweighed 1.5 mL tube, determine the gel weight and elute the RNAs (as in Subheading 3.2.1, steps 5 and 6).
3.3. Reverse Transcription and PCR Amplification of Adapter-Ligated Small RNAs
In this section, the RNA eluted from the gel is co-precipitated with the RT primer to facilitate precipitation, and the 5′ and 3′ adapter-ligated small RNAs are reverse-transcribed to cDNAs. The cDNA is amplified in two PCR rounds with primers that add BanI recognition sites at the 5′ and 3′ ends. After the first PCR, the bands derived from the direct ligation products of the 5′ and 3′ adapters (no small RNA in between) are removed by gel-purification.
3.3.1. Reverse Transcription of the Ligation Product
1. Add 1 µL RT primer to the RNA eluate (Subheading 3.2.4) as a carrier to facilitate the recovery of the ligation product. 2. Co-precipitate RNA and primer with 2.5 volumes of ethanol, 0.1 volume of 3 M sodium acetate and 1 µL glycogen by centrifugation at 15,000 × g for 30 min at 4°C. 3. Dissolve the pellet in 9 µL RNase-free water and 4 µL of 25 mM dNTP solution. 4. Denature the RNA by incubating for 5 min at 65°C and place on ice. Add 1 µL of 0.1 M DTT, 4 µL of 5× First Strand Buffer, 1 µL of RNase inhibitor and 1 µL of SuperScript III reverse transcriptase. 5. Incubate for 30 min at 50°C, then for 15 min at 70°C, and place on ice.
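The size difference that the gel-purification after the first PCR relies on can be estimated from the first PCR primer sequences in Table 1. The sketch below assumes that each first PCR primer covers the whole of its adapter, so that the product length is simply the two primer lengths plus the insert; the function name is hypothetical and the primer strings are copied as printed in Table 1.

```python
FIRST_PCR_5P = "CAGCCAACGGAATTCTCACTAAA"   # first PCR 5' primer, as printed in Table 1
FIRST_PCR_3P = "GACTAGCTGGAATTCGCGGTTAAA"  # RT primer / first PCR 3' primer (Table 1)


def first_pcr_product_bp(insert_nt):
    """Expected first PCR product length for a small RNA insert of insert_nt nt.

    Assumes the primers abut the insert on both sides, so the product is
    5' primer + insert + 3' primer.
    """
    return len(FIRST_PCR_5P) + insert_nt + len(FIRST_PCR_3P)


print("adapter-adapter product:", first_pcr_product_bp(0), "bp")
for n in (21, 24):
    print(f"{n} nt insert:", first_pcr_product_bp(n), "bp")
```

Under these assumptions the empty adapter-adapter product is a little under 50 bp and products carrying a 21–24 nt insert are close to 70 bp, which is the separation that has to be resolved on the 15% polyacrylamide gel.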
3.3.2. First PCR Amplification of cDNA and Gel-Purification
1. Transfer 10 µL of cDNA (Subheading 3.3.1) to a new PCR tube and add 8 µL of 2.5 mM dNTPs mixture, 10 µL of 10× Ex Taq buffer, 1 µL of first PCR 5′ primer, 1 µL of first PCR 3′ primer, 69 µL of water, and 1 µL of Ex Taq polymerase (the first PCR is performed on the 100 µL scale). When processing many samples, prepare a master mix of all components except the cDNA.
2. Run the first PCR using the following cycle conditions: 20 cycles of 45 s at 95°C, 1 min 25 s at 50°C, and 1 min at 72°C.
3. Run the total PCR product with the 100 bp ladder marker on a 15% polyacrylamide gel (without urea; see Note 6), as in Subheading 3.2.1, step 3.
4. Stain the gel in EtBr solution (see Note 7). The 50 bp bands derived from the direct ligation products of 5′ and 3′ adapters (no small RNAs in between) should appear (arrow in Fig. 3a), whereas the first PCR products of small RNAs ligated to both adapters are larger than 70 bp. Excise the gel region containing the larger PCR products (asterisk in Fig. 3a), but do not excise the 50 bp bands (arrow in Fig. 3a). Extract DNA from the gel slice as in Subheading 3.2.1, steps 5 and 6, and precipitate as in steps 8 and 9. Dissolve the pellet in 50 µL TE buffer.
3.3.3. Second PCR Amplification to Generate the BanI Restriction Site
1. Transfer 1 µL of the first PCR product to a new 1.5 mL tube and add 50 µL of 2.5 mM dNTPs mixture, 50 µL of 10× Ex Taq buffer, 5 µL of second PCR 5′ primer, 5 µL of second PCR 3′ primer, 384 µL of water and 5 µL of Ex Taq polymerase. Divide this PCR solution into five aliquots (100 µL each) in PCR tubes.
2. Run the second PCR using the following cycle conditions: 10 cycles of 45 s at 95°C, 1 min 25 s at 50°C, and 1 min at 72°C.
3. Using 6 µL of the PCR product, examine the second PCR reaction on a 10% polyacrylamide gel with TBE buffer or a 2% agarose gel with TAE buffer.
4. If the product is detectable (Fig. 3b), combine the second PCR product into one 1.5 mL tube. Add 30 µL of 5 M NaCl and perform phenol/chloroform and chloroform extractions (see Note 8).
5. Add 2 volumes of ethanol to the DNA solution and place on ice for 1 h. Precipitate DNA by centrifugation at 15,000 × g for 30 min at 4°C.
Fig. 3. Gel images of DNAs. (a) The first PCR products in Subheading 3.3.2, step 4. An arrow indicates the bands derived from the ligated products of 5′ and 3′ adapters. Excise the larger bands indicated with an asterisk. (b) The second PCR products separated on a 10% polyacrylamide gel in Subheading 3.3.3, step 4. A major band indicated by an arrow should be about 70 bp long. (c) Verification of BanI digestion in Subheading 3.4.1, step 3. The band of digested DNA is slightly smaller than that of undigested DNA. (d) Verification of concatamerization in Subheading 3.4.2, step 3. Successful concatamerization products exhibit broadly smeared bands. Excise the bands between 600 and 800 bp indicated by an asterisk
3.4. BanI Restriction and Concatamerization
3.4.1. BanI Restriction Digestion of the Second PCR Product
The PCR products are digested with BanI endonuclease and ligated to yield directionally concatamerized fragments.
1. Remove the supernatant without drying the pellet (see Note 9). Dissolve the pellet in a mixture of 20 µL of 10× L buffer and 172 µL water. Keep 2 µL of undigested DNA for examining BanI digestion.
2. Add 10 µL of BanI endonuclease to the DNA solution and incubate for 3 h at 37°C.
3. Verify digestion by 2% agarose gel electrophoresis using 2 µL of the digested sample and the undigested sample (Fig. 3c).
4. After BanI digestion has been verified, add 12 µL of 5 M NaCl to the digested sample, followed by phenol/chloroform and chloroform extractions (see Note 8).
5. Precipitate DNA as in Subheading 3.3.3, step 5.
3.4.2. Concatamerization of BanI-Digested DNA and Gel-Purification
1. Dissolve the DNA pellet in a mixture of 5.5 µL of 10× T4 DNA ligase buffer and 39.5 µL water. Add 3 µL each of second PCR 5′ primer and second PCR 3′ primer and incubate for 10 min at 65°C to prevent re-ligation of 12 bp BanI digestion fragments and long DNA. Immediately place the tube on ice.
2. Add 2 µL of T4 DNA ligase and incubate overnight at 22°C.
3. Check for complete concatamerization on a 2% agarose gel using 2 µL of the reaction mix along with a 100 bp DNA ladder marker. If concatamerization is successful, smeared bands ranging from 60 bp to 1 kbp are detected (Fig. 3d).
4. Separate the remaining concatamerization products on the 2% agarose gel and excise the bands between 600 and 800 bp. Elute DNA using the Wizard SV Gel and PCR Clean-Up System according to the kit protocol.
5. Purify eluted DNA by phenol/chloroform and chloroform extractions (see Note 8). Add 2.5 volumes of ethanol, 0.1 volume of 3 M sodium acetate and 1 µL glycogen, and precipitate DNA by centrifugation at 15,000 × g for 30 min at 4°C.
3.5. Sequencing of Small RNAs and Analysis of Results
3.5.1. Cloning and Sequencing of Concatamerized Fragments
Protruding ends of concatamerized fragments are filled in by Taq polymerase so that they can be cloned into the vector. Clones with inserts are screened by blue–white selection, and plasmids are recovered and sequenced.
1. Dissolve the DNA pellet in a mixture of 1 µL of 2.5 mM dNTPs mixture, 1 µL of 10× Ex Taq buffer and 7.5 µL water.
2. Add 0.5 µL of Ex Taq polymerase and incubate for 30 min at 72°C for enzymatic 3′ tailing. Perform TOPO TA cloning by mixing 2 µL of DNA solution, 0.5 µL of salt solution and 0.5 µL of TOPO vector solution and incubating for 1 h at room temperature.
3. Transform competent Escherichia coli cells (XL1-Blue or JM109) with the entire TOPO cloning reaction. Spread the transformed E. coli onto an LB plate containing Km, IPTG and X-gal.
4. Recover plasmids from white colonies by starting an overnight LB culture and purifying the plasmid DNA using a quick plasmid purification kit, for example FastPlasmid Mini.
5. Obtain small RNA sequences in the TOPO pCR4 vector by sequencing with M13 reverse primer by the dideoxynucleotide chain-termination method using the BigDye Terminator v3.1 Cycle Sequencing Kit and a DNA sequencer. One clone should have 3–12 concatamerized small RNA sequences. An example of sequencing results is shown in Fig. 4.
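Once clones have been sequenced, the small RNA inserts have to be pulled out of each concatamer read. The following is a minimal sketch of one way to do this, assuming each insert is flanked by the 5′ adapter upstream and the 3′ adapter downstream. The two adapter sequences, the BanI-like spacer used in the usage example, and the function names are hypothetical placeholders, not the sequences of this protocol; substitute the adapters actually used. Because the concatamer may be cloned in either orientation, the reverse complement is scanned when the forward read yields nothing.

```python
import re

# Hypothetical placeholder adapters (DNA, as read from the sequencer).
# They are NOT taken from this protocol; replace with the real adapters.
ADAPTER_5 = "ACGGAATTCCTCACT"
ADAPTER_3 = "CTGTAGGCACCATCA"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def extract_inserts(read, adapter5=ADAPTER_5, adapter3=ADAPTER_3,
                    min_len=15, max_len=30):
    """Return all inserts found between adapter5 and adapter3 in a read."""
    read = read.upper()
    pattern = re.compile(
        re.escape(adapter5)
        + "([ACGTN]{%d,%d}?)" % (min_len, max_len)  # shortest insert wins
        + re.escape(adapter3)
    )
    inserts = pattern.findall(read)
    if not inserts:
        # The concatamer may have been cloned in the opposite orientation.
        inserts = pattern.findall(reverse_complement(read))
    return inserts


# Usage: two units joined by a BanI-type site (GGYRCC), inserts as DNA.
unit_a = ADAPTER_5 + "TGTGGCCGAGGATGTTTCCGT" + ADAPTER_3
unit_b = ADAPTER_5 + "TTGTGGCCGAGGATGTTTCCGTCC" + ADAPTER_3
example_read = "GGTGCC" + unit_a + "GGTGCC" + unit_b
example_inserts = extract_inserts(example_read)
```

The length window (15–30 nt here) is an arbitrary choice for illustration; it should be adjusted to the size fraction that was excised from the gel.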
Fig. 4. An example of sequencing results. When reading sequences using M13 Reverse primer, small RNA sequences derived from plant materials are obtained between the 5′ and 3′ adapters. BanI restriction sites are underlined. Usually, 3–12 small RNAs are obtained in one plasmid
3.5.2. Analysis of Cloning Results
Small RNA sequences are mapped to the genome by Blastn searches (TAIR-BLAST for Arabidopsis: http://www.arabidopsis.org/Blast/index.jsp; NCBI-BLAST: http://blast.ncbi.nlm.nih.gov/Blast.cgi). TAIR SeqViewer (http://www.arabidopsis.org/servlets/sv) is also helpful for mapping small RNAs in Arabidopsis. Small RNAs are mainly categorized into groups such as miRNA, tasiRNA, gene (sense, antisense, or both), intergenic region, rRNA, tRNA, snRNA, and transposon. However, not all small RNAs can be mapped. In our study, 25–35% of small RNAs remained unknown (9). The cloning frequency of a small RNA is thought to correlate with its abundance to some extent. Northern blot analysis can be used to assess abundance more directly.
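Because cloning frequency is used as a rough proxy for abundance, it can help to tally how often each sequence was cloned and what share of clones falls into each category. The sketch below is illustrative only: the function names and the idea of passing one category label per clone are assumptions, and the category labels would come from the Blastn-based annotation described above.

```python
from collections import Counter


def cloning_frequencies(inserts):
    """Map each sequence to (clone count, fraction of all clones).

    Sequences are upper-cased and U is converted to T so that RNA- and
    DNA-style spellings of the same small RNA are counted together.
    """
    counts = Counter(seq.upper().replace("U", "T") for seq in inserts)
    total = sum(counts.values())
    return {seq: (n, n / total) for seq, n in counts.most_common()}


def category_fractions(categories):
    """Return the fraction of clones assigned to each annotation category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


# Usage with made-up data: three clones, two of them the same sequence.
example_freqs = cloning_frequencies(["TGTGG", "UGUGG", "aaaac"])
example_cats = category_fractions(["miRNA", "miRNA", "unknown", "tRNA"])
```

These fractions describe the clone library only; as noted in the text, they correlate with abundance to some extent and should be confirmed by an independent method.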
4. Notes
1. Dephosphorylating small RNA samples prior to adapter ligation, to prevent small RNAs from ligating to one another, is optional. Although we omitted dephosphorylation, we did not obtain any sequences indicating that small RNAs had been ligated to themselves.
2. When handling RNA, protect all reagents from RNase contamination. All solutions should be made using commercial RNase-free water or DEPC-treated water until RNA is reverse-transcribed to cDNA. Bench tops and pipettes should be treated with RNase Away (Fisher Scientific, Pittsburgh, PA).
3. 5′ phosphorylated RNA oligos (21 and 24 nt in length). Any sequence can be used unless it includes biased nucleotide compositions or sequences. For example, 21 nt GFP: p-UGUGGCCGAGGAUGUUUCCGU and 24 nt GFP: p-UUGUGGCCGAGGAUGUUUCCGUCC.
4. The 5′ end of the 3′ adapter is phosphorylated, and its 3′ end is modified with amination (NH2(C6)) to block the hydroxyl group and prevent ligation of the 5′ end of the small RNA and the 3′ end of the adapter. We used amination, but other modifications (e.g. biotinylation) that block ligation can be used. Commercial kits for cloning small RNAs, such as the small RNA Cloning Kit (Takara Bio) and the DynaExpress miRNA Cloning Kit II (BioDynamics Laboratory, Tokyo, Japan), include adapters and all reagents. They are convenient for starting up a cloning experiment.
5. If the upper phase is not transparent or the white layer at the interface is contaminated, transfer the upper aqueous phase to a new 15 mL tube and add an equal volume of a phenol/chloroform mixture (1:1). Vortex briefly and centrifuge at 4,500 × g for 10 min at 4°C. Then, transfer the upper phase to a new 15 mL tube.
6. A 15% polyacrylamide gel (with 7 M urea or without urea): mix 1 mL of 5× TBE buffer, 7.5 mL of 40% acrylamide/bis solution, and 4.2 g urea (omit for a gel without urea). Make up to 10 mL with RNase-free water. Incubate at 50°C until the urea dissolves, then leave at room temperature to cool down. Add 50 µL of 20% APS and 5 µL of TEMED. Mix well by inversion, pour between glass plates, and wait for 30 min until the gel has solidified.
7. The gel is stained in 1 µg/mL EtBr solution for 20 min on a shaker. Handle with extreme care, wearing disposable nitrile gloves. EtBr intercalates between bases of DNA and RNA. DNA or RNA bands are visualised by exposure on a UV light trans-illuminator.
8. Phenol/chloroform and chloroform extractions: add an equal volume of a phenol/chloroform mixture (1:1), vortex briefly and centrifuge at 15,000 × g for 10 min. Move the upper phase to a new 1.5 mL tube. Add an equal volume of chloroform, vortex briefly and centrifuge at 15,000 × g for 10 min. Then move the upper phase to a new 1.5 mL tube. These procedures should be performed in a chemical fume hood.
9. From this step onward, do not let the pellet dry at any step, to avoid DNA denaturation. To prevent drying, prepare in advance the solution that will be added after the supernatant is removed.
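The gel recipe in Note 6 can be sanity-checked with two simple relations: molarity from a weighed mass, and final concentration after dilution of a stock. The sketch below is a worked check under stated assumptions: the molar mass of urea (60.06 g/mol) and the helper names are not from the protocol. With the quantities as printed, 4.2 g urea in 10 mL gives about 7 M and 1 mL of 5× TBE gives 0.5×, whereas 7.5 mL of a 40% acrylamide stock in 10 mL works out to 30%; it may therefore be worth confirming the stock concentration or volume against the intended 15% before casting.

```python
UREA_MW = 60.06  # g/mol, molar mass of urea (assumed constant, not from the text)


def molarity(mass_g, mw_g_per_mol, volume_ml):
    """Molar concentration of a solute weighed into a final volume."""
    return mass_g / mw_g_per_mol / (volume_ml / 1000.0)


def final_concentration(stock_conc, stock_volume_ml, final_volume_ml):
    """Concentration after diluting a stock (same units as stock_conc)."""
    return stock_conc * stock_volume_ml / final_volume_ml


def stock_volume_needed(target_conc, stock_conc, final_volume_ml):
    """Stock volume (mL) needed to reach target_conc in final_volume_ml."""
    return target_conc * final_volume_ml / stock_conc


# Quantities as printed in Note 6, for a 10 mL gel.
urea_molar = molarity(4.2, UREA_MW, 10.0)               # about 6.99 M
tbe_x = final_concentration(5.0, 1.0, 10.0)             # 0.5x TBE
acrylamide_pct = final_concentration(40.0, 7.5, 10.0)   # 30% as printed
stock_for_15_pct = stock_volume_needed(15.0, 40.0, 10.0)  # 3.75 mL
```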
Acknowledgments
We thank Toshiaki Watanabe (National Institute of Genetics) for helpful advice on the detailed cloning methods.
References
1. Ramachandran V, Chen X (2008) Small RNA metabolism in Arabidopsis. Trends Plant Sci 13:368–374
2. Willmann MR, Poethig RS (2007) Conservation and evolution of miRNA regulatory programs in plant development. Curr Opin Plant Biol 10:503–511
3. Mlotshwa S, Pruss GJ, Vance V (2008) Small RNAs in viral infection and host defense. Trends Plant Sci 13:375–382
4. Okamura K, Chung WJ, Ruby JG, Guo H, Bartel DP, Lai EC (2008) The Drosophila hairpin RNA pathway generates endogenous short interfering RNAs. Nature 453:803–806
5. Ghildiyal M, Seitz H, Horwich MD, Li C, Du T, Lee S et al (2008) Endogenous siRNAs derived from transposons and mRNAs in Drosophila somatic cells. Science 320:1077–1081
6. Watanabe T, Totoki Y, Toyoda A, Kaneda M, Kuramochi-Miyagawa S, Obata Y et al (2008) Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453:539–543
7. Tam OH, Aravin AA, Stein P, Girard A, Murchison EP, Cheloufi S et al (2008) Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453:534–538
8. Kurihara Y, Inaba N, Kutsuna N, Takeda A, Tagami Y, Watanabe Y (2007) Binding of tobamovirus replication protein with small RNA duplexes. J Gen Virol 88:2347–2352
9. Tagami Y, Inaba N, Kutsuna N, Kurihara Y, Watanabe Y (2007) Specific enrichment of miRNAs in Arabidopsis thaliana infected with Tobacco mosaic virus. DNA Res 14:227–233
10. Pfeffer S, Lagos-Quintana M, Tuschl T (2005) Cloning of small RNA molecules. Curr Protoc Mol Biol Chapter 26, Unit 26.4
Chapter 12
Genome-Wide Mapping of Protein-DNA Interaction by Chromatin Immunoprecipitation and DNA Microarray Hybridization (ChIP-chip). Part A: ChIP-chip Molecular Methods
Julia J. Reimer and Franziska Turck
Abstract
Chromatin immunoprecipitation in combination with DNA-microarray hybridization (ChIP-chip) allows the identification of chromatin regions that are associated with modified forms of histones on a genomic scale. The ChIP-chip workflow consists of the following steps: generation of biological material, in vivo formaldehyde-fixation of protein-DNA and protein-protein interactions, chromatin preparation and shearing, immunoprecipitation of chromatin with specific antibodies, fixation reversal and DNA purification, DNA amplification, microarray hybridization, and data analysis. In Part A of this chapter, we describe molecular methods of the experimental procedure employed to identify chromosomal regions of Arabidopsis thaliana associated with H3K27me3. In addition, some general information on the microarray platform from Roche-NimbleGen will be provided. Part B of this chapter focuses on ChIP-chip data analysis of H3K27me3 on the Roche-NimbleGen platform.
Key words: ChIP-chip, Linker-mediated PCR, Amplification, Two-color microarray, Roche-NimbleGen, H3K27me3
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_12, © Springer Science + Business Media, LLC 2010
1. Introduction
Gene expression states are reflected in the occurrence of specific local histone modification patterns that can be the basis of molecular memory. In Arabidopsis, in particular, dimethyl-lysine 9 of histone 3 (H3K9me2) and trimethyl-lysine 27 of histone 3 (H3K27me3) are correlated with distinct and nonoverlapping genetic pathways of epigenetic gene repression (1). The combination of chromatin immunoprecipitation (ChIP) with DNA microarray (ChIP-chip) or high-throughput sequencing technologies (ChIP-seq) has provided powerful tools to map local histone modifications at a genomic level. In this combined chapter, we describe a method to identify genomic targets of H3K27me3 in Arabidopsis thaliana by ChIP-chip experiments based on a commercially available microarray from Roche-NimbleGen. However, the method can easily be adapted to probe for other histone modifications. The full workflow for the outlined ChIP-chip experiment encompasses five discrete steps: (1) Formaldehyde fixation (crosslinking) of Arabidopsis seedlings followed by ChIP using α-H3K27me3 antibodies. (2) Amplification of immunoprecipitated (IP) and input DNA. (3) Probe labeling and array hybridization. (4) Data normalization and target identification. (5) Meta-analysis (Fig. 1).
[Figure 1: flowchart of the ChIP-chip workflow. Part A, Molecular Methods: Step 1, formaldehyde-mediated cross-linking and chromatin immunoprecipitation; Step 2, DNA amplification and quality control; Step 3, sample labeling and microarray hybridization. Part B, Data Analysis: Step 4, data normalization and target definition; Step 5, meta-analysis.]
Fig. 1. A general overview of worksteps in ChIP-chip. The first two steps, chromatin immunoprecipitation and DNA-amplification, are described in Part A (ChIP-chip Molecular Methods) of this chapter. The third step, microarray hybridization, is usually in the hands of a service provider, and therefore an experimental procedure is not included in this chapter. However, we provide some details on the Roche-NimbleGen microarray platform in the introduction to Part A. Part B of this chapter (ChIP-chip data analysis) is dedicated to data processing and target identification. Analysis of genomic data includes many approaches of cross-database comparison (Meta-analysis)
[Figure 2: flowchart. Boxes, in order: Grow seedlings; Formaldehyde-crosslink and chromatin preparation (see section 3.1); Quality check sonication: pilot de-crosslinking of chromatin (see section 3.2); ChIP (see section 3.3); Quality check ChIP: control PCR (see section 3.3.14); First round of LM-PCR (see section 3.4); Quality check amplification: real-time PCR (see section 3.5); Second round of LM-PCR (see section 3.4); Quality check amplification: real-time PCR (see section 3.5); DNA labeling and microarray hybridization. Annotations: 50–200 µl per sample; 20 µl per input sample; 40 µl per IP sample; 200 ng DNA per sample; 6 µg per probe; one element is marked optional.]
Fig. 2. A detailed flowchart of the procedure (ChIP-chip Molecular Methods)
These five main steps are further subdivided into different tasks. Typically, the first two steps of the ChIP-chip workflow are in the hands of an experimental biologist. Part A of this chapter (ChIP-chip Molecular Methods) describes molecular methods in detail (Fig. 2). Step 3, probe labeling and array hybridization, is typically carried out by a service provider; therefore, procedures will not be described here. However, the introduction to Part A presents some important details on probe labeling and the Roche-NimbleGen microarray platform. ChIP-chip data analysis, step 4 of the workflow, requires at least moderate training in bioinformatics. Therefore, ChIP-chip data analysis usually represents a bottleneck in an experimentally working laboratory. Part B of this chapter describes some basic principles of ChIP-chip data analysis. Part B also introduces the reader to the R statistical programming language and provides a package of commands written in R that enables ChIP-chip data analysis without expertise in bioinformatics. In order to extract biological sense from genome-scale ChIP-chip data, they need to be weighed against biological knowledge stored in other relevant genome-scale databases. Some aspects of cross-database analysis, also termed meta-analysis, are introduced in the chapter “Meta-analysis of ChIP-chip data.”
1.1. Outline of Molecular Methods
The ChIP step of the ChIP-chip workflow (Figs. 1, 2) has already been a subject of many method and review publications (2–4). In brief, the biological material is fixed by the addition of a crosslinking agent (usually formaldehyde (FA)) that freezes protein-DNA interactions by the formation of covalent chemical bonds. Chromatin is prepared from the fixed material and fragmented into uniform, smaller sized pieces. Protein-DNA complexes are purified by immunoprecipitation with the help of specific antibodies that recognize a DNA-binding protein or histone modification of interest. Covalent protein-DNA bonds are subsequently reversed, and the DNA is purified for further analysis. In direct ChIP, the DNA is probed via locus specific PCR reactions. Here, the DNA present in a specific ChIP sample is quantified in relation to the input DNA (the starting material), and the amount of precipitated material is compared between enriched and nonenriched positions in the genome. For ChIP-chip experiments, the DNA from a specific ChIP sample needs to be amplified because the amount of DNA recovered is several orders of magnitude below that required for microarray hybridization. Several methods suited for global DNA amplification have been used in ChIP-chip experiments (5, 6). No matter which protocol is employed, amplification introduces skewing of the relative concentration of DNA fragments present in the original precipitate. Therefore, it is necessary to amplify the input DNA in parallel to the ChIP-sample, although the amount of input DNA that can be recovered may allow direct usage in microarray hybridization.
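The quantification of a ChIP sample relative to input, and the comparison between enriched and nonenriched positions, can be written as a short calculation. The sketch below uses the widely used percent-of-input calculation for real-time PCR data; it is an assumption that Ct values are available and that amplification efficiency is close to 100%, and the function names and example numbers are purely illustrative.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input DNA recovered in the IP at one locus.

    ct_ip          -- Ct of the immunoprecipitated sample
    ct_input       -- Ct of the input aliquot that was measured
    input_fraction -- fraction of the chromatin represented by that aliquot
                      (e.g. 0.01 if 1% of the IP starting material was used)
    Assumes one PCR cycle corresponds to a twofold change in template.
    """
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def fold_enrichment(pct_target, pct_control):
    """Ratio of recovery at an expected target versus a nonenriched locus."""
    return pct_target / pct_control


# Illustrative numbers only: a target locus and a nonenriched control locus.
target_pct = percent_input(ct_ip=23.0, ct_input=25.0, input_fraction=0.01)
control_pct = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.01)
enrichment = fold_enrichment(target_pct, control_pct)
```

Expressing every locus as a fraction of its own input removes locus-specific differences in primer efficiency and copy number, which is why the comparison between target and control positions is made on these normalized values.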
1.2. Technical Considerations Per Experimental Step
1.2.1. Formaldehyde Cross-Linking and ChIP
Histone-DNA interactions are relatively stable, and ChIP of histone modifications does not necessarily require chemical fixation prior to chromatin preparation (7). The covalent fixation enhances the stability of histone-DNA interactions and therefore allows for a higher stringency of the washing steps during immunoprecipitation. However, the fixation also causes epitope masking, thereby reducing the efficiency of antibody recognition and target precipitation. As a compromise, fixation times should not exceed 30 min at room temperature in 1% FA. In general, chromatin suitable for ChIP does not require high purification, and several published protocols omit nuclear enrichment in its preparation. However, we have found that crude nuclear enrichment enhances the precipitation efficiency of plant chromatin. Nuclear enrichment of fixed material can be performed in high-detergent buffer, which is directly compatible with the immunoprecipitation step. We prefer a method compatible with the isolation of unfixed nuclei, as this method preserves the nuclear protein composition and is more compatible with western-blot analysis. Prior to immunoprecipitation, the DNA needs to be sheared by sonication to obtain fragments of uniform length around 500 bp. While sonication is sometimes difficult to control, its reproducibility is particularly important for quantitative extract-to-extract comparisons. Properly cross-linked Arabidopsis nuclei will not break without relatively high concentrations of SDS. To avoid undesirable foaming during sonication, in our protocol nuclei are first cracked in high SDS and then diluted to become more sonication-compatible. The use of high-energy bath sonicators greatly improves the reproducibility of the sonication step and is recommended, if available. The quality of antibodies against histone modifications is a further critical issue. Antibody cross-reactivity toward other modified forms of the epitope should be minimal or at least controlled by peptide competition assays. In addition, antibodies must be compatible with the denaturing and fixative conditions used in the ChIP assay. Several companies offer ChIP-grade antibodies against histone modifications, which have been tested to various degrees to comply with these criteria (e.g., Abcam, Millipore-Upstate). In particular, the researcher should be aware of the available data on cross-reactivity of a given antibody.
1.2.2. Amplification
Different strategies have been described to amplify genomic DNA. These strategies are based on random amplification or amplification after linker ligation, or they encompass an RNA-mediated amplification step. The key is to ensure that the amplification remains proportional for all parts of chromatin. Thus, even the best theoretical method can go wrong if this step is not controlled. We use a linker-mediated PCR (LM-PCR) approach, which is a modified version of the method first described by Ren et al. (8). DNA samples are polished (a.k.a. blunted) and ligated to a phosphorylated oligonucleotide linker. The material is PCR-amplified in two successive runs of PCR with linker-specific primers. The amplification is not expected to be in the linear range, but it should be in the semi-quantitative (proportional) range for all fragments. In each run, the quality of amplification is assessed by real-time PCR based on control primer pairs. These control primer pairs amplify genomic regions that are known targets of H3K27me3, the histone modification recognized by the antibodies used in the sample protocol. In addition, other control regions are amplified that are not enriched in H3K27me3. It is not advisable to start a full-scale ChIP-chip experiment without knowing at least some expected targets and nontargets. Rigorous control of LM-PCR-mediated DNA amplification is crucial to obtain interpretable data in the next steps. In particular, it should be noted that ChIP samples and input reference samples amplified in parallel are very different in complexity and therefore behave differently in the PCR. Consequently, optimization of PCR amplification should be carried out separately for input and ChIP sample DNA.
1.2.3. Microarray Platform
Several DNA microarray platforms are available for Arabidopsis whole-genome analysis. In our workflow, we use the commercially available two-color 385 K whole-genome tiling array for Arabidopsis from Roche-NimbleGen. In this platform, a set of three slides with 385,000 probes per slide (1,155,000 probes in total) covers the whole genome. Each probe is a 50-mer oligonucleotide. The median probe spacing is 90 bp, but this spacing can be considerably larger within repeat regions. The microarray is hybridized in a single step with two differentially color-labeled probes: a ChIP sample and a reference. The reference can be nonprecipitated (but amplified) input DNA or a ChIP precipitate purified with antibodies against unmodified histones. Unlabeled DNA fluoresces slightly in the green channel; it is therefore preferable to label the reference sample with a green dye (e.g., Cy3) and the IP sample with a red one (e.g., Cy5). After hybridization, the arrays are scanned to create a pixel-intensity image that is translated to an intensity-value file using Nimblescan software. While acquiring the image, the photomultiplier tube (PMT) gain of the scanner should be adjusted for both channels so that they generate pictures of similar intensities. Nimblescan generates several tab-delimited text files that contain intensity data. Raw data files with the .pair extension contain probe position identifiers together with raw intensity data (on a linear scale) for one slide and one label. In addition to raw data files, Nimblescan generates text files in the General Feature Format 3 (GFF3). These .gff files contain log2 ratios of intensities from two .pair files that correspond to samples hybridized to the same array. GFF3 files also contain position information for each probe as referenced to the Arabidopsis genome sequence. ChIP-chip results in GFF3 format can be visualized with the SignalMap browser from Roche-NimbleGen.
A free 30-day demo version of SignalMap can be downloaded from the Roche-NimbleGen homepage (http://www.nimblegen.com/products/software/signalmap.html). Part B of the chapter “Genome-wide mapping of protein-DNA interaction by chromatin immunoprecipitation and DNA microarray hybridization (ChIP-chip)” includes sample data that were generated using the method described in this part. SignalMap can be used to display the raw data and the results of the more sophisticated data analysis steps that are described in the following chapter.
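To make the relationship between the two intensity channels and the .gff output concrete, the sketch below computes a log2 ratio from an IP and a reference intensity and reads one feature line of a generic nine-column GFF3 file. It is an illustration under assumptions: the column layout is that of the GFF3 format in general, the idea that the score column carries the log2 ratio is assumed here, and the example line (including the source and attribute values) is invented rather than copied from a Nimblescan file.

```python
import math


def log2_ratio(ip_intensity, reference_intensity):
    """log2 of IP (e.g. Cy5) over reference (e.g. Cy3) raw intensity."""
    return math.log2(ip_intensity / reference_intensity)


def parse_gff3_line(line):
    """Parse one nine-column GFF3 feature line; return None for comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqid, _source, _type, start, end, score, _strand, _phase, attributes = fields
    return {
        "seqid": seqid,
        "start": int(start),
        "end": int(end),
        # Assumed here to hold the log2(IP/reference) ratio of the probe.
        "score": None if score == "." else float(score),
        "attributes": attributes,
    }


# Invented example line for a 50-mer probe.
example_line = "chr4\texample\tprobe\t42685\t42734\t1.58\t.\t.\tID=probe1\n"
example_probe = parse_gff3_line(example_line)
example_ratio = log2_ratio(4000.0, 1000.0)
```

A positive log2 ratio indicates more signal in the IP channel than in the reference channel for that probe; whether a region counts as a target is decided only after normalization, which is the subject of Part B.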
2. Materials
Precise ordering information is only indicated if the source of supply is likely to have an impact on the experimental outcome.
2.1. Formaldehyde Cross-Linking and Chromatin Preparation
1. 37% Formaldehyde (FA) stabilized in 10% methanol (Merck) (see Note 1). Caution: Formaldehyde is toxic, allergenic, and a possible carcinogen. Therefore, handle in a fume hood and dispose of waste according to safe laboratory procedures.
2. 1 M glycine.
3. Plant Proteinase Inhibitor Cocktail (PPIC, Sigma-Aldrich).
4. Phosphate buffered saline (PBS): 135 mM NaCl, 2.7 mM KCl, 8 mM Na2HPO4·2H2O, 1.5 mM KH2PO4. Adjust the pH to 7.4 with HCl.
5. Nuclei Isolation Buffer (NIB): 50 mM Hepes/NaOH (pH 7.4), 5 mM MgCl2, 25 mM NaCl, 5% sucrose, 30% glycerin, 0.25% Triton X-100. Store at 4°C and add 0.1% β-mercaptoethanol (β-ME) and 0.1% PPIC prior to use (see Note 2). Caution: β-ME is toxic. Therefore, handle in a fume hood and dispose of waste according to safe laboratory procedures.
6. 3× Wash Buffer (WB) stock: 50 mM Hepes/NaOH (pH 7.4), 20 mM MgCl2, 100 mM NaCl, 40% sucrose, 40% glycerin. Store at 4°C. Make 1× WB by mixing 60 ml 3× WB stock with 120 ml H2O and 0.45 ml Triton X-100. Add 180 µl β-ME and 180 µl PPIC prior to use. Caution: β-ME is toxic. Therefore, handle in a fume hood and dispose of waste according to safe laboratory procedures.
7. TE: 10 mM Tris/HCl (pH 7.4) and 1 mM EDTA (pH 8.0).
8. 0.5% SDS in TE (0.5% SDS/TE). Caution: SDS is highly flammable and harmful by inhalation. Therefore, handle in a fume hood.
9. Liquid N2.
10. Nylon mesh, 70 µm and 20 µm.
11. Bioruptor (Diagenode) or tip sonicator with microtip (Branson).
12. Cooled centrifuge for 50 ml plastic tubes and 1.5 ml tubes.
13. Vacuum bell and pump.
14. Rotation mixer.
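The 1× WB recipe in item 6 is given for 180 ml, but the volume needed depends on the number of samples processed in parallel. The sketch below scales the recipe linearly to any final volume. It assumes that the β-ME and PPIC additions are 180 µl each per 180 ml (0.1%, matching the NIB recipe in item 5); the dictionary keys and function name are illustrative.

```python
# Reference recipe for 180 ml of 1x WB (item 6), all volumes in ml.
# beta-ME and PPIC are assumed to be 180 µl (0.18 ml) each, i.e. 0.1%.
WB_1X_REFERENCE_ML = {
    "3x WB stock": 60.0,
    "H2O": 120.0,
    "Triton X-100": 0.45,
    "beta-ME": 0.18,  # add just before use
    "PPIC": 0.18,     # add just before use
}
REFERENCE_VOLUME_ML = 180.0


def scale_wb(final_volume_ml):
    """Return component volumes (ml) for the requested volume of 1x WB."""
    factor = final_volume_ml / REFERENCE_VOLUME_ML
    return {name: round(vol * factor, 3) for name, vol in WB_1X_REFERENCE_ML.items()}


# Usage: about 21 ml of 1x WB per sample, four samples, plus some excess.
wb_for_four_samples = scale_wb(90.0)
```

Scaling by the nominal 180 ml keeps the component ratios identical to the printed recipe; the small additions of Triton X-100, β-ME and PPIC are ignored in the volume total, as they are in the original.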
2.2. De-crosslinking of Chromatin and Control PCR
1. 5 mg/ml Proteinase K in H2O.
2. 10 mg/ml RNase A.
3. Phenol/TE and Phenol/Chloroform/Isoamylalcohol (25/24/1). Caution: Phenol/TE and Phenol/Chloroform/Isoamylalcohol are toxic. Therefore, handle in a fume hood and dispose of waste according to safe laboratory procedures.
4. 3 M sodium acetate (NaOAc).
5. 20 mg/ml glycogen.
6. 100% EtOH. Caution: EtOH is highly flammable.
7. 70% EtOH. Caution: EtOH is highly flammable.
8. PCR primer pairs, see Table 1.
2.3. ChIP
1. 10% SDS. Caution: SDS is highly flammable and harmful by inhalation. Therefore, handle in a fume hood.
2. 1 M Dithiothreitol (DTT).
3. rProtein A Sepharose™ FastFlow, 50% slurry (GE Healthcare).
4. 1 M Tris/HCl (pH 9.7).
5. Antibodies: α-rat IgG (whole molecule), from rabbit (R9255, Sigma-Aldrich); α-H3K27me3, from rabbit (07-499, Millipore-Upstate).
6. Taq polymerase (4 U/µl, home made).
7. 0.25% SDS in TE (0.25% SDS/TE). Caution: SDS is highly flammable and harmful by inhalation. Therefore, handle in a fume hood.
8. Immunoprecipitation Dilution Buffer (IP-dil): 80 mM Tris/HCl (pH 7.4), 230 mM NaCl, 1.7% NP40 and 0.17% Deoxycholate (DOC).
9. RIPA: for 100 ml, take 60 ml IP-dil and add 1 ml 10% SDS and 39 ml H2O. Caution: SDS is highly flammable and harmful by inhalation. Therefore, handle in a fume hood.
10. Glycine Elution Buffer (GEB): 100 mM glycine (pH 2.5), 0.5 M NaCl, 0.05% SDS. Caution: SDS is highly flammable and harmful by inhalation. Therefore, handle in a fume hood.
11. All solutions used in Subheading 2.2.
2.4. Linker-Mediated Amplification (LM-PCR)
1. T4 DNA Polymerase (3,000 U/ml) (New England BioLabs).
2. 10 mg/ml bovine serum albumin (BSA) in H2O (provided with the T4 DNA Polymerase from New England BioLabs).
3. 10 mM dNTP.
4. T4 DNA Ligase (400,000 U/ml) (New England BioLabs).
Table 1 PCR primer pairs
Primer pair | Sequence forward primer | Chr start position | Sequence reverse primer | Chr start position
Primer pair 1 | TTTTCCACCAACTTCTTGCAT | Chr 4 5198381 | GGTCCGCTTGTACCCAAGTA | Chr 4 5198527
Primer pair 2 | AGGGCGAGATACATGTGGAC | Chr 4 1675304 | TCGCAAGTTCTAGCCGATTT | Chr 4 1675504
Primer pair 3 | CACCTGCCGTTTCAAGAACT | Chr 4 42685 | CCGCCGTAACGTAAGGATAA | Chr 4 42862
Primer pair 4 | GGCCACATTGTTGGTAGCTT | Chr 4 12026594 | CACCAGTGCTACTGCTAGGC | Chr 4 12026788
Primer pair 5 | TAATTGAGCCACGACATTGC | Chr 1 24335693 | AGGCTGGCTTGAATATCAGAA | Chr 1 24335992
5. 100 µM oligo 1 (5′-GCGGTGACCCGGGAGATCTGAATTC-3′, HPLC-purified), 100 µM oligo 2 (5′-GAATTCAGATC-3′, HPLC-purified).
6. 1 M Tris/HCl (pH 9.7).
7. TaKaRa LA Taq (5 U/µl) (TaKaRa Bio Inc).
8. 2.5 mM dNTP (TaKaRa Bio Inc, provided with the TaKaRa LA Taq).
9. NucleoSpin® Extract II (MACHEREY-NAGEL).
10. Heating/cooling block.
2.5. Quality Control for Linker-Mediated Amplification
1. SYBR Green Supermix.
2. Control PCR primer pairs as in Subheading 2.2, item 8.
3. Methods
An overview of the experimental steps is given as a flowchart in Fig. 2.
3.1. Formaldehyde Crosslink and Chromatin Preparation
This part needs about half a day of work. 1. Harvest 10 day-old seedlings from two or three 9 cm plates (1–3 g wet weight) into 50 ml PBS. Add 1.35 ml of 37% FA and mix well. 2. Apply vacuum in a vacuum bell for 5 min, then slowly release vacuum and incubate for 5 min. Thereby, twirl seedlings to mix–seedlings should be suspended in liquid and not floating. Repeat the procedure once (see Note 3). 3. Quench the cross-linking reaction by adding 5.5 ml 1 M glycine solution to a final concentration of 0.1 M glycine and incubate for 3 min. 4. Harvest the plants by filtration over filter paper and funnel, blot the material dry with filter paper or Kleenex, wrap in aluminum foil, and freeze immediately in liquid N2. The material should be stored at −80°C until further needed. 5. Grind cells to fine powder in a mortar and pestle in liquid N2. Transfer frozen tissue powder to a 50 ml screw-cap tube using a cooled spatula and add NIB to the 30 ml mark. Ensure that the powder is well dispersed in the liquid. Thaw on ice while processing up to 10 parallel samples. 6. Filter through a double layer of 70 mm and 20 mm nylon meshes. Place 70 mm mesh above 20 mm mesh and place the double layer into a filter funnel. Funnel the filtrate through
the double layer into a fresh 50 ml tube. Twirl the mesh to press the filtrate through the filter. Rinse the nylon meshes and adjust the volume to 30 ml with NIB.
7. Centrifuge for 20 min at 4°C, 3,000×g. Resuspend the pellet gently in 1 ml of 1× WB, and then add more 1× WB to the 20 ml mark.
8. Centrifuge for 20 min at 4°C, 3,000×g. Resuspend the pellet in 0.5–1 ml of 0.5% SDS/TE and transfer to a 15 ml tube.
9. Incubate for 20–60 min on a rotation mixer at 4°C.
10. Add 1 volume of TE buffer (see Note 4).
11. Place the 15 ml tubes in a Bioruptor adapter and sonicate for ten cycles (1 cycle = 30 s sonication, 1 min off; power setting = high) in the Bioruptor. Add ice to the water in the Bioruptor to ensure cooling. (Alternative: sonicate in an ice-EtOH bath with a Branson microtip sonicator at position 3–5, constant 50% duty cycle, using four to five 30 s pulses separated by 1 min of cooling on ice-EtOH; take care that the sample does not freeze during cooling.)
12. Centrifuge for 15 min at 4°C, 3,000×g. Transfer the supernatant to 2 ml tubes and freeze in liquid N2. The extract (1–2 ml) can be stored for several weeks at −80°C. It tolerates several freeze-thaw cycles, so there is no need to aliquot.
3.2. Pilot De-Crosslinking of Chromatin
This part checks the quality of the chromatin preparation. It takes about half a day plus one overnight incubation.
1. Add 150 µl of 0.25% SDS/TE and 5 µl of 5 mg/ml Proteinase K to a 50 µl aliquot of the sonicated chromatin from Subheading 3.1, step 12. Incubate this preparation at 37°C for 4–5 h.
2. Increase the temperature to 65°C for at least 6 h (see Note 5).
3. Extract with phenol/TE: add 200 µl of phenol/TE, vortex for 1 min, and centrifuge for 5 min at 20,000×g. Transfer the aqueous phase to a fresh tube.
4. Extract with 200 µl of phenol/chloroform/isoamyl alcohol (25/24/1) as described above for the extraction with phenol/TE.
5. Add 1 µl of RNase A (10 mg/ml) to the aqueous phase and incubate for 15 min at room temperature (see Note 6).
6. Add 20 µl of 3 M NaOAc, 500 µl of 100% EtOH, and 1 µl of 20 mg/ml glycogen, and precipitate at −20°C overnight.
7. Pellet the DNA by centrifugation for 15 min at 4°C, 20,000×g.
8. Wash the pellet once with 500 µl of 70% EtOH and centrifuge for 15 min at 4°C, 20,000×g.
9. Remove the supernatant and let the pellet dry at room temperature for at least 10 min.
10. Dissolve the pellet in 100 µl of H2O.
11. Quantify the DNA in a spectrophotometer at OD260 without further dilution (measure in a disposable UV-compatible plastic cuvette and recover the sample afterward, or use a NanoDrop). The typical estimated DNA concentration varies between 20 and 70 ng/µl. If the DNA concentration falls below 20 ng/µl, the chromatin preparation was not successful, and the experiment should be repeated (see Note 7).
12. Analyze the size distribution of the fragments on a 1% agarose gel by loading 20–30 µl of DNA. The bulk of the DNA, which runs as a smear, should be around 0.3–1 kb (Fig. 3a). If the DNA is too large or only partially sheared, resonicate the sample and repeat the pilot de-crosslinking.
3.3. ChIP
All steps are performed in 1.5 ml tubes on ice. Hands-on time is short, but the procedure includes three overnight incubations.
1. Use nuclear extract equivalent to 200–400 ng of DNA based on the OD260 measurements (usually 50–100 µl of chromatin preparation) and equalize the samples to a total volume of 200 µl with 0.25% SDS/TE. Add 300 µl of IP-dil, 1 µl of 1 M DTT, and 1 µl of PPIC per sample.
2. Add 5 µl of α-H3K27me3 antibodies to each specific ChIP and 1 µl of α-rat IgG to the control ChIP reactions. Incubate overnight at 4°C while mixing on a rotation mixer (see Note 8).
3. Centrifuge for 15 min at 4°C, 20,000×g and transfer the supernatant to a fresh tube (see Note 9).
4. Wash the rProtein A Sepharose FastFlow: add 1 ml of RIPA to N × 30 µl of rProtein A Sepharose (50% slurry), where N is the number of samples; e.g., if N = 4, use 120 µl of rProtein A Sepharose in this step. Mix by inversion, centrifuge for 1 min at 20,000×g, and discard the supernatant. Repeat the washing step two times. Add N × 15 µl of RIPA after the last wash step.
5. Add 30 µl of washed rProtein A Sepharose (50% slurry) to each sample. Incubate for 1–2 h at 4°C while mixing on a rotation mixer (see Note 10).
6. Centrifuge for 1 min at 4°C, 20,000×g. Save all supernatants for further use as input samples (see Note 11).
7. Wash the ChIP reaction four times by adding 1 ml of RIPA to the beads. Resuspend the beads by tube inversion, centrifuge for 1 min at 4°C, 20,000×g, and remove the supernatant after centrifugation.
[Figure 3 image. Panel (a): agarose gel with marker lane M and lanes 1–4. Panel (b): gel images for primer pair 1 and primer pair 5, each showing IP samples next to an input dilution series (10−4 to 10−1, and 1). Panel (c): bar charts of % input for primer pairs 2–5, nonamplified (ChIP 1:10 and input 1:100) and after 1. and 2. LM-PCR (ChIP 1:1,000 and input 1:100), for α-rat IgG and α-H3K27me3 samples.]
Fig. 3. (a) Quality control of sonicated chromatin. After cross-link reversal and DNA purification, an aliquot of each extract was analyzed by agarose gel electrophoresis (1% agarose). A uniform size distribution centered at 0.5–1.0 kb is expected. In this example, only the chromatin in lanes 2 and 4 showed acceptable results. The chromatin in lanes 1 and 3 does not pass quality control and is resonicated. The samples in lanes 1 and 2, although similar in DNA amount, will have very different ChIP precipitation efficiencies. (b) Quality control of ChIP. PCR was performed with primer pair 1, amplifying an H3K27me3-enriched region (top), and primer pair 5, amplifying a nonenriched region (bottom). The samples precipitated with α-H3K27me3 antibodies are shown on the left together with their corresponding input dilution series; α-rat IgG precipitates are shown on the right. The α-rat IgG ChIP sample should not show a detectable signal in the semi-quantitative range (background indicates insufficient washing of the precipitates or other experimental problems). In contrast, the nonenriched region will be amplified in the H3K27me3 ChIP because nonspecific DNA is carried along (entanglement) during specific ChIP. At least a ten-fold enrichment should be observed between control regions and true targets (compare the top and the bottom signal on the left side). (c) Quality control of linker-mediated PCR amplification. Primer pairs 2–5 were used in real-time PCR as indicated in Table 7. A dilution series of the nonamplified input of the control ChIP is used as the standard. Nonamplified samples are shown on the left, and results from the first (1. LM-PCR) and second (2. LM-PCR) linker-mediated PCR on the right. Samples and dilutions are as indicated on the graph. During amplification, the absolute proportions between precipitated and input DNA change greatly because of the inherent difference in the complexity of the reaction mixture. Input samples are amplified 20–100-fold (compare the light and dark gray bars between panels) and IP samples 2,000–10,000-fold (compare the black bars). In contrast, the relative proportions within samples should not change greatly: ratios between the enriched and nonenriched regions in the H3K27me3 IP are still five- to ten-fold (compare the black bars between primer combinations).
8. Add 800 µl of RIPA and transfer the liquid and beads to a fresh tube. Centrifuge as above and carefully remove the supernatant. Recentrifuge the beads and remove the remaining liquid using a 20 µl pipette (see Note 12).
9. Elute the precipitated chromatin in 200 µl of GEB. Vortex for 30 s; centrifuge for 1 min at 4–24°C, 20,000×g. Repeat the elution step and pool the eluates in a single tube. Neutralize the pH by adding 100 µl of 1 M Tris/HCl (pH 9.7). These are the ChIP samples.
10. Add 10 µl of 10% SDS and 5 µl of 5 mg/ml Proteinase K to each ChIP sample (Subheading 3.3, step 9). Add 10 µl of 5 mg/ml Proteinase K to each input sample (Subheading 3.3, step 6).
11. Incubate all samples for 3–5 h at 37°C, then overnight at 65°C, as in the pilot de-crosslinking (Subheading 3.2, steps 1 and 2).
12. Extract with phenol/TE and phenol/chloroform/isoamyl alcohol (25/24/1) as in Subheading 3.2, steps 3 and 4.
13. Precipitate the DNA overnight at −20°C by adding 1/10 volume of 3 M NaOAc, 2.5 volumes of 100% EtOH, and 1 µl of 20 mg/ml glycogen.
14. Centrifuge for 15 min at 4°C, 20,000×g. Wash the pellet with 800 µl of 70% EtOH and recentrifuge for 15 min at 4°C, 20,000×g. Remove all liquid and dry the pellet at room temperature for a minimum of 10 min.
15. Resuspend the pellet in 100 µl of H2O.
16. Check the quality of the immunoprecipitation for all input and ChIP samples from the previous step by PCR with a region-specific primer pair (e.g., pair 1 in Subheading 2.2) and a control for a nontarget region (e.g., pair 5 in Subheading 2.2) (see Note 13). Prepare 1:10, 1:100, 1:1,000, and 1:10,000 dilutions of the input sample. Use 4 µl of each input dilution, of the undiluted input sample, and of the ChIP sample as template for the PCR. Prepare the PCR mix as indicated in Table 2. Add 21 µl of this mix to each sample. Program
Table 2 PCR mix for semi-quantitative test of the ChIP experiment
                                  1 sample   5 samples   10 samples
10× Taq buffer (µl)               2.5        12.5        25
10 mM dNTP (µl)                   0.5        2.5         5
10 µM primer pair 1 (µl)          2          10          20
Taq polymerase (4 U/µl) (µl)      0.25       1.25        2.5
H2O (µl)                          15.75      78.75       157.5
Total (µl)                        21         105         210
the thermocycler for the following conditions: 1 min at 72°C, 30 s at 94°C, 30 s at 72°C, 30 s at 58°C; repeat the last three steps an additional 32 times. Load the PCR products on a 2% agarose gel. Usually, an α-H3K27me3 antibody precipitates 1–10% of the input from a target region, and no target signal should be detected in the α-rat IgG control ChIP. However, a minor signal will be detected in nontarget regions in the α-H3K27me3 samples due to nonspecific co-purification of chromatin during successful ChIP (Fig. 3b).
3.4. Linker-Mediated Amplification (LM-PCR)
This protocol is a modified version of the protocol from Ren et al. (8). Keep the samples on ice for all steps. We usually include one control sample (H2O) at every step to check for contamination. The whole part needs about two half-days of work separated by one overnight incubation.
1. Prepare 15 µM annealed linker from oligos 1 and 2. Use 25 µl of 1 M Tris/HCl (pH 9.7), 15 µl each of oligos 1 and 2 (100 µM), and 45 µl of H2O. Incubate for 5 min at 85°C in a heating block. Switch off the heating block and allow the mixture to cool slowly to room temperature. Incubate at 4°C overnight. Store aliquots of the annealed linker at −20°C.
2. Take 20 µl of the input samples and 40 µl of the IP samples and add H2O to a total volume of 100 µl (see Note 14).
3. Prepare the blunting mix as indicated in Table 3. Add 12.7 µl of this mix to each sample and incubate for 20 min at 12°C.
4. Prepare the NaOAc/glycogen mix as indicated in Table 4 and add 12 µl to each sample. Mix well. Add 120 µl of phenol/chloroform/isoamyl alcohol (25/24/1) and mix well for at least 1 min on a vortex mixer. Centrifuge for 5 min at 20,000×g, 4°C.
Table 3 Blunting mix
                                      1 sample   5 samples   10 samples
10× T4 DNA polymerase buffer (µl)     11         55          110
10 mg/ml BSA (µl)                     0.5        2.5         5
10 mM dNTP (µl)                       1          5           10
T4 DNA polymerase (3 U/µl) (µl)       0.2        1           2
Total (µl)                            12.7       63.5        127
Table 4 NaOAc/glycogen mix
                            1 sample   5 samples   10 samples
3 M NaOAc (µl)              11         55          110
20 mg/ml glycogen (µl)      1          5           10
Total (µl)                  12         60          120
Table 5 Ligase mix
                                  1 sample   5 samples   10 samples
10× T4 DNA ligase buffer (µl)     5          25          50
15 µM annealed linker (µl)        6.7        33.5        67
T4 DNA ligase (400 U/µl) (µl)     0.5        2.5         5
H2O (µl)                          13         65          130
Total (µl)                        25.2       126         252
5. Transfer the aqueous phase to a new tube and add 230 µl of 100% EtOH. Mix well and incubate for 15–30 min at −80°C.
6. Centrifuge for 15 min at 4°C, 20,000×g, wash the pellet with 500 µl of 70% EtOH, and recentrifuge for 15 min at 4°C, 20,000×g.
7. Remove all liquid and dry the pellet at room temperature for a minimum of 10 min. Resuspend the polished DNA in 25 µl of H2O.
8. Prepare the ligase mix as indicated in Table 5. Add 25 µl of ligase mix to each sample and incubate at 16°C overnight.
9. Add 6 µl of 3 M NaOAc and 130 µl of 100% EtOH and mix well. Incubate for 15–30 min at −80°C.
10. Centrifuge for 15 min at 4°C, 20,000×g, wash the pellet with 500 µl of 70% EtOH, and recentrifuge for 15 min at 4°C, 20,000×g.
11. Remove all liquid and dry the pellet at room temperature for a minimum of 10 min. Resuspend the pellet in 25 µl of distilled water.
Table 6 PCR mix for linker-mediated amplification (LM-PCR)
                                  1 sample   5 samples   10 samples
10× TaKaRa LA Taq buffer (µl)     5          25          50
2.5 mM dNTP (µl)                  5          25          50
100 µM oligo 1 (µl)               0.5        2.5         5
TaKaRa LA Taq (5 U/µl) (µl)       1          5           10
H2O (µl)                          13.5       67.5        135
Total (µl)                        25         125         250
12. Prepare the PCR mix as indicated in Table 6. Add 25 µl of the PCR mix to each sample.
13. Program the thermocycler for the following cycle conditions: 2 min at 55°C, 5 min at 72°C, 2 min at 95°C, 1 min at 95°C, 1 min at 60°C, 2 min at 72°C; repeat the last three steps 21 times.
14. Purify the PCR product using a NucleoSpin Extract II kit from MACHEREY-NAGEL according to the manufacturer's instructions. In the last step of that protocol, elute the DNA in 50 µl of the provided elution buffer.
15. Measure the DNA content using a UV spectrometer. The 260 nm/280 nm ratio should be greater than 1.7, and the concentration should be around or above 80 ng/µl for all samples except the control ChIP sample (see Note 15).
16. If required, start a second run of PCR. Prepare the PCR mix as indicated in Table 6. Take 200 ng of the PCR product from the first run and equalize all samples to a volume of 25 µl with H2O. Add 25 µl of the PCR mix to each sample.
17. Program the thermocycler for the following cycle conditions: 2 min at 55°C, 5 min at 72°C, 2 min at 95°C, 1 min at 95°C, 1 min at 60°C, 2 min at 72°C; repeat the last three steps 5 times, and then add 5 min at 72°C.
18. Purify the PCR product with a NucleoSpin Extract II kit from MACHEREY-NAGEL following the manufacturer's instructions. In the last step of that protocol, elute the DNA in 50 µl of the provided elution buffer.
19. Measure the DNA content using a UV spectrometer (see step 15).
20. If necessary, repeat steps 16–19 in Subheading 3.4 until the amount of DNA is sufficient.
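The arithmetic behind steps 15 and 16 can be sketched as follows. This is a minimal illustration, not part of the published protocol: the function names are hypothetical, and the concentration is assumed to be entered in ng/µl as read from the spectrometer.

```python
def second_round_input(conc_ng_per_ul, target_ng=200.0, final_volume_ul=25.0):
    """Return (µl of first-round PCR product, µl of H2O) for the second LM-PCR.

    target_ng and final_volume_ul follow step 16 (200 ng made up to 25 µl).
    """
    product_ul = target_ng / conc_ng_per_ul
    if product_ul > final_volume_ul:
        # The sample is too dilute to supply 200 ng within 25 µl.
        raise ValueError("concentration too low for the requested amount")
    return product_ul, final_volume_ul - product_ul


def total_yield_ug(conc_ng_per_ul, elution_volume_ul=50.0):
    """Total DNA recovered in the eluate, in µg (50 µl elution as in step 14)."""
    return conc_ng_per_ul * elution_volume_ul / 1000.0


# Example with the guide value from step 15 (80 ng/µl).
product, water = second_round_input(80.0)
print(f"{product:.2f} µl PCR product + {water:.2f} µl H2O")
print(f"yield: {total_yield_ug(80.0):.1f} µg")
```

Comparing the computed yield with the amount needed for the array (see Note 15) gives a quick indication of whether a further amplification round is required.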
3.5. Quality Control for Linker-Mediated Amplification
We use at least four primer pairs to check that LM-PCR amplification has kept the proportions of DNA fragments similar to those in the starting material. Primer pairs 2 and 3 amplify chromosomal regions enriched in the α-H3K27me3 ChIP, while the chromosomal regions amplified by primer pairs 4 and 5 are not enriched (see Table 1 and Note 13). Real-time PCR is the best choice for quantification, since the concentration range is very large and difficult to quantify with semi-quantitative gel approaches. Quality control takes about one day of work.
1. Prepare a dilution series (1:10, 1:100, 1:1,000, and 1:10,000) of the input sample from the control ChIP experiment (α-rat IgG antibody). This series is used as the standard curve. Also prepare a 1:10 dilution of all other ChIP samples and a 1:100 dilution of all input samples. Prepare a 1:100 dilution of the input samples after the first (1. LM-PCR) and the second (2. LM-PCR) round of linker-mediated amplification. Prepare a 1:1,000 dilution of the ChIP samples after 1. LM-PCR and 2. LM-PCR.
2. Prepare two 96-well plates with the diluted samples as indicated in Table 7. Use 2 µl of diluted sample per well, in triplicate. One plate can be used to test two different primer pairs.
3. Prepare the PCR mix as indicated in Table 8 for primer pairs 2 to 5. Add 18 µl of the mix with primer pair 2 to all wells in rows A to D of plate 1. Fill rows E to H of plate 1 with the PCR mix prepared with primer pair 3. Do the same on the second plate with the PCR mixes prepared with primer pairs 4 and 5, respectively (see Note 16).
4. Program the real-time PCR machine for the following conditions: 3 min at 95°C, then 10 s at 95°C, 20 s at 60°C, and 20 s at 72°C with detection of the fluorescence signal; repeat the last three steps 40 times. Afterward, run a melting curve protocol according to the specifications of your real-time PCR machine.
5. The PCR efficiency should be above 90%, and the log value of the calculated starting concentration of every sample should be within the range of the standard curve. Analysis of the melting curve ensures that only one PCR product is detected. Due to the inherent difference in complexity of the reaction mixture, the absolute proportion between precipitated and input DNA varies greatly after amplification. Input samples are amplified 20–100-fold, while IP samples are amplified 2,000–10,000-fold. Nevertheless, the relative proportions within the samples should not change greatly (Fig. 3c).
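The checks in step 5 can be illustrated with a short calculation. The sketch below assumes the usual standard-curve approach, in which threshold cycles (Ct) are regressed on log10 of the relative template amount and the efficiency is derived from the slope as E = 10^(−1/slope) − 1. The function names and the Ct values in the example are hypothetical; real-time PCR software normally reports these numbers directly.

```python
import math


def fit_standard_curve(dilutions, ct_values):
    """Least-squares line Ct = slope * log10(relative amount) + intercept."""
    xs = [math.log10(d) for d in dilutions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pcr_efficiency(slope):
    """Amplification efficiency; 1.0 corresponds to 100% (doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def relative_amount(ct, slope, intercept):
    """Template amount relative to the undiluted standard, read off the curve."""
    return 10 ** ((ct - intercept) / slope)


def within_standard_range(ct, standard_cts):
    """True if a sample Ct lies inside the range spanned by the standards."""
    return min(standard_cts) <= ct <= max(standard_cts)


# Hypothetical Ct values for the 1:10 to 1:10,000 input dilution series.
dilutions = [1e-1, 1e-2, 1e-3, 1e-4]
standard_cts = [20.1, 23.5, 26.8, 30.2]
slope, intercept = fit_standard_curve(dilutions, standard_cts)
print(f"efficiency: {pcr_efficiency(slope):.1%}")
print(f"sample at Ct 25.0: {relative_amount(25.0, slope, intercept):.2e}")
print("in range:", within_standard_range(25.0, standard_cts))
```

Samples whose Ct falls outside the standards are better re-run at a different dilution than extrapolated from the curve.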
Table 7 A 96-well plate for real-time PCR quality control
Rows A–D: primer pair 2 or 4. Rows E–H: primer pair 3 or 5.
Row   Columns 1–3                     Columns 4–6                      Columns 7–9                          Columns 10–12
A     1:10 Input ChIP α-rat           1:100 Input ChIP α-rat           1:1,000 Input ChIP α-rat             1:10,000 Input ChIP α-rat
B     1:10 ChIP α-rat                 1:100 Input ChIP α-rat           1:10 ChIP α-H3K27me3                 1:100 Input ChIP α-H3K27me3
C     1:1,000 1. LM-PCR ChIP α-rat    1:100 1. LM-PCR Input ChIP α-rat 1:1,000 1. LM-PCR ChIP α-H3K27me3    1:100 1. LM-PCR Input ChIP α-H3K27me3
D     1:1,000 2. LM-PCR ChIP α-rat    1:100 2. LM-PCR Input ChIP α-rat 1:1,000 2. LM-PCR ChIP α-H3K27me3    1:100 2. LM-PCR Input ChIP α-H3K27me3
E     1:10 Input ChIP α-rat           1:100 Input ChIP α-rat           1:1,000 Input ChIP α-rat             1:10,000 Input ChIP α-rat
F     1:10 ChIP α-rat                 1:100 Input ChIP α-rat           1:10 ChIP α-H3K27me3                 1:100 Input ChIP α-H3K27me3
G     1:1,000 1. LM-PCR ChIP α-rat    1:100 1. LM-PCR Input ChIP α-rat 1:1,000 1. LM-PCR ChIP α-H3K27me3    1:100 1. LM-PCR Input ChIP α-H3K27me3
H     1:1,000 2. LM-PCR ChIP α-rat    1:100 2. LM-PCR Input ChIP α-rat 1:1,000 2. LM-PCR ChIP α-H3K27me3    1:100 2. LM-PCR Input ChIP α-H3K27me3
Table 8 PCR mix for real-time PCR
                                        1 sample   50 samples
BioRad iQ SYBR Green Supermix (µl)      10         500
Primer pair (10 µM) (µl)                2          100
H2O (µl)                                6          300
Total (µl)                              18         900
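The multi-sample columns in the mix tables are the per-sample recipe multiplied by the number of reactions. A small helper such as the one below can reproduce them for any sample count. It is only a sketch: the function name is hypothetical, and the optional `overage` factor for pipetting loss is an assumption that does not appear in the tables.

```python
def scale_master_mix(per_sample_ul, n_samples, overage=0.0):
    """Scale a per-sample recipe (component -> µl) to n_samples reactions.

    overage is an optional fractional surplus (e.g. 0.1 for 10% extra);
    the published tables use no surplus, so the default is 0.
    """
    factor = n_samples * (1.0 + overage)
    scaled = {name: round(volume * factor, 2) for name, volume in per_sample_ul.items()}
    scaled["Total"] = round(sum(per_sample_ul.values()) * factor, 2)
    return scaled


# Per-sample volumes of the real-time PCR mix (Table 8).
TABLE_8 = {
    "BioRad iQ SYBR Green Supermix": 10.0,
    "Primer pair (10 µM)": 2.0,
    "H2O": 6.0,
}

for component, volume in scale_master_mix(TABLE_8, 50).items():
    print(f"{component}: {volume} µl")
```

Running the same helper on the per-sample column of any other mix table is a convenient way to cross-check the printed multi-sample volumes against the stated totals.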
4. Notes
1. Other protocols suggest using para-FA. However, since ChIP does not require preservation of cytological structures, the presence of methanol as a stabilizer of monomeric FA is acceptable. The pH of an aging FA solution will turn from neutral to acidic because of FA polymerization. An acidic stock should, therefore, be discarded.
2. Hepes is pH-adjusted with NaOH instead of KOH. It is crucial to avoid a high potassium content in ChIP, since SDS and potassium will precipitate and pull down chromatin in a nonspecific way. This causes high nonspecific background in the immunoprecipitation step.
3. Do not exceed 25 min total in 1% FA at room temperature, since overfixation will have a negative impact on the immunoprecipitation efficiency. If the material is difficult to infiltrate (e.g., senescent leaves), longer incubation times should be balanced by lower FA concentrations.
4. This is to reduce foaming during the sonication step and is particularly recommended if a tip sonicator is used. If foaming is a persistent problem, centrifugation of the 1× sonicated extract to remove debris may help.
5. Either the Proteinase K digestion or the cross-link reversal at elevated temperature can be prolonged overnight for convenience.
6. RNA is astonishingly well preserved in the cross-linked material. A banded pattern on the analytical agarose gel is indicative of RNA.
7. Measurements will only approximately reflect the amount of chromatin. Usually, a higher DNA content as estimated by OD correlates with more visible DNA on the analytical agarose gel. However, sometimes the relation is reversed. This is likely due to uneven shearing of different samples. In
addition, even seemingly equal extracts may exhibit different precipitation efficiencies during the immunoprecipitation step.
8. The rat IgG antibody will result in a very clean control, as chromatin is not usually precipitated with this antibody. Other antibodies can be used as controls, but it must be ensured that they do not recognize chromatin-associated proteins. Alternatively, if peptide antibodies are used, peptide competition can be used as a negative control.
9. Long incubation times will always lead to nonspecific precipitation of proteins due to denaturation. Since some of these proteins are linked to chromatin, this leads to background signals that are removed in this centrifugation step. In fact, many other protocols include preclearing in the presence of rProtein A Sepharose to remove proteins that bind to the matrix in a nonspecific way. As we find very little nonspecific binding in our α-rat control ChIP reaction, we omit this step.
10. Ensure that rProtein A Sepharose has a high binding affinity for the primary antibodies used in your experiments; otherwise use rProtein G Sepharose. To pipette rProtein A Sepharose, cut off the lower 5 mm of a pipette tip and make sure that the beads are well suspended during distribution.
11. If the ChIP experiment is performed to compare precipitation between different extracts, it is crucial to process one input sample per extract because there may be considerable variability in precipitation efficiency. In addition, if comparisons are to be quantitative, it is advisable to process parallel samples for each input and IP in order to minimize variations due to loss of DNA in the subsequent steps.
12. This step should reduce contamination of the samples by chromatin sticking to the tube walls. Some commercial suppliers offer plasticware with low rProtein A and DNA binding, and the use of these materials may render this step obsolete.
13. To check the quality of the immunoprecipitation, it is essential to know at least one binding site as a positive control, which is compared to a nonbinding site as a negative control. In this protocol, primer pairs 1 to 3 amplify known H3K27me3-enriched sites, while primer pairs 4 and 5 correspond to nonenriched controls. In particular, primer pair 1 amplifies the first intron of At1g56480, primer pair 2 a region in the sixth intron of At4g22950, primer pair 3 an exon region of At4g00120, primer pair 4 a region within the transposon At4g03770, and primer pair 5 a part of the 3′ UTR of At4g08250.
14. We had the best results using 20 µl of the input sample. However, the amount of input sample must be optimized to obtain linear amplification. Depending on the concentration of the nuclear extract, the range could be from 0.5 µl to 40 µl of input sample in this step.
15. About 2 µg of DNA is needed per microarray slide. If the tiling array includes three slides, approximately 5–6 µg of DNA is necessary in total. However, some DNA will be used beforehand for quality control. After the first run of linker-mediated PCR, check its quality before adding a second run.
16. For best results, we use the BioRad iQ SYBR Green Supermix, but other commercial mixes are probably also suitable.
References
1. Turck F, Roudier F, Farrona S, Martin-Magniette ML, Guillaume E, Buisine N et al (2007) Arabidopsis TFL2/LHP1 specifically associates with genes marked by trimethylation of histone H3 lysine 27. PLoS Genet 3:e86
2. Fode B, Gatz C (2009) Chromatin immunoprecipitation experiments to investigate in vivo binding of Arabidopsis transcription factors to target sequences. Methods Mol Biol 479:261–72
3. Orlando V, Strutt H, Paro R (1997) Analysis of chromatin structure by in vivo formaldehyde cross-linking. Methods 11:205–14
4. Saleh A, Alvarez-Venegas R, Avramova Z (2008) An efficient chromatin immunoprecipitation (ChIP) protocol for studying histone modifications in Arabidopsis plants. Nat Protoc 3:1018–25
5. O'Geen H, Nicolet CM, Blahnik K, Green R, Farnham PJ (2006) Comparison of sample preparation methods for ChIP-chip assays. Biotechniques 41:577–80
6. van Bakel H, van Werven FJ, Radonjic M, Brok MO, van Leenen D, Holstege FC et al (2008) Improved genome-wide localization by ChIP-chip using double-round T7 RNA polymerase-based amplification. Nucleic Acids Res 36:e21
7. Brand M, Rampalli S, Chaturvedi CP, Dilworth FJ (2008) Analysis of epigenetic modifications of chromatin at specific gene loci by native chromatin immunoprecipitation of nucleosomes isolated using hydroxyapatite chromatography. Nat Protoc 3:398–409
8. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I et al (2000) Genome-wide location and function of DNA binding proteins. Science 290:2306–9
Chapter 13
Genome-Wide Mapping of Protein–DNA Interaction by Chromatin Immunoprecipitation and DNA Microarray Hybridization (ChIP-chip). Part B: ChIP-chip Data Analysis
Ulrike Göbel, Julia Reimer, and Franziska Turck
Abstract
Genome-wide targets of chromatin-associated factors can be identified by a combination of chromatin immunoprecipitation and oligonucleotide microarray hybridization. Genome-wide microarray data analysis represents a major challenge for the experimental biologist. This chapter introduces ChIPR, a package written in the R statistical programming language that facilitates the analysis of two-color microarrays from Roche-Nimblegen. The workflow of ChIPR is illustrated with sample data from Arabidopsis thaliana. However, ChIPR supports ChIP-chip data preprocessing, target identification, and cross-annotation for any species for which genome annotation data are available in GFF format. This chapter describes how to use ChIPR as a software tool without requiring programming skills in the R language.
Key words: ChIPR, ChIP-chip, The R statistical programming language, Preprocessing, Target identification, Cross-annotation
1. Introduction
ChIP-chip data are generated by combining chromatin immunoprecipitation (ChIP) with oligonucleotide microarray measurements (chip). ChIP-chip data are commonly measured on two-color microarray platforms, which are simultaneously hybridized with a ChIP sample and nonprecipitated input DNA as a reference. The samples are differentially labeled with fluorescent dyes, usually Cy3 and Cy5, which can be spectrally separated. The log2-ratio of the intensities detected in the two channels is then used as the signal to determine target sites.
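As a minimal illustration of this signal, the log2-ratio for a single probe can be computed as below. The function name and the intensity values are hypothetical, and which dye carries the ChIP sample depends on the labeling scheme of the experiment.

```python
import math


def log2_ratio(chip_intensity, input_intensity):
    """log2 of the ChIP-channel intensity over the input-channel intensity."""
    return math.log2(chip_intensity / input_intensity)


# A probe four times brighter in the ChIP channel than in the input channel
# gives a positive value; a probe half as bright gives a negative one.
print(log2_ratio(4000.0, 1000.0))
print(log2_ratio(500.0, 1000.0))
```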
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_13, © Springer Science + Business Media, LLC 2010
Although bioinformatic analysis methods to measure differential gene expression using two-color microarrays are well established (1, 2), these methods have to be adapted carefully to ChIP-chip data, since the expected output from expression and ChIP-chip measurements is fundamentally different. In expression data, the majority of probe intensities do not change between samples, and of those that change, similar numbers are up- and down-regulated. In contrast, depending on the nature of the ChIP experiment, differentially hybridizing probes represent either a negligible or a substantial proportion of the total signal. Therefore, the procedure used to analyze ChIP-chip data differs depending on whether the subject of the study is a transcription factor with relatively few binding sites or a histone modification covering large chromatin regions.
Here, we present a workflow to analyze ChIP-chip data generated by hybridizing chromatin immunoprecipitated with antibodies against the histone mark H3K27me3 to commercially available whole-genome arrays of Arabidopsis thaliana from Roche-Nimblegen. The analysis is carried out with the ChIPR software package, which is written in the R statistical programming language. ChIPR and the H3K27me3 sample data are available for download. Input files for ChIPR are intensity measurement files in the .pair format generated by the software Nimblescan. The workflow consists of three steps: (1) preprocessing, (2) identification of enriched probes, and (3) annotation of enriched regions (Fig. 1). ChIPR implements several options for each step. The best choice among these options depends strongly on the actual ChIP-chip experiment. Since there is no standard procedure, the general approach should be empirical, which means that different options are tested and their outcomes compared. Ample use of the proposed diagnostic plots is recommended to evaluate the outcome of each step. The sample data provided with this chapter show some technical problems, which are discussed for better illustration.
1.1. Preprocessing: Diagnosis and Normalization
Preprocessing starts with an evaluation of the quality of the available dataset. The A. thaliana whole-genome microarray from Roche-Nimblegen consists of three separate glass slides that form a set. Although an aliquot of the same sample is applied to each slide in the set, hybridization, washing, and scanning are technically independent events. Therefore, very different dynamic ranges of signal intensities may be observed for different slides hybridized with the same sample, which results in different distributions of the log2-intensity ratios. In addition, problems may occur during the handling of slides, causing local physical damage. A helpful tool for detecting such damage is the generation of back-calculated images of the hybridized slides. The information needed to generate these slide images is contained in the .pair files.
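The idea of a back-calculated slide image can be sketched in a few lines: each probe intensity is placed at its slide coordinates, and the resulting grid is displayed. The example below is an illustration only and does not reproduce ChIPR. It assumes that each record supplies 1-based X and Y feature coordinates and one intensity value, and it draws a crude text image rather than a real plot; all names and values are hypothetical.

```python
import math


def slide_image(records, n_cols, n_rows):
    """Place log2 intensities on an n_rows x n_cols grid; unused spots stay None."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for x, y, intensity in records:
        # X and Y are assumed to be 1-based feature coordinates on the slide.
        grid[y - 1][x - 1] = math.log2(intensity)
    return grid


def render(grid, low, high, shades=" .:*#"):
    """Draw the grid as text; later characters in `shades` mean higher intensity."""
    lines = []
    for row in grid:
        chars = []
        for value in row:
            if value is None:
                chars.append("?")  # no feature recorded at this position
                continue
            frac = min(max((value - low) / (high - low), 0.0), 1.0)
            chars.append(shades[int(frac * (len(shades) - 1))])
        lines.append("".join(chars))
    return "\n".join(lines)


# Hypothetical 3 x 2 slide with one unusually bright feature at X=3, Y=1.
records = [(1, 1, 256), (2, 1, 256), (3, 1, 65536),
           (1, 2, 256), (2, 2, 1024), (3, 2, 256)]
grid = slide_image(records, n_cols=3, n_rows=2)
print(render(grid, low=8.0, high=16.0))
```

On a real slide, scratches, bubbles, or uneven washing would show up as contiguous patches of unusually bright or dark features in such an image.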
[Figure 1 flowchart. Raw data → Diagnostic plots (Visible problems with particular arrays? Are M densities of replicates correlated? yes/no branches, one leading to "repeat hybridization") → Normalized data → Diagnostic plots (Are width and center of M distributions similar? Is the enriched left shoulder in M distribution detectable? Are slides similar? yes/no branches, one leading to "Try a different method") → Enrichment analysis (RINGO: a mixed distribution approach; a RANK-based approach) → Annotation (compare enriched regions to features of annotation files in GFF format; compare results from different enrichment detection methods) → Final result (GFF files).]
Fig. 1. Workflow of ChIPR. Three-step preprocessing, target identification and target annotation are indicated by grey boxes. Solid arrows connect work tasks, whereas dashed arrows indicate a suite of single steps that are carried out to fulfill work tasks
Analytical plots of density distributions of intensity values (usually as log2-intensities) per dye-channel, slide, and biological replicate are also helpful to evaluate whether the dataset is of good or rather heterogeneous quality. Regardless of the nature of the ChIP-chip experiment, it is advisable to carry out some normalization of raw data. The goal of normalization is to reduce the influence of confounding variables on intensity distributions so that any remaining differences are due to different binding behavior of the targets. Proper normalization of ChIP-chip data is a challenge because so many confounding factors are present, and their contribution may change depending on the experiment. Some confounding factors such as dye bias and differences in probe signal intensities (e.g., because of differences in GC levels) can be considered shared between replicate arrays.
However, other variables such as skewing during PCR amplification or differences in fragment size distributions are likely to differ between biological replicates but to be rather similar between slides. Normalization can be applied to the raw data or to the log2-ratios of the two channels. Nimblegen proposes a Tukey-biweight scaling procedure of the log2-ratios per slide, so that the median is centered at 0. However, because the shape of the distributions differs between slides, this normalization is not sufficient to allow direct slide-by-slide comparisons and pooling of the data from different slides in a genomic set. Therefore, it is preferable to include the intensity values in the normalization procedure. This can be achieved by a locally weighted regression (LOESS) normalization approach (Fig. 2a and b).
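To make the per-slide centering concrete, the sketch below computes a one-step Tukey biweight location estimate of the log2-ratios and subtracts it from every value. It is not the vendor's implementation: the tuning constant c = 5 and the small offset eps are commonly used defaults that are assumed here, and the function names are hypothetical.

```python
import statistics


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight estimate of location (robust to outliers)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    num = den = 0.0
    for v in values:
        u = (v - med) / (c * mad + eps)
        if abs(u) < 1.0:
            weight = (1.0 - u * u) ** 2  # values far from the median get weight 0
            num += weight * v
            den += weight
    return num / den


def center_log_ratios(log_ratios):
    """Shift the log2-ratios of one slide so that their biweight location is 0."""
    center = tukey_biweight(log_ratios)
    return [m - center for m in log_ratios]


# Hypothetical log2-ratios of one slide, including one strongly enriched probe.
slide = [-1.0, -0.5, 0.0, 0.5, 1.0, 8.0]
print(tukey_biweight(slide))
print(center_log_ratios(slide))
```

Because this procedure only shifts each slide by a constant, it cannot correct differences in the width or shape of the distributions, which is the limitation described above.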
[Figure 2: four plot panels. (a) "Ler_Sample1, original scatter plot with loess line" and (b) "Ler_Sample1, after fitting to loess line": M (log ratio) versus A (intensity). (c) and (d) "Correlation between Biological Replicates and Slides": Sample No 2 versus Sample No 1 for Slide 1 and Slide 2, plotted from ma_slides (c) and final_loess_separate (d).]
Fig. 2. Examples of graphical outputs of ChIPR. (a) The MA-plot generated by the simpleLoess function for sample slide 1. The LOESS line is indicated. The inset shows the density distribution of the raw M-values. A vertical line indicates the boundary determined by the RINGO function upperBoundNull. All M-values above the threshold are displayed in red in the MA-plot. (b) The MA-plot generated by simpleLoess after LOESS normalization. The color code is transferred from panel a. (c) The scatter plot generated by plotBetweenSlides of the raw M-values from two samples and two slides. (d) As (c), after LOESS normalization and normalizeBetweenSlides scaling of the density distributions.
Genome-Wide Mapping of Protein–DNA Interaction by Chromatin Immunoprecipitation
LOESS is based on the expected relationship between the log2-intensity ratios (M-values) of each array probe and the log2 of the geometric mean of the two intensities (A-values, see Note 1). In an ideal experiment, all data points of a plot of M versus A (an MA-plot) derived from non-enriched probes would distribute along a horizontal line with a constant M-value around 0. The enriched fraction would be visible as a parallel horizontal distribution with M-values above 0. In reality, neither the background nor the enriched distribution follows a horizontal trend, and their trends are not parallel. LOESS normalization corrects the locally observed trends so that they approach the expected ideal. However, standard LOESS assumes that the majority of M-values belong to the null distribution. This assumption does not hold for ChIP-chip data with many enriched probes. In fact, enriched probes may themselves introduce an intensity-dependent trend, confounding any trend due to dye bias and intensity.
Some approaches have been proposed to overcome these limitations and adapt the LOESS method to ChIP-chip data. ChIPR includes the rotated LOESS algorithm proposed by Peng et al., which improves the normalization of ChIP-chip data that show a strong dye bias (3). In the rotated LOESS approach, the differences in M-values and in A-values between probes found at adjacent genomic positions are calculated (DM-values and DA-values). A plot of these values (DMDA-plot) shows that they distribute symmetrically around a baseline that is rotated relative to the horizontal. The rotation is caused by dye bias, and the values are adjusted so that the baseline of the adjusted values has a slope of 0. After the rotation, rotated M- and A-values are calculated from the adjusted DM-values and DA-values, and standard LOESS is subsequently carried out on the rotated M- versus A-values.
Other normalization approaches, such as quantile normalization, rely on comparisons between samples.
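The M/A transformation and a standard (non-rotated) LOESS correction can be sketched in base R as follows. The simulated intensities, the object names (`input`, `ip`, `M_norm`), and the `span` and `degree` settings are assumptions chosen for illustration; the ChIPR function simpleLoess may differ in its details.

```r
# Simulated two-channel intensities (hypothetical data, no true enrichment)
set.seed(1)
n     <- 2000
input <- 2^runif(n, 6, 14)                       # input channel
bias  <- 0.15 * (log2(input) - 10)               # intensity-dependent dye bias
ip    <- input * 2^(bias + rnorm(n, sd = 0.3))   # ChIP channel

# M = log2 ratio; A = log2 of the geometric mean of both channels
M <- log2(ip) - log2(input)
A <- (log2(ip) + log2(input)) / 2

# Fit the local trend of M as a function of A and subtract it
fit    <- loess(M ~ A, span = 0.4, degree = 1)
M_norm <- M - fitted(fit)

# Correlation between M and A before and after the correction
c(before = cor(M, A), after = cor(M_norm, A))
```

In this simulation no probe is enriched, so the assumption behind standard LOESS holds. With many enriched probes, the fitted line would be pulled toward the enriched cloud, which is the limitation discussed above.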
Quantile normalization is often used to normalize ChIP-chip data from one-color microarrays, and it can also be used for two-color arrays. As a result of quantile normalization, the density distributions of all measurements submitted to normalization are forced into the same shape. Quantile normalization can be carried out on the raw intensity values (per dye channel alone or combined), the A-values, or the M-values. Since the probes representing the whole A. thaliana genome are distributed over several slides, our workflow proposes two options to finalize preprocessing. If it is assumed that the overall M distributions of different slides are equal, even though the slides contain different probes, then the distributions are scaled between slides and compiled in a single file. If it seems doubtful that different slides can be pooled and treated as one, then they remain independent and scaling is limited to a comparison between samples/replicates.
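The effect of quantile normalization can be seen in a small base-R sketch. The helper `quantile_normalize` and the toy matrix are hypothetical and simplified (no special handling of ties or missing values); this is not the implementation used by ChIPR or limma.

```r
# Quantile normalization of the columns of a numeric matrix
# (simplified sketch: assumes no missing values)
quantile_normalize <- function(x) {
  ord    <- apply(x, 2, order)      # per-column ordering of the values
  sorted <- apply(x, 2, sort)       # per-column sorted values
  target <- rowMeans(sorted)        # mean of each quantile across columns
  out <- x
  for (j in seq_len(ncol(x))) out[ord[, j], j] <- target
  out
}

# Toy matrix: rows are probes, columns are replicates (hypothetical values)
m  <- cbind(rep1 = c(5, 2, 3, 4), rep2 = c(4, 1, 6, 2), rep3 = c(3, 4, 6, 8))
qn <- quantile_normalize(m)

apply(qn, 2, sort)   # every column now holds the same set of values
```

Each probe keeps its rank within its own column, but all columns share one common distribution afterwards.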
1.2. Determination of Enriched Probes and Regions
The expected distribution of M-values in ChIP-chip is not directly comparable to that of two-condition gene expression analysis. In the latter case, one expects three classes of probes: (1) those that are significantly more intense under condition 1, (2) those that are significantly less intense under condition 1 compared to condition 2, and (3) those that do not change. The result is a symmetric, normal distribution of the M-values that is centered at 0 (no change in expression). In ChIP-chip, there are only two theoretical classes: (1) probes that show binding and are therefore over-represented in the ChIP channel, and (2) non-binding probes, which on average should behave identically in both channels. The M-value distribution is therefore expected to be bimodal: one mode, scaled to zero, represents the non-binding background, and a second, positive mode represents the binding fraction. The bimodality is more pronounced if more genomic regions show binding, as observed for ChIP-chip histone modification data. In general, it is a good idea to begin the analysis with diagnostic plots to evaluate whether a bimodal distribution is clearly observed.
The inherent non-normality of the distribution makes it difficult to identify positive probes by standard statistical tests such as t-tests. On the other hand, a strong bimodality allows the use of methods that take advantage of this distribution. Mixture-decomposition methods separate the M-values into a zero-centered background distribution and one or more additional components. Then either the positive components are directly predicted to represent binding probes, or the background components are used to define a threshold above which a probe is considered to be enriched. A mixture decomposition method is implemented in the R package RINGO, which is integrated in ChIPR (4). The method is well suited to determine the targets of factors with abundant binding sites, such as those expected of histone modifications.
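The idea of deriving a threshold from the background component can be sketched in base R. The helper `mirror_threshold` below is hypothetical: it assumes that the background is symmetric around the main mode, reflects the values left of the mode onto the right side, and takes an upper quantile of the resulting null sample. It illustrates the principle only and is not the RINGO implementation.

```r
# Hypothetical helper: threshold from a mirrored (symmetric) null distribution
mirror_threshold <- function(m, prob = 0.99) {
  d     <- density(m)
  mode0 <- d$x[which.max(d$y)]          # location of the main (background) peak
  left  <- m[m <= mode0]                # values assumed to be pure background
  null  <- c(left, 2 * mode0 - left)    # reflect the left half around the mode
  unname(quantile(null, prob))
}

# Simulated bimodal M-values (hypothetical data)
set.seed(42)
background <- rnorm(9000, mean = 0, sd = 0.5)   # non-binding probes
bound      <- rnorm(1000, mean = 3, sd = 0.5)   # enriched probes
m          <- c(background, bound)

thr <- mirror_threshold(m)
mean(bound > thr)        # fraction of simulated enriched probes above threshold
mean(background > thr)   # fraction of background probes above threshold
```

The `prob` argument controls how conservative the threshold is; the value 0.99 is an assumption for this sketch.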
On dense tiling arrays with small-sized probes, several adjacent probes will respond to a single binding event, since the sonicated DNA is, on average, longer than the distance between adjacent probes. Therefore, autocorrelation of probe intensities along the genome sequence is observed, leading to a characteristic peak-shaped intensity profile in the vicinity of a binding event. This expected distribution is used by several algorithms to determine target sites for single-site binding factors. Here, intensity peaks around point-wise binding events are required as a safeguard against false positives. However, in the case of ChIP-chip of chromatin factors, the autocorrelation of clustered binding events superimposes on the autocorrelation of probes and therefore distorts the expected peak shape. Therefore, methods that rely on the peak shape are less suited for the identification of target regions of chromatin-associated proteins. In contrast, for chromatin modification marks, it makes biological sense to combine sequentially close positive probes into more extended positive regions. A step that allows the combination of adjacent target regions is implemented in the ChIPR workflow. The threshold of distances between signals that result in combination can be estimated from plots that show the signals in a genome browser view. Moreover, it is advisable to combine all binding events that occur at a distance below the theoretical resolution of ChIP (usually 300 bp).
To identify enriched regions, our workflow implements an approach based on median M-values calculated from predetermined regions. These predetermined regions, which can be genomic annotation classes or sliding windows, are ranked based on their median. The rank positions are compared between replicates, and a threshold is determined to separate the ordered (enriched) part of the list from the randomly distributed portion (the background distribution).
1.3. Annotation of Target Regions
The output of enrichment analysis is a table of target regions that shows their location with regard to the reference sequence. In order to extract biological sense from the dataset, target regions need to be cross-annotated with the available reference genome annotations. Cross-annotation results in lists and tables containing genes and other annotated features (e.g., ncRNAs, transposons, pseudogenes, and repeat regions) that overlap with enriched regions. These gene or feature lists are an input for further meta-analysis (see Chapter 14). In our workflow, the implementation of cross-annotation procedures allows interactivity. This interactivity makes it possible to constantly update the genome annotation for A. thaliana (and other plant species) and to perform the analysis based on the most up-to-date annotation data. The Arabidopsis Information Resource (TAIR) provides the most recent annotations in General Feature Format 3 (GFF3), which can be supplied to a cross-annotation tool. Flexibility is also needed to define overlap thresholds for feature scoring. H3K27me3 histone marks are expected to cover large regions, so the threshold should be set to require a greater overlap for a positive score than for other marks that are more locally restricted. Last but not least, it is also useful to generate output files in GFF format that allow visualization of enriched regions and scores in a genome browser such as SignalMap or the TAIR implementation of GBrowse.
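How an overlap threshold translates into feature scoring can be sketched in base R. The function `coverage_fraction`, the coordinates, and the 0.5 cut-off are hypothetical and chosen for illustration; the sketch assumes inclusive coordinates on a single chromosome and enriched regions that do not overlap each other.

```r
# Fraction of each feature covered by any enriched region
# (assumes non-overlapping regions and inclusive coordinates)
coverage_fraction <- function(feat_start, feat_end, reg_start, reg_end) {
  mapply(function(fs, fe) {
    ov <- pmin(fe, reg_end) - pmax(fs, reg_start) + 1   # overlap with each region
    sum(pmax(ov, 0)) / (fe - fs + 1)                    # covered share of feature
  }, feat_start, feat_end)
}

# Hypothetical annotated features and enriched regions
features <- data.frame(id    = c("geneA", "geneB", "geneC"),
                       start = c(1000, 5000, 9000),
                       end   = c(1999, 5999, 9999))
regions  <- data.frame(start = c(1200, 5900), end = c(2500, 6400))

features$coverage <- coverage_fraction(features$start, features$end,
                                       regions$start, regions$end)
features$enriched <- features$coverage >= 0.5   # at least half the feature covered
features
```

Raising the cut-off suits broad marks that are expected to cover whole loci, whereas a lower cut-off suits locally restricted marks.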
2. Materials
2.1. Hardware
1. A desktop computer running Windows XP or Linux. We have successfully analyzed the package data on a Windows XP system with a Pentium(R) 4 CPU (3.2 GHz), 1.00 GB of RAM, and 1.536 GB of virtual memory extension.
2. An internet connection to download the software and functions.
2.2. Software
1. R-2.6.0 or higher (download from http://www.r-project.org) (see Note 2).
2. The R package ChIPR, which can be downloaded from http://bioinfo.mpiz-koeln.mpg.de/pcb/downloads/r as option “ChIPR.”
3. Bioconductor packages: Ringo, Biobase, Biostrings, matchprobes, marray, convert.
4. R packages: MASS, spatstat, TeachingDemos.
3. Methods
In the following sections, the symbol “→” abbreviates “from a point in the main menu, follow the option in the submenu.”
3.1. Downloading and Installing RGui and R Packages
The steps described here have to be performed once per R version installation.
1. Download the latest release of R from http://www.r-project.org.
2. Install the software in a folder that allows read and write access for the user (see Note 3).
3. Launch RGui from the Windows start menu. It will display the main menu bar at the top of the screen and the window labeled “R Console”, which is the command console. In the Windows version, the appearance of the prompt “>” signifies that the program is ready to accept the next command.
4. Select main menu “Package(s)” → “Set CRAN mirror.” A pop-up window will appear where you select your closest mirror with the help of the mouse.
5. Select “Package(s)” → “Install package(s).” A pop-up window will appear where you select the package “MASS.” Follow the same procedure to install packages “spatstat” and “TeachingDemos.”
6. Select “Package(s)” → “Select repositories.” A pop-up window will appear where you select the option “BioC software.”
7. Select “Package(s)” → “Install package(s).” A pop-up window will appear where you select the packages “Ringo”, “matchprobes”, “marray”, and “convert.”
8. Open your internet browser and navigate to http://bioinfo.mpiz-koeln.mpg.de/pcb/downloads/r. Select “ChIPR” and save the zip-compressed file somewhere on your disc.
9. From the same site, select also the file example.R and save it somewhere on your disc.
10. Select “Package(s)” → “Install package(s) from local zip files.” A pop-up window will appear that allows you to navigate to the “ChIPR_1.0.zip” file and select it to open. The R console will display some log-messages during the installation. You do not need to install the example.R file.
3.2. Reading the Data Into R
1. Create an analysis folder somewhere on your computer.
2. Launch RGui from the Windows start menu if it is not yet open.
3. The calculations will probably use all the RAM your computer has. To make sure that Windows XP allocates as much memory as possible to RGui, start the session by typing the following in the command console (see Note 4):
   memory.limit(4095)
   Ignore the NULL message displayed in the command console.
4. Select main menu “Package(s)” → “Load package,” and select “ChIPR” within the pop-up window (see Note 5).
5. Follow “File” → “Change dir,” and browse to the analysis folder that you created in step 1.
6. Type in the R console:
   copyExample()
   This function will load the example data into the analysis folder. The function needs to be performed only once.
7. After the process is complete, as indicated by the reappearance of the input prompt in the R console, go to the analysis folder and check for the presence of the following files (see Note 6):
   127615_532.pair
   127615_635.pair
   129856_532.pair
   129856_635.pair
   129906_532.pair
   129906_635.pair
   1298562_532.pair
   1298562_635.pair
   2006-08-18_ATH6_ChIP_1.pos
   2006-08-18_ATH6_ChIP_2.pos
   K27M3_slide1.txt
   K27M3_slide2.txt
   spottypes.txt
   TAIR7_GFF+1000u_200d+introns
   All files are text files that can be opened with any text editor program. The .pair files contain raw intensity values per dye channel; spottypes.txt contains information on the types of probes for which data are in the .pair files; the K27M3_slideNumber.txt files contain information on the samples and the way the data files relate to these samples. Nimblegen provides .pos files for each slide containing information on the position of the probes in the A. thaliana genome. Table 1 shows the structure of the K27M3_slide1.txt file. Column [1] contains an index name for a slide, [2] specifies which .pair files correspond to this slide, [3] indicates the biological replicates, [4] gives information on the experiment, [5] gives the experimental material, and [6] and [7] specify which dye channel belongs to input or precipitate. The text in the header row of the table is fixed, but the other rows can contain any name and number (Table 1). If you provide your own dataset, use the table header row as a template and change the other rows so that they correspond to your dataset (see Note 7).
8. Load the raw data into the R workspace by typing in the command console (see Note 8):
   slides <- readNimblegenSlideList(list("K27M3_slide1.txt", "K27M3_slide2.txt"))
   To avoid typing a lot of text, open the file example.R with a plain text editor such as Windows Wordpad, or use the text editor within RGui: follow “File” → “read R script.” The file contains a sample session including most commands that will be detailed below. You can transfer the commands from the example.R text file into the R console by copy and paste (see Note 9). The function readNimblegenSlideList creates an object “slides” that contains the intensity values of both dye channels per sample and per slide. The object slides is a list of “RGList”-objects. RGList-objects are described in the R package LIMMA from Bioconductor (5). Visualize the top rows of all tables in slides by typing the command line:
   head(slides)
9. To convert the raw intensity values into a representation using M- and A-values, type in the command console:
   ma_slides <- RG2MA(slides)
   The data object ma_slides is a list of “MAList”-objects. MAList-objects are described in detail in the Bioconductor package LIMMA (1).
Table 1
The structure of the K27M3_slide1.txt file

Slide number  File name Cy3    File name Cy5    Sample  Antibody  Genotype  Cy3    Cy5
129906        129906_532.pair  129906_635.pair  1       K27M3     Ler       input  IP
127615        127615_532.pair  127615_635.pair  2       K27M3     Ler       input  IP
10. Now read the slide and position information into the R workspace by typing:
   designInfo <- readDesignInfo(list("2006-08-18_ATH6_ChIP_1.pos", "2006-08-18_ATH6_ChIP_2.pos"))
   The data object designInfo created here is of the type “probeAnno” as specified in the Bioconductor package RINGO (4). The object contains all information of probe location with respect to the A. thaliana genome. The function readDesignInfo also creates a file named “all.pos” that is saved in the analysis folder. In the all.pos file, the information corresponding to different slides is concatenated. The file can be opened with a text editor.
3.3. Quality Assessment
1. Start the quality assessment by generating images of the intensity values across the slides. An image of slide 1 showing the first biological replicate in the red channel is generated by typing:
   image(slides[[1]], 1, channel="red")
   The options for the channel argument are “red,” “green,” or “logratio.” The last option generates an image of the M-values. Data from other slides and other biological replicates are displayed by changing the number in double square brackets or the number after the first comma, respectively. After entering the command, the R graphics device will open as a new window. The image can be saved by moving the mouse cursor over the picture and opening a pop-up menu with the right mouse button. Alternatively, the main menu “File” options can be used (see Note 10). Images of our sample data reveal two problematic arrays. Sample 2 hybridized to slide 1 shows a particular pattern that could arise from physical damage caused by forceps. However, the damage is only detected in the red and green channels and is absent in the logratio image. Therefore, this perturbation of the dataset may not be problematic if further analysis steps are based on the M-values. In contrast, sample 2 hybridized to slide 2 shows regions of high intensity that are also seen in the logratio image. This indicates that the M-values obtained from these regions do not reflect actual probe enrichment but are hybridization artifacts that affect one channel more than the other.
2. Generate plots of the density distributions of the log2-intensities in the two channels and of the M-values by typing:
   densityPlot(slides, ma_slides)
   The function will generate four graphs, the first showing the density distributions of the red and green channels of slide 1, the second showing the density distributions of the M-values, followed by the corresponding plots for slide 2 (see Notes 11 and 12). The results help to evaluate the variability in signal distributions between the channels, the variation between slides for biological replicates, and the differences between slides. Although the densities of raw intensities are quite variable among channels and arrays, the M-values have similar distributions and, in particular, show a characteristic shoulder in the positive tail that may correspond to the population of binding probes. An exception in our sample data is the distribution of sample 2 hybridized to slide 2. In this sample, a shoulder of enriched probes is hardly observed, and the values are overall shifted to higher M-values. This sample has already been flagged as problematic by the image analysis in step 1. Despite the overall similarity of the other arrays, there is variation in width, height, and peak location of the distributions, suggesting that the dataset should be normalized before further analysis.
3.4. Normalization
There are different options to normalize datasets, and common standards for ChIP-chip data do not yet exist. The purpose of normalization is to obtain scaled datasets that show a similar distribution across arrays and samples. In addition, perturbations in the dataset that are not caused by probe enrichment due to immunoprecipitation should be eliminated. As a first step, we recommend testing several normalization options. Diagnostic plots of data normalized with different options will help to choose the most adequate method for a specific dataset.
1. Perform LOESS normalization of the data and plot the corresponding MA-plots by typing:
   normalized_slides_loess <- simpleLoess(ma_slides)
   This command creates a new list of MAList-objects that contains the normalized data. It also generates two MA-plots per slide and sample. The first MA-plot displays the non-normalized data, the second the data after normalization. The first graph also shows the density distribution of the non-normalized M-values as an inset. A vertical line in this density plot indicates the boundary determined by the function upperBoundNull from the package RINGO (see below). Probes that have M-values higher than the boundary are considered enriched. The corresponding M-values are labeled in red in the main graph. The green line in the raw-data MA-plot represents the LOESS line: it marks the center of mass of the M-values as a function of the A-values (Fig. 2a). In the second MA-plot, the M-values have been adjusted so that their center is at 0 for all A-values (Fig. 2b). Probes that were considered enriched in the first graph keep their red label in the second graph, although they may have moved below or above a linear threshold after normalization (see Note 11).
2. Peng et al. (3) developed a method called “rotated Loess” that eliminates the bias in data due to dye effects. This normalization is carried out and assessed by typing:
   normalized_slides_rotatedLoess <- rotatedLoess(ma_slides)
   The command generates the same output objects as simpleLoess. The method is particularly recommended if a strong dye bias is observed in a dataset. It does not remedy normalization problems that are caused by a large number of enriched probes. In our dataset, dye bias was not pronounced. Therefore, this method was not an improvement over the standard LOESS procedure.
3. A simpler way to normalize data is to scale the M-values per slide for each sample by their respective median without considering the A-values. This normalization is carried out by the following command:
   normalized_slides_median <- normalizeWithinArrayList(ma_slides, method="median")
   Median normalization works for array sets that do not show dye bias and have similar density distributions across samples and slides. In contrast to the standard LOESS procedure, median normalization avoids problems caused by a trend in the MA-plot due to a large proportion of enriched probes. In our sample dataset, it does not improve the density distributions observed for the problematic slide 2 hybridized with sample 2.
4. An additional option is to perform quantile normalization between arrays, which forces the data from all replicates per slide into a common distribution. The following function is used to normalize the raw red and green intensities of both biological replicates per slide via quantile normalization:
   normalized_slides_quantile <- normalizeBetweenArraysList(ma_slides, method="quantile")
   Replacing quantile with Aquantile results in quantile normalization of the A-values per slide, which may make more sense if the red but not the green channel shows a left-hand shoulder. Quantile normalization is often used to normalize short oligonucleotide tiling arrays from Affymetrix, but it can be adequate for Nimblegen arrays. However, quantile normalization does not improve the density distribution of the problematic slide 2 hybridized with sample 2. Therefore, quantile normalization is not recommended for this particular dataset.
5. To evaluate the new density distributions after normalization, graphs can be created as described above:
   densityPlot(NULL, normalized_slides_quantile, title="Normalized Slides Quantile")
   In order to see how the red and green intensities were affected during normalization, a new object needs to be created:
   slides_quantile <- MA2RG(normalized_slides_quantile)
   This object contains the adjusted intensity values as a list of RGList-objects and can be included in the densityPlot command:
   densityPlot(slides_quantile, normalized_slides_quantile, title="Normalized Slides Quantile")
   To obtain the density distributions of other normalizations, substitute the second argument in brackets with the corresponding object name (e.g., normalized_slides_loess). Replace the text in quotation marks by “Any Title You Like” to add a title to the graph. Adding an explanatory title is strongly recommended to avoid confusion between many similar-looking graphs.
6. Although the previous normalization methods generate distributions of M-values that are centered approximately at 0 with a more or less pronounced right shoulder, the range of these distributions is still very different between slides and biological replicates. Therefore, scaling between slides should also be performed:
   final_loess <- normalizeBetweenSlides(normalized_slides_loess, input.RG=FALSE)
   Create objects with different names and inputs by substituting final_loess with other names and the first argument in brackets with the name of the appropriate input object (e.g., normalized_slides_quantile).
7. Evaluate the results of this step by plotting the densities of the object final_someMethod (see Note 12):
   densityPlot(NULL, list(final_loess), title="final_loess")
   The distributions are now nearly identical in the area of the major peak (likely the non-binding background).
8. Evaluate which type of normalization appears to perform best with your dataset (Fig. 1). The criteria are listed here:
   – If you use the simpleLoess or rotatedLoess functions, evaluate the plots generated by these commands. The ideal situation is when all red points that belong to the enriched cloud are above a linear threshold, which is constant for all A-values. In contrast, points clearly belonging to the background distribution should fall below it.
   – For all normalization functions, the shape of the density distributions in the final_someMethod object should be similar between samples/replicates. If the right shoulder is visible, it should overlap between replicates.
   In the case of the H3K27me3 sample dataset, only the LOESS method resulted in acceptable distributions for the problematic sample 2 on slide 2. Therefore, for this particular dataset, LOESS normalization is the method of choice. On the downside, LOESS normalization also decreased the M-values of those probes that showed the highest A-values, because a trend was created by the high proportion of enriched probes that had high A-values (Fig. 2a). In consequence, some genuinely enriched probes with high A-values will not be scored positive in the subsequent analysis (Fig. 2b).
9. Write the normalized data of the type you decide to keep to a tab-delimited file in your analysis folder:
   writeTabFile(final_loess, file="normalized_final_loess.txt")
10. The following command splits the normalized distributions of different slides back into separate MAList-objects. This step is useful, since subsequent analysis will be less RAM-intensive if carried out on a per-slide basis. In addition, depending on the dataset and the success of normalization, it may be preferable to determine thresholds and targets for each slide separately.
   final_loess_separate <- separateCombinedSlides(final_loess)
11. Evaluate the density per slide after between-slide normalization:
   densityPlot(NULL, final_loess_separate, title="loess and between slide normalized data")
   The following command generates a scatter plot of the M-values of the biological replicates. Each slide is plotted in a different color. Optimal normalization results in linearly correlated replicates and a good overlap between slides.
   plotBetweenSlides(final_loess_separate)
12. It is now time to remove some data from your R workspace, because it occupies too much active memory. To see all objects in your current workspace, type:
   ls()
   Objects can be removed by:
   rm(objectA, objectB, ...)
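As a complement to the steps above, the logic of median normalization and of the replicate comparison can be sketched in base R on simulated data. The matrix `M`, the replicate offsets, and the object names are assumptions for illustration only; the ChIPR functions operate on MAList-objects and may apply additional scaling.

```r
# Simulated M-values for two biological replicates (hypothetical data)
set.seed(7)
signal <- c(rnorm(900, mean = 0, sd = 0.4),      # background probes
            rnorm(100, mean = 2.5, sd = 0.5))    # enriched probes
M <- cbind(rep1 = signal + rnorm(1000, sd = 0.3) + 0.6,   # replicate offsets
           rep2 = signal + rnorm(1000, sd = 0.3) - 0.2)

# Median normalization: subtract each replicate's median from its M-values
M_centered <- sweep(M, 2, apply(M, 2, median))

apply(M_centered, 2, median)                       # both medians are numerically 0
cor(M_centered[, "rep1"], M_centered[, "rep2"])    # agreement between replicates
```

Centering removes the replicate-specific offset but does not change the shape or width of the distributions, which is why an additional scaling step between slides can still be necessary.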
3.5. Finding ChIP-Enriched Genomic Regions
We describe two methods to identify enriched probes and regions. The first uses the function upperBoundNull, which is implemented in the package RINGO and was mentioned previously; it is described in Subheadings 3.5.1 and 3.5.2. The second method uses M-value ranks and compares biological replicates (Subheadings 3.5.3 and 3.5.4).
3.5.1. Finding ChIP-Enriched Genomic Regions by the upperBoundNull Method from RINGO
1. Calculating the median of the M-values of several adjacent probes helps to reduce the effect of outliers. In the following module, such median smoothing is performed in a sliding window of 301 bp by calculating the median of at least three probes. These settings can be altered by changing the values in the typed command. Give different names to the resulting objects (e.g., smooth300, smooth600). As a rule of thumb, the resolution of ChIP cannot be better than the average fragment length (usually 300 bp); therefore, a winHalfSize of at least 150 makes sense.
   smooth <- computeRunningMedians(asExprSet(final_loess), probeAnno=designInfo, modColumn="Cy5", allChr=c("1","2"), min.probes=3, winHalfSize=150)
   The object smooth is of the “ExpressionSet” type as defined in the Bioconductor package Biobase (6). If the command is carried out on an object final_someMethod as generated in Subheading 3.4, step 6, the object contains data for all replicates and slides. If the evaluation of the normalized data indicates that it is preferable to keep the data from different slides separate, then create one object per slide:
   smooth_slide1 <- computeRunningMedians(asExprSet(final_loess_separate[[1]]), probeAnno=designInfo, modColumn="Cy5", allChr=c("1","2"), min.probes=3, winHalfSize=200)
   To generate an object with smoothed data from the second slide, type the following command:
   smooth_slide2 <- computeRunningMedians(asExprSet(final_loess_separate[[2]]), probeAnno=designInfo, modColumn="Cy5", allChr=c("2","3"), min.probes=3, winHalfSize=200)
   Note that you also need to adjust the allChr argument, since slide 2 contains data from chromosomes 2 and 3 but no data from chromosome 1.
2. To assess the effect of smoothing, evaluate the data distribution by plotting a density histogram (see Note 14).
   hist(exprs(smooth))
3. Another way to get a feeling for the effect of smoothing is to visualize it over a distance along a chromosome. The following command displays the data without smoothing. Here, we display the range from position 5000 to 15000 on chromosome 1. If you want to display other regions, the values need to be changed accordingly.
   chipAlongChrom(asExprSet(final_loess), chrom="1", xlim=c(5000,15000), ylim=c(-6,6), probeAnno=designInfo)
   The second command adds two more lines to the graph that are derived from the smoothed dataset.
   chipAlongChrom(smooth, chrom="1", xlim=c(5000,15000), probeAnno=designInfo, paletteName="Spectral", add=TRUE)
4. After identifying the most promising smoothing protocol, compute a threshold of the M-values above which a probe is considered to be enriched. RINGO does this by mirroring the part of the distribution that lies to the left of the main peak onto the right side, which yields an estimate of the background distribution.
   y0 <- as.matrix(apply(exprs(smooth), 2, upperBoundNull))
5. A threshold line can be included in the histogram. First, draw a histogram of the smoothed distribution:
   hist(exprs(smooth)[,1], n=40)
   Then add a vertical line for the threshold:
   abline(v=y0[1,], col="red")
   The number 1 in square brackets in both commands indicates that you are now drawing data for the first replicate. Repeat the commands after changing the number to 2 to visualize the second replicate.
6. Finally, all probes that are above the threshold are extracted, and adjacent probes are combined into larger regions called ChIP-enriched regions (chers). The command will write chers_sampleindex.txt files into the workspace directory, one file per sample. It will also write the object “chers”, which contains information from all samples, into the workspace. The execution of this command is very RAM-intensive and can be slow (on the order of 15 min).
   chers <- findChers(smooth, designInfo, y0, final_loess$targets, allChr=c("1","2"), minProbesInRow=5, distCutOff=100)
   Continue from here to annotate chers with genes or other features.
3.5.2. Annotation of Chers With the Reference Genome
1. Position information within enriched regions needs to be cross-annotated with the reference genome. The most recent A. thaliana annotation data is available at TAIR and can be downloaded in various formats. The latest release that is
compatible with the *.pos files provided by Roche-Nimblegen is TAIR8. Annotation data in GFF3 format can be downloaded from the following URL: ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR8_genome_release/TAIR8_gff3/. We have also added to the ChIPR package an annotation file based on TAIR7 that annotates all genes (including introns), promoters, and 3′ untranslated regions (3′ UTRs). The cut-offs for promoters and 3′ UTRs are 1,000 bp and 200 bp, respectively. The following command annotates chers from the first replicate with data from the custom annotation file:
achers1 <- getAnnotations(infiles="chers_1.txt", outfile="chers_1_loci", gff_file="TAIR7_GFF+1000u_200d+introns", mode="ringoChers", mincoverage_query=0.0, mincoverage_target=0.5)
In this command, chers are the query, and genes including promoters and 3′ UTRs are the target. The arguments mincoverage_query=0 and mincoverage_target=0.5 can be read as: “If a cher covers at least half of a locus including its promoter and 3′ UTR, the locus is considered enriched.” In addition to the object achers1 in the R workspace, the command writes a file, named as specified in outfile="filename", to the analysis folder. Also create an object achers2 that contains annotation data from the other sample:
achers2 <- getAnnotations(infiles="chers_2.txt", outfile="chers_2_loci", gff_file="TAIR7_GFF+1000u_200d+introns", mode="ringoChers", mincoverage_query=0.0, mincoverage_target=0.5)
To extract just the non-redundant AGI codes from the annotation list, use:
achers1_IDs <- getIds(achers1) and achers2_IDs
pseudogenes_sample1
The following two sections describe a method that identifies enriched regions or annotations on the basis that they are among the top-scoring items in the ranked lists of both biological samples. The ranked items can either be annotation units (Subheading 3.5.3) or sliding windows that are annotated in a subsequent step (Subheading 3.5.4).
1. In the first step, we map all probes of the microarray to features from the annotation file. The following command maps probes to the feature locus in the custom annotation file TAIR7_GFF+1000u_200d+introns. annotatedProbes
annotated at once, the select argument needs to be slightly changed. As an example, select="pseudogene":"gene":"transposable_element_gene" will retrieve data from all genomic regions that contain a gene, a pseudogene, or a transposable element gene as annotation.
2. Next, for each replicate, the median is calculated from all probes that cover an annotated feature in the annotation file. A rank index is attributed to each feature per replicate based on this median. Finally, the overlap between the two lists is calculated for partial lists of increasing length, starting from the top-ranked items. The overlap between the partial lists is plotted, and a cut-off is chosen that coincides with the first local maximum of the plot.
positive_loci <- getIds(listIntersection(asExprSet(final_loess), allChr=c(1,2), designInfo, arrays=c(1,2), annotatedProbes=annotatedProbes))
The object positive_loci contains all annotated features from the object annotatedProbes that are located in the ordered part of the lists and present in both lists. In addition, the command writes a PDF file of the intersection plot to the analysis folder. As an additional example, the command
positive_pseudogenes <- getIds(listIntersection(asExprSet(final_loess), allChr=c(1,2), designInfo, arrays=c(1,2), annotatedProbes=annotatedProbesPseudogenes))
creates the object positive_pseudogenes, which contains all pseudogenes from the ordered part of both lists.
3.5.4. Finding ChIP-Enriched Genomic Regions by Rank Intersection
The following steps follow the same rationale as those described in Subheading 3.5.3, except that all probes are ranked and compared between two biological samples based on sliding windows rather than annotation units.
1. The following command first calculates medians over sliding windows of 1001 bp for each sample. Each probe is the center of a new window. The windows are ranked based on the median of their M-values within each sample. As described in Subheading 3.5.3, step 2, the windows present in both samples are scored in partial lists of increasing size, starting from the top score. The count of intersected windows in the partial lists is plotted, and a cut-off is chosen that coincides with the first local maximum of the plot. All windows present in both samples below the cut-off are scored positive. A PDF file of the plot is saved in the analysis folder. Last, the command writes a file “positive_windows.txt” to the analysis folder that contains the names of the positive windows, their positions, and their median values.
positive_windows <- listIntersection(asExprSet(final_loess_separate[[1]]), allChr=c(1,2), winHalfSize=500, probeAnno=designInfo, arrays=c(1,2), outfile="positive_windows.txt")
The intersection algorithm uses a lot of RAM. Therefore, it is probably best to work with the final_loess_separate object and analyze one slide at a time. The second slide is analyzed by changing some arguments in the command:
positive_windows_slide_2 <- listIntersection(asExprSet(final_loess_separate[[2]]), allChr=c(2,3), winHalfSize=500, probeAnno=designInfo, arrays=c(1,2), outfile="positive_windows_2.txt")
The index number in double brackets in the argument final_loess_separate[[1]] must be changed to 2 to analyze the next slide. Slide 2 contains data from chromosomes 2 and 3 but no data from chromosome 1. Therefore, the allChr argument also needs to be changed (allChr=c(2,3)). In addition, the outfile requires a different name; otherwise, the file from the first analysis will be overwritten in the analysis folder.
2. In this step, the positive windows are annotated with features from the annotation file in GFF format.
positive_loci_on_windows <- getIds(getAnnotations(infiles="positive_windows.txt", gff_file="TAIR7_GFF+1000u_200d+introns", mode="simple", mincoverage_query=0.33, mincoverage_target=0.1)$Annotation)
The arguments mincoverage_query=0.33 and mincoverage_target=0.1 can be read as: “If at least a third of a positive window overlaps with at least 10% of an annotated feature, score this feature as a hit (enriched).” As described in Subheading 3.5.2, step 1 for the annotation of chers, the annotation of the feature “locus” from the provided annotation file TAIR7_GFF+1000u_200d+introns is the default option. However, any annotation file in GFF3 format can be used as input.
3.6. Creation of GFF Output Files
In addition to the gene and feature lists generated by the commands described in Subheading 3.5, it can be useful to generate output files in GFF format. These files can be visualized in genome browsers such as SignalMap (see Chapter 12) or uploaded to the TAIR implementation of GBrowse.
1. The following command creates a GFF file from a chers object generated in Subheading 3.5.1, step 6:
makeGFF(chers, "chers.gff", seqidColumn=2, prependToSeqid="Chr", sourceString="Ringo", typeString="cher", startColumn=3, endColumn=4, scoreColumn=9, attributesColumn=1)
The object chers comprises two columns that contain quantitative data. Changing the argument to scoreColumn=8 results in a GFF output file that contains the data from column 8. The command generates an output file in the analysis folder with the name chers.gff (or any other name specified as an argument).
2. The following command generates a GFF file from the positive_windows.txt file that was written to the analysis folder in Subheading 3.5.4, step 1:
makeGFF("positive_windows.txt", "positive_windows.gff", splitBy="\t", skipLines=1, seqidColumn=1, prependToSeqid="Chr", sourceString="ChIPR", typeString="window", startColumn=2, endColumn=3, scoreColumn=4)
The file “positive_windows.txt” contains two columns of quantitative data (median M-values of the windows from the two replicate samples). Changing the argument to scoreColumn=5 results in a GFF output file that contains the data from column 5. The command generates an output file in the analysis folder with the name “positive_windows.gff” (or any other name specified as an argument).
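To make the column mapping of the second makeGFF call easier to follow, it can be mimicked outside R. The Python sketch below is not part of ChIPR and does not reproduce makeGFF itself; the function name windows_to_gff, the ID scheme in the attributes column, and the sample rows are hypothetical. It assumes a tab-separated input with one header line and the columns chromosome, start, end, and two score columns, and it writes the nine standard GFF columns.

```python
def windows_to_gff(lines, source="ChIPR", feature_type="window",
                   prefix="Chr", score_column=4, skip_lines=1):
    """Convert tab-separated window rows to GFF-style lines.

    Columns are 1-based as in the makeGFF call: 1 = chromosome,
    2 = start, 3 = end, score_column = score to report.
    """
    gff = []
    for row_number, line in enumerate(lines[skip_lines:], start=1):
        fields = line.rstrip("\n").split("\t")
        seqid = prefix + fields[0]
        start, end = fields[1], fields[2]
        score = fields[score_column - 1]
        # Hypothetical ID scheme; strand and phase are unknown, hence "."
        attributes = "ID=%s_%d" % (feature_type, row_number)
        gff.append("\t".join([seqid, source, feature_type, start, end,
                              score, ".", ".", attributes]))
    return gff


# Made-up input: a header line plus two windows with two median M-values each.
example = [
    "chr\tstart\tend\tmedian_1\tmedian_2",
    "1\t5000\t6000\t1.8\t2.1",
    "2\t7500\t8500\t2.4\t1.9",
]
gff_lines = windows_to_gff(example, score_column=5)
print("\n".join(gff_lines))
```

Switching score_column between 4 and 5 selects the median M-value of the first or second replicate, which corresponds to changing scoreColumn in the command above.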
4. Notes
1. Let us assume that the intensity signal from the ChIP sample was in the red channel (R) and that from the input sample in the green channel (G). The M-values and A-values are then calculated as follows:
$M = \log_2(R/G) = \log_2 R - \log_2 G$
$A = \frac{1}{2}\log_2(R \cdot G) = \frac{1}{2}(\log_2 R + \log_2 G)$
2. Pick your closest CRAN mirror, select your operating system, and download the precompiled distribution. If you work with Windows XP, R can be run with a graphical user interface (GUI), whereas the standard Linux version uses the command line. Our description assumes that you use the Windows XP version with the GUI.
3. If you encounter internet connection problems while using R, it may be helpful to choose the custom install option and select internet2 for the connection.
4. This does not necessarily mean that your computer can give so much memory to RGui, but it is the maximum possible
one for any 32-bit Windows PC. You may have to increase the virtual memory size of your computer (Windows → Control Panel → System → Advanced → Performance → Advanced → Virtual Memory).
5. An error message in the command console during loading of ChIPR probably indicates that a required package is not installed. The error message will specify which packages are missing. If this is the case, go back to Subheading 3.1, step 4 and install the package (or to Subheading 3.1, step 6 if the missing package is part of Bioconductor).
6. Commercial whole-genome Nimblegen arrays of the A. thaliana genome are composed of a set of three separate slides. Our sample data contains slides 1 and 2 with two biological replicates. As a consequence, there are no data from chromosome 5 and only partial data from chromosome 4. If your microarray encompasses three or more slides, you need to create corresponding descriptive .txt files for the additional slides.
7. If you provide your own data, place the *.pair files that correspond to your array set into a new analysis folder and modify the *.txt files so that they correspond to your experiment. If you add data from more than two replicates, you need to add another line to the descriptive text file. If you add more than two slides, you need an additional descriptive file.
8. If you provide your own data, replace the names of the descriptive files in quotation marks in Subheading 3.2, step 8 with the appropriate names.
9. The file example.R contains command lines and comments. Comment lines are preceded by one or more “#” symbols and are ignored by the program. Command lines are delimited by a semicolon (“;”), which is equivalent to pressing the Enter key on the keyboard. If you copy and paste the entire text into the R console, all steps are carried out in sequence (but your computer may crash because it will probably run out of memory during the process).
10. By default, the RGui graphics device displays and keeps in memory only one picture at a time. The main menu option “History” → “Recording” changes this default and records a sequence of pictures. “History” → “Previous” allows scrolling between pictures. However, keeping too many graphics in memory will use up all available memory. Therefore, use “History” → “Clear history” to purge the memory when needed.
11. You may want to combine several pictures in one graph. Typing the command
par(mfrow=c(2,2))
will change the default for the RGui graphics device so that pictures are displayed in 2 rows and 2 columns. One way to revert this setting to the default is to close the graphics window.
12. The addition of square brackets will generate plots for only one slide.
densityPlot(slides[1], ma_slides[1])
13. The object final is now of a subtly different type than the previous normalized list of MAList objects. Therefore, the argument for the function densityPlot is changed.
14. These commands from the package RINGO accept many more arguments to improve the graphic representation. Here, we have chosen a minimal option to evaluate data, but you can refer to the RINGO manual for more details (1).
15. Do not type this command if you do not have the annotation file TAIR8_GFF3_genes.gff downloaded from TAIR and saved in your analysis folder. The algorithm will get caught in a loop because it keeps looking for the file, and you will lose all analysis data that have not been saved in the analysis folder.
References
1. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S et al (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80
2. Yang YH, Paquet AC (2005) Preprocessing two-color spotted arrays. In: Gentleman R et al (eds) Bioinformatics and computational biology solutions using R and Bioconductor. Springer, New York, NY, pp 49–69
3. Peng S, Alekseyenko AA, Larschan E, Kuroda MI, Park PJ (2007) Normalization and experimental design for ChIP-chip data. BMC Bioinformatics 8:219
4. Toedling J, Sklyar O, Krueger T, Fischer JJ, Sperling S, Huber W (2007) Ringo – an R/Bioconductor package for analyzing ChIP-chip readouts. BMC Bioinformatics 8:221
5. Smyth GK, Speed T (2003) Normalization of cDNA microarray data. Methods 31:265–273
6. Huber W, Irizarry R, Gentleman R (2005) Preprocessing overview. In: Gentleman R et al (eds) Bioinformatics and computational biology solutions using R and Bioconductor. Springer, New York, NY, pp 1–12
Chapter 14
Metaanalysis of ChIP-chip Data
Julia Engelhorn and Franziska Turck
Abstract
Genome-wide analysis of histone modifications via ChIP-chip (chromatin immunoprecipitation followed by whole-genome tiling array hybridization) may generate lists of up to several thousand potential target genes. In the case of the model organism Arabidopsis thaliana, several databases are available to facilitate further characterization and classification of genomic data sets. The term metaanalysis has been coined for this type of multidatabase comparison. In this chapter, we describe open source software and web tools that perform transcriptional and functional analysis of target genes. Sources of transcription data and clustering tools to subdivide genes according to their expression pattern are described. The user is guided through all necessary steps, including data download and formatting. In addition, the Gene Ontology (GO) vocabulary and methods to uncover over- or underrepresented functions among target genes are introduced. Genomic targets of the histone H3K27me3 modification are presented as a case study to demonstrate that metaanalysis can uncover novel functions that were hidden in genomic data sets.
Key words: Metaanalysis, AtGenExpress, Hierarchical clustering, K-means clustering, Gene ontology, Functional enrichment analysis
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_14, © Springer Science + Business Media, LLC 2010
1. Introduction
The output of genomic methods such as ChIP-chip usually consists of a long list of genes that require further characterization. In the case of general regulatory mechanisms such as histone modifications, the set of target genes may encompass several thousand genes, which cannot all be analyzed at the single-gene level. For model organisms such as A. thaliana, well-organized databases for gene functions and transcriptional profiles exist. The use of these databases can reduce the need for experiments for the functional characterization of target genes identified in ChIP-chip. The term metaanalysis is applied to these multidatabase/multiexperiment comparisons. Depending on the underlying biological question, metaanalysis of ChIP-chip targets can provide insight into processes controlled by the binding factor or characterize unknown target genes by extrapolating from functional data of coregulated (since they are targets of the same factor) and coexpressed genes.
Here, the analysis of Arabidopsis gene targets of the H3K27me3 histone mark identified by ChIP-chip is used as an illustrative example for transcriptional and functional metaanalysis. The list of H3K27me3 target genes can also be downloaded and used as a practice set to follow the instructions of this chapter. In the first step, the expression patterns of the target genes direct their subdivision. For this subdivision, we apply clustering algorithms to publicly available expression microarray data sets. Subsequently, the Gene Ontology (GO) is used for further functional characterization of a gene list.
GO is a general species-independent vocabulary that was built to standardize descriptions in three functional categories, which can be used to characterize a gene product. The GO categories are “biological processes,” “molecular functions,” and “cellular components.” GO terms describe properties within these three functional categories. For example, a “biological process” can be a “developmental process,” which is then further divided into 53 processes, among which are “reproductive developmental process,” “anatomical structure development,” and “multicellular organismal development.” The general term “biological processes” generates up to six sublevels in the GO vocabulary (the same holds true for “molecular function”; for “cellular component,” there are even more). Accordingly, the GO is divided into levels. High levels correspond to general terms, and low levels to specific terms.
The lower level terms are also called “child terms” of a general “parent term” to which they belong; e.g., “developmental process” is a parent term of “anatomical structure development” and a child of “biological process.” At first view, the structure of the GO vocabulary seems to generate a tree. However, unlike in a tree, a GO child term can have several parent terms, which means that a graphical representation of the GO generates a directed acyclic graph. GO annotation databases contain gene products of an organism and their assigned GO terms. If a low level term is assigned to a gene product, all parents of the low level term are also automatically attributed to this gene. The GO annotation for A. thaliana is curated by “The Arabidopsis Information Resource” (TAIR) and can be accessed from the TAIR website. The full GO vocabulary can be accessed from the GO project website (http://www.geneontology.org). The attribution of GO terms to genes is based on different kinds of evidence, and, in consequence, not all GO term attributions are equally trustworthy. Usually, experimental evidence underlies the annotation, but there are also instances where the attribution of
GO terms is purely based on in silico analysis. The method of investigation underlying the annotation of a GO term is indicated by an evidence code. For example, “IDA” stands for “Inferred from Direct Assay” and “ISS” for “Inferred from Sequence or Structural Similarity.” The abbreviations for all evidence codes can be viewed at the GO website (http://www.geneontology.org/GO.evidence.shtml). Based on evidence codes, the user of the GO can decide whether or not to trust the annotation.
The GO annotation of an organism can be queried for single genes so that all GO terms (i.e., all functional information for this gene) are retrieved. In addition, the GO annotation can be queried for groups of genes, which makes it possible to generate an overview of the functions present in a large data set. In particular, GO slim terms were established for gene list queries. In the GO slim vocabulary, GO terms of lower levels are joined to one higher, more general level. The advantage of using GO terms over keywords is that there is only one unique GO term for a function that may be defined by several keywords. The nonambiguity of GO terms is a precondition for statistical approaches that allow comparisons between different gene lists. These comparisons are necessary to determine whether specific functions are over- or underrepresented in a list of genes.
A complete description of the GO project can be viewed on the website of the GO consortium (http://www.geneontology.org). A review that also considers plant-specific GO aspects was written by Clark et al. (1). Further, tools available for GO analysis have been recently reviewed; however, not all tools accommodate Arabidopsis data (2–4). In the following, we present several tools and web interfaces for both transcriptional and functional analysis of ChIP-chip target genes. Although the example described in this chapter refers to publicly available data from A. thaliana, the general procedure also applies to individually generated expression data. In addition, several of the web tools described here also accept data from different species.
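The rule stated above, that a gene product annotated with a low level term automatically carries all parent terms, amounts to collecting ancestors in a directed acyclic graph. The following Python sketch illustrates this under stated assumptions: the function names ancestors and propagate, the gene labels, and the “toy child term” with its two parents are hypothetical, and the small parent table is only a toy fragment built from the term names mentioned in the text, not the real GO vocabulary.

```python
def ancestors(term, parents):
    """Return every direct and indirect parent of a term in a DAG."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(annotations, parents):
    """Add all ancestor terms to the terms assigned directly to each gene."""
    full = {}
    for gene, terms in annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


# Toy fragment: child term -> list of parent terms (a child may have several).
parents = {
    "developmental process": ["biological process"],
    "anatomical structure development": ["developmental process"],
    "reproductive developmental process": ["developmental process"],
    # Hypothetical term with two parents, to show why the GO is not a tree.
    "toy child term": ["anatomical structure development",
                       "reproductive developmental process"],
}
annotations = {
    "GENE_A": ["toy child term"],
    "GENE_B": ["developmental process"],
}
full = propagate(annotations, parents)
for gene in sorted(full):
    print(gene, sorted(full[gene]))
```

Because a term reached through two different parents is stored only once in the set, the multiple-parent structure does not lead to double counting when annotated genes are later tallied per term.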
2. Materials
The user should have a workstation with a Microsoft® Windows™, Apple® Macintosh®, or Unix®/Linux operating system that has a spreadsheet program, such as Microsoft (MS) Excel or OpenOffice (OO) Calc, installed (please ensure that your software uses a dot as the decimal separator, e.g., 100.00 for one hundred instead of 100,00). For some applications, database tools such as MS Access or OO Base are helpful. A recent web browser, such as Mozilla Firefox (http://www.mozilla.com), is required. Since many of the tools described here are web tools, a fast internet connection is recommended. A list of genes that are positive for the H3K27me3 histone mark can be downloaded and used to follow the steps described below (http://www.mpiz-koeln.mpg.de/english/research/couplandGroup/turck/Teaching/index.html). A table containing the developmental set of AtGenExpress expression data for these genes can be downloaded from the same site.
3. Methods
3.1. Expression Analysis for Single Genes and Expression Data Download
For A. thaliana, numerous expression microarray experiments have been carried out. Publicly available data from these expression sets can be useful for metaanalysis of ChIP-chip data. We describe here the use of the AtGenExpress data set, which was generated in a multinational coordinated effort to uncover the transcriptome of Arabidopsis (http://www.arabidopsis.org/info/expression/ATGenExpress.jsp). Expression data in this set is available for different tissues, developmental stages, responses to biotic and abiotic stress, ecotypes, and other experimental parameters. In the following, we describe the analysis of the transcriptional profile of genes that are positive for the H3K27me3 chromatin mark based on the developmental series of the AtGenExpress data set (5). However, any other list of genes of interest and any other series of the AtGenExpress set could be analyzed following the same instructions. The AtGenExpress project used the Affymetrix ATH1 microarray as a common platform. Each experiment was performed on biological replicates applying standardized quality-control criteria. Expression of single genes across samples can be viewed using the BAR web service (http://bbc.botany.utoronto.ca/), which also offers a web tool for clustering and several other tools for data processing (6, 7). In order to perform clustering of large gene lists, it is preferable to obtain whole expression sets containing data for all genes present on the ATH1 array. Already normalized AtGenExpress data is available for download at the Nottingham Arabidopsis Stock Centre (NASC) (data normalized with the MAS5.0 method) and from Weigelworld (gcRMA-normalized data) (see Note 1 and Tables 1 and 2).

Table 1
Summary of expression set download
Source | URL | Format | Normalization
Weigelworld | http://www.weigelworld.org/resources/microarray/AtGenExpress | Tab-separated text file | gcRMA
Nottingham Arabidopsis Stock Centre (NASC) | http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl | Tab-separated text file/Excel file | MAS5.0

Table 2
Summary of clustering platforms
Tool | URL | Type | Advantages | Disadvantages
Genesis | http://genome.tugraz.at | Java | Clear layout for k-means clustering | Tree not suited to view many genes
Gene Expression Pattern Analysis Suite (GEPAS) | http://www.gepas.org | Web tool | Clear tree layout | Poor visualization of k-clusters

3.1.1. Downloading of Expression Set Data from Weigelworld
1. URL: http://www.weigelworld.org/resources/microarray/AtGenExpress.
2. Data sets of interest can be selected by clicking on the links titled “Expression data” to download the data as a compressed file in zip format. The series (e.g., “Development” in our case) is indicated in parentheses.
3. Extraction of the zip archive generates a tab-separated text file containing sample codes in the first row and AGI locus codes in the first column. There are several columns for each sample (indicated by a letter suffix), which represent biological replicates. The corresponding sample codes can be downloaded from the same page (named “Sample list”) (see Note 2).
4. Prior to transcriptional clustering, the mean or median of the biological replicates should be calculated for each sample in the expression set with the help of spreadsheet software. Save a tab-delimited text file that contains only the mean (or median) expression values per experiment and gene. The text file should also contain AGI codes in the first column and a name for each experiment in the first row. The names of the experiments can be chosen freely but should not contain blank spaces.
3.1.2. Downloading of Expression Set Data from NASC
In the following, “→” indicates a sequence of links or options to follow in a web browser interface or program menu.
1. URL: http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl → “Treeview.”
2. Data sets of interest can be reached by selecting “AtGenExpress Project” and picking a data set from the drop-down menu (e.g., “Developmental Stage-Developmental Series”). A list of links will then open for different related experiments (in the case of “Developmental Series,” for example, “Flowers and Pollen,” “Leaves,” “Roots,” and others).
3. Selecting one of the links (e.g., “Flowers and Pollen”) opens a download page. Here, information about the experimental conditions and the authors of the data is displayed. It is possible to download either the whole data set (recommended) or all parts of a set separately.
4. To download entire data sets → “Download all of the ATH1 chip data for this experiment.”
5. Select “Just a single value (previously known as for clustering)” for “Data” and “Basic annotation (just gene names)” for “Annotation.” We suggest downloading data as “tab-delimited,” which will reduce the size of the file to be transferred. “Download via email!” directly generates a file in a suitable txt format and avoids having to copy and paste a text file through a browser.
6. The first three columns and empty rows of the data set need to be removed within spreadsheet software (MS Excel, OO Calc). The file should be saved as a tab-delimited text file (.txt). Refer to Subheading 3.1.1, step 4 to modify the expression set for clustering.
3.1.3. Extracting Expression Data for Genes of Interest
The downloaded expression sets contain data for all genes present on the ATH1 array. To cluster genes of interest only (e.g., targets from ChIP-chip analysis), expression values for these genes have to be extracted from the entire set (see Note 3). For a small number of genes, data extraction can be performed manually by copying and pasting. To extract data for many genes, database software like MS Access or OO Base should be used to select items present in one list (AGI codes of target genes) from the expression data set. We describe here the procedure for OO Base, as this software is freely available and platform independent. You can use the list of H3K27me3 positive genes to follow the steps.
1. A complete OO package can be downloaded from http://www.openoffice.org/, where detailed installation instructions can be found. For this application, the components Base (database software) and Calc (spreadsheet software) are needed.
2. Open the expression data set and your list of genes in spreadsheet software such as MS Excel or OO Calc. The first column of both lists should contain the AGI locus codes in the same format (the following operations are case-sensitive) and the same headline (e.g., “Name”).
3. Create a new database by opening the OO Base software and following the instructions of the “Database Wizard” (no adjustments are needed; all settings can be used as suggested by default).
4. To load tables into the new database, click the “Tables” button on the left side of the window. This allows uploading tables from spreadsheets by copy (in the spreadsheet software) and paste (in the database software). Once a table is pasted into the database table view, an assistant will open to import the data. Copy both the expression set and the list of ChIP-chip target genes into the database as separate tables (importing the whole expression sheet may take some time).
5. Click the “Queries” button to reach the query view, choose “Create Query in SQL view...,” and type in the following command:
SELECT "Table1".* FROM "Table1", "Table2" WHERE "Table1"."Name" = "Table2"."Name"
“Table1” should be the expression data set and “Table2” the gene list. “Name” should be the headline of the column containing the AGI locus codes. Save this query.
6. Go back to the “Queries” menu and choose “Edit” → “Create as view.” A new table named Table3 will appear in the table view.
7. Open a new spreadsheet in OO Calc, click on Table3 in the database, keep the mouse button pressed, and drag it to the A1 cell in the OO Calc spreadsheet. This will start the actual query. The process will take considerable time, depending on the processing speed of the computer (about 30 min).
8. In OO Calc, delete the first column (this column contains a primary key, an enumeration that is created by Base), and then save the resulting table as a .csv file using tab as “Field delimiter” and no “Text delimiter.” If required, the file can be renamed as name.txt.
3.2. Clustering of Expression Data
In this section, we explain the principle of two clustering algorithms that are useful to structure the information contained in expression data sets. We then introduce the Java software Genesis and the web tool Gene Expression Pattern Analysis Suite (GEPAS), which perform clustering, and explain their operation. As input for the tools described, the data should be available as tab-separated text files containing gene names/gene codes in the first column and sample names in the first row. For the GEPAS web tool, the first entry in the first row must be “#NAMES.” This header tells the tool which line contains the sample names. You can use the expression data for H3K27me3 positive genes to follow the clustering steps.
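As an alternative to the spreadsheet and database route, the preparation of such an input file can be sketched in a few lines of Python. The sketch below averages biological replicates (Subheading 3.1.1, step 4), keeps only the genes of interest (Subheading 3.1.3), and writes a table with the “#NAMES” header. It is only an illustration: the function names average_replicates and to_gepas_table are hypothetical, the expression values are made up, and the column naming scheme, in which a replicate is marked by a suffix after the last underscore (e.g., root_A, root_B), is an assumption that has to be adapted to the actual sample codes.

```python
from statistics import mean


def average_replicates(header, rows):
    """Average replicate columns that share a sample name.

    header: column names without the gene column, e.g. "root_A", "root_B";
    the part before the last underscore is taken as the sample name
    (an assumed naming scheme).
    rows: dict mapping gene code -> list of floats in header order.
    """
    samples = []   # sample names in order of first appearance
    columns = {}   # sample name -> indices of its replicate columns
    for index, name in enumerate(header):
        sample = name.rsplit("_", 1)[0]
        if sample not in columns:
            samples.append(sample)
            columns[sample] = []
        columns[sample].append(index)
    averaged = {}
    for gene, values in rows.items():
        averaged[gene] = [mean([values[i] for i in columns[s]])
                          for s in samples]
    return samples, averaged


def to_gepas_table(samples, averaged, genes_of_interest):
    """Build a tab-separated table with "#NAMES" as the first entry."""
    wanted = set(genes_of_interest)   # the comparison is case-sensitive
    lines = ["\t".join(["#NAMES"] + samples)]
    for gene in sorted(averaged):
        if gene in wanted:
            values = ["%.2f" % v for v in averaged[gene]]
            lines.append("\t".join([gene] + values))
    return "\n".join(lines) + "\n"


# Made-up expression values for three genes, two samples, two replicates each.
header = ["root_A", "root_B", "leaf_A", "leaf_B"]
rows = {
    "At1g01010": [2.0, 4.0, 6.0, 8.0],
    "At1g01020": [1.0, 1.0, 5.0, 7.0],
    "At1g01030": [9.0, 9.0, 9.0, 9.0],
}
samples, averaged = average_replicates(header, rows)
table = to_gepas_table(samples, averaged, ["At1g01020", "At1g01010"])
print(table, end="")
```

The mean can be replaced by statistics.median if the median of the replicates is preferred; for Genesis, the first header entry would simply be a column name instead of “#NAMES.”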
3.2.1. Hierarchical Clustering
Hierarchical clustering of the expression data set allows a preliminary overview of the main groupings within a gene set according to expression pattern. The hierarchical clustering algorithm starts with the calculation of a similarity matrix, which means that it calculates distances between the expression values of all genes for all experiments. Then, an iterative process is started, in which the two genes with the closest expression values are joined to one branch of a tree. The expression values of the joined genes are then replaced in the original matrix by single entry values that represent the mean characteristics of both genes. The distance calculation and joining process is repeated until all genes are assigned. The results are displayed as a tree where genes are assigned to branches and the length of the branches reflects the similarity between connected genes (Fig. 1). Analysis of the tree allows conclusions about the whole data set, because the tree reveals major expression patterns represented by major branches of the tree. In addition, the tree depicts the relationship of gene expression values between single genes. This property of the tree facilitates the identification of coexpressed genes within the data set and allows the comparison of expression patterns of multicopy genes. Although analysis of the cluster tree allows the identification of major branches, the interpretation becomes complex when several hundreds or thousands of genes are clustered (the number of subbranches increases, and a graphical display of the tree can reach lengths of up to meters). Therefore, we use hierarchical clustering only on randomly selected subsets of data with the aim of identifying the major branches present in the expression set (please refer to Note 4 to obtain randomly selected subsets). The number of major branches defines the number of clusters to be used in the subsequent k-means clustering of the entire set as described in Subheading 3.2.2.
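The iterative joining procedure described above can be written down compactly. The following Python sketch is a minimal illustration and not the implementation used by GEPAS or Genesis: the function names are hypothetical, the expression profiles are made up, Euclidean distance is chosen here as an assumption (the tools also offer correlation-based distances), and branch lengths are not recorded.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def hierarchical_clustering(profiles):
    """Join the two closest entries until a single tree remains.

    profiles: dict mapping gene name -> list of expression values.
    Returns the tree as nested tuples of gene names.
    """
    # Each active node maps a label (gene name or subtree) to its profile.
    nodes = {name: list(values) for name, values in profiles.items()}
    while len(nodes) > 1:
        # Find the pair of active nodes with the smallest distance.
        a, b = min(combinations(nodes, 2),
                   key=lambda pair: euclidean(nodes[pair[0]], nodes[pair[1]]))
        # Replace both entries by one entry holding their mean profile.
        merged = [(x + y) / 2 for x, y in zip(nodes[a], nodes[b])]
        del nodes[a]
        del nodes[b]
        nodes[(a, b)] = merged
    return next(iter(nodes))


# Made-up profiles: two pairs of genes with similar expression patterns.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [8.0, 7.0, 1.0],
    "geneD": [7.5, 7.2, 1.3],
}
tree = hierarchical_clustering(profiles)
print(tree)
```

In this toy example, the two top-level elements of the returned tuple correspond to the two major branches, which is the number that would be passed on as k to k-means clustering.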
3.2.2. K-means Clustering
K-means clustering is a method that groups genes together according to their expression pattern. The number of groups (k) has to be specified in advance (we normally use the number of major branches determined by hierarchical clustering). In contrast to hierarchical clustering, where genes are grouped together gradually according to their similarity in expression pattern, k-means clustering leads to a defined number (k) of gene groups. This makes it possible to generate a delimited list of genes (one of the groups) expressed in a certain condition or tissue. Additional advantages of the k-means algorithm are its low memory requirement and fast calculation speed. K-means clustering returns an average expression pattern for each cluster, which is called the center of the cluster. In an iterative process, the algorithm searches for both the best position of the centers and the best assignment of genes to centers. First, cluster centers are set randomly and each gene is assigned to the center with the lowest distance. Then the average for each cluster is calculated,
Metaanalysis of ChIP-chip Data
[Fig. 1 image: tree and heatmap. Columns (samples): seeds stage 10, mature pollen, siliques, flowers stage 12 petals, flowers stage 12 carpels, flowers stage 12, flower, shoot apex inflorescence (after bolting), shoot apex transition (before bolting), shoot apex vegetative, cauline leaves, rosette leaf #10, leaf, vegetative rosette, cotyledons, seedling, hypocotyl, root. Rows: AGI locus codes of the clustered genes (At1g67870 to At1g24470). Scale: 1.760 to 14.350.]
Fig. 1. An example of hierarchical clustering result for H3K27me3 positive genes. The output of the hierarchical clustering function of a web tool GEPAS: Here, only a subset of genes was clustered according to their expression in a subset of samples. A linear correlation coefficient (Pearson) was used to obtain this cluster. The original output has been changed from blue and red to grayscale. Major branches of the tree are highlighted by a bold line
and the center is relocated to this average value. Afterwards, the genes are reassigned to the cluster center with the lowest distance. The averaging and reassignment are repeated for a number of iterations specified in advance, or until no further improvement in the overall distance of genes to their cluster centers can be achieved. A disadvantage of the k-means algorithm lies in the random choice of the first cluster centers and its iterative character, which implies that there is no unique solution. Therefore, some genes may be assigned to different clusters in two consecutive calculations based on the same expression data set. In practical terms, we consider it reasonable to repeat the calculation several times and to choose for further analysis only those genes that are stably assigned to a given cluster of interest. Ten repeats of k-means clustering are sufficient for our example data set (expression values of the developmental series for genome-wide H3K27me3 targets) to generate stable lists of genes that share a particular expression pattern. The number of major branches detected in hierarchical clustering of a subset of H3K27me3 target genes indicated that nine principal expression patterns were detectable within the set. Therefore, k = 9 seems to be adequate for k-means clustering of our example data based on the developmental set from AtGenExpress (see Note 5). The method generates nine lists of H3K27me3 target genes expressed in a specific part of the plant (in total, about 3,000 genes for which expression data were available). In fact, a pattern of nine major expression branches seems to be a characteristic of the developmental series data set, since similar organ-specific patterns are obtained if all genes present on the ATH1 expression array (22,812 genes, entire developmental series data set) or a large number of randomly selected genes are clustered.
3.2.3. Clustering with Genesis-Software
1. Download: http://genome.tugraz.at/
2. For system requirements, installation, and detailed description of all functions in Genesis, see the Genesis Operation Manual, the author’s Master thesis (can be downloaded from a web page), and Sturn et al. (8). Please be aware that a license for this software has to be requested prior to installation. The license is free for noncommercial users. For some applications (e.g., hierarchical clustering of the whole genome), the memory used by Genesis has to be enlarged (default 512 MB); the Genesis Operation Manual describes how this is performed for different operating systems (see Note 6).
3. Load an expression file (must be in a tab-delimited format). You may use the provided file containing H3K27me3 targets and AtGenExpress data to follow the steps.
4. Expression values are visualized as a heatmap in a red color (if only positive values are present in the data set) or red and green colors (if positive and negative values are present).
5. By default, the maximum color intensity is set to 3.0/−3.0. All higher/lower values are also displayed with the maximum intensity. Adjust the maximum value to the real maximum of expression values or to a customized value (the main menu “view” => “adjust to maximum”/“set maximum”).
6. Genes can be searched for, either within clusters or in the complete set. The AGI locus code of the queried gene will be highlighted in pink.
7. Depending on the original data and biological implication, the data may need an adjustment: “adjust” => “different options” (for example a transformation from a linear scale to a log-scale). Please note that the adjustment cannot be reverted, and the data set needs to be reloaded if a different calculation is to be performed. If the goal of clustering is analysis of overall expression patterns rather than absolute expression values, divide values per gene by the root-mean-square for each gene (rms) (see Note 7). Dividing by rms leads to an adjustment of absolute values between the genes but retains the expression pattern.
8. Hierarchical clustering: activate the “HCL” button in the toolbar to open a dialog window. Choose “average linkage clustering” (see Note 8) and “cluster genes.” If the experimental conditions/samples are also to be clustered, choose “cluster experiments” (see Note 2). The tree can then be displayed by activating “Tree” (“average linkage”) in the program tree (the left side of the window, the data tree to navigate through the results).
9. K-means clustering: activate the “KMC” button on the left side of the analysis tool bar to open the dialog. Choose the required number of clusters and the maximum number of iterations (see Note 9). Results are displayed in the program tree in different ways:
● “Expression Images”: shows color-coded expression of all genes in the chosen cluster.
● “Cluster Information”: provides the number of genes per cluster and their percentage of the whole set.
● “Centroid Views”: returns a graph showing samples of the expression-set on the x-axis and the average expression on the y-axis. The variation inside the samples is indicated by bars (Fig. 2).
● “Expression Views”: the same axes as for “Centroid Views” but with a line for every gene.
Centroid and expression views can also be displayed for all clusters at once (Fig. 2). This is useful to compare clusters and find clusters with special attributes (e.g., expression only in one tissue type).
[Fig. 2 image: nine panels, one per k-means cluster. Cluster 1: 427 genes; Cluster 2: 1,962 genes; Cluster 3: 197 genes; Cluster 4: 613 genes; Cluster 5: 245 genes; Cluster 6: 130 genes; Cluster 7: 292 genes; Cluster 8: 435 genes; Cluster 9: 377 genes.]
Fig. 2. K-means clustering of expression data from genome wide H3K27me3 targets. Each box represents one k-means cluster. Different plant organs are presented on the x-axis (from left to right: root, stem and whole plants, leaves, shoot apex, flowers, and seed samples); expression values are presented on the y-axis. Squares represent the average expression of one sample; bars indicate the variance within each sample (output “Expression Centroid View” from Genesis; note that blank spaces appear in the output because no negative values are present in the data set)
10. Saving of data: the complete project and all images can be saved by using the menu “file” => “save project”/“save expression image.” Lists of genes from single clusters can be saved by clicking on the cluster with the right mouse button and choosing the data type (for example “save cluster gene list”).
3.2.4. Clustering Using GEPAS
1. URL: http://www.gepas.org
2. GEPAS (9) is a web-based implementation of the software “Cluster” (10) and its visualization tool “Treeview” (“Mapletree” for Linux-systems). Both programs can also be downloaded and used on a local computer. Since there are different requirements and options for different platforms, please refer to http://rana.lbl.gov/eisen => “software” => “Cluster Analysis and Visualization” for installation instructions and the manual.
3. Clustering tools can be accessed by activating “Tools” => “Clustering.”
4. Hierarchical clustering: “Unweighted pair-group method using arithmetic averages (UPGMA)” should be chosen as a cluster method (see Note 8). As in the Genesis software, the data set can be clustered according to absolute expression values or a relative expression pattern. Different distance measurements between clusters are proposed. For absolute values, choose “Euclidean” as “Distance”; for clustering according to the pattern of expression, choose one of the correlation coefficients (see Note 7). If samples should also be clustered, “clustering of conditions” can be chosen. Analysis is started by entering a job name and activating the “run” button.
5. The results will be stored on the server and can be accessed by clicking on the job name in the upper right corner of the screen. The results can be visualized by clicking on “SEND TO ETE TREEVIEWER.” In the following dialog screen, “run” results in a tree and heatmap display of expression patterns. The scale can be adjusted via “Profile” => “Set color scale.”
6. K-means clustering: The “NonHierarchical” tab must be checked to perform k-means clustering. Here, “Method” => “k-means” and a desired number of clusters (k) are chosen. For clustering absolute values, “Euclidean” should be used as “Distance”; to consider expression patterns, the “Spearman’s rank correlation coefficient” should be used. If samples are to be clustered as well, “clustering of conditions” can be chosen. Analysis is started by entering a job name and activating the “run” button.
7. The results of k-means clustering can be visualized with “ETE Tree Viewer.” To view the expression pattern per cluster, activate the summary button on the toolbar.
8. The treeviewer output for hierarchical and k-means clustering can be saved by clicking on “Capture image.” The image can then be accessed by selecting the job name on the right. To save cluster gene lists from k-means clustering, save the file ending with .cl. The file contains the genes of each cluster as tab-separated text.
3.3. Gene Ontology Analysis
Several web tools offer the analysis of GO terms in a given gene list for A. thaliana. These tools can be divided into simple annotation displays and more sophisticated tools that calculate whether certain GO terms are statistically over- or underrepresented.
3.3.1. Basic Annotation Using TAIR
The easiest way to access Arabidopsis GO data is to use the TAIR webpage (http://www.arabidopsis.org). Complete annotations for gene lists can be displayed via “Search” => “GO Annotations.” A list of AGI locus identifiers can be entered and submitted by “Get All GO Annotations.” The result of this query displays a complete listing of GO terms associated with a submitted gene list, including evidence codes, corresponding GO slim categories and links to references. In addition, an overview of functions present in a gene set is generated, if the option “Functional Categorization” is activated. This function displays a list of all GO slim categories and the number of genes in the submitted list that belong to each category. For a graphical representation, follow “Gene Bar Chart” => “Draw.” After the selection of this function, bar charts are drawn that show for each GO slim category the percentage of genes that belong to this category in the submitted data set. An impression of the general distribution of GO slim categories in the entire Arabidopsis genome can be provided by following the option “Whole Genome Categorization.”
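The “Functional Categorization” summary can also be reproduced offline once annotations have been downloaded. The following is a minimal sketch under stated assumptions: it is not TAIR’s implementation, the function name `category_percentages` is hypothetical, and the gene names and GO slim assignments are invented for illustration only.

```python
from collections import Counter

def category_percentages(annotations):
    """For each GO slim category, return the percentage of genes in the
    submitted list that carry at least one annotation in that category."""
    counts = Counter(
        category
        for categories in annotations.values()
        for category in set(categories)  # count each gene once per category
    )
    total = len(annotations)
    return {category: 100.0 * n / total for category, n in counts.items()}

# Hypothetical annotations: gene -> GO slim categories.
annotations = {
    "geneA": {"developmental processes", "transcription"},
    "geneB": {"developmental processes"},
    "geneC": {"other membranes"},
    "geneD": {"transcription"},
}
for category, percent in sorted(category_percentages(annotations).items()):
    print(f"{category}: {percent:.0f}%")
```

Since one gene can belong to several categories, the percentages do not need to add up to 100.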
3.3.2. Functional Enrichment Analysis
Functional enrichment analysis tools employ statistical methods to test whether a certain GO term appears at a higher or lower than expected frequency in a gene list. To perform such an analysis, a reference set of genes is needed to calculate the expected background distribution of GO terms. From the reference set, tools calculate the expected distribution of GO terms in a randomly selected list of the same size as the query gene list. The GO term distribution in the query list is compared to this expectation, and the comparison shows whether GO terms are overrepresented or underrepresented in the query list relative to the reference list. Tools testing for over/underrepresentation of GO terms are available as web interfaces and as freely available stand-alone programs. GO tools differ in the statistical method employed, their input (GO slim versus GO), and their output and flexibility. Since all tools are well documented and updated constantly, we do not describe the exact handling of each tool. Rather, we present an overview of tools that can be used for metaanalysis of Arabidopsis ChIP-chip data (please be aware that this list does not include all available tools). The
overview enables the reader to choose the tools that are suited for the analysis of a personal data set (see Table 3). Depending on the GO tool, the required input file is a text file containing AGI locus codes or Affymetrix probe IDs. To obtain Affymetrix probe IDs from AGI locus codes, submit a list of AGI locus codes to “Search” => “Microarray elements” on the TAIR webpage (http://www.arabidopsis.org), or use the “_at to AGI Conversion Tool” from the BAR web service (6).
3.3.3. Choosing Your Reference Set
Depending on how a gene set of interest was generated, different reference sets are appropriate. Some GO tools offer gene sets based on commonly used microarrays as reference, while other tools require a list to be supplied by the user. If a microarray was involved in generating a gene set for clustering (like the ATH1 Affymetrix in the case of AtGenExpress data), it makes sense to compare genes of interest to genes present on the microarray rather than comparing them to the whole genome.
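When the reference set is supplied by the user, it helps to check the query list against it first, because genes without a probe on the array cannot appear in any expression cluster (see Note 3). A minimal sketch, with a hypothetical function name and placeholder gene names:

```python
def restrict_to_reference(query, reference):
    """Split a query gene list into genes that are part of the reference
    set and genes that are not (e.g., genes without a probe on the array)."""
    query, reference = set(query), set(reference)
    return query & reference, query - reference

# Placeholder identifiers; in practice these are AGI locus codes.
array_genes = {"geneA", "geneB", "geneC", "geneD"}
targets = ["geneA", "geneC", "geneX"]

usable, missing = restrict_to_reference(targets, array_genes)
print(sorted(usable), sorted(missing))
```

Only the usable genes should be counted in the query list when an enrichment test is run against the array as reference.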
3.3.4. Tests and Statistics
All tools start with the null hypothesis that a certain GO term appears in the query list as often as in a randomly picked list of genes from the reference set. As a result, p-values are generated that give the probability of observing a deviation at least as large as the one found if the null hypothesis is true. If a p-value falls below a certain threshold (usually 0.05), the null hypothesis is rejected, and the GO term is then called over- or underrepresented in the query gene set. One difference between the available GO tools is the distribution assumed for a hypothetical randomly selected set. The options offered are a hypergeometric distribution, a binomial distribution, and a χ² distribution. Practically, there are no significant differences between outputs based on the three different methods. It should be mentioned that the hypergeometric distribution gives a correct description of the situation (in this distribution, genes can only be sampled once for the random distribution), whereas the binomial and χ² distributions are only approximations that hold for larger reference sets (binomial distribution) and for large reference sets with large sample sets (χ² distribution) (3). Because many GO terms are tested at once, the p-values obtained from all three tests are usually corrected for multiple testing. There are several correction methods, and several tools offer more than one. Among those methods, the Bonferroni correction, which controls the familywise error rate, is the most conservative. In contrast, false discovery rate (FDR) methods (e.g., Benjamini and Hochberg, Benjamini and Yekutieli), which control the expected proportion of wrongly rejected null hypotheses among all rejected ones, are less conservative but well-suited, especially if dependencies exist as in the case of GO terms (3, 13). Simulation methods should not be used if only a few categories are involved (3).
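The test and the corrections described above can be illustrated with a short sketch. It is not the code of any of the tools in Table 3. It assumes a one-sided hypergeometric test for overrepresentation, followed by Bonferroni and Benjamini and Hochberg adjustment. The function names and all gene counts are hypothetical.

```python
from math import comb

def hypergeometric_p(k, n, K, N):
    """P(X >= k): probability of finding at least k annotated genes in a
    query of n genes drawn without replacement from N reference genes,
    K of which carry the GO term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests (capped at 1)."""
    return [min(1.0, p * len(pvalues)) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini and Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: 50 query genes, 1,000 reference genes, three GO terms
# given as (annotated genes in query, annotated genes in reference).
terms = {"term 1": (8, 40), "term 2": (3, 60), "term 3": (1, 20)}
raw = [hypergeometric_p(k, 50, K, 1000) for k, K in terms.values()]
for name, p, q in zip(terms, raw, benjamini_hochberg(raw)):
    print(f"{name}: p = {p:.3g}, FDR-adjusted p = {q:.3g}")
```

Exact integer arithmetic via `math.comb` avoids rounding problems for small examples; underrepresentation would be tested with the lower tail, P(X <= k), instead.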
AGI locus user list, codes whole genome
no Fisher’s Westfall and exact test Young, Benjamini and Hochberg, Benjamini and Yekutieli
hyperno Benjamini geometric, and Hochberg, Fisher’s Benjamini exact test, and Yekutieli c2 test
Webhttp:// tool babelomics. bioinfo.cipf.es/
Webhttp://omicslab. tool genetics.ac.cn/ GOEAST/
FatiGO
GOEAST
no
yes
clicking on “details” shows information about each gene in query
calculation Jobs are times can stored be very long under login at certain name, easy times of day to handle
use “Batchgenes” for whole genome
(19)
(12) Cave: “remove all duplicates” removes duplicates from both lists
(18)
plugin of no automatic Cytoscape update, under representation has to be tested separately
Ref (11)
Comment uses GO-slim (only general functions)
table
fast overview
very fast, easy to handle
no
Disadvantages
table, no network
Bar-chart
ATH1 array, table, tree Affywhole metrix, genome AGI locus codes
AGI locus whole codes genome
yes
Benjamini and hyper Hochberg, geometric, Bonferroni binomial
Java http://www.psb. ugent.be/cbd/ papers/BiNGO/
BiNGO
AGI locus whole codes genome
no
Webtool
http://bar. utoronto.ca/
Classification super viewer
ratio between frequencies
Type simulation
GO-level Reference Visuali- indiAdvansets zation cated? tages
Select evidence Input codes? IDs
Test correction
URL
Statistic test method
Name
Table 3 Summary of GO tools
Genecodis http://genecodis. Webtool dac.ya.ucm.es/ analysis/
http://gostat.wehi. Webtool edu.au/cgi-bin/ goStat.pl
Gostat
hyper-geometric, c2 Benjamini and Hochberg, simulation
Benjamini and Hochberg, Benjamini and Yekutieli
hyper-geo- Benjamini and metric, Yekutieli binomial, c2 test
http://bioinformat- Webtool ics.cau.edu.cn/ easygo/
EasyGO
Fisher’s exact test
hyperBonferroni, geometric simulation
Webhttp:// tool go.princeton. edu/cgi-bin/ GOTermFinder
GOTerm Finder
AGI locus user list, codes whole genome
AGI locus user list, codes whole genome
no
no
ATH1 Affyarray metrix, AGI locus codes
AGI locus user list, codes whole genome
yes
yes
table, piechart
additional levels returns can be genes that selected share a group of GO-terms
GO terms only displayed as numbers
Gene names displayed
no
table
every category Clear tree of GO has to showing be tested which separately genes are assigned to each term every branch of Overview the GO has shown first, deeper levels to be tested separately can be visualized stepwise
no
text-tree, yes graphic
table, tree
user GOannotation file can be used
(23)
(22)
(21)
(20)
3.3.5. Drawbacks of GO
One should keep in mind that in a large annotation file like the GO annotation, there will always be errors. Corrections of acknowledged erroneous annotations are included in the next release of the annotation file by TAIR. Therefore, only the latest version of the GO annotation should be used for metaanalysis. A recurrent problem for GO annotations in the plant field is that the GO hierarchy was established for animal models, which can lead to misinterpretation of GO terms in plants. Usually, these drawbacks do not significantly change the outcome of gene set analysis but can lead to wrong conclusions for single genes. Therefore, functional annotations for single genes should be verified by considering evidence codes, available publications, and individual BLAST results. In particular, when working with gene families, a wrong annotation of one gene family member can be transferred to the entire family. Misannotation errors are somewhat avoided if only experimentally verified evidence codes are included in GO term analysis. A recent review (14) provides readers with detailed information about drawbacks of GO analysis.
3.3.6. Analysis of H3K27me3 Positive Genes as an Example of GO Term Metaanalysis
The H3K27me3 histone mark is generated by Polycomb Group proteins. Analyses of Arabidopsis mutants that carry mutations in the genes that encode Polycomb Group proteins strongly suggest that the function of these proteins is required for all aspects of plant development. However, a GO-slim analysis using the “Classification super viewer” from the BAR web interface (11) showed only a slight enrichment of the term “development” when carried out with all H3K27me3 positive genes identified by ChIP-chip. We aimed to find genes involved in embryonic development among the H3K27me3 targets. Since embryonic development mainly takes place in developing seeds, we first identified a cluster of genes highly expressed in seeds, using the k-means algorithm for the developmental series array data (Fig. 2, cluster 3). The H3K27me3 target gene seed cluster was then submitted for GO enrichment analysis by “FatiGO” (12). All genes present on the ATH1 array were used as a reference set, since the ATH1 array was used in the AtGenExpress project. FatiGO analysis revealed an overrepresentation of GO terms involved in embryonic development (e.g., GO term: “embryonic development ending in seed dormancy”) (Fig. 3). To ensure that this overrepresentation was not merely a consequence of the genes being expressed in seeds, but arose because the genes were H3K27me3 targets, we performed transcriptional k-means clustering for all genes present on the ATH1 array. Clustering analysis revealed a seed-expressed cluster, which was also submitted to GO analysis. This FatiGO submission did not return overrepresented “developmental” GO terms. The conclusion at this step is that the H3K27me3 target
[Fig. 3 image: flow chart. All K27me3 targets → GO slim analysis (bar chart, scale 0 to 5, for the categories extracellular; cell wall; transcription factor activity; receptor binding or activity; transcription; electron transport or energy pathways; other membranes; developmental processes; other molecular functions) → only slight enrichment for "development" → cluster analysis → seed cluster → GO full analysis: embryonic development (GO:0009790), p = 2.22 × 10^-9; seed development (GO:0048316), p = 4.04 × 10^-15; embryonic development ending in seed dormancy (GO:0009793), p = 8.44 × 10^-10 → genes in seed cluster have a high probability to be involved in embryonic development.]
Fig. 3. Overview of example procedures. GO slim analysis was performed with the “Classification super viewer” from the BAR web interface (11) displayed as a selected part of the output. GO full analysis was performed by the tool FatiGO (12), only a selection of enriched GO terms is shown. The p-values given here are FDR-adjusted p-values
genes include most of the seed-expressed genes that have already been characterized as developmentally important. As a second control, all H3K27me3 targets for which expression data were available in the developmental series (these genes are the ones that had a chance to be in the seed-expressed cluster) were submitted to GO analysis. Among these H3K27me3 targets, developmental GO terms were enriched but not embryo-specific ones. Thus, combining clustering analysis with functional GO term enrichment analysis was a precondition for identifying a group of genes (the seed-expressed H3K27me3 targets) that are with high probability involved in embryonic development. It also allowed us to conclude that H3K27me3-positive genes play a role in embryo development.
3.4. AFAWE: Metaanalysis Beyond GO
The ultimate goal of ChIP-chip data metaanalysis could be the identification of candidate genes that are involved in a process of interest but have not yet been studied with regard to that function. In our example, we can conclude that H3K27me3 targets with unknown functions that are expressed in the seed cluster have a high probability to play a role in embryo development. However, before embarking on a directed (e.g., reverse genetics) approach for studying the function of these candidate genes, it is advisable to compile all available information on these targets. Many tools and web pages are available for that purpose and are well known in the biological community (e.g., the TAIR webpage and BLAST). Nevertheless, we would like to mention here a functional prediction tool named AFAWE (15). AFAWE performs a functional prediction based on structure similarity, domain prediction, and phylogeny. The tool executes several BLAST and protein domain searches by calling InterProScan, BLAST against the SwissProt, UniProt, and RefSeq databases, and RPSBlast. InterProScan additionally uses the function InterPro2GO, which predicts GO terms for the submitted genes according to the protein domains that have been found. A phylogenomic pipeline is included that uses a program called “SIFTER” to compile information about relationships (phylogeny) of gene products and sequence similarity (genomic information) (16). In the end, AFAWE returns “molecular function” GO terms that have been predicted based on a combination of BLAST and phylogeny analysis. A phylogenetic tree for the query gene shows putative paralogous and orthologous genes. AFAWE facilitates a fast comparison between the abovementioned functional prediction tools by highlighting trustworthy results. Each user has the possibility to add a detailed manual annotation inside AFAWE, and this annotation is used to update the automatic functional annotation in public databases.
AFAWE is available at http://bioinfo.mpiz-koeln.mpg.de/afawe/.
4. Notes
1. The Affymetrix® Microarray Analysis Suite 5.0 (Mas 5.0) was used to normalize the data stored at NASC. Mas 5.0 performs trimmed mean calculation: the mean of all values in the array, except 2% of the upper and lower values, is calculated and used to scale the data so that the trimmed mean value is 100. The gcRMA (robust multiarray average) algorithm (17) uses a global background and takes the GC-content of the probe into account for correction. Expression values are calculated using a log2 scale.
2. Order samples according to tissue and developmental stage. A possible order (from root to flowers) is proposed by the “AtGenExpress Visualization Tool (AVT)” on the weigelworld website. Submit any AGI locus code and download a tab-delimited file containing the ordered slide numbers, tissue descriptions, and expression values. Another option is to cluster experiments with the help of clustering software. Clustering of experiments will group together tissues with a similar expression pattern for many genes.
3. Please note that not all genes of interest will be in the data set, as not all genes of the present annotation have probes on the ATH1 microarray.
4. There are two easy ways to generate a random expression set. The first is to use the “Random ID list generator” on the BAR web page (6) to generate random lists of AGI locus codes. Enter a name for your random list in the first column of a spreadsheet program and the number of AGI codes that the random list should contain in the next column. Copy the two cells and paste them in the query box of the “Random ID list generator.” After submitting the data, the tool prints a random list of AGI codes that can be copied back to a spreadsheet. This list can be used to extract expression data for the genes that are also present in the original set (all ChIP-chip targets) (see Subheading 3.1.3, “Table1” should be the expression set containing only the ChIP-chip data, and “Table2” the random gene list). A second approach to obtain a random gene list is to generate random numbers in a spreadsheet program. Open the expression set containing expression data for ChIP-chip targets in a spreadsheet program and add one column to the data set. This column will be filled with random numbers via the RANDBETWEEN command. In OpenOffice Calc, type “=RANDBETWEEN(1; number of genes in list)” in the first data line, and in MS Excel, “=RANDBETWEEN(1, number of genes in list)”. Note that in MS Excel, the “analysis toolpak” add-in has to be installed for this function. Random numbers are still formulas; therefore, the whole column has to be cut out and pasted back in via “paste special” without pasting formulas (in MS Excel, select “paste special” => “value”). After generating the random numbers, the complete document (without the header line) is sorted according to the random numbers. From the sorted spreadsheet, the desired number of genes can be selected and used to generate a data subset. We found that 400 genes generate a representative subset of 3,000 H3K27me3 target genes.
5. A higher k number results in several clusters that show similar expression patterns.
6. Administrator rights are needed to install Genesis. The program needs unrestricted write access to its own folder.
7. If the aim of clustering analysis is to find genes similar in both their expression patterns and their expression levels, absolute values should be used. The algorithm will use the expression values measured in the microarray experiment to calculate the distance between genes. This means that two genes that are expressed in the same tissues/conditions but at very different levels (e.g., one gene is expressed 100 times more strongly than the other) are located at a substantial distance and are not grouped together. If the aim is to find genes with similar expression patterns regardless of their expression level, different options should be used. Genesis software offers a function to divide the values of each gene by their rms-value (the root-mean-square, i.e., the square root of the mean of the squared values, which is independent of the algebraic sign). Division by rms generates similar expression values between genes but preserves differences in pattern. GEPAS employs correlation coefficients to cluster genes according to expression patterns regardless of their expression level. In this approach, the deviation from the mean divided by the standard deviation is used for the comparison, which leads to results similar to those of the normalization by rms.
8. Average linkage clustering, or the UPGMA algorithm in GEPAS, should be employed, since this algorithm uses the mean distance between all genes of two clusters as the distance between the clusters. Therefore, average linkage clustering considers the overall similarity for analysis. In contrast, complete linkage and single linkage algorithms use the distance between the farthest or the nearest elements of two clusters, respectively, as the distance. GEPAS documentation explains the different methods in more detail.
9. During each iteration step, a new position for the cluster centers is chosen. Usually, the program stops the calculation after several iterations at a point where no further improvement of the average distances is achieved. Fixing the maximum number of iterations in advance avoids an endless calculation loop in cases where no optimal average distance can be found. Usually, the default of 50 iterations is sufficient for the applications described in this chapter (for the complete developmental series data set, 33 iterations was the highest number we observed). The number of iterations is provided as “General Information” among the clustering results. If all 50 iterations were actually performed, the maximum number of iterations should be set to a higher value, because this result would indicate that the best solution for the cluster centers was not yet reached. A detailed description of the number of iterations can be found in the master’s thesis of Alexander Sturn (see Subheading 3.2.3, item 2).
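The k-means procedure of Subheading 3.2.2, together with Notes 7 and 9, can be sketched in plain Python. This is an illustration under stated assumptions, not the Genesis or GEPAS implementation: values are divided by the root-mean-square per gene, centers start at randomly chosen genes, Euclidean distance is used, the loop stops when no gene changes cluster or after `max_iterations`, and the cluster of interest is identified as the one whose center is highest in a chosen sample. All function names and the toy data are hypothetical.

```python
import math
import random

def divide_by_rms(values):
    """Divide one gene's expression values by their root-mean-square."""
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    return [v / rms for v in values]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, max_iterations=50, rng=random):
    """Return (gene -> cluster index, list of cluster centers)."""
    genes = sorted(profiles)
    centers = [list(profiles[g]) for g in rng.sample(genes, k)]  # random start
    assignment = {}
    for _ in range(max_iterations):
        new_assignment = {
            g: min(range(k), key=lambda c: squared_distance(profiles[g], centers[c]))
            for g in genes
        }
        if new_assignment == assignment:
            break  # no gene changed cluster: no further improvement
        assignment = new_assignment
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

def stable_cluster(profiles, k, sample_index, repeats=10, seed=0):
    """Genes assigned in every repeat to the cluster whose center has the
    highest value in the sample at `sample_index`."""
    rng = random.Random(seed)
    stable = None
    for _ in range(repeats):
        assignment, centers = kmeans(profiles, k, rng=rng)
        target = max(range(k), key=lambda c: centers[c][sample_index])
        members = {g for g, c in assignment.items() if c == target}
        stable = members if stable is None else stable & members
    return stable

# Hypothetical toy data with two samples and two expression patterns.
samples = ["leaf", "seed"]
raw = {
    "seedA": [1, 10], "seedB": [2, 19], "seedC": [10, 90],
    "leafA": [10, 1], "leafB": [19, 2], "leafC": [90, 10],
}
normalized = {gene: divide_by_rms(values) for gene, values in raw.items()}
seed_genes = stable_cluster(normalized, k=2, sample_index=samples.index("seed"))
print(sorted(seed_genes))
```

Taking the intersection over repeated runs keeps only genes that are assigned to the cluster of interest every time, which mirrors the repeat-and-keep-stable-genes strategy described for the example data set.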
Acknowledgments
We thank Drs. Seth Davis and Anika Jöcker for critical reading of the manuscript.
References
1. Clark JI, Brooksbank C, Lomax J (2005) It's all GO for plant scientists. Plant Physiol 138:1268–1279
2. Coulibaly I, Page GP (2008) Bioinformatic tools for inferring functional information from plant microarray data II: analysis beyond single gene. Int J Plant Genomics 2008:893941
3. Khatri P, Draghici S (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21:3587–3595
4. Rivals I, Personnaz L, Taing L, Potier MC (2007) Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics 23:401–407
5. Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M et al (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37:501–506
6. Toufighi K, Brady SM, Austin R, Ly E, Provart NJ (2005) The botany array resource: e-Northerns, expression angling, and promoter analyses. Plant J 43:153–163
7. Winter D, Vinegar B, Nahal H, Ammar R, Wilson GV, Provart NJ (2007) An "electronic fluorescent pictograph" browser for exploring and analyzing large-scale biological data sets. PLoS One 2:e718
8. Sturn A, Quackenbush J, Trajanoski Z (2002) Genesis: cluster analysis of microarray data. Bioinformatics 18:207–208
9. Montaner D, Tarraga J, Huerta-Cepas J, Burguet J, Vaquerizas JM, Conde L et al (2006) Next station in microarray data analysis: GEPAS. Nucleic Acids Res 34:W486–W491
10. de Hoon MJ, Imoto S, Nolan J, Miyano S (2004) Open source clustering software. Bioinformatics 20:1453–1454
11. Provart NJ, Zhu T (2003) A browser-based functional classification superviewer for Arabidopsis genomics. Curr Comput Mol Biol 2003:271–272
12. Al-Shahrour F, Minguez P, Tarraga J, Medina I, Alloza E, Montaner D et al (2007) FatiGO+: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Res 35:W91–W96
13. Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1168–1188
14. Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of the gene ontology annotations. Nat Rev Genet 9:509–515
15. Jocker A, Hoffmann F, Groscurth A, Schoof H (2008) Protein function prediction and annotation in an integrated environment powered by web services (AFAWE). Bioinformatics 24:2393–2394
16. Engelhardt BE, Jordan MI, Muratore KE, Brenner SE (2005) Protein molecular function prediction by Bayesian phylogenomics. PLoS Comput Biol 1:e45
17. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U et al (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4:249–264
18. Maere S, Heymans K, Kuiper M (2005) BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21:3448–3449
19. Zheng Q, Wang XJ (2008) GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res 36:W358–W363
20. Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Cherry JM et al (2004) GO::TermFinder – open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20:3710–3715
21. Zhou X, Su Z (2007) EasyGO: Gene Ontology-based annotation and functional enrichment analysis tool for agronomical species. BMC Genomics 8:246
22. Beissbarth T, Speed TP (2004) GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 20:1464–1465
23. Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Montano A (2007) GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists. Genome Biol 8:R3
Chapter 15
Chromatin Immunoprecipitation Protocol for Histone Modifications and Protein–DNA Binding Analyses in Arabidopsis
Stéphane Pien and Ueli Grossniklaus
Abstract
Epigenetic gene regulation via histone modifications controls processes ranging from embryonic development, vegetative development, floral induction, and floral organ development to pollen tube growth. The identification of an increasing number of epigenetically regulated processes was greatly advanced by genome-wide histone modification and chromatin–protein interaction surveys. However, genome-wide approaches are too global to assess in detail the large number of histone modifications taking place at a single locus. Here we provide a robust chromatin immunoprecipitation (ChIP) protocol that allows in vivo analyses of multiple chromatin modifications and of the binding of histone modifiers in different plant organs and tissues. This method is quantitative and provides a way to study the dynamic state of chromatin during plant development and in response to different environmental stimuli.
Key words: Arabidopsis, Chromatin, Histone modification, Epigenetic, ChIP, Protein–DNA interaction
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_15, © Springer Science + Business Media, LLC 2010
1. Introduction
Understanding the control of the dynamic state of chromatin is essential for unraveling the epigenetic processes involved in the regulation of the development of multicellular organisms. In eukaryotes, DNA is organized within chromatin, which consists of double-stranded DNA wrapped around nucleosomes. Nucleosomes are composed of histone proteins and contain two copies of each of the core histones H2A, H2B, H3, and H4. Histones are basic, globular proteins whose N-terminal tails protrude from the core nucleosome. Histone tails are often modified by chromatin-modifying enzymes, which can mediate methylation, acetylation, phosphorylation, citrullination, sumoylation, ubiquitinylation, and ADP-ribosylation. Specific combinations of these histone modifications result in the maintenance of an active or repressed state of transcription.
Several plant Polycomb group (PcG) and trithorax group (trxG) proteins have been shown to be chromatin-modifying enzymes or to be part of modifier complexes (1). Among the PcG and trxG proteins are SET domain proteins that control diverse plant developmental processes via their histone lysine methyltransferase activity. PcG complexes were shown to repress PcG target genes via the modification of N-terminal tails of histones, mainly at H3K27 and H3K9. Conversely, trxG complexes maintain gene expression through the deposition of H3K4 and H3K36 methyl marks on their targets.
Chromatin immunoprecipitation (ChIP) is the method of choice to monitor in vivo histone modifications and DNA–protein interactions (2). This method allows the detection of histone modifications and chromatin-binding proteins present at a specific locus. It is quantitative and provides a way to study the dynamic state of chromatin during plant development and in response to different environmental stimuli. This last point is important, since biotic and abiotic stresses can result in epigenetic modifications of the genome (reviewed in (3, 4)).
In this protocol, plant tissue is harvested and immediately subjected to a cross-linking step (Fig. 1). After in vivo cross-linking of the chromatin and associated proteins, e.g., chromatin-modifying enzymes, chromatin is extracted and sheared into fragments ranging from 300 bp to 1,000 bp. This sheared chromatin is semi-purified and incubated with commercially available antibodies that recognize and bind specific histone modifications, chromatin modifiers, or other associated proteins. The antibody–chromatin complexes are extracted using protein A agarose beads. In order to reduce nonspecific binding of the antibody to chromatin, the immunoprecipitated complexes are processed through several washing steps. The chromatin is released from the antibody and associated proteins through a de-crosslinking step. The resulting DNA is precipitated and used in PCR reactions with specific primers designed against the gene of interest. The amplified immunoprecipitated gene fragments are detected on an agarose gel and quantified, or analyzed by quantitative PCR. This last step shows whether histone modifications or binding of specific modifier proteins are associated with the investigated DNA fragments. This protocol facilitates the probing of multiple histone modifications or DNA–protein interactions in a single experiment, which allows the simultaneous monitoring of many epigenetic marks.
Fig. 1. A summary of experimental steps in the ChIP protocol. A schematic representation of cross-linking, shearing, immunoprecipitation, reverse cross-linking, and PCR analysis is shown.
2. Materials
2.1. Plant Growth Medium
1. Murashige and Skoog (MS) salt base (4.3 g/L) (Carolina Biological Supply Company) supplemented with sucrose (10 g/L) and adjusted to pH 5.6. Add Phytagar (9 g/L) (Sigma-Aldrich) and autoclave. Pour the MS medium in Petri dishes and store plates at 4°C.
2.2. Equipment for ChIP
1. Vacuum chamber.
2. Refrigerated centrifuge for 50-mL Falcon tubes.
3. Refrigerated centrifuge for Eppendorf 1.5- and 2-mL tubes.
4. Sonicator (Bioruptor UCD-200TM, Diagenode).
5. Vortexer.
6. Cold room (4°C).
7. Rotating wheel.
8. Heating block.
9. Thermal cycler/PCR machine.
10. Agarose electrophoresis equipment.
11. UV illuminator coupled to an image acquisition system.
2.3. ChIP Solutions
All solutions marked "prepare fresh and keep on ice" have to be prepared just prior to processing tissue samples or chromatin material.
1. Phosphate buffered saline (PBS): prepare 10× stock with 1.37 M NaCl, 27 mM KCl, 100 mM Na2HPO4.
2. Formaldehyde (37%) (Fluka). Caution: Formaldehyde is toxic through skin contact and inhalation of vapors. Manipulations involving formaldehyde should be done in a chemical fume hood. Gloves are required to handle it.
3. Stop cross-linking solution: 1× PBS, 0.125 M glycine.
4. Tris–HCl/pH 8.0: 1 M Tris, adjusted to pH 8.0 with concentrated HCl solution.
5. Tris–HCl/pH 6.5: 1 M Tris, adjusted to pH 6.5 with concentrated HCl solution.
6. Ethylene diamine tetraacetic acid (EDTA): 0.5 M Na2EDTA·2H2O, adjusted to pH 8.0 with NaOH solution.
7. Extraction buffer 1: 0.4 M sucrose, 10 mM Tris–HCl/pH 8.0, 10 mM MgCl2, 1 mM phenylmethanesulfonyl fluoride (PMSF). Caution: PMSF is extremely toxic to mucous membranes of the lung, eyes, and skin. Any contact by inhalation, swallowing, or contact with skin must be avoided. Face shield and gloves are required to handle it. Prepare fresh and keep on ice. Before use, dissolve 1 mini-tablet protease inhibitor (Roche) in 30 mL buffer.
8. Extraction buffer 2: 0.25 M sucrose, 10 mM Tris–HCl/pH 8.0, 10 mM MgCl2, 1% Triton X-100, 1 mM PMSF. Prepare fresh and keep on ice. Before use, dissolve half a mini-tablet protease inhibitor in 8 mL buffer.
9. Extraction buffer 3: 1.7 M sucrose, 10 mM Tris–HCl/pH 8.0, 2 mM MgCl2, 0.15% Triton X-100, 1 mM PMSF. Prepare fresh and keep on ice. Before use, dissolve half a mini-tablet protease inhibitor in 10 mL buffer.
10. Nuclei lysis buffer: 50 mM Tris–HCl/pH 8.0, 10 mM EDTA, 1% SDS, 1 mM PMSF. Prepare fresh and keep on ice. Before use, dissolve half a mini-tablet protease inhibitor in 5 mL buffer. Caution: SDS is extremely toxic to mucous membranes of the lung, eyes, and skin. Any contact by inhalation, swallowing, or contact with skin must be avoided. Face shield and gloves are required to handle it.
11. ChIP dilution buffer: 16.7 mM Tris–HCl/pH 8.0, 1.2 mM EDTA, 1.1% Triton X-100, 167 mM NaCl. Prepare fresh and keep on ice. Before use, dissolve half a mini-tablet protease inhibitor in 20 mL buffer.
12. Salmon sperm DNA/protein A agarose beads (Upstate). Protein A agarose beads should be washed three times in 1.5 mL prechilled ChIP dilution buffer before use (see Note 1).
13. Loading buffer: 10× stock: 50% glycerol; 100 mM EDTA; 0.1% bromophenol blue; 0.1% xylene cyanol FF. Caution: xylene cyanol FF causes respiratory tract, eye, and skin irritation and may be harmful if swallowed. Use in a chemical fume hood with gloves and face shield.
14. 1 Kb DNA ladder (Invitrogen).
15. Elution buffer: 1% SDS, 0.1 M NaHCO3. Prepare fresh and keep on ice.
16. Low salt wash buffer: 20 mM Tris–HCl/pH 8.0, 2 mM EDTA, 150 mM NaCl, 0.2% SDS, 1% Triton X-100. Prepare fresh and keep on ice.
17. High salt buffer: 20 mM Tris–HCl/pH 8.0, 2 mM EDTA, 500 mM NaCl, 0.2% SDS, 1% Triton X-100. Prepare fresh and keep on ice.
18. LiCl wash buffer: 20 mM Tris–HCl/pH 8.0, 2 mM EDTA, 150 mM NaCl, 0.2% SDS, 1% Triton X-100. Prepare fresh and keep on ice.
19. Tris–EDTA buffer (TE): 10 mM Tris–HCl/pH 8.0, 2 mM EDTA.
20. 5 M NaCl stock solution.
21. 3 M NaOAc: dissolve NaOAc in 80 mL of ddH2O, adjust to pH 5.2 with glacial acetic acid, and add ddH2O to 100 mL.
22. Ethanol (Fluka).
23. Glycogen (Roche Applied Science).
24. Phenol:chloroform:isoamyl alcohol 25:24:1 (Sigma-Aldrich). Caution: Phenol and chloroform are extremely toxic to mucous membranes of the lung, eyes, and skin. Any contact by inhalation, swallowing, or contact with skin must be avoided. Both have to be used in a chemical fume hood; face shield and gloves are required to handle them.
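Buffers that are prepared fresh from the stock solutions listed above require a small dilution calculation (C_stock × V_stock = C_final × V_final). The following is a minimal sketch for the nuclei lysis buffer; the function name is hypothetical, and the 10% SDS stock is an assumption, since the list above does not specify an SDS stock concentration. PMSF and the protease inhibitor are left out for the same reason.

```python
def stock_volume_ul(final_conc, stock_conc, final_volume_ml):
    """Volume of stock (in µL) needed so that C_stock * V_stock = C_final * V_final.

    final_conc and stock_conc must be given in the same unit (e.g., mM or %).
    """
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_conc / stock_conc * final_volume_ml * 1000.0

# Nuclei lysis buffer, 5 mL: 50 mM Tris-HCl/pH 8.0, 10 mM EDTA, 1% SDS
final_volume_ml = 5.0
tris_ul = stock_volume_ul(50, 1000, final_volume_ml)  # 1 M Tris-HCl/pH 8.0 stock
edta_ul = stock_volume_ul(10, 500, final_volume_ml)   # 0.5 M EDTA stock
sds_ul = stock_volume_ul(1, 10, final_volume_ml)      # ASSUMPTION: 10% SDS stock

# Volume left for ddH2O and the additives not covered here (PMSF, protease inhibitor)
remaining_ul = final_volume_ml * 1000.0 - (tris_ul + edta_ul + sds_ul)

for name, volume in [("1 M Tris-HCl", tris_ul), ("0.5 M EDTA", edta_ul),
                     ("10% SDS (assumed stock)", sds_ul), ("remaining volume", remaining_ul)]:
    print(f"{name}: {volume:.0f} µL")
```

The same function can be reused for the other buffers as long as both concentrations are expressed in the same unit.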
3. Methods
This protocol has been used for quantification of H3K4me2, H3K4me3, H3K27me2, and H3K27me3 histone modifications, as well as quantification of binding by the ARABIDOPSIS TRITHORAX 1 (ATX1) protein to its target DNA (1), but it is expected to be suitable for the quantification of other modifications as well. ACTIN2/7 is used as a control to normalize PCR products in the case of seedling tissue, while ACTIN11, which is specifically expressed in reproductive tissues, is used for flower tissue before fertilization and up to 4 days after pollination.
3.1. Preparation of Plant Material
1. Wild-type seeds are stratified on MS plates in the dark at 4°C for 2 days.
2. For seedling tissue: plates are transferred to a growth cabinet with daily cycles of 16 h light at 21°C and 8 h darkness at 18°C. After germination, plants are grown for 10 days and subsequently harvested without the roots.
3. For collection of flower tissue: seedlings are transferred to soil and grown for 3 weeks with daily cycles of 16 h light at 22°C and 8 h darkness at 18°C.
3.2. ChIP
1. Seedlings (500 mg) or flowers (1,000 flowers) are harvested and immediately transferred to a vial containing 20 mL prechilled 1× PBS buffer. Keep the vial on ice.
2. Cross-link by adding formaldehyde to the vial containing the harvested tissue in PBS buffer, to a final concentration of 1% (0.55 mL formaldehyde in 20 mL 1× PBS). Tissue should not be harvested directly into 1% formaldehyde in PBS, because prolonged contact with formaldehyde results in irreversible and unspecific cross-linking of the chromatin with proteins.
3. Transfer the vial into a vacuum chamber and vacuum infiltrate the tissue for 15 min. Tissue will sink to the bottom of the vial when fully infiltrated.
4. Replace the formaldehyde/PBS solution with ice-cold stop cross-linking solution. Vacuum infiltrate the tissue for 5 min.
5. Remove the solution, rinse the tissue twice with ice-cold water, and briefly blot dry the plant tissue on filter paper. At this stage, tissue can be frozen in liquid nitrogen and stored at −80°C. Caution: liquid nitrogen is extremely cold and may burn the skin; gloves and face shield are required to handle it.
6. Grind the tissue to a very fine powder using a prechilled mortar and pestle. It is very important that the tissue is thoroughly ground in order to isolate the nuclei. This step requires 5 min of grinding. Make sure that mortar and pestle are well prechilled using liquid nitrogen. Tissue should never thaw during the grinding step in order to preserve the integrity of the chromatin.
7. Add the tissue to a prechilled 50 mL Falcon tube containing 30 mL of extraction buffer 1.
8. Thoroughly suspend the tissue powder by inverting the tube, and filter the solution twice through a 50 µm nylon mesh (Milian) into a fresh prechilled Falcon tube.
9. Centrifuge the solution at 3,000 × g for 20 min at 4°C.
10. Gently remove the supernatant and resuspend the pellet by pipetting up and down in 1 mL of extraction buffer 2.
11. Transfer the solution to a prechilled 1.5 mL Eppendorf tube.
12. Centrifuge at 12,000 × g for 10 min at 4°C.
13. Discard the supernatant and resuspend the pellet in 300 µL of extraction buffer 3.
14. Add 300 µL of extraction buffer 3 to a prechilled, ice-cold Eppendorf tube. Using a pipette, carefully layer the resuspended pellet from step 13 on top of the 300 µL of extraction buffer 3.
15. Centrifuge at 12,000 × g for 1 h at 4°C.
16. In the interim, prepare the nuclei lysis buffer and ChIP dilution buffer.
17. Remove the supernatant and resuspend the chromatin pellet in 300 µL of nuclei lysis buffer by pipetting up and down and by short vortexing. Pipette 2 µL of this sample into an Eppendorf tube and store on ice; this will be used to check chromatin shearing (at step 22).
18. Sonicate the chromatin solution with a prechilled sonicator using the following program: 10 cycles of 30 s sonication and 30 s cooling, power setting: high (see Note 2, Fig. 2).
19. Centrifuge the chromatin sample at 16,000 × g for 10 min at 4°C and transfer the supernatant to a fresh ice-cold 1.5-mL Eppendorf tube. Repeat once. Pipette 2 µL of this sample into an Eppendorf tube and store on ice; this will be used to check chromatin shearing (at step 22).
20. Measure the remaining volume of sonicated chromatin and bring it to a final volume of 3 mL using the ChIP dilution buffer. Keep 2 µL of this sample as an input control and store at −20°C.
21. Split the sonicated chromatin into 3 Eppendorf tubes, each containing 40 µL of prewashed salmon sperm DNA/protein A agarose beads. Gently rotate the tubes overnight at 4°C.
Fig. 2. Optimization of chromatin sonication using a Bioruptor UCD-200TM sonicator. Before proceeding with a ChIP experiment, the optimal number of sonication cycles required to generate DNA fragments ranging from 300 bp to 1,000 bp has to be determined. A chromatin extract from step 17 of the protocol is subjected to an increasing number of sonication cycles. After every cycle, 2 µL of chromatin are removed and loaded on a 1% agarose gel. The quantity of DNA fragments of the required size increases with the number of sonication cycles, while the amount of fragments larger than 1,000 bp is reduced. After 10 cycles, the majority of the chromatin fragments range from 300 bp to 1,000 bp. Ld: 1 kb DNA ladder.
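Two of the volumes quoted in Subheading 3.2 (the formaldehyde addition in step 2 and the input sample in steps 20 and 21) can be checked with simple arithmetic. The sketch below is illustrative only; the function and variable names are hypothetical, and it assumes the 37% formaldehyde stock listed in Subheading 2.3 and an even split of the diluted chromatin over the three tubes.

```python
def final_percent(stock_percent, stock_ml, buffer_ml):
    """Final concentration (%) after adding stock_ml of stock to buffer_ml of buffer."""
    return stock_percent * stock_ml / (stock_ml + buffer_ml)

# Step 2: 0.55 mL of 37% formaldehyde added to 20 mL of 1x PBS
formaldehyde_percent = final_percent(37.0, 0.55, 20.0)

# Steps 20-21: 3 mL diluted chromatin, 2 µL kept as input, the rest split into 3 tubes
total_chromatin_ul = 3000.0
input_ul = 2.0
n_tubes = 3
per_tube_ul = (total_chromatin_ul - input_ul) / n_tubes  # ASSUMPTION: even split
input_fraction = input_ul / per_tube_ul                  # input relative to one IP tube

print(f"formaldehyde: {formaldehyde_percent:.2f}%")
print(f"chromatin per tube: {per_tube_ul:.0f} µL")
print(f"input relative to one immunoprecipitation: {input_fraction:.2%}")
```

Under these assumptions the formaldehyde concentration comes out just below 1%, and the 2 µL input sample corresponds to roughly 0.2% of the chromatin used in each immunoprecipitation, which is worth keeping in mind when input and immunoprecipitated band intensities are compared.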
22. Check shearing of the chromatin using the 2 µL samples from steps 17 (unsheared chromatin) and 19 (sheared chromatin). Prior to loading, add 7 µL ddH2O and 1 µL loading buffer to each sample. On a 1% agarose gel supplemented with ethidium bromide (0.4 µg/mL), load the 2 samples side by side with 2 µL of 1 Kb DNA ladder to estimate the fragment sizes of the sheared chromatin. Run the chromatin samples until the ladder is well resolved (see Note 3).
23. Centrifuge the chromatin/beads solution from step 21 at 12,000 × g for 2 min at 4°C.
24. Transfer the supernatant into a fresh 1.5-mL Eppendorf tube, leaving the agarose beads behind.
25. Add histone or protein antibody to two of the three samples (see Notes 4 and 5). The tube without antibody is the no-antibody control. Incubate overnight with gentle rotation at 4°C.
26. Collect immune complexes by transferring each sample from step 25 to a fresh 1.5-mL Eppendorf tube containing 50 µL of prewashed salmon sperm DNA/protein A agarose beads (wash the beads as previously described). Incubate for 1 h with gentle rotation at 4°C.
27. Pellet the beads by centrifuging at 4,500 × g for 30 s at 4°C. Pipette out and discard the supernatant, being careful not to remove the agarose beads. Wash the beads with 1 mL low salt buffer by gently rotating the tube for 1 min at 4°C. Repeat the wash with 1 mL low salt buffer for 5 min.
28. Repeat step 27 using high salt buffer.
29. Repeat step 27 using LiCl wash buffer.
30. Repeat step 27 using TE buffer.
31. Elute immune complexes by adding 250 µL of elution buffer to the pelleted beads. Resuspend the beads by briefly vortexing, and incubate the three tubes at 65°C for 15 min with gentle agitation.
32. Pellet the beads by centrifuging the tubes at 4,500 × g for 30 s.
33. Transfer the supernatant to a fresh 1.5-mL Eppendorf tube and repeat the elution in steps 31 and 32. Pool the two 250 µL eluate fractions.
34. Add 500 µL of elution buffer to the input sample from step 20.
35. Reverse the cross-links in the no-antibody control, the antibody samples, and the input sample by adding 20 µL of 5 M NaCl to each tube and incubating at 65°C for 6 h.
36. Treat the samples with proteinase K by adding 10 µL of 0.5 M EDTA, 20 µL of 1 M Tris–HCl/pH 6.5, and 2 µL of proteinase K (10 mg/mL), and incubate for 1 h at 45°C.
37. Recover the DNA by phenol/chloroform extraction, adding 550 µL phenol:chloroform:isoamyl alcohol 25:24:1 to the samples. Vortex the samples for 30 s and centrifuge at 16,000 × g for 5 min.
38. Transfer the aqueous phase to a fresh 2-mL Eppendorf tube and add 55 µL of 3 M NaOAc, 1.22 mL of 99% ethanol, and 2 µL of glycogen (20 mg/mL). Transfer the tubes to −80°C for 1 h or overnight.
39. Centrifuge the tubes at 16,000 × g for 30 min at 4°C. Discard the supernatant and wash the pellet twice by adding 600 µL of 70% ethanol and centrifuging for 5 min at 4°C.
40. Dissolve the DNA pellet in 50 µL TE buffer and store the eluted input chromatin, the no-antibody control, and the eluted immunoprecipitated chromatin at −20°C.
3.3. PCR Quantification
1. The primer pair used for PCR amplification of the DNA target should be designed to amplify a genomic region corresponding to the DNA fragment length of the sonicated chromatin, i.e., between 300 and 1,000 bp. Both primers should have a similar Tm and GC/AT content in order to obtain optimal primer–DNA annealing. Histone modifications and protein binding are locus-specific. Therefore, primer pairs should be designed to cover the entire gene of interest.
2. The PCR program and PCR reaction mixture have to be optimized for each primer pair used for amplification of immunoprecipitated chromatin. The efficiency of the primers used to amplify the chromatin region under investigation is tested with a time-point PCR reaction, in which equal volumes of PCR products are loaded on an agarose gel supplemented with ethidium bromide and quantified after a defined number of PCR cycles. PCR product signal intensities are measured using ImageQuant software (Molecular Dynamics). Each PCR product quantification value is plotted on a semi-log graph to determine the log-linear amplification stage of the PCR reaction, which determines the optimal number of PCR cycles needed for accurate PCR product quantification (Fig. 3).
3. Use 2 µL of eluted input chromatin (from Subheading 3.2, step 40), no-antibody control, or immunoprecipitated chromatin in a 12.5 µL PCR reaction mix containing 1.25 µL 10× buffer (Sigma, P2192-VL), 0.25 µL dNTPs (10 mM), 0.25 µL forward target gene primer (10 µM), 0.25 µL reverse target gene primer (10 µM), either 0.25 µL ACTIN2/7 forward primer (10 µM) 5′-CGTTTCGCTTTCCTTAGTGTTAGCT-3′ and 0.25 µL ACTIN2/7 reverse primer (10 µM) 5′-AGCGAACGGATCTAGAGACTCACCTTG-3′, or 0.25 µL ACTIN11 forward primer (10 µM) 5′-AACTTTCAACACTCCTGCCATG-3′ and 0.25 µL ACTIN11 reverse primer (10 µM) 5′-CTGCAAGGTCCAAACGCAGA-3′, x µL MgCl2 (25 mM) depending on the primer pair, 0.125 µL DNA Taq polymerase (Sigma, D6677), and ddH2O up to 12.5 µL. The PCR program has to be established empirically for each chromatin region investigated. The eluted input chromatin is used as a PCR control to confirm that the target DNA investigated by ChIP was present in the starting chromatin material (Fig. 4). If the target DNA is not amplified from the input, this could be due to chromatin degradation during extraction. The no-antibody control should not give any amplification of the target DNA. Amplification of the target DNA in the no-antibody sample means that washing steps 27–29 of Subheading 3.2 were not carried out correctly. With the immunoprecipitated chromatin used as DNA template for PCR, a band of the appropriate size should be detected on the agarose gel if the target DNA was recognized by the antibody used.
4. Quantification of PCR products is done by loading the 12.5 µL PCR product on a 1.5% agarose gel supplemented with ethidium bromide (0.4 µg/mL) (Fig. 4). PCR product signal intensities are measured using ImageQuant software (Molecular Dynamics). Signal intensities are normalized relative to ACTIN PCR products. Final quantification is the result of at least three independent ChIP experiments with the corresponding standard error.
Fig. 3. Primer optimization. For each pair of primers, a time-point PCR reaction is done. Equal volumes of PCR products are loaded on an agarose gel and quantified after a defined number of PCR cycles. PCR product quantification values are plotted on a semi-log graph to determine the log-linear amplification stage of the PCR reaction. The optimal number of PCR cycles needed for accurate PCR product quantification (indicated in green) is derived from the log-linear amplification area.
Fig. 4. An example of PCR products obtained from the ChIP assay using wild-type seedlings and an ATX1-specific antibody. ATX1 binding at the FLC locus is checked using FLC-specific primers. PCR product quantification is normalized using ACTIN2/7-specific primers.
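The normalization and averaging described in step 4 of Subheading 3.3 can be sketched as follows. This is not the ImageQuant procedure itself: the function names are hypothetical, the band intensities are made-up illustrative values, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of experiments.

```python
import math
import statistics

def normalized_signal(target_intensity, actin_intensity):
    """Target band intensity relative to the ACTIN band from the same sample."""
    if actin_intensity <= 0:
        raise ValueError("ACTIN intensity must be positive")
    return target_intensity / actin_intensity

def mean_and_sem(values):
    """Mean and standard error of the mean for independent experiments."""
    if len(values) < 2:
        raise ValueError("at least two experiments are needed for a standard error")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem

# Hypothetical (target, ACTIN) band intensities from three independent ChIP experiments
experiments = [(1200.0, 2400.0), (900.0, 2000.0), (1500.0, 2500.0)]

ratios = [normalized_signal(target, actin) for target, actin in experiments]
mean_ratio, sem_ratio = mean_and_sem(ratios)

print(f"normalized signals: {[round(r, 3) for r in ratios]}")
print(f"mean = {mean_ratio:.3f}, standard error = {sem_ratio:.3f}")
```

Normalizing each experiment before averaging keeps gel-to-gel differences in exposure from inflating the standard error.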
4. Notes
1. Protein A agarose beads should be washed three times before use, using 1.5 mL prechilled ChIP dilution buffer. Beads are resuspended in the buffer by inversion, kept on ice for 5 min, and gently spun at 4,500 × g for 30 s.
2. In our hands, the Bioruptor allows better reproducibility of chromatin shearing between independent experiments than a regular sonicator. Regular sonicators use a probe that is in direct contact with the biological sample. This is a major drawback in terms of shearing reproducibility, as the sonication energy depends on the depth of the sonication probe in the liquid. As a consequence, DNA fragment size varies from sonication to sonication, which has an impact on the quantification of histone modifications. The described sonication program produces DNA fragments ranging from 300 bp to 1,000 bp, which can be visualized by loading 2 µL of sonicated chromatin on an agarose gel.
3. The described sonication program generates DNA fragments ranging from 300 bp to 1,000 bp. If sonication did not work, fragments will be larger and will look similar to the nonsonicated chromatin control. In that case, the experiment should be stopped. Sonication should therefore be optimized (Fig. 2) prior to starting the ChIP experiments.
4. For H3K4me2 (Upstate), H3K4me3 (Upstate), H3K27me2 (Upstate), and H3K27me3 (Upstate) modifications, use 4 µL antibody. The amount of antibody to be used for ChIP depends on the supplier of the antibody; check the supplier's instructions. For ATX1–DNA binding, 5 µL anti-ATX1 antibody was used (not available commercially, (5)).
5. When investigating the dynamics of the deposition of repressive and activating marks in a gene of interest, our protocol provides enough chromatin to quantify antagonistic marks in parallel using the same tissue sample.
References
1. Pien S, Grossniklaus U (2007) Polycomb group and trithorax group proteins in Arabidopsis. Biochim Biophys Acta 1769:375–382
2. Orlando V, Paro R (1993) Mapping Polycomb-repressed domains in the BX-C using in vivo formaldehyde crosslinked chromatin. Cell 75:1187–1198
3. Boyko A, Kovalchuk I (2008) Epigenetic control of plant stress response. Environ Mol Mutagen 49:61–72
4. Madlung A, Comai L (2004) The effect of stress on genome regulation and structure. Ann Bot (Lond) 94:481–495
5. Pien S, Fleury D, Mylne JS, Crevillen P, Inzé D, Avramova Z et al (2008) ARABIDOPSIS TRITHORAX1 dynamically regulates FLOWERING LOCUS C activation via histone 3 lysine 4 trimethylation. Plant Cell 20:580–588
Chapter 16
cDNA Libraries for Virus-Induced Gene Silencing
Andrea T. Todd, Enwu Liu, and Jonathan E. Page
Abstract
Virus-induced gene silencing (VIGS) exploits endogenous plant antiviral defense mechanisms to posttranscriptionally silence the expression of targeted plant genes. VIGS is quick and relatively easy to perform and therefore serves as a powerful tool for high-throughput functional genomics in plants. Combined with the use of subtractive cDNA libraries for generating a collection of VIGS-ready cDNA inserts, VIGS can be utilized to screen a large number of genes to determine phenotypes resulting from the knockdown/knockout of gene function. Taking into account the optimal insert design for VIGS, we describe a methodology for producing VIGS-ready cDNA libraries enriched for inserts relevant to the biological process of interest.
Key words: Functional genomics, RNA silencing, Virus-induced gene silencing, VIGS, siRNAs, Plant viruses, Tobravirus, Tobacco rattle virus, Nicotiana benthamiana
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_16, © Springer Science + Business Media, LLC 2010
1. Introduction
Virus-induced gene silencing (VIGS) is a potent fast-forward genetics technique that allows for rapid, high-throughput functional analysis of plant genomes. Infection of a host plant with a recombinant virus containing a fragment of a plant gene takes advantage of innate plant antiviral defenses (1–3), resulting in posttranscriptional silencing of expression of the targeted plant gene (4–6). In effect, VIGS misleads the plant into identifying its own transcripts as viral RNA, targeting them for degradation and resulting in a knockout or knockdown phenotype for the gene of interest. VIGS allows gene knockdowns/knockouts to be studied without having to undergo lengthy plant transformations or risking potential lethality. It also permits the effects of gene expression knockdown to be detected for gene families if highly conserved regions of sequence are used for silencing.
VIGS has been used to study genes involved in plant metabolism (7–15), defense responses (12, 13, 16–26), and development (11, 14, 27, 28), including genes encoding enzymes, transcription factors, signal transduction components, or structural molecules. One of the most important applications of VIGS has been in forward genetic screens, where genes involved in various pathways are first identified through the VIGS phenotypes observed and then selected for further study (10, 11, 14, 21, 22, 26). We have combined the powerful VIGS forward genetics approach with sequenced VIGS inserts derived from subtractive cDNA libraries to optimize our ability to identify clones containing sequences of interest. Using this method, we have studied the regulation of secondary metabolism in Nicotiana benthamiana by identifying transcription factors that affect nicotine production (Fig. 1).
The most advantageous conditions for selecting VIGS inserts in the tobacco rattle virus (TRV)-based vector system (17, 27) were determined for N. benthamiana using the phytoene desaturase (PDS) gene. PDS is a well-known VIGS marker enzyme involved in carotenoid biosynthesis (7, 8, 10, 13, 15), and silencing of PDS expression results in white leaf tissue due to photobleaching. We found that for effective VIGS, inserts should be between 200 and 1,300 bp in length, be situated in the middle of the gene, and not contain homopolymeric regions (i.e., poly(A/T) tails) (29). These conditions have been incorporated into a cDNA library synthesis protocol to provide the most effective inserts for high-throughput VIGS screening.
This protocol involves the construction of a short-insert cDNA library using a solid-phase support and suppression subtractive hybridization (SSH). After ligation of the subtracted cDNAs directly into the TRV-based VIGS vector and passaging through Escherichia coli, the cDNA library is transformed into Agrobacterium tumefaciens. Individual Agrobacterium colonies serve both as templates for DNA sequencing and as a source of VIGS-ready Agrobacterium constructs.
This methodology will be useful for a number of functional genomics approaches in plants. One application could be the creation of an extensive collection of sequenced, VIGS-ready cDNA constructs to allow the dissection of a given biological process. For example, we have analyzed our VIGS-cDNA collection for transcription factors, but any class of gene could be similarly targeted. Another application would be in “fast-forward” genetic screens, where a population of plants is infected with VIGS-cDNA constructs, with a different gene being silenced in each individual. Since our method produces a subtracted cDNA library, a given biological process (e.g., pathogen infection) can be specifically targeted. In addition, each of our cDNAs has been sequenced, which allows a non-redundant subset of VIGS constructs to be selected for infiltration. This should significantly decrease the number of plants and VIGS infections needed to screen a cDNA collection.
Fig. 1. Nicotine levels in N. benthamiana plants infected with TRV constructs containing cDNA fragments of transcription factors. We analyzed 3,480 unique sequences obtained from our three VIGS-cDNA libraries and found that 108 cDNAs were annotated as transcription factors. To test if these genes regulate nicotine biosynthesis, we silenced all 108 transcription factors using VIGS. Leaf nicotine levels before and after induction of nicotine biosynthesis with methyl jasmonate were measured using HPLC. Buffer control plants were infiltrated with infiltration medium only; GFP control plants were infiltrated with a VIGS construct containing a non-functional 363 bp GFP insert. Putrescine N-methyltransferase (PMT), an enzyme involved in nicotine biosynthesis, was used as a positive control. Out of 108 transcription factors, silencing six led to altered nicotine levels. Four (TF1, TF3, TF4, and TF5) decreased leaf nicotine, and two (TF2 and TF6) increased it. Each bar represents the average level of nicotine found in six individual VIGS plants. Error bars represent standard deviation.
cDNA Libraries for Virus-Induced Gene Silencing
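The insert criteria stated in the Introduction (200–1,300 bp in length, no homopolymeric regions) can be expressed as a simple sequence filter. The following is a minimal sketch, not part of the published protocol: the function name and the 15-nucleotide homopolymer cutoff are assumptions made for illustration, since the text gives no numerical threshold. The third criterion (position in the middle of the gene) needs a gene model and is not checked here.

```python
import re

MIN_LEN = 200          # bp, lower bound given in the text
MAX_LEN = 1300         # bp, upper bound given in the text
MAX_HOMOPOLYMER = 15   # assumed cutoff; the text gives no number


def is_suitable_vigs_insert(seq, max_run=MAX_HOMOPOLYMER):
    """Return True if seq meets the length and homopolymer criteria."""
    seq = seq.upper()
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        return False
    # Reject runs of A or T that are at least max_run long (poly(A/T) tails).
    pattern = "A{%d,}|T{%d,}" % (max_run, max_run)
    return re.search(pattern, seq) is None


if __name__ == "__main__":
    # Hypothetical sequences, used only to exercise the filter.
    candidates = {
        "good_400bp": "ACGT" * 100,
        "too_short": "ACGT" * 10,
        "polyA_tail": "ACGT" * 100 + "A" * 20,
    }
    for name, seq in candidates.items():
        print(name, is_suitable_vigs_insert(seq))
```

A filter of this kind could be applied to the processed EST sequences before choosing which clones to infiltrate.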
2. Materials
2.1. cDNA Synthesis, RsaI Digestion, and Suppression Subtraction Hybridization
1. Dynabeads mRNA Purification kit (Invitrogen).
2. PCR-Select cDNA Subtraction kit (Clontech).
3. Magnetic separator stand for microcentrifuge tubes (e.g., Dynal MPC-S, Invitrogen).
4. Rotating shaker for microcentrifuge tubes (e.g., Labquake shaker, Thermo Scientific).
5. 0.5 M EDTA.
6. RsaI wash buffer 1: 5 mM Tris–HCl pH 7.5, 0.5 mM EDTA, 1 M NaCl, 1% SDS, and 10 µg/mL glycogen.
7. RsaI wash buffer 2: 5 mM Tris–HCl pH 7.5, 0.5 mM EDTA, 1 M NaCl, and 200 µg/mL BSA.
8. Phenol-chloroform-isoamyl alcohol (25:24:1). Caution: Phenol and chloroform are toxic substances and should be handled with care. Phenol is irritating and caustic; it can burn unprotected skin, is easily absorbed through the skin, and its vapors can also be damaging. Chloroform is an irritant as well and should not be inhaled or come into contact with skin. All manipulations with phenol and chloroform should be performed in a fume hood. Gloves, protective clothing, and protective eyewear should be used.
9. Chloroform-isoamyl alcohol (24:1). See above caution for working with chloroform.
10. 100% ethanol.
11. 6 M sodium acetate pH 4.5.
12. 10 mg/mL glycogen.
13. TE Buffer: 10 mM Tris–HCl pH 8.0, 1 mM EDTA.
14. High-Fidelity PCR Enzyme Mix (Fermentas).
15. Primers 5′-CGGGATCCTCGAGCGGCCGCCCGGGCAGGT-3′ (contains a BamHI site, GGATCC) and 5′-CGGAATTCAGCGTGGTCGCGGCCGAGGT-3′ (contains an EcoRI site, GAATTC).
2.2. Ligation of cDNAs to VIGS Vector
1. Restriction enzymes EcoRI and BamHI, along with associated reaction buffers (Invitrogen).
2. Tobacco rattle virus TRV2-based vector pYL156 (courtesy of Savithramma Dinesh-Kumar, Yale University, http://plantfunctionalgenomics.yale.edu/index.html).
3. QIAquick PCR Cleanup kit (Qiagen).
4. T4 DNA ligase (New England Biolabs).
2.3. Electroporation of E. coli and Amplification of VIGS-cDNA Library
1. DH10B E. coli electroporation competent cells (Invitrogen).
2. 1 mm electroporation cuvettes (VWR Scientific).
3. Gene Pulser with capacitance extender and pulse controller (Bio-Rad).
4. SOC medium: 2% tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, and 20 mM glucose.
5. LB agar: 1% tryptone, 0.5% yeast extract, 0.5% NaCl, and 1.5% agar with 50 µg/mL kanamycin in 100 mm and 150 mm sterile polystyrene Petri dishes.
6. Liquid LB medium: 1% tryptone, 0.5% yeast extract, and 0.5% NaCl.
7. Plasmid Midi kit (Qiagen).
2.4. Electroporation of A. tumefaciens
1. A. tumefaciens strain C58.
2. LB agar: 1% tryptone, 0.5% yeast extract, 0.5% NaCl, and 1.5% agar with 10 µg/mL rifampicin in 100 mm sterile polystyrene Petri dishes.
3. Liquid LB medium: 1% tryptone, 0.5% yeast extract, and 0.5% NaCl with 10 µg/mL rifampicin, 100 mL in a 500 mL flask.
4. Sterile 10% (v/v) glycerol.
5. 1 mm electroporation cuvettes (VWR Scientific).
6. Gene Pulser with capacitance extender and pulse controller (Bio-Rad).
7. SOC medium: 2% tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, and 20 mM glucose.
8. LB agar: 1% tryptone, 0.5% yeast extract, 0.5% NaCl, and 1.5% agar with 50 µg/mL kanamycin and 10 µg/mL rifampicin in 100 mm sterile polystyrene Petri dishes.
2.5. Sequencing of cDNA Inserts from VIGS-cDNA Library
1. 96-well cell culture plates.
2. Liquid LB medium: 1% tryptone, 0.5% yeast extract, and 0.5% NaCl with 50 µg/mL kanamycin and 10 µg/mL rifampicin, aliquoted into 96-well plates, 100 µL per well.
3. Sterile toothpicks or other implements for colony transfer.
4. PCR primers: pYL156for 5′-GTTACTCAAGGAAGCACGATGAG-3′ and pYL156rev 5′-CAGTCGAGAATGTCAATCTCGTAG-3′.
5. 10× PCR Buffer: 15 mM MgCl2, 500 mM KCl, 100 mM Tris–HCl pH 9.0, 1% Triton X-100.
6. Taq DNA polymerase.
7. dNTP mix, 10 mM each.
8. BigDye Terminator Cycle Sequencing kit (Applied Biosystems).
9. Sterile 50% glycerol.
2.6. Growth of N. benthamiana
1. N. benthamiana seed.
2. Germination mixture (on a volume basis): 1 part Sunshine LG3 Mix, Professional Germination (Sun Gro Horticulture Canada Limited, Vancouver, Canada): 1 part fine vermiculite (W.R. Grace and Co. of Canada Ltd., Ajax, Canada).
3. Soil mixture (on a volume basis): 60 parts Sunshine LG3 Mix, Professional Germination (Sun Gro Horticulture Canada Ltd): 20 parts fine vermiculite (W.R. Grace and Co. of Canada Ltd.): 10 parts Perlite (W.R. Grace and Co. of Canada Ltd.): 1 part Osmocote (14-14-14; ScottsMiracleGro, Marysville, USA).
4. 72-well sowing trays and square 10 × 10 cm plant pots.
5. A controlled environment chamber with 16 h/23°C days and 8 h/20°C nights at 150 µmol/m²/s light intensity.
2.7. Virus-Induced Gene Silencing
1. Tobacco rattle virus pTRV1 plasmid in A. tumefaciens (Dinesh-Kumar Laboratory, Yale University).
2. LB agar: 1% tryptone, 0.5% yeast extract, 0.5% NaCl, and 1.5% agar with 50 µg/mL kanamycin and 10 µg/mL rifampicin in 100 mm sterile polystyrene Petri dishes.
3. Liquid LB medium: 1% tryptone, 0.5% yeast extract, and 0.5% NaCl with 50 µg/mL kanamycin and 10 µg/mL rifampicin.
4. Infiltration medium: 1 mM 2-(N-morpholino)ethanesulfonic acid (MES; pH 5), 10 mM MgCl2, and 100 µM acetosyringone (3′,5′-dimethoxy-4′-hydroxyacetophenone, Sigma; from a 100 mM stock in dimethylformamide) in sterile water, filter sterilized and stored at 4°C.
5. 1 mL syringes.
3. Methods
The protocol involves binding of poly(A) RNA and cDNA synthesis on a solid-phase support (Dynabeads), followed by digestion with the four-base cutter RsaI to free short cDNA fragments from their bound poly(A/T) tails. Finally, suppression subtractive hybridization PCR is performed to enrich for differentially expressed transcripts. The short cDNAs so obtained are then directly ligated into the TRV VIGS vector and transformed into Agrobacterium. The TRV-cDNA plasmids serve as both a sequencing template (after PCR amplification of cDNA inserts) and a means of introducing the VIGS virus into the plant via agroinfiltration.
3.1. cDNA Synthesis, RsaI Digestion, and Suppression Subtraction Hybridization
This protocol is based on the manuals of the Dynabeads mRNA Purification kit and PCR-Select cDNA Subtraction kit with some modifications. We use the Dynabeads mRNA Purification kit not only to isolate mRNAs from total RNA but also to immobilize them on oligo(dT)25 beads for cDNA synthesis. The subsequent RsaI digestion liberates cDNA fragments of an appropriate size for VIGS while leaving the 3′ regions, including the homopolymeric tails, bound to the beads. Although we describe the synthesis and digestion of one cDNA, you will need to synthesize two cDNAs (which are referred to in the kit as tester and driver cDNAs; see Note 7) for the subsequent suppression subtraction hybridization step. Suppression subtraction enriches for cDNAs that are more abundant in the tester sample than in the driver sample.
1. In a 1.5 mL microcentrifuge tube, mix 250 µg of total RNA in less than 300 µL (see Note 1) with an equal volume of Dynabeads binding buffer.
2. Heat at 65°C for 5 min.
3. Place on ice for 1 min.
4. In another tube, aliquot 600 µL of well-mixed oligo(dT)25 Dynabeads slurry.
5. Place the tube containing Dynabeads on the magnetic stand and allow the solution to clear.
6. Remove and discard the supernatant.
7. Resuspend the Dynabeads in 300 µL Dynabeads binding buffer and repeat steps 5 and 6 twice.
8. Combine total RNA solution (from step 3) with the washed Dynabeads and mix slowly for 10 min on a rotating shaker.
9. Repeat steps 5 and 6.
10. Resuspend the beads in 200 µL Dynabeads washing buffer B and repeat steps 5 and 6 twice.
11. Resuspend the Dynabeads in 200 µL 1× PCR-Select first-strand buffer and repeat steps 5 and 6.
12. Resuspend the Dynabeads in 20 µL of PCR-Select first-strand synthesis mixture (4 µL 5× first-strand buffer, 2 µL dNTP mix, 12 µL water, and 2 µL (40 U) AMV reverse transcriptase).
13. Incubate at 42°C for 1.5 h.
14. Add 140 µL of PCR-Select second-strand synthesis mixture (96.8 µL water, 32 µL 5× second-strand buffer, 3.2 µL dNTP mix, and 8 µL 20× second-strand enzyme cocktail).
15. Incubate at 16°C for 2 h.
16. Add 4 µL (12 U) T4 DNA polymerase.
17. Incubate at 16°C for 30 min.
18. Stop the reaction by adding 20 µL 0.5 M EDTA.
19. Repeat steps 5 and 6.
20. Resuspend the Dynabeads in 500 µL of RsaI wash buffer 1.
21. Incubate at 75°C for 15 min.
22. Repeat steps 5 and 6.
23. Resuspend the Dynabeads in 500 µL RsaI wash buffer 2, and repeat steps 5 and 6 twice.
24. Resuspend the Dynabeads in 200 µL 1× RsaI restriction buffer, and repeat steps 5 and 6 twice.
25. Resuspend the Dynabeads in 87 µL water, 10 µL 10× RsaI buffer, and 3 µL (30 U) RsaI.
26. Incubate at 37°C overnight to digest the Dynabead-bound cDNAs (see Note 2).
27. Place the microcentrifuge tube containing Dynabeads on the magnet, and allow the solution to clear.
28. Transfer the supernatant (containing the digested cDNA) to a 1.5 mL microcentrifuge tube.
29. Repeat the RsaI digestion of the Dynabead-bound cDNAs by repeating steps 24 to 28.
30. Combine the supernatants containing digested cDNA from steps 28 and 29.
31. Add an equal volume (200 µL) of phenol-chloroform-isoamyl alcohol (25:24:1).
32. Mix well by vortexing.
33. Spin at maximum speed for 2 min.
34. Transfer the supernatant to a new 1.5 mL tube.
35. Add an equal volume (200 µL) of chloroform-isoamyl alcohol (24:1).
36. Mix well by vortexing.
37. Spin at maximum speed for 2 min.
38. Transfer the supernatant to a new 1.5 mL tube.
39. Analyze 10 µL of digested cDNA on a 1% agarose gel. The cDNA should show a smear of between 100 bp and 2.5 kb with the highest intensity at approximately 600 bp.
40. Precipitate the cDNA by adding two volumes of 100% ethanol, one tenth volume of 6 M sodium acetate (pH 4.5), and 5 µL of 10 mg/mL glycogen and incubating at −80°C for 30 min.
41. Spin at maximum speed for 10 min.
42. Remove the supernatant.
43. Wash the pellet with 800 µL of 70% ethanol.
44. Spin at maximum speed for 5 min.
45. Remove the supernatant, and air dry the pellet.
46. Resuspend the pellet in 10 µL TE buffer.
47. Suppression subtraction PCR is performed essentially as described in the manual for the PCR-Select cDNA Subtraction kit. We used 2 µL of the RsaI-digested cDNA from step 46 for this process. In order to facilitate cloning of the PCR products into the TRV vector pYL156, we performed the secondary PCR reaction using primers containing BamHI and EcoRI sites.
3.2. Ligation of cDNAs to VIGS Vector
In this step, the subtracted cDNAs are cloned into the VIGS vector using restriction digestion and ligation.
1. Digest the entire amount of PCR-Select Suppression Subtracted cDNA (about 150 ng) and 1.6 µg of pYL156 vector with EcoRI and BamHI according to the manufacturer’s protocol.
2. Purify digested cDNA and pYL156 using the QIAquick PCR Cleanup kit (Qiagen).
3. Quantify the cDNA and digested plasmid using spectrophotometry.
4. Mix 600 ng of cDNA and 1.2 µg of BamHI/EcoRI digested pYL156 vector with 2 U of T4 DNA ligase in 20 µL total volume.
5. Incubate at 16°C overnight.
6. Purify the ligation reaction using the QIAquick PCR Cleanup kit (Qiagen).
7. Elute ligated DNA in 20 µL sterile water.
3.3. Electroporation of E. coli and Amplification of VIGS-cDNA Library
The pYL156-cDNA constructs are transformed into E. coli for amplification of the primary cDNA library. Plasmids are then isolated from bacterial cells and quantified for introduction into Agrobacterium (see Subheading 3.4).
1. Chill 1 mm electroporation cuvettes on ice.
2. Thaw electrocompetent E. coli on ice.
3. Add 1 µL ligated plasmid DNA and 50 µL E. coli to each cuvette.
4. Electroporate with the following settings: 1.8 kV, 200 Ω, 960 µFD.
5. Immediately place the cuvette on ice.
6. Add 1 mL room temperature SOC and transfer to a 1.5 mL microcentrifuge tube.
7. Incubate at 37°C for at least 30 min with shaking.
8. To determine the titer of the primary cDNA library (in colony forming units (cfu)), plate a serial dilution of an aliquot of the transformed cells on 100 mm LB + kanamycin agar plates. Incubate at 37°C overnight, and count the number of colonies (see Note 3).
9. Plate out the entire transformation on 150 mm plates at an appropriate density to obtain single colonies as determined from the calculated titer (step 8), and grow overnight at 37°C.
10. Wash the colonies of transformed E. coli from the agar surfaces of all the 150 mm plates using liquid LB medium, and isolate plasmid DNA from the combined bacterial culture with the Plasmid Midi kit (Qiagen).
11. Quantify the amount of plasmid DNA spectrophotometrically.
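The titer calculation in step 8 can be sketched as follows. This is an illustrative calculation under stated assumptions rather than part of the published protocol: the function name, the colony count, the dilution, the plated volume, and the 1,000 µL recovery volume are all hypothetical values chosen for the example.

```python
def library_titer_cfu(colonies, dilution_fold, plated_ul, total_ul):
    """Estimate the total colony forming units in a transformation.

    colonies      -- colonies counted on the dilution plate
    dilution_fold -- fold dilution of the plated aliquot (e.g., 1000 for 1:1,000)
    plated_ul     -- volume of diluted cells spread on the plate (µL)
    total_ul      -- total volume of the undiluted transformation (µL)
    """
    if colonies < 0 or dilution_fold <= 0 or plated_ul <= 0 or total_ul <= 0:
        raise ValueError("inputs must be positive (colonies may be zero)")
    cfu_per_ul = colonies * dilution_fold / plated_ul
    return cfu_per_ul * total_ul


if __name__ == "__main__":
    # Hypothetical plate count: 95 colonies from 100 µL of a 1:1,000 dilution,
    # with roughly 1,000 µL of recovered cells in total.
    titer = library_titer_cfu(colonies=95, dilution_fold=1000,
                              plated_ul=100, total_ul=1000)
    print(f"Estimated primary library titer: {titer:.2e} cfu")
```

The resulting estimate can then be used to decide how many 150 mm plates are needed in step 9 to keep colonies well separated.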
3.4. A. tumefaciens Electroporation
In this section, the pYL156-cDNA constructs are transformed into A. tumefaciens to be used as DNA sequencing templates and for VIGS infiltrations.
1. Chill 1 mm electroporation cuvettes on ice.
2. Thaw electrocompetent A. tumefaciens on ice (see Note 4).
3. Add 100 ng plasmid DNA and 50 µL A. tumefaciens to each cuvette.
4. Electroporate with the following settings: 1.8 kV, 200 Ω, 960 µFD.
5. Immediately place the cuvette on ice.
6. Add 1 mL room temperature SOC, and transfer to a 1.5 mL microcentrifuge tube.
7. Incubate at 27°C for at least 30 min with shaking.
8. Plate cells on LB + rifampicin + kanamycin plates to obtain single colonies.
9. Incubate at 27°C for 48 h.
3.5. Sequencing of cDNA Inserts from VIGS-cDNA Library
This step allows individual A. tumefaciens colonies, each of which should contain a single cDNA-VIGS construct, to be picked and grown in 96-well plates. The cDNA insert of each construct is then sequenced using an aliquot of the culture as a sequencing template. Bioinformatic analysis of the EST sequences obtained is crucial, since the similarity of the insert sequence to known sequences will determine which constructs to test using VIGS (see Notes 6 and 7).
1. Transfer individual A. tumefaciens colonies to the wells of a 96-well plate containing 100 µL of liquid LB medium + kanamycin + rifampicin. Pick an appropriate number of colonies for EST sequencing (see Note 5).
2. Incubate at 27°C for 48 h.
3. Aliquot 2 µL of each A. tumefaciens culture to a corresponding well in a new 96-well PCR plate.
4. Amplify cDNA inserts for sequencing using a 30 µL final PCR reaction volume and including 1.5 U Taq polymerase, 0.2 µM primers pYL156for and pYL156rev (designed to anneal to the pYL156 vector at positions flanking the cDNA insert), 1× PCR Buffer, and 200 µM dNTPs. Thermal cycle at 95°C for 2 min, followed by 35 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 90 s.
5. Add 30 µL of water to dilute the PCR reaction products and sequence 2 µL directly using BigDye terminator sequencing chemistry (Applied Biosystems) and pYL156for as a sequencing primer.
6. Use bioinformatics software to process the raw sequence data (removing vector and low-quality sequence). Identify sequence similarity between cDNAs and known gene sequences using blastx (see Note 6).
7. To the A. tumefaciens cultures, add an equal volume of 50% glycerol (to a final concentration of 25%) and store at −80°C. This is the VIGS-cDNA library from which clones of interest can be obtained later.
3.6. Growth of N. benthamiana
1. Sow N. benthamiana seed into a 72-cell tray in germination mixture (see Note 8).
2. Place in a controlled growth environment.
3. Seven days after sowing, seedlings should be thinned to an average of 2–6 plants per cell.
4. Fourteen days after sowing, transplant individual seedlings into 10 cm pots containing moist soil mixture.
5. Plants are ready for VIGS infiltration at 22 days.
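The growth timeline above (thinning at 7 days, transplanting at 14 days, infiltration at 22 days after sowing) can be turned into calendar dates with a few lines of code. This is only a planning sketch; the function name, the task labels, and the example sowing date are assumptions made for illustration.

```python
from datetime import date, timedelta

# Days after sowing for each milestone, taken from the steps above.
GROWTH_MILESTONES = {
    "thin seedlings": 7,
    "transplant to 10 cm pots": 14,
    "infiltrate (VIGS)": 22,
}


def growth_schedule(sowing_date):
    """Return a dict mapping each milestone to its calendar date."""
    return {task: sowing_date + timedelta(days=days)
            for task, days in GROWTH_MILESTONES.items()}


if __name__ == "__main__":
    # Hypothetical sowing date, used only as an example.
    for task, when in growth_schedule(date(2024, 1, 1)).items():
        print(f"{when.isoformat()}  {task}")
```

Working backwards from the infiltration date in the same way helps to time the A. tumefaciens cultures described in Subheading 3.7.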
3.7. Virus-Induced Gene Silencing
This protocol is based on the methods described by Dinesh-Kumar et al. (2003) (30), with some modifications. Infection of plants with TRV is accomplished via agroinfiltration, in which a mixture of A. tumefaciens cultures containing plasmid-encoded TRV1 and TRV2 (see Note 10) is infiltrated into the leaves of N. benthamiana plants using a syringe. Transcription of these plasmids generates infectious viral RNAs and results in TRV spreading systemically through the plant. In this section, the A. tumefaciens cultures bearing plasmids containing TRV1 and TRV2 (pYL156 containing cDNA fragments of interest) are grown and used to infiltrate 22-day-old N. benthamiana plants.
1. Grow clones of interest from the A. tumefaciens library on LB + kanamycin + rifampicin agar plates at 27°C for 2 days. Also grow fresh colonies of A. tumefaciens carrying TRV1 and any desired controls (see Notes 9 and 10).
2. Inoculate 3 mL liquid LB + kanamycin + rifampicin cultures for each pYL156 construct and TRV1, each from a single A. tumefaciens colony.
3. Incubate at 27°C with shaking at 240 rpm overnight.
4. Inoculate 20 mL liquid LB + kanamycin + rifampicin cultures for each construct and TRV1 with 10 µL of the previous 3 mL overnight culture (see Note 10).
5. Incubate at 27°C with shaking at 240 rpm overnight to an OD600 of not more than 1, which usually takes ~16 h (see Note 11).
6. Centrifuge cultures at 3,000 × g for 20 min.
7. Discard the supernatant.
8. Invert tubes and allow pellets to drain.
9. Add 5 mL infiltration buffer to each pellet.
10. Let stand for 5 min, then vortex to resuspend the pellet.
11. Dilute 10 µL of the resuspended culture in 990 µL sterile water.
12. Measure the OD600 of the dilution.
13. Calculate the volume of infiltration buffer needed to produce a 5 mL solution with an OD600 of 1. More than 5 mL final volume of TRV1 may be needed if more than eight pYL156 constructs are being infiltrated.
14. Mix the appropriate amounts of each culture and infiltration buffer as determined in step 13.
15. Incubate the infiltration solutions at room temperature for at least 2 h.
16. Mix 600 µL of the TRV1 solution with 600 µL of each pYL156 construct solution in a 1.5 mL microcentrifuge tube.
17. Place 1.2 mL of infiltration buffer in a microcentrifuge tube to be used as a mock infiltration control.
18. Use the mixed A. tumefaciens solution to infiltrate the undersides of 22-day-old N. benthamiana leaves: fill a 1 mL syringe with each mixture (remove bubbles), and then press a finger against the upper surface of smaller leaves while injecting the fluid from the syringe into the underside of the leaves. The solution can be observed to penetrate the leaves (see Note 12).
19. Symptoms of viral infection, including curling of leaf margins, are visible 6–10 days after infiltration.
20. VIGS phenotypes can be observed or measured 19 days or more after infiltration.
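The dilution arithmetic in steps 11 to 13 can be written out as a short calculation. The sketch below assumes that OD600 scales linearly with cell density over the measured range and that the 1:100 dilution from step 11 is used; the function name and the example reading of 0.04 are illustrative, not values from the protocol.

```python
def infiltration_mix(measured_od, dilution_fold=100, target_od=1.0, final_ml=5.0):
    """Return (culture_ml, buffer_ml) needed to reach target_od in final_ml.

    measured_od   -- OD600 read from the diluted sample (step 12)
    dilution_fold -- 100 for 10 µL of culture in 990 µL of water (step 11)
    """
    stock_od = measured_od * dilution_fold   # OD600 of the undiluted suspension
    if stock_od < target_od:
        raise ValueError("suspension is already below the target OD600")
    culture_ml = final_ml * target_od / stock_od
    buffer_ml = final_ml - culture_ml
    return culture_ml, buffer_ml


if __name__ == "__main__":
    # Hypothetical reading of 0.04 for the 1:100 dilution.
    culture, buffer = infiltration_mix(0.04)
    print(f"Mix {culture:.2f} mL suspension with {buffer:.2f} mL infiltration buffer")
```

If the suspension is already below the target, the function raises an error rather than returning a negative buffer volume; in that case the culture would need to be pelleted and resuspended in less buffer.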
4. Notes
1. We typically isolate total RNA from plants using an RNeasy Plant Mini kit (Qiagen) including an on-column digestion with DNase I (RNase-Free DNase Set, Qiagen). It is important to select tissues, treatments and/or growth conditions in which genes of interest will show differential regulation in order to enrich for cDNAs that will silence genes relevant to the target process. For instance, to enrich for genes expressed in the presence of methyl jasmonate, we isolated RNA from plants treated with or without methyl jasmonate, synthesized cDNA, and performed the appropriate subtractions to enrich for differentially expressed cDNAs (see Note 7).
2. RsaI digestion of the Dynabead-bound cDNAs allows us to isolate cDNAs with the optimum characteristics for successful VIGS. As described in the Introduction, we wished to remove homopolymeric sequences; this was achieved by leaving the poly(A/T) tails and proximal 3′ regions bound to the Dynabeads. Digestion with RsaI, which recognizes a four base-pair sequence, permitted us to obtain insert sizes in the optimum range (see Note 7).
3. A primary library titer of around 1 × 10⁶ cfu was typically obtained.
4. We used previously generated electrocompetent A. tumefaciens that had been prepared using a typical protocol (31). Briefly, after overnight growth of A. tumefaciens on 150 mm LB agar plates at 27°C, a single colony was used to inoculate 5 mL of liquid LB medium and was grown overnight at 27°C with shaking at 240 rpm. This overnight culture was used to inoculate 100 mL liquid LB medium, and this culture was grown at 27°C with shaking at 240 rpm until an OD600 of 0.5–0.6 was reached. The culture was chilled in an ice bath and transferred to prechilled centrifuge bottles. The cells were pelleted by spinning at 4,000 rpm for 20 min at 4°C, and the supernatant was discarded. The cells were washed twice with ice-cold sterile water and resuspended in 40 mL ice-cold 10% glycerol. The cells were pelleted by centrifugation at 4,000 rpm for 10 min at 4°C, the supernatant was discarded, and 1 mL of 10% glycerol was used to resuspend the cells. Aliquots of 100 µL of cells were placed in prechilled 1.5 mL microcentrifuge tubes, frozen in liquid nitrogen, and stored at −80°C.
5. We picked 5,856 colonies into sixty-one 96-well plates to serve as sequencing templates. From these, we obtained 3,480 unique sequences (606 clusters of overlapping clones and 2,874 singleton clones). The number of cDNAs selected will depend on the biological process under investigation.
6. We have used FIESTA, an integrated EST processing and analysis system developed at the NRC Plant Biotechnology Institute, for analysis of our sequences. Unfortunately, FIESTA is a Unix-based program that requires extensive computing expertise and infrastructure to run. There is a lack of publicly available systems for processing and analyzing ESTs.
One program we are aware of is SeqTools (http://www.seqtools.dk/), which runs on Windows and can be downloaded as a demo and licensed for a reasonable fee. The SeqTools website lists some other software packages for sequence analysis, though none is well suited for ESTs.
7. As mentioned in Note 5, FIESTA analysis determined that we had obtained 3,480 unique sequences from 5,856 individual colonies. Three cDNA libraries were constructed using suppression subtractive hybridization PCR, which subtracts the “driver” cDNA from the “tester” cDNA. The first was made using cDNA from methyl jasmonate-treated N. benthamiana roots as tester and untreated root cDNA as driver (i.e., untreated root cDNA subtracted from treated root cDNA). The second cDNA library was constructed using cDNA from methyl jasmonate-treated roots as tester and untreated leaf cDNA as driver (i.e., untreated leaf cDNA subtracted from treated root cDNA). The third was a normalized library designed to remove highly abundant transcripts: it was made using cDNA from methyl jasmonate-treated N. benthamiana roots as both tester and driver. The sequences obtained from the three libraries were pooled before bioinformatic analysis was performed. The cDNA insert sizes ranged from 100 to 1,200 bp, with an average high-quality sequence read of 281 bp. We have found some evidence of chimeric clones in the library, possibly created by ligation of multiple short cDNAs during the cloning step. Clones of interest should therefore be individually analyzed to ensure correct identity and annotation of sequence homology.
8. N. benthamiana seems to be sensitive to overwatering, which makes the plants more susceptible to fungal infection. Use a soil medium that has Perlite and vermiculite added to improve drainage. Do not let the plants sit in standing water for extended (e.g., overnight) periods of time, and let the soil dry between waterings.
9. For our VIGS experiments, we use a number of controls, some of which should be grown at the same time as the experimental cultures. An A. tumefaciens culture carrying “empty” pYL156 can be used as an infiltration control, although plants infiltrated with “empty” pYL156 typically show significantly stronger symptoms of viral infection than pYL156-cDNA infiltrated plants. As well, we have created a pYL156-GFP control construct carrying a 363 bp fragment from the mGFP5 gene.
The presence of the non-functional GFP fragment, however, reduces infectious symptoms seen after infiltration to levels similar to those seen in pYL156-cDNA infiltrated plants and thus serves as a negative control for the effect(s) of VIGS on the process of interest. We have also used pYL156 containing an N. benthamiana PDS fragment as a visual control for successful infiltration and spreading of the virus: PDS infiltrated plants develop white leaf tissue after infiltration and viral spreading.
10. The tobacco rattle virus has a bipartite genome consisting of a ~6.8 kb RNA1 and an RNA2 of variable length (32). For infiltration, we grow and combine A. tumefaciens cultures carrying a plasmid encoding the TRV1 genome with those carrying the modified TRV2 genome in the pYL156 vector. Multiple 20 mL tubes of TRV1 culture may be required if large numbers of pYL156 constructs are to be tested.
To ensure adequate amounts of TRV1 culture, we grow one 20 mL tube of A. tumefaciens carrying pTRV1 for every 15 pYL156 constructs to be tested.
11. TRV1 A. tumefaciens cultures tend to grow more slowly than cultures containing the pYL156-cDNA constructs. We typically measure the OD of a VIGS construct culture rather than that of the TRV1 culture to decide when to finish the growth phase.
12. We usually infiltrate four to eight plants as biological replicates for each construct and control.
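The scaling rules given in the Notes and in Subheading 3.7 (one 20 mL TRV1 culture per 15 pYL156 constructs, 600 µL of TRV1 solution per construct, and four to eight plants per construct) can be combined into a small planning helper. This is a sketch under stated assumptions: the function name is hypothetical, and the default of six replicate plants is simply a value within the stated range.

```python
import math

CONSTRUCTS_PER_TRV1_TUBE = 15   # Note 10: one 20 mL TRV1 culture per 15 constructs
TRV1_UL_PER_CONSTRUCT = 600     # Subheading 3.7, step 16


def plan_infiltration(n_constructs, replicates=6):
    """Estimate TRV1 cultures, TRV1 solution volume, and plants required.

    n_constructs -- number of pYL156 constructs, including pYL156 controls
    replicates   -- plants per construct (assumed; Note 12 gives four to eight)
    """
    if n_constructs < 1 or replicates < 1:
        raise ValueError("n_constructs and replicates must be at least 1")
    return {
        "trv1_tubes": math.ceil(n_constructs / CONSTRUCTS_PER_TRV1_TUBE),
        "trv1_solution_ml": n_constructs * TRV1_UL_PER_CONSTRUCT / 1000,
        "plants": n_constructs * replicates,
    }


if __name__ == "__main__":
    # Hypothetical experiment with 16 constructs.
    print(plan_infiltration(16))
```

The volume estimate agrees with step 13 of Subheading 3.7: eight constructs need 4.8 mL of TRV1 solution, so a ninth construct pushes the requirement past 5 mL.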
Acknowledgments
We are grateful to S. Dinesh-Kumar (Yale University) for generously providing the TRV vectors. We also thank the DNA Service Unit at NRC-PBI for EST sequencing, Jacek Nowak and Kannan Vijayan for bioinformatic analysis, and Sandra Polvi for assistance in plant cultivation. This is manuscript NRCC #50134.
References
1. Vance V, Vaucheret H (2001) RNA silencing in plants–defense and counterdefense. Science 292:2277–2280
2. Voinnet O (2001) RNA silencing as a plant immune system against viruses. Trends Genet 17:449–459
3. Waterhouse PM, Wang MB, Lough T (2001) Gene silencing as an adaptive defence against viruses. Nature 411:834–842
4. Baulcombe DC (1999) Fast forward genetics based on virus-induced gene silencing. Curr Opin Plant Biol 2:109–113
5. Burch-Smith TM, Anderson JC, Martin GB, Dinesh-Kumar SP (2004) Applications and advantages of virus-induced gene silencing for gene function studies in plants. Plant J 39:734–746
6. Godge MR, Purkayastha A, Dasgupta I, Kumar PP (2008) Virus-induced gene silencing for functional analysis of selected genes. Plant Cell Rep 27:209–219
7. Kumagai MH, Donson J, della-Cioppa G, Harvey D, Hanley K, Grill LK (1995) Cytoplasmic inhibition of carotenoid biosynthesis with virus-derived RNA. Proc Natl Acad Sci USA 92:1679–1683
8. Ruiz MT, Voinnet O, Baulcombe DC (1998) Initiation and maintenance of virus-induced gene silencing. Plant Cell 10:937–946
9. Kjemtrup S, Sampson KS, Peele CG, Nguyen LV, Conkling MA, Thompson WF et al (1998) Gene silencing from plant DNA carried by a geminivirus. Plant J 14:91–100
10. Liu Y, Schiff M, Dinesh-Kumar SP (2002) Virus-induced gene silencing in tomato. Plant J 31:777–786
11. Fitzmaurice WP, Holzberg S, Lindbo JA, Padgett HS, Palmer KE, Wolfe GM et al (2002) Epigenetic modification of plants with systemic RNA viruses. Omics 6:137–151
12. Saedler R, Baldwin IT (2004) Virus-induced gene silencing of jasmonate-induced direct defences, nicotine and trypsin proteinase-inhibitors in Nicotiana attenuata. J Exp Bot 55:151–157
13. Brigneti G, Martin-Hernandez AM, Jin H, Chen J, Baulcombe DC, Baker B et al (2004) Virus-induced gene silencing in Solanum species. Plant J 39:264–272
14. Dong Y, Burch-Smith TM, Liu Y, Mamillapalli P, Dinesh-Kumar SP (2007) A ligation-independent cloning tobacco rattle virus vector for high-throughput virus-induced gene silencing identifies roles for NbMADS4–1 and -2 in floral development. Plant Physiol 145:1161–1170
15. Senthil-Kumar M, Hema R, Anand A, Kang L, Udayakumar M, Mysore KS (2007) A systematic study to determine the extent of gene silencing in Nicotiana benthamiana and other Solanaceae species when heterologous gene sequences are used for virus-induced gene silencing. New Phytol 176:782–791
16. Romeis T, Ludwig AA, Martin R, Jones JD (2001) Calcium-dependent protein kinases play an essential role in a plant defence response. EMBO J 20:5556–5567
17. Liu Y, Schiff M, Marathe R, Dinesh-Kumar SP (2002) Tobacco Rar1, EDS1 and NPR1/NIM1 like genes are required for N-mediated resistance to tobacco mosaic virus. Plant J 30:415–429
18. Peart JR, Cook G, Feys BJ, Parker JE, Baulcombe DC (2002) An EDS1 orthologue is required for N-mediated resistance against tobacco mosaic virus. Plant J 29:569–579
19. Peart JR, Lu R, Sadanandom A, Malcuit I, Moffett P, Brice DC et al (2002) Ubiquitin ligase-associated protein SGT1 is required for host and nonhost disease resistance in plants. Proc Natl Acad Sci USA 99:10865–10869
20. Liu Y, Schiff M, Serino G, Deng XW, Dinesh-Kumar SP (2002) Role of SCF ubiquitin-ligase and the COP9 signalosome in the N gene-mediated resistance response to tobacco mosaic virus. Plant Cell 14:1483–1496
21. Lu R, Malcuit I, Moffett P, Ruiz MT, Peart J, Wu AJ et al (2003) High throughput virus-induced gene silencing implicates heat shock protein 90 in plant disease resistance. EMBO J 22:5690–5699
22. Peart JR, Mestre P, Lu R, Malcuit I, Baulcombe DC (2005) NRG1, a CC-NB-LRR protein, together with N, a TIR-NB-LRR protein, mediates resistance against tobacco mosaic virus. Curr Biol 15:968–973
23. Borras-Hidalgo O, Thomma BP, Collazo C, Chacon O, Borroto CJ, Ayra C et al (2006) EIL2 transcription factor and glutathione synthetase are required for defense of tobacco against tobacco blue mold. Mol Plant Microbe Interact 19:399–406
24. Gabriels SH, Takken FL, Vossen JH, de Jong CF, Liu Q, Turk SC et al (2006) cDNA-AFLP combined with functional analysis reveals novel genes involved in the hypersensitive response. Mol Plant Microbe Interact 19:567–576
25. Kim KJ, Lim JH, Lee S, Kim YJ, Choi SB, Lee MK et al (2007) Functional study of Capsicum annuum fatty acid desaturase 1 cDNA clone induced by Tobacco mosaic virus via microarray and virus-induced gene silencing. Biochem Biophys Res Commun 362:554–561
26. Anand A, Vaghchhipawala Z, Ryu CM, Kang L, Wang K, del-Pozo O et al (2007) Identification and characterization of plant genes involved in Agrobacterium-mediated plant transformation by virus-induced gene silencing. Mol Plant Microbe Interact 20:41–52
27. Ratcliff F, Martin-Hernandez AM, Baulcombe DC (2001) Technical Advance. Tobacco rattle virus as a vector for analysis of gene function by silencing. Plant J 25:237–245
28. Liu Y, Nakayama N, Schiff M, Litt A, Irish VF, Dinesh-Kumar SP (2004) Virus induced gene silencing of a DEFICIENS ortholog in Nicotiana benthamiana. Plant Mol Biol 54:701–711
29. Liu E, Page JE (2008) Optimized cDNA libraries for virus-induced gene silencing (VIGS) using tobacco rattle virus. Plant Methods 4:5
30. Dinesh-Kumar SP, Anandalakshmi R, Marathe R, Schiff M, Liu Y (2003) Virus-induced gene silencing. Methods Mol Biol 236:287–294
31. Ausubel FM (2002) Short protocols in molecular biology: a compendium of methods from Current protocols in molecular biology, 5th edn. Wiley, New York
32. MacFarlane SA (1999) Molecular biology of the tobraviruses. J Gen Virol 80:2799–2807
Chapter 17

Detection and Quantification of DNA Strand Breaks Using the ROPS (Random Oligonucleotide Primed Synthesis) Assay

Alex Boyko and Igor Kovalchuk

Abstract

DNA double strand breaks (DSBs) arise from spontaneous DNA damage due to metabolic activities or from direct and indirect damaging effects of stress. DSBs are also formed transiently during such processes as replication, transcription, and DNA repair. The level of DSBs positively correlates with the activities of homologous and nonhomologous DNA repair pathways, which in turn inversely correlate with methylation levels and chromatin structure. Thus, measurement of strand breaks can provide an informative picture of genome stability of a given cell. The use of random oligonucleotide-primed synthesis for the analysis of DSB levels is described. Applications of the assay for quantitative detection of 3′OH, 3′P, or DNA strand breaks at a cleavage site of the deoxyribose residue are discussed.

Key words: Random oligonucleotide-primed synthesis, ROPS, Double strand breaks, DSBs, Genome stability
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_17, © Springer Science + Business Media, LLC 2010

1. Introduction

DNA double strand breaks (DSBs) are an important component of a cell's life. They are absolutely necessary for homologous recombination during gametogenesis (1) and are required for the generation of antigen-receptor and immunoglobulin diversity during T and B lymphocyte development in vertebrates (2). At the same time, DNA strand breaks may arise as a result of oxidative damage to DNA. This negative effect may result from cellular metabolic reactions generating reactive oxygen species and from various external stresses, which can directly (ionizing radiation) or indirectly (pathogen) induce overproduction of free radicals in a cell (3–7). DSBs challenge the DNA repair machinery and, if overaccumulated, can lead to apoptosis (8–10). Increased frequency of DNA strand breaks negatively affects genome stability, leading to increased rates of chromosomal rearrangements. This usually results in gene conversion, translocation, and duplication events and creates large deletions and chromosome fragmentation (9, 11–13).

Measuring the level of DNA strand breaks may provide an important reference for assessing the genotoxicity of various environmental factors and chemicals. It could also help evaluate immediate and transgenerational effects of stress on genome stability. Because epigenetic features such as methylation level and chromatin density correlate negatively with the frequency of homologous recombination, the level of strand breaks in cells may also correlate with chromatin status. Measuring DNA damage requires an effective and sensitive assay that permits quantitative detection of DNA strand breaks.

In this chapter, we discuss a rapid and sensitive assay for the quantification of DNA breaks (14). The method developed by Basnakian and James (14) is a modification of a previously reported random oligonucleotide-primed synthesis (ROPS) assay (15). Using this method, Ausubel et al. (15) produced uniformly labeled DNA fragments with the Klenow fragment polymerase. In contrast, Basnakian and James (14) exploited the inability of the Klenow fragment enzyme to discriminate between complementary primers and primers with mismatches at their terminal regions. This allowed the development of a quantitative assay for detection of DNA damage. Specifically, using incorporation of radioactively labeled nucleotides, it is possible to detect and quantify 3′OH single-stranded gaps and 3′OH single- and double-stranded breaks in DNA (14). Further, non-3′OH breaks, which are not radioactively labeled in the standard assay, can be assayed after treatment with phosphatase or exonuclease III.
In brief, the assay is based on the separation of double-stranded DNA containing nicks, gaps, and breaks into single-stranded sequences by heat denaturation (Fig. 1). Cooling the samples allows random reassociation of DNA fragments. During reassociation, relatively short DNA fragments act as primers and associate with an excess of high-molecular-weight DNA that serves as a template. During the subsequent DNA synthesis step, the Klenow fragment polymerase incorporates radioactively labeled nucleotides. This incorporation is proportional to the number of breaks and reflects the relative level of strand break accumulation (14). The assay is highly sensitive and capable of detecting low strand break frequencies (16). It is also work-efficient and permits the analysis of several hundred samples per day.
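Because label incorporation is taken to be proportional to the number of breaks, results are usually compared as a ratio between a sample and a control. The following is a minimal sketch of that comparison, not part of the published protocol; the function name, the background subtraction step, and all count values are hypothetical and serve only as illustration.

```python
from statistics import mean


def relative_incorporation(sample_cpm, control_cpm, background_cpm=0.0):
    """Ratio of mean background-corrected counts: sample over control.

    All arguments are scintillation counts (cpm) measured for the same
    amount of input DNA. Background subtraction is an assumption of this
    sketch, not a step stated in the protocol.
    """
    sample = mean(sample_cpm) - background_cpm
    control = mean(control_cpm) - background_cpm
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return sample / control


# Hypothetical counts for three replicate filters per condition.
control = [1200.0, 1100.0, 1300.0]
treated = [2500.0, 2300.0, 2400.0]

ratio = relative_incorporation(treated, control, background_cpm=200.0)
print(f"relative incorporation (treated/control): {ratio:.2f}")
```

A ratio above 1 would indicate more 3′OH ends in the treated sample than in the control, provided that equal amounts of DNA were loaded.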
[Figure 1 panels: Single Strand Breaks, Single Strand Gaps, and Double Strand Breaks (3′OH ends); DNA Denaturation; DNA Reassociation; Labeled DNA Synthesis]

Fig. 1. The mechanism of detection of DNA strand breaks using a modified ROPS assay. The assay permits detection of various types of DNA damage including nicks, single-stranded gaps, and single- and double-stranded DNA breaks that have a 3′OH group at their ends. Following heat denaturation, single-stranded DNA fragments randomly reassociate with each other. Next, the relatively short DNA fragments associated with the high-molecular-weight DNA play the role of primers for DNA synthesis performed by the Klenow fragment polymerase. During DNA synthesis, the Klenow fragment polymerase incorporates radioactively labeled nucleotides at a rate that is proportional to the number of original breaks
2. Materials

1. Nuclease-free water.
2. Agarose, electrophoresis grade.
3. 1× TBE: 90 mM Tris, pH 8.0, 90 mM boric acid, 2 mM EDTA.
4. 6× DNA gel loading buffer.
5. 0.5 mM 3dNTP mix (dATP, dGTP, dTTP).
6. 33 µM dCTP.
7. 10× Klenow fragment buffer (New England Biolabs).
8. Klenow fragment polymerase (New England Biolabs).
9. (3H)dCTP (NEN, Boston MA). Caution: Radiation protection measures must be taken for handling 3H and all derived materials. Store in a shielded container in a dedicated freezer at −20°C.
10. 12.5 mM EDTA, pH 8.0.
11. Whatman DE-81 ion-exchange filters.
12. 500 mM Na-phosphate buffer, pH 7.0.
3. Methods

1. Using nuclease-free water, prepare 0.25 µg genomic DNA aliquots in a final volume of 10 µl. Keep the samples on ice (see Notes 1–3).
2. Denature the DNA at 100°C for 5 min. Chill the samples on ice immediately.
3. Add 15 µl of reaction mixture to each sample while the samples are on ice. Mix well. The reaction mixture for 10 samples (150 µl) should contain: 25 µl of 0.5 mM 3dNTP mix (dATP, dGTP, dTTP), 25 µl of 10× Klenow fragment buffer (New England Biolabs), 4.5 µl of 33 µM dCTP, 5 units of Klenow fragment polymerase (New England Biolabs), (3H)dCTP (42.9 Ci/mmol) (NEN, Boston MA) (see Note 4).
4. Incubate the samples at room temperature for 1 h (see Note 5).
5. Place the samples on ice, and stop the reaction by adding 25 µl of 12.5 mM EDTA, pH 8.0.
6. Apply each reaction to a Whatman DE-81 ion-exchange filter. Air-dry the filters (see Note 6).
7. Wash the filters in 500 mM Na-phosphate buffer, pH 7.0 at room temperature for 10 min.
8. Repeat the wash twice.
9. Air-dry the filters and process them in a scintillation counter.
10. Express the results as relative (3H)dCTP incorporation/0.25 µg of DNA.
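The volumes in step 3 can be scaled to other sample numbers and converted to final concentrations with simple arithmetic. The sketch below assumes a 25 µl reaction (10 µl DNA plus 15 µl mixture) and covers only the three components whose volumes the protocol states; the enzyme, the label, and whatever makes up the remaining volume are not itemized here. All names in the code are hypothetical.

```python
# Volumes (ul) in the 10-sample, 150 ul reaction mixture given in step 3.
MIX_SAMPLES = 10
MIX_VOLUMES_UL = {
    "3dNTP mix": 25.0,
    "10x Klenow buffer": 25.0,
    "unlabeled dCTP": 4.5,
}
# Stock concentrations: uM for nucleotides, "x" for the buffer.
STOCK_CONC = {
    "3dNTP mix": 500.0,        # 0.5 mM expressed in uM
    "10x Klenow buffer": 10.0,
    "unlabeled dCTP": 33.0,    # uM
}
REACTION_VOLUME_UL = 10.0 + 15.0  # DNA aliquot plus reaction mixture


def scale_mix(n_samples):
    """Volume of each itemized component for n_samples reactions."""
    return {name: vol * n_samples / MIX_SAMPLES
            for name, vol in MIX_VOLUMES_UL.items()}


def final_concentration(name):
    """Concentration of a component in one 25 ul reaction (stock units)."""
    per_sample_ul = MIX_VOLUMES_UL[name] / MIX_SAMPLES
    return STOCK_CONC[name] * per_sample_ul / REACTION_VOLUME_UL


volumes_24 = scale_mix(24)
for name in MIX_VOLUMES_UL:
    print(f"{name}: {volumes_24[name]:.1f} ul for 24 samples, "
          f"final {final_concentration(name):.3f}")
```

Under these assumptions the buffer ends up at 1×, each of the three dNTPs at 50 µM, and unlabeled dCTP at roughly 0.6 µM. In practice a small excess of mixture is usually prepared to allow for pipetting losses.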
4. Notes

1. The assay design allows quantitative detection of 3′OH DNA strand breaks only. However, if detection of other types of DNA strand breaks (3′P or breaks at a cleavage site of the deoxyribose residues) is required, additional steps may be included to expose 3′OH ends. The 3′P ends can be removed by phosphatase treatment. Similarly, 3′-glycosyl ends can be eliminated by Escherichia coli exonuclease III treatment. Both treatments expose 3′OH ends that can be quantified using the ROPS assay.
2. The quality of genomic DNA is essential for the assay. The samples can be quantified using a spectrophotometer. However, we recommend checking equal sample loading using gel electrophoresis and, if necessary, adjusting it accordingly. Moreover, analysis of highly fragmented DNA may require the addition of a nondegraded high-molecular-weight DNA template to the samples, as recommended by Basnakian and James (16). An excess of the high-molecular-weight DNA template is a major requirement for the assay, as high amounts of DNA fragmentation will lead to rapid saturation of the reaction, affecting the frequency of (32P) or (3H) incorporation. DNA fragmentation can be checked using a 1% agarose gel: at least 50% of DNA should be located in the initial band (Basnakian and James (16)).
3. It is important to ensure that the DNA preparation is not contaminated with SDS, EDTA, proteinase K, or phenol, since these chemicals can significantly inhibit the Klenow fragment polymerase activity.
4. The method was originally developed for use with (32P)dCTP. Given the short half-life of (32P), the assay was modified to use (3H)dCTP instead. Regardless of the isotope used, both (32P)dCTP and (3H)dCTP should be supplied to the reaction in a mixture with unlabeled dCTP. We primarily used (3H)dCTP in our work.
5. Incubation time may be decreased to 30 min if the frequency of (3H)dCTP incorporation is too high. Similarly, incubation temperature can be decreased to 16°C to reduce variability between samples. Assay sensitivity can be increased either by increasing the amount of radioactively labeled dCTP or by decreasing the amount of unlabeled dCTP.
6. Using Whatman DE-81 ion-exchange filters is essential, as it drastically reduces DNA contamination with unincorporated nucleotides.

References

1. Richardson C, Horikoshi N, Pandita TK (2004) The role of the DNA double-strand break response network in meiosis. DNA Repair (Amst) 3:1149–1164
2. Rooney S, Chaudhuri J, Alt FW (2004) The role of the non-homologous end-joining pathway in lymphocyte development. Immunol Rev 200:115–131
3. Breimer LH (1990) Molecular mechanisms of oxygen radical carcinogenesis and mutagenesis: the role of DNA base damage. Mol Carcinog 3:188–197
4. Foyer CH, Harbinson J (1994) Oxygen metabolism and the regulation of photosynthetic electron transport. In: Foyer CH, Mullineaux PM (eds) Causes of photooxidative stress and amelioration of defence systems in plants. CRC Press, Boca Raton, pp 1–42
5. Mittler R (2002) Oxidative stress, antioxidants and stress tolerance. Trends Plant Sci 7:405–410
6. Vranova E, Inze D, Breusegem V (2002) Signal transduction during oxidative stress. J Exp Bot 53:1227–1236
7. Blokhina O, Virolainen E, Fagerstedt KV (2003) Antioxidants, oxidative damage and oxygen deprivation stress. Ann Bot 91:179–194
8. Evans HH, Ricanati M, Horng M-F, Jiang Q, Mencl J, Olive P (1993) DNA double-strand break rejoining deficiency in TK6 and other human B-lymphoblast cell lines. Radiat Res 134:307–315
9. Critchlow E, Jackson P (1998) DNA end-joining: from yeast to man. Trends Biochem Sci 23:394–402
10. Shrivastav M, De Haro LP, Nickoloff JA (2008) Regulation of DNA double-strand break repair pathway choice. Cell Res 18:134–147
11. Orel N, Kyryk A, Puchta H (2003) Different pathways of homologous recombination are used for the repair of double-strand breaks within tandemly arranged sequences in the plant genome. Plant J 35:604–612
12. Dudas A, Chovanec M (2004) DNA double-strand break repair by homologous recombination. Mutat Res 566:131–167
13. Puchta H (2005) The repair of double-strand breaks in plants: mechanisms and consequences for genome evolution. J Exp Bot 56:1–14
14. Basnakian AG, James SJ (1994) A rapid and sensitive assay for the detection of DNA fragmentation during early phases of apoptosis. Nucleic Acids Res 22:2714–2715
15. Ausubel FM, Brent R, Kingston RE, Moore DD, Seidman JG, Smith JA et al (1989) Current protocols in molecular biology. Wiley-Interscience, New York, 1: units 2.2.3(a). 3.5.9(b), 3.6.1-2(c)
16. Basnakian AG, James SJ (1996) Quantification of 3′OH DNA breaks by random oligonucleotide-primed synthesis (ROPS) assay. DNA Cell Biol 15:255–262
Chapter 18

Reporter Gene-Based Recombination Lines for Studies of Genome Stability

Palak Kathiria and Igor Kovalchuk

Abstract

Homologous recombination is a double-strand break repair mechanism operating in somatic cells and involved in meiotic crossovers in plants. It is responsible for the maintenance of genome stability and thus plays a crucial role in adaptation to stress. Recombination between homologous loci is believed to be regulated in part by epigenetic machinery such as methylation. Therefore, the recombination frequency at a specific locus can reflect the chromatin status. Several reporter gene-based recombination constructs have been developed to study HR frequencies in plants. Among them, the luciferase and beta-glucuronidase-based recombination reporter systems are the most widely used. Here, we explain how reporter gene recombination assays operate and in which applications they are used. We also present a conceptually new system for analysis of sequence-specific recombination frequency. These assays can be effectively used for analysis of locus-specific endogenous and stress-induced recombination frequencies.

Key words: Homologous recombination, Reporter gene recombination constructs, Luciferase, Beta-glucuronidase, Intramolecular recombination, Intermolecular recombination
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_18, © Springer Science + Business Media, LLC 2010

1. Introduction

Homologous recombination is a multifaceted cellular mechanism. It functions in double-strand DNA break repair in somatic and meiotic cells and participates in the mechanism of crossing over. Whereas the process of homologous recombination repair excludes errors, the process responsible for crossing over results in an exchange of sequences between homologous chromosomes, frequently leading to the appearance of new genetic traits. Hence, modulation of recombination in plants either imparts genome stability or leads to the creation of higher genetic variability (1). It has also been suggested that homologous recombination plays an important role in stress response and stress tolerance (2, 3).

Initial experiments on the rate of recombination were based on the analysis of differences in plant tissue pigmentation (4, 5). However, the discovery and utilization of reporter genes such as Firefly Luciferase (LUC) and β-Glucuronidase (uidA/GUS) have facilitated molecular analysis of recombination frequency. A LUC- or GUS-based construct carries two truncated nonfunctional versions of a reporter gene serving as a recombination substrate (6, 7). Transgenic plants carrying the substrate do not show transgene expression unless recombination between the two truncated versions restores the gene structure. Cells in which recombination events take place, and their progeny, can be detected and scored as the number of events per single plant in the population. Spontaneous recombination frequency varies in different areas of the genome and depends on many factors, including transgene copy number, transgene expression at the locus, and the methylation status (8–10). The frequency of spontaneous recombination is also developmentally regulated (11). A number of abiotic and biotic stress factors have previously been shown to increase the frequency of homologous recombination (7, 10, 12, 13).

Although GUS- and LUC-based recombination constructs are useful for analysis of recombination frequency at various genome positions, they are 'artificial' genes and thus do not correctly represent the plant genome. In order to analyze what is happening in plant-specific sequences, especially in those that are involved in pathogen recognition, we developed a new construct that uses different parts of resistance (R) genes as a recombination substrate. Transgenic plants carrying this construct allow the analysis of recombination events that potentially occur between sequences coding for the LRR or NBS domains of R genes. In this chapter, we describe the structure of recombination constructs and the use of transgenic plants carrying these constructs in a variety of applications.
1.1. Design of Recombination Constructs
An efficient recombination construct should contain a sequence overlap of at least 200 bp, preferably more than 400 bp (14–16). Analysis of recombination frequencies can be done either through the use of the reporter gene as the recombination substrate or through the use of any specific sequence fused to the reporter gene (Figs. 1 and 2). Various reporter genes can be used, but preference should be given to luciferase, β-glucuronidase, or antibiotic resistance genes. Transgenic plants carrying any one of these three constructs have been used in the past (7, 12, 16, 17). Minor variations exist among the constructs, but the overall concepts are analogous. We will briefly discuss the design and use of these two types of recombination constructs.

Fig. 1. A diagrammatic representation of the reporter-based recombination construct to measure sequence-unspecific recombination frequency. (a) The two truncated copies are in direct orientation to each other. Recombination between them results in a removal of the linker sequence. (b) The truncated copies are in the reverse orientation to each other, which results in sequence conversion of the linker sequence and truncation of one copy of the reporter gene. (c) The third version of the recombination construct, in which the position of the two copies is interchanged. This allows expression of the reporter gene only after intermolecular recombination, such as the one between either sister chromatids or chromosomes

Fig. 2. A diagrammatic representation of the recombination construct for analysis of sequence-specific recombination frequency. (a) The basic concept of the construct. The transgene consists of the promoter-less LRR region fused to the luciferase reporter. Upon recombination with a native R gene, luciferase comes under regulation of the promoter and starts expressing. (b) The construct which contains the stuffer sequence followed by additional 35S promoter-driven repeats at the 3′ end. The construct will lead to expression by recombination between either sister chromatids or homologous chromosomes. Expression can also be observed if recombination occurs between the transgene and a native R gene (ectopic recombination)

1.2. A Construct for Analysis of the Global Sequence-Independent Recombination Frequency
A construct to analyze the frequency of global HR at any given time during development or under the influence of any stress factor consists of two truncated copies of a reporter gene. The first copy lacks the 3′ region of the original reporter gene and is preceded by a promoter. The second copy lacks the promoter and the 5′ region of the reporter gene (Fig. 1a). Both copies are connected by a stuffer sequence. The CaMV 35S promoter is widely used due to the level of resultant expression, but the selection of the promoter may vary depending on the plant species analyzed. Both copies share a considerable overlapping region. The length of the overlapping sequence, which acts as a substrate for recombination, is variable. Previous experiments determined that an overlapping region of at least 200 bp was required for efficient recombination between sequences (see Note 1). Shorter lengths result in substantially lower recombination frequencies (14–16). The formation of an intact copy of the reporter gene occurs via recombination between overlapping homologous sequences after the formation of the Holliday junction structure (Fig. 1a).

Depending on the arrangement and orientation of sequences in the recombination construct, characteristics of a recombination event may vary. A basic construct as described above can recombine with a homologous copy of the reporter gene on the same DNA strand. Recombination between sister chromatids, between the same locus on homologous chromosomes, and with a homologous sequence at a different chromosomal location is also possible. Hence, both inter- and intramolecular recombination can be analyzed using this construct. Intramolecular recombination in the construct will lead to removal of the linker sequence (Fig. 1a), while intermolecular recombination will not affect the linker sequence (18). A similar construct consists of two truncated reporter gene copies in the opposite orientation (Fig. 1b). Such a construct results in conversion of the linker sequence and further truncation of one copy of the reporter gene with the formation of an intact copy of the reporter gene (Fig. 1b). This type of construct will also allow both inter- and intramolecular recombination.

To study intermolecular recombination exclusively, a third type of construct has also been developed. Positions of the 5′ copy and the 3′ copy of the reporter gene have been reversed (Fig. 1c). As a result, recombination between the two copies does not result in a functional reporter gene. In contrast, intermolecular recombination results in expression of the reporter gene. Hence, the design of a recombination reporter construct may vary depending on the type of recombination analysis required (either inter- or intrachromosomal recombination). Li et al. developed a similar construct to study intermolecular recombination, using intronic sequences as recombination substrates (16). Using this construct, the authors analyzed the impact of sequence length and homology on homologous recombination. Moreover, since homologous recombination frequency varies depending on the position of the transgene in the genome (8), the system can be used for analysis of recombination at loci with various degrees of chromatin condensation, different methylation patterns, etc.

1.3. A Construct for Analysis of Sequence-Specific Recombination Frequency
Using our experimental system, homologous recombination frequency can be analyzed in a sequence-specific manner. In the experiment described here, we attempted to monitor the recombination frequency in the leucine-rich repeat (LRR) region of plant R genes. We tested whether this sequence, which is known to be critical for pathogen recognition, is more prone to frequent rearrangements. The conventional RFLP method for analysis of the frequency of rearrangements in these loci would require hundreds of thousands of individual plants. To make the analysis practical, we have developed a novel reporter gene-based recombination construct. The basic design of the construct consists of a sequence under study serving as the recombination substrate fused to the luciferase reporter gene at the 3′ end (Fig. 2a). The construct does not contain any promoter; hence, in the native state there is no reporter gene expression. If a recombination event occurs between the transgenic LRR sequence under study and the LRR sequence of a pre-existing endogenous R gene, the luciferase gene will come under regulation of the promoter driving that R gene. Hence, the event can be visualized by expression of the reporter gene.

To make the system more robust, three different LRR sequences from three unique R genes were selected and arranged in a direct array followed by the reporter gene (Fig. 2b). Conserved protein domain prediction was carried out using Pfam and InterPro software. The DNA sequence corresponding to the conserved protein domain was selected for the study. In the construct, each LRR repeat is exactly 416 bp in length. To analyze the specificity of such recombination events, a control construct consisting of fragments from Actin, RENT, and Ubiquitin genes has been generated. Such a construct will also lead to luciferase gene expression, but only if recombination occurs between homologous sequences at different chromosomal loci. Both of the aforementioned constructs allow interchromosomal recombination events to occur only between the transgene locus and the endogenous locus.

In order to analyze intrachromosomal events occurring between sister chromatids and thus allow more frequent events to occur, another construct has been generated. In addition to the LRR sequences in the first construct, a second identical LRR repeat sequence is added to the 3′ region. This repeat sequence is driven by the 35S promoter (Fig. 2b). In such a construct, recombination between either sister chromatids or homologous chromosomes can also lead to activation of the reporter gene. In addition, for convenience of detection of recombination events, analysis of sequence specificity, and detection of the particular LRR region in which recombination occurred, a Hemagglutinin (HA) tag sequence and a Kanamycin antibiotic resistance gene were added to the construct. The recombination event will result in restoration of this reporter gene. The reporter gene encodes a fusion protein consisting of the Hemagglutinin (HA) sequence followed by a protein encoding resistance against Kanamycin and by luciferase protein (Fig. 2b). The luciferase activity stemming from the fusion protein can be detected by the in vivo luciferase assay. The tissue expressing the luciferase can be excised and propagated through calli regeneration in the presence of kanamycin, which yields a sufficient amount of material for PCR and sequencing analysis of recombination events. This will allow us to detect whether recombination occurred in the LRR1, LRR2, or LRR3 region, and whether it occurred between sister chromatids or between the transgene and the endogene (Fig. 2b).

1.4. Analysis of Recombination Constructs
The methods of detecting activity of GUS and LUC reporter genes are described below.
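Before moving on to detection, the overlap guideline from Subheading 1.1 (at least 200 bp, preferably more than 400 bp) can be checked for a planned pair of truncated reporter copies. The following is a minimal sketch; the coordinate convention, the function names, and the example positions are hypothetical and not taken from any published construct.

```python
def overlap_length(five_prime_copy_end, three_prime_copy_start):
    """Length (bp) shared by the two truncated reporter copies.

    five_prime_copy_end: last position of the full-length reporter
        (1-based) retained in the promoter-driven copy.
    three_prime_copy_start: first position of the full-length reporter
        (1-based) retained in the promoter-less copy.
    """
    return max(0, five_prime_copy_end - three_prime_copy_start + 1)


def classify_overlap(length_bp, minimum=200, preferred=400):
    """Compare an overlap with the 200 bp / 400 bp guideline."""
    if length_bp < minimum:
        return "too short"
    if length_bp <= preferred:
        return "acceptable"
    return "preferred"


# Hypothetical design: copy 1 keeps positions 1-1100 of the reporter,
# copy 2 keeps position 550 to the end.
design = overlap_length(1100, 550)
print(design, classify_overlap(design))

# A design with a much later start of copy 2 falls below the minimum.
short = overlap_length(1100, 950)
print(short, classify_overlap(short))
```

By the same rule, the 416 bp LRR repeats used in the sequence-specific construct fall into the preferred range.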
2. Materials

2.1. Histochemical Analysis of GUS (uidA) Gene Expression

1. Staining solution: 2 mM 5-bromo-4-chloro-3-indoxyl-β-D-glucuronide cyclohexylammonium salt (X-gluc), 100 mM sodium phosphate buffer pH 7.0, % Triton X-100.
2. 70% Ethanol.
3. Vacuum chamber.
4. Dissecting microscope attached to a still picture camera.
5. 37°C oven.
6. 15 or 50 ml tubes.
7. Forceps.

2.2. In Vivo Analysis of Luciferase Gene Expression

1. Luciferin solution: mM Luciferin sodium salt, 100 mM sodium phosphate buffer pH 7.0, % Triton X-100.
2. Fine spray nozzle.
3. CCD camera (Gloor Instruments AG, Uster, Switzerland).
4. PIXcel and ANAlysis software.
5. Paper towels.
3. Methods

3.1. Histochemical Analysis of GUS (uidA) Gene Expression

1. Transfer the plants to be analyzed for GUS expression to a 15 or 50 ml tube, depending on the total amount of plant tissue, and cover them with staining solution.
2. Place the tubes in a vacuum chamber without lids, and apply vacuum for 20 min (see Note 2).
3. Incubate the tubes at 37°C for 24 h.
4. Replace the staining solution with a 70% ethanol solution. Incubate the tubes for 24 h at 37°C. Carry out one more change of 70% ethanol to destain tissues (see Note 3).
5. Analyze plants using a low magnification dissecting microscope and count the number of GUS spots per plant (Fig. 3a, b).

3.2. In Vivo Analysis of Luciferase Gene Expression

1. Carry out all the procedures in a dark, clean place.
2. Spray plants uniformly with the luciferin solution using a fine spray nozzle (see Note 4).
3. Incubate the plants in the dark for 20–30 min (see Note 5).
4. Transfer the plants to a dark chamber equipped with a CCD camera.
5. Take one control picture in the presence of light, using PIXcel software.
6. Take two consecutive images in the dark. The normal exposure time is 10 min for each picture.
7. Analyze the images using the ANAlysis software provided with the CCD camera. Superimpose the two dark pictures, and give the image of luciferase expression a red color.
8. Superimpose the control and the dark image to obtain a final picture (Fig. 3c–f).
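Once spots have been counted, recombination frequency is expressed as the number of events per plant, and a treatment can be compared with its control as a fold change. The sketch below shows that calculation; the function name and all spot counts are hypothetical and serve only as illustration.

```python
def recombination_frequency(spots_per_plant):
    """Average number of recombination events (spots) per plant."""
    if not spots_per_plant:
        raise ValueError("at least one plant must be scored")
    return sum(spots_per_plant) / len(spots_per_plant)


# Hypothetical spot counts, one entry per scored plant.
control_plants = [0, 1, 0, 2, 1, 0, 0, 1]
treated_plants = [2, 3, 1, 4, 2, 3, 2, 3]

control_rf = recombination_frequency(control_plants)
treated_rf = recombination_frequency(treated_plants)
fold_change = treated_rf / control_rf

print(f"control: {control_rf:.3f} events per plant")
print(f"treated: {treated_rf:.3f} events per plant")
print(f"fold change: {fold_change:.1f}")
```

Because spontaneous events are rare, many plants per line typically need to be scored before such averages become stable; a statistical comparison of the two groups is left out of this sketch.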
Fig. 3. Detection of recombination events in plants with various kinds of recombination constructs. (a) The Arabidopsis plant with the GUS-based recombination construct showing spots of GUS expression at low magnification (black arrow). (b) A similar GUS spot on the stem of a young seedling analyzed at higher magnification (black arrow). (c) The Arabidopsis plant with recombination spots stemming from the luciferase-based recombination construct. (d) Similar Arabidopsis plants treated with UVC. The increased number of spots indicates the greater number of recombination events in the cells. (e) The leaves of transgenic Arabidopsis plants carrying the specific recombination construct in the genome site, such as the one depicted in Fig. 2b. (f) Similar recombination events in transgenic canola plants
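The superimposition described in steps 7 and 8 of Subheading 3.2 is performed in the camera's own software. Purely as an illustration of the underlying pixel arithmetic, the sketch below sums two dark exposures and paints pixels above a threshold red on top of a grayscale control image. The threshold, the function name, and the tiny example images are assumptions of this sketch and do not describe how the ANAlysis software works.

```python
def overlay_luminescence(control, dark_a, dark_b, threshold):
    """Combine two dark exposures and overlay the signal in red.

    control, dark_a, dark_b: equally sized 2D lists of gray values.
    Returns a 2D list of (R, G, B) tuples.
    """
    result = []
    for ctrl_row, a_row, b_row in zip(control, dark_a, dark_b):
        out_row = []
        for ctrl, a, b in zip(ctrl_row, a_row, b_row):
            signal = a + b  # superimpose the two dark pictures
            if signal >= threshold:
                out_row.append((255, 0, 0))         # luminescent pixel in red
            else:
                out_row.append((ctrl, ctrl, ctrl))  # keep the control image
        result.append(out_row)
    return result


# Hypothetical 2 x 2 images.
control_img = [[10, 20], [30, 40]]
dark_1 = [[0, 5], [0, 1]]
dark_2 = [[0, 6], [1, 0]]

final = overlay_luminescence(control_img, dark_1, dark_2, threshold=10)
for row in final:
    print(row)
```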
4. Notes

1. Previous experiments suggested that the rate of recombination is directly proportional to the size of a sequence. However, newer experiments suggest that a length of homology as small as 6–20 bp allows high rates of recombination in the sequence (19). We have not tested such small overlapping sequences, and hence we cannot evaluate how efficient they are. We recommend using an overlapping sequence with a minimum length of 200 bp in the design of the recombination reporter.
2. It is essential to ensure that the plants remain fully submerged in the staining solution. This can be achieved by using hollow cork plugs, which keep the plants submerged but let air pass through the plug. Uneven staining can occur if the tissues are not completely submerged.
3. If the tissue is hard to destain and green patches can still be visualized, the plants can be transferred to 100% ethanol and incubated at 65°C for 2 h. After this treatment, the plants must be transferred to 70% ethanol, as prolonged exposure to 100% ethanol can make tissues brittle.
4. This can also be performed "in vitro" by cutting the leaves and submerging them fully in the luciferin solution in a tube. Further analysis can be carried out as described.
5. When analyzed in darkness after a shift from light conditions, the plants produce significant autoluminescence. This can easily be excluded by incubating the plants in dark conditions for 15–20 min. Additional time is provided for luciferin to spread in plant tissues.

References

1. Molinier J, Ries G, Hohn B (2006) Transgenerational memory of stress in plants. Nature 442:1046–1049
2. Ries G, Heller W, Puchta H, Sandermann H, Seidlitz H, Hohn B (2000) Elevated UV-B radiation reduces genome stability in plants. Nature 406:98–101
3. Boyko A, Kovalchuk I (2008) Epigenetic control of plant stress response. Environ Mol Mutagen 49:61–72
4. Burk LG, Menser HA (1964) A dominant aurea mutation in tobacco. Tobacco Sci 8:101–104
5. Christianson ML (1975) Mitotic crossing-over as an important mechanism of floral sectoring in Tradescantia. Mutat Res 28:389–395
6. Swoboda P, Gal S, Hohn B, Puchta H (1994) Intrachromosomal homologous recombination in whole plants. EMBO J 13:484–489
7. Kovalchuk I, Bojko V, Kovalchuk O, Gloeckler V, Filkowski J, Heinlein M et al (2003) Pathogen induced systemic plant signal triggers genome instability. Nature 423:760–762
8. Kovalchuk I, Kovalchuk O, Hohn B (2000) Genome-wide variation of the somatic mutation frequency in transgenic plants. EMBO J 19:4431–4438 9. Ilnytskyy Y, Boyko A, Kovalchuk I (2004) Luciferase-based transgenic recombination assay is more sensitive than beta-glucoronidase-based. Mutat Res 559:189–197 10. Boyko A, Kathiria P, Zemp F, Yao Y, Pogribny I, Kovalchuk I (2007) Transgenerational changes in the genome stability and methylation in pathogen-infected plants. Nucleic Acids Res 35:1–12 11. Boyko A, Filkowski J, Kovalchuk I (2006) Double strand break repair in plants is developmentally regulated. Plant Physiol 141:1–10 12. Boyko A, Greer M, Kovalchuk I (2006) Acute exposure to UVB has more pronounced effect on plant genome stability than chronic exposure. Mutat Res 602:100–109 13. Boyko A, Hudson D, Bhomkar P, Kathiria P, Kovalchuk I (2006) Increase of homologous
recombination frequency in vascular tissue of Arabidopsis plants exposed to salt stress. Plant Cell Physiol 47:736–742 14. Lyznik LA, McGee JD, Tung PY, Bennetzen JL, Hodges TK (1991) Homologous recombination between plasmid DNA molecules in maize protoplasts. Mol Gen Genet 230:209–218 15. Puchta H, Hohn B (1991) The mechanism of extrachromosomal homologous DNA recombination in plant cells. Mol Gen Genet 230:1–7 16. Li L, Santerre-Ayotte S, Boivin EB, Jean M, Belzile F (2004) A novel reporter for intrachromosomal homoeologous recombination in Arabidopsis thaliana. Plant J 40:1007–1015 17. Peterhans A, Schlüpmann H, Basse C, Paszkowski J (1990) Intrachromosomal recombination in plants. EMBO J 9:3437–3445 18. Schuermann D, Molinier J, Fritsch O, Hohn B (2005) The dual nature of homologous recombination in plants. Trends Genet 21:172–181 19. Jelesko J, Carter K, Thompson W, Kinoshita Y, Gruissem W (2004) Meiotic recombination between paralogous RBCSB genes on sister chromatids of Arabidopsis thaliana. Genetics 166:947–957
Chapter 19
Plant Transgenesis
Alicja Ziemienowicz
Abstract
Epigenetic effects such as gene silencing and variable expression are unintended consequences of plant transformation, a problem that is present in the transformation of all plant species. There is not yet a reliable way to prevent epigenetic silencing; however, the probability of epigenetic effects may be reduced by choosing an appropriate method of transgene introduction into a plant cell. Most methods used in plant biotechnology, such as direct gene transfer and particle bombardment, result in the introduction of multiple DNA molecules and, as a consequence, multi-copy multi-locus insertion patterns. These multiple insertions may lead to variations in transgene expression, epigenetic silencing being the most extreme. In contrast, Agrobacterium-mediated plant transformation procedures rarely cause such unintended effects. In this chapter, we present advantages and disadvantages of the Agrobacterium-mediated plant transformation method as well as protocols for transformation of Arabidopsis generative tissues and tobacco seedlings as the most classical techniques in these model plants, i.e., vacuum infiltration of explants and floral dip methods. Moreover, epigenetic effects of transgenes such as silencing related to the position and insertion effects as well as effects of the regeneration procedure causing somaclonal variation will be briefly discussed.
Key words: Agrobacterium, T-DNA tagging, Vacuum infiltration, Floral dip, Arabidopsis, Tobacco, Transgene silencing, Position effect, Insertion effect, Somaclonal variation, OE lines, KO lines
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7_19, © Springer Science + Business Media, LLC 2010
1. Introduction
The introduction of DNA into plant cells can be achieved by a variety of techniques, including polyethylene glycol (PEG) treatment, electroporation, microinjection, biolistic transformation as well as the use of Agrobacterium, plant viruses, and liposomes as DNA carriers (1–3). The most commonly used method for obtaining transgenic plants involves the introduction of donor DNA into plant cells by the pathogenic bacterium Agrobacterium tumefaciens or Agrobacterium rhizogenes (4). Agrobacterium initiates T-DNA transfer from its tumor-inducing (Ti) or root-inducing (Ri) plasmid in response to phenolic signals produced by wounded plants. These phenolics trigger the induction of the plasmid-encoded bacterial virulence (vir) genes, which in turn mediate the T-DNA transfer. The T-DNA (10–20 kb) on the Ti plasmid (Fig. 1) is flanked and delimited by border sequences (RB and LB – right and left border) that consist of 25 bp long, highly conserved direct DNA repeats. Under normal circumstances, the DNA sequence flanked by these borders is transferred to the plant, where it integrates randomly into the plant genome. The naturally occurring genes within the T-DNA are not necessary for the T-DNA transfer but are necessary for tumorigenesis and opine production. Thus, these naturally occurring T-DNA-encoded “oncogenes” can be deleted, thereby disarming the T-DNA. Such disarmed Ti plasmids are still capable of transferring T-DNA to plants, yet they will not cause disease. Thus, normal plants harboring novel gene(s) inserted between the T-DNA borders can be regenerated after cocultivation of plants with Agrobacterium carrying a modified Ti plasmid (vector).
For species that are amenable to transformation and regeneration using A. tumefaciens, T-DNA-mediated gene transfer remains the method of choice because of the simplicity and efficiency of delivering a neat package of DNA that is integrated into the plant genome. In addition, this method mostly results in a single-copy, single-locus integration pattern, in contrast to other gene transfer methods (electroporation, microinjection, or particle bombardment), which suffer from a relatively low efficiency of transformation and frequent integration of catenated and/or rearranged DNA sequences (a multi-copy/multi-locus integration pattern).
Fig. 1. A general scheme of a pTi plasmid. T-DNA is located between LB (left border) and RB (right border), as indicated by arrowheads. Dots indicate other genetic elements, such as origin of replication, conjugal transfer region, and opine catabolism operon
Transformation methods used in plant biotechnology and modern agriculture have the potential to generate unintended genetic and epigenetic variations (5–8). These unintended effects can be divided into three main groups: (1) insertion effects, (2) position effects, and (3) somaclonal variation. They are often observed when direct methods of DNA transfer are used; however, the Agrobacterium-mediated plant transformation method may also, although more rarely, cause unintended effects including those of epigenetic nature. This can be explained, at least partially, by the difference in transgene copy number in plants obtained by direct gene transfer (a multi-copy, multi-locus integration pattern) and Agrobacterium-mediated transformation (a single-copy, single-locus integration pattern).
Insertion effects are defined as pleiotropic effects of integrated DNA on the host plant genome (8). The insertion effect can be mutagenic in nature, resulting in null, loss-of-function, gain-of-function, and other possible phenotypes depending on the specific region of transgene insertion into the plant DNA. Mutagenic gene disruption is not the only mechanism by which transgene insertion may affect the phenotype of a transgenic plant. Coding sequences surrounding insertion sites may fall under the influence of transgene regulatory elements (e.g., promoters), thus leading to sense and antisense transcripts, depending on the orientation of the affected host gene in relation to the transgene promoter. The sense transcripts may serve as mRNAs for recombinant (fusion) proteins, whereas the antisense transcripts may interfere with host gene expression via RNA interference (RNAi).
Position effects represent the influence of the integration site and transgene architecture on transgene expression level and stability. Most of the variations in transgene expression are attributed to differences in transgene copy number and integration site. These differences may lead to changes in locus configuration and/or the induction of silencing mechanisms (5). Silencing resulting from interactions among multiple copies of transgenes and related endogenous genes involves homology-based mechanisms that function at both the transcriptional and post-transcriptional levels (RNA interference) (9, 10). Surrounding sequences can also influence transgene expression in several ways (8). First, transgene expression may be enhanced by strong promoter and/or enhancer elements if the transgene integrates in their vicinity. Second, specific S/MARs (scaffold/matrix attachment regions) may have a beneficial influence on transgene expression. Third, the cytosine methylation status of the insertion site may influence transgene expression, leading to transcriptional silencing of the transgene.
Somaclonal variation is defined as the effect of various stresses related to tissue handling, regeneration, and clonal propagation.
Most of the transformation protocols apply an in vitro selection and regeneration step, and these procedures always induce somaclonal variation, which leads to changes in numerous plant characteristics, most often as an unintended effect. Somaclonal variations can be manifested as either somatically or meiotically stable events. They are exhibited as cytological abnormalities, frequent qualitative and quantitative phenotypic mutations, sequence changes, and gene activation or silencing (e.g., of transposable elements and retrotransposons) (7). The main factors contributing to somaclonal variation are generally classified as stress factors. Stress response can be potentially mutagenic, and it leads to: (1) genetic changes such as polyploidy, aneuploidy, chromosome rearrangements, somatic recombination, gene amplifications, point mutations, excisions and insertions of (retro)transposons and (2) epigenetic changes including DNA methylation and histone modifications (8, 11).
Agrobacterium-mediated plant transformation has found many applications in plant biotechnology for the creation of transgenic plants with improved characteristics such as insect and herbicide resistance, phytoremediation abilities, improved nutrient content, production of biopharmaceuticals, and many others. This method is also frequently used for research purposes through T-DNA mutagenesis and creation of gene over-expressing (OE) or knock-out (KO) lines to study gene function. The same strategy and Agrobacterium vectors are used for the biotechnological applications and research purposes mentioned above. Although the effect of T-DNA mutagenesis is the same as that of creating a specific knock-out plant (gene inactivation), the mechanism of gene inactivation is different. T-DNA mutagenesis is a type of insertional mutagenesis in which T-DNA integration into a gene sequence (T-DNA tagging) usually causes its inactivation. In the case of KO lines, T-DNA carries an anti-sense sequence of the gene of interest or another sequence that will anneal to target mRNA molecules to prevent gene expression (RNA interference), including secondary structure-forming transcripts (so-called hairpin vectors). Overexpression vectors contain an expression cassette composed of an appropriate promoter (e.g., 35S promoter from CaMV), a gene of interest, and a transcription termination signal (e.g., nos terminator). Introduction of such a cassette into the plant genome results in overproduction of gene products that may also exhibit a phenotypic effect, shedding more light on gene function. Thus, modified A. tumefaciens strains and T-DNA vector systems have been generated to perform multiple functions. However, there is still a need to improve the Agrobacterium-mediated plant transformation method in order to enrich the arsenal of techniques for plant genetic engineering.
2. Materials
2.1. Materials for Transformation of Tobacco
2.1.1. Sterilization of Nicotiana tabacum Seeds
1. Tobacco seeds: N. tabacum cv. SR1 (see Note 1).
2. 10% (v/v) bleach solution (1.2–1.4% sodium hypochlorite).
3. Sterile distilled water.
2.1.2. In Vitro Cultivation of N. tabacum Plants
1. Petri dishes containing standard ½ MS solid medium (see Note 2) or water-soaked filter papers (see Note 3).
2.1.3. Preparation of Agrobacterium Suspension
1. A. tumefaciens strain, for example GV3101[pPM6000; pTd33], grown on solid YEB medium containing rifampicin (100 mg/L) and gentamycin (20 mg/L).
2. Liquid YEB medium (see Note 4) containing 100 mg/L rifampicin (Sigma) and 20 mg/L gentamycin (Bioshop).
3. 10 mM MgSO4 solution (filter sterilized).
4. Liquid MS medium (see Note 2).
5. Optional: 100 mM acetosyringone (3′,5′-dimethoxy-4′-hydroxyacetophenone; Aldrich) in alcohol solvent (isopropanol or ethanol).
2.1.4. Transformation by Infiltration
1. 10–14-day-old tobacco seedlings or 1–2-month-old tobacco plants cultured in vitro (see Subheading 3.1.2).
2. Agrobacterium suspension (see Subheading 3.1.3).
3. MSH1 solid medium (for modified MS media composition see Note 5).
4. MSH1T or MSH1C solid medium (see Note 5).
2.1.5. β-Glucuronidase Histochemical Assay
1. β-Glucuronidase (GUS) staining solution: 100 mM Na-phosphate buffer, pH 7.0; 0.05% X-Gluc (5-bromo-4-chloro-3-indoxyl-beta-D-glucuronide, GBT) dissolved in DMF (dimethylformamide, Sigma); 0.1% (w/v) Na-azide.
2. 99.6% and 70% (v/v) ethanol.
2.1.6. Selection and Regeneration of Transformants
1. MSH1TK or MSH1CK solid medium (see Note 5).
2. MSH2 solid medium (see Note 5).
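The stock arithmetic implied by the materials above can be sketched in a few lines of Python. The example below works out how much of the 100 mM acetosyringone stock (Subheading 2.1.3) is needed for the optional 0.5 mM working concentration used in Subheading 3.1.3. The function name `stock_volume_ul` is hypothetical, and the calculation assumes a simple C1·V1 = C2·V2 dilution in which the stated volume is treated as the final volume.

```python
def stock_volume_ul(stock_mM: float, final_mM: float, final_volume_ml: float) -> float:
    """Return the stock volume (in microlitres) needed to reach final_mM
    in final_volume_ml, using C1 * V1 = C2 * V2."""
    if stock_mM <= 0 or final_mM <= 0 or final_volume_ml <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_mM > stock_mM:
        raise ValueError("final concentration cannot exceed the stock concentration")
    volume_ml = final_mM * final_volume_ml / stock_mM
    return volume_ml * 1000.0  # convert mL to microlitres


# Example: 100 mM stock, 0.5 mM final, 10.5 mL of YEB culture (assumed final volume)
acetosyringone_ul = stock_volume_ul(stock_mM=100.0, final_mM=0.5, final_volume_ml=10.5)
print(f"Add about {acetosyringone_ul:.1f} uL of 100 mM acetosyringone")
```

The same helper applies to any additive for which both stock and final concentrations are expressed in the same unit; antibiotic stock concentrations are not given in this chapter and would have to be supplied by the user.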
2.2. Materials for Dipping Arabidopsis Flowers
2.2.1. Cultivation of Arabidopsis thaliana Plants in Soil
1. Arabidopsis seeds: A. thaliana ecotype Columbia (see Note 6).
2. Standard horticultural soil.
3. Artificial carrier for plant soil culture, e.g., Vermiculite.
4. Dry sand.
2.2.2. Preparation of Agrobacterium Suspension
1. The same materials are required as described in Subheading 2.1.3.
2.2.3. Transformation by Flower Dipping/Infiltration
1. Flowering Arabidopsis plants (see Subheading 3.2.1).
2. Agrobacterium suspension (see Subheading 3.2.2).
2.2.4. Selection of Putative Transformants Using a Kanamycin Resistance Marker
1. 70% (v/v) ethanol.
2. 10% (v/v) bleach solution (1.2–1.4% sodium hypochlorite) containing detergent, e.g., 0.05% (v/v) Tween 20.
3. MS (or ½ MS) medium containing kanamycin (50 mg/L).
3. Methods
The Agrobacterium-mediated plant transformation method employs a variety of different techniques for infecting plants to permit T-DNA transfer. In this chapter, we present the most commonly used techniques: vacuum infiltration of explants and floral dipping. Of the two, only vacuum infiltration requires plant regeneration steps (Fig. 2). In both cases, however, the same A. tumefaciens strain containing a binary vector can be used.
In general, two types of Agrobacterium vectors are commonly used: co-integrate vectors and binary vectors. In both types of vectors, Agrobacterium plasmids are disarmed by deleting naturally occurring T-DNA-encoded “oncogenes” and replacing them with a gene of interest. In the co-integrate vector system, a donor vector containing the gene of interest is integrated with the disarmed Ti plasmid. The binary vector system consists of two autonomously replicating plasmids within Agrobacterium: a binary vector that contains gene(s) of interest between T-DNA borders and a helper Ti plasmid that provides vir gene products required for T-DNA transfer to plant cells. Binary vectors were developed more recently and are now the most commonly used because of the ease of DNA manipulation both in vivo and in vitro, and because of their higher transformation efficiencies.
The Agrobacterium strain used in the protocols described in this chapter is called GV3101[pPM6000; pTd33] and is a binary vector system (Fig. 3). GV3101 is a non-virulent Agrobacterium strain, a derivative of the wild type C58 strain but lacking the Ti plasmid (C58 is a virulent strain containing the nopaline Ti plasmid) (12). Both strains are characterized by their resistance to the antibiotic rifampicin. pPM6000 is a derivative of the octopine plasmid pTiAch5 lacking the T-DNA region (12), and it is used as the helper plasmid, whereas pTd33 is the binary vector (13). This binary vector contains several important sequences: (1) T-DNA with genes encoding β-glucuronidase (uidA = GUS gene) and kanamycin resistance (nptII = KanR gene) under the control of plant regulatory elements (CaMV 35S promoter and nos terminator), (2) two origins of replication allowing plasmid amplification in both Escherichia coli and Agrobacterium cells (ori pBR322 and ori Ri), and (3) a gentamicin resistance gene (GmR) for bacteria selection.
Fig. 2. A general scheme of Agrobacterium-mediated transformation of explants using vacuum infiltration technique
Fig. 3. A binary system of the Agrobacterium GV3101[pPM6000; pTd33] strain (see text for details)
3.1. Agrobacterium-Mediated Transformation of Plant Explants (e.g., Tobacco Seedlings or Leaf Discs)
3.1.1. Sterilization of N. tabacum Seeds
1. Open seed capsules of N. tabacum plants, and place the seeds in a sterile 50 mL Falcon tube.
2. Fill the tube with bleach solution, and mix for 10 min.
3. Collect the seeds by pouring the suspension through a sterile 50 μm sieve.
4. Rinse the seeds 3–5 times with sterile distilled water.
5. Place the sieve with sterilized seeds in a plastic Petri dish, and leave in a laminar flow cabinet until the seeds are dry (overnight).
6. Close the Petri dish with Parafilm, and store the seeds at room temperature until use.
3.1.2. In Vitro Cultivation of N. tabacum Plants
1. Sow ~100 sterilized seeds uniformly onto the surface of ½ MS solid medium or moistened filter paper (see Subheading 2.1.2 and Note 3) in a 9 cm Petri dish by using a sterile scalpel or spatula.
2. Close the plate with Parafilm, and place it at 4°C for 3–7 days for seed stratification (see Note 7).
3. Transfer the plate to a plant in vitro culture cabinet under the following culture conditions: photoperiod – 16 h light/8 h dark (or 24 h light), temperature – 22–24°C, humidity – 60%.
4. Cultivate the culture for 10–14 days to obtain seedlings.
5. Optional: transfer 2-week-old seedlings into glass jars containing solid MS medium, and cultivate the plant cultures in vitro under the same conditions for an additional 1–2 months to obtain young plants with fully developed leaves.
3.1.3. Preparation of Agrobacterium Suspension
1. Inoculate 10.5 mL of liquid YEB medium containing 100 mg/L rifampicin, 20 mg/L gentamycin, and 0.5 mM acetosyringone (optional, see Note 8) with a single colony of the A. tumefaciens strain GV3101[pPM6000; pTd33].
2. Incubate the culture on a rotary shaker at a speed of 120 rpm at 28°C for 24–48 h.
3. Centrifuge the Agrobacterium culture at 5,500×g for 15 min at 4°C.
4. Discard the medium and resuspend the bacterial pellet in 35 mL of sterile 10 mM MgSO4.
5. Repeat steps 3 and 4.
6. Measure the suspension density (see Note 9).
7. Centrifuge the Agrobacterium suspension at 5,500×g for 15 min at 4°C.
8. Discard the supernatant and resuspend the bacterial pellet in liquid MS medium to bring the OD600 to 0.1 (see Note 10). Optional: add sterile acetosyringone (the final concentration: 0.5 mM).
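Steps 6–8 can be illustrated with a short Python sketch. It back-calculates the density of the undiluted suspension from a diluted reading, checks the reading against the 0.2–0.5 range given in Note 9, and estimates the volume of MS medium in which a pellet would have to be resuspended to reach OD600 = 0.1. All function names are hypothetical. The sketch assumes that OD600 is proportional to cell density over the measured range and that a dilution of "1:4" means a fourfold dilution; neither assumption is stated in the protocol.

```python
def undiluted_od600(reading: float, fold_dilution: float) -> float:
    """Estimate the OD600 of the undiluted suspension from a diluted reading."""
    if fold_dilution < 1:
        raise ValueError("fold_dilution must be at least 1")
    return reading * fold_dilution


def reading_is_acceptable(reading: float, low: float = 0.2, high: float = 0.5) -> bool:
    """Check the diluted reading against the range recommended in Note 9."""
    return low <= reading <= high


def resuspension_volume_ml(current_od: float, pelleted_volume_ml: float,
                           target_od: float = 0.1) -> float:
    """Volume in which to resuspend the pellet so that the suspension has
    target_od, assuming the number of cells is conserved (OD * volume constant)."""
    if current_od <= 0 or target_od <= 0 or pelleted_volume_ml <= 0:
        raise ValueError("all arguments must be positive")
    return current_od * pelleted_volume_ml / target_od


# Example: a fourfold-diluted sample reads 0.3; 5 mL of the suspension is pelleted.
reading = 0.3
if reading_is_acceptable(reading):
    od = undiluted_od600(reading, fold_dilution=4)        # about 1.2
    volume = resuspension_volume_ml(od, pelleted_volume_ml=5.0)
    print(f"Resuspend the pellet in about {volume:.0f} mL of liquid MS medium")
```

Because the required volume scales with both the measured density and the pelleted volume, pelleting only part of the washed suspension keeps the final volume manageable.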
3.1.4. Transformation by Infiltration
1. If you use young tobacco plants (1–2 months old), cut out leaf discs (~1 cm in diameter) under sterile conditions using a sterile cork-cutter.
2. Place tobacco seedlings (optimum: 10–14-day-old) or leaf discs in tubes containing the Agrobacterium suspension or sterile MS medium (control).
3. Infiltrate the plant material with the Agrobacterium suspension for 5 min (minimum pressure 130 mbar) in a sterile desiccator connected to a vacuum pump (see Note 11). Repeat the infiltration once more.
4. Take out the infiltrated seedlings/leaf discs and blot them briefly (for approximately 1 min) on sterile filter paper to remove excess Agrobacterium cells.
5. Transfer the infiltrated seedlings onto the surface of MSH1 medium and tightly seal the Petri dishes with Parafilm.
6. Co-cultivate the plants with Agrobacterium for 3 days in a plant in vitro growth chamber to allow Agrobacterium to infect explants and transfer the T-DNA carrying transgene(s) to the plant tissue.
7. After 3 days of co-cultivation, transfer the seedlings to MSH1T or MSH1C solid medium containing a bacteriostatic agent (see Note 12), and culture again for a week under the same conditions.
8. Perform a test of transient expression of the GUS reporter gene, and/or continue with selection and regeneration.
3.1.5. β-Glucuronidase Histochemical Assay
1. Transient or stable expression of GUS can be tested using this assay (14) (see Note 13).
2. Place the plant material in 15 mL Falcon tubes, and soak it in the GUS staining solution.
3. Infiltrate the plant material with the GUS staining solution for 10 min using a desiccator connected to a vacuum pump (minimum pressure 130 mbar).
4. Remove the tubes from the desiccator, and incubate them for 2–3 days at 37°C; leave lids open to permit oxidation.
5. Pour off the staining solution, and wash the plant material with distilled water.
6. Wash out chlorophyll from plant tissues by incubating them for several hours in 99.6% ethanol; this can be accelerated by gentle shaking and warming. Exchange ethanol until plants are white.
7. Replace 99.6% ethanol with 70% ethanol solution (this makes the plant material less brittle). These samples can be kept indefinitely.
8. Place the plant material in a plastic Petri dish, and observe GUS staining with a binocular microscope. The presence of blue spots indicates GUS expression (Fig. 4).
3.1.6. Selection and Regeneration of Transformants
1. After a week (see Subheading 3.1.4, step 7), transfer the seedlings/leaf discs onto the surface of the selection medium containing kanamycin (MSH1TK or MSH1CK). Culture for 3 weeks under the same conditions.
2. Transfer surviving tissue sectors to fresh MSH1TK (or MSH1CK) medium for the next cycle of selection (see Note 14). Perform 3–4 cycles of selection. Green shoots will start to emerge from the tissue (see Note 15).
3. Transfer regenerating shoots to MSH2 solid medium for root induction. If needed, add a bacteriostatic antibiotic to MSH2 medium (timentin, carbenicillin or cefotaxime).
4. The selected plants (survivors) are putative transgenic plants. For further procedures with putative transformants, see Notes 16 and 17.
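The durations given in Subheadings 3.1.4 and 3.1.6 (3 days of co-cultivation, 1 week on bacteriostatic medium, then 3-week selection cycles) can be laid out as a calendar. The following Python sketch is only a planning aid: the function name `tobacco_selection_schedule` is hypothetical, and it assumes that the "3–4 cycles of selection" count includes the first transfer to kanamycin medium.

```python
from datetime import date, timedelta


def tobacco_selection_schedule(infiltration_day: date, selection_cycles: int = 3):
    """Return a list of (date, task) tuples for the tobacco explant protocol."""
    if selection_cycles not in (3, 4):
        raise ValueError("the protocol recommends 3-4 cycles of selection")
    events = [(infiltration_day, "vacuum infiltration; transfer explants to MSH1")]
    day = infiltration_day + timedelta(days=3)      # 3 days of co-cultivation
    events.append((day, "transfer to MSH1T or MSH1C (bacteriostatic agent)"))
    day += timedelta(weeks=1)                       # 1 week without selection
    for cycle in range(1, selection_cycles + 1):
        events.append((day, f"selection cycle {cycle}: transfer to MSH1TK or MSH1CK"))
        day += timedelta(weeks=3)                   # each cycle lasts 3 weeks
    events.append((day, "end of last selection cycle"))
    return events


for when, task in tobacco_selection_schedule(date(2024, 1, 1), selection_cycles=3):
    print(when.isoformat(), task)
```

Regenerating shoots are moved to MSH2 as they appear, so the final date marks only the end of the last selection cycle and not a fixed rooting date.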
Fig. 4. Transformed tobacco seedlings after the histochemical β-glucuronidase (GUS) assay
3.2. Agrobacterium-Mediated Transformation of Female Reproductive Tissues of A. thaliana (Floral-Dip Method)
3.2.1. Cultivation of A. thaliana Plants in Soil
1. Mix standard horticultural soil with an artificial carrier (volume ratio 3:1), sterilize the mixture by autoclaving, and place it in small plastic flower-pots.
2. Put the pots on saucers (supports) filled with water, and leave them for 4–24 h.
3. Mix A. thaliana seeds with a small amount of dry sand, and sow them in the soil.
4. Cover the pots with foil, and place them at 4°C for 3–7 days for seed stratification.
5. Remove the foil, and transfer the pots to a culture cabinet. Culture conditions: photoperiod – 16 h light/8 h dark, temperature – 22–24°C, humidity – 60%.
6. Culture the seeds/plants for 2–3 weeks until the plants reach the stage of a rosette with 2–4 leaves. Transfer the plants to individual pots (see Note 18).
7. Continue cultivation for the next 1–2 weeks to reach the flowering stage.
3.2.2. Preparation of Agrobacterium Suspension
1. Prepare the Agrobacterium suspension according to the protocol described in Subheading 3.1.3. The Agrobacterium suspension used for the floral dip method may have a higher density (OD600 = 0.6–1.0).
2. Transfer the Agrobacterium suspension to a sterile glass beaker.
3.2.3. Transformation by Flower Dipping/Infiltration
1. Turn the pot with flowering Arabidopsis plants upside down, and place flower shoots into the Agrobacterium suspension (see Note 19) for up to 30 min (dipping). Alternatively, dipping may be combined with vacuum infiltration (see Note 20).
2. Remove the flower-pot containing the plant from the beaker, and place it on a support.
3. Cover the flower-pot with a transparent plastic lid (see Note 21) to maintain humidity. Leave plants in a low light or dark location overnight. Keep the domed plant out of direct light.
4. Remove the lid 12–24 h after treatment (infiltration), and transfer the inoculated plant to a culture cabinet. Culture conditions: photoperiod – 16 h light/8 h dark or 24 h light, temperature – 22–24°C, humidity – 60%.
5. Optional: repeat dipping (or infiltration) of Arabidopsis flower shoots in the Agrobacterium suspension 2–3 times at 5–6 day intervals (see Note 22).
6. Grow plants for a further 3–5 weeks until siliques are brown and dry. Stop watering the plants at this stage.
7. Harvest seeds by gently pulling groups of inflorescences through your fingers over a piece of clean paper. Remove the majority of the stem and pod material.
8. Store seeds in Eppendorf tubes either at room temperature or at 4°C under desiccation.
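The timing windows in the steps above (lid removal 12–24 h after each dip, optional repeat dips at 5–6 day intervals, seed harvest 3–5 weeks later) lend themselves to a simple schedule. The Python sketch below is a planning aid only. The function name `floral_dip_schedule` is hypothetical, and it assumes that the 3–5 week growth period is counted from the last dip, which the protocol does not state explicitly.

```python
from datetime import datetime, timedelta


def floral_dip_schedule(first_dip: datetime, repeats: int = 2, interval_days: int = 5):
    """Return a list of (datetime, task) tuples for the floral-dip protocol."""
    if not 0 <= repeats <= 3:
        raise ValueError("the protocol suggests at most 3 repeat dips")
    if interval_days not in (5, 6):
        raise ValueError("repeat dips are spaced 5-6 days apart")
    events = []
    last_dip = first_dip
    for n in range(repeats + 1):
        last_dip = first_dip + timedelta(days=n * interval_days)
        events.append((last_dip, f"dip {n + 1}"))
        events.append((last_dip + timedelta(hours=12), "earliest lid removal"))
        events.append((last_dip + timedelta(hours=24), "latest lid removal"))
    # Assumption: the 3-5 week window is counted from the last dip.
    events.append((last_dip + timedelta(weeks=3), "earliest expected seed harvest"))
    events.append((last_dip + timedelta(weeks=5), "latest expected seed harvest"))
    return events


for when, task in floral_dip_schedule(datetime(2024, 3, 1, 9, 0)):
    print(when.strftime("%Y-%m-%d %H:%M"), task)
```

The actual harvest date should still be judged by the state of the siliques (brown and dry) and not by the calendar alone.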
3.2.4. Selection of Putative Transformants Using a Kanamycin Resistance Marker
1. Sterilization of Arabidopsis seeds:
– Place a desired amount of seeds in an Eppendorf tube, and soak them for 1–2 s in 1 mL of 70% ethanol
– Under sterile conditions, remove ethanol, soak seeds in 1 mL of 10% bleach solution containing a detergent (e.g., 0.05% (v/v) Tween 20), and incubate for 5–10 min
– Remove bleach solution
– Rinse seeds 5 times with 1 mL of sterile distilled water (leave them in water after the last wash)
2. Transfer the suspension of sterilized seeds onto the surface of solid ½ MS medium containing kanamycin (50 mg/L). Leave the plates open in a laminar flow cabinet for 1 h or until excess water has evaporated.
3. Close the plates, and seal them with Parafilm. Place at 4°C for 3–7 days for seed stratification.
4. Place the plates in a plant in vitro culture cabinet/growth chamber. Culture conditions: photoperiod – 16 h light/8 h dark or 24 h light, temperature – 22–24°C, humidity – 60%.
5. Grow the plants for 1–2 weeks. Transformants should be identified as kanamycin-resistant seedlings that produce green leaves and a well-established root system within the selective medium (see Note 23).
6. Grow some putative transformants to maturity by transplanting them into heavily moistened potting soil (preferably after the development of 3–5 adult leaves). For further procedures with putative transformants, see Notes 16 and 17.
4. Notes
1. All tobacco species, including N. tabacum, Nicotiana benthamiana, and others, as well as all cultivars within each species (e.g., SR1 and Samsun of N. tabacum), can be used.
2. Standard MS medium (15) is prepared using pre-mixed macro- and micro-elements (MS basal salts; 4.33 g/L; Sigma), 2–3% (w/v) sucrose and MS vitamins (1,000× stock, Sigma). The pH should be adjusted to 5.7–5.8 using 1 N KOH. ½ MS medium contains half the amount of MS basal salts and unchanged amounts of other compounds. Solid medium additionally contains 0.8% (w/v) agar (Sigma).
3. 2–3 layers of filter paper (e.g., Whatman paper, Sigma-Aldrich) may be used instead of ½ MS solid medium.
4. YEB medium contains 0.1% (w/v) yeast extract (Difco), 0.5% (w/v) peptone (Difco), 0.5% (w/v) beef extract (Bioshop), and 0.5% (w/v) sucrose. The pH should be adjusted to 7.2 using 1 N NaOH. After autoclaving, MgSO4 (1 M filter-sterilized solution) should be added to achieve the final concentration of 2 mM.
5. Modified MS media are prepared based on the standard MS medium (see Note 2) supplemented with plant hormones and/or antibiotics. MSH1 solid medium is MS solid medium supplemented with the following phytohormones: BAP (6-benzylaminopurine, Sigma; the final concentration: 1 mg/L) and NAA (1-naphthaleneacetic acid, Sigma; the final concentration: 0.1 mg/L). MSH1T solid medium is MSH1 solid medium containing 100 mg/L timentin (ticarcillin disodium + potassium clavulanate 15:1, Duchefa Biochemie), whereas MSH1C solid medium is MSH1 solid medium containing 300 mg/L carbenicillin (Bioshop) or 300–500 mg/L cefotaxime (Sigma). MSH1TK (or MSH1CK) solid medium is MSH1T (or MSH1C) solid medium containing 50–100 mg/L kanamycin (Sigma). MSH2 solid medium is MS solid medium containing 0.5–1.0 mg/L IBA (indole-3-butyric acid, Sigma).
6. Different Arabidopsis ecotypes, including Columbia (Col-0), C24, Wassilewskija (WS-0), Landsberg (La-0, La-1), and Landsberg erecta (Ler-0), may be used. Seeds are available from The European Arabidopsis Stock Centre (NASC; http://arabidopsis.info) or the Arabidopsis Biological Resource Center (ABRC; http://www.biosci.ohio-state.edu/~plantbio/Facilities/abrc/abrchome.htm).
7. Some seeds need a period of moisture and cold after harvest before they will germinate; usually this is necessary either to allow the embryo to mature or to break dormancy. This period can be artificially simulated by placing the moistened seed (e.g., on moist medium) in a refrigerator for a certain period of time. Seed stratification is also used for those seeds that do not require stratification, as this treatment allows synchronization of germination.
8. Liquid YEB medium and/or MS medium used to prepare the Agrobacterium suspension may be supplemented with acetosyringone (the final concentration: 0.5 mM), especially when intact plant tissues are used for transformation. In the case of explant transformation (leaf discs, cotyledons), the addition of acetosyringone is not required, since wounding causes the production of plant compounds (including acetosyringone) that induce vir gene expression in Agrobacterium. However, if transformation efficiency is low, adding acetosyringone to the bacterial culture, the bacterial suspension, or both is also recommended for explant transformation.
9. The OD600 of a diluted bacterial suspension should be measured. Dilute the bacterial suspension in 10 mM MgSO4 at the ratio of 1:2–1:5. OD readings should be in the range of 0.2–0.5. Higher values indicate that the bacterial culture was overgrown and thus contains too many old bacterial cells, which are not likely to be competent for DNA transfer. Lower values indicate that the bacteria did not grow properly and will not be able to transfer DNA either.
10. An Agrobacterium suspension with a higher density may be used (OD600 up to 1.0). However, the higher density increases the probability of multi-copy and/or multi-locus T-DNA integration patterns.
11. To release the vacuum after infiltration, place the desiccator in a laminar-flow hood and only then open the desiccator’s valve. This will prevent contamination of the infiltrated plant material.
12. Any bacteriostatic agent that kills Agrobacterium (or at least inhibits its growth) can be used, including timentin, carbenicillin, and cefotaxime.
13. Transient expression of GUS can be monitored 0–3 days after transfer of seedlings/leaf discs to the MSH1 medium. For analysis of stable GUS expression, plant material from the T1 or later generations of transgenic plants selected for kanamycin resistance may be used.
14. The bacteriostatic agent may be omitted in any later round of selection if no growth of Agrobacterium is observed on the surface of the plant tissue (or medium).
15. The first shoots will appear during the first and second rounds of selection. Every 3 weeks, transfer to new selection medium both the explants with emerging shoots and those that have no shoots but are not yet dead (completely brown). For the former, this helps to eliminate false-positive lines; for the latter, it allows the recovery of late-appearing transgenic lines.
16. Putative transformants should be further tested for stable integration of transgenes by PCR or Southern hybridization (16, 17).
17. To obtain a homozygous transgenic line, self-cross the primary transgenic plants. The purity of the homozygous line can be confirmed by back-crossing to a wild-type nontransgenic line (18).
18. Cover the soil in the individual pots with a net to prevent loss of soil while dipping Arabidopsis flowers in the Agrobacterium suspension. As an alternative to a net, you may cover the Arabidopsis leaf rosettes with gauze (fixed to the pot with a rubber ring) when the first flower shoot appears.
19. Remove any siliques that may have already developed. Infiltration of immature (not fully developed) flowers is recommended. For efficient transformation, Agrobacterium must be delivered to the interior of the developing gynoecium before locule closure, because ovules were shown to be the site of productive transformation in the floral-dip method (19).
20. Place the beaker with flower shoots dipped in the Agrobacterium suspension in a desiccator connected to a vacuum pump and infiltrate the plant material for 5 min (minimal pressure 130 mbar). Optionally, repeat the infiltration once.
21. Instead of using a commercial flower-pot lid, you may form a dome from a plastic bag or Saran wrap.
22. Repeating the dipping (or infiltration) is strongly recommended: it increases the number of transgenic seeds obtained by the floral-dip method, mainly because more flowers are treated. When repeating the dipping/infiltration, do not remove
siliques that developed after the previous dipping/infiltration, since they most likely contain developing transgenic seeds.
23. If the number of false-positive plants is high, increase the stringency of selection by raising the concentration of the selective agent (for kanamycin: 50–250 mg/L).
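As a rough illustration of how the selfing in Note 17 and the copy-number caveat in Note 10 might be checked, the Python sketch below compares counts of kanamycin-resistant and kanamycin-sensitive seedlings from a selfed primary transformant with Mendelian expectations. It rests on assumptions that are not part of the protocol: the resistance marker is dominant, a single insertion locus segregates 3:1 (resistant:sensitive) after selfing, two unlinked loci segregate 15:1, and a chi-square goodness-of-fit test at the 5% level (critical value 3.841 for one degree of freedom) is an acceptable criterion. The function names and the seedling counts are hypothetical and are not experimental data.

```python
# Minimal sketch (assumptions: dominant marker, Mendelian segregation,
# chi-square goodness-of-fit test with 1 degree of freedom at the 5% level).

CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_stat(resistant, sensitive, ratio):
    """Chi-square statistic for observed counts against an expected
    resistant:sensitive ratio, e.g. (3, 1) or (15, 1)."""
    total = resistant + sensitive
    r, s = ratio
    expected_resistant = total * r / (r + s)
    expected_sensitive = total * s / (r + s)
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (sensitive - expected_sensitive) ** 2 / expected_sensitive)


def consistent_with(resistant, sensitive, ratio):
    """True if the counts do not differ significantly from the ratio."""
    return chi_square_stat(resistant, sensitive, ratio) < CRITICAL_05_DF1


if __name__ == "__main__":
    # Hypothetical progeny of one selfed primary transformant.
    resistant, sensitive = 150, 50
    for label, ratio in (("one locus (3:1)", (3, 1)),
                         ("two unlinked loci (15:1)", (15, 1))):
        stat = chi_square_stat(resistant, sensitive, ratio)
        verdict = "consistent" if consistent_with(resistant, sensitive, ratio) else "rejected"
        print(f"{label}: chi-square = {stat:.2f} -> {verdict}")
```

Under the same assumptions, progeny of a homozygous single-locus plant would be expected to be uniformly resistant in the next selfed generation, so a family with any sensitive seedlings would not be homozygous. Such a check complements, but does not replace, the back-cross and molecular tests described in Notes 16 and 17.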
Acknowledgments
The author would like to thank Prof. Barbara Hohn and her co-workers for establishing the protocol for Agrobacterium-mediated transformation of tobacco seedlings, Dr. Igor Kovalchuk for his advice and encouragement, and Dr. Valentina Titova for language revision.

References
1. Birch RG (1997) Plant transformation: problems and strategies for practical application. Annu Rev Plant Physiol Plant Mol Biol 48:297–326
2. Chawla HS (2002) Gene transfer in plants. In: Introduction to plant biotechnology, 2nd edn. Science Publishers Inc., Enfield, NH, USA, pp 359–395
3. Slater A, Scott NW, Fowler MR (2008) Techniques for plant transformation. In: Plant biotechnology: the genetic manipulation of plants, 2nd edn. Oxford University Press, New York, USA, pp 54–74
4. Tzfira T, Citovsky V (2008) Agrobacterium: from biology to biotechnology. Springer, NY
5. Matzke AJM, Matzke MA (1998) Position effects and epigenetic silencing of plant transgenes. Curr Opin Plant Biol 1:142–148
6. Matzke MA, Matzke AJM (1998) Epigenetic silencing of plant transgenes as a consequence of diverse cellular defense responses. Cell Mol Life Sci 54:94–103
7. Kaeppler SM, Kaeppler H, Rhee Y (2000) Epigenetic aspects of somaclonal variation in plants. Plant Mol Biol 43:179–188
8. Filipecki M, Malepszy S (2006) Unintended consequences of plant transformation: a molecular insight. J Appl Genet 47:277–286
9. Henderson IR, Jacobsen SE (2007) Epigenetic inheritance in plants. Nature 447:418–424
10. Gendrel A-V, Colot V (2005) Arabidopsis epigenetics: when RNA meets chromatin. Curr Opin Plant Biol 8:142–147
11. Madlung A, Comai L (2004) The effect of stress on genome regulation and structure. Ann Bot 94:481–495
12. Bonnard G, Tinland B, Paulus F, Szegedi E, Otten L (1989) Nucleotide sequence, evolutionary origin and biological role of a rearranged cytokinin gene isolated from a wide host range biotype III Agrobacterium strain. Mol Gen Genet 216:428–438
13. Tinland B, Schoumacher F, Gloeckler V, Bravo-Angel AM, Hohn B (1995) The Agrobacterium tumefaciens virulence D2 protein is responsible for precise integration of T-DNA into the plant genome. EMBO J 14:3585–3595
14. Rossi L, Escudero J, Hohn B, Tinland B (1993) Efficient and sensitive assay for T-DNA-dependent transient gene expression. Plant Mol Biol Rep 11:220–229
15. Murashige T, Skoog F (1962) A revised medium for rapid growth and bioassays with tobacco tissue cultures. Physiol Plant 15:473
16. Chawla HS (2003) In vitro amplification of DNA by PCR: detection of transgenes. In: Plant biotechnology: a practical approach. Science Publishers Inc., Enfield, NH, USA, pp 215–222
17. Chawla HS (2003) Southern blotting, Southern hybridization. In: Plant biotechnology: a practical approach. Science Publishers Inc., Enfield, NH, USA, pp 237–252
18. Koornneef M, Alonso-Blanco C, Stam P (2006) Genetic analysis. In: Salinas J, Sanchez-Serrano JJ (eds) Arabidopsis protocols, 2nd edn. Humana Press, Totowa, NJ, pp 65–77
19. Desfeux C, Clough SJ, Bent AF (2000) Female reproductive tissues are primary target of Agrobacterium-mediated transformation by the Arabidopsis floral-dip method. Plant Physiol 123:895–904
Igor Kovalchuk and Franz Zemp (eds.), Plant Epigenetics: Methods and Protocols, Methods in Molecular Biology, vol. 631, DOI 10.1007/978-1-60761-646-7, © Springer Science+Business Media, LLC 2010