PROGRESS
IN
Nucleic Acid Research and Molecular Biology Volume 47
PROGRESS IN
Nucleic Acid Research and Molecular Biology

edited by

WALDO E. COHN
Biology Division
Oak Ridge National Laboratory
Oak Ridge, Tennessee

KIVIE MOLDAVE
Department of Biology
University of California
Santa Cruz, California

Volume 47

ACADEMIC PRESS, INC.
Harcourt Brace Jovanovich, Publishers
San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper.

Copyright © 1991 BY ACADEMIC PRESS, INC.
All Rights Reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press, Inc.
San Diego, California 92101

United Kingdom Edition published by
ACADEMIC PRESS LIMITED
24-28 Oval Road, London NW1 7DX

Library of Congress Catalog Card Number: 63-15847

ISBN 0-12-540041-1 (alk. paper)

PRINTED IN THE UNITED STATES OF AMERICA
91 92 93 94 9 8 7 6 5 4 3 2 1
Contents

ABBREVIATIONS AND SYMBOLS
SOME ARTICLES PLANNED FOR FUTURE VOLUMES

Molecular Structure and Transcriptional Regulation of the Salivary Gland Proline-Rich Protein Multigene Families
Don M. Carlson, Jie Zhou and Paul S. Wright
I. Background
II. PRP mRNAs and Cell-free Translation Analysis
III. PRP cDNAs and Amino-acid Sequences
IV. Sequence and Structural Analyses of PRP Genes
V. Regulation of Expression of PRP Genes
VI. Functional Aspects of PRPs

Recognition of tRNAs by Aminoacyl-tRNA Synthetases
LaDonne H. Schulman
I. Recognition versus Identity
II. Assays of the Amino-acid-acceptor Specificity of ...
III. Role of the Anticodon
IV. Role of the Acceptor Stem and the "Discriminat... at Position 73
V. Other Recognition Profiles
VI. Role of Modified Bases
VII. The Complex of E. coli Glutamine tRNA and Glut...
VIII. tRNA Binding Domains of Other Synthetases
IX. Concluding Remarks
References

Ribosome Biogenesis in Yeast
H. A. Raué and R. J. Planta
I. Transcription of Ribosomal-RNA Genes
II. Expression of Ribosomal-protein Genes
III. Processing and Assembly of Ribosomal Constituents
References

Structural Elements in RNA
Michael Chastain and Ignacio Tinoco, Jr.
I. Secondary Structure
II. Predicting Secondary Structure
III. Tertiary Interactions
IV. Predicting Tertiary Interactions
V. Three-dimensional Structure
VI. Determining RNA Structure
VII. Protein-RNA Interactions
VIII. RNA-RNA Interactions
IX. RNA-DNA Interactions
References

Nuclear RNA-binding Proteins
Jack D. Keene and Charles C. Query
...
RRM Family of Proteins
...

Amplification of DNA Sequences in Mammalian Cells
Joyce L. Hamlin, Tzeng-Horng Leu, James P. Vaughn, Chi Ma and Pieter A. Dijkwel
I. Historical Development of the Amplification Field
II. Occurrence of Amplified DNA Sequences
III. Properties of Amplified DNA
IV. Possible Mechanisms and Ways to Discriminate among Them
V. Usefulness of Cell Lines Bearing Amplified Genes
VI. Conclusions and Future Directions
References

Molecular-Biology Approaches to Genetic Defects of the Mammalian Nervous System
J. Gregor Sutcliffe and Gabriel H. Travis
I. Neural Mutants
II. The rds Gene
III. Secretogranin III
IV. Making Mutants
V. Getting All of the Genes
VI. Reprise
References

Lens Proteins and Their Genes
Hans Bloemendal and Wilfried W. de Jong
I. The Lens and Its Proteins
II. The Lens and Its DNA
III. Concluding Remarks
References

INDEX
Abbreviations and Symbols

All contributors to this Series are asked to use the terminology (abbreviations and symbols) recommended by the IUPAC-IUB Commission on Biochemical Nomenclature (CBN) and approved by IUPAC and IUB, and the Editors endeavor to assure conformity. These Recommendations have been published in many journals (1, 2) and compendia (3) and are available in reprint form from the Office of Biochemical Nomenclature (OBN); they are therefore considered to be generally known. Those used in nucleic acid work, originally set out in section 5 of the first Recommendations (1) and subsequently revised and expanded (2, 3), are given in condensed form in the frontmatter of Volumes 9-33 of this series. A recent expansion of the one-letter system (5) follows.

SINGLE-LETTER CODE RECOMMENDATIONS^a (5)

Symbol    Meaning                  Origin of symbol
G         G                        Guanosine
A         A                        Adenosine
T(U)      T(U)                     (ribo)Thymidine (Uridine)
C         C                        Cytidine
R         G or A                   puRine
Y         T(U) or C                pYrimidine
M         A or C                   aMino
K         G or T(U)                Keto
S         G or C                   Strong interaction (3 H-bonds)
W^b       A or T(U)                Weak interaction (2 H-bonds)
H         A or C or T(U)           not G; H follows G in the alphabet
B         G or T(U) or C           not A; B follows A
V         G or C or A              not T (not U); V follows U
D^c       G or A or T(U)           not C; D follows C
N         G or A or T(U) or C      aNy nucleoside (i.e., unspecified)
Q         Q                        Queuosine (nucleoside of queuine)

a Modified from Proc. Natl. Acad. Sci. U.S.A. 83, 4 (1986).
b W has been used for wyosine, the nucleoside of "base Y" (wye).
c D has been used for dihydrouridine (hU or H2 Urd).

Enzymes

In naming enzymes, the 1984 recommendations of the IUB Commission on Biochemical Nomenclature (4) are followed as far as possible. At first mention, each enzyme is described either by its systematic name or by the equation for the reaction catalyzed or by the recommended trivial name, followed by its EC number in parentheses. Thereafter, a trivial name may be used. Enzyme names are not to be abbreviated except when the substrate has an approved abbreviation (e.g., ATPase, but not LDH, is acceptable).
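The single-letter code tabulated above lends itself to a small lookup table. The following is a minimal sketch, assuming a DNA alphabet in which T stands for T(U); the names `IUPAC`, `matches`, and `expand` are illustrative and do not come from the Recommendations. Q (queuosine) is omitted because it names a specific nucleoside rather than a set of alternatives.

```python
# Single-letter nucleotide codes from the table above (T stands for T(U)).
# Each symbol maps to the set of bases it may represent.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT",
    "N": "GATC",
}


def matches(pattern, sequence):
    """Return True if `sequence` is one of the sequences denoted by `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, sequence))


def expand(pattern):
    """List every unambiguous sequence denoted by an ambiguous pattern."""
    results = [""]
    for code in pattern:
        results = [prefix + base for prefix in results for base in IUPAC[code]]
    return results


if __name__ == "__main__":
    print(expand("RY"))                    # all purine-pyrimidine dinucleotides
    print(matches("TATAWAW", "TATAAAT"))   # W accepts either A or T
```

A dictionary keyed by symbol keeps the table readable and makes the "not G", "not A" symbols (H, B, V, D) simple three-base sets.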
REFERENCES

1. JBC 241, 527 (1966); Bchem 5, 1445 (1966); BJ 101, 1 (1966); ABB 115, 1 (1966), 129, 1 (1969); and elsewhere.† General.
2. EJB 15, 203 (1970); JBC 245, 5171 (1970); JMB 55, 299 (1971); and elsewhere.†
3. "Handbook of Biochemistry" (G. Fasman, ed.), 3rd ed. Chemical Rubber Co., Cleveland, Ohio, 1970, 1975, Nucleic Acids, Vols. I and II, pp. 3-59. Nucleic acids.
4. "Enzyme Nomenclature" [Recommendations (1984) of the Nomenclature Committee of the IUB]. Academic Press, New York, 1984.
5. EJB 150, 1 (1985). Nucleic Acids (One-letter system).†

Abbreviations of Journal Titles

Journals                                    Abbreviations used
Annu. Rev. Biochem.                         ARB
Annu. Rev. Genet.                           ARGen
Arch. Biochem. Biophys.                     ABB
Biochem. Biophys. Res. Commun.              BBRC
Biochemistry                                Bchem
Biochem. J.                                 BJ
Biochim. Biophys. Acta                      BBA
Cold Spring Harbor                          CSH
Cold Spring Harbor Lab                      CSHLab
Cold Spring Harbor Symp. Quant. Biol.       CSHSQB
Eur. J. Biochem.                            EJB
Fed. Proc.                                  FP
Hoppe-Seyler's Z. Physiol. Chem.            ZpChem
J. Amer. Chem. Soc.                         JACS
J. Bacteriol.                               J. Bact.
J. Biol. Chem.                              JBC
J. Chem. Soc.                               JCS
J. Mol. Biol.                               JMB
J. Nat. Cancer Inst.                        JNCI
Mol. Cell. Biol.                            MCBiol
Mol. Cell. Biochem.                         MCBchem
Mol. Gen. Genet.                            MGG
Nature, New Biology                         Nature NB
Nucleic Acid Research                       NARes
Proc. Natl. Acad. Sci. U.S.A.               PNAS
Proc. Soc. Exp. Biol. Med.                  PSEBM
Progr. Nucl. Acid. Res. Mol. Biol.          This Series

† Reprints available from the Office of Biochemical Nomenclature (W. E. Cohn, Director).
Some Articles Planned for Future Volumes

Phosphotransfer Reactions of Plant Virus Satellite RNAs
GEORGE BRUENING

Positive and Negative Regulation of Gene Expression by Steroid Agonists and Antagonists
ANDREW C. B. CATO, H. PONTA AND P. HERRLICH

Regulation of Gene Expression in Trypanosomes
CHRISTINE CLAYTON

Oligonucleotides as Antisense Inhibitors of Gene Expression
JACK S. COHEN AND M. GHOSH

The DNA Binding Domain of the Zn(II)-containing Transcription Factors
JOSEPH E. COLEMAN AND T. PAN

Specific Hormonal and Neoplastic Transcriptional Control of the Alpha 2u Globulin Gene Family
PHILIP FEIGELSON

Cellular Transcriptional Factors Involved in the Regulation of HIV Gene Expression
RICHARD GAYNOR AND C. MUCHARDT

Correlation between tRNA Structure and Efficient Aminoacylation
RICHARD GIEGE, C. FLORENTZ AND J. PUGLISI

snRNA Genes: Transcription by RNA Polymerase II and RNA Polymerase III
NOURIA HERNANDEZ AND S. LOBO

Regulation of mRNA Stability in Yeast
ALLAN JACOBSON

Recombination Enzymes from E. coli and S. cerevisiae
RICHARD KOLODNER

Cell Delivery and Mechanisms of Action of Antisense Oligonucleotides
BERNARD LEBLEU, J. P. LEONETTI AND G. DEGOLS

Signal-transducing G Proteins: Basic and Clinical Implications
MICHAEL A. LEVINE

Synthesis of Ribosomes
LASSE LINDAHL AND J. M. ZENGEL

Enzymes of DNA Repair
STUART LINN

RNA Replication of Plant Viruses Comprising an RNA Genome
ANNE-LISE HAENNI, R. GARGOURI-BOUZID AND C. DAVID

Nitrogen Regulation in Bacteria and Yeast
BORIS MAGASANIK

Alkylation Damage Repair Genes: Molecular Cloning and Regulation of Expression
SANKAR MITRA

An Analysis of Intron Splicing in Monocot Plants
RALPH SINIBALDI AND I. METTLER

trp Repressor, A Ligand-activated Regulatory Protein
RONALD L. SOMMERVILLE

Immunochemical Analyses of Nucleic Acids
DAVID STOLLAR

The Structure and Expressions of the Insulin-like Growth-factor Gene
LYDIA VILLA-KOMAROFF AND K. M. ROSEN
Molecular Structure and Transcriptional Regulation of the Salivary Gland Proline-Rich Protein Multigene Families

DON M. CARLSON,1 JIE ZHOU2 AND PAUL S. WRIGHT3

Department of Biochemistry and Biophysics
University of California-Davis
Davis, California 95616

I. Background
II. PRP mRNAs and Cell-free Translation Analysis
III. PRP cDNAs and Amino-acid Sequences
IV. Sequence and Structural Analyses of PRP Genes
V. Regulation of Expression of PRP Genes
VI. Functional Aspects of PRPs
The proline-rich proteins (PRPs) in mammalian salivary glands are encoded by tissue-specific multigene families whose members have diverged with respect to structure and regulation of expression. A common evolutionary origin of the PRP genes is evident from the extensive conservation of 5'-untranslated regions, coding sequences, and intron/exon organizations. The 42-nucleotide repeat unit CCA CCA CCA CCA GGA GGC CCA CAG CCG AGA CCC CCT CAA GGC has been proposed (1) as the ancestral unit, multiples of three bases probably being recruited into, or deleted from, this ancestral sequence during gene duplication. Gene conversion may have been the mechanism that homogenized the diverging internal repeats. Two nonallelic mouse PRP genes (MP2 and M14) have essentially identical sequences, with two major differences (2). MP2 has 13 tandemly arranged 42-nucleotide repeats, whereas M14 has 17 repeats. M14 has an insertion by transposition of a two-kilobase member of the long, interspersed elements of repeated mouse DNA (LINE family) into intron I. The 5'-untranslated sequences and regions encoding the signal peptides of all PRP mRNAs, regardless of source, are nearly identical. In another multigene family from rat submandibular glands that encodes contiguous repeat proteins (CRPs) or glutamic acid/glutamine-rich proteins (Glx-rich proteins), the 5'-untranslated sequences and the regions encoding the signal peptides of the mRNAs are 91% identical (nucleotides) and 92% identical (amino acids) to the PRP mRNAs (3, 4). Two mRNA size-classes, each containing multiple PRP mRNAs, are transcripts from PRP gene families of mice (5), hamsters (6), rats (7), and humans (8). The CRP or Glx-rich multigene family also encodes two size-classes of mRNAs, and this multigene family has the same intron/exon organization as the mouse and rat PRP genes. Cell-free translations show some unusual differences in PRPs encoded by mRNAs from parotid glands of four mouse strains (BALB/cJ, DBA/2J, CD-1, and C57BL/6J) after isoproterenol treatment (5). Reasons for the variations of translation products in these mouse strains after induction of the PRP gene families are unknown.

1 To whom correspondence may be addressed.
2 Present address: Neurological Sciences Institute, Good Samaritan Hospital and Medical Center, Portland, Oregon 97209.
3 Present address: Merrell Dow Pharmaceuticals, Inc., Cincinnati, Ohio 45215.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 47. Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.

Repeated administration of the β-agonist isoproterenol causes hypertrophy and hyperplasia of rat and mouse parotid and submandibular glands (9, 10). The morphological changes are accompanied by a dramatic increase, or induction, in the synthesis of PRPs. Typically, these proteins contain 25-45% proline, 18-22% glycine, and 18-22% glutamine and glutamic acid. Aromatic and sulfur-containing amino acids are either very low in amount or absent. Generally, PRPs can be divided into acidic and basic groups, and both groups may be glycosylated and phosphorylated. PRPs may compose more than 70% of the protein in salivary gland extracts after treatment with isoproterenol. All proteins derived from the nucleotide sequences of PRP cDNAs and PRP genes are characterized by four general regions: a signal peptide region, a transition region, a repeat region, and a carboxyl-terminal region (11).
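The 42-nucleotide repeat unit proposed above as the ancestral unit can be translated by hand or with a few lines of code. The sketch below is illustrative only: it uses the standard genetic code restricted to the codons present in the unit, and the names `CODON_TO_AA`, `REPEAT_UNIT`, and `translate` are assumptions introduced here. The repeat-region lengths it prints assume that every repeat is exactly 42 nucleotides long, which is an idealization, since the text notes that multiples of three bases were gained or lost during divergence.

```python
# Standard genetic code, limited to the codons that occur in the repeat unit.
CODON_TO_AA = {
    "CCA": "P", "CCG": "P", "CCC": "P", "CCT": "P",  # proline
    "GGA": "G", "GGC": "G",                          # glycine
    "CAG": "Q", "CAA": "Q",                          # glutamine
    "AGA": "R",                                      # arginine
}

# Proposed ancestral repeat unit, written codon by codon as in the text.
REPEAT_UNIT = "CCA CCA CCA CCA GGA GGC CCA CAG CCG AGA CCC CCT CAA GGC"


def translate(spaced_codons):
    """Translate a whitespace-separated string of codons into one-letter amino acids."""
    return "".join(CODON_TO_AA[codon] for codon in spaced_codons.split())


peptide = translate(REPEAT_UNIT)
nucleotides = len(REPEAT_UNIT.replace(" ", ""))
print(peptide, nucleotides)

# Idealized repeat-region sizes for the repeat counts quoted for MP2 and M14.
for gene, n_repeats in {"MP2": 13, "M14": 17}.items():
    print(gene, n_repeats * nucleotides, "nt,", n_repeats * len(peptide), "aa")
```

The translated unit is the 14-residue peptide P P P P G G P Q P R P P Q G, which agrees with the prototype repeat sequence reported for MP2 in Section IV.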
The apparent tissue-specific synthesis and the appearance of PRPs in saliva in such large quantities, either constitutive (as in humans) or induced by isoproterenol, suggest biological functions in the oral cavity and the gastrointestinal tract. Several functions, such as calcium binding, inhibition of hydroxylapatite formation, and formation of the dental-acquired pellicle, have been attributed to the human salivary PRPs (12). PRPs have an unusually high affinity for such multihydroxylated phenols as tannins; feeding tannins to rats and mice mimics the effects of isoproterenol on the parotid glands (13). The induction of PRP synthesis by dietary tannins clearly results in a protective response against the detrimental effects of the tannins (13). Unlike mice and rats, hamsters do not respond to tannins in the diet by the induction of PRPs. Pronounced detrimental effects are observed in weanling hamsters specifically. When these animals are maintained on a 2% tannin diet for 6 months, they fail to grow (6). Tannins are unusually toxic to weanling hamsters; an increase of tannin in the diet to 4% causes death to most animals within 3 days. The association of tannins with pathological problems, including carcinogenesis and hepatotoxicity, and the influences on growth and toxicity in hamsters, have led to the proposal that PRPs may act as a first line of defense against these multihydroxylated phenols (13).

This review focuses on the biochemistry and molecular biology of the salivary PRPs; it is not intended to be an overall or complete review of PRPs. To those who have contributed to the PRP literature and whose work is not mentioned, we apologize. Previous reviews are used for many references and studies.
I. Background4

Salivary glands of various animals synthesize, or can be induced to synthesize, a group of proteins unusually high in proline, the so-called proline-rich proteins (PRPs) (12, 14-20). These proteins collectively constitute the largest group of proteins in human salivary secretions, making up more than 70% of the secreted proteins (12). PRPs may be divided into acidic and basic groups, and members of each group may be phosphorylated or glycosylated, or both. These unusual proteins are constitutive in human saliva, but families of similar proteins are dramatically increased or induced in parotid and submandibular glands of rats, mice, and hamsters by isoproterenol treatment (6, 18, 19, 21). Profound morphological effects on rat parotid glands by isoproterenol treatment were first observed in 1961 (9, 10). Repeated pharmacological doses cause dramatic glandular hypertrophy (Fig. 1). The increase in DNA synthesis with isoproterenol treatment (25, 26) probably results mainly from polyploidy; by 4-5 days, more polyploid than diploid nuclei are seen (Fig. 2) (see 27 for a review on the regulation of salivary gland size and the effects of isoproterenol).

4 Reviews describing mainly the human PRP families are available (12, 22, 23). These unusual proteins were first observed in human saliva by Mandel, Thompson, and Ellison (24) and were first purified and characterized by Bennick and Connell (14) and by Oppenheim, Hay, and Franzblau (15). The genetics of this human multigene family were described in a review by Bennick (23). Other than for comparisons of the human cDNAs and multigene families, this review focuses primarily on the tissue-specific inducible multigene PRP families of mouse, rat, and hamster.

FIG. 1. Hypertrophic effects of isoproterenol treatment on rat salivary glands. Rats (150-200 g of body weight) were injected intraperitoneally with 5 mg of isoproterenol daily for 7 days. The parotid glands (p), submandibular glands (sm), and sublingual glands (sl) were removed from control (bottom) and isoproterenol-treated animals (top). No changes were noted for the sublingual glands, which secrete principally mucous glycoproteins. Parotid glands, which are serous secretors, showed a dramatic increase in weight of about 6- to 10-fold. Submandibular glands are of a mixed cell population and showed an intermediate response to isoproterenol.

The dramatic accumulation of PRPs in the parotid glands of rats treated with isoproterenol was first described in 1974 (16, 18, 28). After 7-10 days of treatment (5 mg of isoproterenol per day), PRPs composed about 70% of the total soluble proteins in parotid gland extracts. Initially, an acidic PRP (pI = 4.5) was identified (Ipr-1A2), and this protein was phosphorylated and glycosylated (16, 18, 19). Subsequently, six basic PRPs unusually high in proline (40-44%), glutamine plus glutamate (22-25%), and glycine (18-20%), containing varying amounts of lysine plus arginine (7-9%), were isolated and characterized (18, 19). Aromatic and sulfur-containing amino acids were either absent or present in very low amounts. Therefore, PRPs have little or no absorbance at 280 nm. Neither hydroxylysine nor hydroxyproline is present, and the treatment of these PRPs with purified prolyl hydroxylase failed to convert proline into hydroxyproline. The molecular weights of the basic proteins, from sedimentation equilibrium, ranged from 15,000 to 18,000, and that of PRP Ipr-1A2 was 25,000. A high apparent molecular weight (71,000) was observed following chromatography on Sephadex G-100, but the unusually high axial ratio (>25) of these proteins undoubtedly caused this value to be substantially overestimated. S values ranged from 1.1 to 1.4. Circular dichroism spectra showed no α-helical or polyproline conformations.
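The link between amino-acid composition and the lack of absorbance at 280 nm can be illustrated with a short calculation. This is a sketch under stated assumptions: the rat basic PRP sequences are not given in the text, so the 14-residue mouse prototype repeat quoted in Section IV stands in as an example, and the function names `composition` and `has_280nm_chromophores` are hypothetical. Tryptophan and tyrosine are taken as the residues chiefly responsible for protein absorbance at 280 nm.

```python
from collections import Counter

# Stand-in sequence: the 14-residue prototype repeat from Section IV (mouse MP2),
# used here only as an illustration of a PRP repeat.
PROTOTYPE_REPEAT = "PPPPGGPQPRPPQG"


def composition(seq):
    """Return the percentage of each amino acid in a one-letter sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def has_280nm_chromophores(seq):
    """True if the sequence contains tryptophan (W) or tyrosine (Y)."""
    return any(aa in "WY" for aa in seq)


comp = composition(PROTOTYPE_REPEAT)
for aa, pct in sorted(comp.items(), key=lambda item: -item[1]):
    print(f"{aa}: {pct:.1f}%")
print("Trp/Tyr present:", has_280nm_chromophores(PROTOTYPE_REPEAT))
```

For this single repeat, proline, glycine, and glutamine dominate and neither aromatic residue occurs, which is the pattern the text describes for the intact proteins; the whole-protein percentages quoted above also reflect the non-repeat regions.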
FIG. 2. Karyotypes of (a) a mouse bone marrow cell and (b) a mouse parotid gland cell. The chromosomal display of the mouse bone marrow cell showed the normal 2n (= 40) chromosomes after 2 days of isoproterenol treatment. The mouse parotid gland cells (>50% of the cells) showed 4n chromosomes after 2 days of isoproterenol treatment. (Courtesy of Christopher Bidwell.)
II. PRP mRNAs and Cell-free Translation Analysis

Studies by cell-free translation analysis using the reticulocyte lysate system and labeling with [3H]proline or [35S]methionine showed dramatic and definitive changes in the patterns of protein synthesis in parotid glands of isoproterenol-treated rats, and PRP mRNAs were highly elevated in the treated animals (29). There was very little synthesis of PRPs from poly(A)+ RNAs from glands of control rats; poly(A)+ RNAs from the glands of treated animals synthesized mainly PRPs; translation patterns with [3H]proline and [35S]methionine gave identical labeling patterns; and PRPs from cell-free translations were all precipitated by antibodies to PRPs. [35S]Methionine was incorporated only into the initiation site, as determined by sequence analysis and by the fact that PRPs synthesized by tissue slices of parotid glands of isoproterenol-treated rats in the presence of [35S]methionine contained no 35S label. Because most PRPs are acid-soluble, a property first used in the purification procedures of rat submandibular gland PRPs (30), it is imperative that cell-free translation products be precipitated with a solution containing both trichloroacetic and phosphotungstic acids (29).

The induction of PRP mRNAs in the parotid and submandibular glands of both rats and mice by isoproterenol treatment has been demonstrated by Northern and dot-blot hybridizations (21). PRP mRNAs either are very low or are not detectable in the glands of untreated rats and mice. After 4-5 days of isoproterenol treatment, mRNAs encoding these unusual proteins compose over 50% of the total glandular mRNAs (5). For example, plasmid pRP25 does not hybridize with RNAs from control rats (Fig. 3A), but does hybridize with PRP mRNAs of two size-classes, ranging from 600 to 1100 bases, from isoproterenol-treated animals. These size ranges of mRNAs are consistent with all rat RNA preparations tested.
The multiplicity of PRPs encoded by the PRP mRNAs from treated rats is evident from Fig. 3B, since about 12 PRPs were identified by cell-free translation analysis and immunoprecipitation. The PRP cDNA insert of pUMP40 (11), prepared from mRNAs from BALB/cJ mice, has been tentatively identified as the transcript of the mouse PRP gene MP2 (1). However, the nucleotide sequences of MP2 and the PRP insert of pUMP40 showed only 98% homology (1). MP2 was cloned from a genomic library prepared from chromosomal DNA from the CD-1 mouse strain. In an attempt to reconcile the heterologous regions and base differences between the CD-1 mouse gene MP2 and the BALB/cJ mouse mRNA, we isolated mRNAs from four mouse strains. Northern blots of total RNA from the parotid glands of mouse strains CD-1 and BALB/cJ and from strains DBA/2J and C57BL/6J, from both control and isoproterenol-treated mice, were probed with 32P-labeled exon IIb (see Fig. 10) of PRP gene MP2 (5). Two major classes of PRP mRNAs were detected in the treated animals. RNA species of about 1050 and 1300 bases for BALB/cJ and DBA/2J mice and about 1100 and 1200 bases for CD-1 and C57BL/6J mice were observed.

FIG. 3. Northern blot of parotid gland RNA from normal and isoproterenol-treated rats and cell-free translations of "sized" PRP mRNAs. (A) Parotid gland RNAs (10 μg) from normal and isoproterenol-treated rats were electrophoresed on a 1.5% agarose gel containing 5 mM methyl mercury hydroxide and transferred to nitrocellulose. The blot was probed with 32P-labeled pRP25 (11). (B) RNA was isolated from a methyl mercury denaturing low-melting-point agarose gel and translated in vitro with [35S]methionine. The translation products were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Lanes 1 and 8 show 35S label incorporated in the absence of RNA and with total RNA from the parotid glands of isoproterenol-treated rats, respectively. Lanes 2-7 are the translation products obtained from RNA indicated in (A). Molecular-weight standards (×10⁻³) are indicated at the right, and nucleotide standards are indicated at the left. [Reprinted with permission from the Journal of Biological Chemistry (5).]

Cell-free translations of total RNA from these four mouse strains showed interesting and unusual differences in the PRPs synthesized (Fig. 4). Similar labeling patterns were observed with both [3H]proline and [35S]methionine. The amounts or levels of incorporation varied considerably between controls and treated animals, and α-amylase (upper arrow, Fig. 4) and parotid-specific protein (lower arrow, Fig. 4) were dramatically reduced. α-Amylase and parotid-specific protein expressions appear to be regulated in concert (31).

FIG. 4. Translation products of RNAs prepared from four different mouse strains. Ten micrograms of total RNA from parotid glands of the mouse strains indicated, both before and after isoproterenol treatment, was translated with either [35S]methionine or [3H]proline; 200,000 cpm of 35S and 50,000 cpm of 3H were applied to gel electrophoresis. α-Amylase and parotid-specific protein are indicated by the upper and lower arrows, respectively. Molecular-weight standards (×10⁻³) (M.W. Std.) are indicated at the left. [Reprinted with permission from the Journal of Biological Chemistry (5).]

Earlier cell-free translation experiments (21, 29) and RNA/DNA hybridization results (5) show that α-amylase mRNA is dramatically decreased by isoproterenol treatment. In a related study, a polymorphism in an androgen-regulated single-copy mouse gene (RP2) produced three major mRNAs (32). These RP2 mRNAs differed in the lengths of their untranslated 3' regions as a result of using different polyadenylation sites, and additional variability resulted from the insertion of a member of the mouse B1 family. However, these RP2 polymorphisms had no effect on the translation product. Whether the PRP polymorphisms are the result of different PRP genes or are caused by differential RNA splicing remains to be determined.
III. PRP cDNAs and Amino-acid Sequences

Plasmids containing cDNAs for PRPs were first isolated from a cDNA library prepared from RNA isolated from the parotid glands of isoproterenol-treated rats (7). Four recombinant plasmids (pRP8, pRP18, pRP25, and pRP33) were selected. Several mRNAs hybridized to each PRP cDNA, which emphasized the similarities in nucleotide sequences of the PRP mRNAs. This could have resulted from the expression of a family of closely related genes or from the production of multiple mRNAs from the same or similar genes by different splicing patterns. Whether one or both possibilities are responsible for the multiple PRP mRNAs has not been unequivocally demonstrated. Subsequently, several more PRP cDNAs were cloned from mouse and rat parotid gland mRNAs after isoproterenol treatment (11) and from the human parotid gland (33).5

The nucleotide sequence of the PRP cDNA insert of pRP33 (7) encodes the acidic PRP, Ipr-1A2 (Fig. 5). The 13 amino-terminal amino acids are highly hydrophobic and are probably part of a signal peptide (the signal peptide region) (see Fig. 8). The next 60 amino acids (the "transition" region) contain numerous acidic residues, with 10 aspartic acids in the 16-amino-acid sequence of Asp-58 to Asp-73. The "proline-rich" region (residues 80-189) is high in proline, glycine, and glutamine, and includes six repeats of 18 or 19 amino acids (the "repeat" region). The 17 carboxyl-terminal amino acids (the carboxyl-terminal region) contain single residues of tyrosine (-201), tryptophan (-203), and phenylalanine (-204) clustered close to the carboxyl-terminal serine (-206). These data, derived from the nucleotide sequence of pRP33, gave the first complete amino-acid sequence of a PRP. This sequence has been compared (7) with the partial amino-acid sequences reported for the human acidic (34) and basic (12) PRPs.
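The region boundaries described for Ipr-1A2 can be checked for internal consistency with a little arithmetic. In the sketch below, only the residue counts and the 80-189 range come from the text; the transition-region coordinates (14-73) and the carboxyl-terminal coordinates (190-206) are inferred from "the next 60 amino acids" and "the 17 carboxyl-terminal amino acids" ending at serine-206, and the names `REGIONS`, `length`, and `unassigned` are hypothetical.

```python
# Regions of the Ipr-1A2 sequence as (name, first residue, last residue).
# 14-73 and 190-206 are inferred from the residue counts given in the text.
REGIONS = [
    ("signal peptide", 1, 13),
    ("transition", 14, 73),
    ("proline-rich", 80, 189),
    ("carboxyl-terminal", 190, 206),
]
TOTAL_RESIDUES = 206  # the carboxyl-terminal serine is residue 206


def length(region):
    """Number of residues in an inclusive (name, start, end) region."""
    _, start, end = region
    return end - start + 1


def unassigned(regions, total):
    """Return inclusive residue ranges not covered by any listed region."""
    gaps, expected_start = [], 1
    for _, start, end in sorted(regions, key=lambda r: r[1]):
        if start > expected_start:
            gaps.append((expected_start, start - 1))
        expected_start = end + 1
    if expected_start <= total:
        gaps.append((expected_start, total))
    return gaps


lengths = {name: length((name, s, e)) for name, s, e in REGIONS}
print(lengths)
print("unassigned:", unassigned(REGIONS, TOTAL_RESIDUES))
```

Under these inferred coordinates the transition region ends at residue 73, matching Asp-73 in the text, and six residues (74-79) fall between the transition and proline-rich regions.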
Subsequent data derived from several PRP cDNA and PRP gene sequences show that the first 100 nucleotides in the 5' regions of PRP mRNAs, which contain the 5'-flanking sequence and encode exon I (or the putative signal peptide), have unusually strong homologies (>95% identity) (11) (Fig. 6). Sequences obtained from a multigene family encoding CRPs from rat submandibular glands (3), proteins that are unusually high in glutamine and glutamate (4), are very similar in this region: 91% of the nucleotides and 92% of the amino acids are identical to the sequences of the PRPs (3).

5 Differential splicing is considered to contribute to the multiple PRP mRNAs in the human salivary gland (33).

FIG. 5. The amino-acid sequence derived from primer-extended pRP33, arranged to align similar sequences. [Reprinted with permission from the Journal of Biological Chemistry (7).]
IV. Sequence and Structural Analyses of PRP Genes

One of the family of mouse PRP genes was isolated on a cloned 3600-bp EcoRI/BglII-generated DNA fragment from a partial Sau3A bacteriophage library of CD-1 mouse chromosomal DNA (1). The transcriptional unit included three exonic sequences separated by 1434 bp (intron I) and by 450 bp (intron II). The upstream sequence (Fig. 7) had putative induction sites for cAMP (box III and box I) and an activator or enhancer sequence (box II), Z-DNA sequences that flanked an 86-bp sequence, a TATA box, and a CAAT box (1). The derived amino-acid sequence of this PRP gene (MP2) revealed a protein that contained 13 tandemly arranged repeats of 14 amino acids with the prototype sequence P P P P G G P Q P R P P Q G (Fig. 8). Each amino
-30
-20
I
I
up2 A
M(terenrs:
C
I
A
C
-10 I T
T
C
.
~
0
10
20
30
40
50
I
I
I
I
I
I
~
G
M
A
M
C
T
C
C
T
T
C
C
~
~
60 I ~
T
~
T
~
~
02030000000330220001003011213010000000230130300130330003301030010123311010010230111123333333300010
FIG.6. Comparisons of the S’-flanking regions and sequences e n d i n g exon 1 of PRP cDNAs and PRP genes from mouse ( M P 2 , pUMP40, pUMPl2, and pUMPl), rat (pRP25, pRP33, and pRP18), human (CPTI, CPT3, and CP6), hamster (H29), and rat CRPB. The legend indicating differences of 0, 1, 2, and 2 3 denotes the relative conservation of bases at each position. In 65% of the positions, there is only 0-1 base change.
~
T
C
C
T
H29
HE2 C T A A C C T T A A G C A T C T T T A A T A G A A C A A ~ T ~ G ~ G C ~ T C T-650 AT
I l l
I I
I
I Ill
I
-494 TCCCTACTGGGTGAGCTAACTCCCTACACAAT"AAACAAATCAATCAACT
I
D
I
GGTCCTTCAIAAATGTAACAGTCAAA-CAIACTCAC-CAGGAATTACGGATT-602
II
IIIIIIII Ill
-
1 1 1 1 I II II
I
-444 AAGTGTTAT GCATGTAACA-TCATGCCA A-TAACACUTGAA
+
I
I
-
d
CAAGATATTGACTCATGTATACCTCATATGTGTTGTGACTCCACTTTTAC -552
0
TGGGACTTTATAGATGAAATAGGTCTCATGCTTTACTA GCCAAT GTTGTA -502 GTTATTGTGTTAGGTCAGGAGAATAGTGGGCACTCTTACTGAGGCTTAGC -452 ATGTTAGGGATTCCAAGGGTCTTGGTGTAATTGATAlTTGlTTATGAATA
-402
GCCTCAACACCATCACTCTTAACTAATTATAGAATATATAAGAACATATA
-352
TAAGTGACAGTGGTTAAGCTATCCTACTGATCATAAA?iATTGACCACATT
-302
CAATTTGGACAGAAATCATTACTGTCAA TATAAACAAAT
-263
I I I I I I l l II CATAATTTTGCACCTTTAGTCTCAGTGA CAGGAA
-404
-370
I l l IIIIII IIII I Ill I I II II IIIII GAATATGGACA AAATTA TACAGGTATGTAGAAGCACCTCCCACAA
-322 TCATACCTAATAGGTCAGAGTCAGAGTTATGTCAATAACAGTGTCTTACA -274 CAATGATAGGCCTTAAAGGACAATAGACTTATTG -238 ATAGAACTATATATCTAATGTCTAGACTTTGCCTGTATCACTTAAACTAT GTGTATGCACACTAGTTTTA -244
I
I I
-1aa TGTTGTCAAAATTTCACATTGTACCATAGAGAACTGAAACATTGACTGCA CCCCAATGCACATTGATACACAAA AAATGTCAGCAAATGCA ATGAGATAT -193
I
II I
I I II
I
-138 TCCTGCTGGGCTAGAGTCCCAAAG AAAAGTCAGT GATGCA AAG
TTATATATTGTTAGTCATTACTGCAATAACTGGGTTATATGATTACATAG -143 GAGTTTTTTCTAGTAGGGACACTAGCAGCTAGC
TCTTCCTTACCTCATCCTGATGGGCAAAAGTCCCAGTGTCACACAAAGGA
-60
GAAAGGTGACATTCTTCTGCTCCTCCTTATAAAGGCAGTGTCTTACT
-12
II II I I l l II I I II I l l I I IIII TA TCCTGCTG TG TC AGGT CAGATCAATAGTGAGGA
-95 C
-60
I IIIIIIII IIII IIIIII I IIIIIIIIIII II I I CATGAAAGGTGCCATTGTTCTGC CTTCCTTATAAAGATTTTGGCCTTGC TCTTCCAGCACAGACTTGG
I
IIIIIIIIIIIIIII
-11 TGGCCCAGCACAGACTTGG FIG.7. Comparisons of upstream sequences of mouse PRP gene M P 2 and hamster PRP gene H 2 9 . The upstream sequences of M P 2 and H 2 9 are aligned to maximize sequence similarities. Putative regulatory regions are indicated. Boxes 111 and I, M P 2 , -640 to -623 and -218 to 203. respectively; arrows, AP-1 binding sites; GCCAAT, -513 to 508. CCAAT box.
13
PROLINE-RICH PROTEIN MULTICENE FAMILIES
I M L V V L F T V A L L A L S S 1 6 A Q G P R E E L Q N Q I Q I P N Q R
SIGNAL PEPTIDE TRANS I T I O N REG ION
3 4 P P P S G F Q P R P P V N G S Q Q G 52P P P P G G P Q P R P P
Q
G
REPEAT REGION
66P P P P G G P Q P R P P Q G 8oP P P P G G P Q P R P P
Q G
g4P P P P G G P Q P R P P Q G 108P P P P G G P Q Q R P P Q G 122P P P P G G P Q P R P P Q G 136P P P P G G P Q L R P P Q G 15oP P P P A G P Q P R P P Q G 16‘4P P P P A G P Q P R P P Q G 178P P T T - G P Q P R P T Q G 191P P P T G G P
Q Q R P P Q G
205P P P P G G P Q P R P P Q G 219P P P P G G P Q P S P T Q G 2 3 3 P P P T G G P Q Q T P P L A G N T 61 G
CARBOXYL TERMINUS
2 5 2 P P Q G R P Q G P R STOP 26 1 FIG. 8. Amino-acid sequence of PRP GPMsm derived from the nucleotide sequence of mouse PRP gene MP2.
acid within the repeat had its “favored” codon ( 1 ) (Table I), and six amino acids had a total conservation of codons for all 13 repeats. Subsequent studies (2) showed that two nonallelic PRP genes ( M P 2 and M 1 4 ) are tandemly arrayed and separated by about 30 kbp. Analysis of DNA sequences suggested that M P 2 and M 1 4 arose via gene duplication of a common ancestor. A homology matrix, or “dot-plot,” showed virtually no spurious background, and, aside from three differences, the sequences of the two genes, including the introns, were nearly identical. The differences observed were two additional sequences in intron I of M 1 4 of 223 and 2005 bp, four additional repeats (17 repeats total) in M 1 4 , and fractional sequence differences of the simple repetitive sequences (CA, TA, and TAGA) of intron
DON M. CARLSON ET AL.
14 TABLE I COMPARISON OF CODON USAGE’ Codon
MP2
Other mouse genes
CCU Pro (47)b CCA CCG
16 16 60 8
31 25 38 6
GGU Gly (19)b GGC GGA GGG
0 62 31 7
21 26 32 21
CAA Gln (17)b CAG
53 47
35
CCU Arg (7)b CGC CGA CGG AGA AGG
6 0 6 6 64 18
7 13 13 12 31 24
ACU Thr (4)b ACC ACA ACG
23 0 77 0
25 36 32 7
UAA term UAG UGA
100 0 0
0 0 100
ccc
65
“Reprinted with permission from theloumul of Biological Chemistry ( 1 ) . b Amino-acid composition in mol%.
I. The additional 2005-bp sequence was the 3‘ portion of the mouse LINE element, and it apparently had been transposed into intron I, but in the opposite orientation of M 1 4 (Fig. 9). This mouse LINE sequence (LIMdPRP), like most mouse LINE elements, is truncated at the 5‘ end (35). LIMd-PRP contains the typical polyadenylation signal (AATAAA) and an adenine-rich sequence, and it is flanked by a pair of 10-bp imperfect repeats (TGTCTTTTTT and TGTCTTTCTT). This IO-bp sequence is present only once in MF2. These and other data are strong evidence that LIMd-PRP entered this PRP locus via transposition. Both M P 2 and M 1 4 are transcriptionally active in the parotid gland when the mouse is treated with isoproterenol (11). PRP cDNAs pUMP4 and pUMP40 are encoded by M 1 4 and MF2, respectively. The number of tan-
15
PROLINE-RICH PROTEIN MULTIGENE FAMILIES 0
10
5
15
20
30 35
25
40
45
50
60
55
65
70 75
80 kb
GENES
Hindm Sol1 BamHI EcoRI
CLONES
I
9
I,
1
0
I
,
b
,
,,
I
f
,,,
, I
c
,d,
e
,
I
,
MC I6 MC22 M I4
FIG. 9. Linkage of mouse PRP genes MP2 and M14.The organization of MP2 and M I 4 are shown by the expanded scales and the relative lengths are indicated by the kilobase bar. Solid bars show the three exonic regions, and the open arrow ( M 1 4 ) represents the LINE insert. Arrows show the direction of transcription. [Reprinted with permission from the Journal of Biological Chemistry @).I
dem repeats within each gene varies, as we indicated (I, 36), and is similar to those reported (37) using variable number of tandem repeats (VNTRs) as markers for mapping human genes. However, PRP tandem repeats are the major body of the active gene, and no sequence similarity exists between this repeat and the invariant core sequence of VNTRs (37, 38). Sequence analysis of a hamster PRP gene (H29) showed that the hamster, rat, mouse, and human PRP genes are all closely related. Mouse and rat PRP genes have two exons encoding PRPs (I and IIb) (Fig. lo), while hamster and human genes have three exons (I, IIa, and IIb). Exon IIa of hamster and human PRP genes are both comprised of 36 bp, seemingly coming from the 5’ sequences of exon IIb of the mouse and rat. Whether this difference in PRP gene organization resulted from a separation or combination of exonic regions is unknown. Upstream regulatory regions of mouse, hamster, and human PRP genes are discussed in Section V. Unlike PRPs from mice and rats, which are all blocked on the amino terminus, hamster PRPs Hp43a and Hp43b were partially sequenced from the amino terminus (39). The open reading frame of exon I (H29) encodes hydrophobic residues, the putative signal peptide. Exon IIa contains only 36 bp, and the derived amino-acid sequence is A T I Y E D S I S Q L S, which is exactly the sequence of the amino terminus of Hp43a, except in position 8, which is D instead of I. Exon IIb contains 514 bp and encodes the mature protein, except for the first 12 amino acids. Exon IIb has 10 Hue111 sites and six Sau96I restriction enzyme sites, which is one of the unusual characteristics of PRP genes (I).One open reading frame of exon IIb encodes a 20-
DON M. CARLSON ET AL.
16 EI
E Ilo
\ ,I
Hamster
'\\
H29
M
ll
0
Human PRHl
,'
N
\
I,
I,
,
,
,
,/
,
\I
'\
'
;
:: I t
I
U
I
3'
5' 1 kb
FIG. 10. Comparison of PRP gene organizations in mice, hamsters, and humans. Related exonic regions (bars)are connected by dashed lines. The gene organizations of hamster H29 (36) and human PRHl (8)show the additional 36-bp exonic sequence. [Reprinted with permission from the Journal of Biological Chemistry (2).]
aminoacid peptide that is repeated five times and has a prototype sequence of P P Q Q E G Q Q Q N R P P K P Q N Q E G. The first 43aminoacids and the last 12 amino acids derived from the nucleotide sequences of exons IIa and IIb diverge from this prototype repeat pattern and give rise to the transition region and the carboxyl-terminal region, respectively. While the 5'-noncoding regions and the sequences encoding the putative signal peptides (about 100-110 bp) are highly conserved in all PRP mRNAs (Fig. 6), there is a discrepancy in the apparent cleavage site by the signal peptidase in hamster PRPs, as suggested by the amino-acid sequence of Hp43a. From the amino-acid sequence derived from PRP gene H 2 9 , the nascent polypeptide chain has the sequence M L V V L L T A A L L A & E H f A T I Y E----, with Glu (E) and His (H) preceding the apparent site of cleavage ( t ) (see the amino-terminal sequence encoded by exon IIa, above, A T I Y E D-----). Histidine has not been observed in position -1 (counting from the cleavage site) (40). We have proposed that the signal peptidase cleaves between Ala and Glu ( J. ) and that Glu and His are removed by further processing (36). However, this proposal predicts an unusually short signal peptide of only 12 amino acids.
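The regularity of the MP2 repeat region described in this section can be checked by comparing each of the 13 repeats with the prototype P P P P G G P Q P R P P Q G. The following is a minimal sketch: the repeat strings are transcribed from Fig. 8 as printed (the gap in the repeat starting at residue 178 is kept as "-"), and the function names are illustrative assumptions. It works at the amino-acid level only and does not reproduce the codon-level conservation reported in the text.

```python
# Prototype repeat and the 13 repeats of the MP2 product, keyed by the
# residue number at which each repeat starts (transcribed from Fig. 8).
PROTOTYPE = "PPPPGGPQPRPPQG"
REPEATS = {
    52: "PPPPGGPQPRPPQG",
    66: "PPPPGGPQPRPPQG",
    80: "PPPPGGPQPRPPQG",
    94: "PPPPGGPQPRPPQG",
    108: "PPPPGGPQQRPPQG",
    122: "PPPPGGPQPRPPQG",
    136: "PPPPGGPQLRPPQG",
    150: "PPPPAGPQPRPPQG",
    164: "PPPPAGPQPRPPQG",
    178: "PPTT-GPQPRPTQG",  # "-" marks the alignment gap shown in the figure
    191: "PPPTGGPQQRPPQG",
    205: "PPPPGGPQPRPPQG",
    219: "PPPPGGPQPSPTQG",
}


def mismatches(repeat, prototype=PROTOTYPE):
    """Count aligned positions that differ from the prototype (a gap counts as a difference)."""
    if len(repeat) != len(prototype):
        raise ValueError("repeat and prototype must have the same aligned length")
    return sum(a != b for a, b in zip(repeat, prototype))


def conserved_positions(repeats):
    """Return 1-based positions at which every repeat carries the same residue."""
    return [i + 1 for i, column in enumerate(zip(*repeats)) if len(set(column)) == 1]


differences = {start: mismatches(seq) for start, seq in REPEATS.items()}
exact_copies = [start for start, n in differences.items() if n == 0]
invariant = conserved_positions(REPEATS.values())

print("differences from prototype:", differences)
print("repeats identical to prototype start at:", exact_copies)
print("positions invariant in all repeats:", invariant)
```

The same comparison could be run on the 17 repeats of M14 if their sequences were available; they are not given in this chapter.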
V. Regulation of Expression of PRP Genes

The upstream sequences of mouse PRP genes MP2 and M14 and hamster PRP gene H29 contain potential regulatory elements (Figs. 7 and 11). In each of these genes, three highly conserved regions were identified (boxes I-III) (1). These regions include two putative cAMP-response elements (boxes I and III) with sequences similar to the CRP binding site required for transcriptional activation of cAMP-regulated genes in Escherichia coli (1, 41) (Fig. 11). Also, box III of the mouse PRP genes contains a sequence (-637 TAACAGTCA -629) which resembles the 8-bp palindromic sequence TGACGTCA, a cAMP response element (CRE) in eukaryotes (42). The palindrome is imperfect by one base (A for G), and it is interrupted by an A. This sequence in box III of the hamster gene is similar, but it lacks a G (TAAC-TGA). Such overlapping sequences of CRP and CRE have been shown to be functionally related (43). Mammalian activating transcription factors (ATF) can bind specifically to some E. coli CRP sites, and, conversely, E. coli CRP-binding protein specifically binds to some mammalian ATF sites (43).

Of considerable interest is the observation that multiple AP-1 binding sites (44) are present immediately 3' to box III in PRP genes MP2 and M14 (Fig. 11). These sequences are not present in the hamster gene. A perfect copy of the AP-1 heptamer, TGACTCA, is located at positions -594 to -588. A totally conserved CCAAT box (GCCAAT) is located at positions -516 to -509 in MP2 and M14. Proteins known to bind to the CCAAT box (CTF/NF-1 proteins) activate both transcription and replication (45). The proline-rich transcriptional activator of CTF/NF-1 is distinct from the replication and DNA binding domain in that it requires an additional carboxyl-terminal domain (46). Preliminary studies using gel-mobility-shift assays (47) have shown that nuclear extracts from the parotid glands of isoproterenol-treated mice have about a 6-fold increase in protein(s) binding to the upstream sequence (-702 to -574 bp) of MP2. "Footprint" assays indicate that the nuclear protein(s) binds to the AP-1 repeats.

Adding Bt2cAMP or forskolin to hamster parotid gland primary-cell cultures resulted in a large increase (i.e., 15- to 30-fold) in PRP mRNA levels (48). The increase was most dramatic between 10 and 18 hr of treatment. Treatment of the cells with cycloheximide blocked this induction of PRP mRNAs, which is added evidence that the synthesis of a trans-acting factor is necessary for the dramatic increase in transcriptional activation of the PRP genes. α-Amylase mRNA was not significantly affected by the cycloheximide treatment.

Transfections have been performed using the plasmid pUMP2BE, which contains the complete MP2 gene, and with various constructs containing deletions of the upstream sequence of MP2. Constructs containing the sequence -702 to -574 bp of MP2 in tandem with the Rous sarcoma virus (RSV) promoter and the chloramphenicol acetyltransferase (CAT) gene showed induction of PRP mRNAs of 2- to 4-fold. Various cell types have been used for the transfection experiments, including PC-12, AtT20, and LM cells. Presently, we are attempting transfection experiments with a parotid-hepatoma cell line prepared by fusion of FTO-2B cells and parotid gland primary cells (49). This may be the only "immortalized" cell line of parotid gland cells available, and these cells may respond more dramatically to transfections with the PRP gene regulatory sequence.

FIG. 11. Sequence comparisons of the E. coli CRP binding site with putative cAMP regulatory sites (boxes I and III) of PRP gene MP2. Similar sequences in the E. coli CRP binding site are reported in bovine α-gonadotropin, human vasoactive intestinal peptide (VIP), and human PRH1 (PRP) genes. The relative frequencies of A, T, G, and C at each position are indicated. Positions 4, 5, 6, and 8 (TGT-A) are totally conserved. Positions 12, 18, and 19 each show only one substitution: A for T (12) in MP2, and A for C (18) and C for A (19), both found in human PRH1.
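The relationship stated in this section between the mouse box III sequence TAACAGTCA and the eukaryotic CRE TGACGTCA (one base substitution plus one interrupting base) can be verified mechanically. The sketch below uses only the two sequences quoted in the text; the helper names are illustrative assumptions. It confirms that the CRE is its own reverse complement, then deletes each base of the box III sequence in turn to find the deletion that leaves the fewest mismatches against the CRE.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """A DNA palindrome reads the same on both strands."""
    return seq == reverse_complement(seq)


def mismatch_positions(a, b):
    """1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


CRE = "TGACGTCA"              # eukaryotic cAMP response element quoted in the text
BOX_III_MOUSE = "TAACAGTCA"   # -637 to -629 of the mouse PRP genes, as quoted

# Delete each base in turn (0-based index) and keep the 8-mer closest to the CRE.
candidates = {
    i: BOX_III_MOUSE[:i] + BOX_III_MOUSE[i + 1:]
    for i in range(len(BOX_III_MOUSE))
}
best_index = min(candidates, key=lambda i: len(mismatch_positions(candidates[i], CRE)))
best = candidates[best_index]

print("CRE is palindromic:", is_palindrome(CRE))
print("interrupting base:", BOX_III_MOUSE[best_index], "at position", best_index + 1)
print("remaining 8-mer:", best, "differs from CRE at", mismatch_positions(best, CRE))
```

Removing the fifth base (A) leaves TAACGTCA, which differs from TGACGTCA only at position 2 (A for G), in agreement with the description in the text.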
VI. Functional Aspects of PRPs

The high conservation of the sequences and structures of PRP genes and PRPs argues for specific biological functions for these unusual proteins. Some of the proposed functions, such as calcium binding, hydroxylapatite binding, formation of the dental-acquired pellicle, and agglutination of oral bacteria, have been reviewed, especially for human PRPs (12, 22, 23). In 1983, we showed that PRPs, which were dramatically induced in rat parotid glands by feeding tannins, are beneficial to the rat (50). Condensed tannins (proanthocyanidins, oligomers of flavan-3-ols) and hydrolyzable tannins (oligomers of gallic acid) are present in many foods. Antinutritional effects and other toxic and pathological properties, such as carcinogenicity and hepatotoxicity, have been associated with the ingestion of tannins and with the medicinal use of tannins. The general properties of tannins, their effects on biological systems, and specifically their roles in the induction of PRPs have recently been reviewed (13).

Seeds of bird-resistant cultivars of sorghum, a major cereal crop of the semi-arid tropics, contain high levels of tannin (high-tannin sorghum), which diminishes the nutritional value of the grain. Studies designed to define the interactions of tannins and proteins show that tannins have an extremely high affinity for proteins rich in proline (51), and that the salivary PRPs have the highest affinity (50). Because the gastrointestinal tract, specifically the oral cavity, is the source of PRPs, it was suggested that salivary PRPs might interact with tannins and serve as a defense against the detrimental effects of dietary tannins. While the dramatic induction of PRPs in the rat following isoproterenol treatment clearly offsets the usual detrimental effects of dietary tannins (50), parotid glands from rats fed high-tannin sorghum (i.e., 2% of their diet) without isoproterenol treatment were also enlarged about 4-fold, and there was a dramatic increase in PRPs within 3 days. Thus, tannins in the diet mimic the effects of the β-agonist isoproterenol on the parotid glands. There was an initial weight loss on the 2% tannin diet, reversed at 3 days, or at the time of maximal stimulation of PRP synthesis. After this time, the animals grew at close to the normal rate. Amino-acid analysis, electrophoretic patterns of proteins, and cell-free translations of mRNAs all confirmed that the PRPs induced in parotid glands by feeding tannin are identical to those induced by isoproterenol.
Subsequent studies show that the β-antagonists propranolol (a mixed β1,β2-antagonist) and atenolol (a β1-antagonist), when included in the diet, block the induction of PRPs by dietary tannin. Butoxamine, a β2-specific blocker, had no effect, and therefore the β-adrenergic receptor affected by tannin feeding is the β1-receptor. The addition of either propranolol or atenolol to the diet of rats also causes substantial increases in four proteins in the submandibular glands, of MW 145,000 (GP145), 42,000 (P42), 40,000 (P40), and 39,000 (P39) (52). GP145 is glycosylated. These proteins are tissue-specific, as they were not detected in the parotid or sublingual gland, lung, liver, pancreas, kidney, heart, or small intestine either before or after propranolol treatment. We believe that this is the first report on the induction or regulation of protein synthesis by a β-adrenergic blocker.

The hamster was used as another animal model to study the regulation and expression of PRPs. The hamster responded to isoproterenol by the induction of a series of proteins (39). However, the protein encoded by "PRP" gene H29 was unusually high (34%) in Gln and was only 15% Pro.
Also, there was no evidence of a hypertrophic response in the hamster salivary glands. Subsequent studies of feeding tannins to hamsters also showed essentially no hypertrophic response, and PRPs were not induced. Weanling hamsters fed a diet of 2% tannin lose weight for about 3 days, as do rats and mice, but then an unusual growth inhibition occurs (39). Hamsters maintained on a 2% tannin diet failed to grow, and even at 60 days were essentially the same body weight as at 3 days after starting the feeding trial. When diets were switched, the experimental animals gained weight at close to the normal rate for young hamsters, while the control animals, then on a 2% tannin diet, lost about 20% of their weight. Clearly, the detrimental effects of tannins are reversed or inhibited by the induction of PRPs in rats and mice, but hamsters are unusually susceptible to tannins. In fact, increasing the tannin content of the diet to 4% was fatal to most hamsters within 3 days.
VII. Discussion

Specialized cells in eukaryotes variably express different genes during differentiation and development. Exocrine glands, such as the pancreas and the salivary glands, have served as models of secretory tissues. Under ordinary conditions, the salivary glands of adult animals are relatively stable and do not change appreciably in cell size or number (27). However, administration of the catecholamine isoproterenol causes dramatic morphological, cytological, and biochemical changes. Morphologically, the parotid glands can increase up to 10-fold in size. Cytologically, about 50% of the acinar cells are polyploid within 2 days of treatment. Biochemically, a dramatic induction of the multigene family encoding the PRPs is observed. The expression of PRPs for the parotid and submandibular glands is tissue-specific or, possibly more correctly, cell-specific. PRPs have been identified immunochemically in the trachea (53) and the pancreas (54), but there is no evidence that PRP genes in these tissues respond as in the salivary glands to isoproterenol treatment. Small amounts of PRP mRNAs are observed in the mouse pancreas after isoproterenol treatment (C. A. Bidwell and D. M. Carlson, unpublished), but these results were variable. Current data suggest that transcriptional controls, tissue-specific factors, and post-translational modification share the role of principal modulators of the expression of the PRP gene families.
ACKNOWLEDGMENT

These studies were supported in part by NIH grant DK 36812. P.S.W. was supported by NIH training grant T32 HL 7013-13.
REFERENCES

1. D. K. Ann and D. M. Carlson, JBC 260, 15863 (1985).
2. D. K. Ann, M. K. Smith and D. M. Carlson, JBC 263, 10887 (1988).
3. G. Heinrich and J. F. Habener, JBC 262, 5262 (1987).
4. L. Mirels, G. S. Bedi, D. P. Dickison, K. W. Gross and L. A. Tabak, JBC 262, 7289 (1987).
5. D. K. Ann, S. Clements, E. M. Johnstone and D. M. Carlson, JBC 262, 899 (1987).
6. H. Mehansho, D. K. Ann, L. G. Butler, J. Rogler and D. M. Carlson, JBC 262, 12344 (1987).
7. M. A. Ziemer, W. F. Swain, W. J. Rutter, S. Clements, D. K. Ann and D. M. Carlson, JBC 259, 10475 (1984).
8. H. S. Kim and N. Maeda, JBC 261, 6712 (1986).
9. H. Selye, M. Cantin and R. Veilleux, Growth 25, 243 (1961).
10. K. Brown-Grant, Nature 191, 1076 (1961).
11. S. Clements, H. Mehansho and D. M. Carlson, JBC 260, 13471 (1985).
12. A. Bennick, MCBchem 45, 83 (1982).
13. H. Mehansho, L. G. Butler and D. M. Carlson, Annu. Rev. Nutr. 7, 423 (1987).
14. A. Bennick and G. E. Connell, BJ 123, 455 (1971).
15. F. G. Oppenheim, D. I. Hay and P. Franzblau, Bchem 10, 4233 (1971).
16. A. Fernandez-Sorenson and D. M. Carlson, BBRC 60, 249 (1974).
17. D. L. Kauffman and P. J. Keller, Arch. Oral Biol. 24, 249 (1979).
18. J. Muenzer, C. Bildstein, M. Gleason and D. M. Carlson, JBC 254, 5623 (1979).
19. J. Muenzer, C. Bildstein, M. Gleason and D. M. Carlson, JBC 254, 5629 (1979).
20. R. S. C. Wong and A. Bennick, JBC 255, 5943 (1980).
21. H. Mehansho, S. Clements, B. T. Sheares, S. Smith and D. M. Carlson, JBC 260, 4418 (1985).
22. A. Bennick, J. Dental Res. 66, 457 (1987).
23. A. Bennick, J. Dental Res. 68, 2 (1989).
24. I. D. Mandel, R. H. Thompson and S. A. Ellison, Arch. Oral Biol. 10, 499 (1965).
25. T. Barka, Exp. Cell Res. 37, 662 (1965).
26. R. Baserga, FP 29, 1443 (1970).
27. C. A. Schneyer, in "Regulation of Organ and Tissue Growth" (R. J. Gross, ed.), 211 pp. Academic Press, New York, 1972.
28. M. R. Robinovitch, P. J. Keller, D. A. Johnson, J. M. Iverson and D. L. Kaufman, J. Dental Res. 56, 290 (1977).
29. M. A. Ziemer, A. Mason and D. M. Carlson, JBC 257, 11176 (1982).
30. H. Mehansho and D. M. Carlson, JBC 258, 6616 (1983).
31. H. O. Madsen and J. B. Hjorth, NARes 13, 1 (1985).
32. D. King, L. D. Snider and J. B. Lingrel, MCBiol 6, 209 (1986).
33. N. Maeda, H.-S. Kim, E. A. Azen and O. Smithies, JBC 260, 11123 (1985).
34. D. Kauffman, R. Wong, A. Bennick and P. Keller, Bchem 21, 6558 (1982).
35. M. F. Singer and J. Skowronski, TIBS 10, 119 (1985).
36. D. K. Ann, D. Gadbois and D. M. Carlson, JBC 262, 3958 (1987).
37. Y. Nakamura, M. Leppert, P. O'Connell, R. Wolff, T. Holm, M. Culver, C. Martin, E. Fujimoto, M. Hoff, E. Kumlin and R. White, Science 235, 1616 (1987).
38. A. J. Jeffreys, V. Wilson and S. L. Thein, Nature 314, 67 (1985).
39. H. Mehansho, D. K. Ann, L. G. Butler, J. Rogler and D. M. Carlson, JBC 262, 12344 (1987).
40. G. von Heijne, JMB 173, 243 (1984).
41. B. de Crombrugghe, S. Busby and H. Buc, Science 224, 831 (1984).
42. W. J. Roesler, G. R. Vanderback and R. W. Hansen, JBC 263, 9063 (1988).
43. Y.-S. Lin and M. R. Green, Nature 340, 656 (1989).
44. P. K. V. Vogt and T. J. Bos, TIBS 14, 172 (1989).
45. C. Santoro, N. Mermod, P. C. Andrews and R. Tjian, Nature 334, 218 (1988).
46. N. Mermod, E. A. O'Neill, T. J. Kelly and R. Tjian, Cell 58, 741 (1989).
47. J. Zhou and D. M. Carlson, FASEB J. 4, 2131 (1990).
48. P. S. Wright, C. Lenney and D. M. Carlson, J. Mol. Endocrinol. 4, 81 (1990).
49. P. S. Wright and D. M. Carlson, FASEB J. 2, 3104 (1988).
50. H. Mehansho, A. Hagerman, S. Clements, L. G. Butler, J. Rogler and D. M. Carlson, PNAS 80, 3948 (1983).
51. A. Hagerman and L. G. Butler, JBC 256, 4494 (1981).
52. V. N. Subramaniam and D. M. Carlson, FASEB J. 4, 1980 (1990).
53. T. F. Warner and E. A. Azen, Am. Rev. Respir. Dis. 130, 115 (1984).
54. S. Ito, S. Isemura, E. Saitoh, K. Sanada, T. Suzuki and A. Shibita, Acta Endocrinol. 103, 544 (1983).
55. R. G. Goodwin, C. L. Moncman, F. M. Rottman and J. H. Nilson, NARes 11, 6873 (1983).
56. T. Tsukada, J. S. Fink, G. Mandel and R. H. Goodman, JBC 262, 8743 (1987).
Recognition of tRNAs by Aminoacyl-tRNA Synthetases1

LADONNE H. SCHULMAN
Albert Einstein College of Medicine
Bronx, New York 10461
I. Recognition versus Identity
II. Assays of the Amino-acid-acceptor Specificity of tRNA
    A. In Vitro Assays
    B. In Vivo Assays
III. Role of the Anticodon
    A. Anticodon Recognition in E. coli tRNAs
    B. Anticodon Recognition in Yeast tRNAs
    C. Summary
IV. Role of the Acceptor Stem and the "Discriminator" Base at Position 73
    A. Alanine Synthetases
    B. E. coli Serine Synthetase
    C. E. coli Glutamine Synthetase
    D. Other E. coli Synthetases
    E. Summary
V. Other Recognition Profiles
    A. E. coli Arginine Synthetase
    B. Yeast Phenylalanine Synthetase
    C. E. coli Phenylalanine Synthetase
    D. Summary
VI. Role of Modified Bases
VII. The Complex of E. coli Glutamine tRNA and Glutamine Synthetase
VIII. tRNA Binding Domains of Other Synthetases
IX. Concluding Remarks
References
The highly specific selection of tRNA substrates by aminoacyl-tRNA synthetases is an intriguing problem in RNA-protein recognition. Synthetases specific for each of the 20 amino acids encounter a pool of tRNAs in the cell having similar overall structures (1-3). Selection of the appropriate tRNAs for attachment of each amino acid occurs by the formation of RNA-protein contacts unique to each cognate tRNA-synthetase pair. The sites in tRNAs that govern these interactions have been investigated by a variety of techniques for over 20 years, and numerous reviews cover this literature (4-6). However, recent technical advances have allowed a burst of new activity in this area, leading to the identification of nucleotide bases required for the recognition of a number of specific tRNAs (7-11). In addition, new studies have begun to reveal the corresponding sites in the synthetases that govern specific tRNA interactions (12-18). This article focuses on these exciting recent developments, dealing first with the tRNA studies, and following with a summary of data on tRNA recognition sites in synthetases.

1 Abbreviations: synthetase, aminoacyl-tRNA synthetase; MetRS, methionyl-tRNA synthetase; GlnRS, glutaminyl-tRNA synthetase, etc.; tRNAMet(CAU), methionine tRNA having the anticodon sequence 5'CAU3'; tRNATrp(CCA), tryptophan tRNA having the anticodon sequence 5'CCA3', etc.; DHFR, dihydrofolate reductase.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. Recognition versus Identity

The terms "tRNA recognition" and "tRNA identity" have both been widely used in discussions of tRNA aminoacylation specificity. I use "recognition" here to refer to the specific identification of a cognate tRNA by its corresponding synthetase. The recognition elements in a given tRNA are defined as the set of structural features unique to that tRNA required for its efficient aminoacylation by the cognate synthetase. Recognition elements are identified by structural changes that significantly reduce the efficiency of aminoacylation by the cognate synthetase in in vitro assays. In addition, recognition elements can be transferred to noncognate tRNAs, allowing them to be charged with the corresponding amino acid in vitro and/or in vivo. Further, loss of an important recognition element from an essential tRNA in vivo should lead to a decrease in aminoacylation efficiency sufficient to impair cell growth. In the case of several tRNAs, recognition is now known to require only a small number of specific nucleotides. The prospects are good in the next few years for identification of the major recognition elements in tRNAs specific for all 20 amino acids in Escherichia coli.

"tRNA identity" (19) is the term currently in general use to indicate the amino-acid-acceptor specificity of a tRNA. Identity elements include recognition elements for the cognate synthetase plus additional structural features that prevent recognition of the tRNA by noncognate synthetases. The locations of these positive and negative identity elements coincide when two synthetases recognize the same sites in their cognate tRNAs, but diverge in cases in which the recognition patterns differ. Identity elements need not be conserved among tRNAs specific for the same amino acid, as isoacceptor tRNAs could require different negative elements to protect against mischarging by different noncognate synthetases. Analysis of an extensive set of mutant tRNAs is often required to distinguish between positive and negative identity elements, and mutants having dual or multiple identities can be created readily. At present, the size of the identity set for specific tRNAs is largely unknown. In addition, inherent difficulties in setting up appropriate in vivo assays may hamper complete solution of the tRNA identity problem for some time.
II. Assays of the Amino-acid-acceptor Specificity of tRNA

A. In Vitro Assays

In vitro assays of tRNA charging and mischarging by purified synthetases, together with sequence comparisons of the test tRNAs, were used in earlier studies to attempt to define tRNA recognition elements (6, 20-23). In addition, chemical modification experiments helped to identify sites where structural changes led to loss of recognition by cognate synthetases (5, 6, 24). Nonsense and missense suppressor tRNAs generated in genetic experiments were also isolated from cells for in vitro analysis (25-28). Studies using tRNAs synthesized in vivo were sometimes complicated by the fact that mutations, especially those in the anticodon loop, altered the normal patterns of post-transcriptional base modification, and the effects of the two changes could not easily be distinguished (28). In 1982, a method for in vitro anticodon replacement using T4 RNA ligase was reported that avoids this problem (29). T4 RNA ligase was also used subsequently to introduce mutations near the 3' end of tRNAs (30) and to join synthetic oligoribonucleotides into intact tRNA structures (31). More recently, wild-type and synthetic tRNA genes have been cloned, and a variety of tRNAs have been overproduced and purified for in vitro studies (32-36). In addition, in vitro transcription of tRNAs using T7 RNA polymerase has allowed the preparation of milligram quantities of specific tRNA sequences for enzymatic assays and physical studies (37-39). Fortunately, these transcripts lacking base modifications have generally been efficient substrates for aminoacyl-tRNA synthetases (Table I), allowing quantitative comparisons of the effects of base changes at specific sites on recognition by purified cognate and noncognate enzymes. This technique has also provided tRNA sequences difficult to obtain from cells due to the effect of mutations on tRNA biosynthesis or cell viability.
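The quantitative comparisons mentioned above rest on the relative specificity defined in footnote d of Table I: the ratio of kcat/Km (or Vmax/Km) for the transcript to the same quantity for the native tRNA. A minimal sketch of that calculation follows. The class and function names are hypothetical, and the kinetic values are invented for illustration only; they are not taken from Table I.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kinetics:
    """Apparent kinetic parameters for one tRNA substrate."""
    km: float    # apparent Km (micromolar)
    rate: float  # kcat (per second) or relative Vmax; both substrates must use the same measure

    def specificity(self) -> float:
        """Specificity constant, rate / Km."""
        if self.km <= 0:
            raise ValueError("Km must be positive")
        return self.rate / self.km


def relative_specificity(transcript: Kinetics, native: Kinetics) -> float:
    """[rate/Km] of the transcript divided by [rate/Km] of the native tRNA."""
    return transcript.specificity() / native.specificity()


# Illustrative values only (assumed, not measured data).
native = Kinetics(km=1.0, rate=1.0)
transcript = Kinetics(km=2.0, rate=0.8)

ratio = relative_specificity(transcript, native)
print(f"relative specificity (transcript/native) = {ratio:.2f}")
```

A ratio near 1 indicates that the unmodified transcript is about as good a substrate as the native tRNA. As the table footnote cautions, such ratios are only comparable when both substrates were assayed under the same conditions.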
TABLE I
COMPARISON OF AMINOACYLATION OF NATIVE tRNAs AND in Vitro SYNTHESIZED tRNA TRANSCRIPTS WITH COGNATE SYNTHETASESᵃ

Organism
E. coli Yeast
E . coli E. coli Yeast
E. coli E. coli
tRNA
[Mg2+](mM)
Asp Native Transcript Asp Native Transcript His Native Transcript Met Native Transcript Phe Native Transcript Thr, Native Transcript Val, Native Transcript Native Transcript
10 10 15 15 10 10 5 8
15 15 8
8 5 11 15 15
Relative specificityd transcript/ native
Reference
1.0
40
0.3Bb l.O=
0.9
41
4.0 0.6
0.80
0.7
42
1.1
0.9r 1.oc 0.8= 1.oc
0.5
43
0.2
37
1.2=
0.4
44
0.75~ 1.0=
0.4
43
0.94c
0.9
45
Apparent K, ( F M )
0.33 0.32 0.044 0.028
3.5
0.096
0.380 0.06 0.20 0.5 1.0 1.6 1.6
kb.c
1.oc 1.oc 0.66b
1.0"'
1.oc

ᵃ Kinetic parameters vary with aminoacylation conditions. Native tRNAs and tRNA transcripts were assayed in parallel in each case, except for Mg2+ concentration. Transcripts normally require higher [Mg2+] for optimal aminoacylation. It is not clear in all cases whether experiments were carried out at the Mg2+ optimum for the native tRNA or for the transcript. Apparent Km's are given since assays are carried out at suboptimal amino-acid concentrations. See specific references for details.
ᵇ kcat (s⁻¹).
ᶜ Relative Vmax.
ᵈ [kcat/Km]transcript/[kcat/Km]native or [Vmax/Km]transcript/[Vmax/Km]native.

B. In Vivo Assays

1. NONSENSE SUPPRESSION

While much can be learned about tRNA recognition from in vitro experiments, the complete set of tRNA identity elements can only be derived from in vivo studies in which synthetases specific for all 20 amino acids compete for tRNA substrates under normal physiological conditions. Unfortunately, no direct in vivo assay of tRNA amino-acid-acceptor specificity exists. In early in vivo studies using amber suppressor derivatives of E. coli tRNATyr, genetic selections were devised for tRNA base changes that allow suppression of amber codons at sites for which Tyr insertion fails to yield a biologically active protein (46, 47). A number of mutant tRNAs were isolated, all of which inserted Gln (48-51). In 1986, the powerful genetic approach to the study of tRNA identity was revived, taking advantage of automated DNA synthesis to construct genes of E. coli amber suppressor tRNAs containing base changes at any desired location (19, 52, 53). The genes were cloned in a tRNA expression vector and the amino acid inserted by the suppressor tRNA at the site of specific amber codons in vivo was directly determined by protein sequencing. E. coli dihydrofolate reductase, containing an amber codon at position 10 of the gene, was used as the target protein due to its ease of purification by methotrexate affinity chromatography and the neutral
effect of amino-acid substitutions at position 10 on enzyme activity (Fig. 1). This approach has also recently been widely used by others (54-61).

Wild-type DHFR (residues 2-12)
    Ile Ser Leu Ile Ala Ala Leu Ala Val Asp Arg
    ATC AGT CTG ATT GCG GCG TTA GCG GTA GAT CGC
DHFR-amber
    ... AA* Asn ...
    ... TAG AAT ...
Pseudo-DHFR
    Met Lys Leu Val Ser Ala Ile Ser Leu Ile Ala
    ATG AAG CTT GTA AGC GCG ATC AGT CTG ATT GCG
DHFR-start
    AA* ...
    NNN ...
DHFR-shift
    ... AA*  ...
    ... NNNN ...

FIG. 1. The sequence of the amino terminus of wild-type E. coli DHFR (118) and DHFR derivatives (19, 67a), used for in vivo assays of tRNA identity.

Much important information on tRNA identity has resulted from these studies; however, there are several problems inherent in such suppression assays. First, the anticodon of the tRNA must be changed to one that is complementary to the nonsense codon. As discussed in greater detail in Section III, such changes frequently affect recognition of the tRNA by one or more synthetases. Second, the efficiency of suppression cannot be directly related to the efficiency of aminoacylation. Suppression efficiency is usually measured by the rate of translation of a nonsense codon by a given suppressor tRNA relative to the rate of translation of a wild-type codon at the same site by an endogenous tRNA. Suppression efficiency can be affected by the level of synthesis of the tRNA, the type and extent of base modifications (particularly in the anticodon loop), and the interaction of the tRNA with ribosomes and translation factors. Evaluation of the effects of some of these factors on suppression efficiency can readily be made (62); however, it is difficult to assess the effect of others.

Several types of data are particularly useful in maximizing the information to be obtained on tRNA identity from in vivo suppression studies. One is knowledge of the relative intracellular levels of tRNAs competing for the same synthetase. It is clear from theoretical considerations (63), as well as from a variety of experimental observations (59, 62, 64), that in vivo mischarging of tRNAs can be brought about by an imbalance of tRNAs and synthetases in the cell. Test tRNAs are normally overproduced in the suppression assay to maximize the amount of suppressed protein generated for sequencing. This “fixes the game” in the intracellular competition between the test tRNA and endogenous tRNAs. On the other hand, some mutations drastically reduce the amount of mature tRNA produced in the cell, complicating the interpretation of the effect of specific mutations on amino-acid specificity when comparing tRNAs expressed at different levels. Another valuable piece of information is knowledge of whether aminoacylation is rate-limiting in the suppression assay. This can be assessed by determining the effect on suppression efficiency of overproducing the synthetase(s) corresponding to the amino acid(s) inserted by the test tRNA.

Genetic screens have also been used to examine tRNA identity in vivo, using cells that require insertion of a specific amino acid at a nonsense codon for normal growth (19, 56, 58, 64). This assay is a minimal test of tRNA identity, however, since low levels of aminoacylated tRNA (below the amount detectable by sequencing dihydrofolate reductase) have been found to support growth in some cases, and in vivo charging with other amino acids is not assessed.

Interesting tRNA mutants with altered amino-acid-acceptor specificity can also be obtained by selection for changes in chromosomal tRNA genes that suppress missense mutations in specific proteins (65, 66). However, such mischarging mutants insert incorrect amino acids at normal translation codons; they are therefore expected to be significantly defective in aminoacylation or some subsequent step in translation, or else they would be highly toxic.
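The effect of tRNA overproduction on the intracellular competition can be illustrated with a small calculation. The sketch below assumes simple Michaelis-Menten competition, in which the rate at which one synthetase charges each of several competing tRNAs is proportional to (kcat/Km) times the tRNA concentration. This model, the function name, and all numbers are illustrative assumptions and are not taken from the studies cited above.

```python
def charging_shares(substrates):
    """Return the fraction of one synthetase's output going to each tRNA.

    substrates maps a tRNA name to (specificity, concentration), where
    specificity is kcat/Km in arbitrary units. For substrates competing
    for a single enzyme, rates are proportional to specificity * concentration.
    """
    weights = {name: spec * conc for name, (spec, conc) in substrates.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical case: a test tRNA that is a 1000-fold poorer substrate
# than the cognate tRNA for a given synthetase.
balanced = charging_shares({"cognate": (1.0, 1.0), "test": (1e-3, 1.0)})
overproduced = charging_shares({"cognate": (1.0, 1.0), "test": (1e-3, 300.0)})

print(f"test tRNA share at equal levels:   {balanced['test']:.4f}")
print(f"test tRNA share, 300-fold excess:  {overproduced['test']:.4f}")
```

Under these assumptions, a large excess of the test tRNA raises its share of charging by a noncognate synthetase from a negligible level to a substantial one, which is why knowledge of relative tRNA levels matters when interpreting suppression data.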
2. NEW in Vivo ASSAYS
The inability to assay tRNAs containing wild-type anticodons has been a major problem in in vivo studies of tRNA identity. If tRNAs containing anticodon base changes are efficiently charged by their cognate synthetases or by more than one synthetase, they are toxic to cells, frequently inserting incorrect amino acids into proteins. If the tRNAs are completely converted to the identity corresponding to the new anticodon, the usual assay is lost, since proteins indistinguishable from those synthesized by endogenous tRNAs are made. My group has recently been attempting to develop new in vivo assays that allow studies of tRNAs containing wild-type anticodons (67, 67a). One such assay uses the E. coli initiator Met tRNA (tRNAfMet) as the test tRNA for the effect of specific mutations on amino-acid-acceptor specificity. Plasmid-borne tRNAfMet genes containing non-Met anticodons are expressed in cells harboring a compatible plasmid carrying a target protein gene with a complementary non-Met initiation codon. As in the nonsense suppression assay, dihydrofolate reductase has usually been used as the target protein; however, the gene has been altered to encode six additional
amino-terminal amino acids fused to the second amino acid of the wild-type protein (Fig. 1). Anticodon derivatives of tRNAfMet initiate DHFR synthesis in this system (see below) and produce biologically active proteins that retain the initiating amino acid (67a). The amino-terminal amino acid is identified by protein sequencing to determine the amino-acid-acceptor specificity of the tRNA. An advantage of this system is that anticodon mutants of tRNAfMet are not toxic to cells, because they are unable to participate in polypeptide chain elongation (68, 69). However, it remains to be determined whether this assay will be useful for all non-Met amino acids, and to what extent the structure of the tRNA outside of the anticodon can be altered without complete loss of its initiator function.

A second potentially useful in vivo assay now being explored is the use of frameshift-suppressor tRNAs having eight-membered anticodon loops to translate four-base codons near the amino terminus of the target protein. Several such tRNAs are aminoacylated quite efficiently in vitro (70, 71), and many more function as frameshift-suppressor tRNAs in vivo (72-76). The potential advantage of this system is that any tRNA structure that allows translation can be tested for its effect on tRNA identity in the presence of the wild-type anticodon sequence. The disadvantages include the typically low suppression efficiencies of frameshift tRNAs and the potential for toxic effects caused by frameshifts in essential cellular proteins.

Other in vivo experiments could provide useful information on tRNA identity. Mutant elongator tRNAs that contain the presumed recognition elements and the known anticodon sequence of a noncognate tRNA could be expressed in vivo and the effect on cell growth determined. Normal growth should be observed if the tRNA has been completely converted to a new identity. A more stringent test of such an “identity swap” would be complementation of cells in which the chromosomal gene(s) for the original tRNA has been inactivated. The latter is the only kind of in vivo experiment that addresses the issue of aminoacylation efficiency as well as specificity. Such studies have not yet been attempted, and might fail in any case due to effects of the mutant tRNA on cell growth unrelated to tRNA identity.
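The logic of the frameshift-suppressor assay described above, in which a four-base codon is read by a single tRNA so that the downstream reading frame is restored, can be sketched in a few lines of Python. The sequence, the position of the four-base codon, and the function name are hypothetical; the codon table is the standard genetic code.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, quad_at=None, quad_aa="X"):
    """Translate dna from position 0 in triplets.

    If quad_at is given, the four bases starting at that nucleotide index
    are read as one codon by a (hypothetical) frameshift-suppressor tRNA
    that inserts quad_aa.
    """
    protein = []
    pos = 0
    while pos + 3 <= len(dna):
        if quad_at is not None and pos == quad_at and pos + 4 <= len(dna):
            protein.append(quad_aa)
            pos += 4
            continue
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
        pos += 3
    return "".join(protein)


# Hypothetical target: ATG AAA GGGA TTT CCC, with one extra base after GGG.
target = "ATGAAAGGGATTTCCC"
print(translate(target))                           # frame lost after GGG
print(translate(target, quad_at=6, quad_aa="G"))   # frame restored
```

In this toy example the amino acid reported at the four-base site plays the role of the residue identified by protein sequencing, while the restored downstream frame corresponds to production of an active target protein.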
III. Role of the Anticodon

As was pointed out early in the study of tRNA recognition (77), the anticodon is the most logical site to specify the amino acid to be coupled to a tRNA, since this would directly link aminoacylation to the genetic code. Despite considerable evidence in favor of anticodon recognition for a number of tRNAs (reviewed in 5), the importance of the anticodon was not widely accepted until quite recently. Indeed, some current textbooks still state that the anticodon is not a recognition element for tRNAs (78). Arguments against
the involvement of the anticodon in tRNA recognition were largely based on early studies of nonsense suppressors derived from Gln, Leu, Ser, and Tyr tRNAs, which insert the correct amino acid despite the presence of single-base changes in the anticodon. Nevertheless, one of the first amber suppressor tRNAs studied, su+7 tRNATrp containing a CCA→CUA anticodon change, has an ambiguous identity, inserting both Gln and Trp in vivo (79). Subsequent in vitro studies showed that this single mutation affects recognition of the tRNA by both TrpRS and GlnRS, decreasing interaction with the cognate synthetase and increasing interaction with the noncognate synthetase (80, 81). At about the same time, anticodon base changes generated in vitro were shown to have dramatic effects on the recognition of E. coli tRNAfMet (82), tRNAArg (83), and yeast tRNAVal (84), and missense suppressors derived from tRNAGly were found to be highly defective in aminoacylation by E. coli GlyRS (27, 28). These early results were just a prelude of things to come. Kisselev presented an extensive review of the literature on anticodon recognition in this series in 1985 (5). I will attempt to summarize here the data obtained in the following 5 years, with emphasis on E. coli tRNAs, which have been the focus of many of the recent investigations.
A. Anticodon Recognition in E. coli tRNAs

1. In Vitro STUDIES
Tables II and III summarize the effects of anticodon base changes on the recognition of tRNAs by cognate and noncognate synthetases in vitro. Significant reductions in the efficiency of aminoacylation of E. coli tRNAs specific for Arg, Gly, Ile, Met, Phe, Thr, Trp, Tyr, and Val result from base changes in the anticodon. The magnitude of the effect varies with the particular synthetase, and may be related to the total number of recognition elements in a given tRNA. MetRS and ThrRS also strongly protect the anticodons of their cognate tRNAs from chemical modification and nuclease attack (85, 86). Even stronger evidence for the role of the anticodon in recognition of tRNAs by specific synthetases is obtained when mischarging occurs in an anticodon-dependent manner. Substitution of the anticodons of noncognate tRNAs with anticodons corresponding to those of E. coli tRNAs for Arg, Met, Thr, and Val leads to increases of 10⁴ to 10⁶ in mischarging by the corresponding synthetases (Table III). In addition, 10³- and 10⁵-fold increases in the efficiency of mischarging of tRNAfMet and tRNATrp by E. coli GlnRS are observed when the wild-type anticodons of these tRNAs are replaced with the amber anticodon CUA. The efficiency of aminoacylation of noncognate tRNAs containing Met and Val anticodons by MetRS and ValRS is similar to that of the corresponding cognate tRNAs, suggesting that the
TABLE II
EFFECT OF ANTICODON BASE CHANGES ON in Vitro RECOGNITION OF tRNAs BY COGNATE SYNTHETASESᵃ
Apparent Organism
tRNA".'
Anticodon"
K, (CLW
keJ
E. coli
.41a, Ala,
2.2 ND
1.oe ND 1.W
Yeast
Ala,
VIGC (wt) G G C (wt) CUA IGC (wt) IGU AGC GGC GGCC GC ICG (wt) IUG G U C (wt)
E.coli
A%,
Yeast
Asp"
E. coli E. coli
Glu
Gly I
uuc AUC U8UC (wt) UUA U8UU
2.9
0.039 0.326 0.250
0.7~ 0.035e 0.036<
ccc (wt)
ucu ucc
1.0 ND 1.4 1.0 Active Active Active Active Active 1.0 Inactive 1.0 0.006 0.008
Comments
Reference 36 36
Also U, + A,
36 87 88 88 89 89 83 1
1 1
Also C,2+U32
90 91 92 93
1.0
u * c c (wt) GCC (wt) GCA
Relative specificitye or activity
Active Inactive 1.0 Slightly reduced rate Slightly reduced rate
cccc Gly,
COGNATESYNTHETASES~
1.0
CUC Gly,
BY
0.16 1.3
1.Of 0.005r
Relative rate 10-4 1.0 6 x 10-4 Normal activity
Also A,+tGA,, Also A,7-+ms2i6A,7
27 28 28 27
(continued)
TABLE II (Continued) Apparent Organism
tRNAb.c
E . coli
Ile,c
E . coli
Mek MeV
Anticodon
K, (PM)
kef
1
GCU GUC C7AU CAU CAU (wt) CAU (wt) UAU AAU GAU
ccu cuu
E . coli
Met,,, Met,”
Yeast
Met,&
E. coli
Pheb
Yeast
Phe
CAG CAC CUA c 4 A u (wt) CAU CAU UAU GAU CCG GGU UAC CAU (wt)
5.7
0 . W
0.6
1.w
1.1 >100 >100 2.9 >100 >100
1.Of
0.030
Comments
Reduced rate
Mixture of species
1.0 Greatly reduced rate 1.0 1.0
lo-‘ 0.0001f
10-5 4 x 10-5 10-5 10-7
ccu
GAA (wt) AAA CUA GBAA (wt)
Relative specificityg or activity
1.Of
1.0 0.002 1.0 <0.01 Greatly reduced activity 1.00
Reference
94 95
96 96 96 96 96 96 96 96 96 97 98 98 43 18, 43 18, 43 100 18, 44 18, 43
j
k k 101
E . coli
Thr
GAA GAC GAU GCA GUA AAA CAA UAA UGU (wt)
0.043 0.121 0.071 0.104
0.118 0.306 0.250
0.217
1.5f 0.W
0.9w 0.46-f 0.42f 0.84f 0.77f 1.Wf
ucu CUA
E . coli
Trp
Bovine Liver
Trp
E . coli
TYr Tyrb
Yeast
TYr TyrC
CCA (wt) CUA C3CA (wt) U3CA C3UA QUA (wt) CUA GUA (wt) GGA GJlA (wt) GJlA (wt) UJlA CJlA AJlA GUA GCA GAA CUA CCA
0.13 7.7
5.0 8.2 0.32 5.0 0.054
0.052 0.27 0.23 0.55 0.096 0.34 0.43
0.53 2.50
l.W 0.1s
20.50 1.4~ 1.Of 0.076-f 1.W 0.98.f 0.71f 0.48f 0.80f 0.9of 0.72f 0.57f 0.27f 0.1w
1.06 0.24 0.42 0.13 0.11 0.08 0.09 0.14 1.0 Greatly reduced activity Greatly reduced activity 1.0 0.0025 1.0 Inactive Active 1.0 0.043 1.0 0.005 1.0 1.0 0.14 0.11 0.08 0.49 0.11 0.07 0.05 0.01
Also Y,,+G,, Also Y,,+G,, Also Y,,+G,, Also Y,,+G,, Also Y,,+G, Also Y,,+G,, Also Y,,-*G,, Also Y3,-+G3,
101 101 101 101 101 101 101 101
UCU more defective than CUA in oioo
102 102 80
80 103 103 59 59 lOla lOla 104 104 104 104 104 104 104 104 104 104 (continued)
TABLE II (Continued)
Organism
tRNA6.C
E. coli
Val, b
Yeast
Val,
An ticodon UUA UCA UAC (wt) CAU IAC (wt) IAU
Apparent K , (W) 1.10 2.50 1.0 <90
kef 0.51f 0.1w 1.Of
Relative specificityg or activity 0.07
Comments
Reference 104
0.015
104
1.0 10-5
43 43
1.0 Inactive
84
ᵃ In vitro assays have been carried out with native tRNAs unless otherwise indicated. See also footnote a, Table I. wt, wild type. ND, not determined.
ᵇ tRNA in vitro transcript.
ᶜ Anticodon change introduced using T4 RNA ligase.
ᵈ Minor bases in anticodons are indicated using the abbreviations of Sprinzl et al. (112) as follows: V1 = uridine-5-oxyacetic acid; U8 = 5-methylaminomethyl-2-thiouridine; C7 = lysidine = N2-(5-amino-5-carboxypentyl)cytidine; C4 = N4-acetylcytidine; G3 = 2'-O-methylguanosine; C3 = 2'-O-methylcytidine; U3 = 2'-O-methyluridine. U* in Gly2(U*CC) is an unknown modified U.
ᵉ kcat (sec⁻¹).
ᶠ Relative Vmax.
ᵍ Ratio of kcat/Km or Vmax/Km values for mutant versus wild-type tRNAs.
ʰ Relative rate at 40 nM tRNA. See (71) and (155) for additional anticodon mutants of E. coli tRNAWc'.
ⁱ A. Garcia and R. Giege, unpublished.
ʲ F. Fasiolo, personal communication.
ᵏ E. F. Tinkle and O. C. Uhlenbeck, unpublished.
TABLE III
ANTICODON-DEPENDENT RECOGNITION OF NONCOGNATE tRNAs in Vitroᵃ
Organism
tRNALc
E. coli
Arglb (cognate) Met,"
E . coli
Gln, (cognate) Metflr Met,,
Gln,
+ Gln,
(cognate)
Trr,
E. coli
Met, (cognate) Val," Trp" Phe Ile,c
Yeast
E. coli
Phe (cognate) Tvr< Thr," (cognate) Met,"
Yeast
Tyrr (cognate) Phec
E. coli
Val," (cognate) Met,"
Anticodon" CCGl CAU-CCG CAU CUG CAU+CUA CAU CAU-CUA U*UG, CUG CCA-CUA CCA CAU UAC-CAU UAC CCA-CAU CCA GAA-CAU C7AU-+CAU C7AU G3AA GJIAjGAA GGU
CAU-r G G U CAU GW GAA-rCUA UAC CA U-* UAC UAU CAU
Apparent Relative K, ( p M ) V,
1.1 5.2
1.0 0.015
0.12 2.2 >100
1.0 0.05
0.19 6.7
1.0 0.21
0.6 1.2 >loo 2.9
1.0 1.9 0.1
>loo
0.093 1.20 2.50 0.20 0.9
1.0 0.015 0.002 1.0 0.03
1.0 5.0
1.0 0.5
Relative specificity' or activity 1.0 3.3 X 0.8 x 10-7 1.0 2.8 X 10-3 2 x 10-6 4.6 X 1.0 5.9 x 10-3 <10-7 1.0 1.0 1.5 X 10-6 2 x 10-2 2 x 10-6 Normal activity Active Inactive 1.0 1.2 x 10-3 0.7 x 10-4 1.0 7 x 10-3 7 x 10-7 1.0 Increased activity 1.0 0.1
4 x 10-5 5 x 10-6
Reference 100 100 100
97 97 97 69 80 80 80 43 43 43 g
g
h 95 95 105 105
105 44 44 44 106
43 43 43 43
ᵃ As in Table II footnotes. Noncognate tRNAs are from the same organism as cognate tRNAs.
ᵇ As in Table II footnotes.
ᶜ As in Table II footnotes.
ᵈ As in Table II footnotes.
ᵉ [Vmax/Km]noncognate/[Vmax/Km]cognate.
ᶠ The wild-type anticodon for tRNAArg is ICG.
ᵍ L. H. Schulman and H. Pelka, unpublished.
ʰ B. Bahramian, C. P. Lee and U. L. RajBhandary, unpublished.
anticodon is the dominant recognition element for these synthetases. Other mutant tRNAs are aminoacylated less efficiently than the wild-type species, suggesting that important recognition elements are also to be found outside of the anticodon (see Sections IV and V).
2. In Vivo STUDIES

In vivo data also provide strong evidence of the important role of the anticodon in the recognition of cognate tRNAs and discrimination against noncognate tRNAs by E. coli synthetases. Abelson, Miller, and co-workers have constructed genes for amber suppressor derivatives of tRNAs specific for all of the amino acids not represented in the original collection of genetically derived amber suppressors, and have tested their amino-acid-acceptor specificity in the amber suppression assay described earlier (52, 53, 107, 108). Of the 16 different tRNAs tested that were active in translation, nine were mischarged (Table IV). Amber suppressor derivatives of Asp, Ile1, Ile2, Metm, and Val tRNAs did not insert their cognate amino acids in vivo, while derivatives of Arg, Glu, Gly, and Thr2 partially retained their original identity. Additional genetic experiments also strongly support a role for the anticodon in the recognition of threonine tRNAs by E. coli ThrRS (102, 109-111). All of the completely or partially mischarged amber suppressor tRNAs accepted either Lys or Gln (Table IV), indicating the importance of the anticodon for the in vivo recognition of tRNAs by LysRS and GlnRS as well.

Although it is tempting to conclude that the anticodon is not important for in vivo recognition of the other nine amber suppressor tRNAs that specifically retain their original identity, this conclusion is incorrect. Use of anticodon derivatives of tRNAfMet in the initiation assay described earlier shows anticodon-dependent mischarging of this tRNA by synthetases specific for Ile, Phe, and Val (Table V). Phe is one of the tRNAs that retain their identity as an amber suppressor. The importance of anticodon base A36 (Fig. 2) in the recognition of tRNAs by PheRS is shown by the high level of insertion of Phe by tRNAfMet containing the wild-type Phe anticodon GAA, and the absence of Phe in protein initiated by tRNAfMet containing the closely related GAU36 and GAC36 anticodons. The amber suppressor derivative of tRNAPhe retains A36 in the amber anticodon CUA36, and this partially accounts for the retention of Phe identity by this tRNA. A36 is not sufficient to specify Phe, however, since other derivatives of tRNAfMet that contain A36 insert no Phe (67a). Sites outside of the anticodon also contribute to the identity of E. coli tRNAPhe (see Section V,C). The apparent paradox of the retention of Phe identity by tRNAPhe(CUA) and recognition of all three Phe anticodon bases by PheRS is resolved by the following scenario: tRNAPhe(CUA) retains sufficient recognition elements to be aminoacylated by PheRS in vivo (undoubtedly with significantly reduced efficiency). The amber suppressor tRNA is overproduced several hundred fold (52, 53), partially compensating for its lower affinity for PheRS; the level of charged tRNAPhe(CUA) is sufficient for translation of the relatively low concentration of UAG codons in the in vivo suppression assay; and tRNAPhe(CUA) is a poor substrate for GlnRS and LysRS due to the presence of negative elements outside of the anticodon which prevent mischarging by these synthetases.

Isoacceptor Val tRNAs contain either U or G at position 34 of the anticodon. The remaining two anticodon bases, A35 and C36, are critical for in vivo recognition of tRNAs by ValRS, since tRNAfMet containing the anticodon GAC36 inserts only Val, while tRNAfMet(GAU36) and tRNAfMet(GC35C) insert no detectable Val (Table V; 67a). A summary of key anticodon bases for in vivo recognition of tRNAfMet and nonsense suppressor tRNAs by different synthetases derived by analyses of this kind is included in Table VI. It should be noted that the in vivo data obtained by both the nonsense suppression and initiation assays are in excellent agreement with in vitro data, where available.

TABLE IV
EFFECT OF THE AMBER ANTICODON CUA ON THE MISCHARGING OF E. coli tRNAs in Vivoᵃ

Mischargedᵇ
AspMe Ile, (748)f (16%)r Ile,. Metme Val,e
(a).
Partially mischargedᶜ
Arg (55%)p G h A (17%)f,g Gly, (37%)J Met,$ Thr, (86%). su+7 Trpd
Not mischargedᵈ
A h CYS Gln, (su+2) GlY, HisA Leu, (su+6) LYs Phe ProH Ser (su+l) Tyr (su+3)

ᵃ Based on the insertion of amino acids into position 10 of DHFR-amber unless otherwise indicated. All of the tRNAs contain a CUA anticodon. In addition, AspM contains A, A,, . U,, and A,; Ala,, GluA, and HisA contain A,; and ProH contains residues 1-26 and 44-76 from tRNAPro and residues 27-43 from tRNAPhe(CUA). The suppressor tRNAs are derived from the following wild-type tRNAs: Ile,, tRNA:"(GAU); Ile,, tRNAp(C,AU); Val, tRNA?'(U*AC); Arg, tRNAtrp(ICG); Gly,, tRNA?''(CCC); Gly,, tRNAglY(U*CC); Thr,, tRNAThr(UCU), Ala,, tRNA;'"(GGC); Gln,, tRNAg'"(CUG); Leu, tRNAp(U*AA); ProH, tRNAPro(CGG); Ser, tRNA,s"(CCA). The effect of changes in base modifications, if any, has not been assessed. All tRNAs were overproduced except Al%, Gly,, HisA, and Trp. Data for AI%, AspM, GluA, Gly,, Gly,, HisA, Ile,, Ilez, Lys, Met,,,, ProH, Thr,, and Val are from 107 and 108. Data for Arg are from 55; for Cys and Phe, from 52; for Gln,, Leu,, and Ser, from 26; for Tyr, from 59; and for Trp, from 79-81. Mischarging of Metf with Gln is based on in vitro (97, 69) and in vivo (113) data. No data are presented for Asn, which is inactive as an amber suppressor tRNA. U* in the anticodon of tRNAy" is uridine-5-oxyacetic acid, and in the anticodon of tRNA:IY and t R N A p is an unknown modified U.
ᵇ Mischarging is at least 95% based on DHFR-amber sequence analysis.
ᶜ Partially mischarged tRNAs insert a mixture of cognate and noncognate amino acids into DHFR-amber. The percentage of noncognate amino acid inserted is indicated; however, it should be noted that this value does not necessarily correspond to the percentage of in vivo mischarging. For example, tRNATrp(CUA) inserts 90% Gln, 10% Trp, but is charged equally with Gln and Trp in vivo (81).
ᵈ Less than 5% of any other amino acid inserted into DHFR-amber.
ᵉ Mischarged with Lys.
ᶠ Mischarged with Gln.
ᵍ Also 6% Tyr and 6% Arg.

FIG. 2. (A) Cloverleaf structure of a typical class-I tRNA, numbered according to 112. Bases conserved in all E. coli tRNAs are indicated. Labeled regions: acceptor stem, D stem, D loop, T stem, T loop, variable loop, anticodon stem, anticodon, “discriminator” base, and 3' end. (B) Three-dimensional structure of yeast tRNAPhe (1, 2).
TABLE V
ANTICODON-DEPENDENT MISCHARGING OF E. coli tRNAfMet in Vivoᵃ

tRNAfMet anticodonᵇ    Cognate amino acidᶜ    Amino acid inserted
GAU                    Ile                    Ile (>90%)
GAA                    Phe                    Phe (>90%)
GAC                    Val                    Val (>90%)
CUA                    -                      Gln (ND)

ᵃ Based on the insertion of amino acids into position 1 of pseudo-lacZ (Phe and Val; 67) or position 1 of DHFR-start (Ile; L. Pallanck and L. H. Schulman, 67a). tRNAfMet GAU, GAA, and GAC anticodon derivatives were expressed in vivo at a level comparable to that of endogenous tRNAs. tRNAfMet(CUA) was overproduced in vivo and was shown to initiate protein synthesis from a UAG codon at the start site of the chloramphenicol-acetyltransferase gene. The protein was not sequenced; however, complementary in vitro experiments showed that Gln was inserted by the tRNA (113). ND, not determined.
ᵇ Changes in base modifications have not been assessed.
ᶜ Amino acid corresponding to the anticodon in the mutant tRNAfMet.
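The "cognate amino acid" column of Table V follows from antiparallel base pairing between anticodon and codon. The Python sketch below derives the complementary codon for each anticodon in the table and looks it up in the standard genetic code. It uses strict Watson-Crick pairing only, ignoring wobble pairing and modified bases, and the function name is an illustrative choice.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {"M": "Met", "I": "Ile", "F": "Phe", "V": "Val", "*": "stop (amber)"}


def codon_for_anticodon(anticodon):
    """Return the codon (5'->3') that pairs with an anticodon written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


for anticodon in ("CAU", "GAU", "GAA", "GAC", "CUA"):
    codon = codon_for_anticodon(anticodon)
    meaning = THREE_LETTER[CODON_TABLE[codon]]
    print(f"anticodon {anticodon} reads codon {codon}: {meaning}")
```

The wild-type initiator anticodon CAU pairs with AUG, and the CUA derivative pairs with the amber codon UAG, which has no cognate amino acid; this is why that row of the table shows a dash.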
As illustrated by the example of Phe, it is not safe to assume that anticodon recognition is unimportant for the other amber suppressor tRNAs that retain their identity in vivo. Several such tRNAs, in addition to Phe, retain one or more bases present in the wild-type anticodon: Cys, A36; His, U35; and Tyr, U35A36. In vitro studies show that U35 is a recognition element for TyrRS (101a). However, it is likely that amber suppressor tRNAs that retain their identity have important recognition elements outside of the anticodon, and they may also contain negative elements for GlnRS and LysRS. Direct investigation of the effects of base changes at each site of the anticodon on the efficiency of aminoacylation by cognate synthetases will be required to determine the role of the anticodon in recognition of these tRNAs. Such experiments have been carried out only on tRNAAla (36). This tRNA requires no specific anticodon bases for recognition by AlaRS (Table II). The recognition of five Ser tRNAs containing different anticodon sequences by a single E. coli SerRS also suggests that the anticodon plays no role in the recognition of tRNAs by this enzyme. Table VI summarizes current knowledge on the role of the anticodon in recognition of specific E. coli tRNAs.
TABLE VI
ROLE OF THE ANTICODON IN THE RECOGNITION OF E. coli tRNAs BY E. coli AMINOACYL-tRNA SYNTHETASESᵃ

Anticodon recognized: Arg, Gln, Gly, Ile, Lys, Met, Phe, Thr, Trp, Tyr, Val
~~~~~~~
An ticodon recognition elements
,Y
c,
’
u,
’
G,
G5 c36 ’
b
usc C, A,. G , . A,.
U, A,
Unknown: Asn, Aspᵈ, Cys, Gluᵈ, His, Leu, Pro
Anticodon not recognized: Ala, Serᵉ
c3-5 u, ’
Cmc urnc ‘43-5
’
c,
ᵃ See Tables II-V, XIV, and XV for references. Anticodon-dependent insertion of an amino acid in vivo and/or in vitro aminoacylation studies have been used to assign the E. coli tRNAs that contain recognition elements in the anticodon. Y34 in Gln is assigned on the basis of the X-ray crystal structure of the tRNAGln·GlnRS complex and unpublished data of M. Jahn and D. Söll.
ᵇ Anticodon recognition sites are unknown.
ᶜ Additional anticodon recognition sites may exist.
ᵈ The anticodon is required for in vivo identity (107, 108). Data on recognition are unclear.
ᵉ Serine isoacceptor tRNAs contain base changes at all three positions of the anticodon; therefore, the anticodon probably does not contain recognition elements for SerRS, but this has not been directly determined.
3. E. coli METHIONINE SYNTHETASE
The strongest evidence that the anticodon contains the major determinants for recognition of tRNAs by a synthetase comes from the case of E. coli MetRS (43, 82, 96-100). An extensive set of mutants of both tRNAfMet and tRNAmMet shows that all three anticodon bases are required for efficient aminoacylation by the cognate synthetase, with the largest effects seen at C34 in the “wobble” position (Table II). In addition, transfer of the Met anticodon to E. coli tRNAVal, tRNAPhe, or tRNAIle leads to efficient mischarging of these tRNAs by MetRS (Table III). The importance of C34 for in vivo recognition of Met tRNAs is illustrated by the loss of Met insertion by tRNAfMet on substitution of G34AU for the normal C34AU anticodon (Table V) and partial retention of Met identity by tRNAfMet having the anticodon C34CA (67a).
Comparison of the primary sequences of tRNAs efficiently aminoacylated by MetRS shows that, aside from the bases conserved in all E. coli tRNAs, only the anticodon bases are common (Fig. 3). However, small effects of sequence changes at other sites have been observed. For example, transfer of the Met anticodon to E. coli tRNATrp increases the specificity for aminoacylation (V/Km) of this tRNA by MetRS 15,000-fold, but the specificity is still only about 1/50th of that of wild-type Met tRNAs (Table VII). Conversion of G73 to A73 in tRNATrp(CAU) increases V/Km 10-fold, and an additional change from G3·C70 to C3·G70 raises the activity of the tRNA to that of cognate Met tRNAs. Independent studies of the role of base 73 in aminoacylation of tRNAfMet have shown that A73 is not a recognition element for MetRS (Table VII; 30). Substitutions at this site lead to only small (two- to three-fold) changes in Vmax, in the order A73 > U73 > G73, C73. Since A and U share no functional groups, direct positive contacts between MetRS and base 73 are unlikely. U73 is also found in bacteriophage T5 tRNAfMet, which is expected to be aminoacylated by MetRS in vivo. We conclude that G73 is a negative element for MetRS, which somewhat weakens interaction with noncognate tRNAs having this base. This could be particularly important for maintaining fidelity in cases like that of wild-type tRNATrp, which has C34 in the anticodon. The magnitude of the negative effect on MetRS produced by G73 varies with the tRNA context, being greater in tRNATrp than in tRNAfMet (Table VII). The small effect (three- to four-fold) of the base-pair change at position 3·70 is also a context effect. G3·C70 is found in E. coli tRNAVal, which is efficiently charged by MetRS (Table VII), and A3·U70 is also an acceptable sequence at this site in tRNAfMet (B. L. Seong, C. P. Lee and U. L. RajBhandary, unpublished). As different nucleotides are functional at all sites outside of the anticodon (Fig. 3), the anticodon bases contain all of the base-specific recognition elements for E. coli MetRS. Functional groups shared by two different nucleotide bases could emerge as additional recognition elements at other sites.

TABLE VII
AMINOACYLATION OF tRNAs WITH E. coli MetRS

tRNA transcripts(a)                          Apparent Km (µM)   V (µmol/min/mg)   V/Km          Relative V/Km
tRNAMet(CAU)                                 1.1                1.9               1.7           9 x 10^5
tRNAVal(CAU)                                 1.2                1.5               1.3           7 x 10^5
tRNATrp(CAU)G73→A73 + G3·C70→C3·G70          1.2                1.5               1.3           7 x 10^5
tRNATrp(CAU)G73→A73                          1.4                0.7               0.5           2 x 10^5
tRNATrp(CAU)                                 2.9                0.08              0.03          0.15 x 10^5
tRNATrp(CCA)                                 >100               -                 2 x 10^-6     1
tRNAVal(UAC)                                 >100               -                 2 x 10^-6     1

Native tRNAs(b)                              Apparent Km (µM)   Relative Vmax     Extent of aminoacylation (%)
tRNAfMet(CAU)A73                             1.1                1.0               100
tRNAfMet(CAU)A73→U73                         1.1                0.64              99
                →G73                         1.1                0.36              57
                →C73                         1.1                0.34              48

(a) tRNAs were prepared by in vitro transcription. Data are from 43 and from L. H. Schulman and H. Pelka, unpublished.
(b) tRNAs were prepared by the addition of N73CCA-OH to tRNAfMet missing the four 3'-terminal nucleotides using T4 RNA ligase (30).

FIG. 3. Composite structure of tRNAs aminoacylated by E. coli MetRS with kinetics similar (within a factor of 2 or 3) to cognate E. coli Met tRNAs (24, 114-117 and B. L. Seong, C. P. Lee and U. L. RajBhandary, unpublished). Open circles indicate the sites of bases conserved in most or all class-I E. coli tRNAs (see Fig. 2). A21 is also conserved in all class-I tRNAs except tRNACys. Letters indicate the positions of other conserved bases. Arrowheads indicate the positions of known major recognition elements. Dots indicate the positions of sequence variation. Small dots indicate sites where two different bases have been found; medium dots, where two unbonded bases that share no functional groups or three different bases have been found; large dots, where all four bases have been found. In addition, the size of the D-loop can vary from seven to nine nucleotides, the variable loop can contain four or five nucleotides, and a base-pair is not required at position 1·72 and several other internal sites. Base modifications have been ignored in compiling the composite.
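The specificity values in Table VII follow directly from the kinetic constants. The short Python sketch below recomputes V/Km and the relative V/Km for four of the transcripts from the tabulated apparent Km and V values. The dictionary and function names are illustrative only, and the baseline of 2 x 10^-6 is the V/Km listed in the table for tRNATrp(CCA). Small differences from the printed values are expected because the table entries are rounded.

```python
# Apparent Km (uM) and V (in the units used in Table VII) for selected transcripts.
# The row labels are shorthand for the table rows, not official names.
TABLE_VII = {
    "tRNAMet(CAU)": (1.1, 1.9),
    "tRNAVal(CAU)": (1.2, 1.5),
    "tRNATrp(CAU) G73->A73": (1.4, 0.7),
    "tRNATrp(CAU)": (2.9, 0.08),
}

# V/Km listed in Table VII for tRNATrp(CCA), used here as the reference value.
BASELINE_V_OVER_KM = 2e-6


def specificity(km, v):
    """Return V/Km, the specificity for aminoacylation."""
    return v / km


def relative_specificity(km, v, baseline=BASELINE_V_OVER_KM):
    """Return V/Km expressed relative to the baseline substrate."""
    return specificity(km, v) / baseline


if __name__ == "__main__":
    for name, (km, v) in TABLE_VII.items():
        print(f"{name:24s} V/Km = {specificity(km, v):.3f}  "
              f"relative = {relative_specificity(km, v):.2e}")
```

Dividing the relative value for tRNATrp(CAU) G73->A73 by that for tRNATrp(CAU) gives the roughly 10-fold effect of the discriminator base change discussed in the text.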
B. Anticodon Recognition in Yeast tRNAs
Much less information is available on the role of the anticodon in the recognition of yeast tRNAs. Tables II and III summarize the effect of anticodon base substitutions on the recognition of cognate and noncognate tRNAs by yeast Ala, Asp, Met, Phe, Tyr, and Val synthetases. In considering these data, it should be realized that, in general, yeast synthetases discriminate less well than E. coli synthetases between cognate and noncognate tRNAs. E. coli synthetases commonly show a specificity for aminoacylation of cognate tRNAs of 10^5 to 10^8, while yeast synthetases select cognate tRNAs with a specificity closer to 1W to 10^6. The contribution of individual recognition elements, including those in the anticodon, to overall specificity is thus expected to be significantly lower for the yeast enzymes. Strong effects (>10^2) of single anticodon base changes on the recognition of cognate yeast tRNAs have been observed for yeast Asp, Met, and Val synthetases. Genetic studies have also implicated the anticodon as a recognition element for yeast MetRS (119). Single-base changes in the anticodons of yeast tRNAPhe and tRNATyr decrease the efficiency of aminoacylation by the cognate synthetase in vitro; however, the magnitude of the effect of such changes on recognition by these enzymes is much smaller, in the range of 3- to 14-fold (Table II). Nevertheless, the effects of individual base substitutions are roughly additive, creating large effects on activity in tRNAs containing multiple mutations. In addition, the anticodon bases compose a significant part of the overall recognition set, at least in the case of tRNAPhe (see Section V,B), and switching the anticodons of tRNAPhe and tRNATyr leads to increased mischarging by the corresponding noncognate synthetase (Table III).
Base changes in the anticodon of yeast tRNAAla have little or no effect on the recognition by yeast AlaRS, as was observed for the E. coli Ala synthetase.
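The rough additivity noted above for yeast tRNAPhe and tRNATyr can be made concrete with a small calculation. If each substitution lowers V/Km by some factor, and the effects are additive on a logarithmic (free-energy-like) scale, then the combined fold effect is simply the product of the individual factors. The Python sketch below illustrates this. The 3- and 14-fold values are the ends of the range quoted in the text and are combined purely for illustration; no specific double mutant is implied, and the function names are hypothetical.

```python
import math


def combined_fold_effect(fold_effects):
    """Combine per-substitution fold decreases, assuming independent effects.

    Additivity on a log scale is equivalent to multiplying the fold changes.
    """
    return math.prod(fold_effects)


def log10_contributions(fold_effects):
    """Return each substitution's contribution on a log10 scale."""
    return [math.log10(f) for f in fold_effects]


if __name__ == "__main__":
    single_changes = [3.0, 14.0]  # illustrative values from the 3- to 14-fold range
    print("combined fold effect:", combined_fold_effect(single_changes))
    print("log10 contributions:", log10_contributions(single_changes))
```

Under this assumption, two modest single-base effects already combine to an effect of more than an order of magnitude, which is the sense in which multiple mutations create large effects on activity.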
C. Summary
The anticodons of 11 E. coli tRNAs clearly contain one or more important recognition elements for cognate synthetases, and it is likely that this number will increase. In addition, two other E. coli tRNAs have been shown to have identity elements in the anticodon. Although the anticodon is not required for the recognition of tRNAAla, and possibly tRNASer, these and all other E. coli tRNAs probably contain identity elements in the anticodon that are crucial for the discrimination of cognate and noncognate tRNAs by synthetases in vivo. Where information is available, yeast tRNAs follow a pattern similar to that observed in E. coli. Thus, it is expected that many eukaryotic tRNAs have retained essential recognition elements in the anticodon as well, as directly demonstrated for bovine tRNATrp (103).
IV. Role of the Acceptor Stem and the "Discriminator" Base at Position 73
Aside from the anticodon, the region of tRNA structure most frequently implicated in synthetase recognition is the domain adjacent to the 3'-terminal CCA sequence, containing the first 3 bp of the acceptor stem (positions 1·72, 2·71, and 3·70) and the fourth base from the 3' end (position 73) (Fig. 2). The latter base has been suggested to be a universal "discriminator" site that assists synthetases in sorting tRNA substrates (120); the acceptor stem has been postulated to be the site of the earliest recognition elements in the evolution of tRNAs (121).
A. Alanine Synthetases
Among E. coli tRNAs, the acceptor stem contains recognition elements for Ala, Gln, His, and Ser, and this region has been implicated in recognition by several other synthetases. The most dramatic results have been obtained with Ala tRNAs, in which a single G3·U70 base-pair is a major recognition element for AlaRS both in vivo (35, 54) and in vitro (35, 36, 59). Amber suppressor tRNA derivatives of Ala tRNAs undergo a large loss of activity on conversion of G3·U70 to a standard Watson-Crick base-pair or on substitution of a U3·G70 sequence (Table VIII). In addition, transfer of the G3·U70 base-pair to amber suppressor derivatives of tRNAPhe, tRNACys, and tRNALys converts each of these tRNAs into efficient Ala-inserting tRNAs. The mutant Cys and Lys tRNAs insert only Ala; however, tRNAPhe(CUA)G3·U70 inserts both Ala and Phe, indicating that PheRS can still recognize the mutant tRNAPhe. Mutations at several additional sites (Table VIII) lead to almost complete conversion of the identity of tRNAPhe to that of Ala. This could result from a more favorable interaction of the altered tRNA with AlaRS and/or a less favorable interaction with PheRS.
McClain et al. (56) have shown that sequences other than G3·U70 allow the insertion of Ala into protein by tRNAAla(CUA) with low efficiency. Some of these sequences contain neither G3 nor U70. These workers have therefore suggested that a "helix irregularity," rather than a specific sequence at position 3·70, is recognized by AlaRS. However, in vitro data argue that some feature of the G3·U70 base-pair is specifically recognized, as no activity is observed with tRNAs containing G3·C70, A3·U70, or U3·G70 using high concentrations of purified enzyme (Table IX). A minihelix consisting of the acceptor and T-stems of tRNAAla plus the unmodified T-loop (Fig. 4) is aminoacylated with a specificity only a fifth of that of the native tRNA, indicating that this truncated RNA contains the major determinants for recognition by AlaRS (122). A microhelix containing only the acceptor stem and a seven-membered loop is aminoacylated with a fiftieth of the specificity of the intact tRNA, indicating loss of contacts that improve both binding and the efficiency of aminoacylation.
In vitro footprinting of the tRNAAla·AlaRS complex shows that the enzyme protects phosphates in the acceptor stem on the 3' side of residues 64-70 from nuclease attack (123). Base changes at positions 1·72, 2·71, 5·68, 6·67, and 7·66 in the acceptor stem, and 49·65, 50·64, and 51·63 in the TΨC-stem allowed insertion of Ala by tRNAAla(CUA) in vivo (35), indicating that none of these sites is essential for recognition by AlaRS. In addition, base-pairs 5·68, 6·67, 7·66, 49·65, 50·64, and 51·63 are not conserved in amber suppressor derivatives of B. mori and human tRNAAla, which are efficiently aminoacylated by E. coli AlaRS in vitro and insert only Ala into DHFR-amber in E. coli (60). It has been suggested that bases 16, 17, 20, and 60 play a role in Ala identity (54); however, these bases are not conserved in the B. mori and human Ala tRNAs. In addition, the low levels (4-6%) of insertion of Lys brought about by mutations at these sites in E. coli tRNAAla(CUA) would very likely be blocked by the presence of a wild-type Ala anticodon containing G35, making the significance of the Lys mischarging unclear. Sites outside of the acceptor and T-stems are likely to be involved in Ala identity by making negative contacts with noncognate synthetases.
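The "a fifth" and "a fiftieth" specificities quoted above for the minihelix and microhelix can be checked against the kinetic constants in Table IX. The Python sketch below converts apparent Km from micromolar to molar, computes kcat/Km, and expresses each value relative to intact tRNAAla(UGC). The pairing of Km and kcat values with substrates is as read from Table IX and should be treated as an assumption, and the function and dictionary names are illustrative.

```python
def kcat_over_km(kcat_per_s, km_micromolar):
    """Return kcat/Km in M^-1 s^-1 from kcat (s^-1) and apparent Km (uM)."""
    return kcat_per_s / (km_micromolar * 1e-6)


# (apparent Km in uM, kcat in s^-1), as read from Table IX.
SUBSTRATES = {
    "tRNAAla(UGC)": (2.2, 1.0),
    "minihelixAla": (9.1, 0.9),
    "microhelixAla": (35.9, 0.3),
}


def relative_specificities(substrates, reference="tRNAAla(UGC)"):
    """Return kcat/Km for each substrate relative to the reference substrate."""
    ref_km, ref_kcat = substrates[reference]
    ref_value = kcat_over_km(ref_kcat, ref_km)
    return {
        name: kcat_over_km(kcat, km) / ref_value
        for name, (km, kcat) in substrates.items()
    }


if __name__ == "__main__":
    for name, rel in relative_specificities(SUBSTRATES).items():
        print(f"{name:14s} relative kcat/Km = {rel:.3f}")
```

The drop for the minihelix comes almost entirely from the higher Km, whereas the microhelix loses in both Km and kcat, in line with the statement that contacts improving both binding and catalytic efficiency are lost.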
TABLE VIII
EFFECT OF MUTATIONS ON THE RECOGNITION OF tRNAs in Vivo BY E. coli AlaRS
tRNA (anticodon change)
tRNAtL"(GGC)+(CUA)
tRNAPhe(GAA+(CUA)
Additional mutations
Amino acid inserted
(%)a
Ala, 96 Ala, 18; Lys, 29; Gln, 44 Ala, 89 Ala, 55; Lys, 29; Gln, 6 Ala, 83;Gln, 12 Ala, 90 Ala, 75; Lys, 12 Ala, 97
Phe, 100 Ala. 24; Phe, 76 Ala, 63; Phe, 37. Ala, 96; Lys, 4
Suppression efficiency ( % ) b . c 4.126 0.076 0.636 0.3Qb 0.556 0.36h
0.22" 21c Inactive' Inactivec Very weakly activec 12= 786d 106
ND 146
tRNACys(GCA+(CUA)
None c3
tRNAL~5(U,,UU+(CUA)
'
G70*G3
'
u70
None G3
. C70+G3
.
G3 . AiO A3 c i o '
tRNAz'Y(U*CC+(CUA)
None G , C,O+G, None None '
tRNATYr(QUA+(CUA)
u3
3
'
U7"
.
'
7 '0
u3 .
'
7 '0
(1 X (17 x (1 X (17 X
AlaRS)
AlaRS) AlaRS) AlaRS)
CYS Ala, > 9 Y
Xbd
LYS, 94 Ala, 94 Ala, 39; Lys, 49 Ala, 22; Lys, 69 Gly, 16; Gln, 84
31" 34" 1.66
Gly, 5; Gln, 95 Tyr onlyf Tyr onlyf Tyr, 95; Gln, 5.f Ala, 95; Gln, Y
ND
3.6" 24
3" ND ND ND ND
(a) Insertion into DHFR-amber. The percentage of each amino acid at position 10 is given. Note that this may not correspond to the percentage of aminoacylation of the tRNA in vivo. All tRNAs were overproduced. See Table II, footnote d, for the definition of anticodon minor bases.
(b) Suppression of an amber allele in the lacI-Z fusion gene (52). Data are from 56 unless otherwise noted. See this reference for additional mutants.
(c) Suppression of a TrpA(UAG) amber allele (125). Data are from 35.
(d) Data are from 52.
(e) Data are from 35.
(f) Data are from 59.
FIG. 4. (A) Structures of E. coli tRNAAla, minihelixAla, microhelixAla, and minihelixTyr. Base changes from the wild-type sequence are indicated by arrows. (From 8 with permission.) (B) Sites of known major recognition elements in E. coli Ala tRNAs are indicated by arrows.
Retention of very weak Ala-inserting activity by tRNAAla(CUA) mutants containing non-G3·U70 sequences may indicate the presence of additional weak recognition elements at other sites. Sequestering of low levels of aminoacylated tRNA by EF-Tu·GTP may also contribute to the in vivo activity of these mutants. However, it is likely that these weak suppressor tRNAs would show no measurable in vivo activity if they were not significantly overproduced. Two nonstandard base-pairs, G3·A70 and A3·C70, also lead to the insertion of low levels of Ala by tRNALys(CUA) (56), suggesting that mismatches at position 3·70 may assist in adapting the structure of a tRNA to the surface of AlaRS in a manner leading to inefficient aminoacylation in the absence of G3·U70. These results are somewhat reminiscent of those seen in the mischarging of tRNATyr(CUA) acceptor-stem mutants with Gln, where introduction of sequences different from those present in tRNAGln leads to mischarging by GlnRS (see Section IV,C). Several suppressor tRNAs to which the G3·U70 sequence was transferred failed to insert Ala in vivo, including tRNA2Gly(CUA) and tRNATyr(CUA) (Table VIII). The activity of tRNATyr(CUA)G3·U70 was examined in vitro and found to be a tenth of that of tRNAAla (Table IX). Comparison of the kinetic
TABLE IX
AMINOACYLATION OF RNAs WITH E. coli AlaRS(a)
Column headings: RNA; Apparent Km (µM); kcat (s^-1); kcat/Km (M^-1 s^-1 x 10^-5); Relative kcat/Km
tRNA:'"(UGC) tRNAf"(CUA) + U3,4+A, tRNATYr(CUA)+ C3G,o-*G,. U7, MinihelixA'a MinihelixTyrC, G7,+G3 . U7, Minihelixc~'C3. G7O+G3 . u70 + U,,+A7, MicrohelixAla tRNA;'"(CUA)G, U,,+G, C, '4, ' G,o '
MinihelixcW, . G7O+G3 MinihelixAlaA7,+ N7,
c70
u70
2.2 2.9 14.0 9.1 8.8 8.8
1.0 1.8 0.6 0.9 0.5 0.3
35.9
0.3 0.078 0.02 No activity at 4-pM tRNA, 20-pM AlaRS No activity at 4-pM tRNA, 20-pM AlaRS No activity at 4-pLM tRNA, 20-pM AlaRS No activity at 2-pM RNA, 0.75-pM AlaRS Rate A,, S- C7, > U,, > G7,
>90
4.5 6.2 0.43 1.0 0.53 0.32
1.0 1.38 0.10 0.22 0.12 0.07
(a) Data are from 36, 59, 122, and 126. The concentration of Ala in the assays is subsaturating; however, this does not greatly affect the kinetics of aminoacylation by AlaRS (127).
parameters for aminoacylation of the tRNA by TyrRS and AlaRS in vitro and estimates of the endogenous levels of the two synthetases suggested that the intracellular concentration of AlaRS was too low to compete effectively with TyrRS for tRNATyr(CUA)G3·U70. Elevation of the AlaRS concentration by introducing a plasmid carrying the alaS gene resulted in insertion of Ala by the tRNA in vivo (Table VIII). The failure of tRNA2Gly(CUA)G3·U70 to insert Ala in vivo could be due to more favorable interaction of this tRNA with GlnRS (Table VIII) and/or to the presence of negative elements for AlaRS. A missense suppressor derived from wild-type tRNALys(U*UU) containing a G3·C70→G3·U70 mutation was isolated earlier in genetic studies and shown to insert either Gly or Ala in vivo (124). Suppressor activity is lost when cells containing the mutant tRNA and a temperature-sensitive AlaRS are grown at an intermediate temperature, indicating that Ala is inserted by this tRNALys derivative (F. T. Pagel and E. J. Murgola, unpublished). The mutant Lys tRNA is also known to accept Ala in vitro.
The G3·U70 base-pair was predicted by sequence analysis to be a recognition element for E. coli AlaRS (128, 129), as it is uniquely present in Ala tRNAs in E. coli. This structural feature has also been preserved in higher organisms (112). Early studies using reannealed acceptor-stem fragments derived from yeast tRNAAla showed that the acceptor stem contains sufficient information for in vitro aminoacylation by yeast AlaRS (130). The G3·U70 base-pair has recently been shown to be required for in vitro aminoacylation of human and B. mori Ala tRNAs by homologous and heterologous Ala synthetases (60), suggesting that this sequence is an important recognition element for all of the Ala enzymes.
Although tRNACys(CUA)G3·U70 inserts only Ala in vivo, this tRNA was inefficiently aminoacylated by AlaRS in vitro (35). Recent studies (126) indicate that this may be due largely to the presence of U73 in tRNACys. A minihelix containing the acceptor and T-arms of the Cys tRNA plus a change of U73 to A73 was aminoacylated in vitro by AlaRS with a specificity only a third of that of the Ala minihelix, while the U73-containing minihelix was inactive (Table IX). Substitution of any other base in place of the wild-type A73 sequence in the Ala minihelix also resulted in a significant reduction in both the rate and extent of aminoacylation, indicating that A73 is also a recognition element for AlaRS (Fig. 4). The effect of changes at A73 appears to be mainly on kcat, while changes at G3·U70 strongly affect both kcat and Km (36, 126). A change of A73 to U73 in the amber suppressor derivative of tRNAAla has no detectable effect on the identity of the tRNA in vivo, indicating that G3·U70 is a dominant recognition element for this tRNA (Table VIII).
B. E. coli Serine Synthetase
The in vivo amber suppression assay has been very effectively used to study the structural requirements for recognition of tRNAs by E. coli SerRS. The original identity swap type of experiment was carried out by Normanly et al. (19), inserting 12 base changes into the structure of tRNALeu(CUA) to convert it into a Ser amber suppressor tRNA. Subsequent studies show that only eight of these changes (in addition to the anticodon changes) are required for the Leu-Ser conversion (Fig. 5; 7). Six of the important sites are located near the acceptor end of the tRNA. Four generate sequences conserved in all E. coli Ser tRNAs and bacteriophage T4 tRNASer: G1·C72, G2·C71, and G73. Sequences found at position 3·70 in wild-type Ser tRNAs are either A3·U70 or U3·A70; however, only A3·U70 brought about the desired conversion to Ser identity, possibly by blocking interaction of the tRNA with LeuRS (7). Thus, position 3·70 contains an identity element, but not necessarily a recognition element for SerRS. Substitution of 2 bp at positions 1·72 and 3·70 in tRNASer(CUA) leads to a large loss of Ser suppressor activity and a gain of Gln-inserting activity (61) (Table X). The changes are to sequences important for the recognition of tRNAs by GlnRS (see Section IV,C); however, it is not clear that GlnRS would recognize native tRNASer(U*GA) containing the anticodon base G35 and having the same acceptor-stem mutations. Such a tRNA might retain Ser identity.
In addition to the mutations in the acceptor-stem region, the complete conversion of tRNALeu(CUA) to a Ser-inserting tRNA required an additional change at position 11·24 in the D-stem (Fig. 5). A C11·G24 sequence is found
FIG. 5. (A) Base changes involved in the in vivo conversion of E. coli tRNALeu identity from Leu to Ser (7, 19). (B) Composite structure of E. coli and bacteriophage T4 Ser tRNAs (112). Due to uncertainty in the alignment of the D-stem, this region of the selenocysteine-inserting tRNASer has been omitted from the composite (see text). The large variable loop is not conserved in size or sequence. See the legend to Fig. 3 for definition of the symbols.
in T4 tRNASer and all four E. coli Ser tRNAs; however, the Ser tRNA that normally inserts selenocysteine at the site of UGA codons in specific E. coli proteins (131) has unusual D-stem and acceptor-stem structures (132). There are 8 bp in the acceptor stem, followed by two unpaired bases (9 and 10), and a 4-bp D-stem. By the conventional cloverleaf arrangement, the base-pair in the position equivalent to 11·24 in the other Ser tRNAs is a G·C sequence, raising questions about the exact role of the 11·24 base-pair in the recognition of tRNAs by SerRS. Three of the five Leu tRNAs contain C11·G24, suggesting that C·G does not inhibit LeuRS. Thus, structural features other than primary sequence may play a role in the interaction of SerRS with this region of its tRNA substrates.
The long variable arm of Ser tRNAs also plays a role in discrimination between cognate and noncognate tRNAs by SerRS (101a). In vitro identity swap experiments designed to convert tRNATyr into a Ser-accepting tRNA suggest that the orientation of the variable arm, rather than its primary sequence, influences the interaction of tRNAs with SerRS. Comparison of the conserved sequences in Ser tRNAs (Fig. 5) suggests
TABLE X MUTATIONSAFFECTING in Vivo AMINOACYLATIONOF tRNAs
tRNA tRNA;*''
tRNAp
Amino acid inserted ( % ) c
Mutations" U*AA+CUA All (see Fig. 5) Omit G I . U,,+G, Omit C, G7,+GP Omit G , . C,,+A, Omit A,,+G,, Omit U,, Az4+Cl, VIGA-CUA VIGA+CUA + GI VIGA+CUA + A,
Leu, 99 Ser, 92 Leu, 15; Gln, Leu, 91; Gln, Leu, 72; Gln, Leu, 99 Leu, 38; Gln, Ser only
C,, C,,
, U . G,, '
'
C7z+U, U,O+G,
BY
'
A72
'
C70
78; Ser, < I 9 6; Ser, 20
39; Ser, 16
Gln, >90; Ser, 5-6
SerRSa Suppression efficiency (%)d.e 52-5gd 33-49d 12d 5-9d ll-12d 20-35d 35-48d 47e 47p
(a) Data for tRNALeu are from 7 and 19 and from Normanly and Abelson, unpublished. Data for tRNASer are from 61.
(b) Unnumbered sequences are anticodon sequences.
(c) Insertion into DHFR-amber at position 10.
(d) Efficiency of suppression of derivatives of lacI-Z containing amber mutations at different sites in the lacI portion of the fusion gene. tRNAs were overproduced.
(e) Efficiency of suppression at one amber allele of lacI-Z. tRNAs were not overproduced.
that there are few other sites outside of the acceptor stem region that could contribute to base-specific recognition by SerRS.
C. E. coli Glutamine Synthetase
In addition to important sites in the anticodon, E. coli GlnRS also recognizes key structural features in the acceptor-stem region of tRNA substrates. Early evidence for this came from genetic studies, in which mutants of an amber suppressor Tyr tRNA were isolated that insert Gln at the site of UAG codons in vivo and accept Gln in vitro (Table XI; 46-52). The first mutation obtained converted A73 to the Gln sequence G73 and led to insertion of Gln, but not Tyr, in vivo. This mutation has the dual effect of increasing activity with GlnRS and reducing activity with TyrRS (49). Mutations at position 1·72 were subsequently isolated that converted the wild-type Tyr G1·C72 sequence to weak or mismatched base-pairs and led to the insertion of both Tyr and Gln in vivo. These sequences did not correspond to the Gln sequence U1·A72, suggesting that an easily disrupted base-pair, rather than a specific primary sequence, favors interaction with GlnRS. This was further suggested by later experiments with an amber suppressor derivative of E. coli tRNAfMet, which contains a C1·A72 mismatch at the 5' terminus and is also a substrate for GlnRS in vitro (97). An A·C mismatch at position 2·71 in
AMINOACYLATION
tRNA tRNA:'"(CUG) tRNAfMet(CAU-&UA)
Additional mutations None None 1'
'
A72+U1 1'
cl
tRNATYr(QUA+CUA)
. . G7?. . G72 +
A73jG73 GI . C 7 p A I . C7, GI - u,, '1
G2
'
'71jAP
'
. 7' 2 . 7' 1 '
tRNATyr(QU A)
OF
'71
G* . u,, Wild-type tRNA
tRNAs
TABLE XI BY E. coli GlnRS in Vioo AND in Vitro" Relative V,,,.J&b
6325 29 9 1 5
In oitro activity Incomplete charging at high [GlnRS] Complete charging at high [GlnRS] Complete charging at high [GlnRS] Complete charging at high [GlnRS] ND Active at high [GlnRS] Inactive at high [GlnRS] Inactive at high [GlnRS] Rate relative to tRNA2'" at 1 0 - ~ MtRNA'
(a) Data on tRNAfMet are from 69 and data on tRNATyr are from 46-51. ND, Not determined.
(b) Apparent Km at subsaturating amino acid concentration.
(c) Amino acids are inserted by tRNATyr derivatives into T4 am H36 head protein.
(d) No Tyr, according to 48; some Tyr, 46.
(e) Data are from 133.
Amino acid inserted in oiooc Gln ND ND ND ND Gln > Tyrd Gln Gln, 20; Tyr, 80 Neutral amino acid Gln, 30;Tyr, 70 Gln and Tyr TYr TYr TYr
+ Tyr
tRNATyr(CUA) also allowed in vivo mischarging by GlnRS, although this sequence change is actually away from the wild-type G2·C71 sequence of tRNAGln. Again, the data suggest that unpairing of the acceptor stem facilitates interaction with GlnRS (49). tRNAfMet(CUA) and the position 1·72 mutants of tRNATyr(CUA) contain A73, indicating that G73 is not essential for GlnRS recognition. This conclusion is also consistent with the results on the Gln-inserting amber suppressor tRNAs (Table IV), where only three of the five tRNAs mischarged by GlnRS have G73 (two have A73). Conversion of the C1·A72 sequence in tRNAfMet(CUA) to a "glutamine" U1·A72 base-pair actually reduces the specificity for aminoacylation by GlnRS (Table XI; 69), suggesting that G73 plays a more important role when no mismatch is present at 1·72. The stronger C1·G72 base-pair further reduces the activity of tRNAfMet(CUA), and in this structural context, G73 is seen to enhance interaction with GlnRS fivefold.
The X-ray structure of the tRNAGln·GlnRS complex (12) reveals that the base-pair at position 1·72 is broken, and base-specific contacts are made at both positions 2·71 and 3·70 by GlnRS (see Section VII). G73 is involved in an RNA·RNA contact that facilitates the conformational change at the 3' end of the tRNA. Each of the recognition elements in the acceptor stem contributes to the overall interaction between GlnRS and its tRNA substrates, but none is essential. Of the suppressor tRNAs mischarged by GlnRS, only tRNATrp(CUA) contains all of these elements. In addition, individual changes at each site do not eliminate the Gln acceptor activity of tRNAGln(CUA) in vivo (M. J. Rogers and D. Söll, unpublished). U35 in the anticodon makes a much larger contribution quantitatively to recognition by GlnRS, increasing the specificity of the enzyme for tRNATrp (C35→U35) by 10^5 and for tRNAfMet (A35·U36→U35·A36) by 10^3 (Table III).
Nevertheless, the sum of the recognition elements in the acceptor stem makes a significant contribution to the recognition of tRNAs by GlnRS.
D. Other E. coli Synthetases
tRNAHis is unique among E. coli tRNAs in having only three unpaired bases at the 3' terminus, plus an acceptor stem containing 8 bp (134) (Fig. 6). The role of this unusual structure has been investigated by examination of a series of tRNAHis derivatives prepared by in vitro transcription (Table XII; 42). In one set of experiments, the extra base at the 5' end (designated G-1), which is paired with C73, has been removed to generate a "standard" tRNA 3' terminus. This change causes a large decrease in the specificity for aminoacylation of the tRNA by HisRS, indicating the importance of the structure for recognition by the enzyme. The primary sequence at the -1·73 position is also important for efficient aminoacylation by HisRS. C73 is
FIG. 6. The unusual structure at the acceptor end of E. coli tRNAHis (134). Known major recognition elements (42) are indicated by arrows.
unique to His tRNA in E. coli (112), and conversion to any other nucleotide causes significant loss of activity, whether or not G-1 is present. Substitution of G-1·C73 by an A-1·U73 base-pair also reduces activity to below 0.1%. Most of the observed effect is on the maximal velocity of the reaction. These data indicate that the G-1·C73 base-pair is an important recognition element for E. coli HisRS, affecting the positioning of the 3' end in the catalytic step of the aminoacylation reaction. The extra 5' G, but not the base-pair, has been preserved in the His tRNAs of yeast and higher eukaryotes (112). The 5'-terminal G-1 is encoded in E. coli, but is added post-transcriptionally to the cytoplasmic tRNAHis of higher organisms (135, 136).
The nature of the base at position 73 plays an important role in the recognition of E. coli tRNAAsp by AspRS (Table XIII; 40). Alterations of G73 reduce activity to 1/200 or less. In this case, significant changes in both Km and Vmax are observed. The nucleotides that substitute best for G73 (U > A) share some functional groups with G, suggesting that direct contacts may be made by the enzyme at this site.
The discriminator base also plays a role in the recognition of tRNATyr by E. coli TyrRS, since conversion of A73 to G73 greatly reduces the Tyr-inserting activity of su+ tRNATyr(CUA) in vivo and the rate of aminoacylation by TyrRS in vitro (49, 101a). Modeling of the interaction between Bacillus stearothermophilus TyrRS and the acceptor stem of tRNA substrates has suggested that recognition of A73 may play a critical role in positioning the 3' terminus of tRNATyr at the active site (137; see Section VIII).

TABLE XII
ROLE OF THE EIGHT-MEMBERED ACCEPTOR STEM IN THE AMINOACYLATION OF E. coli tRNAHis TRANSCRIPTS(a)

Sequence change           Apparent Km (µM)   Vmax (pmol/min/mg x 10^-2)   Relative V/Km
None                      4.0                100                          1.0
G-1·C73→G-1·A73           3.7                7.8                          0.084
        G-1·U73           4.8                1.5                          0.013
        G-1·G73
Delete G-1
Delete G-1 + C73→A73
             →U73
             →G73
G-1·C73→A-1·C73
        A-1·A73
        A-1·U73
        A-1·G73
10
6.1
10
4.1
(a) Data are from 42.
TABLE XIII
EFFECT OF CHANGES IN THE "DISCRIMINATOR" BASE ON THE AMINOACYLATION OF tRNA TRANSCRIPTS BY E. coli AspRS(a)

tRNA                      Apparent Km (µM)   Relative Vmax   Relative V/Km
tRNAAsp(GUC)G73           0.32               1.0             1.0
tRNAAsp(GUC)G73→A73       1.4                0.008           0.0018
               →U73       5.0                0.08            0.0051
               →C73                                          <0.0015
(a) Data are from 40.

E. Summary
A number of E. coli tRNAs contain important recognition elements for their cognate synthetases in the acceptor stem and/or at the discriminator site (position 73). Neither AlaRS nor SerRS appears to recognize the anticodon of cognate tRNAs, and for these synthetases, critical recognition elements are found in the acceptor-stem region. GlnRS binds to sites in the anticodon as well as the acceptor stem of tRNAGln (see Section VII). The unusual acceptor stem of tRNAHis serves as a recognition element for HisRS, and the discriminator base is important for the recognition of tRNAAla, tRNAAsp, tRNAGln, tRNAHis, tRNASer, and tRNATyr. Other E. coli synthetases, such as MetRS (Table VII) and PheRS (Table XVII), do not strongly discriminate one base from another at position 73. However, many E. coli tRNAs may contain identity elements at this site, for example, sequences protecting against mischarging by those synthetases that strongly interact with base 73 in their cognate tRNAs. Similarly, tRNAs recognized primarily through contacts at sites outside of the acceptor stem may contain important negative identity elements in this helical region near the amino-acid attachment site, which is likely to approach closely the surface of a synthetase as it discriminates between cognate and noncognate tRNAs.
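The discriminator-base data for AspRS in Table XIII lend themselves to a quick consistency check. In the Python sketch below, the relative V/Km of each variant is recomputed as the relative Vmax divided by the fold increase in apparent Km, and the variants are then ranked. The numbers are those listed in Table XIII for G73, A73, and U73; the names and the helper functions are illustrative assumptions rather than part of the original analysis.

```python
# (apparent Km in uM, relative Vmax) for tRNAAsp(GUC) discriminator variants.
ASP_VARIANTS = {
    "G73": (0.32, 1.0),
    "A73": (1.4, 0.008),
    "U73": (5.0, 0.08),
}


def relative_v_over_km(km, rel_vmax, km_wild_type=0.32):
    """Relative V/Km = relative Vmax divided by the fold change in Km."""
    return rel_vmax / (km / km_wild_type)


def rank_variants(variants):
    """Return variant names ordered from most to least active."""
    return sorted(
        variants,
        key=lambda name: relative_v_over_km(*variants[name]),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_variants(ASP_VARIANTS):
        value = relative_v_over_km(*ASP_VARIANTS[name])
        print(f"{name}: relative V/Km = {value:.4f} (about 1/{1 / value:.0f})")
```

The recomputed values reproduce the U > A ordering given in the text, and the best substitute, U73, comes out at roughly 1/200 of the wild-type specificity.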
V. Other Recognition Profiles
A. E. coli Arginine Synthetase
Comparison of tRNA sequences by computer analysis (128) led to the prediction that nucleotides 20 and 35 in E. coli Arg tRNAs might be recognition elements for ArgRS. The role of A20 in the D-loop of tRNAArg has been investigated, using an amber suppressor tRNA derivative (55, 108). Deletion of A20 or conversion to U20 resulted in almost complete loss of Arg-inserting activity (Table XIV). Introduction of A20 into the D-loop of tRNAPhe(CUA) resulted in little production of mature tRNA from the plasmid-encoded gene. A second mutation at position 59 restored synthesis and resulted in the insertion of mainly Arg into dihydrofolate reductase-amber (DHFR-amber). Positions 16, 17, 20, 59, and 60 contain some single-stranded bases that are brought into close proximity by tertiary structure interactions between the D- and T-loops (see Fig. 2). This has been referred to as a "variable pocket," since this cluster contains different numbers and types of bases in different tRNAs. Early in the analysis of tRNA structure, it was suggested that this cluster, located in an accessible patch between sets of highly conserved residues, could serve as a recognition system for the discrimination of tRNAs by synthetases (138). A20 could be a recognition element which directly interacts with ArgRS, since it is unique to Arg tRNAs in E. coli. The exact position of this residue in the three-dimensional structure of the tRNA is dependent on the structure of the variable pocket. In the case of tRNAArg, residues in the T-loop do not appear to contribute directly to recognition by ArgRS, since none of these bases is conserved in Arg tRNAs (Fig. 7).
TABLE XIV
EFFECT OF MUTATIONS ON THE RECOGNITION OF tRNAs in Vivo BY E. coli ArgRS(a)

tRNA            Mutations(b)                 Amino acid inserted (%)(c)      Efficiency (%)(d)
tRNAArg(ICG)    CUA                          Arg, 37; Lys, 55                30-62
                CUA + A20→U20                Lys, 91                         12-28
                CUA + A59→U59                Arg, 38; Lys, 50                29-59
                CUA + A20 deletion           Arg, 5; Lys, 91(e)              43-55(e)
                UCA                          Arg only(f)
tRNAPhe(GAA)    CUA                          Phe, 86                         11-22
                CUA + A20 insertion          Poor tRNA synthesis
                CUA + A20 + U59→A59          Arg, 72; Lys, 6; Thr, 16        3-11
                CUA + U59→A59                Phe, 91; Leu, 7                 4-13

(a) Data are from 55 unless otherwise noted.
(b) Unnumbered sequences are anticodon bases 34, 35, and 36.
(c) Insertion into DHFR-amber at position 10.
(d) Efficiency of suppression at two amber alleles of lacI-Z.
(e) Data are from 107 and 108.
(f) Data are from 138a.
The amber suppressor derivative of tRNAArg contains a base change at C35, a residue implicated in ArgRS recognition by earlier biochemical studies (83). Consistent with an important role for C35, tRNAArg(CU35A) inserts mainly Lys in vivo (55), while the corresponding opal suppressor tRNA having a UC35A anticodon inserts only Arg (138a). In vitro experiments using transcripts of tRNAArg were carried out to evaluate the relative contributions of A20 and C35 to recognition of tRNAs by ArgRS (Table XV; 100). The primary sequence of E. coli tRNAMet differs from the consensus sequence of conserved sites in Arg tRNAs by only these two nucleotides (Fig. 7). Conversion of the Met anticodon to an Arg anticodon resulted in a 40,000-fold increase in Vmax/Km for aminoacylation by ArgRS, while substitution of A20 increased the activity 1000-fold. Introduction of both changes yielded a tRNAMet derivative that had nearly normal Arg acceptor activity (Table XV). Thus, A20 and C35 are both important recognition elements for ArgRS, and the bases at these sites make a large contribution to both the Km and Vmax for aminoacylation. Few other conserved sites are found in the consensus structure of Arg tRNAs (Fig. 7). Nevertheless, tRNAArg(CUA) having a deletion of A20 and a base change at C35 retains the ability to insert a low level of Arg in vivo (Table XIV). This raises the possibility of additional recognition elements, or of a weak compensating interaction with C, in the absence of C35, as has been suggested for tRNAArg(CUA) having a deletion of base 26 (55).
FIG. 7. (A) Base changes involved in the in vitro conversion of E. coli tRNAMet into an efficient arginine acceptor tRNA (100) are indicated by arrowheads. Other bases shared by E. coli tRNAMet and E. coli and bacteriophage T4 Arg tRNAs are also indicated on the cloverleaf structure. (B) Composite structure of E. coli and bacteriophage T4 Arg tRNAs (112). The sequence of tRNAfB(ACG) is in question (55) and has been excluded from the composite. Arrowheads indicate major known recognition elements. See the legend to Fig. 3 for definition of the symbols.
B. Yeast Phenylalanine Synthetase

Earlier in vitro studies revealed the importance of the anticodon nucleotides to the recognition of tRNAs by yeast PheRS (Table II; 101). Subsequent experiments show that bases at positions 20 and 73 also play an important role in the aminoacylation of tRNAPhe (Table XVI; 37, 139-141).

TABLE XV
AMINOACYLATION OF tRNA TRANSCRIPTS WITH E. coli ArgRS(a)

tRNA                     Apparent Km (μM)    V (μmol/min/mg)    Relative V/Km
tRNAArg(CCG)             1.1                 2.0                10 × 10^6
tRNAMet(CCG) U20→A20     1.2                 0.9                5 × 10^6
tRNAMet(CCG)             5.2                 0.03               40 × 10^3
tRNAMet(CAU) U20→A20     4.4                 0.0007             1 × 10^3
tRNAMet(CAU)             >100                -                  1

(a) Data are from 100.
TABLE XVI AMINOACYLATION OF tRNA TRANSCRIPTS WITH YEAST PheRSa tRNA Yeast tRNAPhe(CAA) Yeast tRNAPhe(GAA)Gzo+U,, A7dJ7, E . coli tRNAPhe(GAA)U,, + c, G7n+G, c70 E . coli tRNAPhe(CAA)Uz,+G,, + c, G7,-rC3 c7, Yeast tRNAMet(GAA)A,,+G, + G,, . C,+C,, C , + A,g+U5g Yeast tRNAAr~(GAA)Cz~+G,o+ U,, + c59+u59 + C73+A7, Yeast tRNAArg(GAA)Cm+Czo + U,, + c59+us9 Yeast tRNATyr(GAA) U,u~o_,U,-2+~,, + other changes6
Apparent K , (PW
k,,,
Relative
(min-1)
kcatlKm
0.35
160
2.10
80
1.0 0.083 0.083
'
'
1.80
35
0.042
'
'
0.42
100
0.52
0.41
130
0.68
0.38
110
0.64
1.00
60
0.13
0.36
250
1.5
Data are from 37 and 139. b o t h e r changes: C , . G72+G, . c72, U S A7,+C2 U,, A,, A,, A,,-C,, . G,, and A4,pG&. 0
G,,,
c, . G7o+G3.
c70, c12. G,+
Conversion of G20 to U20, or of A73 to G73, reduces kcat/Km for Phe acceptor activity to 1/12th. G20 is unique to tRNAPhe in yeast and is one of the variable pocket nucleotides. High-resolution NMR studies of tRNAPhe transcripts containing G or U at position 20 indicate that the structures of the two tRNAs are nearly identical (142), suggesting that the change in activity on mutation of this site is due to the loss of specific contacts with PheRS. Footprinting studies of the yeast tRNAPhe·PheRS complex show that the protein contacts the entire surface of the tRNA (143), consistent with the location of widely separated recognition elements. Extensive studies of the interactions involved in maintaining the three-dimensional structure of tRNAPhe reveal that none of the specific bases involved is required for PheRS recognition; however, the proper tertiary structure must be maintained (141). In addition, changes in unbonded bases at positions 16, 17, 59, and 60 have less than a two-fold effect on activity, indicating that G20 is the only required base in the variable pocket. Transfers of the GAA anticodon, G20, and A73 to other yeast tRNAs yield mutants with nearly wild-type Phe acceptor activity (Table XVI). In addition, conversion of U20 in E. coli tRNAPhe to G20 makes the E. coli tRNA a good substrate for the enzyme. Comparison of the sequences of all of the tRNAs that are efficient substrates for PheRS (Fig. 8) indicates that there are few additional conserved sites.
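The "1/12th" figure quoted above can be checked from the kinetic constants in Table XVI. The following is a minimal sketch, not part of the original analysis: the function and variable names are illustrative, and it assumes that the second data row of Table XVI (apparent Km 2.10 μM, kcat 80 min-1) belongs to the G20→U20 variant and that the first row (0.35 μM, 160 min-1) is the wild-type transcript.

```python
def relative_specificity(km, kcat, km_ref, kcat_ref):
    """Return kcat/Km of a variant divided by kcat/Km of the reference tRNA."""
    return (kcat / km) / (kcat_ref / km_ref)


# Values as they appear in Table XVI (apparent Km in uM, kcat in min^-1).
# The row-to-variant assignment is an assumption made for this sketch.
WILD_TYPE = {"km": 0.35, "kcat": 160.0}   # yeast tRNAPhe(GAA) transcript
G20_TO_U20 = {"km": 2.10, "kcat": 80.0}   # G20 -> U20 variant

rel = relative_specificity(
    G20_TO_U20["km"], G20_TO_U20["kcat"],
    WILD_TYPE["km"], WILD_TYPE["kcat"],
)
fold_reduction = 1.0 / rel

print(f"relative kcat/Km = {rel:.3f}; about 1/{fold_reduction:.0f} of wild type")
```

Under these assumptions the ratio comes out at 0.083, which matches the relative kcat/Km listed in the table and corresponds to a 12-fold reduction.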
FIG. 8. Composite structure of tRNAs aminoacylated by yeast PheRS with kinetics similar (within a factor of 2 or 3) to cognate yeast tRNAPhe (139-141). Arrowheads indicate major known recognition elements. See the legend to Fig. 3 for definition of the symbols. (Adapted from 139 with permission.)
C. E. coli Phenylalanine Synthetase

Both in vivo and in vitro studies indicate that the anticodon contains major recognition elements for E. coli PheRS (Tables V and XVII; 67, and E. F. Tinkle and O. C. Uhlenbeck, unpublished). In vivo experiments using the amber suppressor derivative of tRNAPhe have suggested that additional sites, including U20, G27·C43, G28·C42, G44, U45, U59, U60, and A73, contribute to Phe identity, based on mischarging of tRNAs containing mutations at these sites (54, 57). Since tRNAPhe(CUA) retains its identity in vivo with only one
TABLE XVII AMINOACYLATION OF E . coli tRNAPheTRANSCRIPTS WITH E . coli PheRSa Apparent
K , (PM)
Mutation None C3 ' G7n-tC3 G o G3 C, + GAA-tAAA CUA '
+
ul6-tcl6
+ Cl7-tUl7
+ U2n-tGzo +
'27
'
c43+A27
+ U45+G, + u.5Ll-tc.59
+ u,+c, +
A73-tG73 c73
u73
'
'43
Relative
k,
0.20 0.11
100 70
0.22 0.19 1.10 0.30 0.11 0.95 0.19 0.23 0.35 0.35
80 80 110 100 110 70 70 80 80
70
Relative
k,llK, 1.0 1.2 <10-2 Greatly reduced 0.7 0.8 0.2 0.7 0.12 0.15 0.7 0.7 0.5 0.4
aE. F. Tinkle and 0. C. Uhlenbeck, unpublished.
wild-type anticodon base (A36), it is likely that additional recognition elements exist outside of the anticodon. However, the interpretation of results obtained in the absence of G34 and A35 is unclear, since these bases contain strong recognition elements that would enhance interaction with PheRS and inhibit mischarging by noncognate enzymes. Examination of the effect of mutations on in vitro aminoacylation of E. coli tRNAPhe by E. coli PheRS (Table XVII) shows that the largest effects outside of the anticodon occur at positions 20, 45, and 59 (five- to eight-fold reductions in kcat/Km), suggesting that these sites play some role in tRNAPhe recognition. It remains to be determined whether other positive or negative identity elements are present in E. coli tRNAPhe. The exact role of nucleotides in the variable pocket is also unclear. Native tRNAPhe contains a modified base, dihydrouridine, at position 20. This base occurs commonly at position 20 in E. coli tRNAs. In addition, it is structurally quite different from U, which also functions well at position 20.
D. Summary

Both the anticodon and the variable pocket have been implicated in the recognition of E. coli Arg tRNAs and yeast and E. coli Phe tRNAs. While specific contacts may be made with cognate synthetases in the variable pocket domain, particularly at position 20 in the D-loop, the structural complexity of this region of tRNA structure leaves some doubt as to the nature of the interaction. The variable pocket also contains a metal-ion binding site (144), which could play a role in tRNA-synthetase recognition. In addition, changes in the anticodon loop affect the structure at the junction of the D- and T-loops (145, 146), suggesting that interaction of synthetases with bases in the anticodon could affect recognition sites in the variable pocket domain in a complex manner. The distribution of recognition elements over a larger number of widely separated sites also places more stringent constraints on the overall structure of the tRNAs recognized by the enzymes in this group.
VI. Role of Modified Bases

A. E. coli Isoleucine Synthetase

A minor species of E. coli Ile tRNA that reads the codon AUA (147) contains a modified base at the wobble position of the anticodon (148). This base, called lysidine, is derived from cytidine by attachment of a Lys residue to the 2 position of the pyrimidine ring (Fig. 9). Prior to modification, the tRNA contains a CAU Met anticodon and is efficiently aminoacylated in vitro by E. coli MetRS, but not by IleRS (149). Conversely, the modified tRNAIle is a poor substrate for MetRS, but a good substrate for the cognate synthetase. Thus, the lysidine base plays a dual role in the identity of the tRNA, blocking mischarging by the noncognate enzyme and enhancing interaction with IleRS. A post-transcriptional modification also occurs in the AUA-reading Ile tRNAs of bacteriophage T4 and spinach chloroplasts, which similarly encode tRNAIle genes having a CAU anticodon (112). The major species of E. coli tRNAIle has the anticodon GAU. If the base at the wobble position is directly recognized by IleRS, either two different interactions occur with G34 and lysidine 34, or an equivalent functional group is contacted (Fig. 9). C34 may also be a negative element for IleRS, while any base other than C34 strongly inhibits the recognition of tRNAs by E. coli MetRS, as noted earlier. Substitution of the Ile anticodon GAU for the Met anticodon CAU in E. coli tRNAfMet converts this tRNA into an Ile-inserting species (Table V; 67a). This suggests direct recognition of one or more anticodon bases by IleRS; however, post-transcriptional modifications resulting from the anticodon substitution have not yet been determined.

FIG. 9. Structures of nucleotides present at position 34 of the anticodons of E. coli Ile tRNAs before and after post-transcriptional modification (148). (a) Guanosine, (b) lysidine, and (c) cytidine.
B. Yeast Arginine Synthetase

In vitro studies indicate that base modifications play an important role in the discrimination of yeast ArgRS against yeast tRNAAsp (150). Transcripts of wild-type tRNAAsp containing no modified bases are aminoacylated by ArgRS with a specificity 500-fold that of fully modified native tRNAAsp, reducing the discrimination to only 20-fold (Table XVIII). Mischarging of unmodified tRNAAsp was not observed with yeast HisRS, PheRS, or ValRS (150). tRNAAsp contains unique modified bases at only three sites (Ψ13, Ψ32, and m1G37). Native tRNAAsp has a Km for aminoacylation by ArgRS only 12-fold higher than that of tRNAArg, indicating significant affinity of the native noncognate tRNA for the synthetase. Mischarging of native tRNAAsp is mainly avoided by a reduced kcat for aminoacylation (Table XVIII), as observed for a number of other noncognate interactions (151). The unmodified tRNA has an altered conformation, which may allow it to be more readily accommodated to the tRNA binding site of ArgRS, and/or one or more of the modified bases in native tRNAAsp may interfere directly with aminoacylation by the noncognate enzyme (150).
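The fold values quoted in this section follow from the Km and kcat entries of Table XVIII. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, and the numbers are copied from the ArgRS rows of the table as they appear in this text (Km in μM, kcat in s-1).

```python
# (Km in uM, kcat in s^-1) for aminoacylation by yeast ArgRS, from Table XVIII.
ARGRS_SUBSTRATES = {
    "native tRNAArg": (0.073, 0.70),
    "native tRNAAsp": (0.86, 0.00065),
    "tRNAAsp transcript": (0.11, 0.045),
}


def specificity(name):
    """Specificity constant kcat/Km for one substrate."""
    km, kcat = ARGRS_SUBSTRATES[name]
    return kcat / km


def discrimination(cognate, noncognate):
    """Fold preference of the enzyme for the cognate over the noncognate tRNA."""
    return specificity(cognate) / specificity(noncognate)


native_disc = discrimination("native tRNAArg", "native tRNAAsp")
transcript_disc = discrimination("native tRNAArg", "tRNAAsp transcript")

# How much the loss of modified bases raises the specificity for tRNAAsp.
gain = specificity("tRNAAsp transcript") / specificity("native tRNAAsp")

# Ratio of Km values: native tRNAAsp versus native tRNAArg.
km_ratio = ARGRS_SUBSTRATES["native tRNAAsp"][0] / ARGRS_SUBSTRATES["native tRNAArg"][0]

print(f"discrimination against native tRNAAsp: {native_disc:.0f}-fold")
print(f"discrimination against the transcript: {transcript_disc:.0f}-fold")
print(f"specificity gain on removing modifications: {gain:.0f}-fold")
print(f"Km ratio (native tRNAAsp / tRNAArg): {km_ratio:.1f}")
```

With these inputs the transcript is discriminated against roughly 23-fold and its specificity is roughly 540-fold that of native tRNAAsp, in line with the rounded "20-fold" and "500-fold" values in the text; the Km ratio is close to 12.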
TABLE XVIII
MISCHARGING OF YEAST tRNAAsp TRANSCRIPT BY YEAST ArgRS(a)

tRNA                         Yeast synthetase    Km (μM)    kcat (s-1)    Relative kcat/Km(b)
Native tRNAAsp(GUC)          Asp                 0.044      0.66          1.0
tRNAAsp(GUC) transcript      Asp                 0.028      0.38          0.9
Native tRNAArg(UCU)          Arg                 0.073      0.70          1.0
Native tRNAAsp(GUC)          Arg                 0.86       0.00065       0.000079
tRNAAsp(GUC) transcript      Arg                 0.11       0.045         0.043

(a) Data are from 150.
(b) kcat/Km values are relative to the native tRNA for each synthetase.

C. Summary

Modified bases could contribute to tRNA identity by serving as positive recognition elements for synthetases or as negative elements blocking mischarging reactions. E. coli IleRS and yeast ArgRS are examples of synthetases for which modified bases clearly assist in the discrimination of
tRNAs. Enhanced activity resulting from a base modification was reported earlier, when it was shown that methylation of G10 in E. coli tRNAPhe increases its rate of aminoacylation by yeast PheRS 10-fold (152). The interpretation of this result is unclear, however, since Schizosaccharomyces pombe tRNAPhe lacks this modification and is aminoacylated with the same kinetics as yeast tRNAPhe containing the modification (153). A modified base in the middle position of the anticodon of yeast tRNATyr (Ψ35) also has a small effect on the aminoacylation of this tRNA by yeast TyrRS (Table II). The efficient aminoacylation of plant viral RNAs by yeast and plant synthetases (154) and of tRNA transcripts by cognate enzymes (Table I) suggests that modified bases have no direct positive role in the recognition of most tRNAs. Native tRNAs have a more rigid ordered structure, however, which can be mimicked in tRNA transcripts by elevating the divalent cation concentration (37). Thus, modifications enhance aminoacylation activity by "fine-tuning" the structure of tRNAs. At present, there is little direct evidence for a general role for modified bases as negative elements for tRNA identity, although modified bases in or near the anticodon are likely candidates. In vitro transcripts of yeast tRNAPhe are discriminated against by yeast TyrRS to the same extent as native tRNAPhe (37). Similarly, transcripts of E. coli tRNAMet and tRNAVal are not mischarged by E. coli GlnRS, GluRS, IleRS, LysRS, or PheRS (43), and transcripts of E. coli tRNAVal, tRNAThr, and tRNAArg are not mischarged by E. coli MetRS (43, 44, 100). The results with E. coli IleRS and yeast ArgRS suggest, however, that other specific examples of negative regulation of aminoacylation by post-transcriptional tRNA modifications will come to light as more extensive studies are carried out.
VII. The Complex of E. coli Glutamine tRNA and Glutamine Synthetase
The most exciting event in recent studies of tRNA-synthetase interactions has been the solution of the crystal structure of a complex of E. coli tRNAGln, GlnRS, and ATP at 2.8 Å resolution (12), allowing details of the molecular basis for the recognition of a tRNA by a synthetase to be revealed for the first time. This achievement was made possible by the earlier cloning and sequencing of the gene for GlnRS and overproduction of the protein and the isoacceptor tRNA2Gln(CUG) (156-158). GlnRS is a 63.4-kDa protein containing 553 amino acids arranged in an elongated structure made up of four domains (12). The surface of the protein contacts regions of the tRNA extending from the anticodon to the 3' terminus, along the inside of the L-shaped tRNA structure (Fig. 10), in a manner similar to that originally proposed by Rich and Schimmel (159) as a general binding mode for tRNA-synthetase
FIG. 10. E. coli GlnRS complexed with E. coli tRNA2Gln(CUG) and ATP. Protein α-helices are represented as tubes sequentially lettered and β-strands are shown as arrows sequentially numbered, both from the amino terminus. The dinucleotide fold domain includes residues from β-strands 1-3 plus α-helix G and β-strand 10. The acceptor-end binding domain includes the chain between the amino end of α-helix D to the carboxyl end of β-strand 8. The two β-barrel anticodon binding domains consist of β-strands 13-19 and β-strand 20 to the carboxy terminus plus β-strand 12. (From 12 with permission.)
complexes. The tRNA adopts an unusual conformation at both ends of the molecule. Bases C34 and G36 in the anticodon are unstacked, and the central anticodon base (U35) is stacked under A,, and buried in a tightly fitting protein pocket. The base-pair at the end of the acceptor stem (1·72) is broken, and the 3' terminus forms a hairpin structure in which the unbonded bases loop back toward the anticodon, rather than extending away from the body of the tRNA (Fig. 10). Base-specific contacts are made in both of these widely separated regions of the tRNA. Refinement of the crystal structure in the anticodon binding domain reveals a set of amino-acid side-chains that interact with a known major recognition element, U35 (99). These strong interactions, involving several charged amino-acid residues, explain the requirement for U at the middle position of the anticodon of GlnRS substrates, since no other base could make the appropriate hydrogen bonds with the protein (Fig. 11). Contacts are also made between the protein and each of the other anticodon bases. The protein pocket containing C34 could also accommodate a U residue, the base present in tRNA1Gln(U*UG), but not a purine. Most of the contacts to G36 could not be made with the A residue found at the corresponding position of the amber anticodon CUA36, and in vitro studies show that tRNA2Gln(CUA) is less efficiently aminoacylated by GlnRS than wild-type tRNA2Gln(CUG) (M. Jahn and D. Söll, unpublished). Thus, all three anticodon bases appear to contribute to the recognition of Gln tRNAs, with U35 having the dominant role. In addition, residues 37 and 38 on the 3' side of the anticodon interact with the protein, although it is not yet clear whether these are base-specific contacts (99). The hairpin structure at the 3' end of tRNAGln is stabilized by extensive interactions with GlnRS and by a sequence-specific RNA-RNA interaction between the 2-amino group of G73 and a phosphate oxygen of A72 (Fig. 12). The bases of residues A,, C,,, and G,, are stacked on each other, while C,, is looped out to contact side-chains of the protein. The side-chain of an aliphatic amino acid (Leu136) is wedged between base A72 of the disrupted U1·A72 base-pair and base-pair G2·C71 (Fig. 12). No sequence-specific contacts are made with base 1 or 72, in keeping with the data described earlier showing that a variety of weak or mismatched base-pairs can function at this position. However, base-specific contacts are made with the adjacent base-pairs G2·C71 and G3·C70, implicating these sites as recognition elements for GlnRS (12). Consistent with this, in vitro studies show that a G,*C,,→A,.U7, mutation in tRNA2Gln(CUA) greatly reduces the rate of aminoacylation of the tRNA by GlnRS (M. Jahn and D. Söll, unpublished). The side-chain of Asp235 contacts the 2-amino group of base G3 in the minor groove of the acceptor stem, and the main-chain carbonyl of Pro181 is hydrogen-bonded to the 2-amino group of G2. These amino-acid residues plus
FIG. 11. Schematic drawing of hydrogen-bonding interactions between the anticodon base U35 in tRNAGln and GlnRS (99). The protein binding pocket excludes cytidine due to ionic hydrogen bonds from an Arg residue to the pyrimidine O-4 and from a Glu residue to the ring N-3. Adenine and guanine are excluded by size from the binding site. (Courtesy of T. Steitz.)
FIG. 12. The acceptor end of tRNAGln in the complex with E. coli GlnRS. The side-chain of Leu136 extends from a β-turn and wedges between the bases of G2 and A72, disrupting the last base-pair of the acceptor stem, U1·A72. The enzyme stabilizes the hairpin conformation via the interaction of several basic side-chains with the sugar-phosphate backbone. An intramolecular hydrogen bond between the 2-amino group of G73 and the phosphate of A72 further stabilizes this conformation. (From 12 with permission.)
Ile183 and a buried water molecule which contacts G,C,, form a hydrogen-bonding surface which is complementary to that of the two G·C base-pairs at positions 2·71 and 3·70. Interestingly, mutation of Asp235 to Asn235 leads to a partial relaxation of specificity of GlnRS, allowing in vivo and in vitro mischarging of the amber suppressor derivative of tRNATyr and in vitro mischarging of a specific subset of wild-type E. coli tRNAs (13, 133, 160), further establishing a role for this region of the protein in tRNA recognition. The mutant enzyme aminoacylates tRNA2Gln(CUG) with one-tenth the Km of wild-type GlnRS, and much more efficiently aminoacylates the A,.U,, derivative of tRNA2Gln(CUA) than the wild-type enzyme (S. Englisch, P. Hoben and D. Söll, unpublished). A mutation from Ile129 to Thr129 in GlnRS allows weaker mischarging of su+3 tRNATyr(CUA) than observed with the Asn235 mutant (13). Ile129 lies close to the phosphate of C74, and the mutation may result in the formation of a stabilizing contact not made by wild-type GlnRS (13). The known recognition elements in tRNA2Gln(CUG) are summarized in Fig. 13. Other bases that contact the protein and that may provide additional discrimination are also indicated. Recognition elements include bases that make sequence-discriminating interactions with GlnRS, and two structural features that facilitate the essential conformational change at the 3' terminus of tRNA2Gln(CUG), the RNA-RNA interaction involving G73 and the weak base-pair at position 1·72.
FIG. 13. Major known recognition elements in E. coli Gln tRNAs are indicated by arrowheads. Additional bases that contact GlnRS in the complex are also indicated. See Sections IV,C and VII for references and discussion.
VIII. tRNA Binding Domains of Other Synthetases

E. coli AlaRS is the largest of the E. coli synthetases, having four identical subunits, each containing 875 amino acids (161). Deletion analysis has revealed that the functional domains of each subunit are arranged in an approximately linear fashion, with the amino-acid activation domain nearest the amino terminus of the polypeptide, followed by sequences required for the aminoacylation of tRNAAla, and then by an oligomerization domain near the carboxy terminus (162). Several other synthetases have an arrangement of functional domains similar to that seen in AlaRS (4). The smallest fragment of AlaRS able to complement an alaS null mutant strain is a monomeric enzyme containing amino acids 1 to 461. Further deletions show that sequences that contribute most to tRNA binding are located between residues 368 and 461 (Table XIX). In addition to interactions in this domain, the 3' end of the tRNA must also bind to amino-acid residues near the active site during the transfer step of the aminoacylation reaction. An amino-acid residue (Lys73), which lies close to the acceptor end of tRNAAla in the AlaRS·tRNAAla complex, has been identified by crosslinking and shown by site-directed mutagenesis to affect tRNA binding (163; Table XIX). Mutations in the protein that allow aminoacylation of an amber suppressor derivative of tRNAAla containing a G3·U70→G3·C70 base change have also recently been obtained (P. Schimmel, personal communication). Characterization of these mutants should lead to the identification of amino-acid residues important for recognition of the key structural feature required for tRNAAla identity. High-resolution X-ray crystal structures are available for E. coli MetRS and B. stear. TyrRS (16, 167-172); however, cocrystals of complexes containing the cognate tRNAs have not yet been obtained. Native E. coli MetRS is a dimeric protein of identical subunits, each containing 676 amino acids (173, 174). Limited trypsin digestion releases peptides from the carboxy terminus of the protein to produce a biologically active monomeric enzyme containing amino acids 1 to 547 (MetRS547) (175, 176). This form of the synthetase has been crystallized in both the presence and absence of ATP (16). Crosslinking experiments show that the tRNA binding site is located largely in the carboxy-terminal half of MetRS (177, 178), with residues close to the 3' end of the tRNA located in the amino-terminal domain near the active site (179). Coupling of a crosslinker between anticodon base C34 and Lys,= in MetRS identified a region of the protein near the anticodon binding domain (178). This domain is located at the extreme periphery of the MetRS molecule at a maximal distance from the active site (16, 167). Site-directed mutagenesis of nine amino acids in this region having side-chains potentially
SYNTHETASE MUTATIONS THATAFFECT
Synthetaseb (subunit structure)
E . coli AlaRS,,
(wt)
tRNA binding sites
4
Mutation None
TABLE XIX THE BINDINGAND/OR AMINOACYLATION OF tRNAs"
KmCor relative binding d
Relative
k.f
klK,,,
Location of amino acid in synthetase-tRNA complex
Comments
l.Od.9
1.0f
1.0
12.19
0.w
0.02
Near active site
Crosslinks to oxidized A76
0.29 2.19
1.0f 0.05.f
5.0 0.02
Dispensible domain
2.89
0.121
0.04
Dispensible domain
Coupling between dispensible and indispensible domains Coupling between dispensible and indispensible domains Complements ahS null strain Complements alas null strain Does not complement alas null strain Does not complement alas null strain
Reference
(a,)
4
Delete 809-875
1
Delete 462-875
1
Delete 386-875
1
Delete 369-875
2
None
0.8 at pH 5.0d 5 x lo-* at pH 5.0d 5 x lo-, at pH 5.0d < 5 x 10-5 at pH5.0d 1.0e
Delete 553-557 and 1.og add four unrelated AAs (RQRD)
-0.001f -0.001f
1.W
1.0
0.W
0.2
163 164
165
165
162, 166 162, 166 162, 166 162. 166
182
(continued)
TABLE XIX (Continued)
Synthetaseb (subunit structure)
E . coli GlnRS,
(wt)
kcor
tRNA binding sites
1
Mutation
relative bindingd
kf
Relative klK,
Location of amino acid in synthetase-tRNA complex
Comments
Reference
None
(a)
E . coli GlyRS a303 (wt) (a282)
2
Aspm+Asn
Contacts NH, of G ,
Mischarging mutant
GlY Ile129+Thr None
Near phosphate 74
Mischarging mutant Mischarging mutant
1.0
Cys,+Gln
0.08
13, 133, 160 13 13
8689
%A 4
Ala
Steric or conformational effect
191
0.80
191
0.01
192
Inactive Inactive
192 189, 192
Inactive
189, 192 197
a303 p575
Delete 8576-689
a303 P383 a303
Delete 0384-689 No p subunit
PM9
No a subunit
E . coli MetRS,,,
(wt)
1
None
Does not bind tRNAGly Binds tRNAGlY 1.09
1
Delete 548-676
1.09
1.0
1.0
Prow+Leu
6.0c
1.0
0.16
2.9f
2.9
(a2)
MetRS,,
Complements metG mutant strain
(a)
Near anticodon
176
18
Yeast MetRS,,
(wt)
1
Tyr,,+Ala In 531J None
62.59
1.3
0.02
2.99 8005300K
0.6 1.5 1.0
0.2 0.002
1.0a
Near anticodon base C, Near active site Near active stie
18 176 176
1.0
(4 Asp,,+Tyr
Acquires ability to acy- 119 late tRNA?, (CAUt.(CCU)
Lysffi,+Ala
0.5 0.5
Asn
0.1 0.1 u1 4
Yeast PheRS a595 (wt) (azP2) P503 a595 Pa1 E. coli ThrRS,, (wt) ( 4
(wt)
m
h
m
h
m
1960
m
196a
1.0
0
2&
Delete Pl-172 None
1
+
No binding
Inactive
0.115=
1.Of
1.0
0.025c
0.W
1.7
Glu,,+ Lys Arh-His
1.66C
1.W
0.07
None
1.4~
0.45e
1.0
Gln,,+Ala
2.7~
0.4SC
0.52
Glu,+Lys
B. stear TyrRS,,,
Presumably near anticodon Presumably near anticodon Presumably near anticodon Presumably near anticodon
196 202 Super-repressor of thrS translation Defective in repression of thrS translation
202 202
187
(a21 ~~
Near G ;
188
(continued)
TABLE XIX (Continued)
Synthetaseb (subunit structure)
tRNA binding sites
0
Mutation
K,,,c or relative bindingd
Thr,,+Ala Lys,,+Asn
0.6~ Footnote 1
Ar&+Gln Asn,,,+Ala Lys,,,+Asn
Footnote 1 >looc 1.2c
Gln Glu,,,+Asp Gln Ala Trp,,+Ala Phe Gln Arg,,-Gln Lys,+Asn Lys, and Lysm Ar&+Gln Ar&,,+Gln Arh,+Gln Arb+Gln Lys,,,-,+Asn Lys,,,+Asn Delete 320-419
kf 0.14e
Relative klK,,,
0.73
Location of amino acid in synthetase-tRNA complex
Comments
Ribose 1" Greatly reduced rate of transfer from preformed Tyr-AMP
ND 0.003e
0.52 0.007
Phosphates 69, 70n Near A,,"
0.6~ 1.4~ 0.3=
O.6ge 0.99' 0.38e
3.65 2.24 4.25
Phosphates 73, 74"
8. 3c 1.7~ 11.4~
0.10e 0.87~ 0.84e ND ND
0.04 1.62 0.23 0.062 0.036
ND ND ND ND ND ND
0.008
Additional tRNA contact in transtition state
Toxic mutation
>l o o C >28=
>looc >loo=
>l o o C >l o o C >loo=
>l o o C No binding
van der Walls contacts to GI and
0.054
0.038 0.052
0.030 0.041 Inactive
188 188 188 188 187 188 188 188 1 1
1 1
'473"
Phosphate 67" Phosphate 6 8 n Near ribose 76"
Reference
0
187 187 188 187 187 187 187 187 187 184
TvrRS,,,
1
Phe,,+Asp
tRNA binds
(a)
Inactive
Mutation at subunit interface
186
(a) Data are for the aminoacylation of cognate tRNAs. Mutations listed have little or no effect on the kinetic parameters for amino-acid activation unless otherwise noted. wt, Wild-type. ND, not determined.
(b) Subscript indicates the number of amino acids in each subunit.
(c) Km in μM.
(d) Binding relative to the corresponding wild-type synthetase.
(e) kcat (s-1).
(f) Relative Vmax.
(g) Relative Km.
(h) F. Fasiolo, personal communication.
(i) H. Bedouelle, personal communication.
(j) The in531 mutant contains Lys528→Met + Tyr531→Ala + an insertion of Glu-Leu-Ala between amino acids 531 and 532.
(k) From 203.
(l) These mutations also affect the activation of Tyr.
(m) Residues 657-659 in yeast MetRS correspond to residues 459-461 in the anticodon binding domain of E. coli MetRS.
(n) Based on the model of the B. stear. TyrRS·tRNAPhe complex (188).
(o) Equivalent to Lysm and Lys,, in E. coli TyrRS. These residues are crosslinked to the 3'-oxidized terminus of tRNATyr (180).
capable of forming hydrogen bonds to anticodon bases showed that only one mutation, Trp461→Phe, had a significant effect on the recognition of tRNAMet (18). The effect of the mutation is only on the initial complex formation, with the rate of aminoacylation remaining unchanged (Table XIX). Examination of the interaction of the Phe461 mutant enzyme with a series of anticodon derivatives of tRNAMet and several noncognate tRNAs (Table XX) suggests that Trp461 stabilizes the interaction with the Met anticodon, while blocking binding to non-Met anticodons. Trp461 interacts with the 5' end of the anticodon, possibly directly with the key anticodon base for the recognition of tRNAMet, C34, and strongly excludes anticodons containing U, G, or A at this position (18). Mutation of the adjacent Pro,, residue to Leu increases the Km somewhat, but does not alter the anticodon preference of the wild-type enzyme, suggesting that this mutation only introduces a structural alteration of the α-helix containing Trp461. The crystal structure of MetRS547 reveals that the extreme carboxy terminus of the truncated protein is connected to the amino-terminal domain and forms part of the active site of the enzyme (16, 176). The shortest protein to complement an E. coli strain having a defective chromosomal metG (MetRS) gene contains amino acids 1 to 537 (176). Site-specific changes at amino acids 528 and 531 plus a three-amino-acid insertion between residues 531 and 532 lead to a drastic increase in Km for the aminoacylation of tRNAMet (176; Table XIX), implicating the 528-531 region in interaction with the 3' end of the tRNA. Periodate-oxidized tRNAMet crosslinks through the 3'-terminal A76 to several Lys residues near the active site, including Lys,, and Lys,,,. Sequences similar to that surrounding Lys,,, in MetRS (Lys-Met-Ser-Lys-Ser) occur in several other synthetases, including a site in E. coli TyrRS

TABLE XX
AMINOACYLATION OF tRNAs BY WILD-TYPE AND MUTANT E. coli MetRS547(a)
~
~~
Wild-type MetRS,, Km
tRNA
Anticodon
(PM)
tRNA,Mel
CAU (wt) CAG UAU GAU
1.2f 0.2 63 f 16 >100 >loo
GGU tRNAnr tRNAVd
GGU UAC
>100 >100 >100
Trp,,pPhe
Relative
1.0 1.8 x 10-3
0.8 x 1.2 x 1.5 x 2.6 X 1.7 x
Km
(PM)
k,llK,
>100 >100
10-4
134 2 56
10-5
27 2 4 17 2 1 10 2 4 71 2 34
10-5 10-5 10-6
Relative k,,llK, 0.7 X 10-2 1.4 x 10-5 0.6 x 10-5 0.6 x 10-5
2.8 X 10-6 1.3 x 10-6 2.9 X 10-7
(a) Data are from 18. All tRNAs were prepared by in vitro transcription. MetRS547 is a monomeric, truncated derivative of native MetRS (see Table XIX). wt, Wild-type.
RECOGNITION OF
tHNAs
79
crosslinked to the oxidized 3' terminus of tRNATyr (180). This has led to the suggestion that sequences related to Lys-Met-Ser-Lys are important for binding the 3' end of tRNAs during aminoacylation (181). However, a Lys→Glu mutation within this sequence in E. coli MetRS inactivates the ATP-PPi exchange activity of the protein (198), and the corresponding Met-Ser-Lys sequence in E. coli GlnRS contacts ATP, not tRNAGln, in the crystal structure of the complex (12). Both observations suggest that the conserved sequence is important for the catalytic site of synthetases rather than for tRNA binding. Site-directed mutagenesis of amino-acid residues in yeast MetRS corresponding to the anticodon binding domain of E. coli MetRS reduces the aminoacylation activity of the yeast enzyme as well (F. Fasiolo, personal communication; Table XIX). Genetic experiments also reveal a mutation in yeast MetRS that compensates for an anticodon base change from CAU to CCU in the yeast initiator tRNA (119). This mutation (Asp→Tyr) is located some distance from the presumed anticodon binding domain, based on a model of the enzyme made by sequence comparison with E. coli MetRS, and has not yet been characterized biochemically.
B. stearothermophilus TyrRS utilizes a different mode of tRNA binding from those discussed above. The enzyme is a small dimer of identical subunits, each containing 419 amino acids (183). Residues in the amino-terminal domain (1 to 319) make up the active site and the dimerization interface, while residues in the carboxy-terminal domain (320 to 419) are required for tRNA binding (184). A single tRNA binds to both subunits of the enzyme, with protein contacts between the amino-terminal domain of one subunit and the acceptor stem of tRNATyr, and additional strong contacts between the carboxy-terminal domain of the second subunit and other regions of the tRNA (185).
A truncated dimer containing only amino acids 1 to 319 does not bind tRNATyr; however, dissociation of the subunits by introducing a mutation at the subunit interface yields monomers capable of binding, but not aminoacylating, the tRNA (186). An extensive set of site-directed mutants of B. stearothermophilus TyrRS has been generated to define the amino-acid residues critical for tRNA binding (187, 188; Table XIX). This analysis has revealed a set of eight positively charged amino acids that make strong contacts to the tRNA, in a manner analogous to the backbone interactions seen in the GlnRS·tRNAGln complex. Many of these residues are located in the disordered carboxy-terminal domain of the crystal structure of B. stearothermophilus TyrRS (170). However, high-resolution structures of amino acids 1 to 319 complexed with tyrosyl-adenylate and with tyrosine are available (170, 172). These structures, along with the crystal structure of yeast tRNAPhe and the TyrRS mutational analysis, allow detailed modeling of the interaction between the amino-terminal domain of the protein and the tRNA acceptor stem (187, 188).
In the model, contacts are made between a Glu, an Arg, and a Lys residue of the enzyme and phosphate groups 67-70, 73, and 74 (Table XIX). In contrast to the GlnRS complex, the TyrRS model also postulates direct protein contacts to the discriminator base, A73, a known recognition element of tRNATyr. It is suggested that this base plays a key role in orientation of the 3' end of the tRNA at the active site through interaction with a Lys residue, Glu152, and the main-chain carbonyl of an Ala residue, and through van der Waals contacts with a Trp residue. Study of a Lys mutant reveals that this amino acid makes an additional contact with tRNATyr in the transition state. Location of this Lys near A73 in the model has led to the suggestion that recognition of A73 in the transition state may govern the rate of aminoacylation of tRNAs by TyrRS (188). A76 is positioned close to a Lys and an Arg residue at the active site, and the side-chains of two further Lys residues are near the 2' and 3' OH groups of the terminal ribose in the model. As was observed in the GlnRS·tRNAGln complex, the TyrRS model reveals a tight complementarity in shape between the structure of the tRNA and the synthetase (188). Overproduction of either E. coli or B. stearothermophilus TyrRS is toxic to E. coli cells (H. Bedouelle, personal communication), suggesting that the Tyr synthetases weakly interact with other tRNAs and mischarge one or more noncognate species when present at elevated concentrations.
E. coli GlyRS, yeast PheRS and E. coli PheRS have the subunit structure α2β2, bind two molecules of tRNA per molecule of enzyme, and require both subunits for activity (189-196). Construction of deletion mutants of E. coli GlyRS has shown that the carboxy-terminal half of the β-subunit is required for tRNA aminoacylation, but not for ATP-PPi exchange (192). In contrast, the amino-terminal portion (1 to 172) of the β-subunit of yeast PheRS contains the main tRNA binding sites (196).
E. coli ThrRS regulates its own expression at the translational level by interacting with a site in the 5' leader region of its mRNA that resembles the anticodon stem and loop of Thr tRNAs (199, 200). ThrRS mutants defective in translational repression have been isolated and shown to have increased Km's for the aminoacylation of tRNAThr (201, 202). Point mutations in the mRNA stem-loop structure lead to constitutive synthesis of ThrRS. Mutants of the enzyme (super-repressors) capable of compensating for mutations in the stem structure have been isolated; they aminoacylate tRNAThr with a decreased Km (201, 201; Table XIX). In addition, these mutants increase the suppressor activity of an amber suppressor derivative of tRNAThr that is poorly aminoacylated by wild-type ThrRS (202). Attempts to isolate ThrRS mutants that compensate for mutations in the anticodon-like G.U sequence in the loop of the mRNA structure were unsuccessful, suggesting that such mutants may be toxic to cells as a result of altered specificity in the recognition of tRNA anticodons and the mischarging of noncognate tRNAs.
As can be seen from the above discussion, there is considerable diversity in the way that tRNAs interact with their cognate synthetases, and no obvious correlation exists between the mode of tRNA binding and the pattern of recognition elements present in the tRNA. Knowledge of the detailed interactions that occur in additional tRNA-synthetase complexes will be required before any common features are revealed, if they exist. Cocrystals of a complex of yeast AspRS (α2) and two molecules of tRNAAsp suitable for high-resolution X-ray analysis have recently been obtained (204). Solution of the structure of this complex is eagerly awaited.
IX. Concluding Remarks
The next few years promise exciting new developments in our understanding of tRNA-synthetase recognition. Mutational analysis of tRNAs is proceeding at a rapid pace, and should lead to the definition of major recognition sites in many tRNAs in the near future. Solution of new crystal structures and accompanying mutational studies will soon provide new insights into the tRNA binding sites of additional proteins. The wealth of information obtained from the cocrystal structure of E. coli GlnRS and tRNAGln has inspired renewed efforts in many laboratories to obtain crystals of additional tRNA-synthetase complexes. Recognition of tRNAs by GlnRS involves base-specific protein-RNA contacts, a base-specific RNA-RNA contact, and dramatic local conformational changes in the tRNA. It remains to be determined whether all or only some of these features are conserved in other tRNA-synthetase interactions, and to what extent recognition involves conformational changes in the proteins as well. The requirement for cognate tRNA in the activation of amino acids by ArgRS, GlnRS, and GluRS implies the induction of structural changes at the active sites of these enzymes on tRNA binding. The structure of the GlnRS·tRNAGln complex suggests that there may be structural connectivity between the anticodon binding domain and the active site domain. Changes in key recognition elements are often seen to affect both Km and kcat for aminoacylation, suggesting that initial binding may be followed by a reorientation of the 3' terminus of the tRNA at the active site and/or by induced changes in the enzyme structure. Early experiments suggested that synthetases may undergo conformational transitions dependent on the binding of cognate tRNAs (205).
Additional solution studies directed at resolving discrete steps in tRNA recognition and catalysis during aminoacylation and their dependence on specific sequences in the tRNA and protein would be useful in clarifying the dynamic aspects of the recognition process. Additional studies of the deacylation activity of synthetases toward mischarged tRNAs would also be useful to determine whether identical structural features in tRNAs govern the aminoacylation reaction and the aminoacyl-tRNA hydrolysis reaction catalyzed by these enzymes. In vivo approaches to the study of tRNA identity would benefit from greater flexibility in the types of test tRNAs that can be assayed, and from greater attention to the balance of tRNA and synthetase levels in the cell. The design of suitable in vivo assays remains a formidable challenge, which will occupy some of us for years to come.
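The kinetic comparisons used throughout this review, such as the relative kcat/Km values of Table XX, can be reduced to discrimination factors, that is, the fold preference of an enzyme for the cognate tRNA over a variant. The following is a minimal sketch of that arithmetic in Python. It assumes the values are read from Table XX in row order; the dictionary layout and the function names are illustrative and do not come from the original study.

```python
"""Discrimination factors from relative kcat/Km values (illustrative sketch)."""

# Relative kcat/Km for aminoacylation, normalized so that wild-type MetRS
# acting on tRNAfMet(CAU) equals 1.0. Values are transcribed from Table XX;
# the data layout and all names below are assumptions made for this example.
RELATIVE_KCAT_KM = {
    "wild-type": {
        "tRNAfMet CAU": 1.0,
        "tRNAfMet CAG": 1.8e-3,
        "tRNAfMet GAU": 1.2e-5,
    },
    "Trp461->Phe": {
        "tRNAfMet CAU": 0.7e-2,
        "tRNAfMet CAG": 1.4e-5,
        "tRNAfMet GAU": 0.6e-5,
    },
}

COGNATE = "tRNAfMet CAU"


def discrimination(enzyme: str, substrate: str) -> float:
    """Fold preference of `enzyme` for the cognate tRNA over `substrate`."""
    values = RELATIVE_KCAT_KM[enzyme]
    return values[COGNATE] / values[substrate]


def loss_of_discrimination(substrate: str) -> float:
    """Factor by which Trp461->Phe weakens discrimination against `substrate`."""
    return discrimination("wild-type", substrate) / discrimination(
        "Trp461->Phe", substrate
    )


if __name__ == "__main__":
    for substrate in ("tRNAfMet CAG", "tRNAfMet GAU"):
        print(
            substrate,
            f"wild-type {discrimination('wild-type', substrate):.3g}-fold,",
            f"mutant {discrimination('Trp461->Phe', substrate):.3g}-fold,",
            f"loss {loss_of_discrimination(substrate):.3g}-fold",
        )
```

Under these assumptions, the mutant loses roughly 70-fold in discrimination against the GAU anticodon (changed at position 34) but almost none against CAG (changed at position 36), which is in keeping with the proposal that Trp461 acts at the 5' end of the anticodon.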
ACKNOWLEDGMENTS
I would like to thank all my colleagues who provided preprints and results of unpublished studies for use in this review. I am also grateful to Gourisankar Ghosh and Leo Pallanck for a critical reading of the manuscript, and to Rita Romita for expert typing. Studies carried out in my laboratory were supported by research grants GM16995 from the National Institutes of Health and NP-19 from the American Cancer Society.
REFERENCES
1. S.-H. Kim, F. L. Suddath, G. J. Quigley, A. McPherson, J. L. Sussman, A. H. J. Wang, N. C. Seeman and A. Rich, Science 185, 435 (1974).
2. J. D. Robertus, J. E. Ladner, J. T. Finch, D. Rhodes, R. S. Brown, B. F. C. Clark and A. Klug, Nature 250, 546 (1974).
3. E. Westhof, P. Dumas and D. Moras, JMB 184, 119 (1985).
4. P. Schimmel, ARB 56, 125 (1987).
5. L. L. Kisselev, This Series 32, 237 (1985).
6. P. R. Schimmel and D. Soll, ARB 48, 601 (1979).
7. J. Normanly and J. Abelson, ARB 58, 1029 (1989).
8. P. Schimmel, Bchem 28, 2747 (1989).
9. L. H. Schulman and J. Abelson, Science 240, 1591 (1988).
10. M. Yarus, Cell 55, 739 (1988).
11. U. L. RajBhandary, Nature 336, 112 (1988).
12. M. A. Rould, J. J. Perona, D. Soll and T. A. Steitz, Science 246, 1135 (1989).
13. J. J. Perona, R. N. Swanson, M. A. Rould, T. A. Steitz and D. Soll, Science 246, 1152 (1989).
14. H. Bedouelle and G. Winter, Nature 320, 371 (1986).
15. E. Labouze and H. Bedouelle, JMB 205, 729 (1989).
16. S. Brunie, C. Zelwer and J.-L. Risler, JMB 216, 411 (1990).
17. P. Mellot, Y. Mechulam, D. LeCorre, S. Blanquet and G. Fayat, JMB 208, 429 (1989).
18. G. Ghosh, H. Pelka and L. H. Schulman, Bchem 29, 2220 (1990).
19. J. Normanly, R. C. Ogden, S. J. Horvath and J. Abelson, Nature 321, 213 (1986).
20. R. Giegé, D. Kern and J.-P. Ebel, Biochimie 54, 1245 (1972).
21. D. Kern, R. Giegé and J.-P. Ebel, EJB 31, 148 (1972).
22. M. Yarus, Bchem 11, 2352 (1972).
23. M. Mertes, M. A. Peters, W. Mahoney and M. Yarus, JMB 71, 671 (1972).
24. L. H. Schulman, in “Transfer RNA: Structure, Properties and Recognition” (P. Schimmel, D. Soll and J. Abelson, eds.), p. 311. CSHLab, Cold Spring Harbor, New York, 1979.
25. J. D. Smith, in “Nonsense Mutations and tRNA Suppressors” (J. E. Celis and J. D. Smith, eds.), p. 109. Academic Press, New York, 1979.
26. H. Ozeki, H. Inokuchi, F. Yamao, M. Kodaira, H. Sakano, T. Ikemura and Y. Shimura, in “Transfer RNA: Biological Aspects” (D. Soll, J. Abelson and P. R. Schimmel, eds.), p. 341. CSHLab, Cold Spring Harbor, New York, 1980.
27. C. Squires and J. Carbon, Nature NB 233, 274 (1971).
28. J. W. Roberts and J. Carbon, Nature 250, 412 (1974).
29. A. G. Bruce and O. C. Uhlenbeck, Bchem 21, 855 (1982).
30. H. Uemura, M. Imai, E. Ohtsuka, M. Ikehara and D. Soll, NARes 10, 6531 (1982).
31. E. Ohtsuka, S. Tanaka, T. Tanaka, T. Miyake, A. F. Markham, E. Nakagawa, T. Wakabayashi, Y. Taniyama, S. Nishikawa, R. Fukumoto, H. Uemura, T. Doi, T. Tokunaga and M. Ikehara, PNAS 78, 5493 (1981).
32. B. L. Seong and U. L. RajBhandary, PNAS 84, 334 and 8859 (1987).
33. J. J. Perona, R. Swanson, T. A. Steitz and D. Soll, JMB 202, 121 (1988).
34. T. Meinnel, Y. Mechulam and G. Fayat, NARes 16, 8095 (1988).
35. Y.-M. Hou and P. Schimmel, Nature 333, 140 (1988).
36. S. J. Park, Y.-M. Hou and P. Schimmel, Bchem 28, 2740 (1989).
37. J. R. Sampson and O. C. Uhlenbeck, PNAS 85, 1033 (1988).
38. T. Samuelsson, T. Boren, T.-I. Johansen and F. Lustig, JBC 263, 13692 (1988).
39. K. B. Hall, J. R. Sampson, O. C. Uhlenbeck and A. G. Redfield, Bchem 28, 5794 (1989).
40. T. Hasegawa, H. Himeno, H. Ishikura and M. Shimizu, BBRC 163, 1534 (1989).
41. V. Perret, A. Garcia, H. Grosjean, J.-P. Ebel, C. Florentz and R. Giegé, Nature 344, 787 (1990).
42. H. Himeno, T. Hasegawa, T. Ueda, K. Watanabe, K. Miura and M. Shimizu, NARes 17, 7855 (1989).
43. L. H. Schulman and H. Pelka, Science 242, 765 (1988).
44. L. H. Schulman and H. Pelka, NARes 18, 285 (1990).
45. W.-C. Chu and J. Horowitz, NARes 17, 7241 (1989).
46. Y. Shimura, A. Aono, H. Ozeki, A. Sarabhai, H. Lamfrom and J. Abelson, FEBS Lett. 22, 144 (1972).
47. M. L. Hooper, R. Russell and J. D. Smith, FEBS Lett. 22, 149 (1972).
48. J. D. Smith and J. E. Celis, Nature NB 243, 66 (1973).
49. J. E. Celis, M. L. Hooper and J. D. Smith, Nature NB 244, 261 (1973).
50. A. Ghysen and J. E. Celis, JMB 83, 333 (1974).
51. H. Inokuchi, J. E. Celis and J. D. Smith, JMB 85, 187 (1974).
52. J. Normanly, J.-M. Masson, L. G. Kleina, J. Abelson and J. H. Miller, PNAS 83, 6548 (1986).
53. J.-M. Masson and J. H. Miller, Gene 47, 179 (1986).
54. W. H. McClain and K. Foss, Science 240, 793 (1988).
55. W. H. McClain and K. Foss, Science 241, 1804 (1988).
56. W. H. McClain, Y.-M. Chen, K. Foss and J. Schneider, Science 242, 1681 (1988).
57. W. H. McClain and K. Foss, JMB 202, 697 (1988).
58. Y.-M. Hou and P. Schimmel, Nature 333, 140 (1988).
59. Y.-M. Hou and P. Schimmel, Bchem 28, 4942 (1989).
60. Y.-M. Hou and P. Schimmel, Bchem 28, 6800 (1989).
61. M. J. Rogers and D. Soll, PNAS 85, 6627 (1988).
62. M. Yarus, S. W. Cline, P. Wier, L. Breeden and R. C. Thompson, JMB 192, 235 (1986).
63. M. Yarus, Nature NB 239, 106 (1972).
64. R. Swanson, P. Hoben, M. Sumner-Smith, H. Uemura, L. Watson and D. Soll, Science 242, 1548 (1988).
65. E. J. Murgola, ARGen 19, 57 (1985).
66. N. E. Prather, E. J. Murgola and B. H. Mims, JMB 172, 177 (1984).
67. R. Chattapadhyay, H. Pelka and L. H. Schulman, Bchem 29, 4263 (1990).
67a. L. Pallanck and L. H. Schulman, PNAS 88, in press (1991).
68. B. L. Seong and U. L. RajBhandary, PNAS 84, 8859 (1987).
69. B. L. Seong, C.-P. Lee and U. L. RajBhandary, JBC 264, 6504 (1989).
70. D. L. Riddle and J. Carbon, Nature NB 242, 230 (1973).
71. L. H. Schulman and H. Pelka, PNAS 80, 6755 (1983).
72. J. F. Curran and M. Yarus, Science 238, 1545 (1987).
73. C. M. Cummins, T. F. Donahue and M. R. Culbertson, PNAS 79, 3565 (1982).
74. R. F. Gaber and M. R. Culbertson, MCBiol 4, 2052 (1984).
75. L. Bossi and D. M. Smith, PNAS 81, 6105 (1984).
76. J. R. Roth, Cell 24, 601 (1981).
77. L. Frolova and L. L. Kisselev, Biokhimiya (Moscow) 29, 1177 (1964).
78. C. Mathews and K. van Holde, “Biochemistry,” p. 962. Benjamin-Cummings, Redwood City, California, 1990.
79. M. Yaniv, W. R. Folk, P. Berg and L. Soll, JMB 86, 245 (1974).
80. M. Yarus, R. G. Knowlton and L. Soll, in “Nucleic Acid-Protein Recognition” (H. Vogel, ed.), p. 391. Academic Press, New York, 1977.
81. R. G. Knowlton, L. Soll and M. Yarus, JMB 139, 705 (1980).
82. L. H. Schulman and J. P. Goddard, JBC 248, 1341 (1973).
83. K. Chakraburtty, NARes 2, 1793 (1975).
84. R. W. Chambers, S. Aoyagi, Y. Furukawa, H. Zawadzka and O. Bhanot, JBC 248, 5549 (1973).
85. H. Pelka and L. H. Schulman, Bchem 25, 4450 (1986).
86. A. Theobald, M. Springer, M. Grunberg-Manago, J.-P. Ebel and R. Giegé, EJB 175, 511 (1988).
87. R. W. Chambers, O. S. Bhanot, S. Aoyagi, Y. Furukawa and H. Zawada, FP 33, 1422 (1974).
88. M. S. Qiu, Y. X. Jin, W. Q. Li, J. R. Bao and D. Wang, Sci. Sin. [B] 31, 695 (1988).
89. Y. X. Jin, M. S. Qiu, W. Q. Li, K. Q. Zeng, J. Bao, P. Gong, R. Wu and D. Wang, Anal. Biochem. 161, 453 (1987).
90. L. Raftery and M. Yarus, JMB 184, 343 (1985).
91. R. P. Singhal, Bchem 13, 2924 (1974).
92. C. W. Hill, G. Combriato and W. Dolph, J. Bact. 117, 351 (1974).
93. D. L. Riddle and J. Carbon, Nature NB 242, 230 (1973).
94. E. L. Sabban and O. S. Bhanot, JBC 257, 4796 (1982).
95. T. Muramatsu, K. Nishikawa, F. Nemoto, Y. Kuchino, S. Nishimura, T. Miyazawa and S. Yokoyama, Nature 336, 179 (1988).
96. L. H. Schulman and H. Pelka, PNAS 80, 6755 (1983).
97. L. H. Schulman and H. Pelka, Bchem 24, 7309 (1985).
98. L. Stern and L. H. Schulman, JBC 252, 6403 (1977).
99. M. Rould, J. J. Perona and T. Steitz, personal communication.
100. L. H. Schulman and H. Pelka, Science 246, 1595 (1989).
101. A. G. Bruce and O. C. Uhlenbeck, Bchem 21, 3921 (1982).
101a. H. Himeno, T. Hasegawa, T. Ueda, K. Watanabe and M. Shimizu, NARes 18, 6815 (1990).
102. M. Springer, M. Graffe, J. Dondon and M. Grunberg-Manago, EMBO J. 8, 2417 (1989).
103. V. Scheinker, S. F. Beresten, T. D. Mashkova, A. M. Mazo and L. L. Kisselev, FEBS Lett. 132, 349 (1981).
104. L. A. Bare and O. C. Uhlenbeck, Bchem 25, 5825 (1986).
105. L. A. Bare and O. C. Uhlenbeck, Bchem 24, 2354 (1985).
106. O. C. Uhlenbeck, Chem. Scr. 26B, 97 (1986).
107. L. G. Kleina, J.-M. Masson, J. Normanly, J. Abelson and J. H. Miller, JMB 213, 705 (1990).
108. J. Normanly, L. G. Kleina, J.-M. Masson, J. Abelson and J. H. Miller, JMB 213, 719 (1990).
109. M. Springer, M. Graffe, J. S. Butler and M. Grunberg-Manago, PNAS 83, 4384 (1986).
110. H. Moine, P. Romby, M. Springer, M. Grunberg-Manago, J.-P. Ebel, C. Ehresmann and B. Ehresmann, PNAS 85, 7892 (1988).
111. M. Springer, M. Graffe, J. Dondon, M. Grunberg-Manago, P. Romby, B. Ehresmann, C. Ehresmann and J.-P. Ebel, Biosci. Rep. 8, 619 (1988).
112. M. Sprinzl, T. Hartmann, J. Weber, J. Blank and R. Zeidler, NARes 17, r1 (1989).
113. U. Varshney and U. L. RajBhandary, PNAS 87, 1586 (1990).
114. U. L. RajBhandary and H. P. Ghosh, JBC 244, 1104 (1969).
115. K. Takeishi, T. Ukita and S. Nishimura, JBC 243, 5761 (1968).
116. M. Simsek, U. L. RajBhandary, M. Boisnard and G. Petrissant, Nature 247, 518 (1974).
117. P. Guillemaut and J. H. Weil, BBA 407, 240 (1975).
118. D. R. Smith and J. M. Calvo, NARes 8, 2255 (1980).
119. A. M. Cigan, L. Feng and T. F. Donahue, Science 242, 93 (1988).
120. D. M. Crothers, T. Seno and D. G. Soll, PNAS 69, 3063 (1972).
121. C. deDuve, Nature 333, 117 (1988).
122. C. Francklyn and P. Schimmel, Nature 337, 478 (1989).
123. S. J. Park and P. Schimmel, JBC 263, 16527 (1988).
124. N. E. Prather, E. J. Murgola and B. H. Mims, JMB 172, 177 (1984).
125. E. J. Murgola and K. A. Hijazi, MGG 191, 132 (1983).
126. J.-P. Shi, C. Francklyn, K. Hill and P. Schimmel, Bchem 29, 3621 (1990).
127. M. Jasin, L. Regan and P. Schimmel, JBC 260, 2226 (1985).
128. T. Atilgan, H. B. Nicholas, Jr., and W. H. McClain, NARes 14, 375 (1986).
129. W. H. McClain and H. B. Nicholas, Jr., JMB 194, 635 (1987).
130. N. Imura, G. B. Weiss and R. W. Chambers, Nature 222, 1147 (1969).
131. F. Zinoni, A. Birkman, T. Stadtman and A. Bock, PNAS 83, 4650 (1986).
132. W. Leinfelder, E. Zehelein, M. Mandrand-Berthelot and A. Bock, Nature 331, 723 (1988).
133. P. Hoben, Ph.D. thesis, Yale University, New Haven, Connecticut, 1984.
134. C. E. Singer and G. R. Smith, JBC 247, 2989 (1972).
135. O. Orellana, L. Cooley and D. Soll, MCBiol 6, 525 (1986).
136. L. Cooley, B. Appel and D. Soll, PNAS 79, 6475 (1982).
137. E. Labouze and H. Bedouelle, JMB 205, 729 (1989).
138. J. E. Ladner, A. Jack, J. D. Robertus, R. S. Brown, D. Rhodes, B. F. C. Clark and A. Klug, PNAS 72, 4414 (1975).
138a. W. H. McClain, K. Foss, R. A. Jenkins and J. Schneider, PNAS 87, 9260 (1990).
139. J. R. Sampson, A. B. DiRenzo, L. S. Behlen and O. C. Uhlenbeck, Science 243, 1363 (1989).
140. J. R. Sampson, A. B. DiRenzo, L. S. Behlen and O. C. Uhlenbeck, Bchem 29, 2515 (1990).
141. L. S. Behlen, J. R. Sampson, A. B. DiRenzo and O. C. Uhlenbeck, Bchem 29, 2523 (1990).
142. K. B. Hall, J. R. Sampson, O. C. Uhlenbeck and A. G. Redfield, Bchem 28, 5794 (1989).
143. P. Romby, D. Moras, M. Bergdoll, P. Dumas, V. V. Vlassov, E. Westhof, J. P. Ebel and R. Giegé, JMB 184, 455 (1985).
144. A. Jack, J. E. Ladner, D. Rhodes, R. S. Brown and A. Klug, JMB 111, 315 (1977).
145. W. J. Krzyzosiak, T. Marciniec, M. Wiewiorowski, P. Romby, J. P. Ebel and R. Giegé, Bchem 27, 5771 (1988).
146. D. Moras, A. C. Dock, P. Dumas, E. Westhof, P. Romby, J. P. Ebel and R. Giegé, PNAS 83, 932 (1986).
147. F. Harada and S. Nishimura, Bchem 13, 300 (1974).
148. T. Muramatsu, S. Yokoyama, N. Horie, A. Matsuda, T. Ueda, Z. Yamaizumi, Y. Kuchino, S. Nishimura and T. Miyazawa, JBC 263, 9261 (1988).
149. T. Muramatsu, K. Nishikawa, F. Nemoto, Y. Kuchino, S. Nishimura, T. Miyazawa and S. Yokoyama, Nature 336, 179 (1988).
150. V. Perret, A. Garcia, H. Grosjean, J.-P. Ebel, C. Florentz and R. Giegé, Nature 344, 787 (1990).
151. J. P. Ebel, R. Giegé, J. Bonnet, D. Kern, N. Befort, C. Bollack, F. Fasiolo, J. Gangloff and G. Dirheimer, Biochimie 55, 547 (1973).
152. B. Roe, M. Michael and B. Dudock, Nature 246, 135 (1973).
153. T. McCutchan, S. Silverman, J. Kohli and D. Soll, Bchem 17, 1622 (1978).
154. A. L. Haenni, S. Joshi and F. Chapeville, This Series 27, 85 (1982).
156. F. Yamao, H. Inokuchi, A. Cheung, H. Ozeki and D. Soll, JBC 257, 11639 (1982).
157. P. Hoben, N. Royal, A. Cheung, F. Yamao, K. Biemann and D. Soll, JBC 257, 11644 (1982).
158. J. J. Perona, R. Swanson, T. A. Steitz and D. Soll, JMB 202, 121 (1988).
159. A. Rich and P. R. Schimmel, NARes 4, 1649 (1977).
160. H. Inokuchi, P. Hoben, F. Yamao, H. Ozeki and D. Soll, PNAS 81, 5076 (1984).
161. S. D. Putney, N. J. Royal, H. N. deVegvar, W. C. Herlihy, K. Biemann and P. Schimmel, Science 213, 1497 (1981).
162. M. Jasin, L. Regan and P. Schimmel, Nature 306, 441 (1983).
163. K. Hill and P. Schimmel, Bchem 28, 2577 (1989).
164. L. Regan, L. Buxbaum, K. Hill and P. Schimmel, JBC 263, 18598 (1988).
165. M. Jasin, L. Regan and P. Schimmel, JBC 260, 2226 (1985).
166. L. Regan, J. Bowie and P. Schimmel, Science 235, 1651 (1987).
167. S. Brunie, P. Mellot, C. Zelwer, J.-L. Risler, S. Blanquet and G. Fayat, J. Mol. Graphics 5, 18 (1987).
168. C. Zelwer, J. L. Risler and S. Brunie, JMB 155, 63 (1982).
169. T. N. Bhat, D. M. Blow, P. Brick and J. Nyberg, JMB 158, 699 (1982).
170. D. M. Blow and P. Brick, in “Biological Macromolecules and Assemblies” (J. A. Jurnak and A. McPherson, eds.), Vol. 2, p. 442. Wiley, New York, 1985.
171. P. Brick, T. N. Bhat and D. M. Blow, JMB 208, 83 (1989).
172. P. Brick and D. M. Blow, JMB 194, 287 (1987).
173. D. G. Barker, J. P. Ebel, R. Jakes and C. J. Bruton, EJB 127, 449 (1982).
174. F. Dardel, G. Fayat and S. Blanquet, J. Bact. 160, 1115 (1984).
175. D. Cassio and J. P. Waller, EJB 20, 283 (1971).
176. P. Mellot, Y. Mechulam, D. LeCorre, S. Blanquet and G. Fayat, JMB 208, 429 (1989).
177. D. Valenzuela and L. H. Schulman, Bchem 25, 4555 (1986).
178. O. Leon and L. H. Schulman, Bchem 26, 5416 (1987).
179. C. Hountondji, S. Blanquet and F. Lederer, Bchem 24, 1175 (1985).
180. C. Hountondji, F. Lederer, P. Dessen and S. Blanquet, Bchem 25, 16 (1986).
181. C. Hountondji, P. Dessen and S. Blanquet, Biochimie 68, 1071 (1986).
182. G. Prevost, G. Eriani, D. Kern, G. Dirheimer and J. Gangloff, EJB 180, 351 (1989).
183. G. Winter, G. L. E. Koch, B. S. Hartley and D. G. Barker, EJB 132, 383 (1983).
184. M. Waye, G. Winter, A. J. Wilkinson and A. R. Fersht, EMBO J. 2, 1827 (1983).
185. P. Carter, H. Bedouelle and G. Winter, PNAS 83, 1189 (1986).
186. A. R. Fersht, Bchem 26, 8031 (1987).
187. H. Bedouelle and G. Winter, Nature 320, 371 (1986).
188. E. Labouze and H. Bedouelle, JMB 205, 729 (1989).
189. G. M. Nagel, S. Cumberledge, M. S. Johnson, E. Petrella and B. Weber, NARes 12, 4377 (1984).
190. T. A. Webster, B. W. Gibson, T. Keng, K. Biemann and P. Schimmel, JBC 258, 10637 (1983).
191. A. T. Profy and P. Schimmel, JBC 261, 15474 (1986).
192. M. J. Toth and P. Schimmel, JBC 265, 1000 (1990).
193. Y. Mechulam, G. Fayat and S. Blanquet, J. Bact. 163, 787 (1985).
194. A. Ducruix, N. Hounwanou, J. Reinbolt, Y. Boulanger and S. Blanquet, BBA 741, 244 (1983).
195. F. Fasiolo, N. Befort, Y. Boulanger and J.-P. Ebel, BBA 217, 305 (1970).
196. F. Fasiolo, A. Sanni, S. Potier, J.-P. Ebel and Y. Boulanger, FEBS Lett. 242, 351 (1989).
196a. P. Walter, L. Despons, M. Laforet, J.-P. Ebel and F. Fasiolo, Biochimie 72, 537 (1990).
197. F. Lawrence, S. Blanquet, M. Poiret, M. Robert-Gero and J. P. Waller, EJB 36, 234 (1973).
198. A. Brevet, J. Chen, F. Leveque, P. Plateau and S. Blanquet, PNAS 86, 8275 (1989).
199. H. Moine, P. Romby, M. Springer, M. Grunberg-Manago, J.-P. Ebel, C. Ehresmann and B. Ehresmann, PNAS 85, 7892 (1988).
200. M. Springer, M. Graffe, J. S. Butler and M. Grunberg-Manago, PNAS 83, 4384 (1986).
201. M. Springer, M. Graffe, J. Dondon, M. Grunberg-Manago, P. Romby, B. Ehresmann, C. Ehresmann and J.-P. Ebel, Biosci. Rep. 8, 619 (1988).
202. M. Springer, M. Graffe, J. Dondon and M. Grunberg-Manago, EMBO J. 8, 2417 (1989).
203. K. Muench, JBC 251, 5195 (1976).
204. M. Ruff, J. Cavarelli, V. Mikol, B. Lorber, A. Mitschler, R. Giegé, J. C. Thierry and D. Moras, JMB 201, 235 (1988).
205. D. Riesner, A. Pingoud, D. Boehme, F. Peters and G. Maass, EJB 68, 71 (1976).
Ribosome Biogenesis in Yeast¹
H. A. RAUÉ AND R. J. PLANTA²
Biochemisch Laboratorium
Vrije Universiteit
1081 HV Amsterdam, The Netherlands
I. Transcription of Ribosomal-RNA Genes
A. Genetic Organization and Structure of Yeast rRNA Genes
B. Transcription Initiation by Yeast RNA Pol I
C. Transcription Termination by Yeast RNA Pol I
D. Regulation of Yeast RNA-Pol-I Transcription
E. Transcription of 5-S rRNA by Yeast RNA Pol III
II. Expression of Ribosomal-protein Genes
A. Genetic Organization and Structure of Yeast Ribosomal-protein Genes
B. Transcriptional Regulation of Ribosomal-protein Synthesis
C. Post-transcriptional Regulation of Ribosomal-protein Synthesis
III. Processing and Assembly of Ribosomal Constituents
A. Processing of Pre-rRNA
B. Modification of Pre-rRNA
C. Modification of Ribosomal Proteins
D. Nucleocytoplasmic Transport and Assembly
References
The formation of functional ribosomes is a highly complex phenomenon requiring the interplay of a large number of molecular processes. Ribosomes contain, depending on their origin, some 60-80 different components, proteins as well as RNA molecules, most in a single copy per ribosome. Since normally growing cells generally contain no free pools of these ribosomal components, the expression of the numerous ribosomal genes must be subject to tight coordinate control to ensure the production of equimolar amounts of the various rRNA and ribosomal-protein (r-protein) constituents.
1 Abbreviations and terminology: r-protein or rp-: ribosomal protein; rp-mRNA, rp-gene: mRNA, gene encoding r-protein; snRNA: small nuclear RNA; ITS: internal transcribed spacer; ETS: external transcribed spacer; NTS: non-transcribed spacer; YEp: yeast episomal plasmid; YCp: yeast centromeric plasmid; UAS: upstream activating sequence; NLS: nuclear localization signal; RAP: repressor-activator protein; ABF: ARS-binding factor (SUF); linker-scanning: mutational analysis in which consecutive regions of a DNA sequence are replaced one by one by a synthetic oligonucleotide; confocal laser scanning microscopy: microscope technique employing zero depth-of-field, which allows sequential focusing on different planes within a cell; RPG: ribosomal protein gene; TUF: transcription upstream factor. 2 To whom correspondence may be addressed.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
In addition, cells can increase or decrease the expression of all members of the set of ribosomal genes in a concerted fashion in response to variations in environmental conditions, such as amino-acid deprivation, and changes in carbon source or temperature, which alter the demand for protein biosynthetic capacity. As far as prokaryotic (i.e., Escherichia coli) ribosome biogenesis is concerned, the main features of this regulation have now been elucidated (1, 2). Control of ribosome biogenesis in E. coli involves a number of linked feedback loops in which excess ribosomes, in some as yet not completely understood way that requires translational initiation, inhibit rRNA (and incidentally also tRNA) transcription. The balance between the synthesis of rRNA and r-protein is subsequently redressed mainly, although not exclusively, through the action at the translational level of a small number of r-proteins. These “repressor” r-proteins all bind individually to either 16-S or 23-S rRNA. However, upon failing to find an rRNA partner, they instead associate with their own mRNA and in this way prevent further translation. Since the polycistronic E. coli mRNAs encoding r-proteins (rp-mRNA) show a phenomenon called “translational coupling,” binding of a single specific repressor r-protein blocks expression of a complete translational unit and thus the production of all, or at least a large number, of the r-proteins encoded by the mRNA in question. In this way, r-protein translation is directly coupled to the assembly process. Ribosome assembly in prokaryotes has been studied mainly in in vitro assembly systems. These studies have led to the elucidation of the assembly pathways for both the small and large subunits (3, 4) and have recently resulted in the identification of the “initiator” r-proteins (5, 6), whose association with the rRNA forms the first step in the assembly process.
While the end result of regulation of ribosome biogenesis in eukaryotes is the same as in prokaryotes-namely, equimolar production of all ribosomal constituents-it should be realized that different means must be used to achieve this goal, since ribosome formation in eukaryotic cells differs from that in prokaryotes in several hndamental aspects: (a) it involves the participation of three different RNA polymerases; (b) there are (often considerable) differences in gene dosage for the individual components, even those of the same type; (c) the genes encoding r-proteins (rp-genes) do not form operons; and (d) the various stages in the formation of ribosomal constituents, as well as their assembly into active ribosomes, occur in different cellular compartments. Because of its accessibility to genetic and physiological manipulation, the yeast Saccharomyces cereuisiae has become one of the most popular organisms for studying the questions raised by the coordinate control of ribosome biogenesis in eukaryotes. The genes for the various yeast rRNAs as well as
RIBOSOME BIOGENESIS IN YEAST
those for about 40 of the yeast r-proteins have been cloned and subjected to detailed studies. As a result, the amount of information collected on the structure and function of ribosomal genes in yeast surpasses that available for any other eukaryote. This information has given us important insights into the regulatory mechanisms controlling the expression of these genes. In addition, progress has been made in our laboratory in developing systems that provide access to the hitherto largely unexplored field of eukaryotic ribosome assembly, including the mechanism of rRNA processing. Although there is still a long way to go, these studies should eventually lead to a full description of the way in which ribosomes are formed in yeast cells, as well as the mechanisms by which the rate of ribosome formation is adjusted to the requirements of the cell at any given moment.
I. Transcription of Ribosomal-RNA Genes
A. Genetic Organization and Structure of Yeast rRNA Genes
The genetic organization of the rRNA genes in S. cerevisiae is basically similar to that found in other eukaryotes. The S. cerevisiae genome contains a cluster of approximately 140-200 repeats of a 9.1-kb rDNA unit arranged in tandem on the long arm of chromosome XII (7). In addition, R-loop analysis has revealed the presence in yeast cells of a small number of circular DNA molecules consisting of one or more of these rDNA units, whose function is still unclear (8). However, since replicative intermediates of these circular molecules have been observed occasionally, they may play a role in regulating rRNA gene dosage. The organization of the tandem repeats of S. cerevisiae rDNA, as well as the fine structure of one such repeat, is depicted in Fig. 1A. Each unit contains a single operon transcribed by RNA polymerase I (Pol I), which encompasses one gene each for 17-S, 5.8-S, and 26-S rRNA,³ in that order. The three genes are separated by two internal transcribed spacers (ITS1 and -2), while external transcribed spacers (ETSs) are present upstream from the 17-S gene and downstream from the 26-S gene. The latter ETS (3’ ETS) is considerably longer than originally supposed on the basis of structural analysis of the longest Pol-I precursor transcript (37-S pre-rRNA) detectable in yeast cells (see Section I,C) (9). The ITS and ETS sequences are removed post-transcriptionally in an ordered series of steps (see Section III,A). Processing of the 3’ ETS must occur rapidly, since no precursor containing this complete ETS can be detected in vivo. Each pair of rRNA operons in the rDNA cluster is separated by about 2 kb of non-transcribed spacer (NTS), divided into two parts (NTS1 and -2) by a single 5-S rRNA gene. Thus, in S. cerevisiae, in contrast to most other eukaryotes, the number of 5-S rRNA genes equals that of the genes for the other rRNA species. The 5-S gene is transcribed by RNA Pol III into a precursor having a short 3’ extension (7-13 nucleotides) relative to the mature 5-S rRNA molecule (10). The direction of transcription of the 5-S gene is opposite that of the large rRNA operon. A number of additional 5-S rRNA genes differing from those in the 9.1-kb repeats are located at the telomere-proximal end of the array (11). However, it is not known whether these genes are transcriptionally active. No 5-S rRNA sequence heterogeneity within one specific S. cerevisiae strain has come to our attention.
³ The actual numbers by which the two high-molecular-weight rRNA species as well as the precursor transcripts derived from the rRNA operon are indicated vary slightly from one laboratory to another.
H. A. RAUÉ AND R. J. PLANTA
FIG. 1. Fine structure of the rDNA repeat of the yeast S. cerevisiae. (A) Slightly more than one repeat is shown to clarify the tandem nature of the repeats in the rDNA locus. Regions encoding the mature rRNA sequences are indicated by black bars. Striped boxes represent regions involved in transcription initiation. Pol-I and Pol-III transcription-initiation sites are indicated by bent arrows. Shaded bars correspond to transcribed, open bars to non-transcribed, spacer sequences. An exception has been made for the region between T2 and T3B, which appears to be transcribed by only a minor portion of the Pol-I molecules (see Section I,C). T0 through Tp indicate the positions of processing sites and termination sites for Pol I. (B) The general structure of an rRNA minigene is shown. The length of the initiating and 3’-end-generating fragments varies, depending on the purpose of the experiment. The reporter fragment is a piece of DNA not normally present in yeast.
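The copy numbers quoted in this section translate directly into the physical size of the rDNA array. As a rough worked example, the following minimal Python sketch does the arithmetic; the constant and function names are ours, chosen purely for illustration, and the repeat length is taken as the rounded figure of 9,100 bp.

```python
# Illustrative arithmetic only. All names are hypothetical; the values are
# the approximate figures quoted in the text (9.1-kb repeat, 140-200 copies).
REPEAT_LENGTH_BP = 9_100
REPEAT_COUNT_RANGE = (140, 200)


def array_size_bp(n_repeats: int, repeat_length_bp: int = REPEAT_LENGTH_BP) -> int:
    """Total length of a head-to-tail tandem array of identical repeats."""
    return n_repeats * repeat_length_bp


def five_s_genes_in_array(n_repeats: int) -> int:
    """One 5-S gene per repeat, so the count equals the repeat number.

    The additional telomere-proximal 5-S genes are not counted here.
    """
    return n_repeats


if __name__ == "__main__":
    for n in REPEAT_COUNT_RANGE:
        size_mb = array_size_bp(n) / 1e6
        print(n, "repeats:", size_mb, "Mb;", five_s_genes_in_array(n), "5-S genes")
```

On these assumptions the array spans roughly 1.3 to 1.8 Mb of chromosome XII, with as many 5-S genes as large rRNA operons.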
B. Transcription Initiation by Yeast RNA Pol I
The first indication concerning sequence elements important for transcription initiation by yeast Pol I came from a comparative analysis of the regions containing the initiation sites in five different, but evolutionarily relatively closely related, yeast strains (12). The Pol-I transcription-initiation
site in Saccharomyces carlsbergensis,⁴ S. cerevisiae, Saccharomyces rosei, Kluyveromyces lactis, and Hansenula wingei, all members of the subfamily Saccharomycetoideae, has been mapped by S1 nuclease, R-loop, and reverse transcription analyses, using the 37-S precursor transcript of the large rRNA operon as a reference. This precursor carries a 5’-triphosphate group, indicating that it has not been subjected to processing at its 5’ terminus (13, 14). In each case, the 5’ end of the 37-S precursor mapped at a unique position located within a 23-bp sequence (positions -9 to +14 with respect to the start site) that, in contrast to the adjacent NTS and ETS sequences, is strongly conserved in all five strains. Therefore, this conservation was taken to indicate an important role for the sequence in Pol-I transcription.
⁴ Recently, it has become clear that S. carlsbergensis is in fact not a separate species, but should also be classified as belonging to the species S. cerevisiae.
Functional analysis of the yeast Pol-I promoter has been limited almost exclusively to in vivo experiments, using cells transformed with mutant rRNA genes, although in vitro systems based on permeabilized cells (15), isolated nuclei (16), or crude cell extracts (17) have been described. However, these systems suffer from the drawback that they are too inflexible to allow detailed mutational analyses. Only recently has a yeast in vitro system been developed that exhibits faithful transcription by Pol I (18). Therefore, most of our present knowledge of the yeast Pol-I promoter is derived from in vivo experiments using cells transformed with genes carrying mutant Pol-I promoters. However, it was necessary to find a way to distinguish the transcripts of these mutant genes from those produced by the inevitable excess of wild-type chromosomal rDNA units. This problem was solved by using the “rRNA minigene assay system” (19-21). The basic minigene (Fig. 1B) consists of two rRNA regions, an “initiating fragment” and a “3’-end-generating fragment,” separated by a “reporter” sequence of foreign origin that allows specific detection of the minigene transcript by Northern hybridization. The initiating and 3’-end-generating fragments consist of the 5’ and 3’ ends of the rRNA operon, respectively, each flanked by a length of spacer sequence sufficient to include the elements required for correct initiation on the one hand and formation of a discrete 3’ end on the other. The rRNA minigenes can be introduced into yeast cells using either episomal (both YEp- and YCp-type) or integrating vectors (20, 22, 23). In all cases, the minigenes are indeed transcribed by Pol I, as shown by the insensitivity of their transcription to high concentrations of α-amanitin (20, 22).
Using the minigene system, the borders of the yeast Pol-I promoter were defined by deletion mapping (24). Starting with an initiating fragment encompassing the sequence between positions -207 and +128 with respect to the transcription start site, we found that transcription remained unaffected
by 5’ deletions up to position -149. However, removal of an additional 16 bp reduced transcription by more than 80%, indicating that the sequence between positions -149 and -133 contains an important part of the yeast Pol-I promoter. Analysis of 3’-deletion mutants showed that no more than 15 bp downstream from the start site are required for accurate and efficient transcription. This downstream border of the Pol-I promoter coincides with the 3’ end of the conserved sequence identified in the comparative studies described above.
To analyze the fine structure of the yeast Pol-I promoter, a series of linker-scanning mutants was constructed covering the region between -146 and +8 (25). When the transcription of minigenes carrying these mutant promoters (in the context of an otherwise wild-type, full-length NTS) was quantified, we found that the yeast Pol-I promoter consists of three distinct domains (Fig. 2). Mutations in either domain I (-28 to +8)⁵ or domain II (-70 to -50) had a severe negative effect on the efficiency of transcription, reducing the amount of correctly initiated transcripts to 4-10% of the control. The sequence of the region separating domains I and II appeared not to be crucial for Pol-I transcription, although linker-scanning mutations in this region did reduce the amount of minigene transcripts by about 40% (Fig. 2). However, the length of this region is of vital importance. When the spacing of domains I and II was increased by 4 bp (i.e., almost one-half of a helix turn), transcription was reduced to less than 10% (25), suggesting that the formation of a stable transcription complex depends either on simultaneous binding of a single trans-acting factor to both domains, or on interaction between different trans-acting factors binding to these two domains at the same face of the DNA helix.
⁵ The indicated borders of the yeast Pol-I promoter domains are approximate.
Mutations in domain III (-146 to -76) were considerably less debilitating than those in domains I and II (Fig. 2), with the strongest effect shown by mutations at the 5’ end of the domain (-146 to -134). This result is in agreement with the deletion mapping experiments discussed above. The relatively high activity displayed by mutant -102 to -91 might indicate that the 3’ border of domain III is located somewhat farther upstream than position -76. So far, our results do not allow us to conclude whether the spacing between domains III and II is important.
The structure of the yeast Pol-I promoter as revealed by these deletion mapping and linker-scanning experiments shows a considerable degree of similarity with the general model derived from in vitro studies of insect and vertebrate Pol-I promoters (reviewed in 26), which may extend to plants as well (27, 28). In this model, a core element (approximately -40 to -10) acts as the binding site for a species-specific transcription factor (factor D as used
FIG. 2. Linker-scanning analysis of the yeast Pol-I promoter. (A) Bars indicate the relative efficiency of transcription of the various linker-scanning mutants analyzed. The controls, shown to the left, were arbitrarily set at 100%. SIRT is a minigene containing the full-length, wild-type NTS, while NIRT carries a linker in the SmaI site located outside the promoter proper (position -208). The position of the linker in the various mutants is indicated by the rectangles. The solid rectangle indicates the -102 to -91 mutant mentioned in the text. Numbers show the distance (in base-pairs) to the transcription-initiation site, which is indicated by the bent arrow. Asterisks show the limits of the promoter region as determined by deletion mapping. (B) Schematic representation of the domain structure of the yeast Pol-I promoter. (Adapted from 25.)
in 29), while an upstream element (approximately -150 to -110) increases the efficiency of transcription, when tested under stringent conditions, by binding of additional transcription factors. The yeast domains I and II together, although extending somewhat farther upstream, probably correspond to the core element. It should be noted that in Xenopus laevis, the core element may also consist of two, albeit more closely spaced, domains (30, 31). Yeast domain III appears to be analogous to the upstream element. So far, no proteins binding to any of these domains have been identified in yeast.
Although the region between positions -149 and +15 is both necessary and sufficient for accurate initiation of transcription by yeast Pol I, initiation is stimulated about 10- to 20-fold by the presence of a 160- or 190-bp (depending on the type of transcription unit) EcoRI-HindIII fragment located about 2 kb upstream from the start site (22, 23, 32; cf. Fig. 1). Again, minigene constructs showed that this element acts as a true enhancer, since its stimulating effect is independent of position or orientation, although the degree of stimulation is influenced to some extent by the nature of the sequence context (22, 23). Moreover, a single enhancer stimulates transcription of both copies of a tandem repeat containing two minigenes, irrespective of its location upstream, downstream, or between the two genes (33).
The enhancer segment contains binding sites for two different proteins (32, 34), called REB1 (or RBP1) and REB2, respectively, that can be detected in crude yeast extracts by gel-retardation analysis. DNase I and chemical footprinting showed the REB1 (RBP1) site to encompass a 14- to 20-bp region starting 10 bp downstream from the EcoRI site marking the 5’ end of the enhancer fragment (32, 34). The REB2 site (approximately 20 bp) is located some 10 bp downstream from the REB1 (RBP1) site (34). Interestingly, a second REB1 (RBP1) binding site is centered on position -212 upstream from the transcription-initiation site (32, 34). The two sites display considerable sequence similarity, but have opposite orientations. In particular, an 8-bp sequence in the middle of the binding site is absolutely conserved. Gel-retardation assays showed that the promoter-proximal site has a 4- to 5-fold higher affinity for REB1 (RBP1) compared to the site within the enhancer fragment (32, 34). Although the promoter-proximal REB1 (RBP1) binding site is located outside the actual promoter region, as defined by deletion mapping and linker scanning, its presence suggests a function for REB1 (RBP1) in enhancement of rRNA transcription. This suggestion was reinforced by our observation that both REB1 (RBP1) sites are actually occupied in vivo (32). However, when we deleted either part or all of the enhancer site, or destroyed the promoter-proximal site by insertion of a linker into the SmaI site at position -208, there was no detectable effect on the efficiency of minigene transcription (32).
A longer 5’ deletion of the enhancer, which also encompasses about half of the REB2 binding site, reduces transcription by a factor of only about four, which makes it unlikely that REB2 binding is essential for enhancer activity (23). Moreover, this reduction may well be an overestimate, since deletions extending to within the REB1 (RBP1) binding site, which in our hands did not affect enhancer activity, have been reported (23) to reduce transcription to about a third. The most likely explanation for this discrepancy is that the authors did not correct for differences in copy number. We have found that copy numbers of rDNA-containing centromeric plasmids, as used in 23, can differ significantly from one transformant to another (35). In contrast to deletions at the 5’ end, even small deletions at the 3’ end of the enhancer segment cause a drastic reduction in its stimulatory activity (23). Therefore, the main element responsible for enhancer function appears to reside closely upstream from the HindIII site located about 300 bp downstream from the 3’ end of the 26-S gene. It is interesting to note that the yeast enhancer segment contains additional functional elements. First, the enhancer segment also is one of a pair of rDNA fragments (the other one being a fragment spanning the Pol-I
transcription-initiation site) that stimulate local mitotic recombination between homologous sequences when inserted together at different sites in the yeast genome (36). This HOT1 activity appears to be due to Pol-I transcription of the region adjacent to the insert containing the two homologous regions. The authors suggested that HOT1 activity might be responsible for maintaining sequence homogeneity within the rDNA cluster. On the other hand, mitotic recombination within the yeast rDNA cluster was strongly suppressed both by the combined action of DNA topoisomerases I and II (37) and by the product of the SIR2 gene (38), which casts some doubt on this proposed role of HOT1. Second, a replication-fork barrier lies about midway in the enhancer region (39). However, the most noteworthy additional functional element present in the enhancer region is the major terminator for Pol I, discussed in more detail in Section I,C, which is located about 100 bp downstream from the 5’ border of the enhancer. The close proximity of the Pol-I transcription terminator and enhancer has led us to propose a model for enhancer function that envisages a functional linkage between enhancer and promoter (40) (Fig. 3). In this model, each rDNA transcription unit is thought to form a loop that juxtaposes its enhancer/terminator and promoter elements. Pol-I molecules that have traversed the rDNA unit and have reached the terminator are supposed to be transferred back to the promoter. The enhancer might be actively involved
FIG. 3. Model for stimulation of yeast Pol-I transcription by the enhancer located in NTS1. T-E, Terminator/enhancer element; P, promoter. (Adapted from 40.)
in this transfer or, alternatively, might just serve to bring terminator and promoter together, thus generating a local high concentration of Pol-I molecules. This “loop-out” model is supported by the recent observation that both Pol-I and Pol-II enhancers can exert their stimulating effect in trans (41, 42). The model also takes into account the presence of the independently transcribed 5-S rRNA gene, which forms part of a second loop (Fig. 3). As mentioned above, destruction of either one of the two REB1 (RBP1) binding sites leaves enhancer function in minigenes unaffected (32). However, the minigene experiments cannot be taken as absolute proof for the absence of any involvement of REB1 (RBP1) in enhancement of transcription of the genomic rRNA operons. It should be kept in mind that the minigenes used in these studies were carried on episomal plasmids, causing their topology and cellular location to differ substantially from that of the genomic rDNA units, which form an array of tandem repeats in the nucleolus. The loop-out model allows for a role of REB1 (RBP1) in enhancer function, for example, by anchoring the enhancer/terminator and promoter elements of the genomic rRNA operons to the nuclear matrix at closely neighboring sites. However, the fact that REB1 (RBP1) is dispensable for transcription enhancement in minigene constructs implies that some sort of interaction between enhancer/terminator and promoter occurs even when they are unable to bind this protein. Whatever the role of the REB1 (RBP1) protein, its presence is crucial for cell survival, since the gene is essential (Q. Ju and J. R. Warner, personal communication). Recently, it was found that REB1 (RBP1) is identical to the Y factor (43, 44), which creates a nucleosome-free region in the GAL1-10 upstream activating sequence (45). REB1 (RBP1) binding sites are present in the UASs of many yeast protein genes as well as in centromeres and telomeres (43).
Although the protein itself is only a weak activator of Pol-II transcription, it acts synergistically with other activators to give as much as 170-fold stimulation. The loop-out model has been extended (33) to account for the observation that the enhancer element also acts from a downstream position and can even stimulate transcription of two adjacent minigenes. In this model, several (or all) enhancer/terminator and promoter elements are juxtaposed, facilitating transfer of a terminating Pol-I molecule to any of a number of different promoters. Again, one should realize that these experiments used minigenes that, although integrated into the yeast genome, were still dislocated from the natural site of Pol-I transcription. Thus, while it is clear that a functional linkage can be established in these minigene constructs between any enhancer and any promoter, irrespective of their relative location, constraints in the chromosomal rDNA locus might be more severe. We are currently trying to resolve this question by using “tagged” rDNA units integrated into
the rDNA locus (35; see also Section III,A). Since the transcripts of these units can be clearly distinguished from those of the wild-type genes, we should be able to analyze the effect of enhancer mutations on the transcription of individual rRNA operons in their natural context.
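The domain structure of the Pol-I promoter described in this section can be summarized as a simple coordinate lookup. The Python sketch below is illustrative only: the dictionary and function names are our own, the domain borders are the approximate ones given in the text, and positions are numbered relative to the transcription start site. The figure of 10.5 bp per helical turn is the standard value for B-form DNA and is an assumption not stated in the studies discussed.

```python
from typing import List

# Approximate domain borders from the linker-scanning analysis (positions
# relative to the start site). All names here are hypothetical.
PROMOTER_DOMAINS = {
    "I": (-28, 8),
    "II": (-70, -50),
    "III": (-146, -76),
}

# Assumed value for B-form DNA; not taken from the text.
BP_PER_HELICAL_TURN = 10.5


def domains_overlapping(start: int, end: int) -> List[str]:
    """Return the promoter domains touched by a mutated interval [start, end]."""
    return [
        name
        for name, (dom_start, dom_end) in PROMOTER_DOMAINS.items()
        if start <= dom_end and end >= dom_start
    ]


def helical_turns(n_bp: int) -> float:
    """Convert a length in base pairs into turns of the double helix."""
    return n_bp / BP_PER_HELICAL_TURN


if __name__ == "__main__":
    print(domains_overlapping(-102, -91))  # linker mutant inside domain III
    print(domains_overlapping(-49, -29))   # spacer between domains I and II
    print(round(helical_turns(4), 2))      # the 4-bp spacing insertion
```

With these borders, the -102 to -91 linker mutant falls within domain III, a linker confined to the -49 to -29 spacer touches no domain, and a 4-bp insertion corresponds to somewhat less than half a helical turn, which is why it would place domains I and II on nearly opposite faces of the helix.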
C. Transcription Termination by Yeast RNA Pol I
The longest precursor rRNA molecule detectable in yeast cells (37-S pre-rRNA) has a 3’ end that maps only 7 bp downstream from the 3’ terminus of the mature 26-S rRNA sequence (46). This 3’ end, however, does not correspond to the actual Pol-I transcription-termination site, a conclusion initially drawn from the observation that, in yeast cells carrying the rna82.1 mutation, minigene transcripts extending beyond this point were detected (40). The rna82.1 mutation inactivates an endonuclease first shown to be involved in 3’-end maturation of yeast 5-S rRNA (10). Apparently, the same endonuclease is involved in (rapid) 3’ processing of longer transcripts of the rRNA operon to 37-S pre-rRNA. Analysis of minigenes in which the processing sites had been deleted (40) showed that yeast Pol I most likely terminates at position +210 downstream from the 3’ end of the 26-S gene. This site, designated T2, is located about 100 bp downstream from the 5’ border of the enhancer segment (cf. Fig. 1). Some transcription up to about position +700 (T3) was also observed when T2 was deleted as well. The conclusion that Pol-I transcription proceeds a considerable distance into the NTS was confirmed by in vitro run-on experiments using both permeabilized cells and isolated nuclei (9). In good agreement with the minigene data, these experiments showed transcription of the genomic NTS sequences up to about position +700, with attenuation occurring at two sites located in regions +18 to +270 and +270 to +721, respectively. No transcription beyond T3, which is located upstream from the 3’ end of the 5-S gene (Fig. 1), could be detected.
Since neither the minigene nor the run-on analyses allow a definitive distinction to be made between a processing site and a genuine termination site, spacer regions containing T2 or T3, as well as the region containing the processing sites T0 and T1 (-70 to +100; cf. Fig. 1), were subjected to functional analysis (9). Each region was cloned in the middle of the reporter sequence of an rRNA minigene, and the products of transcription were analyzed for the presence of “upstream” (3’ end at the termination site), “downstream” (5’ end at the termination site), and “read-through” transcripts by Northern blotting, reverse transcription, and S1 mapping. The results clearly demonstrated that Pol-I transcription does pass T0 and T1, marking these sites as processing sites, since downstream transcripts having a 5’ end mapping at the appropriate location could be detected by both reverse transcription and S1 nuclease analyses. Northern blotting did not
detect these downstream or read-through transcripts, demonstrating that processing at these sites is highly efficient and that the downstream part of the transcript is rapidly degraded. Identification of T0 and T1 as processing sites is confirmed by the recently described in vitro processing of a synthetic rRNA transcript at these sites (47). Analysis of the transcription products of a minigene into which the T2 region had been inserted revealed no downstream transcripts, leading us to conclude that T2 is an actual termination site for yeast Pol I. By the same reasoning, two further termination sites, T3A at position +690 and T3B at position +950, were identified (Fig. 1). Since data from both the run-on and the minigene experiments indicate T2 to be a highly efficient terminator, we suspect that T3A and T3B act as failsafe terminators to prevent any Pol-I molecules that escape T2 from interfering with 5-S rRNA transcription. Our own deletion mapping data (9, 40), in combination with those of others (23), confine the sequences both required and sufficient for termination at T2 to the region between positions +120 and +211. The actual requirements may be even less, since a naturally occurring deletion of base-pairs +181 to +197 did not affect the function of T2 (9). Short sequence elements (12-18 bp) direct termination of transcription by Pol I in mice, X. laevis, and humans by binding a trans-acting factor (48-50). Our results exclude a role for REB1 (RBP1) in termination by yeast Pol I, since deletion of about half of the binding site for this protein, which abolishes its interaction with the DNA, leaves termination unaffected (40). A more extensive deletion reaching about halfway into the REB2 binding site, on the other hand, abolished termination at T2 (23), suggesting a role for the REB2 protein in this process.
Comparison of the sequences surrounding T2, T3A, and T3B did not reveal any significant conservation that might mark a cis-acting element(s) responsible for transcription termination. However, this does not rule out the involvement of a common trans-acting factor at these three sites, since several examples of proteins that interact with DNA at specific sites in an (apparently) sequence-independent manner are known (51-53).
Whereas none of our experiments revealed any Pol-I transcription beyond T3B (+950), functional analysis of further regions of the NTS nevertheless disclosed the presence of still another (albeit, in our minigene test system, inefficient) termination site at position -300 upstream from the Pol-I transcription-initiation site (Tp; cf. Fig. 1). This situation is similar to that reported for X. laevis, mice, and (probably) humans (54-56). However, inactivation of Tp reduces transcription initiation at the associated start site only slightly, even in a situation in which this inactivation leads to considerable read-through transcription by Pol-I molecules that started at the promoter of a preceding minigene (9). This result, as well as the absence of spacer promoters in yeast, makes it unlikely that Tp has a function similar to that of the promoter-associated terminators in vertebrates, which stimulate transcription initiation at the downstream promoter (54, 56, 57) either by “hand-over” of Pol-I molecules transcribing the spacer or, more likely, by preventing disruption by these molecules of the stable pre-initiation complex formed at the promoter (58-60). Inactivation of Tp was achieved by inserting a linker into the SmaI site at position -208 almost 100 bp downstream (9). Thus, an element essential for termination at Tp is located a considerable distance from the actual termination site. Note that, as discussed above, the same linker insertion also destroys the promoter-proximal REB1 (RBP1) binding site, which may indicate a role for this protein in termination at Tp.
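The fail-safe arrangement of terminators proposed in this section can be expressed as a small model. The following Python sketch is a deliberate simplification under stated assumptions: it treats each terminator as either intact or deleted, ignores partial read-through and the differing efficiencies of the sites, and uses names of our own invention. Positions are those given in the text, relative to the 3’ end of the 26-S gene.

```python
from typing import Iterable, Optional, Tuple

# Terminators downstream of the 26-S gene, in the order Pol I meets them.
# Names and structure are hypothetical; positions are taken from the text.
DOWNSTREAM_TERMINATORS: Tuple[Tuple[str, int], ...] = (
    ("T2", 210),
    ("T3A", 690),
    ("T3B", 950),
)


def first_intact_terminator(
    deleted: Iterable[str] = (),
) -> Optional[Tuple[str, int]]:
    """Return the first terminator Pol I would reach that has not been deleted.

    Returns None if every listed terminator has been removed.
    """
    removed = set(deleted)
    for name, position in DOWNSTREAM_TERMINATORS:
        if name not in removed:
            return name, position
    return None


if __name__ == "__main__":
    print(first_intact_terminator())          # wild type
    print(first_intact_terminator(["T2"]))    # T2 deleted
```

In this simplified picture, deleting T2 shifts termination to T3A at about +690, in line with the transcription up to about position +700 observed in minigenes lacking T2.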
D. Regulation of Yeast RNA-Pol-I Transcription
Since ribosomes constitute a critical component of the protein synthetic machinery, the rate of ribosome production is tightly coupled to the cellular growth rate (61-63). In prokaryotes, there is considerable evidence that the rate at which ribosomes are formed is controlled primarily at the level of initiation of rRNA synthesis (see the introduction to this article and references therein). Although it is likely that in yeast cells, too, initiation of rRNA synthesis is a central point of control in ribosome biogenesis, little is known about the specific mechanisms. Moreover, it is clear that additional important general regulating mechanisms operate in yeast, such as those directly controlling the transcription of the r-protein genes (see Section II). Growth-rate control of ribosome production in yeast has been studied mainly by analyzing the effects of a carbon-source upshift. Changing the carbon source of the growth medium from ethanol to glucose results in a rapid 2.5- to 4-fold increase in the transcription of both r-protein and rRNA genes (64). While the outlines of the mechanism stimulating transcription of r-protein genes under these conditions are becoming visible (see Section II), that responsible for the increase in rRNA synthesis remains obscure, although the Pol-I enhancer may be involved (65). In this respect, it is noteworthy that the promoter of the gene encoding the transcriptional activator TUF of the majority of yeast r-protein genes (the RAP1 gene; see Section II,B) contains a REB1 (RBP1) binding site (43), which establishes a possible link between rRNA and r-protein synthesis. An even more direct link is suggested by the fact that yeast rDNA also contains a TUF binding site about 100 bp downstream from the start of the 26-S rRNA gene (66). However, although this site has a high affinity for TUF in vitro, nothing is known about
in vivo binding. Possibly, TUF, a nuclear scaffold protein (67), influences Pol-I transcription by affecting the topology of the rDNA. The finding that REB2 is probably identical to ABFI (43), a transcriptional activator of r-protein genes not controlled by TUF (see Section II,B), constitutes another possible example of such a direct linkage between rRNA and r-protein synthesis. Finally, an indirect link via TUF may also exist, since the gene for a common subunit of yeast Pol I and Pol III contains a TUF binding site in its upstream region (68). Thus, rRNA transcription might also be regulated via changes in the supply of active polymerase molecules. Saccharomyces cerevisiae, like E. coli, shows a stringent response in that amino-acid deprivation leads to a specific inhibition of rRNA transcription (66, 69, 70). The search for ppGpp (or a related compound), known to act as an inhibitor of rRNA transcription in amino-acid-starved E. coli cells, was unsuccessful, however. Also, the availability of a relaxed yeast mutant strain (71) has so far failed to throw more light on the mechanism of the stringent response in yeast. A mild heat-shock (23°C → 36°C) transiently decreases transcription of rRNA genes in yeast (72). Again, the basis for this regulatory response is not known. A similar effect, however, is observed for r-protein genes as well as for several genes encoding non-ribosomal proteins (73).
E. Transcription of 5-S rRNA by Yeast RNA Pol III
As in other eukaryotic cells, transcription of 5-S rRNA in yeast is carried out by RNA Pol III. Transcription initiation depends on the transcription factors TFIIIA, TFIIIB, and TFIIIC that form a stable initiation complex with the promoter located in the coding region of the 5-S RNA gene (74). TFIIIB is the actual transcription activator protein, while TFIIIA and TFIIIC are required for assembly of TFIIIB into the transcription complex (75). Since yeast 5-S rRNA inhibits its own synthesis in vitro (76), it is likely that production of 5-S rRNA in vivo is tuned to that of the other rRNA species by the same feedback system operating in vertebrate cells, in which excess 5-S rRNA sequesters TFIIIA (77). It is unclear how Pol-I and Pol-III transcription remain balanced during a change in physiological conditions that either increases or decreases the cellular demand for ribosomes, although the TUF-controlled production of their common subunit (see Section I,D) may be involved. However, there is no obligatory coupling between Pol-I and Pol-III transcription. A yeast strain carrying a temperature-sensitive Pol-III mutation continues to produce 37-S rRNA for a considerable time when placed at the restrictive temperature (78). Although each yeast 5-S rRNA gene is part of an rDNA repeat, transcription of these genes does not respond to the Pol-I enhancer (65).
II. Expression of Ribosomal-protein Genes

A. Genetic Organization and Structure of Yeast Ribosomal-protein Genes

Whereas the equimolar production of the 17-S, 5.8-S, and 26-S rRNA species is ensured by the organization of these genes into multiple tandem copies of a contiguous transcription unit, no such linkage exists for the genes encoding the r-proteins (rp-genes). Even in the few cases in which two rp-genes are located in each other's immediate neighborhood, they are independently transcribed. Such an arrangement has been reported for the genes encoding r-proteins L46 and S24, which are only 640 bp apart and arranged in a head-to-head fashion (79). A similar situation holds for the genes encoding rp29 and L32, separated by a 600-bp intergenic region (80). Finally, there are two pairs of RP28-S16⁶ genes linked in a head-to-tail fashion, again by about 600 bp of DNA (81, 82). A recent remarkable finding is that two yeast r-proteins, S37 from the small and an as-yet-unidentified species from the large subunit, are encoded as the carboxy-terminal part of fusion proteins with ubiquitin (83, 84), a situation also encountered in mammals and plants (83, 85).

About three-quarters of the r-proteins in S. cerevisiae that have been analyzed genetically are derived from duplicate genes. In all cases, both gene copies are functional, although their expression levels often differ considerably (86-90). The duplicate copies encode identical, or virtually identical, proteins that are functionally indistinguishable (89, 91). However, their intron, leader, and trailer sequences display substantial differences. The group of r-proteins encoded by single-copy genes so far consists of L3, L29, L25 (92), L32 (90), the ubiquitin-fused S37 (84), and the phosphoproteins L44, L44', and L45 (93), as well as the related acidic proteins A0 and A1 (94).

Only a few yeast rp-genes have as yet been mapped to specific chromosomes.
The first of these were the ones conferring resistance against the antibiotics trichodermin (L3), cycloheximide (L29), and cryptopleurin (RP59). The former two, both single-copy genes, are located on chromosomes XV and VII, respectively (95, 96). One of the duplicate copies of the RP59 gene has been mapped to chromosome III (97). One of the two RP28-S16 gene pairs resides on chromosome XIV, the other on chromosome XV (82). The genes for the acidic proteins L44' and L45 were both traced to chromosome IV, while that for L44 is located on either chromosome VII or XV (98).

The general structure of yeast rp-genes is depicted in Fig. 4. The most striking feature of this structure is the presence of an intron close to the 5'
⁶ S16 was formerly called rp55.
FIG. 4. Generalized structure of a yeast rp-gene. Coding sequences, the intron, and the leader region are indicated by the vertically striped bars, solid bars, and an open bar, respectively. Highly conserved nucleotides in the intron are shown. Boldface nucleotides are universally conserved. The diagonally striped boxes represent UAS elements; the dotted box, the T-rich region. Average distances are indicated in base-pairs. Py, Pyrimidine.
end of the gene, since introns are rare in yeast non-ribosomal genes (reviewed in 99). Of 39 rp-genes analyzed so far, 28 contain an intron, usually within 10-20 codons downstream from the translation start. In two instances (RP29 and S31), the intron is located in the 5'-untranslated leader sequence of the gene. In the case of two of the ubiquitin/r-protein fusions, the intron is present in the ubiquitin-encoding part of the gene. The UBI3 gene encoding S37 does not contain an intron (82). The common occurrence of an intron in rp-genes has led to speculation about a regulatory role for splicing in r-protein synthesis. So far, however, only one instance of regulation at this level has been uncovered (100; see Section II,C).
B. Transcriptional Regulation of Ribosomal-protein Synthesis

Despite the difference in gene copy number, the cellular levels of the mRNAs for several different yeast r-proteins (rp-mRNA) are approximately equal (101). Since the various rp-mRNAs have about the same stability (101), this suggests that the rate at which r-proteins are produced is controlled primarily at the level of transcription. Over the past few years, ample evidence supporting this suggestion has been accumulated, although additional regulation at the post-transcriptional level has also been detected (see Section II,C).

The first intimation of possible cis-acting elements involved in transcriptional control of r-protein synthesis came from a comparison of the 5'-flanking sequences of 21 different yeast rp-genes (102, 103). This comparison revealed two conserved elements, at that time designated HOMOL1 and RPG boxes, located 250-450 bp upstream from the ATG start codon (cf. Fig. 5). Deletion mapping and linker-scanning experiments, carried out on several different rp-genes (104-108), demonstrated that these elements indeed play a central role in controlling rp-gene expression and, therefore, should be considered upstream activating sequences (UASs). These experiments also revealed a third element in the form of a T-rich region located closely downstream from the UAS elements (104; cf. Fig. 4). Similar regions act as promoter elements in other, non-ribosomal, yeast genes (109). Although virtually all yeast rp-genes possess a T-rich region (104-109), the
relative importance of this element seems to vary from one gene to another. Deletion mapping of the RP59 promoter showed 40% residual transcription when only the T-stretch was still present (106). Similar experiments on the L25 promoter, however, demonstrated that the T-stretch by itself does not support significant transcription of this gene (105). Nevertheless, in this case, as well as that of the L16⁷ (104) and L29 (107) genes, the T-stretch appears to act in conjunction with the UAS elements to promote transcription. On the other hand, the intergenic region separating the S24 and L46 genes contains only a single T-stretch, located between the (duplicate) UAS elements and the transcription start of the L46 gene. Since the same UAS elements are essential for expression of the S24 gene (108), either S24 gene transcription does not require a T-stretch, or the T-stretch can also function from a position upstream from the UAS elements. The latter possibility seems less likely in view of the highly conserved 5'→3' order of UAS elements and T-stretch in rp-genes.

The HOMOL1 and RPG boxes, although originally considered distinct elements, in fact are different representatives of the same functional UAS (now called the RPG box) that acts as a binding site for an abundant transcription factor variously known as TUF, SBF-E, RAP1, GRFI, and TBA (this plethora of aliases is due to the multifunctionality of this factor, discussed further on in this section). The first indication for the existence of this transcription factor came from the analysis of protein-DNA interactions in the 5'-flanking sequence of the TEF1 and TEF2 genes encoding yeast elongation factor EF-1α (111, 112). Using footprint and gel-retardation analyses, a protein, called TUF, was identified that bound to both the HOMOL1 and RPG boxes present in the TEF1 and TEF2 genes, as well as the RP51A rp-gene.
Subsequently, the same protein was shown to bind to the RPG boxes of various other yeast rp-genes (113). Sequence comparison, as well as determination of the relative affinity of various different RPG boxes for the protein (108), led to the following consensus binding sequence: 5'-ACACCCATACATTT-3'. (The most strongly conserved nucleotides are shown in boldface type.)

With only a few exceptions, yeast rp-genes carry two copies of the RPG box (110) in either orientation. Although there is no evidence for a cooperative effect between the two copies in binding of the TUF factor (113), both contribute synergistically to the final level of transcription, albeit often to a different degree. Removal of the upstream copy from the L16 gene, for example, reduced transcription by more than 90%, while deletion of the downstream copy led to a reduction of only about 30% (104). Similar results were obtained for the L25 gene, although in this case the downstream copy is

⁷ L16 was formerly called rp39.
the stronger (114). The differences in activity are probably due to differences in nucleotide sequence of the RPG boxes, which affect the affinity for the TUF factor (113) and possibly the orientation of a particular RPG box. Replacement of the two naturally occurring copies of this element in an L25-galK fusion gene by a single synthetic copy in either orientation revealed a clear orientation-dependent difference in activity (114). Also, a certain distance-dependent variation in activity cannot be excluded, since the stimulatory effect of the synthetic RPG box on transcription of the L25-galK fusion gene was found to diminish when its distance from the ATG start codon was reduced to about 100 bp (114). Apparently, the various factors contributing to promoter strength have been carefully balanced in the various yeast rp-genes to ensure the production of (approximately) equimolar amounts of all rp-mRNAs.

In cases in which the r-protein is encoded by duplicate genes, the contribution of the individual gene copies to the final amount of mRNA is not necessarily equal. The two RP51 gene copies produce 40% and 60% of the rp51-mRNA, respectively (86), while for L16 and RP28 genes, this ratio is about 30:70 and 15:85, respectively (89, 115). Again, the same factors mentioned above are probably responsible for these differences. The structures of the two RP28 promoters are particularly suggestive of this idea, since the most highly expressed copy contains two RPG boxes with high affinity for the TUF factor, whereas the other carries only a single low-affinity site (113). However, no comparison of promoter strength between promoters that differ only in their RPG box(es) has so far been described.

TUF binding regions are not only able to combine with themselves in a synergistic manner to create stronger activation sites, but also with other sites, notably a T-rich sequence.
The basis for this effect is not yet clear, but it may well involve binding of additional protein factors, albeit not in a cooperative manner (116).

There is evidence that the rapid increase in transcription of rp-genes observed after a nutritional upshift is mediated through the RPG boxes. Removal of the two RPG boxes of the L25 gene abolished the typical response of this gene to a carbon-source upshift. The response could be restored by insertion of a synthetic RPG box, even in the absence of the T-stretch, demonstrating that a single element is sufficient (115). Similar results were obtained for one of the S16 genes (117). However, the transient reduction in r-protein synthesis caused by a heat-shock, although in part due to a lowering of the transcription rate, is not mediated by the RPG boxes (73; see also Section II,C).

Although the rate of rp-gene transcription can be adapted to changes in the physiological conditions via the RPG boxes, it cannot be adjusted further once these conditions are stable. This was demonstrated by inactivating one
of the two gene copies encoding rp51 (86), L16, or rp59 (89). Such inactivation caused a reduction in the steady-state level of the pertinent rp-mRNA, demonstrating that there is no increase in transcription of the single surviving gene copy to compensate for the deletion (see also Section II,C).

So far, four yeast rp-genes have been found that do not carry any RPG box in their 5'-flanking sequence. The first of these were the genes encoding S33 and L3 (113, 118). In both cases, again an abundant protein factor, called SUF or TAF, respectively, that binds to a specific sequence in the upstream region of the gene was identified. We have recently demonstrated, by competition-binding experiments, that, in fact, SUF and TAF are the same protein (119). Footprinting and methylation interference analyses (119) identified the binding site as RTCRYN₅ACG (R = purine; Y = pyrimidine; N = nucleotide). The rp-genes for L2 [two copies (120)] and L45 contain this same protein binding site (121, 122; our own unpublished results). However, analysis of the role of the UAS (UAS_T) for this set of r-proteins has not progressed to the level reached in the studies of the RPG box. As far as S33 is concerned, removal of the UAS_T lowers transcription by as much as 80% when cells are grown in 0.04% glucose/2% ethanol. Upon growth in 2% glucose, however, deletion of the UAS_T results in only a slight reduction in transcription. Under these conditions, transcription appears to depend on a cis-acting element located downstream from the UAS_T that binds another protein factor (123). Nevertheless, the carbon-source upshift effect, which is normal for S33, depends on the presence of the intact UAS_T (123). In contrast to our observations on S33, deletion of the UAS_T from the L3 gene causes a drastic reduction in transcription even in cells grown on 2% glucose
(118). Similar to TUF, SUF binding sites can combine synergistically with themselves and other activator sites to form elements that promote transcription more strongly (116). A solitary SUF site constitutes a relatively weak activator site (5- to 30-fold weaker than a high-affinity TUF site), but the synergism (e.g., with a T-rich sequence) can cause transcription to rise to a level 4-10 times the sum of the levels obtained with each of the elements separately.

As already demonstrated by the occurrence of RPG boxes in the 5'-flanking sequences of the TEF1 and TEF2 genes, the role of TUF is not limited to transcription activation of yeast rp-genes. The same holds true for SUF/TAF. In fact, both proteins are involved in either positive or negative regulation of quite a number of other, non-ribosomal, genes, and are implicated in the control of several more. Furthermore, both TUF and SUF/TAF play additional important roles in the cell. TUF acts as a nuclear scaffold protein (67) and binds to yeast telomeres (124, 125), whereas SUF/TAF is implicated in DNA replication because it interacts with the conserved B
domain of a number of ARS elements (125-128). This multifunctional character of the two proteins has caused them to be known by a variety of aliases (Table I). Two proteins (called SBF-E and SBF-B or GRFI and ABFI, respectively) that recognize the same binding sites as TUF and SUF/TAF, play an important role in the functioning of the silencer that suppresses the HMLα and HMRa mating-type loci (66, 125, 129, 137). This function of TUF is correlated with its ability to act as a nuclear scaffold protein (67). Since the sequence to which SBF-E binds acts as a transcription activator site when taken out of its context in the silencer, the protein binding to this site was also designated RAP1 [for repressor activator protein (129)]. This name now appears to be generally accepted. The SUF factor now is generally designated ABFI (for ARS binding factor).

RAP1 binding (=RPG box) as well as ABFI binding (=UAS_T) sites have been found in the 5'-flanking sequences of a large number of yeast genes by sequence comparison (see 121, 133 and 138 for recent compilations). Among these are genes encoding components of the transcriptional and translational (including the rp-genes) machinery, as well as genes involved in cellular differentiation, the cell division cycle, nutrient transport, and glycolysis. A

TABLE I
NOMENCLATURE AND MOLECULAR WEIGHTS OF TUF AND SUF FACTORS(a)

TUF factor
  Name(b): TUF; SBF-E; RAP1; GRFI; TBA
  MW (kDa)(c): 150; 92.5; 120
  Reference: 111, 112; 129; 130, 131; 131; 125; 124

SUF factor
  Name(b): SUF; TAF; SBF-B; ABFI(d); GFI; BAF1; TyBF; OBF1
  MW (kDa)(c): 147; 135; 81.6; 120; 123
  Reference: 113; 118; 129; 125, 126; 127; 132; 133; 134; 135

(a) Abbreviations are defined in footnote 1 on p. 89.
(b) It should be noted that definitive experimental proof for the fact that the proteins listed in this column are indeed identical has not been provided in all cases.
(c) Molecular-weight values were generally derived from electrophoretic analysis on SDS/polyacrylamide gels. Boldface numbers indicate values calculated from the nucleotide sequence of the gene.
(d) There are indications that ABFI may be present in two different forms in the cell, both of which are functional (116, 136).
number of nuclear genes encoding mitochondrial components also contain ABFI binding sites (132). The common feature linking these genes is their involvement in cell growth. Although there is experimental proof for only some of the non-ribosomal genes carrying this site that the RAP1 and ABFI binding sites are actually involved in transcriptional activation (see 66, 125, 133, 138 and 139 and references therein), there is little doubt that both proteins act as global regulators of gene activity. It is therefore not surprising that both RAP1 and ABFI are essential for cell survival (127, 131). The nature and efficiency of their regulatory function must be determined by interaction with additional, more specific, factors, that, for the most part, have yet to be characterized. The SIR gene products required for silencer function are likely to be such factors (66). Furthermore, an 82-kDa protein that is crosslinked to both RAP1 and ABFI when bound at their respective sites has been identified (118). Finally, the GCR1 gene product has been implicated in RAP1-, but not ABFI-, mediated regulation, particularly of the ADH1, TEF1, TEF2, and RP59 genes (140).

As discussed above, the rapid elevation of rp-gene transcription in response to a nutritional upshift is mediated through the RPG box or UAS_T. However, the precise role of the RAP1 and ABFI factors in this response is not yet clear. The amount of (active) RAP1 varies with growth rate (139, 141). However, the kinetics of this variation and those of the increase in rp-gene transcription are considerably out of step, the latter being much more rapid than the former (141). Tseng et al. (139) suggested that post-translational modification (phosphorylation) of RAP1 might be involved in regulating its activity. On the other hand, an increase in growth rate does cause the level of ABFI to rise rapidly.
The question, therefore, remains as to how expression of the two classes of rp-genes, one carrying the RPG box, the other, UAS_T elements, is coordinated.
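The two binding-site descriptions given in this section (the RPG-box consensus 5'-ACACCCATACATTT-3' and the SUF/TAF site written in degenerate code as R-T-C-R-Y-N5-A-C-G) can be turned into a simple sequence search. The following Python fragment is only an illustrative sketch: the function names and the toy sequences are hypothetical, the degenerate site is expanded with the standard IUPAC meanings quoted in the text (R = purine, Y = pyrimidine, N = any nucleotide), and counting mismatches against the consensus is a crude stand-in for the measured relative affinities, not a method taken from the cited work.

```python
import re

# Degenerate-base codes as defined in the text (R = purine, Y = pyrimidine, N = any).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

ABFI_SITE = "RTCRYNNNNNACG"        # RTCRYN5ACG written out in full
RPG_CONSENSUS = "ACACCCATACATTT"   # consensus RPG box quoted in the text


def iupac_to_regex(motif):
    """Translate a degenerate motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif)


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, motif):
    """Return sorted (position, strand) pairs for matches on either strand.

    Positions are 0-based coordinates on the given (top) strand; a lookahead
    is used so that overlapping matches are all reported.
    """
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    n = len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    hits += [(len(seq) - m.start() - n, "-") for m in pattern.finditer(revcomp(seq))]
    return sorted(hits)


def best_consensus_match(seq, consensus=RPG_CONSENSUS):
    """Return (mismatches, position) of the top-strand window closest to the consensus."""
    n = len(consensus)
    return min((sum(a != b for a, b in zip(seq[i:i + n], consensus)), i)
               for i in range(len(seq) - n + 1))


# Toy sequences, invented purely for illustration.
toy_abfi = "GGG" + "ATCACTTTTTACG" + "CCC"
toy_rpg = "TT" + "ACACCCATACATAT" + "GG"

abfi_hits = find_sites(toy_abfi, ABFI_SITE)
rpg_best = best_consensus_match(toy_rpg)
```

Because RPG boxes occur in either orientation, a real search would apply `best_consensus_match` to the reverse complement as well; the sketch keeps to one strand for brevity.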
C. Post-transcriptional Regulation of Ribosomal-protein Synthesis

As discussed above, transcriptional regulation occupies the foremost position among the processes controlling r-protein synthesis in yeast. However, additional regulation does occur at several post-transcriptional levels, to wit splicing, rp-mRNA stability, and stability of the r-protein.

The mechanism of nuclear mRNA splicing in S. cerevisiae, which is almost exclusively limited to rp-gene transcripts, has recently been reviewed (99). There are no fundamental differences between this process in yeast and that in other eukaryotic cells, except that, unlike the situation in metazoan cells, the 3' portion of the intron (downstream from the branch point) appears not to be involved in the initial phases of yeast spliceosome assembly. Furthermore, the yeast splicing machinery appears to be more fastidious in its sequence requirements. Sequence conservation at the 5' end and particularly around the branch point is much stronger in S. cerevisiae than in metazoan pre-mRNA introns (cf. Fig. 4).

Despite the fact that introns in yeast occur almost exclusively in rp-gene transcripts, splicing plays only a minor role in the control of yeast r-protein synthesis. In general, splicing is not the rate-limiting step in the production of mature mRNA (142), although splicing of L29 pre-mRNA seems to be an exception (143). However, there is one well-documented example of splicing regulating the production of an r-protein. Introduction of extra copies of a cDNA encoding r-protein L32 into yeast cells did not elevate the level of the protein, but did cause an increase in the level of L32 pre-mRNA (100). Since this effect was not observed when a cDNA clone lacking the ATG translation start codon was used, it was concluded that the L32 protein regulates its own synthesis via feedback inhibition of its pre-mRNA splicing. This regulatory mechanism resembles the translational feedback that plays a central role in controlling r-protein production in E. coli.

It is reasonable to assume that the feedback regulation of L32 pre-mRNA splicing is based on competition between a pre-ribosomal particle and the precursor mRNA transcript for binding of L32. Evidence in support of this mechanism was recently supplied by the observation that blocking rRNA synthesis in a yeast mutant containing a conditionally expressed Pol-I subunit gene also leads to accumulation of L32 pre-mRNA (144). Regions of L32 pre-mRNA important for regulation of its splicing are located close to its 5' end, as well as near the 5' splice site (F. Eng and J. R. Warner, personal communication). It is not yet clear whether L32 can bind directly to (pre-) rRNA. Of course, such direct L32/rRNA interaction is not essential for competition-based feedback regulation. Binding of L32 to the pre-ribosomal particle via protein/protein interaction would serve just as well.
Feedback regulation of splicing has also been observed for r-protein L1 from X. laevis (145), and a structural similarity between the two introns in question and X. laevis 28-S rRNA has been noted (146). So far, however, there is no experimental evidence that L1 pre-mRNA and 28-S rRNA directly compete for binding of L1.

Subjecting yeast cells to a mild heat-shock (23°→36°C) leads to a severe, though transient, decline in r-protein synthesis, in part due to a temporary, non-rp-gene-specific, transcriptional arrest (73). In addition, however, rp-mRNAs are specifically destabilized. The half-lives of S10 and L25 mRNA decrease by a factor of about three. When transcription is blocked prior to, or concomitant with, the heat-shock, either by addition of 1,10-phenanthroline (73) or by using a temperature-sensitive Pol-II mutant (147, 148), the stability of rp-mRNA does not change. Thus, de novo synthesis of a protein factor seems to be required to effect the (temporary) destabilization of rp-mRNA by heat-shock. Both the nature of this factor and its target on the rp-mRNA remain to be identified.
When yeast cells are supplied with supernumerary copies of an rp-gene, the level of the corresponding rp-mRNA increases proportionally, but the amount of r-protein does not (143, 149-153). Initially, translational control was thought to be responsible, at least in some cases, notably that of L3 (143). Later experiments, however, refuted this notion and, using pulse-chase techniques, demonstrated that r-protein levels are kept constant by the rapid turnover of excess protein (149-153). As discussed above, L32 is the only known exception to this rule. Yeast r-proteins, except for the acidic species (see below), appear to be inherently unstable when not immediately assembled into ribosomal precursor particles, a feature also apparent when synthesis or processing of rRNA is blocked (144, 154). The molecular cause of this extreme instability [t1/2 = 12 minutes (143, 150, 151)] is unknown. The amino-terminal amino acids of the large majority of yeast r-proteins do not belong to the set conferring instability on a test protein in yeast (91, 155). Possibly, r-proteins contain a destabilizing element within their amino-acid sequences (P. J. Schaap, J. van't Riet and H. A. Raué, unpublished).

In any case, the data discussed above clearly show that yeast cells fine-tune the supply of most r-proteins by an assembly-mediated rapid degradation of any excess produced. The exception is the set of acidic r-proteins, the only species for which a significant pool of free protein is present in the cytoplasm (156-158). The reason for this excess has yet to be elucidated. Transcriptional regulation of the genes encoding the acidic r-proteins appears to conform to the normal pattern using either RAP1 or ABFI as activators (122; our own unpublished results). Differences in post-transcriptional regulation are thus probably responsible for the "abnormal" amounts of these proteins.
Experiments in which the minor (in terms of mRNA production) duplicate copy of an rp-gene was inactivated without a significant effect on growth rate (86, 89, 90) suggest that, even in wild-type cells, r-protein production is (slightly) in excess of actual need. On the other hand, no dosage compensation occurred at the level of translation when the supply of r-protein was curtailed either by knocking out the major copy (86, 89, 90) or by lowering the transcription of a conditionally expressed copy (159). Thus, the rate of translation of an rp-mRNA, like the rate of transcription of the rp-gene, is set at a fixed value by the physiological conditions.
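The stability figures quoted in this section (a half-life of about 12 minutes for unassembled r-protein, and a roughly threefold drop in rp-mRNA half-life after heat-shock) can be made concrete with a little arithmetic. The Python sketch below assumes simple first-order decay and a constant synthesis rate; both are modelling assumptions made here for illustration and are not stated in the studies cited, and the function names and the 12-minute and 4-minute mRNA values in the example are hypothetical.

```python
import math


def fraction_remaining(t_min, half_life_min):
    """Fraction of a first-order decaying species left after t_min minutes."""
    return 0.5 ** (t_min / half_life_min)


def steady_state_level(synthesis_rate, half_life_min):
    """Steady-state amount for constant synthesis and first-order decay.

    The decay constant is k = ln(2) / half-life, and the steady-state level
    is synthesis_rate / k (in units of synthesis_rate x minutes).
    """
    k = math.log(2) / half_life_min
    return synthesis_rate / k


# Excess r-protein with t1/2 = 12 min: fraction surviving one hour.
left_after_hour = fraction_remaining(60, 12)

# Hypothetical mRNA whose half-life drops threefold (12 min -> 4 min)
# while its synthesis rate stays the same.
relative_level = steady_state_level(1.0, 4) / steady_state_level(1.0, 12)
```

Under these assumptions only about 3% of an excess r-protein pool survives for an hour, and a threefold shorter mRNA half-life by itself lowers the steady-state mRNA level to one-third, before any contribution from the accompanying transcriptional arrest.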
III. Processing and Assembly of Ribosomal Constituents

A. Processing of Pre-rRNA

As discussed above, the Pol-I transcript of the yeast rRNA operon extends to position +210 downstream from the 3' end of the mature 26-S rRNA sequence. However, this transcript is cleaved very rapidly to generate
the 37-S precursor rRNA having seven extra nucleotides at its 3' end. This cleavage appears to be carried out by the product of the RNA82 gene (40) also involved in 3' processing of yeast 5-S rRNA (10). The same endonuclease carries out 3' processing of the upstream member of a pair of dimeric yeast tRNA genes as well (160). Whereas minigene transcription in wild-type cells produces transcripts having 3' ends at positions -1 (T0, corresponding to the 3' end of the mature 26-S sequence) and +7, in rna82.1 mutant cells transcripts ending at positions +15 to +50 (site T1; cf. Fig. 1) abound. However, 3'-end formation at -1 and +7 is not completely abolished by the mutation, indicating that yeast cells have the means to circumvent the rna82.1 defect. Normal 3' processing of the Pol-I transcript thus appears to entail endonucleolytic cleavage at positions +15 to +50 by a so-far-unidentified nuclease, followed by further endonucleolytic cleavages at positions +7 and -1 carried out by the RNA82 gene product. Alternatively, the mature 26-S rRNA terminus might be produced by exonucleolytic removal of the final seven extra nucleotides.

Yip and Holland (47) recently reproduced these processing events in vitro, using a partially purified yeast extract. Upon incubation of a synthetic minigene transcript extending to position +210, they observed 3'-end formation at positions -1, +7, and +12. In contrast to the results obtained in vivo, processing products ~70 and 10 nucleotides in length, respectively, did accumulate in vitro. The former should be due to endonucleolytic cleavage in the +15 to +50 region. The identity of the latter was not established, but it could be the product of endonucleolytic cleavage of the 37-S pre-rRNA that generates the mature 3' end. Formation of 3'-ends at -1 and +7 requires the presence of an element(s) located between positions -36 and +74 (40) that acts in an orientation-dependent manner (9, 47).
An imperfect helix extending from positions +9 to +55, conserved in even distantly related Saccharomycetoidae, might constitute such a cis-acting recognition signal (40). Furthermore, this processing depends on the presence of Mg2+ ions. No evidence for the involvement of an RNA-containing component has been obtained so far (47).

Figure 5 shows the sequence of processing events leading to the formation of mature 17-S, 26-S, and 5.8-S rRNAs from yeast 37-S pre-rRNA (161, 162). First, virtually simultaneous cleavages at sites A1 and A2 remove the complete 5' ETS and separate the 17-S and covalently linked 5.8-S and 26-S rRNA sequences. A second pair of (almost) simultaneous cuts at positions B1 and B2 produces the mature 5' end of 5.8-S rRNA and the mature 3' end of 26-S rRNA, respectively. A third pair (C1 and C2) then results in the removal of most of ITS2, uncovering the mature 5' end of 26-S rRNA and producing the 7-S precursor of 5.8-S rRNA. Subsequent cleavage at C3 leads to the formation of mature 5.8-S rRNA. All cleavage events mentioned so far
[Fig. 5 diagram. Recoverable labels: scale bar, 500 bp; intermediates 37 S, 32 S, 29 SA, 29 SB, 18 S, 7 S; mature species 17 S, 5.8 S, 26 S, 5 S; processing sites A1, A2, A3, B1, B2, C1, C2, C3; compartments nucleus, cytoplasm.]
FIG. 5. Processing of the primary Pol-I transcript in yeast to the mature rRNA species. The top line shows a single rDNA repeat according to the conventions used in Fig. 1. The Pol-I transcription-initiation (bent arrow) and termination (T2) sites are indicated. Below are shown the various processing intermediates identified in yeast cells. The hypothetical primary transcript is shown shaded. Thick bars correspond to the mature sequences; thin bars to the spacer sequences. Processing sites are indicated by arrowheads. The names of the various intermediates are shown to the left and right. The structure of the unusual 32-S precursor produced when site A2 has been deleted (see Section III,A) is outlined above the 37-S precursor.
occur in the nucle(ol)us. The final processing of the 18-S precursor to 17-S rRNA by cleavage at A3, however, is a cytoplasmic event (163).

The yeast rRNA processing scheme differs in several respects from that worked out for mammals. In this case, the primary processing event appears to be removal of the 5' ETS by means of two or three consecutive endonucleolytic cleavages (164-166), followed by removal of the 3' ETS, again in two steps (167). Further processing occurs at, or very close to, the ends of the various mature rRNA sequences (168). Processing of Tetrahymena thermophila rRNA appears to follow a similar course of events, except that, as in yeast, the 5' ETS is removed by a single cleavage (169).

The lack of an in vitro system for yeast that supports processing at sites A1-C3 has severely hampered identification of the cis-acting elements required for these cleavages. We have now developed a system allowing mutational analysis of yeast rDNA in vivo that should enable us to delineate these
elements (35, 170, 171). The system consists of a complete rDNA unit in which both the 17-S and 26-S rRNA genes have been "tagged" by insertion of an oligonucleotide having a unique sequence. The rDNA unit is transformed into yeast cells on a centromeric plasmid to preclude recombination with the chromosomal rDNA repeats. The plasmid-derived transcripts can easily be detected by Northern hybridization using probes complementary to the tags. The presence of the tags, inserted into one of the variable segments of each of the genes (172), in itself does not affect ribosome biogenesis or function.

In a first series of experiments, we studied the effect of deletions in the 5' ETS as well as both ITS regions on the biogenesis of 17-S and 26-S rRNAs. As far as the 5' ETS is concerned, we found that deletion of XlO-4OO bp from the 5' end, the 3' end, or the middle of this region completely abolished formation of mature (tagged) 17-S rRNA. However, accumulation of tagged 26-S rRNA, assembled into functional 60-S subunits, remained unaffected even when all but 75 of the 699 bp of the 5' ETS were deleted (171). This result clearly demonstrates that the presence of the 5' ETS in cis is essential for the formation of the 40-S ribosomal subunit, whereas it appears to be dispensable for 60-S subunit formation. Deletion of about two-thirds of the 17-S gene also did not affect 60-S subunit formation (35), a surprising result in view of the fact that the first steps in yeast rRNA processing occur at the level of a 90-S ribonucleoprotein particle containing the complete Pol-I transcript (173; see Section III,D).

Deletion of a 160-bp region from ITS1 centered on processing site A2 again abolished accumulation of mature (tagged) 17-S rRNA, while leaving the formation of 60-S subunits from tagged 26-S rRNA unaffected (171). This indicates that cleavage at site A2 is a crucial step in the formation of 40-S subunits.
Cells containing the mutant rDNA unit accumulated an unusual 32-S precursor that must have arisen from the processing of 37-S pre-rRNA at site A1. In an snR10.3 yeast mutant, in which the gene encoding snR10 RNA (sn = small nuclear) has been destroyed, processing at A2 is blocked (174). Instead, cleavage at site B1 is used to separate the 17-S rRNA and 5.8-S/26-S rRNA sequences. We assume that in our case also the unusual 32-S precursor is cleaved at B2, resulting in the formation of a normal processing intermediate (29-SB; cf. Fig. 5) en route to mature 26-S rRNA and 60-S subunits. We could not detect the 21-S precursor of 17-S rRNA that should be the other product of this cleavage. Therefore, we infer that this precursor is not an effective substrate for 40-S subunit formation and is rapidly degraded.

ITS2 has been called a pseudointron because it separates the 5.8-S RNA, which is homologous to the 5' terminal 150 nucleotides of prokaryotic 23-S rRNA, from the 26-S rRNA sequence, equivalent to the rest of the 23-S
molecule (172). However, ITS2 must play a role at some stage of 60-S subunit formation since its complete deletion blocked rRNA processing at the stage of the 29-SB precursor equivalent, which is then degraded (171). The same result was obtained when ITS2 was only partially deleted, which shows that correct removal of ITS2 is crucial for 60-S subunit formation. Since these experiments were carried out using an rDNA unit in which only the 26-S rRNA gene contained a tag, we do not know the effect of the ITS2 deletions on 40-S subunit biogenesis.

Apart from the involvement, demonstrated by genetic analysis, of snR10 RNA and the RNA82 gene product, evidence has recently become available for involvement of three further trans-acting factors in yeast rRNA processing. Two of these factors also are snRNAs. The snR128 (U14) species appears to be required for 37-S pre-rRNA processing, because its depletion causes severe underaccumulation of mature 17-S rRNA and its 18-S precursor (175). Most likely, snR128 plays a role in processing at A1 or A2, although an indirect effect of this factor on rRNA maturation cannot be excluded. In contrast to the snR10 gene, the gene encoding snR128 is essential for cell viability (176). Yeast snR17 RNA is homologous to metazoan U3 snRNA and is associated in vivo with 37-S pre-rRNA (174). Again, its gene is essential (177). While at present there is no direct evidence for a role of snR17 RNA in yeast rRNA processing, recent in vitro experiments strongly support the notion that its mammalian counterpart is involved in the first processing step that removes the 5'-terminal portion of the 5' ETS from the primary rRNA transcript (178). The in vivo crosslinking of U3 snRNA to the 5' ETS of both human (179) and rat (180) pre-rRNA constitutes further support for this role. It is unclear whether the yeast U3 homolog fulfills the same function, because no equivalent cleavage within the 5' ETS of yeast 37-S pre-rRNA has as yet been detected.
Nine additional yeast snRNAs have been implicated in rRNA processing by virtue of their probable nucleolar localization and association with one or more precursor rRNA species (174, 176).

A further trans-acting factor important for yeast pre-rRNA processing is the product of the RRP1 gene, essential for cell viability (181, 182). In cells carrying the rrp1-1 temperature-sensitive mutation, processing of the 29-SB precursor appears to be blocked at 37°C, indicating the protein to be involved, directly or indirectly, in the cleavage at site C1. The precursor is subsequently degraded, in agreement with our studies of the rDNA units containing a partial deletion of ITS2. Remarkably, the same mutation also slows down the conversion of the 18-S precursor to 17-S rRNA (181). Since this conversion occurs in the cytoplasm, whereas processing of 29-SB pre-rRNA is a nucle(ol)ar event, it seems most likely that the RRP1 gene product affects processing indirectly, possibly because it is responsible for modification(s) in the rRNAs that play a role in this process. The fact that the rrp1-1 mutation also causes hypersensitivity to aminoglycoside antibiotics, while the gene does not seem to encode a ribosomal protein (181), supports this hypothesis. Many mutations affecting the sensitivity of ribosomes toward aminoglycosides have been mapped to 16-S rRNA of E. coli and chloroplasts (183). The rrp1-1 mutation, but not a disruption of the RRP1 gene, is (partially) suppressed by a second-site mutation in a locus on chromosome III designated SRD1 (182). Thus, the SRD1 protein either interacts directly with the RRP1 gene product or regulates the expression of this gene (182). The RRP1 gene has been sequenced, but the deduced amino-acid sequence of the protein gives no clue to its function (182).

Although it is likely that several r-proteins play a role in rRNA processing, only in the case of S37 has any evidence for such a role been reported. Lack of this r-protein retards the conversion of 20-S pre-rRNA to mature 17-S rRNA (84; see also Section III,D).
B. Modification of Pre-rRNA

A second type of structural alteration introduced during maturation of pre-rRNA consists of the modification of nucleotides located at specific sites in the chain (172, 184). Most of these modifications concern 2'-O-methylation of the ribose moiety. At a few positions, methyl groups are linked to the base. As a consequence of this modification, mature yeast 17-S and 26-S rRNAs contain 22 and 43 methylated nucleotides, respectively. In addition, one hypermodified nucleotide, identified as 1-methyl-2-(α-amino-α-carboxypropyl)pseudouridine (mX), is present in 17-S rRNA. Furthermore, both 17-S and 26-S rRNAs carry a large number of pseudouridine residues (184, 185), most of which, however, have yet to be assigned a location.⁸

As discussed in more detail elsewhere (173), methylation starts at the level of 37-S pre-rRNA. However, most of the methyl groups are introduced at the level of the individual 18-S and 29-S rRNA precursors. Modification takes place in the nucle(ol)us, except for the formation of the two 6-dimethyladenine residues near the 3' end of the 17-S rRNA and the completion of mX, both of which occur in the cytoplasm shortly before conversion of the 18-S rRNA precursor into mature 17-S rRNA.

Virtually nothing is known as yet about the enzymes involved in yeast pre-rRNA modification or the way in which the specific residues to be modified are selected. The same holds true where the role of modification in the biogenesis of ribosomes is concerned. A more detailed discussion of possible roles of individual modified nucleotides in ribosome formation and function can be found in 172.

⁸ See “The Numerous Modified Nucleotides in Eukaryotic Ribosomal RNA,” by B. E. H. Maden, in Vol. 39 of this series. [Eds.]
C. Modification of Ribosomal Proteins

Several yeast ribosomal proteins are modified post-translationally, either by methylation (186) or by phosphorylation (187, 188). Four methylated r-proteins (S31, S32, L15, and L41) have been detected. The biological relevance of this methylation is not known. Interestingly, however, we have shown L15 to be the yeast equivalent of EL11 (189), one of the most strongly methylated E. coli ribosomal proteins (190). The two proteins are part of the ribosomal GTPase center and appear to play a role in numerous ribosomal functions (189, 191).

Two phosphorylated species (S2 and S10) are present among the small subunit r-proteins of yeast, while the large subunit contains at least five phosphoproteins (L9, L30, L44, L44', and L45). Of these proteins, only S10 and the three structurally highly related acidic proteins L44, L44', and L45 have been subjected to functional analysis. S10 is the yeast homolog of the mammalian ribosomal phosphoprotein S6 (192). As in other eukaryotes, phosphorylation of S10, which can occur at either (or both) of two adjoining serine residues close to its carboxy terminus, is reversible and depends on the physiological conditions of the cells (see 193 for references). Attachment of the phosphate group(s) is due to the action of cAMP-dependent protein kinase (BCY1), while the product of the PPD1 gene, encoding one of the three yeast phosphoprotein phosphatases, seems to be involved in dephosphorylation of S10 (194). Despite the effect of varying growth conditions on the degree of phosphorylation of S10, destruction of the phosphorylation sites of this protein does not lead to detectable growth impairment (193). Apparently, phosphorylation of S10 plays no role in either ribosome assembly or function in yeast.

Yeast r-proteins L44, L44', and L45 are the structural and functional equivalents of E.
coli r-proteins L7/L12 (93). As mentioned above, these species are the only yeast r-proteins for which a significant pool of free protein is present in the cytoplasm (156-158). Whereas the protein molecules bound to the ribosome are monophosphorylated, the free cytoplasmic molecules carry no phosphate groups (156). In vitro reconstitution experiments demonstrate that the affinity of L44, L44', and L45 for their ribosome binding site is reduced to at least a fourth, and probably considerably more, by dephosphorylation (156). The significance of these observations so far remains unclear. However, there are several indications that the acidic protein content of yeast ribosomes changes both qualitatively and quantitatively in response to changes in the physiological conditions of the cells, the phase of the cell cycle, and even the functional state of the ribosome. Concerning the last of these points, polysomes contain about twice the amount of acidic proteins as do free 80-S particles (156, 195). Phosphorylation/dephosphorylation of these proteins might play a role in this process, which could affect the activity of the ribosomes. Gene disruption experiments have shown that L44 is dispensable for cell growth, whereas lack of L45 retards, but does not block, growth. However, cells carrying a disruption of both of these genes are not viable (196).

In most cases in which both the nucleotide sequence of the gene(s) and the (amino-terminal) sequence of the r-protein have been determined, the two sets of data are in agreement, except for the occasional lack of the amino-terminal methionine residue. However, two instances of disagreement do exist (91). These concern r-proteins S31 and L15, where the published amino-terminal amino-acid sequence starts at 14 and 16 codons downstream from the AUG start codon, respectively. While this discrepancy might indicate post-translational removal of an amino-terminal pre-sequence, preliminary results make it more likely that the amino terminus of the two mature r-proteins is blocked (P. J. Schaap, J. van't Riet and H. A. Raué, unpublished; J. P. G. Ballesta, personal communication). The published amino-terminal sequences probably came from a degradation product.
D. Nucleocytoplasmic Transport and Assembly

Ribosome biogenesis in yeast, as in other eukaryotes, entails intensive two-way traffic across the nuclear double membrane, all of which passes through the nuclear pores. The rp-mRNAs must be exported to the cytoplasm in order to be translated into the r-proteins, which then move back to the nucle(ol)us to be assembled with the pre-rRNAs. Finally, the nearly completed subunits relocate to the cytoplasm. So far, relatively little is known about the first and last of these three processes. Nuclear import of (r-)proteins, however, has recently enjoyed increasing interest (see 197 and 198 for reviews), leading to the identification of the cis-acting nuclear localization signal(s) (NLS) of several yeast r-proteins. Progress is also being made with respect to the identification of the components of the import machinery and the general mechanism of nuclear protein import in yeast (199-201).

NLSs of four different yeast r-proteins (L3, L29, L25, and S31) have been characterized by fusing various parts of their coding regions to the E. coli gene for β-galactosidase (202-205). The latter protein is a convenient reporter, since it normally remains in the cytoplasm when expressed in yeast cells. The intracellular distribution of the fusion protein is analyzed by indirect immunofluorescence, using mouse antibodies against β-galactosidase followed by mouse anti-IgG coupled to a fluorescent marker. Both standard immunofluorescence microscopy and “confocal” laser scanning microscopy have been used. The latter technique has the advantage of being able to distinguish between perinuclear and intranuclear localization, since it allows
viewing of the cell in a series of consecutive “optical slices” (204). Several points should be kept in mind, however, when trying to identify an NLS by means of fusion proteins. The first is that the NLS may be “masked” by the conformation of the fusion protein. This problem can be solved by introducing a linker peptide between the reporter and the sequence to be analyzed (202). The second point is that the efficiency of an NLS may depend on the nature of the reporter even to the extent that no nuclear localization is observed with one, and full nuclear localization with another, reporter protein (206). A third problem, which so far has not materialized, however, is that a specific sequence might be identified as an NLS even though in its natural context it does not serve that function.

The first 21 amino acids of L3 were found to cause nuclear localization of an L3/β-galactosidase fusion (202), demonstrating the presence of an NLS in this region of the protein. In L29, two independent NLSs were identified, consisting of amino acids 6-13 and 23-29, respectively (203). The two signals are identical in five of the seven residues (Table II). Yeast r-protein L25 also contains two NLSs, as does S31 (204, 205). The first NLS of L25 is located in the region encompassing amino acids 1-17. Residues 11-17 contain an essential part of this signal, but by themselves constitute a weaker NLS. The second NLS of L25 is located between amino acids 17 and 62, with its most important part lying between residues 17 and 28. One of the two NLSs of S31 has been traced to amino acids 12-35. The other is located around amino-acid 85 (205).

TABLE II
NUCLEAR LOCALIZATION SIGNALS (NLS) OF A NUMBER OF KARYOPHILIC YEAST PROTEINS(a)

Protein      Amino-acid sequence of NLS                 Reference

r-proteins
  L3         1-SHRKYEAPRHGHLGFLPRKRA-21                 202
  L25        1-(M)PPSAKATAAKKAWKKG-17                   204
             18-TNGKKALKVR-27
  L29        6-KTRKHRG-13                               203
             23-KHRKHRG-29
  S31        12-AKAAALAGGKKSKKKWSKKSMKDR-35             205

Others
  MATα2      1-MNKIPIKDLLNPQ-13                         197
  H2B        23-TSTSTDGKK*RSK-34                        208
  GAL4       1-MKLLSSIEQACDICRLKKLKCSKEKPKCA-29         206
  SV40-T     126-PKK*KRKV-132                           210

(a) The amino-acid sequences (in one-letter symbols) of the regions of various yeast proteins that cause nuclear localization when fused to a reporter protein (usually β-galactosidase) are shown. It should be noted that the minimal sequence required for NLS function has not been determined in all cases. Numbers indicate the position of the sequence in the original protein. Basic amino acids are displayed in boldface type. Residues identified by an asterisk are crucial for NLS function by mutational analysis. For comparison, the prototypic SV40 large-T-antigen NLS is shown at the bottom. This metazoan NLS is fully functional in yeast cells (204, 206).

Even from this limited set of data, it is clear that yeast cells do not depend on passive diffusion for import of their r-proteins into the nucleus, even though the size of these proteins is considerably smaller than the cutoff value of 60-70 kDa, below which proteins can pass more or less freely through the nuclear pores (197, 198). Still, not all yeast r-proteins may contain an NLS. First, there is evidence that at least some of the large subunit r-proteins assemble in the cytoplasm and thus require no NLS (see below). Second, we have observed that an S24/β-galactosidase fusion is imported into the yeast nucleus, but have been unable to find any smaller region of S24 that will support nuclear localization of a fusion protein (P. J. Schaap, J. van't Riet and H. A. Raué, unpublished). Either the NLS of S24 is composed of residues far apart in the primary sequence that are brought together by folding of the protein, or S24 is imported by associating with (an)other protein(s), most likely an r-protein, that does contain an NLS. The first possibility seems less likely in view of the fact that NLSs so far identified generally consist of a small number of consecutive amino acids (197, 198). The second mechanism, however, which has been called “piggy-back import,” does operate in vertebrate cells, as shown by the nuclear import of a pentameric assembly of X. laevis nucleoplasmin molecules, in which only one of these molecules carries an NLS (207). Similarly, yeast histone H2B, containing a defective NLS, is probably imported into the nucleus through its association with histone H2A (208). The dependence of cdc2 protein on cdc13 for nuclear import in Schizosaccharomyces pombe (209) may be a naturally occurring example of piggy-backing.

Table II shows a comparison of the NLSs of the various yeast r-proteins so far characterized.
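The comparison in Table II can be made concrete with a small tally of basic residues. The following Python fragment is only an illustrative sketch: the helper name `basic_fraction`, the dictionary `NLS_SEQUENCES`, and the decision to count only lysine (K) and arginine (R) as basic are assumptions made for this example and are not taken from the studies cited. The sequences are those listed in Table II.

```python
# Illustrative sketch only: tallies lysine (K) and arginine (R) residues in
# some of the NLS sequences listed in Table II. The helper name and the choice
# to count only K and R as "basic" are assumptions made for this example.

BASIC_RESIDUES = frozenset("KR")


def basic_fraction(sequence: str) -> float:
    """Return the fraction of residues in `sequence` that are K or R."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    return sum(r in BASIC_RESIDUES for r in residues) / len(residues)


# One-letter sequences as given in Table II (position labels are informal).
NLS_SEQUENCES = {
    "L3 (1-21)": "SHRKYEAPRHGHLGFLPRKRA",
    "L29 (N-proximal)": "KTRKHRG",
    "L29 (N-distal)": "KHRKHRG",
    "S31 (12-35)": "AKAAALAGGKKSKKKWSKKSMKDR",
    "MATalpha2 (1-13)": "MNKIPIKDLLNPQ",
    "SV40-T (126-132)": "PKKKRKV",
}

if __name__ == "__main__":
    for name, seq in NLS_SEQUENCES.items():
        print(f"{name:<18} length={len(seq):>2}  basic fraction={basic_fraction(seq):.2f}")
```

Under these assumptions, the SV40 large-T signal has five basic residues out of seven, whereas the MATα2 signal has only two out of thirteen, which illustrates how widely the basic content of functional signals can vary.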
Apart from the fact that they all contain a number of basic amino acids, no common features can be discerned. Nevertheless, the presence of a cluster of basic amino acids by itself does not constitute an NLS, as demonstrated by the failure of portions of r-proteins containing such clusters to cause nuclear import of a fusion protein (P. J. Schaap, J. van't Riet and H. A. Raué, unpublished). Mutational analysis (203, 208) has identified single residues crucial for the function of the histone H2B and L29 NLSs (Table II), but these data still do not resolve the question as to what precisely does and what does not constitute a yeast NLS.

It is not clear why the majority of the yeast r-proteins analyzed carry two NLSs. The presence of multiple karyophilic signals has been shown to enhance the efficiency of nuclear import in vertebrate cells (197, 198). However, the N-distal signal of L29 appears not to be required for efficient nuclear import of the protein, since its inactivation does not affect the growth rate of yeast cells. On the other hand, inactivation of both NLSs of
L29 has a more severe effect on growth rate than inactivation of the N-proximal one alone (203). This shows that the N-distal signal in its natural context is able to interact with the nuclear import machinery and, thus, is likely to play a functional role.

Relatively little is known about the order in which the various yeast r-proteins are assembled into pre-ribosomal particles. Kinetic labeling studies in our laboratory (211) have identified 17 large subunit and 7 small subunit r-proteins that appear to assemble at a relatively late stage. Four members of the first group (L7, L9, L24, and L30) are almost completely absent from nuclear 66-S precursor particles of the large subunit, indicating that they are assembled in the cytoplasm. The same may hold true for the acidic proteins L44, L44', and L45 (158). On the other hand, seven of the large subunit r-proteins (L10, L12, L20-22, L25, and L37) appear to associate with the 37-S pre-rRNA. Among these is r-protein L25, which binds specifically to a region within domain III of the 26-S rRNA (212). The kinetic labeling data are in general agreement with results obtained by controlled dissociation of 60-S subunits with the aid of NH4Cl (213), LiCl (214), or treatment with dimethylmaleic anhydride (215). The r-proteins remaining in the core particles produced by these methods predominantly belong to the class of “early-assembling” species, as identified by kinetic labeling, although only L25 is common to all three types of particle. The compiled data, however, are far from sufficient to establish a precise order of assembly of the various r-proteins in yeast.

Analysis at the molecular level of rRNA/r-protein and r-protein/r-protein interactions, which constitute the basis for ribosome assembly, is just beginning in eukaryotes. We have identified a number of large subunit r-proteins in yeast that are likely to interact directly with yeast 26-S rRNA (214, 216).
Two of these proteins (L15 and L25) were positively characterized as “primary rRNA-binding proteins,” and their respective binding sites on the 26-S rRNA have been analyzed in detail (189, 212). In both cases, a homologous protein is present in E. coli ribosomes that binds to the corresponding, structurally conserved, site in 23-S rRNA. Moreover, the homologous yeast and eubacterial r-proteins are able to recognize each other's binding site (reviewed in 183). Thus, at least some of the basic interactions between ribosomal constituents have been conserved during evolution even from pro- to eukaryotic cells. We have also been able to limit the region in L25 that is essential for its interaction with rRNA to the carboxy-terminal 80 (out of 142) amino acids (217). The only other studies of comparable depth are those on the yeast 5-S rRNA binding r-protein L1a, most of which have been carried out by Nazar's group (reviewed in 218). Six yeast r-proteins (L14, L19, L21, L29, and L33) that bind individually to 5.8-S rRNA (218) have been identified. All of these belong to the class of
early-assembling proteins, L21 being one of those assembling with the 37-S pre-rRNA. Finally, the first 64 (out of 135) amino acids of rp51B are sufficient to ensure incorporation of an rp51B/β-galactosidase fusion protein into biologically active 40-S particles (219). Such assembly, however, only occurred when the rp51A gene had been destroyed. Thus, the carboxy-terminal region of the rp51 protein contains information that increases the efficiency of its assembly. In a similar way, it was established that the carboxy-terminal 14 amino acids of L3 are dispensable for its assembly into 60-S subunits (202).

Interestingly, assembly of the 40-S subunit is facilitated by the fusion of r-protein S37 to ubiquitin. This was concluded from the observation that a mutant in which the UBI3 gene encoding the ubiquitin-S37 fusion protein had been disrupted could not be fully complemented by a single copy of a gene encoding only S37. Instead, multiple copies of such a gene were required to restore the growth of the mutant cells to the wild-type value (84). Thus, the covalent attachment of ubiquitin to the amino terminus of S37 either promotes incorporation of this protein into pre-ribosomal particles or serves to protect S37 from premature degradation. As already mentioned (Section III,A), the presence of S37 in the precursor of the 40-S subunit facilitates the final processing step in the biogenesis of 17-S rRNA (84). It is not clear at present whether the fusion of the large subunit r-protein to ubiquitin in the UBI1 and UBI2 genes serves the same purpose as the ubiquitin fusion of S37, since no complementation experiments using single copies of a gene encoding only the r-protein tail of the UBI1 or UBI2 fusions have been carried out (84).
One further point of interest in this context is the observation that introduction of the gene encoding mouse r-protein L27' into yeast cells carrying an inactivated copy of the L29 gene restored viability as well as cycloheximide sensitivity (220). This suggests that yeast ribosomes containing the mouse protein are functional. The large degree of structural similarity between yeast and mammalian r-proteins (91) makes it likely that further examples of such functional homology will be uncovered.

As mentioned above, association between r-proteins and rRNA initiates at the level of the 37-S pre-rRNA and, as in other eukaryotes (221), is likely to start even before the primary rRNA transcript has been completed. The first pre-ribosomal assembly detectable in yeast nuclei is a 90-S particle containing 37-S pre-rRNA (222). The 5-S rRNA may also already be present in this particle (223). The 90-S particle is converted into 66-S and 43-S particles by rRNA processing at sites A1 and A2 (cf. Fig. 5). The former, containing 29-S pre-rRNA and 5-S rRNA (223), remains in the nucleus for further processing. The latter, however, is rapidly exported to the cytoplasm,
where the final steps in maturation to 40-S subunits take place. These include removal of the remaining spacer sequences attached to the 3' end of the 17-S rRNA (see Section III,A). Processing of the large subunit rRNA is completed in the nucleus. Completion of the 60-S subunits by assembly of a final set of r-proteins also appears to take place in the cytoplasm (see above).

Both the 90-S and 66-S, but not the 43-S, particles have a buoyant density below that of the mature subunits, indicating a higher protein/rRNA ratio (222). Apparently, these particles contain a considerable number of non-ribosomal proteins that are somehow involved in ribosome biogenesis. The identity of these proteins has not been established. Obvious candidates are the enzymes involved in pre-rRNA processing and modification, and the proteins of the snRNP particles implicated in processing (see Section III,A).

Despite their common origin, the production of 40-S and 60-S subunits is not coordinated at the level of assembly (see also Section III,A). When the supply of either a 60-S or 40-S r-protein is diminished, the production of the corresponding mature subunit decreases without a concomitant decrease in the accumulation of the other subunit (84, 89, 149). So far, eight yeast r-proteins (L3, L16, L25, L29, rp51, rp59, S10, and the unidentified large subunit species encoded by the UBI1 and UBI2 genes) are essential (84, 89, 149, 159, 217) and five (L30, L44, L45, L46, and S37) are dispensable to a greater or lesser extent (84, 196, 224; B. Baronas-Lowell and J. R. Warner, personal communication). This is in stark contrast to the situation in E. coli, in which the lack of any one of at least 16 r-proteins is tolerated, albeit usually with considerable deleterious effects on the well-being of the mutant cells (225).

Very little information is available on the mechanism causing export of ribosomal subunits to the cytoplasm.
Injection of ribosomal subunits into Xenopus oocytes revealed export to be saturable, suggesting a carrier-mediated process (226, 227). The nature of the determinant(s) responsible for this export remains to be elucidated, although it seems likely that parts of the rRNA are important in this respect (228).
ACKNOWLEDGMENTS

We thank all colleagues who supplied us with reprints and preprints of their papers and who gave us permission to quote unpublished results. The work carried out in our laboratory was supported by The Netherlands Organization for Chemical Research with financial aid from The Netherlands Organization for Scientific Research and by the Program Committee for Industrial Biotechnology with financial aid from The Netherlands Ministry for Economic Affairs.
REFERENCES 1. R. L. Gourse, R. A. Sharrock and M. Nomura, in “Structure, Function and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), pp. 766-788. Springer-Verlag, New York, 1985. 2. J. R . Cole, M. Yamagishi and M. Nomura, in “Genetics ofTranslation” (M. F. Tuite, M . Picard and M. Bolotin-Fukuhara. eds.), pp. 55-82. Springer-Verlag, New York, 1988. 3. M. Nomura and W. A. Held, in “Ribosomes” (M. Nomura, A. Tissih-es and P. Lengyel, eds.), pp. 193-224. CSHLab, Cold Spring Harbor, New York, 1974. 4. M. Herold and K. H. Nierhaus, JBC 262, 8826 (1987). 5. V. Nowotny and K. H. Nierhaus, Bchern 27, 7051 (1988). 6. V. Nowotny and K. H. Nierhaus, PNAS 79, 7238 (1982). 7 . T. D. Petes and D. Botstein, PNAS 74, 5091 (1977). 8 . J. H. Meyerink, Ph. D. thesis, Vrije Universiteit, Amsterdam, 1979. 9. C. A. F. M. van der Sande, T.Kulkens, A. B. Kramer, I. J. de Wijs, H. van Heerikhuizen, J. Klootwijk and R. J. Planta, NARes 17, 9127 (1989). 10. P. W. Piper, J. A. Bellatin and A. Lockheart, EMBO J . 2, 353 (1983). 11. M. E. McMahon, D. Stamenkovich and T. D. Petes, NARes 12, 8001 (1984). 12. M. P. Verbeet, J. Klootwijk, H. van Heerikhuizen, R. D. Fontijn, E. Vreugdenhil and R. J. Planta, NARes 12, 1137 (1984). 13. J. Klootwijk, P. de Jonge and R. J. Planta, NARes 6, 27 (1979). 14. N. Nikolaev, 0. I. Georgiev, P. V. Venkov and A. A. Hadjiolov, J M B 127, 297 (1979). 15. A. E. Veenstra, Ph.D. thesis, Vrije Universiteit, Amsterdam, 1987. 16. D. Lohr and G. I. Ide,JBC 258, 4668 (1983). 17. J. Klootwijk, M. P. Verbeet, G. M. Veldman, V. C. H. F. de Regt, H. van Heerikhuizen, J. Bogerd and R. J. Planta, NARes 12, 1377 (1984). 18. D. L. Riggs and M. Nomura, JBC 265, 7596 (1990). 19. E. A. Elion and J. R. Warner, Cell 39, 663 (1984). 20. A. E. Kempers-Veenstra, H. van Heerikhuizen, W. Musters, J. Klootwijk and R.J. Planta, EMBO J . 3, 1377 (1984). 21. R. V. Quincey and R. E . Arnold, BJ 224, 497 (1984). 22. E. A. Elion and J. R. Warner, MCBiol6, 2089 (1986). 23. R . 
Mestel, M. Yip, J. P. Holland, E. Wang, J. Kang and M. J. Holland, MCBiol9, 1243 (1989). 24. A. E. Kempers-Veenstra, W. Musters, A. F. Dekker, J. Klootwijk and R. J. Planta, Curr. Genet. 10, 253 (1985). 25. W. Musters, J. Knol, P. Maas, A. F. Dekker, H. van Heerikhuizen and R. J. Planta, NARes 17, 9661 (1989). 26. 8. Sollner-Webb and J. Tower, ARB 55, 801 (1986). 27. M. L. Schmitz, U.-G. Maier, J. W. S . Brown and G. Feix, JBC 264, 1467 (1989). 28. K. Nandabalan and J. D. Padayatty, BBRC 160, 1117 (1989). 29. Y. Mishima, I. Financsek, R. Kominami and M. Muramatsu, NARes 10, 6659 (1982). 30. J. J. Windle and B. Sollner-Webb, MCBiol6, 4584 (1986). 31. R. H. Reeder, D. Pennock, B. McStay, J. Roan, E. Tolentino and P. Walker, NARes 15, 7429 (1987). 32. T. Kulkens, H. van Heerikhuizen, J. Klootwijk, J. Oliemans and R . J. Planta, Curr. Genet. 16, 351 (1989). 33. S . P. Johnson and J. R. Warner, MCBiol 9, 4986 (1989). 34. B. E. Morrow, S. P. Johnson and J. R. Warner, JBC 264, 9061 (1989).
Structural Elements in RNA

MICHAEL CHASTAIN AND IGNACIO TINOCO, JR.

University of California, Berkeley, California 94720
I. Secondary Structure
   A. Duplexes
   B. Single-stranded Regions
   C. Hairpins
   D. Bulge Loops
   E. Internal Loops
   F. Junctions
III. Tertiary Interactions
   A. Tertiary Base-pairing
   B. Single Strand-Helix Interactions
   C. Helix-Helix Interactions
IV. Predicting Tertiary Inter...
V. Three-dimensional Structure
VI. Determining RNA Struc...
VII. Protein-RNA Interactions
VIII. RNA-RNA Interactions
IX. RNA-DNA Interactions
Progress in Nucleic Acid Research and Molecular Biology, Vol. 47. Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.

The discovery that RNA molecules can catalyze phosphodiester bond cleavage and formation (1, 2) dramatically changed our view of RNA function and greatly increased interest in RNA structure. There are several examples of RNA molecules that catalyze the cleavage, and in some cases the formation, of phosphodiester linkages. The first discovery of catalytic RNA was the generation of active rRNA from pre-rRNA in Tetrahymena by self-splicing: the excision of a region of RNA from the middle of a precursor RNA and ligation of the two ends to give an active rRNA and an intron (1). Similar reactions have been observed in a number of organisms for other introns (3, 4). Several other catalytic RNA structures that undergo self-cleavage, but not ligation, have been discovered. These include structures within the minus strand of tobacco ringspot virus (5, 6), the human δ virus (7), and the “hammerhead” structure found in several organisms (8, 9). In an intermolecular reaction, the RNA component of RNase P (EC 3.1.26.5) is sufficient to cleave mature tRNA from a tRNA precursor (2).

There is growing evidence that mRNA structure regulates processes such as transcription (10, 11), splicing (12), translation (13, 14), and mRNA decay (15-17). For example, protein synthesis is reduced in prokaryotic systems if the nucleotides within the Shine-Dalgarno region upstream from the initiation codon form a stable structure, since these nucleotides must pair with the 3' end of the 16-S rRNA in the ribosome to initiate translation (18-21). The structure of the mRNA for topoisomerase gene 60 in bacteriophage T4 has a more dramatic effect on protein synthesis (22). A sequence of 50 nucleotides in the mRNA is skipped by the ribosome during translation so that amino acids corresponding to these nucleotides are not incorporated into the protein.

Different aspects of RNA structure influence biological processes. The three-dimensional structure of self-splicing RNAs arranges functional groups to catalyze specific phosphodiester bond cleavage and formation. The global folded structure of an mRNA is less likely to regulate translation since the ribosome unfolds the mRNA as it is translated; however, the local structure of the mRNA can influence the ability of the ribosome to bind and translate the mRNA.

The increased interest in RNA function has led to a corresponding interest in RNA structure. Two problems have slowed the characterization of RNA structure. One is that techniques for synthesizing RNA were limited in their ability to generate large quantities of many RNA sequences. The second problem is that biological RNA molecules, with the exception of tRNA, have not formed crystals suitable for X-ray diffraction. The development of enzymatic (23) and chemical (24, 25) methods for synthesizing large amounts of RNA oligonucleotides, plus the development of new techniques for determining RNA structure, have now led to the characterization of several RNA molecules. Here we describe the RNA structural characteristics that have emerged so far.
In those cases in which RNA studies are incomplete, studies of DNA are described with the rationalization that RNA structures may be analogous to DNA structures, or that the techniques used to study DNA could be applied to the analogous RNA structures. We focus on aspects of RNA structure that affect the three-dimensional shape of RNA and that affect its ability to interact with other molecules.
I. Secondary Structure

Folded RNA molecules are stabilized by a variety of interactions, the most prevalent of which are stacking and hydrogen bonding between bases. Watson-Crick base-pairing is usually thought of first, but the importance of base stacking is seen in the crystal structure of tRNAPhe, where 71 of the 76 bases are stacked (26). Many interactions between backbone atoms also occur in the structure of tRNA, although they are often ignored when considering RNA structure since they are not as well-characterized as interactions between bases. Backbone interactions include hydrogen bonding and stacking of sugar or phosphate groups with bases or with other sugar and phosphate groups.

The interactions found in a three-dimensional RNA structure can be divided into two categories: secondary interactions and tertiary interactions. This division is useful for several reasons. Secondary structures are routinely determined by a combination of techniques discussed in Section II, whereas tertiary interactions are more difficult to determine. Computer algorithms that generate RNA structures can search completely through possible secondary structures, but inclusion of tertiary interactions makes a complete search of possible structures impractical for RNA molecules even as small as tRNA. Finally, the division of RNA structure into building blocks consisting of secondary or tertiary interactions makes it easier to describe RNA structures.

To distinguish tertiary interactions from secondary structure, the sequence of the RNA is drawn on a plane; the backbone forms a continuous closed boundary if a line is drawn joining the 5' end to the 3' end. Hydrogen bonding between bases in the sequence may be depicted by lines joining the bases; these lines must remain within the closed boundary. A secondary structure can be represented without any lines crossing. Tertiary interactions occur when lines cross; this is called chord crossing. Secondary and tertiary structures of tRNAPhe are shown in Fig. 1.

FIG. 1. The definition of secondary structure and tertiary structure in terms of chord crossing. The lines between points represent base-pairs. (a) The secondary structure of tRNAPhe. (b) The secondary and tertiary structures of tRNAPhe.

The secondary structure of RNA consists of duplex and loop regions that can be divided into six different types, as shown in Fig. 2: duplexes, single-stranded regions, hairpins, internal loops or bubbles, bulge loops or bulges, and junctions.

FIG. 2. Secondary structures of RNA. (a) Duplexes. (b) Single-stranded regions. (c) Hairpins (hairpin loop and hairpin stem). (d) Bulges (single-base bulge and bulge). (e) Internal loops (mismatch, symmetric internal loop, and asymmetric internal loop). (f) Junctions (three-stem and four-stem).
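The chord-crossing criterion described above can be stated compactly in code. The following Python sketch is an illustration only: it assumes base-pairs are given as tuples of 1-based sequence positions, and the function names and the toy pair list are hypothetical rather than taken from any published structure. Two chords (i, j) and (k, l), with i < k, cross exactly when i < k < j < l.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every pair of base-pairs whose chords cross.

    Each base-pair is a tuple (i, j) of 1-based sequence positions.
    """
    # Order each pair as (low, high), then sort so that i <= k below.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # Nested (i < k < l < j) and side-by-side (j < k) chords do not cross.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def is_secondary_structure(pairs):
    """True when the pairs can be drawn without any chord crossing."""
    return not crossing_pairs(pairs)


# Hypothetical toy example: a three-pair hairpin stem, then one extra
# pair from the loop to a position outside the stem.
stem = [(1, 10), (2, 9), (3, 8)]
print(is_secondary_structure(stem))              # nested chords only
print(crossing_pairs(stem + [(5, 14)]))          # the extra chord crosses the stem
```

In this picture, any base-pair reported by `crossing_pairs` would be classed as a tertiary interaction relative to the pairs it crosses; which of two crossing sets is called "secondary" remains a matter of convention.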
A. Duplexes

Duplex RNA forms a right-handed double helix stabilized by stacking between adjoining bases and by hydrogen bonds between bases on opposite, antiparallel strands. The conformations of double-helical regions in RNA
have been determined by X-ray diffraction studies of fibers and single crystals (26-28). The structures are all similar to that found for DNA fibers at low humidity; the geometry of these helices is termed “A-form.” Proton NMR measurements on a variety of RNA oligonucleotides in solution are also consistent with A-form geometry (24, 29-32).

The A form of duplex RNA differs in several ways from DNA duplexes, which typically have B-form geometry in aqueous solution. The riboses in A-form RNA adopt a 3'-endo conformation, whereas B-form DNA deoxyriboses are in the 2'-endo conformation; with the usual phosphodiester backbone angles, the distance between phosphorus atoms in RNA is 5.9 Å, compared to a distance of 7.0 Å in B-form DNA (26). The base-pairs in A-form helices are tilted with respect to the helix axis and displaced from it by about 4 Å; this causes the minor groove in RNA to be wide and shallow, and the major groove to be very narrow and deep (Fig. 3). In crystals, the A-form RNA helix has 11 bp per turn, as opposed to 10 bp per turn of B-form DNA helix. In both RNA and DNA, the helical repeat is somewhat larger in aqueous solution, corresponding to a slightly less tightly wound helix. The helical repeat of RNA in solution is reported (34) as 11.3 bp per turn or as 11.6 bp (34a), as compared to 10.6 bp per turn for DNA (35-37).
FIG. 3. A- and B-form helices tilted from the helix axis so that the groove sizes are visible. Fiber-diffraction coordinates are used (27, 33). Note (a) the deep and narrow major groove of A-form geometry compared to (b) the wide major groove of B-form geometry.
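A helical repeat of n base-pairs per turn corresponds to an average twist of 360/n degrees per base-pair, so the repeats quoted above can be compared on a common scale. The Python sketch below does only this arithmetic; the function name and dictionary labels are illustrative choices, and the numbers are the repeats given in the text.

```python
def twist_per_base_pair(bp_per_turn):
    """Average helical twist, in degrees, for a given helical repeat."""
    return 360.0 / bp_per_turn


# Helical repeats (base-pairs per turn) as quoted in the text.
helical_repeats = {
    "A-form RNA, crystal": 11.0,
    "B-form DNA": 10.0,
    "RNA in solution (34)": 11.3,
    "RNA in solution (34a)": 11.6,
    "DNA in solution (35-37)": 10.6,
}

for label, repeat in helical_repeats.items():
    print(f"{label}: {twist_per_base_pair(repeat):.1f} degrees per base-pair")
```

For these values the average twist comes out at roughly 32.7° and 36.0° for the A-form and B-form repeats of 11 and 10 bp, about 31.9° and 31.0° for RNA in solution, and about 34.0° for DNA in solution. A larger repeat always means a smaller twist per base-pair.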
Although there are local variations in conformation depending on base sequence, RNA duplex regions are, in general, in standard A-form geometry. Unusual sequences may adopt other conformations. For example, Z-RNA is formed by alternating C-G base-pairs in the presence of high salt concentrations (29, 38).
B. Single-stranded Regions

Single-stranded regions consist of unpaired nucleotides at the 5' or 3' end or between duplex regions of an RNA secondary structure. The conformation of these nucleotides differs from that of nucleotides in loop regions of secondary structure, since the ends of loop regions such as hairpins are constrained by the secondary structure. In the absence of tertiary interactions to constrain single-stranded regions, these regions are assumed to be roughly ordered by base stacking in a helical geometry similar to the structures of single-stranded RNAs that have no potential for base-pairing (26).
C. Hairpins

A hairpin consists of a duplex bridged by a loop of unpaired nucleotides. Hairpin loops are known to bind proteins, form tertiary interactions, and serve as nucleation sites for RNA folding. The conformation of the backbone in hairpin loops must differ from the conformation of the backbone in helical regions in order to reverse the direction of the RNA strand. The goal of studies on hairpin loops is to understand how the conformation of the loop nucleotides depends on the size and sequence of the loop.

The smallest loop capable of bridging a duplex was originally thought to be three nucleotides (39), but there is growing evidence that in some sequences, two unpaired nucleotides suffice to form hairpins in both RNA (40) and DNA (41). Thermodynamic studies of hairpins with loop sequences (U)n, (C)n, or (A)n (n ranging from 3 to 9) showed that loops containing four or five nucleotides are the most stable (42). Loops containing four unpaired bases are the most prevalent in 16-S rRNA (43). Eight different four-base-loop sequences (tetraloops) account for over 60% of the ribosomal tetraloop sequences (44; R. Gutell and O. C. Uhlenbeck, personal communication). Two of these sequences form unusually stable hairpins: UUCG (45) and GAAA (44, 46; O. C. Uhlenbeck, personal communication).

The interactions that stabilize particular loop sequences can be determined by examining the structures of hairpins determined by X-ray crystallography or NMR. NMR studies of the hairpin GGAC(UUCG)GUCC show that interactions between loop bases and between loop bases and the sugar-phosphate backbone contribute to the unusual stability of the UUCG loop sequence (40, 40a; see Fig. 4). U5 and G8 form a reverse wobble U·G pair (shown in Fig. 5)
FIG. 4. The structure of an unusually stable hairpin with the loop sequence UUCG as determined by NMR. Interactions stabilizing the loop include a reverse wobble G·U base-pair with a syn G and a hydrogen bond between a cytosine amino proton and a phosphate oxygen. Thin circles represent C3'-endo sugars; thick circles represent C2'-endo sugars. Thin rectangles indicate anti bases; thick rectangles indicate syn bases. Dotted circles represent phosphates. Ellipses (. . .) indicate hydrogen bonds. Solid boxes represent stacking. (Reprinted from 40.)
with the guanosine in the syn conformation. This leaves a loop of only two unpaired nucleotides: U6 and C7. The U6 base stacks on the sugar of C7, and the C7 base stacks on the U5 base. The C7 amino proton is close to a 3' phosphate oxygen of U5, implying the formation of a base-phosphate hydrogen bond. The presence of this hydrogen bond is consistent with the 1.4 kcal/mol destabilization of the loop (in 10 mM buffer, pH 6.5, at 25°C) when the cytidine at position 7 is changed to uridine. NMR studies of the U7 mutant show little change in loop conformation. G8 is still syn, and most of the chemical shifts are the same as in the original molecule. The destabilization can be explained by the replacement of the favorable contact between the C7 amino and a 3' phosphate oxygen of U5 by an unfavorable interaction of the U7 keto with the phosphate oxygen. The loop is also destabilized by switching the closing base-pair from C4·G9 to G4·C9 (45); NMR studies have not yet been done to try to understand this effect.

NMR studies of three hairpins with small loops indicate that the conformation of the sugar-phosphate backbone throughout the loop is very different from A-form geometry. In the UUCG loop just described, U6 and C7 have 2'-endo sugar conformations different from the 3'-endo conformations of the A-form stem. Furthermore, the presence of two phosphate resonances in the downfield region of the spectrum indicates that some of the loop backbone torsion angles are different from the backbone conformation found in duplex regions. These changes enable the sugar-phosphate backbone of the small loop to bridge the duplex region. NMR studies of a hairpin with loop sequence UCU (31) and a hairpin with loop sequence UUU (P. Davis, personal communication) show similar behavior. The three nucleotides in these loops have 2'-endo sugar puckers, and some of the phosphate resonances are in the
FIG. 5. Non-Watson-Crick hydrogen-bonding schemes. Most mismatches have more than one possible hydrogen-bonding scheme; a complete list is found in 26. Legible panel labels: A·A, U·U, C·C, A·C, C·U, wobble, reverse Watson-Crick, reverse Hoogsteen.
downfield region of the spectrum, indicating non-A-form backbone conformations. In the case of the hairpin with loop sequence UUU, the deviation from A-form geometry extended 1 bp into the stem region. We suspect that changes in the sugar-phosphate backbone, including 2'-endo sugar puckers, are a general property of small hairpin loops.

A-form geometry is preserved in portions of larger hairpin loops. As proposed prior to the crystal structure determination (47), five of the seven nucleotides in the anticodon loop of yeast tRNAPhe continue stacking on the 3' strand (26). The stacking of the anticodon bases in A-form helix geometry leaves these nucleotides accessible for base-pairing with the mRNA. The five stacked nucleotides are followed by a turn in the sugar-phosphate backbone created by a change in the phosphodiester torsion angles between G34 and U33. The turn structure is stabilized by a hydrogen bond between the imino proton of U33 and a 3' phosphate oxygen of A35, as well as stacking between the phosphate of G34 and the base of U33. One-dimensional NMR studies of the anticodon loop of yeast tRNAPhe showed that the structure in solution is consistent with the crystal structure described (48), although two-dimensional NMR methods are needed to give unambiguous assignments.

A hairpin that occurs in wheat germ 5-S rRNA containing a loop of 12 nucleotides has been studied by NMR (49, 50). A-form stacking does continue into the loop from the stem, but, unlike the anticodon loop, the stacking continues for several nucleotides along the 5' strand. The loop is also stabilized by C·U mismatch hydrogen bonding (see Fig. 5) between the first two loop nucleotides.

The hairpins studied so far show that the stability of a hairpin loop changes with different loop sequences and sizes.
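To give a sense of scale for such stability differences, the 1.4 kcal/mol destabilization quoted earlier for the C-to-U substitution at position 7 can be converted into a ratio of folding equilibrium constants. The sketch below assumes that the 1.4 kcal/mol is a difference in standard free energy of hairpin formation and that folding is a two-state equilibrium, so that the ratio is exp(ΔΔG/RT); the function and variable names are illustrative.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def equilibrium_ratio(ddg_kcal_per_mol, temp_celsius):
    """Ratio of two equilibrium constants whose standard free energies
    differ by ddg_kcal_per_mol, under a two-state assumption."""
    temp_kelvin = temp_celsius + 273.15
    return math.exp(ddg_kcal_per_mol / (GAS_CONSTANT * temp_kelvin))


# Destabilization reported in the text: 1.4 kcal/mol at 25 degrees C.
ratio = equilibrium_ratio(1.4, 25.0)
print(f"folded/unfolded ratio changes by a factor of about {ratio:.1f}")
```

Under these assumptions the single base change lowers the folding equilibrium constant by roughly a factor of ten at 25°C, which is a substantial effect for one contact.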
The structures of the tRNA anticodon loop and the UUCG loop suggest that the specific loop sequences adopt conformations that are more stable because they contain more hydrogen bonding and stacking interactions, particularly interactions with the sugar-phosphate backbone. It has been suggested (51) that loop nucleotides stacked in A-form geometry along the 3' strand are a common feature of RNA hairpins. The examples discussed here show that there is a range of loop structures. The anticodon loop contains nucleotides stacked along the 3' strand, but the hairpin from 5-S rRNA has nucleotides stacked along the 5' strand. In the three small hairpin loops (UUCG, UCU, and UUU) the backbone angles of each loop nucleotide differ significantly from A-form geometry. The conformations of the nucleotides in these small loops suggest that these nucleotides are not very accessible for base-pairing with other regions of RNA. Five of the nucleotides in the anticodon loop, on the other hand, are stacked in normal A-form geometry, which facilitates pairing with other nucleotides.

The structures of hairpin loops determine how the loop nucleotides can
interact with regions of the same RNA molecule, with other RNA molecules, and with proteins. Further studies of RNA hairpins are needed to determine more about the interactions that stabilize hairpin loops. Helical regions are stabilized by hydrogen bonding and stacking between bases. The nucleotides in hairpin loops are stabilized by hydrogen bonding between base protons and phosphate oxygens, and by stacking between bases, sugars, and phosphates. The effect that the closing base-pair of the stem has on the stability and structure of hairpin loops must be determined.
D. Bulge Loops

Bulge loops (or bulges) are defined as unpaired nucleotides on one strand of a double-stranded region; the other strand has continuous base-pairing. Bulges range in size from one to many nucleotides. The main structural questions of interest are: (1) Does one unpaired base intercalate into the helix, or is it extrahelical, with the base-pairs on either side stacked on each other? (2) How much does a bulge bend or kink the helix? (3) What is the effect of very large bulge loops? Do the surrounding base-pairs break to form an internal loop instead of a bulge?

The local conformation of single-base bulges in several DNA oligonucleotides has been studied, but there has been very little work on RNA bulges. The equilibrium between a nucleotide bulge intercalating in or looping out of the helix depends on temperature, the identity of the bulge nucleotide, and the sequence of base-pairs in the duplex surrounding the bulge. This equilibrium is demonstrated by NMR studies on DNA oligonucleotides containing a thymidine bulge that loops out of the helix at 0°C and intercalates into the helix at 35°C when located between two guanosines, whereas a thymidine bulge located between two cytidines remains looped out of the helix independent of the temperature (52). NMR studies on an RNA duplex with a uridine bulge between two guanosines show that it loops out of the helix (53).

The local conformation of nucleotides in bulge loops containing more than one nucleotide has not been studied by high-resolution structural techniques. Determining the conformation of these loops in RNA is important for understanding how these nucleotides interact with other elements of secondary structure to form tertiary interactions and for understanding how proteins bind to bulge loops. The chemical reactivity of bulge loops in 16-S rRNA (54) has been studied.
A bulge loop containing six nucleotides appears to exist without breaking the base-pairs on either side, since these pairs are protected from chemical modification. Some of the nucleotides within the bulge loop were also protected from modification, which suggests either that
the loop nucleotides form a structure involving stacking and hydrogen bonding, or that the nucleotides form tertiary interactions.

Bulge loops can affect the long-range structure of nucleic acid helices by creating a bend in the double helix. Bending has been detected by the altered mobility in nondenaturing gel electrophoresis for both RNA and DNA helices containing bulge loops (55-57). Bulge loops in DNA or RNA helices containing five adenosines were found to alter the gel mobility more than bulge loops containing five thymidines or uridines (34, 55), which shows that the structure of the bend depends on the identity of the bulge nucleotides. The bending was also shown to be affected by the base-pairs surrounding the bulge loop. The gel mobility for bulge loops of adenosines in DNA differed when the sequence flanking the bulge was d[AGG(A),TCG].d[CGACCT] rather than d[CGA(A),CCT].d[AGGTCG] (55).

One of the assumptions made when studying RNA secondary structure is that the structure is independent of the surrounding conformations. For example, a short region of duplex RNA is assumed to be the same whether it occurs next to a hairpin or next to a junction. Several studies suggest that bulge nucleotides affect the structure of the duplex surrounding them for several base-pairs. Thermodynamic studies showed that the free energies of bulge loops containing adenosines were 2 kcal/mol more stable (at 37°C) for the sequences GCG(A),GCGACGCCGCA than for GCG(A),GUCA-GACCGCA. Helices containing dangling nucleotides were more destabilized by bulge loops than were helices without dangling nucleotides (58). Nuclease V1 (EC 3.1.22.3 or 3.1.27.8) cutting in duplex regions containing bulges and intercalated ethidium showed that ethidium intercalation into a double helix is affected for several base-pairs around a bulge nucleotide (59). A distortion in the duplex region near an adenosine bulge (in boldface) in the sequence d[CGCGAAATTTACGCG], was observed by NMR.
The presence of a phosphate resonance in the downfield region of the spectrum indicated an unusual backbone conformation. The downfield resonance was assigned to the 5’ phosphate of the guanosine one nucleotide removed from the bulge (60). Thus, the distortions due to bulges are not necessarily localized at the bulge, but may extend into the surrounding duplex region. Many questions remain about bulge loop structures. Bulge loops bend RNA, but a correlation between the number and sequence of bases in the bulge and the extent of bending has not been established. Does a single base bulge that is looped out of the helix create a bend? Bulge loops containing more than one nucleotide have not been structurally characterized at high resolution. The structures of these loops determine the ability of bulge loop nucleotides to form tertiary pairs. Finally, the effect bulge loops have on the surrounding duplex structures must be determined.
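The free-energy differences quoted in this section, such as the 2 kcal/mol difference between the two adenosine-bulge sequences at 37°C, can be converted into a ratio of equilibrium constants with the standard relation K = exp(-ΔG°/RT). The following is a minimal sketch of that conversion; the function name and constant name are illustrative and do not come from the cited studies.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def stability_ratio(ddg_kcal_per_mol, temp_celsius=37.0):
    """Ratio of equilibrium constants implied by a free-energy difference.

    ddg_kcal_per_mol is the magnitude of the free-energy difference between
    two structures; the more stable one is favored by the returned factor.
    """
    temp_kelvin = temp_celsius + 273.15
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_kelvin))


if __name__ == "__main__":
    # The 2 kcal/mol difference at 37 degrees C discussed in the text
    print(f"{stability_ratio(2.0):.1f}")
```

Under these assumptions, a difference of 2 kcal/mol at 37°C corresponds to a roughly 25-fold difference in equilibrium constant, which gives a sense of how large the sequence-context effect on bulge stability is.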
E. Internal Loops

1. MISMATCHES

Two apposed nucleotides that cannot form a Watson-Crick pair are called mismatches. The two mismatched bases can engage in some other form of hydrogen-bonding, or they can form an open loop of two nonbonded nucleotides. Figure 5 shows some of the hydrogen-bonding schemes proposed for mismatches with two hydrogen bonds, although mismatches containing only one hydrogen bond have been found in tRNA (26). Mismatches occur frequently in proposed secondary structures. The stability, the contribution of the surrounding sequences to the stability, the hydrogen bonding, and the effect on the sugar-phosphate backbone of the various mismatches have not been determined systematically in RNAs.

The most-characterized mismatch is the wobble G·U pair (Fig. 5), which forms two hydrogen bonds and is virtually as stable as an A·U pair (61). Crystal structures of tRNAPhe and tRNAAsp containing G·U mismatches show that they are usually incorporated into the helix without creating distortions in the backbone. However, one of the G·U pairs in the crystal structure of yeast tRNAAsp does have an irregularity in the backbone. Two of the sugar-phosphate backbone angles (α and γ in Fig. 6) at the G·U pair in the anticodon stem are in the trans conformation instead of the normal gauche conformation found in helical regions (62). The distortion in the helix backbone is presumably caused by the difference in the width of a mismatch pair (C1'-C1' distance) versus that of Watson-Crick base-pairs. The difference in width could also explain why G·U mismatches are slightly less stable when they are in the middle of a Watson-Crick region of a helix than at the end (61) and why many of the mismatches in proposed rRNA secondary structures are found at helix termini (43, 63). A change in backbone
FIG. 6. The nomenclature describing the torsion angles that define the conformation of a nucleotide. Another parameter describing the sugar conformation is needed to completely define the conformation, but this parameter, the sugar puckering amplitude, has not varied in crystal structures of RNA and is usually assumed to be constant (26).
conformation may be a recurring feature of mismatches, although these distortions seem to depend on the sequence surrounding the mismatch.

The hairpin GCGA,UU,(UCU)G,&,,CGCC has been studied by NMR (31). It has a stem of 6 bp, including A$ *C12 and U,.G,, mismatches. Interproton distances by NMR were consistent with A-form stem geometry, even at the mismatches. The A+·C pair forms by protonation (at pH 6.5) of the adenine at the imino nitrogen, and is proposed to result in a pair containing two hydrogen bonds whose geometry is very similar to that of a G·U pair (Fig. 5). The free energy of formation of the A+·C pair is 2 kcal/mol less favorable than that of an A·U pair at 37°C.

Two hydrogen bonds form in the G·A mismatch (Fig. 5) found at the end of the anticodon helix in tRNAPhe. G·A mismatches occur frequently at helix termini in rRNA (63), and hydrogen bonding in G·A mismatches has been proposed (on the basis of chemical modification) to occur in several internal loops in rRNA (64, 65). A symmetrical RNA duplex containing two G·A mismatches (in boldface) and 3' dangling guanosines, [GCGAGCG]2, is 1.9 kcal/mol more stable at 37°C than predicted by the nearest-neighbor parameters, suggesting that G·A mismatches form stable structures (58).

More data are available on mismatches in DNA than in RNA. The gel mobilities of DNA restriction fragments containing all 12 possible mismatches are unchanged compared to a fully Watson-Crick helix; this shows that mismatches do not cause significant bends in the helix axis (66). Thermodynamic studies on mismatches in the DNA oligonucleotides dCAAAXAAAG + dGTTTYTTTC showed that mismatches involving guanine (G·T, G·G, G·A) are the most stable, and mismatches involving cytosine (C·C, C·A) are the least stable (67).
2. INTERNAL LOOPS

Internal loops contain three or more nucleotides not capable of forming Watson-Crick base-pairs; there is at least one unpaired nucleotide on each strand. Internal loops containing equal numbers of unpaired nucleotides on each strand are symmetrical, and those containing unequal numbers are asymmetrical. Comparison of computer-predicted secondary structures to phylogenetically derived ones suggests that asymmetrical internal loops are thermodynamically less stable than symmetrical ones (68). It is not known what determines whether internal loops are open or whether they close by forming non-Watson-Crick hydrogen bonding. The effect that internal loops, particularly asymmetrical ones, have on the helical backbone of RNA has not been determined.

Chemical modification of two asymmetrical internal loops in the S8 protein binding site of 16-S rRNA suggests that one is open and the other is closed by the formation of mismatch hydrogen bonding (69). The accessibility of all of the nucleotides in a loop to either chemical or enzymatic modification is good evidence that the loop remains open. Uridines in the second loop were protected from modification, suggesting the formation of hydrogen bonds in a U·U mismatch (Fig. 5); the U·U pair, together with the accessibility of three adenosines in this loop to chemical modification, suggests that the loop closes with the three adenosines looped out of the helix.

NMR studies have been done on a symmetrical internal loop of Escherichia coli 5-S rRNA, known as loop E (70). The chemical shift and exchange behavior of the imino protons assigned to nucleotides in the loop indicate that the loop does not close by forming hydrogen bonds in G·G or G·A (Fig. 5) mismatch pairs. An open conformation was also found for a symmetrical internal loop similar to loop E of Xenopus laevis 5-S rRNA by NMR studies (30). This internal loop (Fig. 7) has the potential to close by the formation of hydrogen bonds in G·A mismatches, but the imino proton spectrum indicates that the nucleotides in the loop are not involved in hydrogen bonding. NMR studies on the nonexchangeable protons showed that the bases within this open loop stack extensively with only minor distortions from A-form geometry. A uridine was added to this internal loop to make the sequence identical to that within loop E of X. laevis 5-S rRNA (B. Wimberly, unpublished
FIG. 7. Conformations of two internal loops, as determined by NMR. Breaks in NOE connectivities between adjacent nucleotides are marked by arrows. (a) An open internal loop structure is formed by an eight-base symmetrical internal loop (30). (b) Adding a single nucleotide to form an asymmetrical nine-base internal loop results in a closed structure with mismatch hydrogen bonding and a guanosine looped out of the helix (B. Wimberly, personal communication).
results). The structure of the asymmetrical internal loop which contains four nucleotides on one strand and five on the other (Fig. 7) is very different from the structure of the symmetrical one, despite the addition of only one nucleotide. NMR studies of the asymmetrical loop show that it closes with the guanosine on the longer strand looped out of the helix and hydrogen-bond formation in at least one of the mismatches in the loop. A closed structure for this loop was also proposed on the basis of chemical modification (M), although the guanosine thought to be extrahelical on the basis of NMR was proposed to be involved in a G·A pair (Fig. 5).

Determining where bends occur in RNA structures is important for understanding which regions of secondary structure are close together in the tertiary structure. It has been proposed in a model of the 16-S rRNA structure that a large bend occurs in a six-nucleotide symmetrical internal loop (71), but the effect of internal loops on the sugar-phosphate backbone in RNA has not been studied experimentally. Gel-mobility studies show that symmetrical internal loops of six or ten adenosines or thymidines in DNA bend the helix axis very much less than bulge loops (55).

The structures of internal loops in RNA remain poorly defined. The influence of loop sequence and size on the ability of internal loops to close by the formation of non-Watson-Crick hydrogen bonding must be determined. The accessibility of nucleotides in internal loops to tertiary pairing depends on the loop conformation. Extensive stacking of the bases within open internal loops may facilitate tertiary pairing, whereas bases in closed internal loops would be unavailable. Finally, the effects of symmetrical and asymmetrical internal loops on the sugar-phosphate backbone must be explored: Do bends occur at internal loops?
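The definitions of bulge loops, mismatches, and symmetrical or asymmetrical internal loops given above depend only on how many unpaired nucleotides lie on each strand between two helical regions. As a minimal sketch of that bookkeeping (the function name and the returned labels are illustrative choices, not terminology from any program cited here), the classification can be written as:

```python
def classify_loop(unpaired_strand1, unpaired_strand2):
    """Classify a loop between two helices from the unpaired count per strand."""
    a, b = unpaired_strand1, unpaired_strand2
    if a < 0 or b < 0:
        raise ValueError("unpaired counts must be non-negative")
    if a == 0 and b == 0:
        return "continuous helix"
    if a == 0 or b == 0:
        # Unpaired nucleotides on one strand only
        return "bulge loop"
    if a == 1 and b == 1:
        # Two apposed nucleotides that cannot form a Watson-Crick pair
        return "mismatch"
    # Three or more nucleotides, at least one unpaired on each strand
    if a == b:
        return "symmetrical internal loop"
    return "asymmetrical internal loop"


if __name__ == "__main__":
    print(classify_loop(4, 4))  # the eight-base loop of Fig. 7a
    print(classify_loop(4, 5))  # the nine-base loop of Fig. 7b
```

The sketch deliberately says nothing about whether a loop is open or closed by non-Watson-Crick hydrogen bonding; as the text notes, that cannot be inferred from the counts alone.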
F. Junctions

Junctions, or multibranched loops, contain three or more double-helical regions with a variable number of unpaired nucleotides where the helical regions come together. Junction regions are important because helical regions can stack coaxially at these regions, and because the alignment of helical regions at junctions gives these regions characteristic shapes.

The only RNA junction whose conformation has been established is the four-stem junction in tRNA. For each tRNA for which a crystal structure has been determined, the acceptor stem stacks coaxially on the T stem, and the D stem stacks on the anticodon stem forming two long helical regions. These two helical regions are oriented roughly perpendicular to each other, creating the overall L shape of the molecule. Several of the unpaired nucleotides in the junction are involved in tertiary interactions, which are discussed in Section III.

An example of an RNA junction whose conformation plays a role in
protein binding is found in the 5-S rRNA from X. laevis. This three-stem junction contains four unpaired nucleotides critical for binding transcription factor IIIA. Mutations of the unpaired nucleotides in the junction greatly reduce protein binding (72), even though the protein does not directly contact these nucleotides (73). The implication of these studies is that the unpaired nucleotides in the junction act as a hinge controlling how the helical regions stack on each other, thus determining the overall three-dimensional conformation of the 5-S rRNA.

The conformation of a four-stem DNA junction, a model of recombination intermediates, has been determined (74). The conformation of the four-stem DNA junction was determined by measuring fluorescence energy transfer between acceptor and donor groups located on the four different helical regions. The four stems of the junction stack coaxially in pairs to form two long helical regions oriented in an X shape. The difference between the shapes of the four-stem junction in tRNA and the four-stem DNA junction could be caused by the presence of unpaired nucleotides in the junction of tRNA.

On the basis of the tRNA structure, 5-S rRNA studies, and the DNA four-stem junction structure, it seems to be a general feature of junction regions that different stems stack coaxially to form longer helical regions. Coaxial stacking between helical regions in the 16-S rRNA has been proposed on the basis of phylogenetic comparison (75). One criterion for deciding whether two helical regions stack is that their combined length in base-pairs is conserved between different organisms, although the lengths of individual helices are not conserved. It has also been suggested (76) that the helical regions separated by the fewest unpaired nucleotides will stack coaxially. Both of these methods for predicting coaxial stacking predict more coaxially stacked helical regions than are found in a 16-S rRNA model based on chemical crosslinking (77).
Experimental studies must be done to determine which factors, such as the number of stems and unpaired nucleotides in the junction, govern the stacking of different helices.

In addition to stacking helical regions together, junction regions act as hinges that orient the different helical regions in space. The four-stem junction in tRNAs adopts an L shape and the four-stem DNA junction adopts an X shape. The conformations of more RNA junctions should be characterized in order to learn the various shapes that junctions can adopt.

Junction regions appear to constitute catalytic RNA sites. The hammerhead self-cleaving RNA region is coded for by newt satellite DNA (78) and is found in plant viruses (8), virusoids (79), and viroids (9). The consensus secondary structure for the catalytic domain contains a three-stem junction containing 11 unpaired nucleotides. Most of the unpaired nucleotides of the junction region are conserved (Fig. 8) and are required for the self-cleavage
FIG. 8. The “hammerhead” self-cleavage RNA structure found in several organisms is shown, with the position of specific cleavage marked by an arrow and the conserved nucleotides boxed.
reaction (80, 81). The specific cleavage takes place between one of the unpaired nucleotides and a stem region (Fig. 8); a 2',3'-cyclic phosphate and a 5'-OH group are formed (9). The catalytic activity must involve the conformation of the unpaired nucleotides in the junction since these comprise almost all of the required nucleotides.

Growing evidence implies that the unpaired nucleotides in a five-stem junction of the 23-S rRNA form the peptidyl transferase site of the ribosome, where peptide bonds are formed when amino acids are transferred from tRNAs to the nascent protein (82). Chemical "footprinting" experiments have established that unpaired nucleotides in this junction are protected from chemical modification when tRNA is bound to the peptidyl transferase site (83). Direct chemical crosslinking of the tRNA to the five-stem junction has also been seen (84, 85).

The examples of the hammerhead catalytic domain and the peptidyl transferase junction of 23-S rRNA illustrate the importance of determining the conformation of unpaired nucleotides in junction regions. Not only do the conformations of these nucleotides have a great impact on the three-dimensional structure by orienting the stem regions that meet at the junction, but also these nucleotides can be positioned to catalyze specific reactions.
II. Predicting Secondary Structure

Two techniques that are commonly used either separately or together to predict RNA secondary structure are phylogenetic comparison (43, 86) and thermodynamic stability (39, 61). Phylogenetic comparison is the method
currently used as the standard for determining secondary structure. The underlying principle is that mutations that do not alter function will be preserved. Since function is assumed to depend on structure, the preservation of a structure between different organisms, despite changes in the base sequence, is good evidence for the structure’s existence.
A. Phylogenetic Comparison

Phylogenetic comparison requires that RNA sequences from several different organisms be known. The sequences are first aligned and then searched for regions capable of base-pairing. Since helical regions are maintained if G·C pairs are replaced by A·U pairs or vice versa, covariance of nucleotides establishes which regions are involved in base-pairing and which are not. A helix is usually considered to exist if two or more covariations are found in it (43, 86). In practice, organisms whose primary sequences differ by 20-40% give the best results with phylogenetic comparison (86). Sequences that are too dissimilar are difficult to align, and sequences that are too similar do not have enough compensatory base changes to establish the existence of helical regions. A limitation of the phylogenetic method is that it cannot provide any information about regions of secondary structure that contain conserved nucleotides. Thus, phylogenetic methods tend to predict fewer helices than actually exist in the molecule.
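The covariation test described above can be illustrated with a short sketch. It assumes the sequences are already aligned to equal length without gaps, that a proposed helix is given as a list of 0-based column pairs, and that G·U wobble pairs count as acceptable pairs; these conventions, the function names, and the toy alignment are assumptions made for illustration and are not taken from the cited work.

```python
# Pairs accepted as maintaining a helix (Watson-Crick plus G.U wobble).
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def covarying_pairs(alignment, helix_columns):
    """Return the column pairs that show a compensatory base change."""
    supported = []
    for i, j in helix_columns:
        observed = {(seq[i], seq[j]) for seq in alignment}
        if not observed <= ALLOWED_PAIRS:
            continue  # at least one sequence cannot pair at these columns
        # A covariation: two sequences pair here with both bases changed.
        if any(p[0] != q[0] and p[1] != q[1]
               for p in observed for q in observed):
            supported.append((i, j))
    return supported


def helix_supported(alignment, helix_columns, min_covariations=2):
    """Apply the 'two or more covariations' criterion to a proposed helix."""
    return len(covarying_pairs(alignment, helix_columns)) >= min_covariations


if __name__ == "__main__":
    # Hypothetical toy alignment of three hairpin sequences
    toy_alignment = ["GGCAUUUUGCC", "GACAUUUUGUC", "GGGAUUUUCCC"]
    proposed_helix = [(0, 10), (1, 9), (2, 8)]
    print(covarying_pairs(toy_alignment, proposed_helix))
    print(helix_supported(toy_alignment, proposed_helix))
```

In the toy alignment the outermost pair is G·C in every sequence, so it provides no evidence either way, which mirrors the limitation noted in the text: conserved nucleotides cannot confirm or refute a helix.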
B. Thermodynamic Stability

Thermodynamic stabilities are used routinely to predict secondary structure. Computer algorithms predict structures by calculating the free energies for all possible base-pairing schemes and finding the secondary structure of lowest free energy. Computer algorithms that find several different secondary structures whose calculated free energies are close to the lowest free energy (87, 88) are important for several reasons. One reason is that the calculated energies are based on incomplete experimental data and thus have significant uncertainties. Another reason is that biological RNA molecules begin folding as they are synthesized; they could become kinetically trapped in a structure that is not the structure of lowest energy. Furthermore, tertiary interactions may stabilize a secondary structure that is not calculated to have the lowest free energy. The most fundamental reason for calculating alternate secondary structures is that biological RNA molecules may not form a single secondary structure, but may instead have several structures in equilibrium. This has been suggested, for example, for the cIII gene mRNA of bacteriophage λ (19).

Free energies are calculated from experimentally determined parameters by the computer algorithms using a nearest-neighbor model. Since stacking interactions are short-range, it is reasonable to assume that the free energy of a helical region depends on the sequence of dinucleotide steps it
contains (39). This nearest-neighbor model has been tested for short duplexes with different sequences, but with the same set of dinucleotide steps; with only a few exceptions, the average agreement between the predicted and measured free energies is within 6% (61).

The free energies of loop regions (hairpins, internal loops, and bulges) are more difficult to quantitate. Originally, loop free energy was assumed to depend only on the number of unbonded nucleotides within the loop. This was based on free energies of loop formation of a small set of molecules with limited sequence variation (89, 90). There are now examples of loop regions whose free energy of formation depends markedly on the sequence of nucleotides within the loop. For example, the hairpin loop UUCG discussed above is much more stable than would be predicted by the current free-energy parameters. So is the internal loop containing two G·A mismatches (in boldface) formed by the duplex with 3' dangling guanosines [GCGAGCG]2. As more thermodynamic data are collected for loop regions, more accurate formulas for predicting the free energy of loop regions will become possible.

Junction regions present bigger problems than other loop regions. No thermodynamic data have been measured for the free energy of junction formation in RNA. The present computer programs typically assume that the free energy of a junction region depends on the number of stems and the number of unpaired nucleotides within the junction (88). The values of the parameters are empirically derived to best fit the structures of RNA sequences whose secondary structures are well established.

Despite the problems with free energy of loop formation, current computer algorithms do remarkably well in predicting RNA secondary structure. The lowest free-energy structures calculated by the Zuker program for over 200 molecules predicted 70% of the helices deduced from phylogeny (44).
Furthermore, the best structure within 10% of the lowest free energy predicted 90% of the helices correctly. The number of correctly predicted helices will surely increase as more free energies are measured for loop formation. In practice, both phylogenetic comparison and computer algorithms are used to predict secondary structure. If only a few sequences that are not homologous enough to align for phylogenetic comparison are available, they can be folded with computer algorithms. A model secondary structure can be determined by searching for a common secondary structure among the various suboptimal foldings generated for the different sequences (91). Determining a secondary structure by either method is usually an iterative process. The secondary structure model is refined as more sequences are determined or as the model is compared to the results of chemical and enzymatic modification procedures that map the accessibilities of different nucleotides.
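The nearest-neighbor model described in this section treats the free energy of a helical region as a sum over its dinucleotide steps. The sketch below shows only that summation. The energy table holds PLACEHOLDER numbers chosen for illustration, not measured parameters; helix initiation and end effects are ignored; and the function names are hypothetical. Its purpose is to show why two different duplexes with the same set of dinucleotide steps receive the same predicted free energy, which is the property used to test the model.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def canonical_step(step):
    """Name a dinucleotide step independently of which strand it is read from."""
    reverse_complement = "".join(COMPLEMENT[b] for b in reversed(step))
    return min(step, reverse_complement)


def helix_free_energy(top_strand, step_energy):
    """Sum step energies for a fully paired helix given its 5'->3' top strand."""
    steps = (top_strand[i:i + 2] for i in range(len(top_strand) - 1))
    return sum(step_energy[canonical_step(s)] for s in steps)


# PLACEHOLDER values (kcal/mol) for the 10 distinct steps: illustrative only,
# NOT experimental parameters. Substitute measured values for real use.
PLACEHOLDER_STEP_ENERGY = {
    "AA": -1.0, "AC": -2.0, "AG": -2.0, "AU": -1.0, "CA": -2.0,
    "CC": -3.0, "CG": -2.0, "GA": -2.0, "GC": -3.0, "UA": -1.0,
}

if __name__ == "__main__":
    # Two different duplexes that contain the same three dinucleotide steps
    print(helix_free_energy("AACU", PLACEHOLDER_STEP_ENERGY))
    print(helix_free_energy("ACUU", PLACEHOLDER_STEP_ENERGY))
```

Passing the energy table as an argument keeps the summation separate from the parameter set, so the same code could be reused when improved parameters become available.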
III. Tertiary Interactions

RNA molecules fold into compact structures stabilized by tertiary interactions not included in the secondary structure. Secondary and tertiary interactions are distinguished in terms of chord crossing (Fig. 1). Tertiary interactions are considered separately from secondary structure, since the formation of tertiary interactions depends not only on the nucleotides that form the tertiary interaction but also on the rest of the secondary structure. For example, the conformation about the four-stem junction in tRNA brings the T loop and the D loop close together; this allows tertiary pairing between nucleotides in these loops. The nucleotides in the anticodon loop cannot interact with either of the other loops, since the anticodon loop is fixed at one end of the molecule by the junction conformation.

The biological functions of RNA molecules, in particular catalytic behavior, are determined by their three-dimensional structures. Characterizing all of the tertiary interactions that occur in an RNA molecule does not completely describe the three-dimensional structure, but tertiary interactions do indicate regions of the secondary structure that are close together in space. Most of what we know about tertiary interactions comes from a few tRNA crystal structures, but we are beginning to learn more about tertiary interactions from a variety of methods (discussed in Section IV) to study an increasing number of RNA molecules. The goal is to find recurring tertiary elements: specific arrangements of secondary structure elements stabilized by tertiary contacts. Here we discuss the three types of tertiary interactions so far characterized: tertiary base-pairing, single-strand-helix interactions, and helix-helix interactions.
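The chord-crossing criterion mentioned above can be stated concretely: if each base-pair is drawn as a chord joining positions i and j of the sequence, two pairs (i, j) and (k, l) cross when i < k < j < l. A minimal sketch follows, assuming pairs are given as tuples of 0-based positions; the function name and the example pair lists are illustrative only.

```python
def crossing_pairs(pairs):
    """Return every pair of base-pairs (i, j), (k, l) with i < k < j < l."""
    ordered = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for a in range(len(ordered)):
        i, j = ordered[a]
        for b in range(a + 1, len(ordered)):
            k, l = ordered[b]
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


if __name__ == "__main__":
    # A simple hairpin: all pairs are nested, so no chords cross.
    hairpin = [(0, 9), (1, 8), (2, 7)]
    # A hairpin whose loop pairs with a downstream single-stranded region.
    pseudoknot = [(0, 6), (1, 5), (3, 10), (4, 9)]
    print(crossing_pairs(hairpin))
    print(len(crossing_pairs(pseudoknot)))
```

Pairs that cross no other chord are consistent with a secondary structure, whereas crossing chords signal interactions of the kind discussed in this section; deciding which of two crossing helices is treated as the tertiary one is a separate choice that this sketch does not make.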
A. Tertiary Base-pairing

1. PSEUDOKNOTS
Nucleotides that are unpaired in the secondary structure of an RNA molecule can form tertiary contacts by hydrogen bonding to other nucleotides unpaired in the secondary structure. In general, this could occur between any of the secondary structure regions containing unpaired nucleotides (single-stranded regions, hairpin loops, bulge loops, internal loops, and junction loops). All of these interactions were originally termed knots or pseudoknots (92); here, we use the term "pseudoknot" to describe only the structure in which nucleotides in a loop (hairpin, internal, or bulge) pair with nucleotides in a single-stranded region. Different types of pseudoknots are shown in Fig. 9 (93).

FIG. 9. The different types of pseudoknots are diagrammed for hairpin loops pairing with an adjacent single-stranded region. (a) In the best-characterized form of pseudoknot, the loop at the top crosses the major groove, and the loop at the bottom crosses the minor groove. (b) This form of pseudoknot, in which one loop crosses the major groove and the other loop bridges the whole helix, has not been found. (c) This type of pseudoknot was proposed to occur in the α mRNA of E. coli (94). One loop crosses the minor groove, and the other loop bridges the whole helix.

Pseudoknots have been found in an increasing number of biological systems since their discovery at the 3' end of several plant viral RNAs (95, 96). The viral RNAs were recognized by tRNA-specific enzymes, but the secondary structures predicted for these sequences did not include an amino-acid acceptor stem. The formation of a pseudoknot allows these molecules to fold into a tertiary structure functionally similar to tRNA, even though the secondary structures are different.

Pseudoknot formation also enhances frameshifting during translation in the coronavirus IBV (97). The mechanism by which pseudoknots contribute to frameshifting has not been determined, but the combination of an (A+U)-rich sequence and a pseudoknot results in a frameshift. One possible explanation is that the pseudoknot causes the ribosome to pause, allowing slippage to occur at the (A+U)-rich sequence. This mechanism seems to be a general one, since 14 of 22 sequences from a variety of viruses, known or suggested to contain frameshifting sites, have the potential to form pseudoknots (97).

A pseudoknot of the type diagrammed in Fig. 9a formed by a short oligonucleotide has been studied by NMR (98). The interproton distances, determined by NMR, were consistent with the two stem regions, one containing 5 bp and the other containing 3 bp, with A-form helix geometry. The distances between protons located in the two different stem regions indicate that the two stem regions stack coaxially to form one continuous helical
region. The pseudoknot appears to be a normal duplex from one side, but on the other side, two loops bridge the duplex, one crossing the major groove and one crossing the minor groove. The size of the loops was varied without changing the stem sizes to show that the minimum loop size for crossing the minor groove of the 3-bp stem is three nucleotides, and the minimum loop size for crossing the major groove of the 5-bp stem is two nucleotides. The pseudoknot is only marginally more stable than either of the two potential hairpin structures that the sequence could form. The equilibrium between pseudoknot and hairpins depends on salt concentration, temperature, and nucleotide sequence (99).

2. LOOP-LOOP INTERACTIONS

There are several examples of RNA molecules containing tertiary contacts between nucleotides that are in loop regions of secondary structure. Phylogenetic comparison suggests that the RNA component of RNase P, which catalyzes the processing of tRNA precursors, makes Watson-Crick pairs between four nucleotides in one junction loop and four nucleotides in a second junction loop (100). There are interactions between secondary structure loop regions in the crystal structure of yeast tRNAPhe. Two of the nucleotides in the D loop form parallel pairs with two nucleotides in the T loop: G,,*IJJ, and G,,C,. Two additional tertiary base-pairs are formed between unpaired nucleotides in the central four-stem junction and nucleotides in the D loop: a reverse-Hoogsteen pair (Fig. 5), U,.A,,, and a parallel-stranded reverse-Watson-Crick pair (Fig. 5), G15*C,, (26). These tertiary pairs are stabilized by hydrogen bonding and stacking interactions with adjacent nucleotides; they are partially responsible for stabilizing the L shape of tRNA. Tertiary pairs between loop regions have also been proposed on the basis of phylogenetic comparison in the 16-S and 23-S rRNAs (101,
102). The high frequency with which known RNA structures contain tertiary pairing between nucleotides that are unpaired in the secondary structure stresses the importance of learning more about such interactions. Can unpaired nucleotides in all of the secondary structure loop types engage in tertiary pairing? There are several examples of pairing involving nucleotides in hairpin loops and junction loops, but it is not known whether nucleotides in bulge loops or internal loops form tertiary interactions. The fact that tRNA has tertiary pairing between parallel strands shows that the rules for forming tertiary pairs can be different from the rules for secondary structure. The thermodynamics of tertiary pairs is of critical importance for predicting their existence. In the tRNAs and pseudoknots, tertiary pairs contribute less to the free energy than secondary structure does, but it may be possible for very stable tertiary interactions to replace secondary structure.
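One simple way to see how pseudoknots and loop-loop pairs differ from ordinary secondary structure is to look at the base-pair list itself: secondary structure pairs nest, whereas a pair between a loop nucleotide and a nucleotide outside that loop crosses an existing pair. The sketch below is only a bookkeeping illustration of that idea; the function name and the two example pair lists are hypothetical and do not come from any structure discussed here.

```python
def crossing_pairs(pairs):
    """Return every couple of base-pairs (i, j), (k, l) with i < k < j < l.

    Nested pairs (ordinary secondary structure) never satisfy this test;
    pseudoknot-like or loop-loop pairs do.
    """
    # Normalize each pair so the 5' position comes first, then sort by it.
    norm = sorted((min(a, b), max(a, b)) for a, b in pairs)
    found = []
    for idx, (i, j) in enumerate(norm):
        for k, l in norm[idx + 1:]:
            if i < k < j < l:
                found.append(((i, j), (k, l)))
    return found


# Hypothetical hairpin: three nested pairs closing a loop.
hairpin = [(1, 10), (2, 9), (3, 8)]

# Hypothetical pseudoknot: loop nucleotides 5 and 6 pair with a
# downstream strand, so the second stem crosses the first.
pseudoknot = hairpin + [(5, 15), (6, 14)]

print(crossing_pairs(hairpin))
print(len(crossing_pairs(pseudoknot)))
```

A test of this kind says nothing about whether a crossing pair is sterically possible or stable; it only separates the pairs that a nested secondary structure description can hold from those it cannot.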
STRUCTURAL ELEMENTS IN RNA
B. Single Strand-Helix Interactions
1. INTERCALATION
Base stacking is one of the most important factors stabilizing RNA structures. One of the ways in which stacking can stabilize the tertiary structure is for nucleotides that are unpaired to intercalate between base-pairs. An example of intercalation is seen in the crystal structure of yeast tRNAPhe, where G57, an unpaired nucleotide in the T loop, intercalates between two nucleotides of the D loop, G18 and G19, which form tertiary pairs with nucleotides in the T loop. To accommodate the intercalated guanine between the two tertiary pairs, the sugar-phosphate backbone is extended by a change in the G18 sugar conformation to 2'-endo (26).
2. BASE-TRIPLES
Base-triples occur when an unpaired nucleotide forms hydrogen bonds with a nucleotide that is already base-paired. The third base of a base-triple may bind to the Watson-Crick pair in either the major or minor groove and be stabilized by the formation of one or two hydrogen bonds as well as stacking interactions. Several biological functions have been proposed for base-triples. Base-triples stabilize the three-dimensional shape of tRNA and several other RNA molecules discussed below. The formation of DNA triples inhibits transcription in vitro (103), and it has been suggested that a small RNA molecule may inhibit transcription by forming a triplex 115 bp upstream from the transcription site of the human c-myc gene (104, 105). The self-splicing intron from Tetrahymena binds guanosines via base-triple formation during the self-splicing reaction (106). Three base-triples occur in tRNAPhe, all of which involve nucleotides in the junction loop binding in the major groove to Watson-Crick pairs of the D stem (Fig. 10). In two of the triples, A9·A23·U12 and G46·G22·C13, the third base forms two hydrogen bonds with the purine of the Watson-Crick pair, while only one hydrogen bond is formed by the third base in G45·G10·C25 (26). Different base-triples are found in the crystal structure of tRNAAsp (62); the A46·G22·Ψ13 triple replaces the G46·G22·C13 triple of tRNAPhe, with the adenosine forming one hydrogen bond with the G·Ψ pair in the major groove. An unusual base-triple occurs at the beginning of the D loop. A21 binds to the tertiary reverse-Hoogsteen pair (Fig. 5) A14·U8 by forming base-base and base-sugar hydrogen bonds. A21 forms one hydrogen bond with an amino proton of A14 and one hydrogen bond with the 2' hydroxyl of U8; the 2' hydroxyl of A21 forms a hydrogen bond with a base nitrogen of A14. These base-triples in tRNA occur at the hinge region between the two helical domains and help stabilize the tRNA in its characteristic L shape.
[Figure 10: hydrogen-bonding diagrams for the U·A·U, C·G·CH+, U·A·A, and C·G·G base-triples.]
FIG. 10. Proposed hydrogen-bonding schemes for base triples. U.A.A and C.G.G base triples have been proposed with several different hydrogen-bonding schemes. For other schemes, see 26 and 108. (Reprinted from 107.)
Another RNA molecule that forms base-triples is the self-splicing intron from Tetrahymena. The intron binds a free guanosine during cleavage at the 5' exon and binds an internal guanosine during cleavage of the 3' exon. It is postulated that these guanosines bind to a base-pair, G264·C311, in the P7 stem by forming a base-triple (106). Replacing the G·C pair with an A·U pair abolished splicing activity, but cleavage at the 5' exon was restored by adding 2-aminopurine instead of guanine. 2-Aminopurine can form a base-triple with the A·U pair that is isomorphic to the wild-type G·G·C triple.
This is strong evidence for the existence of the G·G·C triple in the splicing reaction of the wild-type intron. A base-triple has also been proposed to occur between a nucleotide in a junction loop and an adjacent helix in Xenopus laevis 5-S rRNA (109). On the basis of chemical modification and model building, it is proposed that an adenosine in the junction loop binds to a G·U pair in the minor groove by forming two hydrogen bonds, one between the adenosine N7 and the 2' hydroxyl of the guanosine, and one between the adenosine amino and a base nitrogen of the guanosine. The formation of base-triples in junction regions where helices stack coaxially may be a recurring RNA structural element. If two helical regions in a junction stack coaxially (Fig. 11), it is a consequence of A-form helix geometry that the 3' strand cannot reach the major groove of the adjacent helix, and the 5' strand cannot reach the minor groove of its adjacent helix.
[Figure 11: schematic panels of coaxially stacked helices with the major and minor grooves marked; labeled helices include the T-stem and D-stem, stems II and V, and P9.0 and P7.]
FIG. 11. Base-triple formation at regions where helices coaxially stack may be a recurring RNA structure. (a) The 3' strand enters the major groove, and the 5' strand enters the minor groove. (b) Two of the three base-triples in tRNAPhe are consistent with this model. The 5' strand does not enter the minor groove as expected, but instead loops back into the major groove to form a third triple. (c) The proposed minor-groove triple in Xenopus laevis 5-S rRNA (109). (d) The major-groove triple that occurs in the self-splicing intron from Tetrahymena (106).
The 3' strand of unpaired nucleotides can follow the minor groove of the adjacent helix, while the 5' strand can follow the major groove of its adjacent helix. Although oversimplified, this picture is consistent with two of the three base-triples that form in tRNAPhe, and with the base-triple proposed to occur in the 5-S rRNA from X. laevis (Fig. 11). The base-triple formed during the 3' splicing reaction of the intron from Tetrahymena also fits this model if the recently identified P9.0 helix near the 3' splice site of the Tetrahymena intron (106, 110) is stacked on the P7 helix (Fig. 11). To predict the existence of base-triples, the sequences that can form triples must be determined, and their structures and thermodynamic stabilities must be characterized. Triple helices were originally found in polynucleotides with simple repeating sequences. Fiber diffraction studies on the poly(rU)·poly(rA)·poly(rU) system showed that the third strand, poly(rU), bound parallel to the purine strand of the Watson-Crick helix in the major groove with two hydrogen bonds between the adenine and uridine (111). In addition to rU·rA·rU, the following polynucleotides have been shown to form triple helices: rA·rA·rU, rC+·rG·rC (at pH 7.0), and rG·rG·rC (112) (Fig. 10). Recent NMR studies of DNA triples show that dC+·dG·dC forms a structure isomorphic with dT·dA·dT (113, 114); the corresponding RNA triples are probably isomorphic as well. Structural characterization has not been done on rG·rG·rC or rA·rA·rU triple helices yet, but these base-triples can form isomorphic structures. Replacing the rA·rA·rU base-triple in tRNAPhe with a sequence capable of forming an rG·rG·rC base-triple resulted in no loss of aminoacylation activity, whereas sequences that could not form base-triples did lose activity (108). Unpublished results from our laboratory, including UV absorption mixing curves and circular dichroism spectra, show that poly(rA)·poly(rG)·poly(rC) forms a triple helix. 
A hydrogen-bonding scheme has not yet been determined for this structure, but the poly(rA) strand appears to bind in the minor groove of the Watson-Crick duplex. The evidence for minor groove binding is that a triple helix did not form when the poly(rG) strand was replaced by poly(rI), which lacks the minor groove amino group capable of hydrogen bonding to poly(rA).
C. Helix-Helix Interactions
We have already described structures in which helical regions stack coaxially end to end. Helix-helix contacts can form between the grooves of different helices when RNA molecules fold into compact tertiary structures. The negatively charged sugar-phosphate backbones repel each other, but a variety of interactions found in crystal structures of nucleic acid duplexes could stabilize the structure. The importance of the 2' hydroxyl groups in stabilizing helix-helix contacts is seen in the crystal structure
of an RNA duplex [U(UA)6A]2. There are 12 intermolecular hydrogen bonds between the 2' hydroxyl groups in one helix and either uracil carbonyl groups or sugars in the minor groove of another helix (28). Another type of helix-helix contact is found in the crystal structure of a DNA duplex, d[ACCGGCGCCACA]·d[TGTGGCGCCGGT]. Cytosine amino protons in the major groove of one helix form hydrogen bonds with phosphate oxygens of another helix (115). In general, helix-helix interactions could include base-phosphate, base-sugar, sugar-sugar, and sugar-phosphate hydrogen bonding. Helix-helix contacts have been implicated in the function of one intriguing biological system. The Tetrahymena intron is capable of binding a nicked duplex RNA containing three oligonucleotides and then ligating the nick (116). Since the nicked duplex is base-paired and the reaction is independent of the duplex sequence, the intron must bind the duplex substrate through the formation of helix-helix contacts. The weakness of the contacts between the intron and the duplex substrate is evidenced by a Michaelis constant greater than 0.1 mM. This system illustrates the complexity of the interactions stabilizing RNA structure. Determining the interactions between bases is not enough to understand the structure and function of RNA; the backbone interactions must be determined as well. Another system in which helical regions may interact involves sequences rich in guanosine. Guanosine-rich DNA sequences have been proposed to form duplex structures (117), and there is evidence that two of these duplexes in solution dimerize to form four-stranded DNA complexes (118-120). It is not known whether similar sequences in RNA can form these structures, but X-ray diffraction studies have shown that poly(rG) forms a structure containing four parallel strands with the four equivalent guanosines hydrogen-bonded in a coplanar arrangement (121).
IV. Predicting Tertiary Interactions
The secondary structures for a wide variety of biological RNA molecules have been established by a combination of techniques such as phylogenetic comparison, chemical and enzymatic modification, and computer prediction algorithms. Some biological functions of RNA can be understood once the secondary structure is known, but understanding most biological functions, particularly catalytic RNA activity, requires the determination of RNA three-dimensional structure. The next step toward the prediction of RNA three-dimensional structure is to develop methods to predict its tertiary interactions. The same two methods used to predict secondary structure (phylogenetic comparison and computer algorithms) can be used to predict tertiary interactions. As currently used, both of these methods are limited because
they predict tertiary interactions only between bases. The tertiary interactions in the crystal structures of tRNA contain many examples of base-sugar, base-phosphate, sugar-sugar, and sugar-phosphate interactions (62, 122).
A. Phylogenetic Comparison
Establishing the presence of tertiary structure from phylogeny relies on the replacement of a specific tertiary pairing by an equivalent one. Phylogenetic comparison has been used to predict the existence of Watson-Crick tertiary pairing in the RNA component of RNase P (100), the Tetrahymena intron (123, 124), and both the 16-S (125) and 23-S (101, 102) rRNAs. Phylogeny can also be used to predict the existence of base triples, as was done for tRNA (126). There are several limitations on predicting tertiary pairs using phylogenetic comparison. All but one of the tertiary pairs formed in tRNA are non-Watson-Crick; these include reverse-Hoogsteen, reverse-Watson-Crick, and parallel-stranded pairing. If this is true for other RNA molecules, phylogenetic comparison will have trouble predicting tertiary pairings. Although phylogenetic evidence has been used to suggest the presence of non-Watson-Crick pairs (101), in general, we do not know which non-Watson-Crick pairs are equivalent. Since phylogenetic comparison depends on sequence variation, the structures of conserved regions cannot be predicted. This is a problem in predicting secondary structure and could be a much greater limitation in the case of tertiary structure. The nucleotides in tRNA that are engaged in tertiary pairing are much more highly conserved than are those involved in secondary structure. If this is true in general, it will be difficult to predict tertiary pairing by phylogeny.
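The logic of phylogenetic comparison, and both of its limitations, can be illustrated with a minimal sketch. It assumes a toy alignment of equal-length sequences and treats only the four Watson-Crick combinations as equivalent; the function name, the alignments, and that equivalence table are illustrative assumptions, not a published procedure.

```python
# Assumed equivalence table: only Watson-Crick pairs count as equivalent.
# Which non-Watson-Crick pairs are equivalent is not generally known.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def covarying_columns(alignment, i, j):
    """True if columns i and j change between sequences yet stay complementary."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    all_complementary = all(pair in WATSON_CRICK for pair in pairs)
    varies = len(set(pairs)) > 1
    return all_complementary and varies


# Hypothetical alignment in which the first and last positions covary.
variable = ["GAAAC", "CAAAG", "AAAAU"]

# Hypothetical alignment in which the same positions are fully conserved.
conserved = ["GAAAC", "GAAAC", "GAAAC"]

print(covarying_columns(variable, 0, 4))
print(covarying_columns(conserved, 0, 4))
print(covarying_columns(variable, 1, 3))
```

The conserved alignment returns no signal even though the two positions could pair, which is the difficulty noted above for highly conserved tertiary contacts; a pair that is maintained by non-Watson-Crick replacements would likewise be missed unless the equivalence table were extended.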
B. Thermodynamic Stabilities
The use of thermodynamic stabilities to predict secondary structures has already been discussed. Extending this approach to the prediction of tertiary interactions poses several problems. We discuss the problems inherent in predicting tertiary motifs and then describe an algorithm that predicts the pseudoknot tertiary structure as well as secondary structure. The prediction of tertiary interactions using computer algorithms based on thermodynamic stabilities poses three problems: (1) evaluating free energies for all of the possible secondary and tertiary structures requires prohibitive amounts of computer time; (2) rules governing which regions of secondary structure are sterically allowed to form tertiary interactions have not been established; and (3) the free energies of most tertiary structures have not been determined. The distinction between secondary and tertiary structures was originally made so that an algorithm could rigorously consider all of the possible secondary structures. For an RNA of n nucleotides, the number of different
base-pairs possible is n(n - 1)/2. If tertiary pairs are allowed, the number of possible structures increases in proportion to n factorial instead of n²; this renders impractical a rigorous examination of all possible secondary and tertiary structures for a biological RNA. Since we cannot search every possible combination of secondary and tertiary structures, we must choose criteria that limit the number of structures to be evaluated, but still find the biologically relevant structures. Although the actual path by which RNA molecules fold is a kinetic property, we assume that the structure that forms is the structure of lowest free energy. This means that we can fold the RNA by any path we choose as long as the free energy for each step is known. One way to restrict the number of structures evaluated is first to find the low-free-energy secondary structures by standard programs (88). Then tertiary interactions are added to a small number of calculated secondary structures to obtain the tertiary structure of lowest free energy. We cannot use the approach just outlined for predicting tertiary structures until we know how to look for secondary structure elements that can form tertiary interactions. For example, from the cloverleaf secondary structure of tRNA, how would the algorithm know that the junction conformation brings the D loop close to the T loop, but not to the anticodon loop? Until we learn more about the conformation around the junction, internal, and bulge loops, assumptions must be made about which loop regions can interact. The simplest assumption is that any pair of loop regions can interact. Prediction of tertiary interactions also requires knowledge of the free energy of forming tertiary interactions. Judging from tRNA and pseudoknots, secondary structure is more stable than tertiary structure. If this is true, the calculation of the free energy for tertiary pairs after the calculation of the free energy for secondary structure is justified. 
Errors in this assumption can be accounted for by predicting tertiary interactions for several secondary structures. Unfortunately, free energies have not been measured for most tertiary structures. The free energy of pseudoknot formation has been determined for a very limited set of molecules (99), but the free energies of other tertiary interactions, such as loop-loop pairing, base-triples, or intercalation, have not been determined. Despite the difficulties of predicting tertiary structure, an algorithm capable of predicting pseudoknots as well as secondary structure has been developed (93). The algorithm calculates the free energy of formation for all of the stem-loops that could be formed by the structure. The predicted structure starts with the stem-loop with the lowest free energy of formation; new stem-loops consistent with those already incorporated are added in order of their stability. The algorithm predicts stem-loops due to pseudoknots as well as those due to normal hairpins. The biggest limitation of this
algorithm is that it only predicts a single structure. The free-energy contributions of the base-pairs in a pseudoknot were assumed to be the same as in a hairpin stem; the free-energy contribution of the two loops of the pseudoknot was empirically determined so that known pseudoknots were predicted. The program also predicted pseudoknots in sequences that had not previously been shown to contain pseudoknots. The most important information needed to improve tertiary structure prediction is the set of free-energy parameters for tertiary interactions such as pseudoknots, tertiary base-pairs, and base-triples. Furthermore, rules must be developed indicating which tertiary contacts are sterically possible for a given structure. Realistic predictions of tertiary interactions and three-dimensional structure will not be possible until the conformations around junctions, bulge loops, and internal loops are known.
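The stepwise strategy described above (take the most stable stem-loop first, then add further stem-loops in order of stability as long as they are consistent with those already chosen) can be sketched as a greedy selection. This is not the published program (93); it is a minimal illustration in which "consistent" is assumed to mean only that no nucleotide is used twice, stem-loops with non-negative free energy are assumed to be skipped, and the candidate list and its free energies are invented for the example.

```python
def greedy_fold(candidates):
    """Pick stem-loops in order of stability, skipping inconsistent ones.

    candidates: list of (free_energy, [(i, j), ...]) where each (i, j) is a
    base-pair. Crossing stems are allowed, so pseudoknots can be selected.
    """
    chosen = []
    used = set()  # nucleotide positions already paired
    for energy, pairs in sorted(candidates, key=lambda c: c[0]):
        positions = {p for pair in pairs for p in pair}
        # Assumption: only stabilizing stems are added, and a nucleotide
        # may take part in at most one stem.
        if energy < 0 and not (positions & used):
            chosen.append((energy, pairs))
            used |= positions
    return chosen


# Hypothetical candidate stem-loops with illustrative (not measured) energies.
candidates = [
    (-6.0, [(1, 12), (2, 11), (3, 10)]),   # hairpin stem
    (-5.0, [(3, 16), (4, 15)]),            # competes for nucleotide 3
    (-4.5, [(6, 20), (7, 19), (8, 18)]),   # pairs loop nucleotides: pseudoknot
    (1.2, [(22, 30)]),                     # unfavorable, never added
]

for energy, pairs in greedy_fold(candidates):
    print(energy, pairs)
```

Because each choice is final, a selection of this kind returns one structure only, which mirrors the limitation noted in the text; it also depends entirely on the free energies supplied, and those are exactly the parameters that remain unmeasured for most tertiary interactions.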
V. Three-dimensional Structure
Once the secondary structure and tertiary interactions contained in an RNA molecule are known, the next step in understanding its structure is the determination of a three-dimensional structure. The concept of one three-dimensional structure may be misleading, since it implies that the molecule exists in a single structure and ignores the changes the RNA can undergo. Ultimately, we would like to know all of the conformations an RNA molecule can adopt and the dynamics of their interconversion. The first step toward this goal is determination of the three-dimensional structure of one conformation. Since no three-dimensional structures of RNA are known, with the exception of tRNA, which is discussed extensively elsewhere (26, 62, 122), we discuss the three-dimensional models which have been built for other RNAs. Three-dimensional models are built in an attempt to understand the functions of RNAs and to guide the development of further experiments to refine the models. These models are built from phylogenetically proven secondary structures plus information about the accessibility of nucleotides to chemical probes, the positions of crosslinks, and any phylogenetically suggested tertiary pairing. Models have been built for 16-S rRNA (71, 77, 127-129), 5-S rRNA (109), the 3' end of turnip yellow mosaic virus RNA (130), and the self-splicing intron of Tetrahymena (76). Although the details differ, in general, the following assumptions about RNA structure are used. Model builders first assume that helical regions adopt standard A-form geometry; this is supported by NMR experiments on RNA in solution as well as the known crystal structures of RNAs. Only loop regions of RNA are now left with any degrees of freedom. The conformations of loop regions are varied without bringing atoms closer than van der Waals
contact, so that regions of secondary structure satisfy the crosslinking and tertiary pairing constraints. Helical regions around junction regions are allowed to stack coaxially to form longer helical regions. The process of model building shows the importance of secondary structure loop regions (internal loops, bulge loops, and junctions) in the three-dimensional structure of RNA. The ability to predict the conformation of loop regions or even the ability to rule out certain conformations would significantly improve the process of building three-dimensional structures of RNA molecules. The positions of the helical regions in models depend on different types of experimental data. For example, the three-dimensional locations of the proteins bound to 16-S rRNA (131) were combined with protein-RNA crosslinking and footprinting data to generate constraints on 75% of the helices in one of the models of 16-S rRNA (77). Another model, based on much of the same data, is in substantial agreement with this model (71), as is a model built from the accessibility of the 16-S rRNA to DNA oligonucleotide probes, which constrains 40% of the helices in the 16-S rRNA (128). Chemical and enzymatic modification data are useful, but these data are insufficient to determine the relative positioning of helical regions.
VI. Determining RNA Structure
RNA structure can be determined at several levels of resolution. The experimental method giving the highest resolution is single crystal X-ray diffraction. In principle, it can provide the coordinates of all of the atoms, although for the large molecules of biological interest the positions of the protons are only inferred. X-ray diffraction thus reveals the secondary, tertiary, and three-dimensional structures. Unfortunately, RNA molecules often do not form crystals suitable for X-ray analysis, and thus only a few three-dimensional structures of RNA molecules have been solved: tRNAPhe (132, 133), tRNAAsp (62), and tRNAfMet (134), as well as an oligonucleotide duplex, [U(U-A)6A]2 (28). The method with the next highest level of resolution is NMR. It complements the X-ray method in that it provides distances between nearby protons (distances less than 5 Å); it can also determine backbone torsion angles. Thus, NMR provides details about local conformation and can be used to determine secondary, tertiary, and, in principle, three-dimensional structures. Information about the arrangement of the secondary structure elements in three dimensions can also be obtained from crosslinking and fluorescence energy transfer experiments. Chemical and enzymatic modifications are used to determine the accessibility of functional groups within nucleotides,
and the effect of mutations introduced into RNAs on biological activity can also determine RNA structure.
A. Nuclear Magnetic Resonance
NMR experiments provide a powerful method for determining the structures of proteins and nucleic acids in solution. Detailed explanations of NMR methodology applied to nucleic acids have been published (135, 136), so we only outline the principles of the NMR method here. Then we discuss how well the information determined by NMR defines the three-dimensional structures of RNA molecules. NMR methods provide three types of information that can be used in structure determination: nuclear Overhauser effects (NOEs), scalar coupling constants, and chemical shifts. The NOE is the transfer of magnetization due to magnetic dipole-dipole coupling between nuclei. The effect is directly proportional to the magnetic moments of the nuclei and depends on the inverse sixth power of the distance between them. NOEs can be measured between protons up to 5 Å apart. Both the exchangeable and nonexchangeable protons in RNA are used for NOE measurements. Information regarding base-pairing and stacking can be obtained from the measurement of NOEs between imino resonances assigned to specific nucleotides. The imino protons resonate in a separate region of the spectrum from other protons, and each Watson-Crick base-pair contains one hydrogen-bonded imino proton. Only imino protons that are hydrogen-bonded, or whose rates of solvent exchange are otherwise decreased, are seen in the NMR spectrum. NOEs between imino protons have been measured in molecules as large as tRNA (137). The secondary and tertiary base-pairing as well as the coaxial stacking of helical regions of tRNA in solution have been confirmed by this method (138, 139). NOEs between the nonexchangeable base and sugar proton resonances give much more detailed information about RNA structure than exchangeable-proton NMR. About 21 intranucleotide distances (9 base-sugar and 12 intra-sugar) can be measured between base and sugar protons. 
These distances are sufficient to define the conformation of a nucleoside. Up to 11 additional internucleotide (base-base, base-sugar, and sugar-sugar) distances can be measured, depending on the RNA structure. NMR studies on the nonexchangeable protons are currently limited to oligonucleotides containing no more than 30 or 40 nucleotides. New NMR techniques such as isotopic labeling (70, 140) and three-dimensional NMR methods (141) are being developed, which will allow NMR studies on larger RNA molecules. Oligonucleotides used for NMR studies of nonexchangeable protons are designed to adopt structures found in larger RNA molecules. Lower-resolution studies such as chemical modification can be done to check
that the structure adopted by the oligonucleotide is similar to the structure within the larger RNA molecule. Scalar coupling constants, also called spin-spin splittings, can be measured for two nuclei separated by two, three, or sometimes four bonds. For a three-bond splitting, for example, H-C-C-H, the value of the coupling constant depends on the torsion angle for rotation around the central bond. Coupling constants are related to torsion angles by Karplus-type equations (142); these relationships have been determined for the sugar-phosphate backbone in RNA by studies of model compounds (143). Coupling constants can be measured between ribose protons as well as between protons and phosphorus atoms. These coupling constants can be used to determine four of the seven torsion angles that completely define the conformation of a nucleotide unit (Fig. 6). Information about two of the three remaining torsion angles can be estimated from phosphorus chemical shift information as described below, and the remaining torsion angle can be determined from the intranucleotide NOEs. In principle, all of the torsion angles can be determined by NMR methods. The chemical shift of a resonance depends on the local magnetic field at the nucleus. The local magnetic field is extremely sensitive to the bonding, and to the proximities and types, of nearby atoms. Unfortunately, no simple correlation between proton chemical shift and structure has been established so far, although chemical shifts have been calculated for protons in several molecules (144). Phosphorus chemical shifts have been correlated with the two phosphodiester torsion angles O-P-O (145). Normally, both phosphodiester torsion angles are in the gauche conformation. Both theory (146) and experiment (147, 148) suggest that if either of the two angles changes to the trans conformation, the phosphorus resonance moves to the downfield region of the spectrum. 
The two adjacent C-O torsion angles and the O-P-O bond angle have also been shown to affect the phosphorus chemical shift (149). If the information inherent in the chemical shift could be tapped, NMR structure determination would become much more powerful. In principle, NMR experiments on RNA molecules could determine their three-dimensional structures. Accurate measurement of seven torsion angles per nucleotide suffices to specify an RNA structure. If all of the torsion angles and a large number of intra- and internucleotide distances are determined (within experimental uncertainties), the atomic coordinates of the RNA molecule can be generated by distance-geometry algorithms (150-152). In practice, not all of the torsion angles are determined, and spectral overlap can preclude the measurement of many NOEs. An important question not yet satisfactorily answered is: Which distances and torsion angles must be measured with what accuracy to specify a three-dimensional structure for RNA molecules?
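Two of the relationships used in this section can be written down compactly. The sketch below assumes the simplest treatment of each: an NOE intensity that scales as the inverse sixth power of the interproton distance and is calibrated against one reference pair of known separation, and a Karplus-type relation of the general form J = a cos²φ + b cos φ + c. The function names, the reference distance, and the coefficients a, b, and c are placeholders chosen for illustration; they are not the parameters determined for RNA in the cited work (142, 143).

```python
import math


def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a distance from an NOE, assuming intensity ~ r**-6.

    ref_intensity and ref_distance describe a proton pair of known,
    fixed separation measured in the same experiment (an assumption).
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def karplus(phi_degrees, a, b, c):
    """Three-bond coupling constant from a Karplus-type relation.

    J = a*cos(phi)**2 + b*cos(phi) + c, with phi the torsion angle.
    """
    cos_phi = math.cos(math.radians(phi_degrees))
    return a * cos_phi ** 2 + b * cos_phi + c


# An NOE 64 times weaker than a reference pair at 2.5 angstroms.
print(noe_distance(1.0 / 64.0, 1.0, 2.5))

# Placeholder coefficients (Hz); real values depend on the nuclei involved.
A, B, C = 10.0, -1.0, 1.0
for phi in (0, 60, 90, 180):
    print(phi, round(karplus(phi, A, B, C), 2))
```

The sixth-power dependence explains why NOEs fade out near 5 Å: a factor of two in distance costs a factor of 64 in intensity. The Karplus relation is not one-to-one, so a single measured coupling constant is generally compatible with more than one torsion angle.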
The first step in answering this question was made by testing the distance-geometry algorithm on a known DNA duplex structure (153). A set of 117 distances that could be measured easily by NMR was taken from the crystal structure of a 6-bp DNA duplex. The established base-pairing in the duplex was used to add 14 more constraints: the distances between hydrogen-bond donors and acceptors in the base-pairs. Since each nucleotide has seven degrees of freedom, 84 independent constraints suffice to define the structure completely. Although there are more than 84 NMR constraints, the fact that they are not all independent may result in an underdefined structure. The structure generated from these constraints by the distance-geometry algorithm was compared to the starting structure. The results showed that areas of the structure where many interproton distances were used (the sugars and the bases) were well defined. However, the phosphate backbone was not very well defined. The root-mean-square deviation for the generated base hydrogen atoms was 0.43 Å, whereas the deviation was 1.99 Å for the phosphorus atoms. For all of the atoms in the structure, the deviation was 1.29 Å, comparable to the deviation found for the atoms in a protein structure when a similar procedure is undertaken (154). NMR data can determine structures more accurate than that of the DNA duplex just described. The constraints used to generate the DNA structure did not include distances to the H4', H5', and H5'' sugar protons because they cannot always be assigned. More importantly, no torsion angles were used to constrain the structure, although many of these angles can be determined from proton-proton and proton-phosphorus coupling constants (30, 155, 156). The accuracy of structures determined by NMR ultimately depends on the nucleic acid structure itself. 
The example of the DNA duplex gives us the lower bound for the accuracy of the structures of duplex RNA determined by NMR, but it is the loop regions of RNA that are of most interest. Can the loop regions of RNA molecules be as well-defined as the duplex regions? Compact loop structures are much better defined than more open ones. The hairpin loop UUCG, discussed above, forms a compact structure in which the sugar protons of the cytosine in the loop give NOE effects to all other nucleotides in the loop. This results in a structure that is well defined by the NMR constraints. The structures of loop regions in which the nucleotides are in extended conformations, or in multiple conformations, can be difficult to determine precisely. If the nucleotides in the loop are not stacked, there probably will be very small NOE effects between them, and their resonances will be difficult to assign. In principle, however, these resonances can be assigned by specific isotope labeling, and a structure can be obtained by measuring coupling constants to determine the backbone torsion angles.
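The bookkeeping in the DNA duplex test can be checked directly, and the root-mean-square deviation used to compare the generated and starting structures is easy to state in code. The sketch below assumes that the two coordinate sets list the same atoms in the same order and have already been superimposed; the function name and the toy coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) coordinates.

    Assumes both lists hold the same atoms in the same order and that the
    structures are already superimposed (no fitting is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Counting argument from the text: a 6-bp duplex has 12 nucleotides,
# each with seven torsion-angle degrees of freedom.
nucleotides = 2 * 6
degrees_of_freedom = 7 * nucleotides
constraints = 117 + 14  # NMR-accessible distances plus hydrogen-bond distances

print(degrees_of_freedom, constraints)

# Toy example: every atom displaced by 1 angstrom along z.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(start, model))
```

The count shows 131 constraints against 84 degrees of freedom, yet the backbone was still poorly defined, which is the point made above: the number of constraints matters less than whether they are independent and where in the molecule they fall.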
B. Long-range Constraints
1. CROSSLINKING
Crosslinks (covalent bonds between different bases) can be introduced into RNA molecules by irradiation with UV light (157) or by using chemical reagents such as psoralen (158) or nitrogen mustard (77). The positions of such crosslinks within the structure can be determined by sequencing the RNA on either side of the crosslink. The crosslinks are of two types. The first type occurs between nucleotides adjacent in the secondary structure. The second, more interesting type occurs between nucleotides that are not adjacent in the secondary structure. The positions of these crosslinks reveal something of the three-dimensional structure of the molecule, since they show the proximity of two secondary structure elements. Caution must be used when interpreting the results of crosslinking studies. Crosslinks generated by chemical reagents that bind to one base and then crosslink to a second base may not reflect the native structure of the RNA, since the binding of the reagent to the first base may alter the RNA structure.¹ Systems studied by crosslinking include the 16-S rRNA (77), the 23-S rRNA (77), tRNA binding to the catalytic subunit of RNase P (159), tRNA binding to ribosomes (84), and protein binding to ribosomes (160).
2. FLUORESCENCE ENERGY TRANSFER
Fluorescence energy can be transferred from a donor to an acceptor group by an electric dipole-dipole mechanism. The rate of transfer depends on the inverse sixth power of the distance between these groups. In practice, the transfer can be measured for groups separated by roughly 10-70 Å (161). Measuring fluorescence energy transfer in RNA requires adding acceptor and donor groups to the molecule. Applications of this technique include the study of tRNA (162), ribosome assembly (163), and the conformation about a four-stem DNA junction, as described above (74).
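The inverse-sixth-power dependence is usually written in the Förster form, E = 1/(1 + (r/R0)^6), where R0 is the distance at which transfer is 50% efficient. The sketch below assumes that form. R0 depends on the particular donor-acceptor pair and is not given in the text, so the value of 50 Å used here is purely illustrative, as are the function names.

```python
def transfer_efficiency(r, r0):
    """Forster transfer efficiency for a donor-acceptor separation r.

    r and r0 must be in the same units (for example, angstroms).
    r0 is the separation at which the efficiency is 0.5.
    """
    if r <= 0 or r0 <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(efficiency, r0):
    """Invert the Forster relation to estimate r from a measured efficiency."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Illustrative only: an assumed R0 of 50 angstroms.
for r in (10, 30, 50, 70):
    print(r, round(transfer_efficiency(r, 50.0), 4))
```

Because the efficiency changes steeply only near R0, distances far from R0 are poorly determined, which is consistent with the limited usable range quoted in the text.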
¹ This hazard is discussed by Budowsky and Abdurashidova in their article on UV crosslinking (160). [Eds.]
C. Chemical Modification
Chemical and enzymatic modification methods determine the accessibility of the nucleotides within an RNA molecule to modification by chemical reagents or enzymes (164). The reactivity of a nucleotide to chemical reagents is a complicated function of solvent accessibility and electrostatic environment (165, 166). The reactivity of nucleotides to chemical modification is used to confirm predicted secondary structures and to learn about tertiary interactions. The advantages of these methods are that they can be used to probe the structure of very large RNAs, and that they require only picomoles of RNA. Chemicals that react with each of the four bases at Watson-Crick hydrogen-bonding positions can reveal which nucleotides are involved in base-pairing. Conditions are used in which each RNA molecule is modified at most once, so that the structural information deduced is not an artifact of the modification procedure. Enzymes that cleave specifically in base-paired or unpaired regions are used to determine base-pairing as well, but the large size of enzymes makes them generally less useful than chemicals, since the compact structure of a large folded RNA yields very few enzymatic cleavage sites (167).
Chemical modification of the nucleotides within the RNA is detected by one of two methods. The simplest method induces strand scission at the site of modification; this is most useful for short RNA molecules. Sites of modification within large RNA molecules are located by synthesis of a DNA complementary to the RNA using reverse transcriptase (54). Modified residues in the RNA cause the reverse transcriptase to stop. Separation of the synthesized DNAs by gel electrophoresis determines the positions of modification.
Clear interpretation of the results of chemical modification is often difficult. The amount of modification ranges from essentially none through various degrees of partial modification to strong modification. Strong modification of a nucleotide is good evidence that it is not involved in secondary or tertiary pairing. Weak or partial modification can result from nucleotides engaged in either secondary or tertiary interactions.
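The mapping from gel bands to modified positions in the reverse transcriptase method is simple arithmetic, sketched below. Several things here are assumptions rather than statements from the text: RNA positions are numbered from the 5' end starting at 1, `primer_end` is the RNA position paired with the 5' end of the primer, cDNA lengths include the primer, and the enzyme is taken to stop one nucleotide before the modified residue (`stop_offset=1`). The function name is hypothetical.

```python
def modified_positions(primer_end, cdna_lengths, stop_offset=1):
    """Convert cDNA lengths from a primer-extension gel into RNA positions.

    primer_end   : RNA position (1-based, from the 5' end) paired with the
                   5' end of the primer, i.e. the 3'-most position copied.
    cdna_lengths : lengths of the terminated cDNAs, primer included.
    stop_offset  : how many nucleotides before the modified residue the
                   enzyme is assumed to stop (assumption: 1).
    """
    positions = []
    for length in cdna_lengths:
        if length < 1 or length > primer_end:
            raise ValueError("cDNA length is inconsistent with the primer position")
        last_copied = primer_end - length + 1   # 5'-most RNA nucleotide copied
        modified = last_copied - stop_offset    # residue that blocked extension
        if modified < 1:
            raise ValueError("stop maps beyond the 5' end of the RNA")
        positions.append(modified)
    return positions


# Hypothetical example: primer ending at position 100, two stops on the gel.
print(modified_positions(100, [20, 35]))
```

In practice an unmodified control lane is needed as well, since reverse transcriptase also pauses at stable structure and at natural modifications; that comparison is outside this sketch.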
D. Mutational Analysis
Mutations are often introduced into RNA sequences in order to determine RNA structure or protein-RNA interactions (94, 97, 108, 168-172). The effect of these mutations is often assayed by measuring the ability of the mutated sequences to bind a protein that specifically recognizes the wild-type RNA. Although mutational analysis is a powerful technique for determining interactions in RNA structure, caution is necessary with this approach. The results of such experiments can be ambiguous, since loss of protein binding can result either from a change in RNA structure or from an RNA sequence that maintains the same structure but is not recognized efficiently by the protein.
VII. Protein-RNA Interactions
Since most cellular RNA molecules are complexed with proteins, understanding protein-RNA interactions is vital to understanding cellular RNA functions. Unfortunately, even less is known about the structure of RNA binding proteins than about RNA structure. The protein-RNA interactions studied so far show that proteins recognize specific secondary structural features of RNA as well as its three-dimensional shape.
A. Protein-Duplex Interactions
The interactions between the helix-turn-helix proteins and duplex DNA are well characterized (173-175). An α-helix lies in the major groove with amino acids forming specific hydrogen bonds to the DNA sequence. As shown in Fig. 3, the major groove in typical A-form RNA is much narrower than in typical B-form DNA. It has been suggested that the narrower major groove in RNA prevents protein structure elements from binding to the bases in the major groove (176-178). However, the fact that a nucleotide strand is capable of binding to duplex RNA in the major groove to form a triple helix, and the fact that there is considerable polymorphism in the size of the A-form major groove in X-ray studies of fibers, suggest that the bases may be accessible to protein structures through the major groove (179). Transcription factor IIIA is a “zinc finger” protein that binds the DNA gene for the 5-S rRNA in X. laevis as well as the 5-S rRNA itself. There is evidence that it binds to the DNA gene in the major groove (180), and it is proposed that it binds to the 5-S rRNA in the major groove (73). Although it could bind to the DNA and RNA sites by different mechanisms, the possibility that proteins bind in the major groove of RNA should not be ruled out.
B. Protein-Loop Binding
Most of the protein binding sites in RNA that have been characterized are loop regions: hairpins, bulges, and internal loops. Unpaired nucleotides are more conserved than base-paired nucleotides in 16-S rRNA sequences, suggesting that these nucleotides are involved in either tertiary interactions or protein contacts. Unpaired adenosines occur more frequently than the other nucleosides (181). Direct evidence for the role of unpaired adenosines in protein binding of the 16-S rRNA comes from the decrease in chemical reactivity of these adenosines when the ribosomal proteins are bound to the 16-S rRNA (54).
Both bulge and internal loops bind proteins. A purine bulge is required for the coat protein to bind to the R17 viral RNA (182), and an adenosine bulge is part of the L18 protein binding site on the 5-S rRNA (183). An asymmetrical internal loop is required for the binding of S8 ribosomal protein to the 16-S rRNA in E. coli (69, 184, 185). Ribosomal protein L3 binds to the 23-S rRNA at a large asymmetrical internal loop as well (186).
Hairpin structures are commonly found to bind proteins specifically (187-194). Hairpins with loop sequences CAGUGN bind to the iron-responsive element (IRE)-binding protein (195). As proposed in other protein-RNA interactions (196-198), free cysteines in the IRE-binding protein are thought to form transient covalent bonds to the RNA hairpin by a nucleophilic attack by the sulfhydryl group on a uracil in the RNA. Iron regulates this system by altering the equilibrium between reduced and oxidized sulfhydryl groups. This in turn alters the amount of hairpin bound by protein and alters the translation of the iron receptor mRNA (199).
The best-characterized example of a hairpin protein binding site is the R17 coat protein binding site (182). Binding requires a hairpin, whose loop sequence must be ANYA, plus a purine bulge three nucleotides away on the 5' side of the loop. It is not clear what role the purine bulge plays in protein binding, since adding substituents such as methyl groups to an adenosine bulge reduces binding IOW-fold, whereas changing the bulge from adenosine to guanosine leaves binding essentially unchanged. These data are insufficient to determine whether the protein forms specific contacts with the purine bulge, or whether intercalation of the purine alters the structure of the RNA. This study shows the limitation of mutagenesis experiments in which specific nucleotides are changed and the effect on protein binding is measured. Without structural studies of the RNA molecules, it is not clear whether the different binding affinities that result from substituting specific nucleotides are due to disruption of protein contacts to that nucleotide, or to a change in the overall structure of the RNA.
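Loop consensus sequences such as CAGUGN and ANYA use the standard nucleotide ambiguity codes (N for any nucleotide, Y for a pyrimidine, R for a purine). The following sketch, with hypothetical function names, expands such a consensus into a regular expression and reports every place it occurs in an RNA sequence. It checks sequence only: it does not test whether the match lies in a hairpin loop or whether the required bulge is present, both of which the text describes as necessary for binding.

```python
import re

# Ambiguity codes restricted to those needed for the motifs in the text.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",      # purine
    "Y": "[CU]",      # pyrimidine
    "N": "[ACGU]",    # any nucleotide
}


def find_motif(sequence, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    body = "".join(AMBIGUITY[code] for code in motif.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Toy sequences, invented for illustration only.
print(find_motif("GGCAGUGCCC", "CAGUGN"))
print(find_motif("AUCAUCA", "ANYA"))
```

A search of this kind can only nominate candidate sites; as the text stresses, sequence changes that abolish binding may act through the RNA structure rather than through lost contacts.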
C. Protein Recognition of Three-dimensional Structure
The first high-resolution crystal structure of a protein-RNA complex to be solved was that of E. coli tRNAGln and its synthetase (178). Sequence-specific contacts occur at the anticodon loop and at the end of the acceptor stem of the tRNA in addition to contacts along one side of the tRNA. Three bases in the anticodon loop, which form specific contacts with the protein, are unstacked compared to the tRNAPhe structure. The first base-pair in the acceptor stem is disrupted, and the protein contacts the acceptor stem in the minor groove, forming hydrogen bonds with exocyclic amino protons of two guanines. Overall, the crystal structure shows that the synthetase recognizes the tRNA shape, but distinguishes it from other tRNAs by forming two types of sequence-specific contacts: binding to anticodon loop nucleotides and binding to guanines in the minor groove of the acceptor stem.
Three-dimensional structure is not always recognized by tRNA synthetases, since a hairpin containing the major recognition feature of tRNAAla is efficiently aminoacylated by E. coli tRNAAla synthetase (200). The major recognition feature is a single G·U mismatch in the acceptor stem (201). Replacing the G·U mismatch by G·A, C·A, or U·U mismatches (Fig. 5) results in only small losses in aminoacylation activity, although replacing the mismatch by a Watson-Crick pair completely abolishes activity. The fact that other mismatches are almost as efficient as G·U suggests that the protein recognizes a change in the sugar-phosphate backbone caused by the mismatch rather than specific groups on the G·U pair. Many of the tRNA synthetases probably recognize three-dimensional structure, since the nucleotides required for binding are often scattered throughout the tRNA (202-204).
Although the structure of only one protein-RNA complex has been determined, several features of protein binding sites in RNA molecules have been characterized. The most frequently found protein binding sites in RNA are specific nucleotides within loop regions. Hairpin loops are the best-characterized protein binding sites, but bulge loops and internal loops have been implicated in protein binding as well. Proteins recognize primary sequence [poly(A) binding protein] (205), secondary structure (hairpin, bulge, and internal loops), and three-dimensional shape (tRNA synthetase binding). The specific interactions between proteins and RNA include transient covalent sulfhydryl-uracil interactions, positively charged amino acids binding to negatively charged phosphates, hydrogen-bond formation with the exocyclic amino group of guanine in the minor groove, and specific hydrogen bonds and stacking interactions with bases in loop regions.
The RNA structures that proteins recognize are better characterized than the protein structural elements that bind to RNA. An 80-amino-acid consensus RNA binding domain, which contains a sequence of eight highly conserved amino acids (205), has been identified in several RNA binding proteins, including human U1A protein as well as the poly(A) binding protein (206-208). The structure of a peptide portion of the gag polyprotein that binds HIV viral RNA has been determined by NMR (209) and is similar to the structure of a zinc finger protein that binds to DNA (210).
VIII. RNA-RNA Interactions
Intermolecular RNA interactions occur in a wide range of biological processes, including “spliceosome” assembly (211, 212), RNA “editing” (213), and protein synthesis (214-216). Intermolecular contacts include base-pairing and backbone interactions. Backbone interactions occur between regions of RNA already base-paired. Although these interactions are not well characterized, they probably involve the backbone of one helical region interacting with the groove of another helical region. These interactions include base-sugar, base-phosphate, sugar-sugar, and sugar-phosphate hydrogen bonds.
The stabilities of complexes formed between tRNA anticodon loops and short oligonucleotides (217) or between complementary anticodon loops (218) are much greater than the stabilities of short duplexes. The enthalpy changes during formation of anticodon loop complexes are similar to those of duplex formation, but the entropy of formation is significantly more favorable for the anticodons than for the duplexes (217). The favorable entropy change for the codon-anticodon interaction is probably due to the stacked conformation of the nucleotides within the anticodon loop. Often, the nucleotides in small hairpin loops are not stacked in A-form geometry, presumably making base-paired complexes with these loops much less stable than complexes with anticodon loops.
The importance of backbone interactions in the formation of RNA-RNA complexes is demonstrated by the Tetrahymena intron. As previously discussed, the intron is capable of binding an RNA duplex through backbone interactions (116). A separate study showed the importance of the 2' hydroxyl when the intron binds a single-stranded oligonucleotide (219). A single-stranded RNA oligonucleotide base-pairs to the intron, forming a complex 10⁴ times (6 kcal/mol) more stable than would be expected for RNA duplex formation; however, a complex between the intron and a DNA oligonucleotide is only as stable as a short RNA-DNA duplex. This suggests that either the 2' hydroxyl groups of the oligonucleotide form specific interactions with the intron or that the intron forms interactions with the sugar or phosphate groups along the backbone of an A-form duplex.
The complex formed between M1 RNA, the catalytic component of RNase P, and its tRNA substrate is probably stabilized by the formation of backbone interactions. This suggestion is supported by a chemical crosslink that forms in the M1 RNA-tRNA complex between base-paired regions of the M1 RNA and the precursor tRNA (159). As previously noted (159), there is striking sequence and secondary structure homology between the region of the M1 RNA that crosslinks to the tRNA and the region of the 23-S rRNA that was found to bind tRNA by chemical footprinting (83). Complexes stabilized by backbone interactions may be a general feature of catalytic RNA-substrate complexes, since these interactions are weak and allow dissociation of the enzyme-substrate complex after the reaction.
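The correspondence between a 10⁴-fold gain in stability and roughly 6 kcal/mol follows from the relation ΔΔG = RT ln(K1/K2). The short calculation below is a sketch of that conversion. The temperature is an assumption, since the text does not state the conditions of the measurement, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def free_energy_from_ratio(ratio, temp_celsius=37.0):
    """Free-energy difference (kcal/mol, as a magnitude) for a stability ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return R_KCAL * (temp_celsius + 273.15) * math.log(ratio)


def ratio_from_free_energy(ddg_kcal, temp_celsius=37.0):
    """Stability ratio corresponding to a free-energy difference in kcal/mol."""
    return math.exp(ddg_kcal / (R_KCAL * (temp_celsius + 273.15)))


# Assumed temperature of 37 degrees C; other temperatures shift the value slightly.
print(round(free_energy_from_ratio(1e4), 2))
```

At the assumed temperature a factor of 10⁴ corresponds to a little under 6 kcal/mol, in line with the rounded value quoted in the text.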
IX. RNA-DNA Interactions
Hybrid duplexes form between RNA and DNA during transcription and reverse transcription. The stability of these hybrids is believed to play a role in transcription termination (220). The combination of a stable hairpin forming in the nascent RNA followed by a repeating dA sequence in the DNA leads to termination. The proposed explanation is that the stable hairpin causes the polymerase to pause and disrupts part of the hybrid duplex (221). The polymerase complex is then held together only by the remaining hybrid duplex. Since rU·dA hybrid duplexes are much less stable than other hybrids (222, 223), the polymerase falls off and transcription terminates.
The discovery that a ribonucleoprotein complex adds DNA sequences to chromosomal ends (224) presents the possibility that an RNA molecule can function as a reverse transcriptase. Tetrahymena telomerase² adds the DNA sequence TTGGGG to the 3' end of chromosomes; it contains a protein component and a 159-nucleotide RNA component, including the sequence CAACCCCAA complementary to the synthesized DNA (225). The role of the CAACCCCAA sequence as a template for DNA synthesis was demonstrated by the finding that mutating this sequence changes the DNA sequence at the ends of chromosomes in vivo (226). Whether the RNA alone can act as a reverse transcriptase or whether it only serves as a template for DNA synthesis by the protein component of telomerase has not been established.
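The complementarity between the telomerase RNA sequence and the telomeric repeat can be checked directly. The sketch below, with a hypothetical function name, copies an RNA template into DNA the way a reverse transcriptase would: the template is read 3' to 5' and the product is written 5' to 3'. It illustrates the base-pairing logic only and makes no claim about how the enzyme positions or reuses its template.

```python
# Watson-Crick partners for copying an RNA template into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def templated_dna(rna_template):
    """Return the DNA (5'->3') that is complementary to an RNA written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template.upper()))


product = templated_dna("CAACCCCAA")
print(product)
print("TTGGGG" in product)
```

Copying the nine-nucleotide template yields one full TTGGGG repeat plus part of the next, which is what would be expected if the same short RNA sequence directs each round of repeat addition.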
ACKNOWLEDGMENTS
We thank Peter Davis, John Jaeger, and Gabriele Varani for their reading of the manuscript, and we particularly thank Jacqueline Wyatt for her reading of the manuscript and for many useful discussions. M.C. is a Howard Hughes Medical Institute Doctoral Fellow. This work was supported in part by National Institutes of Health Grant GM 10840, and by the Department of Energy, Office of Energy Research, Office of Health and Environmental Research under Grant DE-FG03-86ER60406.
² Formerly “telomere-adding enzyme,” “T2G4-adding enzyme,” or “telomere terminal transferase.” [Eds.]
REFERENCES
1. K. Kruger, P. J. Grabowski, A. J. Zaug, J. Sands, D. E. Gottschling and T. R. Cech, Cell 31, 147 (1982).
2. C. Guerrier-Takada, K. Gardiner, T. Marsh, N. Pace and S. Altman, Cell 35, 849 (1983).
3. R. van der Veen, A. C. Arnberg, G. van der Horst, L. Bonen, H. F. Tabak and L. A. Grivell, Cell 44, 225 (1986).
4. C. L. Peebles, P. S. Perlman, K. L. Mecklenburg, M. L. Petrillo, J. H. Tabor, K. A. Jarrell and H.-L. Cheng, Cell 44, 213 (1986).
5. A. Hampel and R. Tritz, Bchem 28, 4929 (1989).
6. A. Hampel, R. Tritz, M. Hicks and P. Cruz, NARes 18, 299 (1990).
7. H.-N. Wu, Y.-J. Lin, F.-P. Lin, S. Makino, M.-F. Chang and M. M. C. Lai, PNAS 86, 1831 (1989).
8. G. A. Prody, J. T. Bakos, J. M. Buzayan, I. R. Schneider and G. Bruening, Science 231, 1577 (1986).
9. C. H. Hutchins, P. D. Rathjen, A. C. Forster and R. H. Symons, NARes 14, 3627 (1986).
10. C. Yanofsky, Nature 289, 751 (1981).
11. C. L. Chan and R. Landick, JBC 264, 20796 (1989).
12. L. P. Eperon, I. R. Graham, A. D. Griffiths and I. C. Eperon, Cell 54, 393 (1988).
13. P. Schimmel, Cell 58, 9 (1989).
14. M. Kozak, PNAS 83, 2850 (1986).
15. K.-O. Cho and C. Yanofsky, JMB 204, 51 (1988).
16. G. Brawerman, Cell 57, 9 (1989).
17. V. J. Cannistraro, M. N. Subbarao and D. Kennell, JMB 192, 257 (1986).
18. U. Blasi, K. Nam, D. Hartz, L. Gold and R. Young, EMBO J. 8, 3501 (1989).
19. S. Altuvia, D. Kornitzer, D. Teff and A. B. Oppenheim, JMB 210, 265 (1989).
20. M. H. de Smit and J. van Duin, This Series 38, 1 (1990).
21. L. Gold, ARB 57, 199 (1988).
22. W. M. Huang, S.-Z. Ao, S. Casjens, R. Orlandi, R. Zeikus, R. Weiss, D. Winge and M. Fang, Science 239, 1005 (1988).
23. J. F. Milligan, D. R. Groebe, G. W. Witherell and O. C. Uhlenbeck, NARes 15, 8783 (1987).
24. S.-H. Chou, P. Flynn and B. Reid, Bchem 28, 2422 (1989).
25. N. Usman, K. K. Ogilvie, M.-Y. Jiang and R. J. Cedergren, JACS 109, 7845 (1987).
26. W. Saenger, “Principles of Nucleic Acid Structure.” Springer-Verlag, New York, 1984.
27. S. Arnott, D. W. L. Hukins and S. D. Dover, BBRC 48, 1392 (1972).
28. A. C. Dock-Bregeon, B. Chevrier, A. Podjarny, J. Johnson, J. S. de Bear, G. R. Gough, P. T. Gilham and D. Moras, JMB 209, 459 (1989).
29. P. W. Davis, R. W. Adamiak and I. Tinoco, Jr., Biopolymers 29, 109 (1990).
30. G. Varani, B. Wimberly and I. Tinoco, Jr., Bchem 28, 7760 (1989).
31. J. D. Puglisi, J. R. Wyatt and I. Tinoco, Jr., Bchem 29, 4215 (1990).
32. C. S. Happ, E. Happ, M. Nilges, A. M. Gronenborn and G. M. Clore, Bchem 27, 1735 (1988).
33. S. Arnott and D. W. L. Hukins, BBRC 47, 1504 (1972).
34. A. Bhattacharyya, A. I. H. Murchie and D. M. J. Lilley, Nature 343, 484 (1990).
34a. R. S. Tang and D. E. Draper, Bchem 29, 5232 (1990).
35. D. Rhodes and A. Klug, Nature 292, 378 (1981).
36. L. J. Peck and J. C. Wang, Nature 292, 375 (1981).
37. T. D. Tullius and B. A. Dombroski, Science 230, 679 (1985).
38. K. Hall, P. Cruz, I. Tinoco, Jr., T. M. Jovin and J. H. van de Sande, Nature 311, 584 (1984).
39. I. Tinoco, Jr., O. C. Uhlenbeck and M. D. Levine, Nature 230, 362 (1971).
40. C. Cheong, G. Varani and I. Tinoco, Jr., Nature 346, 680 (1990).
40a. T. Sakata, H. Hiroaki, Y. Oda, T. Tanaka, M. Ikehara and S. Uesugi, NARes 18, 3831 (1990).
41. M. J. J. Blommers, J. A. L. I. Walters, C. A. G. Haasnoot, J. M. A. Aelen, G. A. van der Marel, J. H. van Boom and C. W. Hilbers, Bchem 28, 7491 (1989).
42. D. R. Groebe and O. C. Uhlenbeck, NARes 16, 11725 (1988).
43. H. F. Noller, ARB 53, 119 (1984).
44. J. A. Jaeger, D. H. Turner and M. Zuker, PNAS 86, 7706 (1989).
45. C. Tuerk, P. Gauss, C. Thermes, D. R. Groebe, M. Gayle, N. Guild, G. Stormo, Y. d'Aubenton-Carafa, O. C. Uhlenbeck, I. Tinoco, Jr., E. N. Brody and L. Gold, PNAS 85, 1364 (1988).
46. I. Hirao, Y. Nishimura, T. Naraoka, K. Watanabe, Y. Arata and K. Miura, NARes 17, 2223 (1989).
47. W. Fuller and A. Hodgson, Nature 215, 817 (1967).
48. G. M. Clore, A. M. Gronenborn, E. A. Piper, L. W. McLaughlin, E. Graeser and J. H. van Boom, BJ 221, 737 (1984).
49. J. Wu and A. G. Marshall, Bchem 29, 1722 (1990).
50. J. Wu and A. G. Marshall, Bchem 29, 1730 (1990).
51. C. A. G. Haasnoot, C. W. Hilbers, G. A. van der Marel, J. H. van Boom, U. C. Singh, N. Pattabiraman and P. A. Kollman, J. Biomol. Struct. Dyn. 3, 843 (1986).
52. M. W. Kalnik, D. G. Norman, B. F. Li, P. F. Swann and D. J. Patel, JBC 265, 636 (1990).
53. Y. T. van den Hoogen, A. A. van Beuzekom, E. de Vroom, G. A. van der Marel, J. H. van Boom and C. Altona, NARes 16, 5013 (1988).
54. D. Moazed, S. Stern and H. F. Noller, JMB 187, 399 (1986).
55. A. Bhattacharyya and D. M. J. Lilley, NARes 17, 6821 (1989).
56. C.-H. Hsieh and J. D. Griffith, PNAS 86, 4833 (1989).
57. J. A. Rice and D. M. Crothers, Bchem 28, 4512 (1989).
58. C. E. Longfellow, R. Kierzek and D. H. Turner, Bchem 29, 278 (1990).
59. S. A. White and D. E. Draper, Bchem 28, 1892 (1989).
60. S. Roy, V. Sklenar, E. Appella and J. S. Cohen, Biopolymers 26, 2041 (1987).
61. D. H. Turner, N. Sugimoto and S. M. Freier, Annu. Rev. Biophys. Biophys. Chem. 17, 167 (1988).
62. E. Westhof, P. Dumas and D. Moras, JMB 184, 119 (1985).
63. W. Traub and J. L. Sussman, NARes 10, 2701 (1982).
64. P. J. Romaniuk, I. L. de Stevenson, C. Ehresmann, P. Romby and B. Ehresmann, NARes 16, 2295 (1988).
65. P. Romby, E. Westhof, R. Toukifimpa, R. Mache, J.-P. Ebel, C. Ehresmann and B. Ehresmann, Bchem 27, 4721 (1988).
66. A. Bhattacharyya and D. M. J. Lilley, JMB 209, 583 (1989).
67. F. Aboul-ela, D. Koh and I. Tinoco, Jr., NARes 13, 4811 (1985).
68. C. Papanicolaou, M. Gouy and J. Ninio, NARes 12, 31 (1984).
69. M. Mougel, F. Eyermann, E. Westhof, P. Romby, A. Expert-Bezançon, J.-P. Ebel, B. Ehresmann and C. Ehresmann, JMB 198, 91 (1987).
70. P. Zhang and P. B. Moore, Bchem 28, 4607 (1989).
71. S. Stern, B. Weiser and H. F. Noller, JMB 204, 447 (1988).
72. P. J. Romaniuk, Bchem 28, 1388 (1989).
73. J. Christiansen, R. S. Brown, B. S. Sproat and R. A. Garrett, EMBO J. 6, 453 (1987).
74. A. I. H. Murchie, R. M. Clegg, E. von Kitzing, D. R. Duckett, S. Diekmann and D. M. J. Lilley, Nature 341, 763 (1989).
75. C. R. Woese, R. Gutell, R. Gupta and H. F. Noller, Microbiol. Rev. 47, 621 (1983).
76. S.-H. Kim and T. R. Cech, PNAS 84, 8788 (1987).
77. R. Brimacombe, J. Atmadja, W. Stiege and D. Schüler, JMB 199, 115 (1988).
78. L. M. Epstein and J. G. Gall, Cell 48, 535 (1987).
79. A. C. Forster and R. H. Symons, Cell 49, 211 (1987).
80. M. Koizumi, S. Iwai and E. Ohtsuka, FEBS Lett. 228, 228 (1988).
81. C. C. Sheldon and R. H. Symons, NARes 17, 5679 (1989).
82. A. E. Dahlberg, Cell 57, 525 (1989).
83. D. Moazed and H. F. Noller, Cell 57, 585 (1989).
84. G. Steiner, E. Kuechler and A. Barta, EMBO J. 7, 3949 (1988).
85. C. C. Hall, D. Johnson and B. S. Cooperman, Bchem 27, 3983 (1988).
86. N. R. Pace, D. K. Smith, G. J. Olsen and B. D. James, Gene 82, 65 (1989).
87. A. L. Williams, Jr., and I. Tinoco, Jr., NARes 14, 299 (1986).
88. M. Zuker, Science 244, 48 (1989).
89. J. Gralla and D. M. Crothers, JMB 73, 497 (1973).
90. O. C. Uhlenbeck, P. N. Borer, B. Dengler and I. Tinoco, Jr., JMB 73, 483 (1973).
91. D. A. M. Konings and P. Hogeweg, JMB 207, 597 (1989).
92. G. M. Studnicka, G. M. Rahn, I. W. Cummings and W. A. Salser, NARes 5, 3365 (1978).
93. J. P. Abrahams, M. van den Berg, E. van Batenburg and C. Pleij, NARes 18, 3035 (1990).
94. C. K. Tang and D. E. Draper, Cell 57, 531 (1989).
95. K. Rietveld, R. van Poelgeest, C. W. A. Pleij, J. H. van Boom and L. Bosch, NARes 10, 1929 (1982).
96. C. W. A. Pleij, K. Rietveld and L. Bosch, NARes 13, 1717 (1985).
97. I. Brierley, P. Digard and S. C. Inglis, Cell 57, 537 (1989).
98. J. D. Puglisi, J. R. Wyatt and I. Tinoco, Jr., JMB 214, 437 (1990).
99. J. R. Wyatt, J. D. Puglisi and I. Tinoco, Jr., JMB 214, 455 (1990).
100. B. D. James, G. J. Olsen, J. Liu and N. R. Pace, Cell 52, 19 (1988).
101. R. R. Gutell and C. R. Woese, PNAS 87, 663 (1990).
102. H. Leffers, J. Kjems, L. Østergaard, N. Larsen and R. A. Garrett, JMB 195, 43 (1987).
103. L. J. Maher III, B. Wold and P. B. Dervan, Science 245, 725 (1989).
104. M. Cooney, G. Czernuszewicz, E. H. Postel, S. J. Flint and M. E. Hogan, Science 241, 456 (1988).
105. T. C. Boles and M. E. Hogan, Bchem 26, 367 (1987).
106. F. Michel, M. Hanna, R. Green, D. P. Bartel and J. W. Szostak, Nature 342, 391 (1989).
107. I. Tinoco, Jr., J. D. Puglisi and J. R. Wyatt, in “Nucleic Acids and Molecular Biology” (F. Eckstein and D. M. J. Lilley, eds.), p. 205. Springer-Verlag, New York, 1990.
108. J. R. Sampson, A. B. DiRenzo, L. S. Behlen and O. C. Uhlenbeck, Bchem 29, 2523 (1990).
109. E. Westhof, P. Romby, P. J. Romaniuk, J.-P. Ebel, C. Ehresmann and B. Ehresmann, JMB 207, 417 (1989).
110. J. M. Burke, J. S. Esherick, W. R. Burfeind and J. L. King, Nature 344, 80 (1990).
111. S. Arnott and P. J. Bond, Nature NB 244, 99 (1973).
112. A. G. Letai, M. A. Palladino, E. Fromm, V. Rizzo and J. R. Fresco, Bchem 27, 9108 (1988).
113. P. Rajagopal and J. Feigon, Bchem 28, 7859 (1989).
114. C. de los Santos, M. Rosen and D. Patel, Bchem 28, 7282 (1989).
115. Y. Timsit, E. Westhof, R. P. P. Fuchs and D. Moras, Nature 341, 459 (1989).
116. J. A. Doudna and J. W. Szostak, Nature 339, 519 (1989).
117. E. Henderson, C. C. Hardin, S. K. Wolk, I. Tinoco, Jr., and E. H. Blackburn, Cell 51, 899 (1987).
118. W. I. Sundquist and A. Klug, Nature 342, 825 (1989).
119. J. R. Williamson, M. K. Raghuraman and T. R. Cech, Cell 59, 871 (1989).
120. D. Sen and W. Gilbert, Nature 344, 410 (1990).
121. S. B. Zimmerman, G. S. Cohen and D. R. Davies, JMB 92, 181 (1975).
122. J. P. Goddard, Prog. Biophys. Mol. Biol. 32, 233 (1977).
123. R. B. Waring, C. Scazzocchio, T. A. Brown and R. W. Davies, JMB 167, 595 (1983).
124. F. Michel and B. Dujon, EMBO J. 2, 33 (1983).
125. R. R. Gutell, H. F. Noller and C. R. Woese, EMBO J. 5, 1111 (1986).
126. M. Levitt, Nature 224, 759 (1969).
127. A. Expert-Bezançon and P. Wollenzien, JMB 184, 53 (1985).
128. M. I. Oakes, L. Khan and J. A. Lake, JMB 211, 907 (1990).
129. I. Hubbard, Ph.D. thesis, University of California, Berkeley, 1990.
130. P. Dumas, D. Moras, C. Florentz, R. Giegé, P. Verlaan, A. Van Belkum and C. W. A. Pleij, J. Biomol. Struct. Dyn. 4, 707 (1987).
131. M. S. Capel, D. M. Engelman, B. R. Freeborn, M. Kjeldgaard, J. A. Langer, V. Ramakrishnan, D. G. Schindler, D. K. Schneider, B. P. Schoenborn, I.-Y. Sillers, S. Yabuki and P. B. Moore, Science 238, 1403 (1987).
132. S.-H. Kim, F. L. Suddath, G. J. Quigley, A. McPherson, J. L. Sussman, A. H. J. Wang, N. C. Seeman and A. Rich, Science 185, 435 (1974).
133. J. D. Robertus, J. E. Ladner, J. T. Finch, D. Rhodes, R. S. Brown, B. F. C. Clark and A. Klug, Nature 250, 546 (1974).
134. N. H. Woo, B. A. Roe and A. Rich, Nature 286, 346 (1980).
135. K. Wüthrich, “NMR of Proteins and Nucleic Acids.” Wiley, New York, 1986.
136. F. J. M. van de Ven and C. W. Hilbers, EJB 178, 1 (1988).
137. S. Roy and A. G. Redfield, NARes 9, 7073 (1981).
138. R. E. Hurd and B. R. Reid, Bchem 18, 4017 (1979).
139. A. Heerschap, J.-R. Mellema, H. C. J. M. Janssen, J. A. L. I. Walters, C. A. G. Haasnoot and C. W. Hilbers, EJB 149, 649 (1985).
140. A. Bax, R. H. Griffey and B. L. Hawkins, J. Magn. Reson. 55, 301 (1983).
141. G. W. Vuister, R. Boelens, A. Padilla, G. J. Kleywegt and R. Kaptein, Bchem 29, 1829 (1990).
142. M. Karplus, JACS 85, 2870 (1963).
143. C. Altona, Recl. Trav. Chim. Pays-Bas 101, 413 (1982).
144. C. Giessner-Prettre and B. Pullman, Q. Rev. Biophys. 20, 113 (1987).
145. D. G. Gorenstein, Annu. Rev. Biophys. Bioeng. 10, 355 (1981).
146. F. R. Prado, C. Giessner-Prettre, B. Pullman and J.-P. Daudey, JACS 101, 1737 (1979).
147. T. M. Jovin, J. H. van de Sande, D. A. Zarling, D. J. Arndt-Jovin, F. Eckstein, H. H. Füldner, C. Greider, I. Grieger, E. Hamori, B. Kalisch, L. P. McIntosh and M. Robert-Nicoud, CSHSQB 47, 143 (1983).
148. A. H.-J. Wang, G. J. Quigley, F. J. Kolpak, G. van der Marel, J. H. van Boom and A. Rich, Science 211, 171 (1981).
149. C. Giessner-Prettre, B. Pullman, F. R. Prado, D. M. Cheng, V. Iuorno and P. O. P. Ts'o, Biopolymers 23, 377 (1984).
150. T. F. Havel and K. Wüthrich, Bull. Math. Biol. 46, 673 (1984).
151. W. Braun and N. Go, JMB 186, 611 (1985).
152. D. R. Hare, L. Shapiro and D. J. Patel, Bchem 25, 7445 (1986).
153. A. Pardi, D. R. Hare and C. Wang, PNAS 85, 8785 (1988).
154. T. F. Havel and K. Wüthrich, JMB 182, 281 (1985).
155. J. R. Williamson and S. G. Boxer, Bchem 28, 2819 (1989).
156. E. Nikonowicz, V. Roongta, C. R. Jones and D. G. Gorenstein, Bchem 28, 8714 (1989).
157. A. D. Branch, B. J. Benenfeld, C. P. Paul and H. D. Robertson, in “Methods in Enzymology” (James E. Dahlberg and John N. Abelson, eds.), Vol. 180, p. 418. Academic Press, San Diego, 1989.
158. G. D. Cimino, H. B. Gamper, S. T. Isaacs and J. E. Hearst, ARB 54, 1151 (1985).
159. C. Guerrier-Takada, N. Lumelsky and S. Altman, Science 246, 1578 (1990).
160. E. I. Budowsky and G. G. Abdurashidova, This Series 37, 1 (1989).
161. L. Stryer, ARB 47, 819 (1978).
162. K. Beardsley and C. R. Cantor, PNAS 65, 39 (1970).
163. K.-H. Huang, R. H. Fairclough and C. R. Cantor, JMB 97, 443 (1975).
164. C. Ehresmann, F. Baudin, M. Mougel, P. Romby, J.-P. Ebel and B. Ehresmann, NARes 15, 9109 (1987).
165. R. Lavery and A. Pullman, Biophys. Chem. 19, 171 (1984).
166. S. Furois-Corbin and A. Pullman, Biophys. Chem. 22, 1 (1985).
167. G. Knapp, in “Methods in Enzymology” (James E. Dahlberg and John N. Abelson, eds.), Vol. 180, p. 193. Academic Press, San Diego, 1989.
168. R. Parker, in “Methods in Enzymology” (James E. Dahlberg and John N. Abelson, eds.), Vol. 180, p. 510. Academic Press, San Diego, 1989.
169. H. S. Olsen, P. Nelbock, A. W. Cochrane and C. A. Rosen, Science 247, 845 (1990).
170. S. Heaphy, C. Dingwall, I. Ernberg, M. J. Gait, S. M. Green, J. Karn, A. D. Lowe, M. Singh and M. A. Skinner, Cell 60, 685 (1990).
171. K. Ehrenman, R. Schroeder, P. S. Chandry, D. H. Hall and M. Belfort, NARes 17, 9147 (1989).
176
MICHAEL CHASTAIN AND IGNACIO TINOCO, JR.
C. L. Williamson, W. M. Tierney, B. J. Kerker and J. M. Burke, JBC 262, 14672 (1987). C. Wolberger, Y. Dong, M. Ptashne and S. C. Harrison, Nature 335, 789 (1988). S. R. Jordan and C. 0. Pabo, Science 242, 893 (1988). A. K. Aggarwal, D. W. Rodgers, M. Drottar, M.Ptashne and S. C. Harrison, Science 242, 899 (1988). 176. S. Stern, R. C. Wilson and H. F. Noller, JMB 192, 101 (1986). 177. J.-H. Wang, Nature 319, 183 (1986). 178. M. A. Rould, J. J. Perona, D. Sol1 and T.A. Steitz, Science 246, 1135 (1989). 179. S. Amott, Nature 320, 313 (1986). 180. L. Fairall, D. Rhodes and A. Klug, J M B 192, 577 (1986). 181. R. R. Gutell, B. Weiser, C. R. Woese and H. F. Noller, This Series 32, 155 (1985). 182. H.-N. Wu and 0. C. Uhlenbeck, Bchem 26, 8221 (1987). 183. D. A. Peattie, S. Douthwaite, R. A. Garrett and H. F. Noller, PNAS 78, 7331 (1981). 184. R. J. Gregory and R. A. Zimmerman, NARes 14, 5761 (1986). 185. R. J. Gregory, P. B. F. Cahill, D. L. Thurlow and R. A. Zimmerman, J M B 204,295(1988). 186. H. Leffers, J. Egebjerg, A. Anderson, T.Christensen and R. A. Garrett, J M B 204, 507 (1988). 187. S . Feng and E. C. Holland, Nature 334, 165 (1988). 188. C. Dingwall, I. Emberg, M. J. Gait, S. M. Green, S. Heaphy, J. Kam, A. D. Lowe, M. Singh, M. A. Skinner and R. Valerio, PNAS 86, 6925 (1989). 189. B. Berkhout, R. H. Silverman and K.-T. Jeang, Cell 59, 273 (1989). 190. D. k i n s k i , E. Grzadzielska and A. Das, Cell 59, 207 (1989). 191. D. Scherly, W. Boelens, W. J. van Venrooij, N. A. Dathan, J. Hamm and I. W. Mattaj, E M B O J . 8, 4163 (1989). 192. D. R. Turner, L. E. Joyce and P. J. G. Butler, J M B 203, 531 (1988). 193. G . W. Witherell and 0. C. Uhlenbeck, Bchem 28, 71 (1989). 194. Y. Endo, A. cluck, Y.-L. Chan, K. Tsurugi and I. G. Wool, JBC 265, 2216 (1990). 195. J. L. Casey, M. W. Hentze, D. M. Koeller, S. W. Caughman, T. A. Rouault, R. D. Klausner and J. B. Harford, Science 240, 924 (1988). 196. Y. Wataya, A. Matsuda and D. V. Santi, JBC 255, 5538 (1980). 197. R. 
M. Starzyk, S. W. Koontz, and P. Schimmel, Nature 298, 136 (1982). 198. P. J. Romaniuk and 0. C. Uhlenbeck, Bchem 24, 4239 (1985). 199. M. W. Hentze, T. A. Rouault, J. B. Harford and R. D. Klausner, Science 244,357 (1989). 200. C. Francklyn and P. Schirnmel, Nature 337, 478 (1989). 201. W. H. McClain, Y.-M. Chen, F. Foss and J. Schneider, Science 242, 1681 (1988). 202. J. Normanly and J. Abelson, ARB 58, 1029 (1989). 203. P. Schimmel, Bchem 28, 2747 (1989). 204. J. R. Sampson, A. B. DiRenzo, L. S. Behlen and 0. C. Uhlenbeck, Science 243, 1363 (1989). 205. S. A. Adam, T. Nakagawa, M. S. Swanson, T.K. Woodruff and G. Dreyfuss, MCBiol 6 , 2932 (1986). 206. C. C . Query, R. C. Bentley and J. D. Keene, Cell 57, 89 (1989). 207. I. W. Mattaj, Cell 57, 1 (1989). 208. G . Dreyfuss, M. S. Swanson and S. Pinol-Roma, TlBS 13, 86 (1988). 209. M. F. Summers, T. L. South, B. Kim and D. R. Hare, Bchem 29, 329 (1990). 210. M. S. Lee, G. P. Gippert, K. V. Soman, D. A. Case and P. E. Wright, Science 245,635 (1989). 211. P. A. Sharp, Science 235, 766 (1987). 212. D. A. Brow and C. Guthrie, Nature 334, 213 (1988). 213. B. Blum, N. Bakalara and L. Simpson, Cell 60,189 (1990).
172. 173. 174. 175.
STRUCTURAL ELEMENTS IN RNA
177
214. D. Moazed and H. F. Noller, Nature 342, 142 (1989). 215. D. Moras, A.-C. Dock, P. Dumas, E. Westhof, P. Romby, J.-P. Ebel and R. Gieg6, 1. Bioml. Struct. Dyn. 3, 479 (1985). 216. D. Moras, A.-C. Dock, P. Dumas, E. Westhof, P. Romby, J.-P. Ebel and R. Gieg6, PNAS 83, 932 (1986). 217. K. Yoon, D. H. Turner and I. Tinoco, Jr., J M B 99, 507 (1975). 218. P. Romby, R. Giege, C. Houssier and H. Grosjean, ] M B 184, 107 (1985). 219. D. Herschlag and T. R. Cech, Nature 344, 405 (1990). 220. P. H. von Hippel, D. G. Bear, W. D. Morgan and J. A. McSwiggen, ARB 53,389 (1984). 221. T. Platt, Cell 24, 10 (1981). 222. M. Riley, B. Maling and M. J. Chamberlin, J M B 20, 359 (1966). 223. F. H. Martin and I. Tinoco, Jr., NARes 8 , 2295 (1980). 224. C. W.Greider and E. H. Blackburn, Cell 51, 887 (1987). 225. C. W. Greider and E. H. Blackburn, Nature 337, 331 (1989). 226. G.-L. Yu, J. D. Bradley, L. D. Attardi and E. H. Blackburn, Nature 344, 126 (1990).
Nuclear RNA-binding Proteins

JACK D. KEENE AND CHARLES C. QUERY

Department of Microbiology and Immunology
Duke University Medical Center
Durham, North Carolina 27710

I. RNA-binding Proteins
II. RNA-Protein Interactions
   A. Detection of RNA Binding in Vitro
   B. Sequence Similarities among RNA-associated Proteins
III. RRM Family of Proteins
   A. Members
   B. RNA Recognition Motif
   C. Origins of the RRM Family
   D. Evidence for Direct Interaction of the RRM with RNA
   E. Specificity of RNA Recognition
   F. Do RRMs Constitute "RNA-binding Domains"?
IV. Structural Features of RN...
V. Regulatory Potentials of the RRM Family of Proteins
VI. Conclusions and Perspectives
References
Note Added in Proof
The control of gene expression involves several steps at which specific sequences in pre-mRNA transcripts, as well as those in small RNA molecules, are recognized by proteins. RNA-binding proteins can be expected to mediate interactions in a variety of cellular processes, including those occurring in the transcription complex, the spliceosome and the ribosome. Members of one family of nuclear proteins that bind to RNA contain a specific RNA recognition motif (RRM). The RRM family of proteins functions at several levels in RNA processing and some family members are involved in tissue-specific as well as developmentally regulated gene expression. This review describes the proteins that contain this RRM and discusses the potential involvement of these proteins in the control of gene expression at the level of RNA processing. These proteins are modular in structure and often contain at least two types of interactive surfaces, one or more that interacts specifically with RNA and another that interacts with other molecules. Studies to date indicate that, despite the strong homology among these proteins, they have unique properties of recognition that allow them to distinguish RNAs of diverse structure.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 47. Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. RNA-binding Proteins

RNA in living cells is rarely naked. In most cases, RNA is bound to proteins recognizing regions that include RNA termini, homopolymeric sequences, or specific stem-loop structures. The attachment of protein to RNA may be covalent, as in the case of the poliovirus VPg protein, or, more commonly, may involve hydrophobic interactions, hydrogen bonding or charge pairing. The less specific proteins may recognize just the phosphate backbone of RNA, and the more specific proteins may interact with RNA bases. The first three-dimensional determination of RNA structure was obtained by X-ray crystallography of phenylalanine tRNA (1, 2); more recently, co-crystals of glutamine tRNA synthetase and glutamine tRNA have been reported (3). These latter studies demonstrated contacts between the protein and both the bases and the phosphate backbone of the RNA. In some cases the amino-acid-RNA contacts involved unexpected residues.

An overview of RNA-associated proteins is presented in Table I. Among the many known RNA-associated proteins, the bacteriophage R17 coat protein is the best understood (4, 5). This protein contacts a stem-loop structure of the phage RNA and depends on specific sequences in the loop for recognition. The sequence of the RNA stem is less critical; however, a specific "bulged" nucleotide in the stem is vital for high-affinity binding of the protein. In other recent examples, bulged nucleotides have been found to be dispensable (O. Uhlenbeck, personal communication).

It has long been known that transcription cofactor TFIIIA can bind to the gene encoding 5-S RNA and mediate the function of RNA polymerase III (6). TFIIIA can also bind to 5-S RNA, and an autoregulatory mechanism for the control of 5-S transcription was proposed (6). Studies of the structure of TFIIIA have implicated the zinc-binding fingers in nucleic acid recognition (7).
In recent years, viral proteins involved in interactions with the RNAs of the human immunodeficiency virus (HIV) have been identified. These include the tat, rev, and rex proteins implicated in RNA processing and transport of viral mRNA to the cytoplasm (8). These proteins interact with specific regions of the lentivirus RNAs such as the tar and the rev-response element, as recently demonstrated by direct RNA binding studies (8-11).

Proteins that are uniquely associated with heterogeneous nuclear RNA (hnRNA) or with small nuclear RNAs (snRNAs) are known, but few studies have demonstrated their direct interaction with RNA. Several of these proteins may attach to ribonucleoprotein (RNP) complexes through protein-protein interactions and others may bind directly to RNA. Methods to examine direct interactions have involved cross-linking with UV light¹ (12, 13) and in vitro reconstitution of the RNP complex (see Section II). In some cases, these approaches have allowed discrimination between direct and indirect attachment of proteins to RNA.

TABLE I
RNA-ASSOCIATED PROTEINSᵃ

A. Proteins that possess the RRM
   Heterogeneous nuclear RNP proteins (hnRNP proteins A1, A2/B1, C1/C2, E, L)
   Small nuclear RNP proteins (snRNP proteins 70K, A, B″, PRP-24ᵇ)
   mRNP proteins (poly(A)-binding protein, eIF-4B)
   Pre-rRNP protein (nucleolin)
   Splicing factors (ASF/SF2ᶜ; pPTBᶠ; Drosophila tra-2, Sxl)
   Transcription factors (La, E. coli rhoᵈ)
   Helix-destabilizing proteins (UP1, HDP, SSB1)
   Others: Ro-60K; Neuronal protein (elav); yeast NSR1ᵉ; Malaria CARP; Phage proteins (T4 gp32ᵈ, φ29 gp10); Maize AAIP; Chloroplast proteins (28, 31, and 33 kD); Drosophila bicoidᵈ

B. Proteins that lack the RRM
   Ribosomal proteins
   Signal recognition particle proteins
   tRNA synthetases
   RNA virus core and nucleocapsid proteins
   RNA-dependent DNA or RNA polymerases
   Ribonucleases
   RNA-RNA helicases
   Others (TFIIIA, rev, rex, tat)

ᵃ RNA-associated proteins that possess (A) or lack (B) the RNA recognition motif (RRM) depicted in Fig. 2. These proteins are known to be associated with RNA; some are known to contact RNA directly and others are present in RNP complexes but may not be in direct contact with RNA. In most categories within part (A), examples also exist that do not possess the RRM. Abbreviations: PRP, precursor RNA processing; eIF-4B, eukaryotic initiation factor-4B; ASF, alternative splicing factor; SF2, splicing factor 2; tra-2, transformer-2; Sxl, sex-lethal; SSB1, yeast single-stranded binding protein; elav, embryonic-lethal abnormal-visual; AAIP, abscisic acid-inducible protein; NSR1, nuclear signal recognition protein; CARP, clustered-asparagine-rich protein; pPTB, polypyrimidine tract-binding protein. UP1 is proteolytically derived from hnRNP-A1.
ᵇ K. Shannon and C. Guthrie, personal communication.
ᶜ H. Ge, P. Zuo and J. L. Manley; and A. Krainer, personal communication.
ᵈ Atypical RRM. These proteins share partial similarity with the conserved RNP-1 and RNP-2 sequences, but do not match at some other positions expected to be critical for the structure of the domain, such as conserved hydrophobic residues (29).
ᵉ Teri Mélèse, personal communication.
ᶠ M. Garcia-Blanco, personal communication.
II. RNA-Protein Interactions

A. Detection of RNA Binding in Vitro

Methods of studying RNA-protein complexes isolated from cells have been available for several years (reviewed in 14), but have not allowed determination of which proteins in the complex were in contact with the RNA. Several methods have been used to examine directly complexes formed in vitro (e.g., fluorescence quenching), but have had limited applicability because of the requirement for purified components. More recently, other methods that do not require homogeneous materials have been employed. Some useful methods are discussed below, and some examples are shown in Fig. 1 using the U1-snRNP-A protein.

¹ See the article on this point by Budowsky and Abdurashidova in Vol. 37 of this series. [Eds.]
FIG. 1. Five different methods of detecting RNA binding. (A) Immunoprecipitation of an in vitro translated (TI) protein containing an epitope tag using a tag-specific antibody (16, 27). Similar immunoprecipitation methods for RNA binding have used autoantibodies from patients or antibodies produced in animals (26, 32). (B) The WestNorthern blot procedure uses radiolabeled protein and unlabeled RNA transferred to a solid surface (C. C. Query and J. D. Keene, unpublished). (C) A mobility-shift assay with radiolabel in the protein, (D) the Northwestern blot procedure, and (E) a mobility-shift assay with the radiolabel in the RNA are according to published procedures described in the text. In each example shown here, the U1-snRNP-A protein was used to bind to U1 RNA, stem-loop II of U1 RNA (SL2), a deletion of SL2 (ΔSL2), or stem-loop I of U1 RNA (SL1). The epitope tag used in (A) is the 12-amino-acid gene-10 peptide of phage T7 (16, 27, 29). Comp, nonspecific competitor RNA.
1. FILTER BINDING

A standard approach to the study of both DNA and RNA binding involves attachment to nitrocellulose filters (15). Protein binds directly to various solid substrates, but nucleic acids require special treatments, such as chemical denaturation or UV light exposure, to bind to these materials. However, if the nucleic acid binds to the protein that, in turn, binds to the nitrocellulose, one can identify specific protein-nucleic acid interactions by labeling the nucleic acid. Such methods have been used to study protein-RNA binding quantitatively, and dissociation constants have been approximated in this manner (4).

The "Northwestern blot" method is a modification of the filter-binding assay, in which the protein is separated on acrylamide gels prior to transfer to nitrocellulose and probing with labeled RNA (Fig. 1D). When denaturing polyacrylamide gels are used, renaturation of the binding domain of the protein is required. Thus, some RNA-binding proteins are not amenable to analysis by this method. Furthermore, proteins that require accessory factors for RNA binding will not be detected by Northwestern blotting (16). On the other hand, this method has the advantage of not requiring previous purification of protein, and it may serve as an alternative method when antibody precipitation and other methods are not feasible. For example, Northwestern blotting has been used to identify new proteins that bind to an RNA sequence of interest (17, 18), and to determine the RNA sequence specificity of a known RNA-binding protein (19). Recently, this method was used to determine a site of recognition of the Drosophila sex-lethal (Sxl) protein on the alternatively spliced pre-mRNA of the transformer (tra) gene (20).

2. FLUORESCENCE QUENCHING

Under circumstances in which aromatic amino acids such as tryptophan, tyrosine, and phenylalanine are involved in RNA binding, it has been possible to detect binding by measuring the reduction of fluorescence (21). This method is very sensitive if the correct amino acids are involved in RNA contact, but requires the use of highly purified components. The method is most sensitive when tryptophan emission is altered upon binding, and less sensitive for detecting phenylalanines, because their emission can be obscured by nucleic acids. Fluorescence quench techniques have been used to quantitate interactions of the yeast poly(A)-binding protein with poly(A) RNA (22).
3. UV-CROSS-LINKING

The ability to cross-link RNA covalently to protein using UV light¹ has become increasingly popular in recent years as an indicator of RNA binding. The method is dependent on an appropriate juxtaposition of photoreactive RNA bases and amino acids (23). For example, pyrimidine bases, especially uracil, can be cross-linked to hydroxylated amino acids (e.g., tyrosine) that are in close proximity. Thus, this method is limited to coincidental proximities, but has proved useful in many cases.¹ It has also been criticized as not being a measure of specific RNA-protein binding, but only one indication of an association. Under some conditions, one may be able to cross-link proteins that are not directly bound to RNAs. Thus, precautions should be taken in the interpretation of interactions involving UV-cross-linking. More recently, in combination with in vitro RNA competition assays, UV-cross-linking has been used as a method to detect sequence-specific RNA interactions (12, 24).

4. ANTIBODY PRECIPITATION
The presence of autoantibodies reactive with snRNA-binding proteins allows a convenient method of selection of RNPs containing these proteins (25). These antibodies can immunoprecipitate either RNPs synthesized in vivo or complexes bound in vitro. However, the use of auto-antisera often is compromised by the presence of more than one antibody specificity or by interactions with the RNP that differ from those occurring with the protein alone. Sera from animals immunized with a specific RNA-binding protein or the use of an antibody reactive with an antigenic "tag" that has been attached to a recombinant protein can circumvent these difficulties (26, 27). An example of the latter is a phage-T7 gene-10 peptide of 12 amino acids fused to the amino terminus of proteins expressed from certain vectors (28). Antibodies to this tag (16, 27, 29) allowed efficient immunoprecipitation of various in vitro bound complexes (Fig. 1A).

5. MOBILITY SHIFT

Nondenaturing gel electrophoresis has been widely used to assay DNA-protein complexes (30). RNA-protein complexes may similarly be examined. For example, a shift in the mobility of a labeled RNA in the presence of cellular extracts and excess competitor RNAs may indicate the presence of a specific binding protein (Fig. 1E). In some cases, resistance of nucleotides in the complex to ribonuclease has been used to indicate specificity (9, 31). Alternatively, RNA binding can be assessed by a change in the mobility of labeled protein, produced by in vitro translation, in the presence of specific RNAs (27, 32) (Fig. 1C). In all cases, competition experiments using nonspecific RNAs are required to verify the specificity of the complex formed. Methods using nondenaturing gels for analysis of large multicomponent complexes such as spliceosomes and polyadenylation complexes have been reviewed recently (33).

6. BIOTINYLATION OF RNA

Methods have been developed to bind proteins to RNAs containing biotinylated nucleotides so that the RNP complex can be isolated on immobilized avidin. The use of biotinylated RNA as a handle for studying RNA-protein interactions has centered on the isolation of components of the spliceosome (33-35). Recently, such methods also have allowed study of the binding of U1 RNA to the U1-snRNP-A protein (36). This method has the advantage of allowing detection of labeled proteins from whole-cell extracts, as well as from in vitro translation systems. On the other hand, background binding is often present, and in vitro transcribed RNA must be used. Thus, if the RNA folds incorrectly, if modified bases are required, or if biotinylation interferes with binding, this method would be less useful.
7. WESTNORTHERN BLOTTING

A counterpart to the Northwestern blot is the WestNorthern blot (Fig. 1B), which is a Northern blot probed with labeled protein (37). This method of RNA binding has many of the same advantages and disadvantages as the Northwestern blot, but has other applications. For example, conditions of binding need not be compatible with antibodies or gel electrophoresis. However, only a limited number of proteins have been amenable to this technique. One useful application of Northwestern and WestNorthern blots in studying RNA-protein interactions is screening expression libraries to isolate the cognate ligand. We have recently used this method to bind sequences in stem-loop II of U1 RNA that interact with the U1-snRNP-A protein (37).

In summary, a variety of RNA-binding methods have been developed that allow the study of RNA-protein interactions, and they have various advantages. No single method is ideal for all such interactions, and different methods may have to be applied in each individual case. Quantitative binding studies using different methods indicate that most RNA-binding proteins studied to date have association constants in the range of 10⁷ to 10⁹ M⁻¹, corresponding to dissociation constants of 10⁻⁷ to 10⁻⁹ M (4, 16, 22, 27, 37-39).
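To make the quoted affinity range concrete, the short sketch below converts association constants of 10⁷ to 10⁹ M⁻¹ into dissociation constants and estimates the fraction of RNA bound at a given protein concentration. It is only an illustration under stated assumptions: simple 1:1 binding at equilibrium, protein in large excess over RNA, and a hypothetical protein concentration of 10 nM; the function names are ours and do not come from any of the cited studies.

```python
def ka_to_kd(ka_per_molar):
    """Convert an association constant (M^-1) to a dissociation constant (M)."""
    return 1.0 / ka_per_molar


def fraction_bound(protein_molar, kd_molar):
    """Fraction of RNA bound at equilibrium for simple 1:1 binding.

    Assumes protein is in large excess over RNA, so the free protein
    concentration is approximately the total protein concentration.
    """
    return protein_molar / (kd_molar + protein_molar)


if __name__ == "__main__":
    protein_molar = 1e-8  # hypothetical 10 nM protein, chosen for illustration
    for ka in (1e7, 1e8, 1e9):
        kd = ka_to_kd(ka)
        bound = fraction_bound(protein_molar, kd)
        print(f"Ka = {ka:.0e} M^-1, Kd = {kd:.0e} M, fraction bound = {bound:.2f}")
```

Under these assumptions, half of the RNA is bound when the protein concentration equals the dissociation constant, which is the usual way such constants are read off a filter-binding or mobility-shift titration.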
B. Sequence Similarities among RNA-associated Proteins

Sequence elements characteristic of a group of RNA-associated proteins began to emerge with the observation (40) of four copies of an 80-amino-acid repeat in the poly(A)-binding protein. The most conserved region of eight amino acids common among these repeats and in the hnRNP-A1 protein led (41) to the term RNP consensus sequence to describe the octamer. Other groups also noted the presence of conserved sequences in other RNA-associated proteins and speculated that such sequences might be involved in direct RNA binding (42-56). More recently, conservation of the octamer and a separate hexamer sequence in a collection of RNA-associated proteins was noted (49, 50). The amino-acid consensus sequences were referred to as RNP 1 and RNP 2, respectively.

Several workers independently observed extensively conserved sequences in a diverse collection of approximately 20 proteins ranging in source from Escherichia coli to humans, and the region of sequence similarity was found to span approximately 80 residues (32, 42, 49, 55). The presence of the 80-amino-acid motif in a broad family of proteins and direct evidence for interactions of this motif with RNA led to its designation as an "RNA recognition motif," or RRM (32). It has also been referred to as an "RNP consensus sequence type-RNA binding domain" (44) and an "RNP-80 motif" (36). However, it is certain that not all RNA-binding proteins contain this RRM; other motifs may yet be discovered. The variety of RNA-associated proteins identified to date is outlined in Table I. Those found to contain at least one RRM are listed in part A of the table, and those apparently lacking this motif are listed in part B.
III. RRM Family of Proteins

A. Members

Several members of this family of RNA-associated proteins are involved in aspects of RNA metabolism, including pre-mRNA transcription, splicing, and, possibly, stability and transport. Proteins containing the RRM are associated with hnRNA (A1, B1/A2, C1/C2, E, and L proteins), small RNAs (A, B″, 70K,² La, and Ro-60K² proteins), mature mRNAs [poly(A)-binding protein (PAB protein) and eIF-4B], and pre-rRNA (nucleolin) (Table I). Some RRM-containing proteins are helix-destabilizing proteins (UP1, HDP, and SSB1), and one is a translational repressor (T4 gp32). Other members, such as Drosophila embryonic-lethal-abnormal-visual-system (elav), p9, tra-2, and Sxl, were implicated in RNA-protein interactions because they contain the RRM. Genes tra-2 and Sxl are linked in a regulatory cascade pathway of alternative pre-mRNA splicing. The product of Sxl interacts with the tra pre-mRNA (20); however, tra-2 has not yet been shown to contact any specific RNA molecule. It is logical to predict that it may associate with pre-mRNAs or with the snRNAs that interact with pre-mRNA (reviewed in 45).

² K here represents kDa, kilodaltons (or kilo-atomic-mass-units, kamu). However, 70K is the name given (53) a 52-kDa protein (32). [Eds.]

B. RNA Recognition Motif

The RRM of 80 amino acids contains the RNP consensus octamer near the center. Conserved residues are present on both sides of the RNP octamer, but are more abundant in the amino-terminal half of the motif. About 30 residues toward the amino terminus from the octamer is the hexamer of conserved amino acids corresponding to RNP 2 (49). In addition to these regions, there are many other conserved positions in the RRM that contain predominantly phenylalanine, glycine, or alanine residues (32). An amino-acid consensus for the RRM has been derived that assigns residues as shown in Fig. 2. Nine residues within the RRM consensus are conserved as particular amino acids and therefore may be essential for the RNA binding function or for the structure of the RNA-binding domain.

In determining structure/function relationships within the RRM, a central question is: Which residues are essential for sequence-specific RNA binding? Furthermore, are sequences inside or outside of the RRM essential for sequence specificity? A short amino-acid sequence flanking the amino-terminal side of the RNP octamer is critical for the specificity of recognition of U1 and U2 RNAs by the A and B″ proteins, respectively (16). No information is presently available regarding specific amino-acid contacts with bases and phosphates in the RNA. Thus, it is not known which regions within the RRM influence affinity for RNA, but it has been suggested that the RNP octamer has this potential. These questions can be addressed, in part, by site-directed mutagenesis and subdomain switching among RRMs. Ultimately, three-dimensional structure data will be required to resolve many of these issues.
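As a rough illustration of how the conserved octamer can be used to flag candidate RRM-containing proteins, the sketch below scans a protein sequence for an RNP 1-like octamer. The pattern is an assumption: it is written from the commonly quoted form of the RNP 1 consensus, [KR]-G-[FY]-[GA]-F-V-x-[FY], and is not read from Fig. 2 of this chapter; the test sequence is invented for the example. Because real family members deviate from the consensus at individual positions, a strict pattern of this kind will miss some of them and should be treated only as a first-pass filter.

```python
import re

# Assumed RNP 1-like octamer: [KR]-G-[FY]-[GA]-F-V-x-[FY].
# This pattern is an illustrative assumption, not the consensus of Fig. 2.
RNP1_PATTERN = re.compile(r"[KR]G[FY][GA]FV.[FY]")


def find_rnp1_like(sequence):
    """Return (1-based start position, matched octamer) for each match.

    The sequence is a protein sequence in one-letter code; matching is
    case-insensitive and non-overlapping.
    """
    seq = sequence.upper()
    return [(m.start() + 1, m.group()) for m in RNP1_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence containing one octamer that fits the pattern.
    toy_sequence = "MSTAAARGFGFVTFQQQ"
    for position, octamer in find_rnp1_like(toy_sequence):
        print(f"RNP 1-like octamer {octamer} at residue {position}")
```

A more tolerant scan would score each position against the residue classes of the full 80-residue consensus rather than requiring an exact octamer match.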
C. Origins of the RRM Family

The similarities manifest among the proteins containing the RRM make it likely that many of these proteins are truly homologous. It has been proposed that, in some cases, exons represent protein domains that rearrange as units in evolution (57). The question of whether the RRM is coded within a single exon has been examined for three cases: La, nucleolin, and U1-snRNP-70K (the gene for yeast PAB protein has been sequenced, but has no introns). The human La (47) and mouse nucleolin (58) genes contain introns within each of the conserved octamer sequences, although the Xenopus 70K protein does not (59). All three proteins contain introns at positions near the flanking edges of the RRMs. Thus, it appears that the RRMs of these proteins may contain distinct structural elements encoded on separate exons. Exon shuffling as well as gene duplication events may have contributed to the evolution of these RNA-binding proteins.
D. Evidence for Direct Interaction of the RRM with RNA

Evidence that the RRMs of RNA-associated proteins interact with RNA comes from several approaches. The 320-amino-acid hnRNP-A1 protein and a 195-amino-acid proteolysis fragment of hnRNP-A1 (UP1) can bind single-stranded (ss) DNA and ssRNA (60). Two 92-amino-acid fragments isolated from the UP1 portion of hnRNP-A1 were UV-cross-linked from their phenylalanine residues to oligo-deoxythymidine [positions 3 (aromatic) and 46 (phenylalanine) in Fig. 2] (52).

The yeast PAB protein, which contains four RRMs (Fig. 3) and is essential for growth, can be truncated to a 66-amino-acid fragment and still support some degree of growth (22). This peptide contains the amino half of one RRM, but no RNP octamer, which suggests that the RNP octamer may not be required for the function of the PAB protein. This is so far the only genetic system in which the dispensability of RRM sequences has been examined.

A 33-kDa fragment of nucleolin can interact with processed 18-S and 28-S RNA in a Northwestern assay, although neither the RNA sequence specificity of this association nor interactions with pre-rRNA was examined (46). This 33-kDa binding fragment encompassed nearly three 80- to 90-amino-acid RRM repeats, suggesting that at least part of the RNA-binding activity is associated with the RRMs.

FIG. 2. Amino-acid consensus for the RNA recognition motif family of proteins (32). The residues are designated PO (polar), AL (aliphatic), AR (aromatic), AC (acidic), HO (hydrophobic), AB (amides, acids, or bases), BA (basic), AA (amides or acids), NP (nonpolar), CH (charged), X (unassigned), A (alanine), M (methionine), D (aspartic acid), T (threonine), G (glycine), F (phenylalanine), and R (arginine). The most conserved regions (designated RNP 1 for the RNP consensus sequence and RNP 2) are bracketed (44) and the most highly conserved positions are shaded. Dashes indicate positions of variability between family members, where insertions or deletions occur. Asterisks indicate the two aromatic residues in the hnRNP-A1 protein that were cross-linked to oligo-deoxythymidine (52).
FIG. 3. Structural characteristics and modular features of the RNA recognition motif (RRM) family of proteins. Two classes are shown that correspond to the presence of single (class-I) or multiple (class-II) repeats of the RRM. The highly charged arginine-rich regions of the U1-snRNP-70K and transformer-2 proteins are designated RD/RE/RS. Potential nucleotide (ATP)-binding and zinc-binding finger (Zn2+) motifs are also noted. Abbreviated proteins are defined in the text with appropriate references and as described (32). The diagrams are not necessarily to scale, and the sizes of the regions flanking the RRMs vary. Note that UP2 represents a partial clone.
A prokaryotic example of an RNA-binding protein within the family is the E. coli transcription termination factor rho (61). A proteolytic fragment of 155 amino acids from the amino terminus specifically bound RNAs that contained rho-dependent termination sequences. This domain has subsequently been observed to contain an 80-amino-acid RRM (32), although it possesses less sequence similarity to the consensus than most other members of the RRM family.

A minimal region constituting an RNA-binding domain of a protein within the RRM family has been defined for the U1-snRNP-70K protein (32). The protein was progressively truncated until a 125-amino-acid fragment retained the specificity and affinity of the full-length protein for U1 RNA in a direct binding assay, whereas smaller fragments did not retain activity. For the 70K-U1 RNA interaction, 35 residues carboxy-terminal to the RRM were required for the full function of the domain; the RRM alone did not bind to U1 RNA. Thus, the RNA-binding domain of this protein includes many residues external to the RRM. Determinants of specificity were proposed to reside at nonconserved positions within the RRM or at residues external to the RRM.

Recent studies of the U1-snRNP-A protein indicate that a single-unit RNA-binding domain encompasses only one of two RRMs (27, 36). In this case, only six residues in addition to the 80-residue RRM constituted an 86-amino-acid RNA-binding domain for U1 RNA. Thus, the flanking sequences required for the A protein to bind U1 RNA are fewer than those required for the 70K protein to bind.

Studies of the RNA-binding features of the La and Ro-60K proteins demonstrate that large regions of the protein external to the RRM are essential. The La protein can be deleted at its carboxy terminus and still retain binding activity, but removal of even a few amino acids at the amino terminus completely abolishes binding to precursor transcripts of RNA polymerase III (29; S. Clarkson, personal communication).
The Ro-60K protein is more fragile in that removal of even a few amino acids at the amino or carboxy terminus abolishes binding to the hY (human cytoplasmic Y) RNAs (62). Thus, the La (47) and Ro-60K (48) proteins may be atypical in comparison to other RRM family members, but they demonstrate the diverse nature of the RNA-binding domains in that long-range intramolecular interactions may be involved in dictating the specificity of RNA recognition.

Another recent variation on the theme of diversity in the recognition of RNA by members of the RRM family involves the U2-snRNP-B″ protein, which binds to both U1 and U2 RNAs with low affinity in vitro. Upon addition of the U2-snRNP-A′ protein, however, the affinity of B″ for U2 RNA increases at least 100-fold (16). The A′ protein appears to act on B″ through protein-protein interactions involving a region of leucine periodicity in A′ (81). These findings open the possibility that long-range interactions from within a protein (intramolecular) or interactions involving separate accessory proteins (intermolecular) may influence the specificity and affinity of RNA recognition and binding by members of this family of proteins.

Analysis of the RNA-binding domain defined for the U1-snRNP-A protein (27, 36) using two-dimensional nuclear magnetic resonance has revealed a highly ordered structure for this module (D. W. Hoffman, C. C. Query, B. L. Golden, S. W. White, and J. D. Keene, unpublished). This RNA-binding domain consists largely of β-structure, with four antiparallel strands and two α-helices. Critical aromatic residues in RNP1 and RNP2 believed to contact RNA are adjacent to one another in a β-sheet, and both project to the surface of the molecule. These adjacent aromatic residues are highly constrained and intolerant of significant variation (37, 62).

Evidence to date demonstrates that an RRM can be a component of an RNA-binding domain. However, the requirements for particular residues within this motif and for sequences outside of it, as well as the contributions of accessory proteins, vary among members of the family. These findings indicate that the structural features of the RRM proteins that allow recognition and binding to RNA are complex and are likely to involve a variety of molecular interactions that are unique to each protein.
E. Specificity of RNA Recognition

The diverse functions of various members of the RRM family of proteins are reflected in the different RNAs they recognize (Table II). However, the RNA-binding site has been defined in only a few cases. For example, PAB protein recognizes the homopolymer poly(A) (22, 40, 41). Although the sequence is simple, poly(A) can assume complex higher-order structures (63), and the role of base-stacking in this protein-nucleic acid interaction has not been studied. An affinity of 2 × 10⁷ M⁻¹ for one of the binding domains of PAB protein has been determined (22). It is estimated that 12 adenylate residues constitute the binding site on poly(A). It was also suggested (22) that multiple poly(A)-binding domains on PAB protein could allow transfer of the proteins between strands of poly(A).

TABLE II
RNA SPECIFICITIES OF THE RRM FAMILYᵃ

Protein                                               RNA                              Specific sequence
UP1/UP2                                               Single strand                    Nonspecific
hnRNP proteins                                        Pre-mRNA                         Nonspecificᵇ
E. coli rho                                           RNA transcripts                  Cytosine-rich
PAB protein                                           Poly(A)                          Homopolymer
La                                                    Polymerase-III transcripts       3′-terminal oligo(U)
Ro-60K                                                hY RNAs                          Specific
U1 snRNP-70K and -A                                   U1 RNA stem-loop                 Specific
U2 snRNP-B″                                           U2 RNA stem-loop                 Specific
Sex-lethal (Sxl)                                      transformer (tra) pre-mRNA       Specific
Nucleolin                                             Pre-rRNA                         Unknown
Others: tra-2, elav, bicoid, AAIP, CARP, NSR1, etc.   Unassigned                       Unknown

ᵃ RNA sequence specificity involved in recognition by the RRM family of cellular proteins. Proteins that contain the RRM are listed in approximate order of complexity of the specific RNAs with which they associate.
ᵇ Preferences for homopolymers or pyrimidine-rich sequences have been demonstrated in some cases.

The helix-destabilizing proteins (UP1, UP2, HDP, and SSB1) (32) appear to recognize ssDNA and ssRNA with relatively little sequence specificity and thereby destabilize native base-paired structures (60, 63-65). UP1 has an affinity for ssDNA of approximately 10⁷ M⁻¹ (60, 66). The binding to DNA by both UP1 and T4 gp32 proteins has been extensively investigated and reviewed (67). T4 gp32 also binds to an RNA stem-loop structure in the 5′-untranslated region of its mRNA in order to autoregulate its translation (68).

Binding sites in RNA for the hnRNP proteins A1 and C1/C2 have been studied by in vitro binding and UV-cross-linking. A subset of these proteins has been bound to intron sequences near splice acceptor sites that included the pyrimidine-rich tract in pre-mRNA (69). Others have shown that these proteins can be UV-cross-linked to RNA sequences near the AAUAAA polyadenylation signal (12, 13). These apparently conflicting results may be reconciled by the proposed existence of high- versus low-affinity binding sites along the pre-mRNA for these proteins (69). The technique of UV-cross-linking does not allow measurements of affinity and is also limited by the possibility that cross-linked regions of RNA and protein may represent coincidental proximity of reactive bases and amino acids rather than true binding sites (23). These issues will be resolved by the development of direct binding assays and binding affinity determinations. However, the use of UV-cross-linking does have the significant advantage of allowing examination of protein-RNA association in vivo.

The specificity of RNA recognition by the La (70, 71) and Ro-60K (72) proteins has been demonstrated. The main recognition target of the La protein is the sequence of 3′-terminal uridylates in precursor RNA-polymerase-III transcripts (reviewed in 73). The Ro proteins were shown by nuclease protection studies to bind the stem structure of 3′ and 5′ base-paired termini of the hY RNAs, which are also RNA-polymerase-III transcripts (72). In vitro binding assays using recombinant La protein (47) and recombinant Ro-60K protein (48) were recently developed, but the specificity, stoichiometry, and affinity of RNA binding have not been reported.

Rho-dependent transcription termination involves the binding of the protein to untranslated regions of mRNA (61). Rho has affinity for both ssRNA and ssDNA, and high affinity for poly(C) (reviewed in 74 and 75). The ability of rho to recognize both RNA and DNA may be related to its function
in unwinding RNA-DNA duplexes during the process of transcription release.

The RNA recognition properties of the snRNP proteins 70K, A, and B″ have been studied using both cell extracts and in vitro binding assays. The 70K protein and perhaps the A protein were suggested to contact stem-loop I and probably other regions of U1 RNA (76, 77). Recombinant 70K protein requires 31 nucleotides of U1 RNA stem-loop I for binding (26). Contacts with other regions of the RNA were not detected by these assays (26), but could not be ruled out. In addition, direct binding assays show that 36 nucleotides in stem-loop II of U1 RNA are sufficient to bind to the A protein (19). It has been suggested (78) that B″ binds to stem-loop III or IV of U2 RNA. The extensive sequence similarity of its RRMs to those of the U1-snRNP-A protein (79) suggests that it probably recognizes a discrete sequence in a manner similar to the U1-snRNP-A protein (19). Recent studies indicate that stem-loop IV of U2 RNA is involved in the major contact with the B″ protein (16).

Studies of the sequence specificity of RNA binding by the RRM family of proteins have suffered, in part, from a lack of precise probing methods. RNase protection, chemical probing, and fragment binding have been used, but each method has distinct limitations. Application of affinity measurements to a broader group of RRM-containing proteins would improve the understanding of RNA binding specificities. Footprinting methods, like those used for DNA-binding proteins, are hindered by the conformations assumed by RNA and by its altered conformation when bound to protein. Thus, improved methods of RNA-protein structural probing will be needed to define further the base and phosphate contacts between members of the RRM family of proteins and their cognate RNAs.
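To make the affinity values quoted in this section easier to compare, the sketch below converts an association constant into a dissociation constant and estimates fractional occupancy. It assumes a simple 1:1 binding equilibrium with protein in excess over RNA, which is an illustrative simplification and not a model proposed in this chapter; the function and constant names are hypothetical. The two constants are the values cited in the text for one PAB protein binding domain (about 2 × 10⁷ M⁻¹) and for UP1 on ssDNA (about 10⁷ M⁻¹).

```python
# Illustrative sketch only: assumes simple 1:1 binding with protein in excess.
# Function and constant names are hypothetical, not from the original chapter.

PAB_DOMAIN_KA = 2e7  # M^-1, value quoted in the text (ref. 22)
UP1_SSDNA_KA = 1e7   # M^-1, approximate value quoted in the text (refs. 60, 66)


def dissociation_constant(k_assoc: float) -> float:
    """Return Kd in molar units for a 1:1 equilibrium, Kd = 1 / Ka."""
    if k_assoc <= 0:
        raise ValueError("association constant must be positive")
    return 1.0 / k_assoc


def fraction_bound(k_assoc: float, free_protein_molar: float) -> float:
    """Fraction of binding sites occupied: Ka*[P] / (1 + Ka*[P])."""
    if free_protein_molar < 0:
        raise ValueError("protein concentration cannot be negative")
    x = k_assoc * free_protein_molar
    return x / (1.0 + x)


if __name__ == "__main__":
    for name, ka in (("PAB domain", PAB_DOMAIN_KA), ("UP1", UP1_SSDNA_KA)):
        kd_nanomolar = dissociation_constant(ka) * 1e9
        print(f"{name}: Ka = {ka:.1e} M^-1, Kd = {kd_nanomolar:.0f} nM")
```

Under these assumptions, half of the sites are occupied when the free protein concentration equals Kd, so the two affinities above correspond to half-saturation in the tens-of-nanomolar range.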
F. Do RRMs Constitute “RNA-binding Domains”?

As knowledge concerning this group of proteins has emerged, many investigators have assumed that any region of a protein that contains the RNA recognition motif or the RNP octamer is an active RNA-binding domain. Such assignment must be viewed with caution, because domains are defined as single units with structural integrity or functional activity (80). In contrast, a motif is defined as a pattern of related sequences. As noted above, the RNA-binding domain of the U1-snRNP-70K protein includes and requires amino-acid sequences carboxy-terminal to the boundaries of the RRM (32). However, some proteins with identified RRMs contain no sequences carboxy-terminal to the motif. It is possible that dissimilar regions flanking the RRM as well as the less conserved positions within the RRM are involved in the specificity of RNA recognition, as noted above (Section III,B). It is also possible that some of these RRMs are degenerate and do not associate with RNA. Considering the differences between the apparently functional RNA-binding domains of PAB protein (66 amino acids) and U1 snRNP-70K (125 amino acids), it seems that the amino-acid sequence requirements for RNA binding by different RRM proteins may vary widely. Therefore, it is important to define each RNA-binding domain as a singular structural or functional unit with the same binding efficiency and specificity as the complete protein.

In the example described above (Section III,D), it was found that the B″ protein component of U2 snRNPs binds with modest affinity to both U1 and U2 snRNAs (16). Upon addition of the A′ component of U2 snRNPs, the binding of B″ is specific for U2 snRNA, and the affinity increases dramatically. Furthermore, the U2-snRNP-A′ protein does not bind U2 RNA alone, but appears to interact with the U2 snRNP through protein-protein contacts involving a region of leucine periodicity in A′ (81). The in vivo significance of these findings is yet to be demonstrated. However, it is possible that some RRMs cannot function as specific RNA-binding domains except in the presence of accessory proteins. Thus, the RNA-binding domain in such cases may require trans-acting accessory factors in addition to the RRM to be functional. Such accessory proteins may also participate in regulating functions involved in RNA processing that are mediated by RNA binding.
IV. Structural Features of RNA-binding Proteins Containing RRMs

A. Classes of Proteins within the RRM Family

From the collection of proteins identified that contain the RRM, at least two distinct classes within the family are evident. Most members contain a single RRM (class I), but some contain multiple copies (class II) (Fig. 3). For example, the snRNA-associated proteins A and B″ contain two copies of the RRM, the elav protein contains three copies, while PAB protein and nucleolin each contain four copies. The questions of whether class II members contain multiple RNA-binding domains, whether they contact more than one RNA, and whether multiple RRMs act cooperatively to bind a single RNA have not been addressed experimentally. It is possible that the binding domain of a class II protein could require the combined interactions of more than one RRM. For the U1-snRNP-A (27, 36) and the U2-snRNP-B″ (16) proteins, only the amino-terminal RRM has been shown to constitute an RNA-binding domain for U1 and U2 RNAs, respectively. (See Section III,D for a discussion of the tertiary structure of the U1-snRNP-A protein.)
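The class I / class II distinction above amounts to a rule on RRM copy number, which can be sketched as a small lookup. This is only an illustration: the dictionary and function names are hypothetical, the multi-copy entries are the examples named in this section, and the single-copy entries are examples that the chapter treats as class I members.

```python
# Illustrative sketch: names are hypothetical; copy numbers are those stated
# in the chapter for these example proteins.
RRM_COPIES = {
    "U1-snRNP-70K": 1,
    "La": 1,
    "Ro-60K": 1,
    "U1-snRNP-A": 2,
    "U2-snRNP-B''": 2,
    "elav": 3,
    "PAB protein": 4,
    "nucleolin": 4,
}


def rrm_class(copies: int) -> str:
    """Class I has a single RRM; class II has multiple copies."""
    if copies < 1:
        raise ValueError("an RRM family member has at least one RRM")
    return "I" if copies == 1 else "II"


def group_by_class(copies_by_protein: dict) -> dict:
    """Group protein names by RRM class, in sorted name order."""
    groups = {"I": [], "II": []}
    for protein, copies in sorted(copies_by_protein.items()):
        groups[rrm_class(copies)].append(protein)
    return groups


if __name__ == "__main__":
    for cls, proteins in group_by_class(RRM_COPIES).items():
        print(f"class {cls}: {', '.join(proteins)}")
```

The grouping says nothing about how many of the repeated motifs actually function as RNA-binding domains, which the text notes has not been addressed experimentally.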
B. Modular Structure

The members of this family show structural analogy to DNA-binding proteins such as GAL4, λ repressor, and homeo-box (POU) proteins that contain two identifiable surfaces (82, 83). As depicted in Fig. 3, the La, rho, hnRNP-C1/C2, and Ro-60K proteins contain an RRM in the amino-terminal half and an ATP-binding motif (La, rho, and hnRNP-C1/C2) or a zinc-binding finger motif (Ro-60K) in the carboxy-terminal half of the molecule (47, 48, 50, 84). Likewise, the U1-snRNP-70K protein and the Drosophila tra-2 product contain an RRM in the amino-terminal portions and highly charged “RD/RE/RS” sequences in their carboxy-terminal portions (32, 43, 45, 53, 85). Thus, members of the RRM family class I appear to contain two types of interactive surfaces, one that has the potential to bind RNA and another that may interact with other molecules, including some that have regulatory functions in transcription or splicing. For members of class II of the RRM family, it is possible that the repeated RRMs are independent binding domains that interact with other RNA sequences. Therefore, these multiple RRM-containing proteins can be viewed also as containing structural modules for multiple molecular interactions.
V. Regulatory Potentials of the RRM Family of Proteins

A. Transcription

Two biological processes in which these proteins have been implicated are transcription and pre-mRNA processing. For example, the mammalian La protein and the E. coli rho protein are involved in the termination of RNA transcription (61, 86). These proteins are analogous in terms of both structural organization (Fig. 3) and function. The La protein binds directly to unprocessed transcripts of RNA polymerase III (70) and interacts with other components of the transcription complex, such as TFIIIC (86). Rho is involved in factor-dependent termination of transcription by E. coli RNA polymerase (reviewed in 75). The mammalian Ro-60K protein contains one RRM and a potential zinc-binding finger (48). The function of the Ro RNP is not known, but it has been suggested that it also plays a role in transcription. Ro-60K and La proteins can co-exist on the same RNP complexes (72). It should be noted, however, that these proteins may also play a role in the processing of RNA transcripts. Thus, all members of the RRM family may be part of a network that is interrelated through RNA processing.
B. RNA Processing

Members of the RRM family of proteins that are associated with mRNA or pre-mRNA include PAB protein and the hnRNA-bound proteins, A1 and C1/C2 (69). PAB protein is associated with the 3′ poly(A) tails of processed mRNA (41), but its function has not been defined. The hnRNP-A1 and C1/C2 proteins have been suggested to have some limited sequence specificity that allows them to select intron sequences (69) or polyadenylation signal sequences (12, 13). Therefore, it is generally accepted that these proteins have a role in mRNA maturation and transport (reviewed in 49).

The snRNP proteins contain amino-acid sequence motifs (RRM, RD/RE/RS, leucine periodicity) that participate in RNA-protein and protein-protein interactions to assemble RNP complexes, including the spliceosome. Although the structural versus functional roles of snRNP proteins are not completely understood, the standard U1, U2, U4/U6, and U5 snRNPs and their associated proteins appear to participate in a constitutive (“housekeeping”) pathway that is common to the tissues of most metazoans. Thus, tissue-specific or developmentally specific proteins containing similar amino-acid sequence motifs may participate in a regulatory (alternative) splicing pathway.

Strong sequence similarity between a group of Drosophila proteins and the U1-snRNP-70K protein suggests a regulatory function for these members of the RRM family in pre-mRNA splicing. The 70K protein contains two distinct regions that include the U1 RNA-binding domain and two arginine-rich (RD/RE/RS) regions that consist of repeating arginine-aspartic acid, arginine-glutamic acid, and arginine-serine residues (32, 53). Two Drosophila proteins that are members of the RRM family are the tra-2 (43, 45) and Sxl products (42). tra-2 and Sxl mRNAs have been shown to be alternatively spliced and to participate in the regulation of the alternative splicing pathway of sex determination, which also involves tra, double-sex (dsx) (87), and several other proteins (reviewed in 45 and 88).
tra-2 contains both the RRM and an RD/RE/RS region and, thus, is strikingly similar in its motifs to the 70K protein (Fig. 1). Likewise, the Drosophila bicoid protein appears (89) to be a member of the RRM family, but also contains a homeodomain for DNA binding. Bicoid plays a role in pattern development and may mediate mRNA gradients across the embryo.

Two other Drosophila proteins that are quite similar to the RD/RE/RS regions of 70K, tra (90) and the protein product of the suppressor-of-white-apricot locus, su(wᵃ) (91), have been implicated in the autoregulation of their pattern of pre-mRNA splicing (90, 92, 93). Whether tra or su(wᵃ) directly contacts RNA has not been investigated, but both lack the RRM. The gag proteins of type-C retroviruses also contain arginine-rich sequences similar to those in 70K (94), but also lack an apparent RRM. Whether such viral proteins play a role in splicing is not clear.

Models may be envisioned in which tra-2 mimics the 70K protein to specify splice-site selection, or in which tra-2 or other RD/RE/RS-containing proteins compete with the 70K protein by recognition of pre-mRNA or other splicing factors at a specific pre-mRNA sequence and thereby modulate a putative “trans” function of the RD/RE/RS regions of the 70K protein. By analogy with transcription factors, the RD/RE/RS sequences may be interchangeable among trans-acting splicing factors (95).
VI. Conclusions and Perspectives

Pre-mRNA splicing, along with transcription and translation, is an important step in the control of eukaryotic gene expression. A pathway of splicing involving a standard set of constitutive proteins and snRNAs is well studied, but mechanisms that govern specific recognition remain to be elucidated. An amino-acid sequence motif in a family of proteins involved in mRNA splicing is part of an RNA-binding domain that interacts directly with RNA. The resultant RNP complexes involved in splicing may possess different components at points in development where regulatory functions are required. Thus, patterns of pre-mRNA splicing may be dictated by specific sets of trans-acting RNA-binding proteins that modify the interactions of the constitutive proteins.
Possible Control Signals Involved in Splice-site Selection

It is possible that U1 snRNPs associate initially with pre-mRNA through low-specificity electrostatic interactions, perhaps involving the highly charged RD/RE/RS regions of the U1-snRNP-70K protein or other proteins of the U1 snRNP. Preliminary evidence suggests that the RD/RE/RS region enables the 70K protein to interact with a variety of ssRNAs with relatively little RNA sequence specificity (16, 26). The U1 snRNP may be capable of one-dimensional diffusion along the pre-mRNA until an appropriate exon-intron junction is recognized, resulting from RNA-RNA base-pairing of the 5′ end of U1 RNA with the donor-site consensus sequence (14, 96-99). Electrostatic interactions between the RD/RE/RS regions of 70K and the pre-mRNA, as well as other proteins, may further stabilize the complex, allowing engagement of the U1 snRNP at the donor site.

Donor splice-site selection may be influenced, in part, by competition within the spliceosome by trans-acting regulatory proteins such as those discussed above. For example, electrostatic interactions between the 70K protein and the pre-mRNA might be unfavorable in regions where a trans-acting RNA-binding protein is positioned. Thus, the U1 snRNP would be displaced from that particular donor splice site if the RNA-RNA interactions were not strong enough to maintain a stable complex. Simultaneous interactions of U1 snRNPs with sequences or factors near the branch point and splice acceptor sites could result in formation of a commitment complex similar to that involved in yeast pre-mRNA splicing (100).

In spliceosome assembly, electrostatic interactions between the 70K RD/RE/RS sequence on U1 snRNPs at the donor splice site and components at the splice acceptor site may be disrupted by the competing RD/RE/RS sequences of trans-acting regulatory proteins. Alternatively, site-specific RRM proteins could attract the 70K protein and U1 snRNP toward a specific acceptor site. Therefore, members of this family of proteins might serve as negative or positive trans-activators in splice-site selection. Thus, trans-active splicing proteins may have specific RNA-binding domains that recognize the pre-mRNA and also possess an RD/RE/RS sequence. For example, genetic evidence suggests that tra-2 may recognize six repeats of a specific 18-nucleotide stretch in an exon of dsx (45, 87).

In some regulatory pathways, the specificity of splice-site selection may be controlled by the expression of tissue-specific members of the RRM family. These may be dominant over factors controlling the constitutive splicing pathway. The possibility that splicing patterns in different tissues are determined by a variety of snRNAs appears unlikely because the snRNAs are relatively generic among tissues. It seems reasonable to consider that the RRM family of proteins and those containing RD/RE/RS regions possess trans-acting functions that are tissue-specific and act within the spliceosome or during spliceosome assembly to modulate pre-mRNA processing and/or transport. The identification and analysis of additional members of the RRM family of proteins should help to resolve these models of pre-mRNA splice-site selection by trans-acting regulatory proteins.
REFERENCES

1. S. H. Kim, G. J. Quigley, F. L. Suddath, A. McPherson, D. Sneden, J. J. Kim, J. Weinzierl and A. Rich, Science 179, 285 (1973).
2. J. D. Robertus, J. E. Ladner, J. T. Finch, D. Rhodes, R. S. Brown, B. F. C. Clark and A. Klug, Nature 250, 546 (1974).
3. M. A. Rould, J. J. Perona, D. Söll and T. A. Steitz, Science 246, 1135 (1989).
4. J. Carey, V. Cameron, P. L. de Haseth and O. C. Uhlenbeck, Bchem 22, 2601 (1983).
5. P. J. Romaniuk, P. Lowary, H.-N. Wu, G. Stormo and O. C. Uhlenbeck, Bchem 26, 1563 (1987).
6. D. R. Engelke, S. Y. Ng, B. S. Shastry and R. G. Roeder, Cell 19, 717 (1980).
7. J. Miller, A. D. McLachlan and A. Klug, EMBO J. 4, 1609 (1985).
8. M. H. Malim, J. Hauber, S.-Y. Le, J. V. Maizel and B. R. Cullen, Nature 338, 254 (1989).
9. M. L. Zapp and M. R. Green, Nature 342, 714 (1989).
10. T. J. Daly, K. S. Cook, G. S. Gray, T. E. Maione and J. R. Rusche, Nature 342, 816 (1989).
11. A. W. Cochrane, C. H. Chen and C. A. Rosen, PNAS 87, 1198 (1990).
12. J. Wilusz, D. I. Feig and T. Shenk, MCBiol 8, 4477 (1988).
13. C. L. Moore, J. Chen and J. Whoriskey, EMBO J. 7, 3159 (1988).
14. J. A. Steitz, D. L. Black, V. Gerke, K. A. Parker, A. Kramer, D. Frendewey and W. Keller, in “Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles” (M. L. Birnstiel, ed.), p. 115. Springer-Verlag, Berlin, 1988.
15. S.-Y. Lin and A. D. Riggs, JMB 72, 671 (1972).
16. R. C. Bentley and J. D. Keene, MCBiol 11, 1829 (1991).
17. V. Gerke and J. A. Steitz, Cell 47, 973 (1986).
18. J. Tazi, C. Alibert, J. Temsamani, I. Reveillaud, G. Cathala, C. Brunel and P. Jeanteur, Cell 47, 755 (1986).
19. C. Lutz-Freyermuth and J. D. Keene, MCBiol 9, 2975 (1989).
20. K. Inoue, K. Hoshijima, H. Sakamoto and Y. Shimura, Nature 344, 461 (1990).
21. R. C. Kelly, D. E. Jensen and P. H. von Hippel, JBC 251, 7240 (1976).
22. A. B. Sachs, R. W. Davis and R. D. Kornberg, MCBiol 7, 3268 (1986).
23. K. C. Smith, in “Photochemistry and Photobiology of Nucleic Acids” (S. Y. Wang, ed.), Vol. 2, p. 187. Academic Press, New York, 1976.
24. M. A. Garcia-Blanco, S. F. Jamison and P. A. Sharp, Genes Dev. 4, 1874 (1990).
25. M. R. Lerner and J. A. Steitz, PNAS 76, 5495 (1979).
26. C. C. Query, R. C. Bentley and J. D. Keene, MCBiol 9, 4872 (1989).
27. C. Lutz-Freyermuth, C. C. Query and J. D. Keene, PNAS 87, 6393 (1990).
28. A. H. Rosenberg, B. N. Lade, D.-S. Chui, S.-W. Lin, J. J. Dunn and F. W. Studier, Gene 56, 125 (1987).
29. D. Kenan and J. D. Keene, unpublished.
30. I. A. Hope and K. Struhl, Cell 43, 177 (1985).
31. E. A. Leibold and H. N. Munro, PNAS 85, 2171 (1988).
32. C. C. Query, R. C. Bentley and J. D. Keene, Cell 57, 89 (1989).
33. M. M. Konarska, in “Methods in Enzymology” (J. E. Dahlberg and J. N. Abelson, eds.), Vol. 180, p. 442. Academic Press, San Diego, California, 1989.
34. P. J. Grabowski and P. A. Sharp, Science 233, 1294 (1986).
35. C. H. Agris, M. E. Nemeroff and R. M. Krug, MCBiol 9, 259 (1989).
36. D. Scherly, W. Boelens, W. J. van Venrooij, N. A. Dathan, J. Hamm and I. W. Mattaj, EMBO J. 8, 4263 (1989).
37. C. C. Query and J. D. Keene, unpublished.
38. F. Cobianchi, R. L. Karpel, K. R. Williams, V. Notario and S. H. Wilson, JBC 263, 1063 (1988).
39. P. C. Ryan and D. E. Draper, Bchem 28, 9949 (1989).
40. A. B. Sachs, M. W. Bond and R. D. Kornberg, Cell 45, 827 (1986).
41. S. A. Adam, T. Nakagawa, M. S. Swanson, T. K. Woodruff and G. Dreyfuss, MCBiol 6, 2932 (1986).
42. L. R. Bell, E. M. Maine, P. Schedl and T. W. Cline, Cell 55, 1037 (1988).
43. H. Amrein, M. Gorman and R. Nothiger, Cell 55, 1025 (1988).
44. R. J. Bandziulis, M. S. Swanson and G. Dreyfuss, Genes Dev. 3, 431 (1989).
45. B. S. Baker, Nature 240, 1037 (1989).
46. B. Bugler, H. Bourbon, B. Lapeyre, M. O. Wallace, J.-H. Chang, F. Amalric and M. O. J. Olson, JBC 262, 10922 (1987).
47. J. C. Chambers, D. Kenan, B. J. Martin and J. D. Keene, JBC 263, 18043 (1988).
48. S. L. Deutscher, J. B. Harley and J. D. Keene, PNAS 85, 9479 (1988).
49. G. Dreyfuss, M. S. Swanson and S. Piñol-Roma, TIBS 13, 86 (1988).
50. M. S. Swanson, T. Y. Nakagawa, K. LeVan and G. Dreyfuss, MCBiol 7, 1731 (1987).
51. B. Lapeyre, H. Bourbon and F. Amalric, PNAS 84, 1472 (1987).
52. B. M. Merrill, K. L. Stone, F. Cobianchi, S. H. Wilson and K. R. Williams, JBC 263, 3307 (1988).
53. H. Theissen, M. Etzerodt, R. Reuter, C. Schneider, F. Lottspeich, P. Argos, R. Lührmann and L. Philipson, EMBO J. 5, 3209 (1986).
54. K. R. Williams, K. L. Stone, M. B. LoPresti, B. M. Merrill and S. R. Planck, PNAS 82, 5666 (1985).
55. B. M. Merrill and K. R. Williams, in “The Eukaryotic Nucleus: Molecular Biochemistry and Macromolecular Assemblies” (P. Strauss and S. Wilson, eds.), p. 579. Telford, Caldwell, N.J., 1990.
56. S. R. Haynes, M. L. Rebbert, B. A. Mozer, R. Forquignon and I. B. Dawid, PNAS 84, 1819 (1987).
57. J. E. Darnell, Science 202, 1257 (1978).
58. H.-M. Bourbon, B. Lapeyre and F. Amalric, JMB 200, 627 (1988).
59. M. Etzerodt, R. Vignali, G. Ciliberto, D. Scherly, I. W. Mattaj and L. Philipson, EMBO J. 7, 4311 (1988).
60. G. Herrick and B. Alberts, JBC 251, 2133 (1976).
61. A. J. Dombroski and T. Platt, PNAS 85, 2538 (1988).
62. S. L. Deutscher and J. D. Keene, unpublished.
63. S. L. Broitman, D. D. Im and J. R. Fresco, PNAS 84, 5120 (1987).
64. S. R. Planck and S. H. Wilson, JBC 255, 11547 (1980).
65. A. Y.-S. Jong, M. W. Clark, M. Gilbert, A. Oehm and J. L. Campbell, MCBiol 7, 2947 (1987).
66. R. L. Karpel and A. C. Burchard, Bchem 19, 4674 (1980).
67. J. W. Chase and K. R. Williams, ARB 55, 103 (1986).
68. H. M. Krisch and B. Allet, PNAS 79, 4937 (1982).
69. M. S. Swanson and G. Dreyfuss, EMBO J. 7, 3519 (1988).
70. J. Rinke and J. A. Steitz, Cell 29, 149 (1982).
71. J. E. Stefano, Cell 29, 149 (1984).
72. S. L. Wolin and J. A. Steitz, PNAS 81, 1996 (1984).
73. J. D. Keene, S. L. Deutscher, D. Kenan and A. Kelekar, Mol. Biol. Rep. 12, 235 (1987).
74. P. H. von Hippel, D. G. Bear, W. D. Morgan and J. A. McSwiggen, ARB 53, 389 (1984).
75. T. Platt, ARB 55, 339 (1986).
76. J. Hamm, M. Kazmaier and I. W. Mattaj, EMBO J. 6, 3479 (1987).
77. J. Patton and T. Pederson, PNAS 85, 747 (1988).
78. I. W. Mattaj, Cell 46, 905 (1986).
79. P. T. G. Sillekens, W. J. Habets, R. P. Beijer and W. J. van Venrooij, EMBO J. 6, 3841 (1987).
80. W. R. Taylor, in “Nucleic Acid and Protein Sequence Analysis: A Practical Approach” (M. J. Bishop and C. J. Rawlings, eds.), p. 290. IRL Press, Washington, D.C., 1987.
81. L. D. Fresco, D. S. Harper and J. D. Keene, MCBiol 11, 1578 (1991).
82. M. Ptashne, Nature 335, 683 (1988).
83. M. Levine and T. Hoey, Cell 55, 537 (1988).
84. J. L. Pinkham and T. Platt, NARes 11, 3531 (1983).
85. R. A. Spritz, K. Strunk, C. S. Surowy, S. O. Hoch, D. E. Barton and U. Francke, NARes 15, 10373 (1987).
86. E. Gottlieb and J. A. Steitz, EMBO J. 8, 851 (1989).
87. K. C. Burtis and B. S. Baker, Cell 56, 997 (1989).
88. J. Hodgkin, Cell 56, 905 (1989).
89. M. Rebagliati, Cell 58, 231 (1989).
90. R. T. Boggs, P. Gregor, S. Idriss, J. M. Belote and M. McKeown, Cell 50, 739 (1987).
91. Z. Zachar, T.-B. Chou and P. M. Bingham, EMBO J. 6, 4105 (1987).
92. P. M. Bingham, T.-B. Chou, I. Mims and Z. Zachar, Trends Genet. 4, 134 (1988).
93. T.-B. Chou, Z. Zachar and P. M. Bingham, EMBO J. 6, 4095 (1987).
94. C. C. Query and J. D. Keene, Cell 51, 211 (1987).
95. P. Bingham, personal communication.
96. S. M. Mount, I. Pettersson, M. Hinterberger, A. Karmas and J. A. Steitz, Cell 33, 509 (1983).
97. P. A. Sharp, Science 235, 766 (1987).
98. T. Maniatis and R. Reed, Nature 325, 673 (1987).
99. C. Guthrie and B. Patterson, ARGen 22, 387 (1988).
100. B. Seraphin and M. Rosbash, Cell 59, 349 (1989).
NOTE ADDED IN PROOF: Since this manuscript was submitted (April 1990), relevant papers by Scherly et al. [Nature 345, 502 (1990) and EMBO J. 9, 3675 (1990)] concerning the specificity of RNA recognition have appeared. In addition, papers by Nagai et al. [Nature 348, 515 (1990)] and by Hoffman et al. [PNAS 88, 2495 (1991)] have described the tertiary structure of the U1 RNA-binding domain of the U1-snRNP-A protein. Additional RRM family members have recently been reported. These include: eukaryotic initiation factor-4B (eIF-4B) [Milburn et al., EMBO J. 9, 2783 (1990)]; Bj6, a chromosomal puff-specific protein product of the Drosophila no-on transient A gene that is required for correct visual system development [von Besser et al., Chromosoma 100, 37 (1990); Jones and Rubin, Neuron 4, 711 (1990)]; X16, which also contains an RD/RE/RS-like region and is expressed differentially in tissues [Ayane et al., NARes 19, 1273 (1991)]; CARP, a malarial clustered asparagine-rich protein [Kuma et al., FEBS Lett. 260, 67 (1990)]; φ29 gp10, an RNA-associated viral shell prohead connector [Grimes and Anderson, JMB 215, 559 (1990)]; and several chloroplastid proteins [Li and Sugiura, EMBO J. 9, 3059 (1990)].

We thank the laboratories of Mariano Garcia-Blanco, Christine Guthrie, Adrian Krainer, Jim Manley, and Teri Mélèse for communicating results prior to publication.
Amplification of DNA Sequences in Mammalian Cells

JOYCE L. HAMLIN,¹ TZENG-HORNG LEU, JAMES P. VAUGHN, CHI MA AND PIETER A. DIJKWEL
Department of Biochemistry University of Virginia School of Medicine Charlottesville, Virginia 22908
I. Historical Development of the Amplification Field
II. Occurrence of Amplified DNA Sequences
III. Properties of Amplified DNA
    A. Cytological Characteristics
    B. Size and Structure of Ampli...
IV. Possible Mechanisms and Ways to Discriminate among Them
    A. Unequal Sister-chromatid Exchange
    B. Re-replication
    C. Deletion and Episome Formation
    D. Conservative Transposition
V. Usefulness of Cell Lines Bearing A...
VI. Conclusions and Future Directions
References
1 To whom correspondence may be addressed.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
DNA sequence amplification is a process in which a limited part of the genome attains a higher copy number per nuclear equivalent than the rest of the genome. Amplification occurs in a variety of organisms in the evolutionary spectrum, ranging from bacteria to mammals. It can be a natural, regulated mechanism that allows particular cell types to produce large quantities of needed proteins during defined developmental stages. Examples of this kind of selective process include: ribosomal DNA amplification in amphibian oocytes (1-3) and in Tetrahymena (e.g., 4), and chorion gene amplification in the eggshell chambers of Drosophila and other insects (e.g., 5-7). In each of these cases, amplification is initiated by an over-replication mechanism that is presumably controlled by the interaction of trans-acting factors with cis-regulatory “origins” of replication near the center of the amplified sequence (the amplicon). In the case of the chorion genes in Drosophila, the extra daughter duplexes remain side by side in the chromosome in an onionskin array (8), while the supernumerary copies of rDNA are released as extrachromosomal circular or linear molecules in amphibian oocytes (9) and Tetrahymena (10-12), respectively.
The second general process of localized DNA sequence amplification is an aberrant one that occurs only rarely in cells. Amplification of DNA sequences containing selectable markers has been observed in several strains of bacteria (13) and yeast (14-17), in Leishmania (18-20) and Plasmodium chabaudi (21), and is responsible for some forms of insecticide resistance in insects (22). In recent years, DNA sequence amplification has been intensively investigated in connection with drug resistance and tumorigenesis in mammalian cells. The hallmark of most aberrant amplification phenomena is that the amplicons are arranged end to end in tandem arrays either in chromosomes or as extrachromosomal elements. Furthermore, the amplicons from a given locus can be extremely variable in size and position with respect to the selectable marker in each case, both among different cell lines of the same species and within the same cell line (23-32). This property suggests that the responsible mechanisms have an inherently uncontrolled (possibly random) component. The copy number of the sequence in question can range from a few to as many as 10,000 in some drug-resistant mammalian cell lines (28, 33), suggesting that whatever mechanisms are involved, they are repeated over and over, and in some cases with high fidelity.
In this review, we focus exclusively on the aberrant DNA sequence amplification processes that occur in mammalian cells, because of the clinical relevance to drug resistance and oncogene amplification in tumors, and because the underlying mechanisms seem close to being understood at the molecular level.
We show that what seemed to be a relatively esoteric mutational phenomenon is now a major determining factor in the genesis of cancer, as well as a serious deterrent to successful chemotherapeutic drug treatment regimens. We trace the history of the discovery of DNA sequence amplification, cite several examples, and attempt to formulate the similarities and differences among these systems. We also discuss several important methodologies developed for studying amplified sequences. We review the viable models that attempt to explain the phenomenon in mammalian cells, and attempt to show how the resolution of the underlying molecular mechanisms of amplification promises to teach us much about both normal and aberrant chromosome dynamics (e.g., replication, recombination and repair processes, and chromosomal breakage and healing). We suggest several earlier reviews (34-38) that can serve as background for the present article, many of which contain more detailed treatments of amplification mechanisms in organisms other than mammals (see particularly 37).
I. Historical Development of the Amplification Field
Resistance to the chemotherapeutic agent methotrexate (MTX)2 has been of great interest to tumor cell biologists for many years. It was shown rather early that resistance in model tumor cell lines can be explained by elevated levels of the target enzyme dihydrofolate reductase (DHFR) (39). In 1969, it was suggested (40) that the DHFR gene might be amplified, accounting for the elevated enzyme levels. However, this possibility was not seriously entertained until the late 1970s, when several important observations were made almost simultaneously. In studies of MTX-resistant murine cell lines that overproduced DHFR, it was shown that these cell lines contained elevated levels of DHFR mRNA (41). At the same time, the critical observation was made that extremely MTX-resistant Chinese hamster cells display abnormally staining regions (ABRs) on their chromosomes when subjected to Giemsa-banding techniques (42, 43). It was suggested that these expanded chromosomal regions might contain amplified DHFR genes (44). This suggestion was borne out in subsequent in situ hybridization studies on MTX-resistant Chinese hamster cell lines (45-47) and on some murine cell lines (48), in which a DHFR cDNA detected multiple copies of the cognate gene within the chromosomal ABRs. However, certain other MTX-resistant murine cell lines lacked detectable ABRs and, instead, displayed small acentromeric chromatin bodies known as double minutes (DMs) (49). Sucrose gradient fractionation of mitotic chromosomes and hybridization of the fractions to DHFR-specific probes showed convincingly that these extrachromosomal elements also contain amplified DHFR genes (49). The importance of the cytological observations relating ABRs and DMs to amplification of a particular genetic sequence cannot be overemphasized.
Almost immediately, it was realized that similar structures had often been observed in material taken from tumor biopsies or from cell lines established from such tumors (reviewed in 50). However, at the time, there was simply no way of knowing that the ABRs and DMs in human tumors might contain amplified DNA sequences, until it was possible to clone specific molecular probes for oncogenes. Thus, appreciation of the importance of ABRs and DMs to drug resistance and oncogenesis was critically dependent on recombinant DNA technology. As a consequence, the field of DNA sequence amplification in mammalian cells has really blossomed only in the last 12 years. However, as we will see, the underlying mechanisms responsible for this widespread and clinically important phenomenon are still poorly understood.
2 Terms and abbreviations used: ABR, abnormally banding chromosome region; amplicon, an amplified sequence; APRT, adenine phosphoribosyltransferase; CAD, carbamoyl-phosphate synthetase, aspartate transcarbamoylase, dihydroorotase; DHFR, dihydrofolate reductase; DM, double-minute chromosome body; HAT, a medium containing hypoxanthine, aminopterin, and thymidine; MTX, methotrexate; PALA, N-phosphonoacetyl-L-aspartate; TK, thymidine kinase.
II. Occurrence of Amplified DNA Sequences
Amplification is one of the most common mechanisms by which mammalian cells acquire resistance to selective agents (35-38), although in some instances, drug resistance in mammalian cells can arise from point mutations that encode altered enzymes with lowered affinity for the drug (e.g., 51 and 52; reviewed in 53) or altered transport proteins (54). In cultured cells, any good competitive inhibitor of an enzyme or other protein activity can be used as a selective agent. Thus, the following inhibitors select for cells that have (or probably have) amplified the indicated genes: MTX for DHFR (reviewed in 54); PALA for the multi-enzyme CAD complex (55, 56); 5-fluorodeoxyuridine for thymidylate synthase (57, 58); albizziin for asparagine synthetase (59, 60); adenine or coformycin for AMP deaminase (61, 83); adenosine or deoxycoformycin for adenosine deaminase (62, 63); hydroxyurea for ribonucleotide reductase (64-68); compactin for HMG-CoA reductase (69, 70); heavy metals for metallothionein (71-73); ouabain for Na,K-ATPase (73a); colchicine for microtubular proteins (74, 75); nitrogen mustards for a glutathione S-transferase (76); 6-azauridine and pyrazofurin for UMP synthetase (77, 78); methionine sulfoximine for glutamine synthetase (79, 80); α-methylornithine for ornithine decarboxylase (81, 82); mycophenolic acid for IMP dehydrogenase (84); tunicamycin for N-acetylglucosaminyltransferase (85); borrelidin for threonyl-tRNA synthetase (86); and canavanine for argininosuccinate synthase (87). Most important additions to this list are the genes that confer cross-resistance to a variety of antitumor agents through the amplification of a broad-specificity drug-efflux transporter (88-93).
The great variety of different genes that can be amplified in response to selective agents suggests that there is nothing special about the loci themselves, and that if a suitable selective agent can be found for any given enzyme or protein, its cognate gene can be amplified.
There is another most interesting and important group of amplifiable genes to be added to the already lengthy list. As mentioned earlier, the cytological hallmarks of amplification (ABRs and DMs) were first observed in tumor tissues and tumor cell lines established from these tissues. However, when the first cloned oncogene probes became available, it was possible to show that several different tumors actually contain amplified oncogenes. Thus, in early studies, the human myeloid leukemia cell line, HL60, and its progenitor tumor (94), as well as the human neuroendocrine tumor lines, Colo 320 HSR and Colo 320 DM, were all shown to contain multiple copies
of the cellular oncogene c-myc (95). The gene c-Ki-ras was also amplified in the murine adrenocortical cell lines Y1-DM and Y1-HSR (96), as was c-abl in the human myelogenous leukemia cell line K-562 (97). The number of other oncogenes since shown to be amplified in different human tumors and tumor cell lines now seems to be limited only by the number of oncogenes for which specific DNA probes are available. The following is a partial list: c-myc in breast carcinomas (98, 99), osteosarcomas (100), lung carcinomas (101-105), peripheral T-cell non-Hodgkin’s lymphomas (106), carcinogen-induced liver tumors (107), and adenocarcinomas (108); n-myc in neuroblastomas (109, 110) and small-cell lung carcinomas (101-104); c-myb in small-cell lung carcinomas (101); Ki-ras and n-ras in ovarian (111, 112), lung (105), and embryonal carcinomas (113); Ha-ras in carcinogen-induced tumors (114); ets-1 in acute myelomonocytic leukemia (115); erbB-2 (neu/Her-2) in gastric (116), breast (99, 117), lung (109), and adenocarcinomas (118); c-erbB-1 (the epidermal-growth-factor receptor) in squamous cell (119, 120), breast (121), and lung (105) carcinomas, and in gliomas (122, 123); and Hst-1 and Int-2 in esophageal (124) and stomach carcinomas (125). It has been suggested that amplification of cellular oncogenes is related to tumor progression as opposed to the initiation of the transforming event itself (reviewed in 126 and 127).
An important point is that, in general, amplification in mammalian systems has been observed almost exclusively in tumor cells, but rarely in normal cells (128-130), with the interesting exception of the amplification of genes encoding cholinesterases in instances of continual exposure to insecticides in humans (131). A hallmark of tumor cells is their genetic instability, and drug resistance in tumors may be aided by this instability.
Transfected selectable genes can be amplified at virtually any site into which they integrate (e.g., 132-134), although insertion into particular sites can lead to unusually high frequencies of amplification (134). It has been suggested that the latter result arises from the presence of an origin of replication close to the site of insertion, which is suggested to facilitate overreplication (135). In any case, there do not appear to be stringent requirements on the nature of surrounding sequences for a gene to be amplified (apart from the ability of the selectable gene to be expressed).
III. Properties of Amplified DNA
A. Cytological Characteristics
The modern field of DNA sequence amplification began when the critical connection was made between aberrant chromosomal structures (ABRs and DMs) and amplification of defined sequences. These two aberrant chromosomal forms are basically diagnostic of DNA sequence amplification. Chinese hamster cell lines always, and rat cells usually, manifest amplified DNA sequences as ABRs, regardless of the locus in question. Cultured murine cell lines usually display DMs, while human cell lines can display either ABRs or DMs. It has been suggested that the relatively stable karyotypes displayed by Chinese hamster and rat cell lines may explain the maintenance of amplified sequences as stable ABRs (36, 38). There have also been suggestions that ABRs and DMs may be interconvertible under some circumstances (126, 136, 137) and, indeed, MTX-resistant murine tumor cell lines appear to oscillate between one form and the other, depending on whether the cells are maintained as an ascites tumor in the animal or as cultured cell lines in vitro (138). However, there is evidence in only a few cases that one form gives rise to the other (e.g., 137), and the two forms are only rarely observed in the same cell (139, 140).
There is also an accumulation of evidence suggesting that, while ABRs are often found in human tumor cell lines in culture, the prevalent manifestation of amplification in human tumors in vivo is in the form of DMs. A search of the literature (126) revealed that 95% of all fresh human tumors display DMs by standard karyotyping procedures, while only 66% of the cell lines established from human tumors manifest DMs. At present, there is no good explanation for why certain tumors or cell lines maintain amplicons either as ABRs or DMs, since it does not seem to be related strictly to organism or to the locus in question.
1. ABNORMALLY BANDING CHROMOSOME REGIONS
With the development of the Giemsa-staining technique (141), it became possible to karyotype mammalian chromosomes with much more precision and accuracy than was previously possible. The expert application of this method to antifolate-resistant Chinese hamster cell lines and several cultured neuroblastoma cells allowed the first description of the ABRs that characterize DNA sequence amplification to high copy number in many cell types. However, the elongation of selected chromosomes in mitotic spreads from tumor cells had been noticed even before the advent of Giemsa-banding techniques (see 34).
Within a given cell line, there can be a single ABR or several, and they can be interstitial or telomeric. ABRs are often found on the same chromosome as the single-copy gene prior to amplification. However, studies of the amplified DHFR gene in Chinese hamster cells revealed that, while often on the parental chromosome, ABRs are usually not at the same band as the single-copy parental locus (75). Recent in situ hybridization studies confirm these findings (142). It is also clear from such studies that multiple amplicons can be present on a chromosome arm without being detectable as an ABR (e.g., 47, 143, and 144), possibly because the repeating units are so small that their amplification does not result in a noticeable distortion in chromosome length or banding pattern.
The usual appearance of ABRs after Giemsa banding is either relatively uniform intermediate staining, or uniform intermediate staining punctuated at regular intervals with fine dark-staining bands. Most ABRs are therefore considered to be largely euchromatic, which correlates with genetically active chromatin (145). In addition, ABRs are usually found to be early-replicating, which is another property of active chromatin (145). However, the ABRs in antifolate-resistant Chinese hamster cells stain darkly in the C-banding technique (146), which is almost invariably correlated with inactive chromatin (145). The significance of this observation is not yet understood, but ABRs may eventually be useful in understanding many older empirical observations relating to chromosome structure.
When it was appreciated that ABRs represent amplified DNA sequences, and when molecular probes for the sequences in question became available, in situ hybridization studies with radioactive probes showed that a uniform staining pattern is unique to cell lines with very high copy numbers of the amplified sequences; cell lines isolated earlier in the drug-selection regimen could display patchy staining patterns, indicating that initial amplification events might involve very large repeating structures (147). While the low resolving power of hybridization studies with tritium-labeled probes precluded a detailed investigation of the arrangement of amplicons in drug-resistant cell lines, the increased resolving power of the fluorescent detection method (148, 149) supports the early suggestions that initial units of amplification are probably very large relative to amplicons in later stages of the process (147, 150).
In a recent study of CHO cells selected for amplification of the DHFR gene after only two selection steps involving about 40 cell-doublings in the presence of drug, the first observable DHFR amplicons are usually located on the same chromosome arm as the parental single-copy DHFR gene (142). However, the amplicon clusters were usually far away from the single-copy locus, often at the telomere, and sometimes farther from the parental locus than the usual end of the chromosome (142). As early as amplicons could be observed during the drug-selection regimen, most cell lines already contained more than 10 copies of the gene, and often more than 25-50 copies per cell, many more than required by the drug dose (B. Trask and J. L. Hamlin, unpublished observations). No cells were observed to have only one or a few extra copies. In the cell lines containing the lowest copy numbers of amplicons, there was evidence that the units of amplification could be extremely large [i.e., fluorescent spots emanating from the DHFR-specific probes were separated from one another by several megabases within a cluster of amplicons (142)].
Finally, even at these early stages of drug selection, amplicons were sometimes found on chromosomes other than the parental chromosomes 2 or Z2, but in almost every case, these could be explained by translocation from one of the two parental chromosomes. In fact, the patterns of amplification in these reasonably early stages suggested extreme instability due to breakage-fusion-bridge cycles (151, 152) set in motion by breakages within amplicon clusters. These findings may explain earlier results from Giemsa-banding studies in which ABRs in moderately MTX-resistant Chinese hamster cells were often observed to lie next to translocation breakpoints (144). Amplicon instability could also explain the extreme variability of DHFR gene copy number detected in studies of the early stages of DHFR gene amplification in MTX-resistant CHO cells (153). Virtually identical results were recently obtained in a similar in situ hybridization study of CAD gene amplification during development of PALA-resistance in Syrian hamster cells (154). The possible origins of the special patterns observed in early amplification steps are discussed in Section IV.
Since ABRs become longer (as opposed to thicker) as the copy number of the amplified sequence increases, the multiple copies of the amplicon must be arranged end to end in the genome, as opposed to side by side in “onionskin” arrays. Formal proof for this suggestion derived from the molecular cloning of interamplicon junctions and, in some cases, whole amplicon equivalents (see Section III,B). With the combined approaches of fluorescent in situ hybridization and the Giemsa-banding technique, it is now possible to examine more critically the microscopic appearance of ABRs.
In condensed mitotic chromosomes, amplified DNA (ABRs) appears to be packaged the same way as the rest of the genome, with the same diameter and apparently the same DNA content per unit length as the surrounding chromatin. Interestingly, in interphase, the amplified sequences remain rather tightly clustered in small sectors of the nucleus, suggesting that there is a tight constraint on the region of the nucleus in which a given chromosomal domain can find itself (142). However, in at least one MTX-resistant cell line with about 1000 copies of the DHFR gene (142), the DHFR amplicons appear to herniate out of the smooth and rounded periphery of the nucleus as if not properly folded into a higher-order structure, or not attached properly to the subnuclear scaffolding. This finding suggests that there may be a limited number of attachment sites in a particular sector of the nucleus, and that a thousand extra copies of a defined sequence in a small volume can overtitrate these sites. With the higher resolution made possible by image reconstruction techniques (confocal microscopy, etc.), and the use of defined molecular probes in combination with fluorescence in situ hybridization, it should now be possible to examine the fine structure of chromosome folding at a supramolecular level, to learn more about nuclear architecture.
2. DOUBLE-MINUTE CHROMATIN BODIES
These small, paired (but sometimes single), acentromeric bodies were noticed many years ago in tumor tissue, and have been detected in many examples of gene amplification in mammalian cells (34, 35, 38). Their numbers per cell can range from one or a few to thousands, and they can be almost submicroscopic or the size of small chromosomes (34). There is some indication that they can change in size, morphology, and number during propagation of cells in culture (34). In an MTX-resistant murine line derived from a lymphoma, extrachromosomal elements bearing amplified DHFR genes appear to oscillate between DMs and small rods and rings in a constantly changing spectrum of chromosomal forms (150).
Like ABRs, DMs stain uniformly to an intermediate degree that, in combination with their small size, has sometimes made their detection difficult in standard Giemsa-stained preparations. Unlike the rods and rings cited above, they usually do not appear to contain any interstitial chromatin that stains darkly with Giemsa, and therefore may have lost any such heterochromatic material before becoming bona fide DMs. DMs may therefore have been missed in older preparations of tumor and drug-resistant cell lines before descriptions of them became so prevalent in the literature in the 1970s. Because DMs lack centromeres, they also do not stain in the C-banding technique (which detects centromeric heterochromatin), and are easiest to detect in ethidium-bromide- or quinacrine-stained mitotic spreads (e.g., 34). As with the ABRs, the amplicons in DMs are thought to be arranged in end-to-end arrays, the smallest DMs containing only one or a few repeating units and the largest bearing several.
Electron-microscope studies have suggested that the chromatin in DMs is packaged in a manner similar to the chromatin in the rest of the genome, except for the absence of centromeres (155). The failure to find chromosomal ends suggests that the DNA fibers in DMs are circular (155). DMs replicate only once during the S period of the cell cycle (137, 156, 157) and, presumably, appear to be double because the daughter chromatids have not yet separated. In mitotic chromosome spreads, DMs often appear in clusters that are closely apposed to regular chromosomes. An entire cluster (or most of it) may be carried along passively by this association during mitosis to only one of the daughter cells, resulting in nondisjunction. DMs can also be lost from cells by micronucleation (34). Furthermore, in the
absence of selection, cells bearing DMs are at a growth disadvantage relative to cells not containing such structures (158). As a consequence of these properties, DMs are quite unstable relative to ABRs, and can be lost from a population within 15-20 cell-doublings in the absence of strong selection for the amplified genes that they contain (158). Highly drug-resistant cell lines that carry amplified genes on ABRs are much more stable and can sometimes be propagated in the absence of drugs for years with undetectable or very slow loss of the ABRs (159). However, any loss of resistance observed in ABR-containing cells is not necessarily a reversal of the amplification process itself, as the same result would be obtained if ABRs were particularly subject to chromosomal breakage, leading to bridge-breakage-fusion cycles and loss of acentromeric chromosome fragments containing parts of ABRs (see 142).
The differing stabilities of the drug-resistance phenotype in individual cell lines were thought, in early studies, to indicate whether the amplicons were maintained as either ABRs or DMs. However, a very interesting series of experiments indicates that assignments based solely on stability might be fallacious, particularly at low levels of drug resistance. A fluorescent MTX analog in very low, nonselective concentrations was used to detect CHO cells with elevated levels of DHFR (153). Surprisingly, a wide spectrum of enzyme levels in this unselected population was observed. In most cases of elevated enzyme, the corresponding cells contained increased copy numbers of the DHFR gene.
Interestingly, the frequency of amplification by this criterion was estimated to be about 10⁻³ (160), considerably above the estimates based on drug-selection regimens [to 10⁻⁶ (38a)]. Furthermore, it was the cells with high copy number that gave rise to stable, highly drug-resistant variants when subjected to selective concentrations of MTX (cells with increased enzyme levels but with normal gene copy numbers could arise as a result of mutations, leading to increased rates of transcription of the gene or translation of the mRNA). By itself, this result implies that, in any cell population, all of the amplificants that could ever be selected, regardless of copy number, may already be there, and could eventually be selected with an appropriate dose of the drug in question, assuming that those with extremely high copy numbers are very rare and provided that enough starting cells are examined. The results of this study are in general agreement with studies on the initial steps of CAD gene amplification in Syrian hamster cells, in which fluctuation analysis suggests that CAD amplificants are already in the population before the selective agent, PALA, is applied (28).
Surprisingly, in the absence of MTX selection, DHFR gene amplification in CHO cells (in which amplicons normally appear in ABRs) was shown to be
extremely unstable in the early stages of amplification, quite variable from cell to cell, and a cell line selected on the fluorescence-activated cell sorter for a high gene copy number could lose most (but usually not all) of the extra copies during continued propagation in the absence of drug (153). This result was taken as evidence that the initial amplified units are episomal in nature, and are stabilized later in the drug-selection regimen by integration of the episomes into chromosomes. However, it could be argued equally well that initial amplification events are actually intrachromosomal [as suggested by the in situ hybridization studies discussed above (142, 154)], but that frequent breakage within an initial amplicon cluster results in the generation of amplicon-bearing acentromeric fragments that are lost in the absence of selection. The few stably integrated amplicons that remain could then give rise to the stable MTX-resistant CHO cells obtained after long-term selection on MTX (see Section V).
B. Size and Structure of Amplified DNA Sequences
In early studies, the average size of amplicons was estimated by determining the percentage of the genome represented by ABRs, converting that number into base-pairs, and then dividing by the estimated copy number of the gene (45, 47, 147). An in-gel renaturation method that allows the direct visualization of amplified restriction fragments in gels has also been used to estimate amplicon size (161, 161a). In this method, a genomic restriction digest (end-labeled with ³²P) is separated on an agarose gel, the gel is soaked in alkali to denature the DNA into single strands in situ, and the gel is exposed to a neutral salt solution long enough to allow renaturation of amplified restriction fragments but not single-copy fragments. The gel is then treated with a solution of S1 nuclease to remove the unhybridized, single-strand, single-copy sequences, and the DNA in the gel is transferred to a membrane. After exposure to X-ray film, amplified restriction fragments are easily visualized against a perfectly clear background (161).
The beauty of the in-gel renaturation technique is that no knowledge of the nature of the amplified sequence is necessary, no specific probes are required to detect amplified restriction fragments, and the amplified sequences remaining in the gel can be eluted and cloned. Thus, sequences common to several multi-drug-resistant cell lines were identified (162), cloned (30), and used to isolate cDNAs that encode the efflux proteins responsible for drug resistance in this case (163 and references therein). Using the in-gel renaturation method, it is also possible to estimate the size of individual amplified fragments and therefore the minimum size of consensus amplicons (26, 31, 162).
Using these and other methods to estimate amplicon size, it is clear that the units of amplification in mammalian cells are always much larger than the
gene whose amplification is being selected by the drug-treatment regimen or the growth conditions (in the case of oncogenes). Amplicons can range from as small as about 240 kb to more than 10⁴ kb in length (164, 165). This raises the question as to why such large amounts of flanking, colinear DNA are included in each amplified unit.
A “snapback” hybridization procedure has been used to ask whether amplicons might be organized into head-to-head and tail-to-tail palindromic arrays in the genome (166). Restriction digests of the rapidly reannealing sequences were then detected in Southern blots, using specific hybridization probes for the amplified genes in question. Palindromes were actually found in the CAD amplicons of PALA-resistant Syrian hamster cells and in the amplicons of several neuroblastomas (166). An integrated polyoma virus that undergoes amplification in situ is also arranged head to head (167). In addition, closely related genes in gene families are often arranged in head-to-head configurations, suggesting that the formation of palindromes has been a frequent evolutionary event.
From the earliest studies of amplified DNA, it was apparent that in some cell lines, amplicons can represent a very heterogeneous collection of structures both among cell lines amplifying the same gene and within a single cell line, and regardless of whether the amplicons are borne on ABRs or DMs. In an effort to understand the structural organization of amplicons, more than 240 kb of contiguous DNA sequence was cloned from the DHFR amplicons of two different MTX-resistant murine cell lines (23). Among these cloned sequences were several novel fragments not present in the drug-sensitive parent, and which resulted from the union between two amplicons. These “junction” fragments were amplified in the drug-resistant cell line from which the library was made (i.e., some amplicons had identical endpoints), but to lesser degrees than the DHFR gene itself.
This suggested that not all amplicons were joined at this sequence, which in turn suggested sequence heterogeneity among amplicons in a single cell line. Evidence was also obtained for the joining of unrelated sequences from distant loci to the DHFR amplicons during the amplification process (23). When the cloned sequences were used to probe restriction digests of genomic DNA from several other independently isolated MTX-resistant murine cell lines in which the DHFR amplicons were located either in stable ABRs or in unstable DMs, some amplicons were found to be smaller, some were larger, and some were positioned differently around the murine DHFR gene (168). Furthermore, during propagation of a cloned cell line at a drug level that did not require further amplification of the DHFR gene, amplicon structures were observed to change with time (23). The restriction patterns of DHFR amplicons from five different MTX-
resistant murine lymphomas and melanomas are unique to each cell line (24), and many novel junction fragments were found among recombinant clones isolated from a single cell line, indicating the presence of variable-sized amplicons within a cell (25). Interestingly, however, a single DM from a mouse lymphoma cell line has the same complex restriction pattern on ethidium-bromide-stained gels as does the entire DM population from that cell line (169). These studies were confirmed by hybridizing a series of cloned sequences isolated from the purified DMs of one cell line to digests of genomic DNA from other MTX-resistant cell lines (25).

In a study of the amplified CAD gene in PALA-resistant Syrian hamster cells (carried on ABRs), advantage was taken of the high copy number of the amplified sequences to develop a clever differential screening technique that does not require laborious chromosome-walking steps to isolate large numbers of clones from amplified DNA (170). Among several hundred kilobases of cloned sequence from the CAD locus, many different interamplicon junctions were found in the single cell line from which the library was made; none of them corresponded to junctions in other PALA-resistant cell lines, and none of them was amplified to the same degree as the CAD gene itself (27). These results evoke a picture in which a heterogeneous collection of amplicons with different endpoints exists in a single cell line, so that the copy number of amplified sequences is highest near the selected gene and decreases with distance from the gene. The fact that so many different branchpoints (interamplicon junctions) exist in a single cell line may explain why it has been so difficult to clone the equivalent of a complete amplicon in its entirety from any mammalian cell line. 
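The picture of heterogeneous endpoints producing a copy-number gradient can be illustrated with a toy calculation. The following is a minimal sketch, not a method from the cited studies: it assumes (hypothetically) that every amplicon is a contiguous interval containing the selected gene and that its two endpoints lie at uniformly random distances from the gene. The function name, the bin size, and all parameter values are illustrative assumptions.

```python
import random

def copy_number_profile(n_amplicons, max_flank_kb, bin_kb=100, seed=0):
    """Count how many amplicons cover each position around a gene at 0 kb.

    Assumption (illustrative only): each amplicon contains the gene and
    extends a random distance, uniform in [0, max_flank_kb], on each side.
    """
    rng = random.Random(seed)
    n_bins = max_flank_kb // bin_kb
    # Sampled positions in kb relative to the gene (negative = upstream).
    positions = [i * bin_kb for i in range(-n_bins, n_bins + 1)]
    counts = {p: 0 for p in positions}
    for _ in range(n_amplicons):
        left = rng.uniform(0, max_flank_kb)
        right = rng.uniform(0, max_flank_kb)
        for p in positions:
            if -left <= p <= right:
                counts[p] += 1
    return counts

profile = copy_number_profile(n_amplicons=500, max_flank_kb=1000)
for pos in (0, 200, 500, 900):
    print(f"{pos:>4} kb from gene: {profile[pos]} copies")
```

Under these assumptions every amplicon covers the gene itself, so the count is maximal at position 0 and can only fall with distance, which is the qualitative pattern described for the CAD locus.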
In a further study of the CAD amplicon, in which single-step, PALA-resistant mutants were selected and libraries of amplified DNA were isolated, so few junctions were recovered that the initial amplicons were calculated to be as great as 10⁴ kb in length (165). Since the amplicons in more highly PALA-resistant cell lines are thought to be much smaller (based on the length of ABRs and the copy number of the gene), it was suggested that the initial large amplicons may become trimmed by unknown mechanisms during the many selection steps leading to high copy number (165). Additional evidence for this notion has been obtained recently and is discussed in Section IV. Data from various other systems also suggest heterogeneity among amplicon structures even in a single cell line. For example, in several coformycin- (32) and multi-drug-resistant (171, 172) hamster cell lines that have been examined, the copy number of amplified sequences is not uniform across the entire amplicon. This result suggests again that only a part of the original amplified unit is amplified in subsequent steps.
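The size estimate based on ABR length and gene copy number amounts to a single division. A minimal sketch follows; the function name is hypothetical, and the input values (ABR fraction, genome size, copy number) are invented for illustration and are not taken from any of the cited studies.

```python
def average_amplicon_size_kb(abr_fraction, genome_size_kb, copy_number):
    """Average amplicon size (kb) = DNA contained in ABRs / gene copy number."""
    if not 0 < abr_fraction <= 1:
        raise ValueError("abr_fraction must be in (0, 1]")
    if copy_number <= 0:
        raise ValueError("copy_number must be positive")
    return abr_fraction * genome_size_kb / copy_number

# Hypothetical inputs, for illustration only.
size = average_amplicon_size_kb(
    abr_fraction=0.05,         # 5% of the genome lies in ABRs
    genome_size_kb=3_000_000,  # assumed genome size in kb
    copy_number=500,           # assumed copies of the selected gene
)
print(f"estimated average amplicon size: {size:.0f} kb")
```

The result is only an average: it says nothing about the spread of sizes within a cell line, which is why junction-based methods give a different view of the same amplicons.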
A corollary of amplicon heterogeneity is that amplicon endpoints apparently do not represent “hotspots” for the recombination events that join amplicons together. However, an exception to this proposal was observed in coformycin-resistant hamster cells, in which adenylate deaminase amplicons were found to have the same endpoints in two independently isolated cell lines (173). Sequencing of the 2.5-kb junction-containing fragment indicated the presence of Alu-like repeats, small palindromes, topoisomerase-I-consensus sequences, (A+T)-rich stretches, and several other sequence motifs that could conceivably be involved in recombination. However, it is important to point out that many of these elements might be expected to be found in any randomly chosen 2.5-kb sequence, and a recent study shows that the same elements do not occur in several other sequenced junction regions from adenylate deaminase amplicons (174).

In contrast to the seemingly hopeless heterogeneity of amplicons in the above examples, the DHFR amplicons in MTX-resistant Chinese hamster cell lines seem to be quite homogeneous. Two different DHFR amplicon types have been cloned in their entirety from an MTX-resistant CHO cell line (CHOC 400). The CHOC 400 cell line contains 10³ amplicons per cell, and the multiple copies are located in three chromosomal ABRs (47). The larger type-I amplicons are 273 kb long, are arranged in head-to-tail arrays in the genome, and represent 5-10% of the amplicons in this cell line (164). The type-I sequence arose early during the amplification process, possibly from an even larger precursor, and eventually gave rise to the 240-kb type-II amplicon by an internal 33-kb deletion (164, 175). The type-II amplicon represents 75-80% of all of the amplicons in CHOC 400 cells, and the 700-800 copies of this amplicon appear to be identical to one another (including the interamplicon junction fragments). 
The type-II amplicon is arranged head to head and tail to tail in the genome to form giant palindromes, as observed in earlier studies (166). This is an important finding because it indicates that the formation of palindromes is not necessarily an initial step in the amplification process, as has sometimes been suggested (167, 176, 176a).
No evidence was obtained from studies of the CHOC 400 amplicons for the presence of any DNA from an unrelated locus, even though the two amplicon types together represent 80-85% of the amplicons in CHOC 400 cells. The remaining 15% are larger than 273 kb, supporting the argument that initial amplicons are large and are then trimmed by some unknown process to a smaller, stable size in highly drug-resistant cell lines (165).

By separating large SfiI restriction fragments (averaging 150 kb) on pulsed-field gradient gels and then performing the in-gel renaturation procedure (164), it is possible to visualize the large, amplified restriction fragments in ethidium-bromide-stained agarose gels (26, 175). An analysis of two other independently isolated MTX-resistant Chinese hamster cell variants by this method indicates that the amplicons in these cell lines are also rather uniform in length (26). One cell line [DC3F/A3 (146)] contains a 450-kb palindromic amplicon type that has recently been cloned in its entirety (177). A second major type has a very homogeneous core about 650 kb in length, and is likely to be organized in head-to-tail arrays (26). Thus, in the DC3F/A3 cell line as well, the amplicons are quite uniform in structure, and there is no indication that any DNA from unrelated, distant loci has become joined to the DHFR amplicons by complex rearrangements. Interestingly, the DHFR gene is not in the center of either of these amplicon types (26). An in-gel renaturation analysis of the amplified DNA in several tumors and tumor cell lines has also yielded evidence for the amplification of a relatively homogeneous core sequence in the amplicons of several independent neuroblastomas (31).

However, in all of the examples of amplicon homogeneity above, the cell lines have relatively high copy numbers of their respective amplified sequences and thus represent advanced stages in the amplification process. These amplicons may therefore be the final, stable arrangements that the cell arrived at after many months of trial and error during propagation in the drug, and may not be representative of the situation in less stable lines (e.g., those bearing DMs, or those examined earlier in the amplification process).

An early study attempted to simplify the analysis of the structure of amplified DNA by transfecting defined sequences into the genome and studying the arrangement of these sequences after amplification to high copy number (178, 179). A mixture of a functional APRT gene, a truncated promoter-less TK gene, and 20 random clones from a human genomic library was used to transfect an APRT-/TK- murine cell line. 
Stable APRT+ cells were then subjected to selection for TK expression on HAT medium (which would theoretically require amplification of the TK gene about 20-fold). The cell lines cloned under these circumstances had indeed amplified the truncated TK gene, but the level of amplification decreased with distance from the TK gene. It was suggested that these results could be explained if amplification had occurred by onion-skin over-replication from a nearby origin (supplied presumably by one of the genomic clones), with forks progressing varying distances from the origin in individual amplification events (179). Integration of the extra copies was shown to have occurred by homologous recombination among the repeated units, usually between the vector sequences. In another study, independent transfected clones were examined for their ability to amplify the cloned CAD gene in the presence of PALA (134). In each cell line, the gene integrated into a different chromosomal location,
and some clones were then able to amplify at much higher frequencies than others when subjected to selection on PALA (134). It was suggested that in these cell lines, the CAD gene had integrated near an active chromosomal origin of replication.

There is evidence, however, that the mechanisms that operate during the amplification of transfected genes may not mimic those that function during the amplification of endogenous genes. For one thing, the experiments involving the TK/APRT construct (178, 179) set up an artificial situation in which extensive regions of perfect homology are present in high copy number in the original ligated cointegrate that was inserted into the chromosome for subsequent amplification (i.e., the large number of vector sequences). Similarly, transfection of the CAD gene (134) resulted in all cases in the integration of multiple, already concatenated copies of the cosmid prior to amplification. Transfection of CHO cells with a cosmid containing the DHFR gene also invariably results in concatamer formation followed by (or possibly in concert with) integration into the chromosome (J. L. Hamlin, unpublished observations). Thus, the juxtaposition of relatively short stretches of perfectly homologous elements in these cases may set up an unstable recombinogenic situation that does not necessarily obtain when an endogenous, single-copy locus is amplified in the first step. Furthermore, the physical relationships among important structural elements in the DNA (e.g., origins of replication and matrix attachment sites) are surely upset by the integration of a long additional piece of DNA.
IV. Possible Mechanisms and Ways to Discriminate among Them

Although DNA sequence amplification has been intensively studied in the last decade, very little is known about the mechanism(s) responsible for the process. The most serious obstacle to progress in the field is the frequency with which amplification events occur and the necessity to capture the event itself in order to really understand mechanisms. While amplification has been estimated to occur between and times per cell generation (28, 37), which makes it one of the more common forms of mutation, it is still a rare event. Thus, most investigators either have tried to deduce the mechanism retroactively from the nature and arrangement of amplified sequences, or they have tried by various means to increase the frequency of initial amplification events. To date, however, the number of possible mechanisms has not been lowered by any really compelling line of research. The number of hard facts related to DNA sequence amplification is small, owing, at least in part, to the fact that there are so many different cell
lines and loci, and no single system has been studied in great detail by a large number of laboratories. In addition, initial events that lead to the original supernumerary copy or copies of the sequence in question need not necessarily be the same as subsequent events that lead to rearrangements, possibly because of destabilization phenomena set up by a new local chromosomal architecture. Thus, a model for one cell line and/or locus may not be adequate to explain amplification of the same locus in another cell line, or another locus in the same cell line. Any viable model for a DNA sequence amplification mechanism must explain the fact that a cell starts out with two single-copy loci and ends up, after the first step, with more than two; that is, the appearance of extra DNA must somehow be explained. Models should also cope with the fact that amplification appears to be a frequent event in tumor cells, but not in normal cells. In addition, any useful model must explain why initial amplicons are probably very large (possibly megabases in length), and are heterogenous in size and endpoints among different cell lines selected for amplification of the same gene. The usual models that have been proposed are outlined in Fig. 1. Necessarily, they invoke either over-replication, disjunction (due to unequal genetic exchange or failure of the two daughter helices to segregate), or some combination of these processes. Some form of recombination event(s) would seem to be required in all models to account for the juxtaposition of new sequences in head-to-head or head-to-tail arrays, either in the body of a chromosome as an ABR or in a circular episome or DM.
A. Unequal Sister-chromatid Exchange

This general model (Fig. 1A) is based on nondisjunction of two copies of a gene that normally segregate independently. Unequal sister-chromatid exchange could occur because both copies of the gene in question end up on the same chromosome through a single recombination event that transfers the end of one chromatid to the other. As shown in Fig. 1A, this could involve an intermediate in which the original sense of the chromosome is conserved, resulting in relatively well-repaired chromosomes, both of which have centromeres, but which have suffered reciprocal gains and losses of the intervening material (i.e., the primordial amplicon). Alternatively, an unequal exchange could occur by an event in which the sense of the joined chromatids is reversed at the point of the exchange, resulting in the formation of a dicentric intermediate and the loss of markers distal to the point of exchange (the latter prediction being testable). This kind of chromosome is unstable, of course, because at the next mitosis the two centromeres will be pulled to opposite poles and the chromosome will eventually break. If the break occurs between a centromere and one of the
[Figure 1. Panel labels: unequal sister chromatid exchange; re-replication; insertion in loco; episome; deletion / episome formation; rolling circle replication; conservative transposition. Marked elements: gene, centromere.]

FIG. 1. Models for initial amplification events in mammalian cells. (A) Two unequal sister chromatid exchange events that would lead to an initial duplication of the DHFR gene. (B) An over-replication model is shown that would lead to amplification either in loco, if the extra duplexes were integrated close to the original locus, or at a distant position if the extra duplexes had a finite extrachromosomal existence. The length of the amplicons may be shortened during the process. (C) A deletion model in which the deleted locus forms an episome that increases in size and gene copy number, possibly by rolling-circle replication; the episome then either remains extrachromosomal as a double minute, or reintegrates into the same or another chromosome, possibly after having been trimmed. (D) A conservative transposition model in which extra copies of the locus in question are generated by a roll-in replication mechanism analogous to transposition of bacteriophages, and in which the original locus remains intact.
markers, the sequences between the break and the original recombination joint will be present twice on one chromosome. If the broken chromosome end is not healed either by attachment of a telomere [which process has so far been studied only at the molecular level in yeast and lower eukaryotes (180)] or attachment of the broken end to another acentromeric chromosome fragment, the stage is now set for the classic breakage-fusion-bridge cycles first described by McClintock in her studies on maize (151, 152). Attachment of the broken end to another centromeric chromosome or fragment, or fusion of the two daughter chromatids during or after the next
DNA replication phase, will again result in breakage, fusion, and bridge formation. The latter mechanism (an inverted sister-chromatid exchange) is an attractive model because it explains the very high frequency of dicentric chromosomes observed in hamster cell lines during the early stages of amplification (142). It also explains why the early amplicons are most often located on the same chromosome arm as the parental single-copy locus, but very often far away (sometimes farther away than the usual chromosome end) (142). The initial instability that was observed by fluorescence-activated cell-sorter analysis in CHO cells selected for amplification of the DHFR gene (153) could also be explained by the unequal-sister-chromatid-exchange model: instability could be attributed to the multiple rearrangements that occur before a broken chromosome containing terminal amplicons is healed and stabilized by fusion with a telomere or an acentromeric chromosome fragment.

Unequal-sister-chromatid-exchange models have not been popular because, prior to high-resolution fluorescence in situ hybridization techniques, they were somewhat difficult to test. Early studies used quenching of Hoechst fluorescence by bromodeoxyuridine to monitor sister-chromatid exchanges in the ABRs of highly MTX-resistant cell lines (181). The frequency of sister-chromatid exchanges in ABRs was slightly lower than in other chromosomal regions, suggesting that, once stabilized in highly resistant cell lines, ABRs do not undergo frequent further rearrangements. However, it could still be argued that the initial amplification event is an unequal exchange, and that once a stable configuration is found, it is less likely to undergo crossing-over. 
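The breakage-fusion-bridge cycle described above lends itself to a small toy model. The sketch below is an illustrative simplification, not a reconstruction of any cited experiment: a chromosome arm is represented as a list of markers running from the centromere to a broken, telomere-less end, sister chromatids always fuse at that end, and the anaphase bridge breaks at a position chosen uniformly at random. The function names and marker labels are hypothetical.

```python
import random

def bfb_cycle(arm, cut):
    """Apply one breakage-fusion-bridge cycle to a chromosome arm.

    `arm` lists markers from the centromere ("CEN") out to a broken,
    telomere-less end. `cut` is the index at which the anaphase bridge
    breaks (1 <= cut <= 2 * len(arm) - 1).
    """
    # Fusion: after replication the two sister chromatids join at their
    # broken ends, giving a dicentric with a head-to-head (inverted) repeat.
    dicentric = arm + arm[::-1]
    if not 1 <= cut <= len(dicentric) - 1:
        raise ValueError("cut must fall between the two centromeres")
    # Bridge and breakage: follow the daughter that keeps the left centromere.
    return dicentric[:cut]

def simulate(arm, cycles, seed=0):
    """Run several cycles with a randomly placed break each time."""
    rng = random.Random(seed)
    for _ in range(cycles):
        arm = bfb_cycle(arm, rng.randint(1, 2 * len(arm) - 1))
    return arm

start = ["CEN", "a", "GENE"]
# A break beyond the fusion point leaves two gene copies head to head.
print(bfb_cycle(start, 4))   # ['CEN', 'a', 'GENE', 'GENE']
# A break proximal to the gene removes it from this daughter.
print(bfb_cycle(start, 2))   # ['CEN', 'a']
final = simulate(start, cycles=6, seed=1)
print(final.count("GENE"), "gene copies after 6 cycles")
```

In this sketch each daughter is monocentric but still ends in a broken end, so the cycle can repeat. Gene copy number can rise or fall in any one lineage, and selection for the gene would favor the lineages in which it rises.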
It is even possible that subsequent amplification events are made more likely by the formation of the original (possibly unstable) recombination joint, setting up cycles of, for example, over-replication initiated by frayed ends or other unknown mechanisms. The latter proposal is borrowed from the attractive suggestion (126) that an original breakage might render that site more susceptible to re-integration of an episome. As with most models, it is difficult to explain how the copy number of amplicons becomes so high so early in certain cells selected in one or two steps over the period of a few weeks.
B. Re-replication

This model (Fig. 1B) in its simplest form (36-38, 182, 183) states that amplification occurs by repeated firing of an origin of replication in a single S period, resulting in supernumerary copies of the replicon(s) in question, presumably initially lying side by side in onion-skin arrays, as in chorion
gene amplification in Drosophila. If the extra copies are integrated into the chromosome at or near the original site of over-replication, they could eventually become the ABRs that characterize many amplificants. Failure to integrate the amplicons into the chromosome could result in their release to form an episome that eventually gives rise to DMs. In the early days when very little was known about the details of amplification, this crude model was satisfactory to spark experimentation. Let us now examine the way in which this model and its variations deal with the known facts.

The over-replication model has gained support from several experimental findings. First, over-replication and in loco recombination of the amplicons could explain early observations that the ABRs representing amplified DHFR genes are often found on chromosome 2 (or the rearranged Z2 counterpart) in Chinese hamster cells (75). Second, several studies have shown that a variety of agents that arrest replication forks greatly enhance the frequency of amplification of the DHFR gene upon subsequent selection on MTX. These agents include UV irradiation (184), γ-irradiation (185), hydroxyurea (186), MTX (187), carcinogens (188), hypoxia (189, 190), arsenic (191), 5-fluorodeoxyuridine (192), and mitogenic hormones and tumor promoters (193). Carcinogens also induce the T-antigen-dependent replication of SV40 in permissive hamster cells (194). It is argued that, by arresting replication forks in their tracks, these agents allow repeated initiations at local origins of replication without intervening cytokinesis. In this model, hormones and phorbol esters would presumably exert their effects by placing a larger proportion of the population in a cycling configuration and therefore in the S period itself. The model has been tested in a slightly different way by Schimke and colleagues. 
They showed that if synchronized cells are treated with inhibiting concentrations of hydroxyurea in the early S period and the drug is then removed, a large percentage of the DHFR genes are subsequently re-replicated prior to cytokinesis (195, 196). If the cell population is then subjected to MTX selection, a surprisingly large number of cells amplify the DHFR gene (195). It was suggested that if forks emanating from early-firing origins are arrested, re-initiation occurs at the same origins with high efficiency, resulting in de facto amplification of surrounding regions. It follows from this proposal that any other origin in the cell should also misfire, with the result that much of the genome should be amplified after this treatment. In fact, two different unlinked, selectable genes are amplified in the same cell with a much higher frequency than would be expected if each gene were amplified in an independent event (189a). The fate of the rest of the postulated unselected amplified DNA has not been followed, although it would presumably be lost by natural processes in the absence of selection
for its retention, since all ABRs detectable in a cell by Giemsa banding invariably contain the selected gene, as shown by in situ hybridization with a specific probe (e.g., 45, 47 and 147).

However, there are several ways in which the re-replication model does not explain the extant data. For one thing, the large size of many amplicons (usually greater than 500 kb) suggests that many are usually larger than parental replicons, which are estimated to average about 100 kb (197). In at least one case in which an entire amplicon has been cloned, the presence of more than one origin has been demonstrated directly (177). Thus, amplicons are usually larger than single replicons. In addition, the high resolution of in situ hybridization studies on the mitotic chromosomes of DHFR amplificants shows that although the amplicons are almost invariably on the same chromosome arm as the parental single-copy locus, they are usually very far from the parental locus (sometimes >50 megabases). Does this mean that the initial amplified (over-replicated) sequence can be longer than 50 megabases but is somehow trimmed down in later stages? The model can be changed slightly to suggest that the replication forks emanating from misfiring origins can move through many (even hundreds of) replicons during the amplification procedure. But what prevents them from doing this normally? Furthermore, replication forks can only travel approximately 2 megabases in a typical 8-hour S period (197), which would limit the size of the initial amplicons in this model to about 4 megabases. However, evidence from several quarters suggests that they may often be larger than this (142, 165).

Alternatively, the model might read that the over-replicated sequences are usually first released as episomes, which increase in number in the cell because of unequal segregation at mitosis. 
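The fork-travel argument above is simple arithmetic, spelled out in the following minimal sketch. The input figures (2 megabases per fork, an 8-hour S period, 100-kb average replicons) are those quoted in the text; the factor of two assumes that two forks diverge from a single origin, which is how the 4-megabase limit appears to be derived. The function name is hypothetical.

```python
def fork_limits(fork_travel_mb=2.0, s_phase_hours=8.0, replicon_kb=100.0):
    """Return (fork rate in kb/min, maximum re-replicated span in Mb,
    number of average-sized replicons contained in that span)."""
    rate_kb_per_min = fork_travel_mb * 1000 / (s_phase_hours * 60)
    # Assumption: two forks diverge from one origin, so the span replicated
    # in one S period is at most twice the distance a single fork can travel.
    max_span_mb = 2 * fork_travel_mb
    replicons = max_span_mb * 1000 / replicon_kb
    return rate_kb_per_min, max_span_mb, replicons

rate, span, n = fork_limits()
print(f"implied fork rate: {rate:.2f} kb/min")
print(f"maximum span from one origin: {span:.0f} Mb ({n:.0f} average replicons)")
print("a 50-Mb distance exceeds this span:", 50 > span)
```

With the quoted figures, forks from one misfiring origin would have to pass through about 40 average replicons to reach the 4-megabase limit, and that is still more than tenfold short of the 50-megabase distances seen by in situ hybridization.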
The number of amplified repeats per episome would also increase, either by rolling-circle replication or by recombination, to become visible DMs. In some cases (e.g., in Chinese hamster cells), DMs could eventually integrate into chromosomes to form ABRs. However, the model does not explain why integration occurs preferentially into the chromosome that bore the original gene, but not actually at the site of the gene (as observed in 142 and 154).

There are also weaknesses in the experimental approaches used to support the re-replication model. For example, the agents proposed to halt replication forks (e.g., MTX, hydroxyurea, UV, 5-fluorodeoxyuridine) can also induce both single- and double-strand breaks in DNA. These breaks could initiate unequal sister-chromatid exchange, conservative transposition, or any other mechanism requiring a recombination step (which all models do). The effect of phorbol esters and hormones could merely be on the growth state and, as a consequence, the level of the enzymes required for recombination. Furthermore, it has been suggested that the experiments that
purportedly showed re-replication of the DHFR locus in a single S period (195) are flawed, since it is possible that many cells had actually traversed mitosis and entered a second S period at the time of sampling (see 198 and references therein, and 199). Thus, some of the strongest experimental evidence for the re-replication hypothesis has been called into question (but see the discussion in 200).

Another quite complicated re-replication model (201) proposes that nascent DNA chains may switch template strands and begin to replicate in the opposite direction, thus forming the palindromes that have been observed in several systems (e.g., 164, 166, 167, 176a, 177 and 202). The proposed resolution of these structures is also very complicated, but the model should be considered seriously since it explains the appearance of palindromes and could be operative at either initial or later stages of amplification.
C. Deletion and Episome Formation

This model (Fig. 1C) falls into the “unequal-segregation” category. The most recent variation of this model (126) suggests that, during the S period, breaks occur at the single-strand regions of stalled replication forks (e.g., caused by MTX or hydroxyurea), either at all four positions (two at each fork) or at only two of the four (one at each fork). If the two helices released by four breaks become joined together, palindromic joints between amplicons are formed. Alternatively, circularization of a single daughter helix would result in the formation of a head-to-tail joint. In either case, these “episomes” are then released from the chromosome to evolve into larger DMs by clustering of episomes (resulting in nondisjunction) and subsequent homologous recombination between episomes to form larger molecules that eventually become microscopically visible DMs. It is proposed that these DMs have a short episomal half-life in the case of cells that usually display ABRs, and reintegrate into the body of a chromosome. To explain the usual occurrence of ABRs on the same chromosome arm as the original locus, it is proposed that the original breakpoints at the stalled replication forks become hotspots for re-integration of the episomal DM due to destabilization (126). The model thus hopes to explain the instability of amplified sequences early in the amplification process, as well as the preferred chromosomal locations of initial replicons.

The deletion-and-episome-formation model is supported by several observations by Wahl and co-workers over the last several years. They first examined the amplification of a transfected, stably integrated CAD gene in new chromosomal locations in CHO cells (134, 135), and observed that in particular transfected cell lines, the CAD gene was more easily amplified than in others. It was suggested that this resulted from juxtaposition of the gene to an active origin of replication. 
Using a field-inversion gel-electrophoresis method for the analysis of large circular episomes, they detected episomes 250-600 kb in length in the cell lines that more readily amplified the CAD gene (135). After propagation for extended time periods in a selective agent, DMs appeared and the episomes disappeared, possibly in a precursor-product relationship (135). In addition, episomes appeared as a result of, or concomitant with, deletion of the original integrated CAD gene in at least some cases (203). Episomes have also been observed in a variety of other cell lines that bear amplified sequences, and in some tumors (204-206).

An attractive feature of the deletion-and-episome model is that it suggests a role for chromosomal breakage and subsequent destabilization by recombination events. This phenomenon has been suggested before to explain the fact that DHFR amplicons (ABRs) often occur near translocation breakpoints in MTX-resistant CHO and murine cells (144, 207). This model could also explain the genesis of the very complicated chromosomal translocations involving amplicons that were observed in in situ hybridization studies on the early stages of DHFR gene amplification in CHO cells (142), since all of these must have arisen from an initial breakage event that could, in fact, have been a deletion.

However, there are several observations that are not explained well by the deletion-and-episome-formation model. In the case of CHO cells, episomes or visible DMs have never been observed by fluorescence in situ hybridization at any level of drug selection, including the lowest levels displaying supernumerary copies of the DHFR gene (B. J. Trask and J. L. Hamlin, unpublished). Since this method can detect single-copy loci with almost 100% efficiency, the postulated episomes are unlikely to have escaped observation. 
Secondly, the 250- to 650-kb episomes, proposed to be the precursors to either DMs or ABRs (203), are much smaller than those expected, based on the low frequency of novel joints formed in the initial stages of CAD gene amplification (165), or on the large apparent size of initial DHFR amplicons based on fluorescence in situ hybridization studies (142, 154). On the other hand, the 250- to 650-kb episomes are considerably larger than expected if they represent parts or all of parental replicons, which are expected to average 100 kb (197).

The obligate deletion that is proposed to occur in this model is not supported by fluorescence in situ hybridization experiments on early events in DHFR gene amplification in CHO cells (142). In seven of eight independently isolated MTX-resistant cell lines, both copies of the parental single-copy DHFR locus are retained at their correct chromosomal locations in the amplificants, with DHFR amplicons located on the same chromosome arm, but at a great distance. It could be argued, however, that the only cells that survived selection immediately after the deletion event were the progeny of
the daughter cell to which both the episome and the undeleted chromosome partitioned.
The deletion-and-episome-formation model is also not strictly compatible with observations that, in a cloned series of increasingly MTX-resistant CHO cell lines, chromosomal ABRs appear to elongate in situ as the DHFR gene copy number increases, arguing for an intrachromosomal mechanism of amplification, at least in the later stages (J. L. Hamlin, unpublished). This observation could be explained if episomes were continually being deleted from ABRs, expanded in internal copy number and then re-integrated into a pre-existing ABR. However, the postulated episomal intermediates predicted by this model are not observed during amplification of the DHFR gene in CHO cells (142; B. J. Trask and J. L. Hamlin, unpublished).
In addition, the model predicts that the palindromic sequences often observed in the amplicons of drug-resistant cell lines and some tumors arise from initial rather than later events. In the CHOC 400 cell line, the only system from which whole amplicon equivalents have been cloned, there is no doubt that all copies of the head-to-head type-II amplicon contain the original head-to-tail junction that characterizes the larger type-I amplicon (164, 175). Therefore, the type-II amplicon must have arisen from the type I late in the process, not in an initial event.
D. Conservative Transposition
In this model (Fig. 1D), an initial amplification event is proposed to occur by a mechanism similar to that for the transposition of phage in bacteria (208, 209). Ligation occurs between a single-strand nick on one end of the donor locus and a double-strand nick at the recipient site. The nicks are proposed to occur most often at a distance from one another on the same chromosome arm. Both single- and double-strand nicks are proposed to be natural intermediates resulting from the activity of topoisomerases I and II, which function normally to relieve torque and superhelical turns during replication and transcription activities. The proposed ligation joins the elements together, but the recipient contains a free 3’-hydroxyl end. This end then serves as a primer for elongation by the replication machinery, which uses the donor DNA loop as a template. DNA replication proceeds around the circle by a strand-displacement mechanism and, eventually, an endonuclease introduces a double-strand break that terminates replication (208, 209). This results in a duplication of the sequences between the two nicks. In the conservative version of this model, one copy remains in its original location and the new copy or copies are deposited at the new location or are released as episomes. If the novel joints formed in these original amplicons are hotspots for the formation of double-strand breaks, they could render the new amplicons
susceptible targets for further transposition events, possibly occurring within an amplicon cluster (e.g., aided by homologous recombination?). It must be further assumed that once additional amplicons have been formed, their novel arrangement, vis-a-vis the usual chromosomal arrangement, destabilizes the ABRs with respect to breakage, leading to the high percentage of breakage-fusion-bridge intermediates observed early in the amplification process (142). By assuming that there are subtle differences among cell lines in the balance of the endonucleases that eventually end the replication part of the cycle, one can explain the observations that some cell lines usually carry amplified sequences on ABRs, while others carry them on DMs.
An attractive feature of the conservative transposition model is that it accounts for the observation that, reasonably early in the amplification process, the number of amplicons can be quite large [for example, in 120 individual mitotic figures examined 6 weeks after selecting CHO cells on MTX, no MTX-resistant cells containing only one or two additional copies of the DHFR gene were observed (142)]. Intrachromosomal conservative transposition could also account for the rather variable, but usually large, amplicons observed in most systems early in the amplification process, since the elements coming together for ligation would be more apt to be widely separated on a chromosome than close together. Furthermore, the resulting amplicon clusters would usually be located rather far from the single-copy locus, but usually on the same chromosome arm. It is conceivable that, once formed, even intrachromosomal amplicons are not always stably integrated and can recombine out as circular episomes. These, in turn, may not be stable, and may reintegrate either at the original recipient site or at some other location.
Obviously, many other models could be suggested that incorporate these basic mechanisms in different ways or combinations, but the ones discussed here provide a starting point for constructive thinking about what is possible, beginning with the premises outlined at the start of this section. As always, models are useful when they provide a basis for experimentation, the results of which lead to refinement or elimination of the model. In spite of the fact that workers in many laboratories have studied amplification for many years, the inability to eliminate any of the models discussed above is due primarily to a lack of precise data on the nature of the first amplification events and the resulting sequences formed during these events.
There is also a need for novel experimental approaches, and there are several possibilities. For example, variant cell lines that amplify genes with at least a 10-fold higher frequency than the starting cells have been isolated (210, 211). The hope is to be able to isolate genes from these variants that are involved in the amplification mechanisms per se. A cell-free system has also been established in which many of the events involved in the initial amplification of an integrated SV40 virus can be reproduced (212). This system should eventually lead to the identification and isolation of proteins important for the amplification process.
V. Usefulness of Cell Lines Bearing Amplified Genes
An aspect of DNA sequence amplification rarely addressed in reviews on the subject is the way in which investigations into the process itself have aided in the related fields of transcription, replication, repair, recombination, and nuclear architecture. As it happens, many of the genes amplified in response to selection with competitive inhibitors are so-called “housekeeping” genes, many of which would not have been cloned or would have been difficult to clone if they or their mRNAs were not present in high copy number in the cell. In each case, the problem of cloning the gene or the cDNA is simplified by a factor closely proportional to the amplification factor. Thus, many fewer colonies had to be screened to clone both the cDNAs (213-215) and the genes (216-219) for murine and Chinese hamster DHFR. In the case of very rare messages and the complex and often incomplete genomic libraries of mammalian cells, this can mean an enormous savings in time and effort.
The same savings of time are realized in the purification of the proteins whose cognate genes are amplified. Often, as in the case of DHFR, the competitive inhibitor used to select for amplificants in the first place can be used to purify the protein in question by affinity chromatography (e.g., 220). As mentioned before, the in-gel renaturation method (161) has been extremely useful in cloning the genes responsible for multi-drug resistance in hamster and human systems (e.g., 30 and 88), and also at least one new oncogene contained in the amplicons of a human tumor (221). None of these sequences could have been identified or cloned with such rapidity if cell lines containing multiple copies had not been available.
Amplificants also greatly aid the cloning of large stretches of DNA from interesting chromosomal domains since, like the gene itself, the entire repeating unit is present in much higher copy number in genomic libraries.
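The proportionality between amplification factor and screening effort can be illustrated with the standard library-coverage estimate N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one clone and P is the desired probability of recovering the locus. The Python sketch below is illustrative only: the insert size, genome size, and copy number are assumed round figures rather than values reported in this chapter, and the calculation ignores the small increase in genome size caused by the amplification itself.

```python
import math

def clones_to_screen(insert_kb, genome_kb, copies=1, p=0.99):
    """Estimate how many random clones must be screened to recover a
    given locus with probability p (library-coverage estimate).

    copies is the number of copies of the locus per genome; every number
    used in the example below is a hypothetical round figure.
    """
    # Fraction of random clones expected to carry the locus.
    f = copies * insert_kb / genome_kb
    if not 0.0 < f < 1.0:
        raise ValueError("target fraction must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))

# Hypothetical example: 40-kb inserts, a 3,000,000-kb genome, and a
# locus present either once or in 500 copies.
single = clones_to_screen(insert_kb=40, genome_kb=3_000_000)
amplified = clones_to_screen(insert_kb=40, genome_kb=3_000_000, copies=500)
print(single, amplified, round(single / amplified))
```

Under these assumptions the number of clones to be screened falls roughly in proportion to the copy number, which is the saving described above.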
Thus, it has been possible to clone more than 550 kb of contiguous DNA sequence from the Chinese hamster DHFR locus (177). More than 240 kb has been cloned from the murine DHFR locus (23), and approximately the same amount has been cloned from the CAD amplicons in Syrian hamster cells (27, 28). A similarly large amount of contiguous DNA sequence from the amplicons of a neuroblastoma cell line has been isolated (222). It is probable that the intricate regulation of genes and replication units will not be fully appreciated until all aspects of chromatin architecture are understood, starting at the nucleosomal level and finishing with the highest orders
of packaging that characterize whole chromatin domains. Large stretches of DNA, encompassing the highest organizational units, must be isolated in order to address these issues, and the isolation of such large domains will be greatly aided by using cell lines bearing amplified sequences.
Another important practical application of the amplification mechanism is that selectable genes transfected into mammalian cells can be amplified to high copy number in their new chromosomal locations. Thus, the genes encoding DHFR (223-225; see (53) for a complete list), CAD (226), adenosine deaminase (227), and a multi-drug efflux protein (228) have all been used as selectable markers in vectors designed to amplify a colinear gene that is itself not selectable. Theoretically, the successful stable transfection of a cell line with the DHFR gene can only be detected if the recipient cell line is doubly deleted for the gene [e.g., as in DHFR-deficient cell lines (229)], since the selective medium lacking thymidine, hypoxanthine, and glycine requires only a single copy of the gene for cell survival. In practice, however, the cells that survive the initial selection regimen after transfection almost invariably have amplified the gene several times already, and are therefore already moderately MTX-resistant. The same phenomenon is observed when the CAD gene is transfected into wild-type CHO cells (226, 230). Thus, cloned DHFR minigenes should be useful amplifiable vectors in any cell line, even if it already contains two DHFR genes (J. L. Hamlin, unpublished).
Cell lines bearing amplified genes can be used to study other phenomena, such as the effect on control mechanisms when a particular mRNA and its protein are overproduced. In some MTX-resistant CHO cell lines, for example, DHFR can represent as much as 25-30% of the cellular protein without significantly affecting the health of the cell (47).
This is a surprising result, since the enzyme is involved in the metabolism of one-carbon adducts, glycine, thymidine, and indirectly in several other cellular pathways. Furthermore, in the absence of MTX, when the enzyme is not neutralized by the drug, the cells seem to grow with apparent impunity. It is hard to understand how the cell copes with the extra load of both drug and enzyme. Presumably, similar accommodations are made by other drug-resistant cell lines that amplify other genes.
However, there are other cases in which interesting accommodations have been made by the cell that suggest the regulatory loops expected of differentiated cells. If cultures are subjected to stepwise increases in compactin over several months, mutants are eventually isolated that have amplified the gene for HMG-CoA reductase, a membrane-associated enzyme involved in cholesterol metabolism (69). When extremely drug-resistant cell lines are obtained, their morphology is phenomenal. In cross-section, electron micrographs show whorling membrane systems packed in tight,
almost crystalline, arrays (69). In some unknown way, the membrane lipids required to accommodate the extra membrane protein are elaborated in vast quantities.
Another interesting regulatory phenomenon occurs during amplification of the gene encoding the M2 subunit of ribonucleotide reductase in cell lines that have been exposed to hydroxyurea (231, 232). Up to a 10-fold amplification, the drug-resistant cells actually increase production of the mRNA encoding the M1 subunit, which is required stoichiometrically with M2 to form the functional ribonucleotide reductase protein. This result suggests that the cell has a regulatory loop in which the level of one subunit is sensed by the promoter of the gene encoding the other. However, above a certain level of M2 gene amplification, the cell is no longer able to produce enough messenger to overcome the higher inhibitor concentration, and then cells that have also amplified the M1 subunit gene start to be selected (P. R. Srinivasan, personal communication). Thus, the presence of an amplified gene in this cell system opens a window into a regulatory system that might not have been guessed at or so easily demonstrated in its absence.
Since we became interested in DNA sequence amplification because it was possible that amplicons might represent chromosomal replicons, we elaborate here on how amplificants are beginning to shed light on the field of mammalian chromosomal DNA replication as well. It was appreciated rather early that units of amplification are larger than the gene that is selected for by the drug-treatment regimen. It was therefore anticipated that amplicons might be equal in size to, or possibly larger than, replicons. An important question at the time (and still) is whether origins of replication are fixed genetic (cis-regulatory) elements analogous to the origins of microorganisms. We reasoned that if they were, and if amplicons were similar repeating units, each amplicon would contain at least one origin.
In the case of the CHOC 400 cell line that we developed, the restriction fragments arising from the amplified 240-kb DHFR domains can actually be visualized against the background smear of single-copy sequences on ethidium-bromide-stained agarose gels (47). This allowed the demonstration that replication initiates in only a subset of these bands when the cells enter the S period, suggesting the presence of a fixed origin(s) (233). The early-labeled fragments were excised from the gel, labeled in vitro with [32P]dCTP, and used as specific hybridization probes on a cosmid library from the CHOC 400 cell line. The early-labeled fragments mapped together in a single cosmid, and defined a 28-kb “initiation zone” lying downstream from the DHFR gene (234). In recent years, the same approach has been used to isolate another initiation locus from the 450-kb DHFR amplicon of the independently isolated MTX-resistant Chinese hamster cell line, DC3F/A3 (177). Thus, the
method is reliable, and should be useful for cloning other early-firing initiation regions from the amplicons of other drug-resistant cell lines. In another in vivo labeling approach to identifying replication initiation loci (177), synchronized cells were labeled in the early S period with the dense thymidine analog bromodeoxyuridine, and the DNA made in the first 30 minutes of the S period was isolated by virtue of its hybrid density on CsCl equilibrium gradients. When this DNA was labeled in vitro with [32P]dCTP and used to probe a slot-blot containing a series of cloned cosmids representing almost 500 kb from the DHFR locus, it could clearly be seen that replication initiates in two separate zones separated by about 250 kb (177).
In an independent approach, CHOC 400 cells were labeled in vitro with 32P in the very early S period and the DNA synthesized in the first few minutes of S was isolated by BND-cellulose chromatography (235). When this early-replicating DNA was used as a hybridization probe on Southern blots containing a series of clones from the 28-kb early-labeled zone, a single 4.3-kb fragment was preferentially illuminated by the early-replicated probe (235). This same locus, as well as a second locus mapping about 20 kb downstream, was also identified as a replication start site in an in vivo labeling study using in-gel renaturation to detect the pattern of labeling of amplified restriction fragments (236).
In yet another approach, Okazaki fragments have been purified from synchronized, permeabilized CHO cells in the early S period and used to probe the separated plus and minus strands of recombinant M13 clones from the initiation zone of the DHFR locus (237). Since Okazaki fragments switch template strands at origins of replication (238), this approach should indicate whether a fixed origin of replication exists in the DHFR domain. Indeed, the results of this experiment suggest the presence of a preferred strand-switching site in a fragment less than 500 bp long, which itself is located in the larger early-replicating fragments identified in the studies cited above (233-236).
However, in very recent studies, high-resolution two-dimensional mapping methods (239, 240) were used to study replication-fork movement through the amplified DHFR domain in CHOC 400 cells, and yielded the surprising result that, in vivo, initiation occurs at multiple random sites within a broad zone encompassing the entire 28-kb initiation zone defined in earlier studies (241). It is presently not clear why the results of the latter in vivo studies are so different from the in vitro study discussed above, in which a very circumscribed initiation zone was detected in the DHFR amplicon (237). Perhaps the difference will be explained by the lack of proper chromatin folding in the in vitro situation.
All of the experiments discussed above were made possible by the high
copy number of the DHFR amplicon in MTX-resistant cell lines, as well as the availability of clones covering more than 500 kb of DNA sequence from this locus. As different amplified loci are isolated in recombinant cosmids and plasmids, it should be possible to apply any or all of the techniques used in these studies to the isolation of other origins of replication.
There are obviously many other ways in which cell lines bearing amplified genes could be used to study the normal processes of transcription, replication, repair, recombination, and chromosome architecture. Hopefully, this discussion will stimulate thought patterns leading to new investigations in which advantage is taken of these interesting biological variants.
VI. Conclusions and Future Directions
In spite of the intensive research carried out in many laboratories in the last decade on DNA sequence amplification, very little is actually known about the mechanisms responsible for this phenomenon. Some obvious questions that remain unanswered are the following: Why does amplification occur only in tumor cells (with very few exceptions)? Why is the frequency so high? How large are the initial units of amplification, and what is the nature of the initial junctions between amplicons? Does the initial step involve an unequal exchange, re-replication, or transposition event? Are initial events that generate the first extra copies of a sequence the same as subsequent events that lead to high copy numbers? Are certain cells predisposed to amplify any sequence by mutations that affect important genes involved in replication, recombination, or other cellular functions?
In the next decade, many of these questions may be answered by the application of the important techniques of fluorescence in situ hybridization, fluorescence-activated cell sorting, pulsed-field gradient gel electrophoresis, in-gel renaturation, and new techniques yet to be developed. It is also expected that the next decade will see a greatly increased use of amplificants for the isolation of DNA sequences, mRNAs, and the corresponding proteins from new loci in mammalian cells.
ACKNOWLEDGMENTS
We thank the many colleagues who sent us reprints of their published work. We regret that space limitations prevented us from citing all of these papers. Work in the authors’ laboratory was supported by grants from the National Institutes of Health and the American Cancer Society.
REFERENCES
1. H. Tobler, Biochem. Anim. Dev. 3, 91 (1975).
2. K. D. Tartof, ARGen 9, 355 (1975).
3. E. O. Long and I. B. Dawid, ARB 49, 727 (1980).
4. R. E. Pearlman, P. Anderson, J. Engberg and J. R. Nilsson, Specific Eukaryotic Genes, Struct. Org. Funct. 13, 351 (1979).
5. A. C. Spradling and A. P. Mahowald, PNAS 77, 1096 (1980).
6. A. C. Spradling, Cell 27, 193 (1981).
7. R. Griffin-Shea, G. Thireos and F. C. Kafatos, Dev. Biol. 91, 325 (1982).
8. Y. N. Osheim and O. L. Miller, Cell 33, 543 (1983).
9. A. P. Bird, CSHSQB 42, 1179 (1978).
10. K. Karrer and J. G. Gall, JMB 104, 421 (1976).
11. J. Engberg, P. Anderson, V. Leick and J. Collins, JMB 104, 455 (1976).
12. M.-C. Yao, E. H. Blackburn and J. G. Gall, CSHSQB 43, 1293 (1978).
13. R. P. Anderson and J. R. Roth, Annu. Rev. Microbiol. 31, 473 (1977).
14. P. E. Hansche, V. Beres and P. Lange, Genetics 88, 673 (1978).
15. S. Fogel and J. W. Welch, PNAS 79, 5342 (1982).
16. J. W. Szostak and R. Wu, Nature 284, 426 (1980).
17. H. L. Petes and T. D. Petes, Nature 289, 144 (1981).
18. S. M. Beverley, J. A. Coderre, D. V. Santi and R. T. Schimke, Cell 38, 431 (1984).
19. S. Detke, G. Chaudhuri, J. A. Kink and K.-P. Chang, JBC 263, 3418 (1988).
20. T. E. Ellenberger and S. M. Beverley, JBC 264, 15094 (1989).
21. A. F. Cowman and A. M. Lew, MCBiol 9, 5182 (1989).
22. C. Mouches, N. Pasteur, J. B. Berge, O. Hyrien, M. Raymond, B. Robert de Saint Vincent, M. De Silvestri and G. P. Georghiou, Science 233, 788 (1986).
23. N. A. Federspiel, S. M. Beverley, J. W. Schilling and R. T. Schimke, JBC 259, 9127 (1984).
24. C. J. Bostock and C. Tyler-Smith, JMB 153, 219 (1981).
25. S. Caizzi and C. J. Bostock, NARes 10, 6597 (1982).
26. J. E. Looney, C. Ma, T.-H. Leu, W. Flintoff, B. Troutman and J. L. Hamlin, MCBiol 8, 5268 (1988).
27. F. Ardeshir, E. Giulotto, J. Zieg, O. Brison, W. S. L. Liao and G. R. Stark, MCBiol 3, 2076 (1983).
28. J. Zieg, C. E. Clayton, F. Ardeshir, E. Giulotto, E. A. Swyryd and G. R. Stark, MCBiol 3, 2089 (1983).
29. N. Kanda, R. Shreck, F. Alt, G. Bruns, D. Baltimore and S. Latt, PNAS 80, 4069 (1983).
30. P. Gros, J. Croop, I. Roninson, A. Varshavsky and D. E. Housman, PNAS 83, 337 (1986).
31. K. W. Kinzler, B. A. Zehnbauer, G. M. Brodeur, R. C. Seeger, J. M. Trent, P. S. Meltzer and B. Vogelstein, PNAS 83, 1031 (1986).
32. M. Debatisse, O. Hyrien, E. Petit-Koskas, B. Robert de Saint Vincent and G. Buttin, MCBiol 6, 1776 (1986).
33. C.-Y. Yeung, E. G. Frayne, M. R. Al-Ubaidi, A. G. Hook, D. E. Ingolia, D. A. Wright and R. E. Kellems, JBC 258, 15179 (1983).
34. J. K. Cowell, ARGen 16, 21 (1982).
35. J. L. Biedler, M. B. Meyers and B. A. Spengler, Adv. Cell. Neurobiol. 4, 267 (1983).
36. R. T. Schimke, Cell 37, 705 (1984).
37. G. R. Stark and G. M. Wahl, ARB 53, 447 (1984).
38. J. L. Hamlin, J. D. Milbrandt, N. H. Heintz and J. C. Azizkhan, Int. Rev. Cytol. 90, 31 (1984).
38a. G. R. Stark, M. Debatisse, E. Giulotto and G. M. Wahl, Cell 57, 901 (1989).
39. M. T. Hakala, S. F. Zakrzewski and C. A. Nichol, JBC 236, 952 (1961).
40. J. W. Littlefield, PNAS 62, 88 (1969).
41. R. E. Kellems, F. W. Alt and R. T. Schimke, JBC 251, 6987 (1976).
42. J. L. Biedler, A. M. Albrecht and D. J. Hutchison, Cancer Res. 32, 153 (1972).
43. J. L. Biedler, A. M. Albrecht and B. A. Spengler, Eur. J. Cancer 14, 41 (1978).
44. J. L. Biedler and B. A. Spengler, Science 191, 185 (1976).
45. J. H. Nunberg, R. J. Kaufman, R. T. Schimke, G. Urlaub and L. A. Chasin, PNAS 75, 5553 (1978).
46. J. L. Biedler, P. W. Melera and B. A. Spengler, Cancer Genet. Cytogenet. 2, 47 (1980).
47. J. D. Milbrandt, N. H. Heintz, W. C. White, S. M. Rothman and J. L. Hamlin, PNAS 78, 6043 (1981).
48. B. J. Dolnick, R. J. Berenson, J. R. Bertino, R. J. Kaufman, J. H. Nunberg and R. T. Schimke, J. Cell Biol. 83, 394 (1979).
49. R. J. Kaufman, P. C. Brown and R. T. Schimke, PNAS 76, 5669 (1979).
50. P. E. Barker, Cancer Genet. Cytogenet. 5, 81 (1982).
51. W. F. Flintoff, S. V. Davidson and L. Siminovitch, Somatic Cell Mol. Genet. 2, 245 (1976).
52. D. A. Haber, S. M. Beverley, M. L. Kiely and R. T. Schimke, JBC 256, 9501 (1981).
53. J. L. Hamlin and C. Ma, BBA 1087, 107 (1990).
54. F. M. Sirotnak, D. M. Moccio, L. E. Kelleher and L. J. Goutas, Cancer Res. 41, 4447 (1981).
55. T. D. Kempe, E. A. Swyryd, M. Bruist and G. R. Stark, Cell 9, 541 (1976).
56. G. M. Wahl, R. A. Padgett and G. R. Stark, JBC 254, 8679 (1979).
57. C. Rossana, L. G. Rao and L. F. Johnson, MCBiol 2, 1118 (1978).
58. F. Baskin, S. C. Carlin, P. Kraus, M. Friedkin and R. N. Rosenberg, Mol. Pharmacol. 11, 105 (1975).
59. I. L. Andrulis, C. Duff, S. Evans-Blackler, R. Worton and L. Siminovitch, MCBiol 3, 391 (1983).
60. J. S. Gantt, C.-S. Chiang, W. G. Hatfield and S. M. Arfin, JBC 255, 5808 (1980).
61. M. Debatisse, M. Berry and G. Buttin, MCBiol 2, 1346 (1982).
62. P. A. Hoffee, S. W. Hunt III and J. Chiang, Somatic Cell Mol. Genet. 8, 465 (1982).
63. C.-Y. Yeung, L. G. Frayne, M. R. Al-Ubaidi, A. G. Hook, D. A. Wright and R. E. Kellems, JBC 258, 15179 (1983).
64. M. Meuth and H. Green, Cell 3, 367 (1974).
65. W. H. Lewis and J. A. Wright, J. Cell. Physiol. 97, 73 (1978).
66. C. R. Ashman and R. L. Davidson, MCBiol 1, 254 (1981).
67. L. Akerblom, A. Ehrenberg, A. Graslund, H. Lankinen, P. Reichard and L. Thelander, PNAS 78, 2159 (1981).
68. J. M. Cocking, P. N. Tonin, N. M. Stokoe, E. J. Wensing, W. H. Lewis and P. R. Srinivasan, Somatic Cell Mol. Genet. 13, 221 (1987).
69. D. J. Chin, K. L. Luskey, J. R. Faust, R. J. MacDonald, M. S. Brown and J. L. Goldstein, PNAS 79, 7704 (1982).
70. E. C. Hardeman, H.-S. Jenke and R. D. Simoni, PNAS 80, 1516 (1983).
71. L. R. Beach and R. D. Palmiter, PNAS 78, 2110 (1981).
72. G. G. Gick and K. S. McCarty, Sr., JBC 257, 9049 (1982).
73. B. D. Crawford, M. D. Enger, J. K. Griffith, J. L. Hanners, J. L. Longmire, A. C. Munk, R. L. Stallings, J. G. Tesmer, R. A. Walters and C. E. Hildebrand, MCBiol 5, 320 (1985).
73a. J. F. Ash, R. M. Fineman, T. Kalka, M. Morgan and B. Wire, J. Cell Biol. 99, 971 (1984).
74. B. P. Kopnin, Cytogenet. Cell Genet. 30, 11 (1981).
75. J. L. Biedler, in “Gene Amplification” (R. T. Schimke, ed.), pp. 39-45. CSHLab, Cold Spring Harbor, New York, 1982.
76. A. D. Lewis, I. D. Hickson, C. N. Robson, A. L. Harris, J. D. Hayes, S. A. Griffiths, M. M. Manson, A. E. Hall, J. E. Moss and C. R. Wolf, PNAS 85, 8511 (1988).
77. D. P. Suttle and G. R. Stark, JBC 254, 4602 (1983).
78. B. B. Levinson, B. Ullman and D. W. Martin, Jr., JBC 254, 4396 (1983).
79. A. P. Young and G. M. Ringold, JBC 258, 11260 (1983).
80. P. G. Sanders and R. H. Wilson, EMBO J. 3, 65 (1984).
81. P. S. Mamont, M. C. Duchesne, J. Grove and C. Tardif, Exp. Cell Res. 115, 387 (1978).
82. J. Choi and I. E. Scheffler, Somatic Cell Mol. Genet. 7, 219 (1981).
83. M. Debatisse, M. Berry and G. Buttin, MCBiol 2, 1346 (1982).
84. E. Huberman, C. K. McKeown and J. Friedman, PNAS 78, 3151 (1981).
85. B. A. Criscuolo and S. S. Krag, J. Cell Biol. 94, 586 (1982).
86. J. S. Gantt, C. A. Bennett and S. M. Arfin, PNAS 78, 5367 (1981).
87. T.-S. Su, H.-G. O. Bock, W. E. O’Brien and A. L. Beaudet, JBC 256, 11826 (1981).
88. I. B. Roninson, J. E. Chin, K. Choi, P. Gros, D. E. Housman, A. Fojo, D.-W. Shen, M. M. Gottesman and I. Pastan, PNAS 83, 4538 (1986).
89. J. Bourhis, L. J. Goldstein, G. Riou, I. Pastan, M. M. Gottesman and J. Benard, Cancer Res. 49, 5062 (1989).
90. C. R. Fairchild, S. P. Ivy, C.-S. Kao-Shan, J. Whang-Peng, N. Rosen, M. A. Israel, P. W. Melera, K. H. Cowan and M. E. Goldsmith, Cancer Res. 47, 5141 (1987).
91. A. M. Van der Bliek, F. Baas, T. Van der Velde-Koerts, J. L. Biedler, M. B. Meyers, R. F. Ozols, T. C. Hamilton, H. Joenje and P. Borst, Cancer Res. 48, 5927 (1988).
92. M. Raymond, E. Rose, D. E. Housman and P. Gros, MCBiol 10, 1642 (1990).
93. A. V. Gudkov, O. B. Chernova, A. R. Kazarov and B. P. Kopnin, Somatic Cell Mol. Genet. 13, 609 (1987).
94. R. Dalla Favera, F. Wong-Staal and R. C. Gallo, Nature 299, 61 (1982).
95. K. Alitalo, M. Schwab, C. C. Lin, H. E. Varmus and J. M. Bishop, PNAS 80, 1707 (1983).
96. M. Schwab, K. Alitalo, H. E. Varmus, J. M. Bishop and D. George, Nature 303, 497 (1983).
97. S. J. Collins and M. T. Groudine, PNAS 80, 4813 (1983).
98. G. Carloni, B. Champ, M.-J. Vilarem, C. Lavialle and R. Cassingena, FEBS Lett. 233, 268 (1988).
99. I. Garcia, P.-Y. Dietrich, M. Aapro, G. Vauthier, L. Vadas and E. Engel, Cancer Res. 49, 6675 (1989).
100. E. Bogenmann, H. Moghadam, Y. A. DeClerck and A. Mock, Cancer Res. 47, 3808 (1987).
101. P. E. Kiefer, G. Bepler, M. Kubasch and K. Havemann, Cancer Res. 47, 6236 (1987).
102. B. E. Johnson, R. W. Makuch, A. D. Simmons, A. F. Gazdar, D. Burch and A. W. Cashell, Cancer Res. 48, 5163 (1988).
103. M. M. Nau, B. J. Brooks, Jr., D. N. Carney, A. F. Gazdar, J. F. Battey, E. A. Sausville and J. D. Minna, PNAS 83, 1092 (1986).
104. A. J. Wong, J. M. Ruppert, J. Eggleston, S. R. Hamilton, S. B. Baylin and B. Vogelstein, Science 233, 461 (1986).
105. M. Shiraishi, M. Noguchi, Y. Shimosato and T. Sekiya, Cancer Res. 49, 6474 (1989).
106. H. Ohno, S. Fukuhara, Y. Arita, S. Doi, R. Takahashi, H. Fujii, T. Honjo, T. Sugiyama and H. Uchino, Cancer Res. 48, 4959 (1988).
107. B. K. Suchy, M. Sarafoff, R. Kerler and H. M. Rabes, Cancer Res. 49, 6781 (1989).
108. M. Fukumoto, D. H. Shevrin and I. B. Roninson, PNAS 83, 6846 (1988).
109. H. Kato, K. Okamura, Y. Kurosawa, T. Kishikawa and K. Hashimoto, FEBS Lett. 250, 529 (1989).
110. G. Brodeur, F. A. Hayes, A. A. Green, J. T. Casper, J. Wasson, S. Wallach and R. C. Seeger, Cancer Res. 47, 4248 (1987).
111. J. Filmus, J. M. Trent, R. Pullano and R. N. Buick, Cancer Res. 46, 5179 (1986).
112. W. Fukumoto, R. D. Estensen, L. Sha, G. J. Oakley, L. B. Twiggs, L. L. Adcock, L. F. Carson and I. B. Roninson, Cancer Res. 49, 1693 (1989).
113. L.-C. Wang, W. Vass, C. Gao and K. S. S. Chang, Cancer Res. 47, 4192 (1987).
114. M. Quintanilla, K. Brown, M. Ramsden and A. Balmain, Nature 322, 78 (1986).
115. U. Rovigatti, D. K. Watson and J. J. Yunis, Science 232, 398 (1986).
116. J.-B. Park, J. S. Rhim, S.-C. Park, S.-W. Kimm and M. H. Kraus, Cancer Res. 49, 6605 (1989).
117. M. S. Berger, G. W. Locher, S. Saurer, W. J. Gullick, M. D. Waterfield, B. Groner and N. E. Hynes, Cancer Res. 48, 1238 (1988).
118. M. Tal, M. Wetzler, Z. Josefberg, A. Deutch, M. Gutman, D. Assaf, R. Kris, Y. Shiloh, D. Givol and J. Schlessinger, Cancer Res. 48, 1517 (1988).
119. T. Yamamoto, N. Kamata, H. Kawano, S. Shimizu, T. Kuroki, K. Toyoshima, K. Rikimaru, N. Nomura, R. Ishizaki, I. Pastan, S. Gamou and N. Shimizu, Cancer Res. 46, 414 (1986).
120. M. C. Hollstein, A. M. Smits, C. Galiana, H. Yamasaki, J. L. Bos, A. Mandard, C. Partensky and R. Montesano, Cancer Res. 48, 5119 (1988).
121. J. Ro, S. M. North, G. E. Gallick, G. N. Hortobagyi, J. U. Gutterman and M. Blick, Cancer Res. 48, 161 (1988).
122. P. A. Humphrey, A. J. Wong, B. Vogelstein, H. S. Friedman, M. H. Werner, D. D. Bigner and S. H. Bigner, Cancer Res. 48, 2231 (1988).
123. L. T. Malden, U. Novak, A. H. Kaye and A. W. Burgess, Cancer Res. 48, 2711 (1988).
124. T. Tsuda, E. Tahara, G. Kajiyama, H. Sakamoto, M. Terada and T. Sugimura, Cancer Res. 49, 5505 (1989).
125. M. C. Yoshida, M. Wada, H. Satoh, T. Yoshida, H. Sakamoto, K. Miyagawa, J. Yokota, T. Koda, M. Kakinuma, T. Sugimura and M. Terada, PNAS 85, 4861 (1988).
126. G. M. Wahl, Cancer Res. 49, 1333 (1989).
127. G. L. Nicholson, Cancer Res. 47, 1473 (1987).
128. E. Otto, S. McCord and T. D. Tlsty, JBC 264, 3390 (1989).
129. T. D. Tlsty, B. H. Margolin and K. Lum, PNAS 86, 9441 (1989).
130. J. A. Wright, H. S. Smith, F. M. Watt, M. C. Hancock, D. L. Hudson and G. R. Stark, PNAS 87, 1791 (1990).
131. C. A. Prody, P. Dreyfus, R. Zamir, H. Zakut and H. Soreq, PNAS 86, 690 (1989).
132. D. M. Robins, S. Ripley, A. S. Henderson and R. Axel, Cell 23, 29 (1981).
133. B. Robert de Saint Vincent, S. Delbruck, W. Eckhart, J. Meinkoth, L. Vitto and G. Wahl, Cell 27, 267 (1981).
134. G. M. Wahl, B. Robert de Saint Vincent and M. L. DeRose, Nature 307, 516 (1984).
135. S. M. Carroll, P. Gaudray, M. L. DeRose, J. F. Emery, J. L. Meinkoth, E. Nakkim, M. Subler, D. D. von Hoff and G. M. Wahl, MCBiol 7, 1740 (1987).
136. D. L. George and U. Franke, Cytogenet. Cell Genet. 28, 217 (1980).
137. L. A. Quinn, G. E. Moore, R. T. Morgan and L. K. Woods, Cancer Res. 39, 4914 (1979).
138. G. Levan and A. Levan, in “Gene Amplification” (R. T. Schimke, ed.), pp. 91-97. CSHLab, Cold Spring Harbor, New York, 1982.
139. V. V. Sakonov, O. S. Kapitsa, M. B. Sapegina and S. I. Gorodetski, Genetika (Moscow) 21, 1974 (1985).
140. A. V. Gudkov and B. P. Kopnin, Sov. Sci. Res. D: Physioch. Biol. 7, 95 (1987).
141. M. Seabright, Lancet 2, 971 (1971).
142. B. J. Trask and J. L. Hamlin, Genes Dev. 3, 1013 (1989).
143. C. Fougere-Deschatrette, R. T. Schimke, D. Weil and M. C. Weiss, J. Cell Biol. W, 497 (1984).
144. W. F. Flintoff, M. K. Weber, C. R. Nagainis, A. K. Essani, D. Robertson and W. Salser, MCBiol 2, 275 (1982).
145. C. J. Bostock and A. T. Sumner, “The Eukaryotic Chromosome,” pp. 256-259. North-Holland Publ., Amsterdam, 1978.
DNA SEQUENCE AMPLIFICATION
237
J. L. Biedler and B. A. Spengler, JNCl57, 683 (1976). J. A. Lewis, J. L. Biedler and P. W. Melera, J . Cell B i d . 94, 418 (1982). D. Pinkel, T. Straume and J. W. Gray, PNAS 83, 2934 (1986). B. J. Trask, D. Pinkel and G. van den Engh, Genomics 5, 710 (1989). 150. C. J. Bostock and E. M . Clark, Cell 19, 709 (1980). 151. B. McClintock, Genetics 26, 234 (1941). 152. B. McClintcEk, Science 226, 792 (1984). 153. R. J. Kaufman and R. T. Schimke, MCBiol 1, 1069 (1981). 154. K . A. Smith, P. A. Gorman, M. B. Stark, R. Groves and G. R. Stark, Cell (in press). 155. B. A. Hamkalo, P. J. Farnham, R. Johnston and R. T. Schimke, PNAS 82, 1026 (1985). 156. P. E. Barker and T. C. Hsu, J N C l 6 2 , 257 (1979). 157. S. Takayama and Y. Uwaike, Chromosoma 97, 198 (1988). 158. R. J. Kaufman, P. C. Brown and R. T. Schimke, MCBiol 1, 1084 (1981). 159. J. L. Biedler, T.-D. Chang, R. H. F. Peterson, P. W. Melera, M. B. Meyers and B. A. Spengler, in “Rational Basis of Chemotherapy,” (B. A. Chabner, ed.), pp. 71-92. Liss, New York, 1983. 160. R. N. Johnston, S. M. Beverley and R. T. Schimke, PNAS I, 3711 (1983). 161. I. Roninson, NARes 11, 5413 (1983). 161a. M. Fukumoto, D. H. Shevrin and I. B. Roninson, PNAS 85, 6846 (1988). 162. I. B. Roninson, H. T. Abelson, D. E. Housman, N. Howell and A. Varshavsky, Nature 309, 626 (1984). 163. J. M. Croop, B. C. Guild, P. Gros and D. E. Housman, Cancer Res. 47, 5982 (1987). 164. J. E. Looney and J. L. Hamlin, MCBiol7, 569 (1987). 165. E. Giulotto, 1. Saito and G. R. Stark, EMBOJ. 5, 2115 (1986). 166. M. Ford and M. Fried, Cell 45, 425 (1986). 167. C. Passananti, B. Davies, M. Ford and M. Fried, E M B O J . 6, 1697 (1987). 168. J. Schilling, S. Beverley, C. Simonsen, G. Crouse, D. Setzer, J. Feagin, M. McGrogan, N. Kohlmiller and R. T.Schimke, in “Gene Amplification”(R. T. Schimke, ed.), pp. 149-153. CSHLab, Cold Spring Harbor, New York, 1982. 169. C . Tyler-Smith and C. J. Bostock, J M B 153, 237 (1981). 170. 0. Brison, F. Ardeshir and G. R. 
Stark, MCBiol2, 578 (1982). 171. A. M. Van der Bliek, T. Van der Velde-Koerts, V. Ling and P. Borst, MCBiol 6, 1671 (1986). 172. M. H. L. de Bruijn, A. M. Van der Bliek, J. L. Biedler and P. Borst, MCBiol 6, 4717 (1986). 173. 0. Hyrien, M. Debatisse, G. Buttin and B. Robert de Saint Vincent, EMBOJ. 6, 2401 (1987). 174. E. Legouy, N. Fossar, G. Lhomondand 0. Brison, SotnaticCellMol. Genet. 15,309(1989). 175. C. Ma, J. E. Looney, T.-H. Leu and J. L. Hamlin, MCBiol8, 2316 (1988). 176. J. C . Ruiz and G. M. Wahl, MCBiol8, 4302 (1988). 176a. M. W. Heartlein and S. A. Latt, NARes 17, 1697 (1989). 177. C. Ma, T.-H. Leu and J. L. Hamlin, MCBiol 10, 1338 (1990). 178. J. M. Roberts and R. Axel, Cell 29, 109 (1982). 179. J. M. Roberts, L. B. Buch and R. Axel, Cell 33, 53 (1983). 180. E. H. Blackburn, Cell 37, 7 (1984). 181, L. A. Chasin, L. Graf, N. Ellis, M. Landzberg and G. Urlaub, in “Gene Amplification”(R. T. Schimke, ed.), pp. 161-165. CSHLab, Cold Spring Harbor, New York, 1982. 182. M. L. Pall, PNAS 78, 2465 (1981). 183. A. Varshavsky, PNAS 78, 3673 (1981). 184. T. D. Tlsty, P. C. Brown and R. T.Schimke, MCBiol4, 1050 (1984). 146. 147. 148. 149.
238
JOYCE L. HAMLIN ET AL.
185. R. C. Sharma and R. T. Schimke, Cancer Res. 49, 3861 (1989). 186. P. C. Brown, T. D. Tlsty and R. T. Schimke, MCBiol3, 1097 (1983). 187. R. N. Johnston, J. Feder, A. B. Hill, S. W. Sherwood and R. T. Schimke, MCBiol6,3373
(1986). 188. T. Kleinberger, E. Sahar and S. Lavi, MCBfol9, 979 (1988). 189. G. C. Rice, C. Hoy and R. T. Schimke, PNAS 83, 5978 (1986). 189a. G. C. Rice, V. Ling and R. T. Schimke, PNAS 84,9261 (1987). 190. S. D. Young, R. S. Marshall and R. P. Hill, PNAS 85, 9533 (1988). 191. T. C. Lee, N. Tanaka, P. W. Lamb, T. M. Gilmer and J. C. Barrett, Science 241,79 (1988). 192. J. D. Schuetz, K. M. Gorse, I. D. Goldman and E. H. Westin, JBC 263, 7708 (1988). 193. A. Varshavsky, Cell 25, 561 (1981).
194. S. Lavi, PNAS 78, 6144 (1981). 195. B. D. Mariani and R. T. Schimke, JBC 259, 1901 (1984). 196. C. A. Hoy, G. C. Rice, M. Kovacs and R. T. Schimke, JBC 262, 11927 (1987). 197. J. A. Huberman and A. D. Riggs, J M B 32, 327 (1968). 198. P. Hahn, W. F. Morgan and R. B. Painter, Somatic Cell Mol. Genet. 13, 597 (1987). 199. D. C. Creasy and P. 0. P. Ts’o, Cancer Res. 43, 6298 (1988). 200. R. T. Schimke, JBC 263, 5989 (1988). 201. 0. Hyrien, M. Debatisse, G . Buttin and B. Robert de Saint Vincent, EMBO J . 7, 407 (1988). 202. J. Nalbantoglu and M. Meuth, NARes 14, 8361 (1986). 203. S. M. Carroll, M. L. DeRose, P. Gaudray, C. M. Moore, D. R. Needham-VanDevanter, D. D. von Hoff and G. M. Wahl, MCBiol8, 1525 (1988). 204. J. C. Ruiz, K. Choi, D. D. von Hoff, I. B. Roninson and G. M. Wahl, MCBiol9, 109 (1989). 205. D. D. von Hoff, D. R. Needham-VanDevanter, J. Yucel, B. E. Windle and G. M. Wahl, PNAS 85, 4804 (1988). 206. B. J. Maurer, E. Lai, B. A. Hamkalo, L. Hood and G . Attardi, Nature 327, 434 (1987). 207. P. Riva, C. De Giuli Morghen and L. Larizza, S m t i c Cell Mol. Genet. 15, 377 (1989). 208. R. M. Harshey and A. I. Bukhari, PNAS 78, 1090 (1981). 209. D. J. Galas and M. Chandler, PNAS 78, 4858 (1982). 210. E. Giulotto, C. Knights and G. R. Stark, Cell 48, 837 (1987). 211. M . Rolfe, C. Knights and G. R. Stark, Cancer Cells 6, 325 (1988). 212. Y. Berko-Flint, S . Karby, D. Hassin and S. Lavi, MCBiol 10, 75 (1990). 213. A. C. Y. Chang, J. H. Nunberg, R. J. Kaufman, H. A. Erlich, R. T. Schimke and S . N. Cohen, Nature 275, 617 (1978). 214. A. M. Carothers, G. Urlaub, N. Ellis and L. A. Chasin, NARes 11, 1997 (1983). 215. P. W. Melera, J. P. Davide, C. A. Hession and K. W. Scotto, MCBiol4, 38 (1984). 216. J. H. Nunberg, R. J. Kaufman, A. C. Y. Chang, S. N. Cohen and R. T. Schimke, Cell 19, 355 (1980). 217. G. F. Crouse, C. C. Simonsen, R. N. McEwan and R. T. Schimke, JBC 257,7887 (1982). 218. J. D. Milbrandt, J. C. Azizkhan, K. S. Greisen and J. L. 
Hamlin, MCBiol3, 1266 (1983). 219. A. M. Carothers, G. Urlaub, N. Ellis and L. A. Chasin, NARes 11, 1997 (1983). 220. P. W. Melera, C. A. Hession, J. P. Davide, K. W. Scotto, J. L. Biedler, M. B. Meyersand S. Shanske, JBC 257, 12939 (1982). 221. K. W. Kinder, S. H. Bigner, D. D. Bigner, J. M. Trent, M. L. Law, S. J. O’Brien, A. J. Wong and B. Vogelstein, Science 236, 70 (1987). 222. B. A. Zehnbauer, D. Small, G. M. Brodeur, R. Seeger and B. W. Vogelstein, MCBiol8, 522 (1988). 223. R. J. Kaufman and P. A. Sharp, J M B 159, 601 (1982).
DNA SEQUENCE AMPLIFICATION
239
224. C. S . Gasser, C. C. Simonsen, J. W. Schilling and R. T. Schimke, PNAS 79, 6522 (1982). 225. G. F. Crouse, R. N. McEwan and M. L. Pearson, MCBiol3, 257 (1983). 226. B. Robert de Saint Vincent, S. Delbruck, W. Eckhart, J. Meinkoth, L. Vitto and G. Wahl, Cell 27, 267 (1981). 227. R. J. Kaufman, P. Murtha, D. E. Ingolia, C.-Y. Yeung and R. E. Kellems, PNAS 83,3136 (1986). 228. R. Konig, G . Ashwell and J. A. Hanover, PNAS 86, 9188 (1989). 229. G. Urlaub and L. A. Chasin, PNAS 77, 4216 (1980). 230. G. M. Wahl, V. Allen, S. Delbruck, W. Eckhart, J. Meinkoth, R. Padgett, B. Robert de Saint Vincent, J. Rubnitz, G. Stark and L. Vitto, in “Gene Amplification”(R. T. Schimke, ed.), pp. 167-175. CSHLab, Cold Spring Harbor, New York. 231. J. A. Wright, T. G . Alam, G. A. McClarty, A. Y. Tagger and L. Thelander, Somutic Cell Mol. Genet. 13, 155 (1987). 232. J. M. Cocking, P. N. Tonin, N. M. Stokoe, E. J. Wensing, W. H. Lewis and P. R. Srinivasan, Somutic Cell Mol. Genet. 13, 221 (1987). 233. N. H. Heintz and J. L. Hamlin, PNAS 79, 4083 (1982). 234. N. H. Heintz, J. D. Milbrandt, K. S. Greisen and J. L. Hamlin, Nature 302, 439 (1983). 235. W. C. Burhans, J. E. Selegue and N. H. Heintz, Bchem 25, 441 (1986). 236. T.-H. Leu and J. L. Hamlin, MCBid 9, 523 (1989). 237. W. C. Burhans, L. T. Vassilev, M. S. Caddle, N. H. Heintz and M. L. DePamphilis, Cell 62, in press (1990). 238. R. T. Hay and M. L. DePamphilis, Cell 28, 767 (1982). 239. B. J. Brewer and W. L. Fangman, Cell 51, 463 (1987). 240. K. A. Nawotka and J. A. Huberman, MCBiol8, 1408 (1988). 241. J, P. Vaughn, P. A. Dijkwel and J. L. Hamlin, Cell 61, 1075 (1990).
This Page Intentionally Left Blank
Molecular-Biology Approaches to Genetic Defects of the Mammalian Nervous System
J. GREGOR SUTCLIFFE*,1 AND GABRIEL H. TRAVIS†
*Department of Molecular Biology, Research Institute of Scripps Clinic, La Jolla, California 92037
†Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, Texas 75235
I. Neural Mutants
II. The rds Gene
III. Secretogranin III
IV. Making Mutants
V. Getting All of the Genes
VI. Reprise
References
It is known from RNA complexity and cDNA cloning analyses that there are thousands of genes whose expression is exclusive to or highly enriched in the vertebrate central nervous system (1). A prerequisite for understanding neural activity is coming to terms with the functions of these many genes: How do their protein products act and interact to form the nervous system? One effective means for studying brain molecular function is the isolation and characterization of genes encoding neuroactive substances, their synthetic enzymes, their receptors, and molecules that share sequence relationships with these proteins. These studies amount to a direct extension of classical biochemical analysis and represent a major endeavor of modern molecular neurobiology. A limitation of this research is that it can only produce information about molecules for which there are already known functions, and hence assays. It is thus uninformative about a considerable portion of neural molecules for which functions have not already been postulated.
1 To whom correspondence may be addressed.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
Two alternative strategies that can be informative about novel brain proteins use complementary approaches. One involves the isolation and characterization of genes defined by neurological mutations, allowing direct correlations to be made between mutant neural phenotypes and protein structure. The second is the study of genes whose products are restricted to neural tissues, but whose functions are unknown. Recent advances in technology indicate that these approaches can be potentially as illuminating as more traditional studies on known brain proteins. Here, we discuss in detail examples of each of the two approaches. The eventual goal of all such studies is a broad understanding of the activities that are encoded by our genes.
I. Neural Mutants
Genetic defects that affect neural function in humans and in experimental animals have been identified. More than 100 mouse strains with heritable neurological deficits have been described. The mutations responsible for most of these phenotypes have been mapped in the genome (2). Descriptions have been made of the anatomical and physiological abnormalities that result from many of the mutations (3). A disproportionate number affect the formation of myelin or the development of the cerebellum or the retina, because these processes occur predominantly during postnatal mouse maturation and defects in the resulting structures are at least partially compatible with life. Furthermore, abnormalities in myelin or the cerebellum lead to easily observable phenotypes.
A. Gene Isolation by Chromosomal Position
How does one proceed from a mutation to a molecular understanding of the defective process? Two sorts of information are available. The first is the chromosomal location of the mutated locus, which can be used as starting information for isolation of the locus by physical methods. Advances in methods for establishing genetic linkage maps using restriction-fragment-length polymorphisms (RFLPs) have led to high-resolution genetic mapping for human mutations and even higher-resolution maps for mice, since recombinant inbred lines can be used to extend pedigrees. Because RFLPs are assayed by hybridization with DNA probes, the probe that detects the most tightly linked RFLP can be used to initiate saturation cloning of the surrounding region. Transcribed domains are then considered candidates for the sought locus, and each candidate gene is examined in detail. The recent isolations of the genes responsible for cystic fibrosis (4-6) and neurofibromatosis (7, 8) represent powerful illustrations of this strategy.
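The choice of the "most tightly linked" RFLP can be illustrated with a minimal sketch. It assumes that each marker has been scored for recombinants against the mutant locus in a set of informative meioses, and it estimates the recombination fraction simply as recombinants divided by meioses. The probe names and counts are invented for illustration and are not taken from any of the studies cited.

```python
from dataclasses import dataclass


@dataclass
class MarkerScore:
    """Recombination data for one RFLP probe (all values hypothetical)."""
    name: str
    recombinants: int  # meioses in which marker and mutant locus separated
    meioses: int       # informative meioses scored

    @property
    def recombination_fraction(self) -> float:
        return self.recombinants / self.meioses


def closest_marker(scores):
    """Return the marker with the smallest recombination fraction."""
    return min(scores, key=lambda s: s.recombination_fraction)


# Invented example panel: three probes scored in 100 meioses each.
panel = [
    MarkerScore("probe_A", 12, 100),
    MarkerScore("probe_B", 3, 100),
    MarkerScore("probe_C", 25, 100),
]
best = closest_marker(panel)
print(best.name, best.recombination_fraction)
```

In this toy panel the probe with the fewest recombinants would be the starting point for cloning the surrounding region; a real analysis would also weigh the statistical support for each linkage estimate.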
B. Candidate Protein Approach
The second sort of information that can be used to isolate the gene corresponding to a mutated locus is the phenotype of the mutant animal. Careful study of the components of the biological structure affected by the mutation can illuminate its molecular makeup, which then provides a list of candidate proteins whose structures can be compared among normal and affected individuals. Examples of this can be found for the structure of myelin, which is defective in several mouse neurological mutants (9, 10). The major proteins of myelin have been characterized, and the genes encoding them have been isolated. The structures of these genes and their transcripts in several mutant lines have been examined. The defect in the mutant shiverer is a deletion of part of the gene encoding myelin basic protein (11-13). The defect of jimpy mice is a point mutation abrogating one of the RNA splice junctions within the gene encoding myelin proteolipid protein (14). The structures of these proteins and their ultrastructural localizations have been used to interpret the specific pathologies of the mutant mice.
C. Candidate mRNA Approach
Information about mutant phenotypes can be used in a conceptually similar, but methodologically quite distinct, manner to determine the genes responsible for neural phenotypes. In the example below, phenotype was used to form a hypothesis about the particular cells in which a gene is expressed, and molecular technology was used to isolate clones of RNAs expressed specifically in those cells that served as candidates for deeper investigation. Conceptually, this approach represents a different way in which to describe the components of a biological structure.
II. The rds Gene
A. Mutant Phenotype
The "retinal-degeneration-slow" (rds) gene (15) of mice is located on chromosome 17. Mice homozygous for rds exhibit normal retinal development until the first postnatal week. At that time, photoreceptor neurons normally begin forming outer segments, the tightly packed arrays of closed membrane discs where phototransduction takes place. In the mutant retinas, only rudimentary outer segments lacking compact disc structures are formed (16, 17). The photoreceptor neurons appear otherwise normal until the third postnatal week, and are capable of phototransduction, albeit at a lower than normal efficiency (18). After 3 weeks, they gradually degenerate, such that cell loss is nearly complete by 1 year (19). Degeneration appears to be limited to photoreceptor neurons. Experiments with tetraparental mouse chimeras (20, 21) demonstrated that genotypically mutant photoreceptor neurons degenerated, even when surrounded by pigment epithelium that was genetically wild-type, suggesting that rds acts in photoreceptor neurons. Heterozygotes exhibit a mild phenotype, with short outer segments that contain irregularly arranged and vacuolated discs, and late-onset, slow cell loss (22). These observations suggested (23) that rds on chromosome 17 expresses an mRNA that is exclusive to photoreceptor neurons.
B. Isolation of Clones of Candidate mRNAs
To test this hypothesis, a highly sensitive and essentially quantitative method of subtractive hybridization (24), which used cloned driver and phenol-enhanced hybridization, was used for isolating clones of mRNAs present exclusively in photoreceptor neurons. To accomplish this, advantage was taken of retinal-degeneration (rd) mice, an unrelated genetically blind strain that lacks photoreceptor neurons. An RNA present in normal but not rd retina can be operationally considered photoreceptor-specific. Thus, when cloned rd retinal RNA sequences were subtracted from radioactive cDNA made from normal mouse retinal RNA, the resulting probe allowed isolation of candidate cDNA clones of retinal mRNAs, each of which was tested by Northern RNA blot hybridization to verify that it was a clone of a truly photoreceptor neuron-specific mRNA (23). The resulting group of clones corresponded to 12 distinct photoreceptor neuron-specific mRNAs. One of these mRNAs, represented by 70 independent isolates, encoded the known photoreceptor-specific molecule, opsin.
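The operational definition used here (present in normal retina, absent from rd retina) amounts to a set difference. The following is only a conceptual sketch of that logic, not a model of the hybridization chemistry: the transcript names, abundance values, and detection threshold are all invented for illustration.

```python
def photoreceptor_candidates(normal, rd, threshold=1.0):
    """Return transcripts detected in normal retina but not in rd retina.

    normal, rd: dicts mapping transcript name -> abundance (arbitrary units).
    threshold: hypothetical detection limit; anything below it counts as absent.
    """
    detected_normal = {t for t, level in normal.items() if level >= threshold}
    detected_rd = {t for t, level in rd.items() if level >= threshold}
    return sorted(detected_normal - detected_rd)


# Invented abundances: actin is present in both retinas and is subtracted away.
normal_retina = {"opsin": 900.0, "actin": 300.0, "clone_x": 15.0}
rd_retina = {"actin": 310.0, "clone_x": 0.2}

candidates = photoreceptor_candidates(normal_retina, rd_retina)
print(candidates)  # ['clone_x', 'opsin']
```

As in the experiments described above, membership in the subtracted set only nominates a candidate; each clone still has to be confirmed individually, for example by Northern blot hybridization.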
C. Identification of the rds Gene
1. CANDIDATE ON CHROMOSOME 17
Representative clones corresponding to each of the 12 photoreceptor-specific mRNAs were hybridized on Southern blots to restriction endonuclease digests of genomic DNA isolated from a panel of mouse × hamster somatic cell hybrid lines that carry various subsets of mouse chromosomes and, as controls, digests of genomic DNA from hamster and mouse. One clone detected the mouse-specific DNA fragments in the hybrid samples if, and only if, the hybrid line carried mouse chromosome 17 (23). Thus, it was a clone of a photoreceptor neuron-specific gene on chromosome 17 and, given the hypothesis, a candidate for rds.
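The "if, and only if" criterion for assigning a clone to a chromosome is a concordance test across the hybrid panel, and can be sketched as follows. The hybrid line names, their chromosome contents, and the hybridization results are hypothetical; only the logic mirrors the text.

```python
def concordant_chromosomes(panel, signal):
    """Chromosomes whose presence matches the hybridization signal in every line.

    panel:  dict mapping hybrid line -> set of mouse chromosomes it retains.
    signal: dict mapping hybrid line -> True if the mouse-specific fragment
            was detected on the Southern blot.
    """
    all_chromosomes = set().union(*panel.values())
    return sorted(
        chrom
        for chrom in all_chromosomes
        if all((chrom in retained) == signal[line]
               for line, retained in panel.items())
    )


# Invented panel of four mouse x hamster hybrid lines.
hybrid_panel = {
    "line_1": {"2", "17", "X"},
    "line_2": {"2", "7"},
    "line_3": {"17"},
    "line_4": {"7", "X"},
}
hybridization = {"line_1": True, "line_2": False, "line_3": True, "line_4": False}

assignment = concordant_chromosomes(hybrid_panel, hybridization)
print(assignment)  # ['17']
```

A panel is informative only if no two chromosomes share the same pattern of presence and absence across the lines; otherwise more than one chromosome is returned and additional hybrids are needed.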
2. mRNA SELECTIVELY AFFECTED BY rds MUTATION
The clone was used as a probe for Northern blot hybridization to mRNA extracted from retinas of wild-type and rds mutant mice, taken at 2 months, before full degeneration occurs. mRNAs of 1600 and 2700 nucleotides were detected in the wild-type samples. The corresponding mRNAs were of reduced abundance and approximately 10,000 nucleotides larger in the extracts from mutant retinas. When, as a control, the same blot was hybridized with an opsin clone, it was seen that opsin RNAs were present and of normal size; hence, the photoreceptors were still mostly intact in the rds mutant retinas used in this experiment. This suggested that the two mRNAs detected by the clone were preferentially affected by the rds mutation (23).
3. rds MUTATION CAUSED BY LARGE INSERTION
From the nucleotide sequence of full-length cDNA clones of these two mRNAs, an open reading frame that encodes a 346-amino-acid protein (translated in Fig. 1) was identified. The two RNAs detected in the Northern blot experiment are accounted for by alternate polyadenylation sites: The two mRNAs encode identical putative 346-residue proteins. Probes derived from the putative protein-coding region of the cDNA were used on Southern blots to examine restriction endonuclease digests of genomic DNA from wild-type and rds mutant mice. The probes detected additional or larger fragments in the mutant sample (23). Clones of the corresponding genes were isolated from wild-type and mutant mouse genomic DNA, and their nucleotide sequences in the protein-coding region were determined. The mutant gene sequence contained an insertion of approximately 10,000 nucleotides relative to the wild-type gene sequence. This insertion disrupts the protein-coding potential of the gene at amino acid 258, explaining the mechanism of the rds mutation.
D. The rds Product Is a Membrane-associated Glycoprotein
1. HIGH INTERSPECIES CONSERVATION
The putative 346-amino-acid sequence (Fig. 1) is novel. The sequence contains four uncharged regions and two potential sites for N-linked glycosylation, and thus predicts a membrane-associated glycoprotein of 39 kDa. The sequences of the rat (25), human (26), and bovine (27) homologs have recently been determined. The sequences (compared in Fig. 1) exhibit 85% mutual identity over their entire 346-residue lengths (26); the differences are largely conservative and preserve putative membrane-spanning regions and one of the sites for glycosylation. The least conserved region is the carboxyl-terminal tail.
2. DETECTION IN RETINAL EXTRACTS
To detect this putative protein, a series of peptides corresponding to nonoverlapping regions of the amino-acid sequence was chemically synthesized and coupled to an immunogenic carrier protein; the conjugates were used to elicit antisera in rabbits. The sera detected an approximately 39-kDa protein in Western immunoblots prepared with extracts from wild-type retina that was absent from extracts of rd mutant retina lacking photoreceptor neurons (28). The immunoreactivity was blocked by incubation of the antisera with the synthetic peptide immunogen.
FIG. 1. Cross-species comparison of rds proteins. The sequences in single-letter code of the rds proteins from mouse (m), rat (r), human (h), and bovine (b) are aligned; all are of identical length, except that the bovine sequence is deduced from a less-than-full-length clone. Residues conserved among all four species are flanked by light shading. The four putative membrane-spanning domains are flanked by heavy shading. The single conserved potential site for N-linked glycosylation is indicated by an asterisk. (Adapted from 26.)
3. BIOCHEMICAL CHARACTERIZATIONS
Treatment of the retinal extracts with endoglycosidase F (endo-β-N-acetylglucosaminidase F, EC 3.2.1.96), an enzyme that removes N-linked glycans, reduced the apparent size of the immunoreactive target protein to about 36 kDa, indicating that the rds protein is a glycoprotein, probably containing one glycan, as suspected from the primary sequence comparisons. When extracts were prepared under nonreducing conditions, the mobility of the protein was about 75 kDa, suggesting that it exists in vivo, covalently coupled to another protein, possibly as a homodimer (28). When outer segments were isolated and subjected to Triton X-114 phase separation, the rds protein was found to be restricted to the membrane fraction. Thus, rds encodes a 346-residue membrane glycoprotein that undergoes covalent interactions.
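The size arguments in this section can be checked with a rough calculation, and the search for potential N-linked glycosylation sites can be sketched as a pattern match for the standard Asn-X-Ser/Thr sequon (X not Pro). The average residue mass of 110 Da is a common rule of thumb and is an assumption here, as is the toy peptide, which is not the rds sequence.

```python
import re

AVERAGE_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def approx_mass_kda(n_residues):
    """Rough polypeptide mass in kDa from the residue count alone."""
    return n_residues * AVERAGE_RESIDUE_DA / 1000.0


def sequon_positions(protein):
    """1-based positions of Asn residues in N-X-S/T sequons, where X is not Pro."""
    # The lookahead lets overlapping sequons be reported as well.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein)]


# Values quoted in the text (kDa).
GLYCOSYLATED, DEGLYCOSYLATED, NONREDUCED = 39, 36, 75

polypeptide_estimate = approx_mass_kda(346)          # about 38 kDa
glycan_shift = GLYCOSYLATED - DEGLYCOSYLATED         # 3 kDa removed by endo F
dimer_expected = 2 * GLYCOSYLATED                    # 78 kDa, near the observed 75

toy_peptide = "MANGSLLNPTAKNVTQ"  # invented sequence for illustration only
print(sequon_positions(toy_peptide))  # [3, 13]
```

In the toy peptide the Asn at position 8 is skipped because it is followed by Pro. The arithmetic shows why a shift of about 3 kDa is read as the loss of a single glycan and why a 75-kDa species is compatible with, but does not prove, a homodimer.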
E. Histological Characterizations
1. EXPRESSION SPECIFICALLY IN RODS AND CONES
Analyses by in situ hybridization of retinas from normal mice showed the rds mRNA to be restricted to inner segments of photoreceptor neuron cell bodies (28); in mice, these are predominantly rod cells. When a mouse developmental series was examined by this method, initial detection was at postnatal day 5. In predegenerate retina from rds mice, hybridization was also observed over photoreceptor neuron cell bodies, although at lower intensity than that of normal retinas, suggesting that the large transcript of the mutant rds gene is less stable than that from the wild-type gene. The rds RNA was also detected by in situ hybridization in the human retinal fovea, which is a pure population of cone cells, suggesting that rds is expressed in both rod and cone photoreceptor neurons.
2. DETECTION IN OUTER SEGMENT DISCS
The antipeptide sera in thin sections of retina reacted almost exclusively with the photoreceptor outer segment layer (28), the structure selectively affected by the rds mutation. In a developmental series, initial detection was at day 6, when outer segments are first observed. At the electron-microscope level, immunogold histochemistry showed the rds protein to be associated with the laminated disc membranes, distributed along the entire width of the discs and along the length of the outer segments.
F. Model for rds Protein Function
The data show that the mouse rds gene encodes a photoreceptor neuron-specific, membrane-spanning glycoprotein that is localized specifically in outer segment discs. The rds mutation is the result of an approximately 10,000-nucleotide insertion within the gene encoding this membrane glycoprotein that alters its carboxyl terminus. Although its biochemical function has not been established, it is apparent that the rds protein must be an indispensable component of the outer segment, because the structure does not form in the mutant.
1. TOPOLOGICAL MODEL
From the features of the primary protein sequence preserved among species, and the biochemical observations confirming the hypotheses that the rds protein is N-glycosylated and membrane-associated, a formal model for its structure (Fig. 2) has been proposed (28), in which the protein is an integral membrane component with four membrane-spanning domains. Short amino- and long carboxyl-terminal tails are proposed to reside on the cytoplasmic side of the outer segment disc membrane. Two domains, one of which carries an N-linked glycan, protrude from the membrane face into the lumen of the disc. Twelve cysteine residues are conserved among the species, seven of which are located in the large lumenal domain. Some of the 12 cysteine residues are involved in covalent interactions with another protein. The apparent size of the rds protein under nonreducing conditions suggests that it may exist in vivo as a homodimer.
2. INTRADISCAL ADHESION MOLECULE
Given its location in disc membranes and probable topology, the rds protein has been proposed (28) to function as an adhesion molecule for compacting the retinal outer segment discs. Several observations support this hypothesis. In the retinas of rds homozygotes, opsin-containing vesicular structures bud from the ciliary projections of the inner segments, but fail to compact into the normally observed laminated array of discs (29-31). In heterozygotes, disc compaction occurs but is not complete, as vacuolated gaps appear (22). These phenotypes are consistent with absence of an adhesion molecule in homozygotes and reduction in its amount due to haploinsufficiency in the heterozygotes. The phenotype of degeneration would hence be secondary to the proposed defect in adhesion. In the presence of tunicamycin, an antibiotic that inhibits protein glycosylation, explanted retinas exhibit a phenotype similar to that observed in rds mutants (32), suggesting that glycosylation, presumably of the rds protein, is necessary for compaction. This implicates the large intralumenal domain as important to the adhesive process. The sequence of this domain is highly conserved among species. Whether adhesion is mediated through homotypic or heterotypic interactions and whether covalent bonds are utilized for adhesion remain to be demonstrated. Some monoclonal antibodies directed to the carboxyl terminus of the bovine rds homolog react preferentially with the rims of the outer segment discs (27). Polyclonal reagents against the carboxyl terminus of the mouse rds protein react with antigen distributed across the entire width of the discs (28). These observations suggest that the cytoplasmic tail of the rds protein exists in conformationally distinct forms in the rim and the interior, with the monoclonal reagents detecting only the rim conformer. These different conformers may occur as a result of protein-protein interactions formed by molecules located in the interior.
FIG. 2. Structural model for the rds protein within the disc membrane. (A) The four putative membrane-spanning domains are labeled M1-M4. This orientation places the amino- and carboxyl-terminal domains (N and C, respectively), as well as a third domain (C1-C3), within the cytoplasm and two loops (D1 and D2), of which D2 contains the site of glycosylation (GLY), within the extracellular (intradiscal) space. (B) Functional model for the rds protein as an adhesion molecule responsible for the stabilization of outer segment discs. The proposed function of the rds protein is to maintain opposing faces of the outer segment discs in close apposition through homophilic interactions across the disc space. These attractive interactions may involve glycans and/or cysteine residues within the conserved D2 loops. DM, Disc membrane; PM, plasma membrane. (Adapted from 28.)
G. Perspective
These studies illustrate how one can use new technologies to proceed from a hypothesis about selective expression of a gene based on the phenotype of a mutant to detailed molecular, biochemical, and cell-biological understanding of the normal gene product. These data, along with information about the cellular phenotype of the mutant, can quickly lead to an explicit model for the function of the protein and an explanation for the defect in the mutant. The example discussed below illustrates how one can proceed from biochemical description of a gene whose product is of unknown function to the production and analysis of mice in which the gene is ablated. Again, it is the integration of information about phenotype with biochemical and sequence data that works heuristically to produce models for neural protein function.
III. Secretogranin III
A. Isolation of cDNA Clones by Random Selection
Because a large percentage of the mammalian genome is dedicated to neuronal function, a significant fraction of the total brain mRNA mass (approximately 25%) is composed of transcripts present only in the brain. For this reason, it is possible to isolate cDNA clones of brain-specific mRNAs by random picking from an unselected brain cDNA library (33). Subtractive hybridization schemes are useful for isolating clones of RNAs expressed with some regional anatomic specificity within the brain (24, 34-36). The challenge is to attribute functions to genes whose products are identified initially only by their sites of expression.
A clone of one RNA, named 1B1075, was isolated randomly from a rat brain cDNA library (37). The clone hybridized on Northern RNA blots to mRNA from brain and pituitary gland, but not from other tissues.
B. Detection of the Protein Product
1. SIMILARITY TO CHROMOGRANINS/SECRETOGRANINS
The nucleotide sequence of the 1B1075 cDNA clone contains an open reading frame encoding a protein of 533 amino acids. The deduced protein sequence (Fig. 3) contains an apparent signal sequence and is quite acidic, although it also contains many sets of tandem basic residues (37). Although novel in computer data-base searches, the sequence resembles sequences of members of the “chromogranin/secretogranin” family of secretory vesicle proteins, each of which is found in a subset of neuroendocrine tissues. A sequence motif characteristic of chromogranins/secretogranins (38) is present in the 1B1075 sequence.
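The composition statements above (an acidic protein that nonetheless carries tandem basic residues) can be checked on any stretch of deduced sequence. The following is a minimal sketch in Python, not the authors' analysis. It assumes that only K and R count as basic and only D and E as acidic, and it uses residues 451-500 as printed in Fig. 3 purely as an illustrative fragment; a single fragment need not reflect the net charge of the whole protein. The function names are hypothetical.

```python
import re

# Residues 451-500 of the deduced 1B1075 sequence, copied from Fig. 3.
FRAGMENT = "KTEAYLEAIRKNIEWLKKHNKKGNKEDYDLSKMRDFINQQADAYVEKGIL"

BASIC = set("KR")   # assumption: lysine and arginine only
ACIDIC = set("DE")  # assumption: aspartate and glutamate only


def charge_composition(seq):
    """Return (number of basic residues, number of acidic residues)."""
    basic = sum(1 for aa in seq if aa in BASIC)
    acidic = sum(1 for aa in seq if aa in ACIDIC)
    return basic, acidic


def tandem_basic_runs(seq):
    """Return (0-based offset, run) for every run of two or more K/R residues."""
    return [(m.start(), m.group()) for m in re.finditer(r"[KR]{2,}", seq)]


basic, acidic = charge_composition(FRAGMENT)
runs = tandem_basic_runs(FRAGMENT)
print(basic, acidic, runs)
```

Applied to the full 533-residue sequence, the same two functions would give the overall acidic/basic balance and the positions of all tandem basic sites.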
2. PRESENCE IN LARGE PROJECTION NEURONS
To detect the putative protein, a series of nonoverlapping peptides corresponding to the deduced amino-acid sequence was synthesized and used to raise antisera in rabbits. The resulting antipeptide sera were used to examine tissue protein extracts by Western immunoblotting. Each detected a 57-kDa, brain-specific protein, and the immunoreactivity could be blocked by incubation of the antisera with the corresponding synthetic peptide (37). These observations, particularly the coincidence of reactivity with antisera against a series of nonoverlapping fragments of the sequence, provided strong evidence that the 57-kDa protein is the product of the 1B1075 mRNA. In situ hybridization analysis with probes derived from the cDNA clone and immunocytochemical analysis with the antipeptide antibodies both detected large projection neurons throughout the central nervous system, especially enriched in cortical structures such as the cerebral cortex, hippocampus, and cerebellum, although present in several subcortical nuclei (37). Much of the immunoreactivity was associated with fibers, both axons and, more rarely, dendrites, projecting from the cell bodies detected by in situ hybridization and immunohistochemistry.
3. ASSOCIATION WITH VESICLES
The antipeptide sera were used to study the subcellular distribution of the 1B1075 protein at the electron-microscope level (37). Immunoreactivity was associated with small, rounded organelles resembling synaptic vesicles. In axons, these were most frequently observed in presynaptic boutons, clustered immediately adjacent to synaptic specializations. In apical dendrites, immunoreactive vesicles resembled vesicles of the smooth endoplasmic reticulum. Within the anterior pituitary gland, serial-section immunocytochemistry showed that immunoreactivity was in vesicles of adrenocorticotropic hormone (ACTH)-producing corticotropic cells. Based on the sequence similarities with members of the chromogranin/secretogranin family of proteins and the location of the 1B1075 protein in vesicles, it has been provisionally named secretogranin III (37).

  1 MPLPSCTQAP VPSSFLLPFS FITKATAQGP SARLFPAERR GARKNGVPLD
 51 RLLDTGVGAQ QRPNSSFPKP EGSQDKSLHN RELSAERPLN EQIAEAEADK
101 IKKTYPSESK PSERNFSSVD NLNLLKAITE KETVEKAKQS IRSSPFDNRL
151 NVDDADSTKN RKLTDEYDST KSGLDRKVQD DPDGLHQLDG TPLTAEDIVH
201 KIATRIYEEN DRGVFDKIVS KLLNLGLITE SQAHTLEDEV AEALQKLISK
251 EANNYEEAPE KPTSRTENQD GKIPEKVTPV AATODGFTNR ENDDTVSNTL
301 TLSNGLERRT NPHRDDDFEE LQYFPNFULT SIDSEKEAKE KETLITIM
351 KTLIDFVKMM VKYGTISPEE GVSYLENLDE TIALQTKNKL EKNTTDSKSK
401 LFPAPPEKSH EETDSTKEEA AKMEKEYGSL KDSTKDDNSN LGGKTDEAKG
451 KTEAYLEAIR KNIEWLKKHN KKGNKEDYDL SKMRDFINQQ ADAYVEKGIL
501 IRKKPTPSNA STAACENGRQ PEPSNCSSKN NIA

FIG. 3. Predicted amino-acid sequence of the secretogranin-III protein. The predicted signal sequence is indicated in italics. Basic (+) and acidic (-) residues are indicated; the peptides used to elicit antisera are underlined. The sequence similar to members of the chromogranin/secretogranin family (i.e., residues 369-392) is shown in italics. (Adapted from 37.)
C. Production of a Secretogranin-III Mutant Mouse
1. MAPPING ON CHROMOSOME 9
Southern blot analyses demonstrated that there are single genes encoding secretogranin III in both rats and mice. The mouse homolog of the gene has been located, by RFLP analysis of recombinant inbred mice, on chromosome 9, near the dil locus (39). This region of chromosome 9 is one of a small number of mammalian chromosome segments for which large numbers of deletion chromosomes exist in maintained mouse lines (40). Because there are several essential genes located in this segment, most of the deletions must be maintained in mice in the heterozygous state. DNA from mice that heterozygously carry each of the chromosome-9 deletions was cleaved with restriction endonucleases and examined by Southern blot hybridization with a full-length secretogranin-III cDNA clone. An RFLP distinguished the chromosome carrying the deletion from the nondeleted chromosome 9 of the diploid set. It was determined for each of the heterozygotes whether it contained one or two secretogranin-III genes: If there were only one gene, the deletion covered the locus. Several deletions covering the secretogranin-III gene were identified (41).
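The dosage test described above, together with the idea that the gene must lie in the segment shared by all covering deletions, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure used in (41): the deletion names, breakpoint coordinates, and copy-number scores are all invented, and breakpoints are assumed to be expressible as positions on a single linear map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    name: str
    start: float      # proximal breakpoint, hypothetical map units
    end: float        # distal breakpoint, hypothetical map units
    gene_copies: int  # gene copies scored in the heterozygote (1 or 2)


def covering_deletions(deletions):
    """A heterozygote with one remaining copy carries a deletion that covers the locus."""
    return [d for d in deletions if d.gene_copies == 1]


def minimum_overlap(deletions):
    """Intersect the deleted intervals; return None if they share no segment."""
    if not deletions:
        return None
    start = max(d.start for d in deletions)
    end = min(d.end for d in deletions)
    return (start, end) if start < end else None


# Hypothetical deletion lines; names and coordinates are invented for illustration.
panel = [
    Deletion("del-A", 10.0, 30.0, gene_copies=1),
    Deletion("del-B", 18.0, 40.0, gene_copies=1),
    Deletion("del-C", 32.0, 45.0, gene_copies=2),
    Deletion("del-D", 5.0, 22.0, gene_copies=1),
]

covering = covering_deletions(panel)
print([d.name for d in covering])   # deletions that remove the gene
print(minimum_overlap(covering))    # segment expected to contain the gene
```

Any two covering deletions chosen from such a panel could in principle be combined to give progeny lacking the shared segment on both chromosomes.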
2. MICE ABLATED FOR THE SECRETOGRANIN-III GENE
The minimum region of overlap between these deletions was expected to contain the secretogranin-III gene. To test this hypothesis, mice heterozygous for two of the smaller deletions covering the gene were mated. Progeny receiving a deletion chromosome from each parent were identified by cosegregation of linked markers. DNA from the double-deletion progeny and from nondeleted littermates was isolated and examined by Southern blotting with a full-length secretogranin-III cDNA clone. No sequences hybridizing to the clone were detected in the DNA from the double-deletion mice, indicating that the secretogranin-III gene is absent from both chromosomes (41). Thus, these progeny are null mutants for the secretogranin-III gene.
3. SUBTLE PHENOTYPES
Observations of these mutant animals indicate that, even though they lack this expressed gene, they are overtly normal, exhibiting no abnormalities of behavior or impairment of movement. They are not particularly vigorous breeders, but both males and females are fertile. If these mice have defects, study of the normal sites of secretogranin-III expression in the mutants should be informative. Preliminary studies suggest that the number of vesicles in some normally expressing cell types may be reduced, possibly indicating that a secretogranin-III deficit influences the rate of formation or stability of certain intracellular vesicles. Abnormalities in corticotrope-mediated stress responses have been anecdotally noted. As these observations suggest testable hypotheses, the mouse model should enable one to gain insights into the function of secretogranin III in normal physiology. It may also shed light on functions of the other members of this protein family that, to date, have been defined primarily by their structural features.
IV. Making Mutants
Clones of mRNAs with interesting distributions that encode proteins with provocative structures are plentiful; subtractive hybridization has the power to provide clones of most of the RNAs whose expression is limited to anatomically defined subsets of neurons that can be dissected. To interpret the physiological significance of each of these, it is, in most cases, necessary to produce mutants. However, convenient pairs of deletions, as used to produce the secretogranin-111 mutant, will not be available for most genes. Direct methods for interfering with the expression of particular genes are required. The available technologies include the use of homologous recombination in pluripotent embryo stem cells to introduce site-specific mutations into the mouse germ line (42); the use of transgenically supplied antisense RNA to inhibit the processing, transport, and translation of endogenous RNA molecules (43, 44); and the transgenic introduction of ribozymes (site-specific ribonucleases) into mice, in which they might specifically cleave and inactivate particular transcripts (45,46). Strategies of these sorts have the power to produce mutants in neural genes. Analysis of such mutants would be informed if the normal sites of protein expression were determined in advance. Their phenotypes should have heuristic value for recognizing the normal functions of these brain genes in physiology and behavior. The methods can also be applied to produce animal models for human genetic diseases. It is an open question as to what percentage of genes will lead to observable phenotypes. This is in part because of the probable redundancy built into most biological systems, and in part because of our ignorance of mouse ethnography: In other words, would we recognize abnormal mouse behavior if we saw it?
V. Getting All of the Genes
A. Thousands of Genes
How can we get our hands on all central nervous system genes or, for that matter, all genes? During the 1970s, studies of RNA complexity suggested that between 10,000 and 150,000 genes are expressed in the mammalian brain (1, 47-49). Studies with clone frequency analysis (33) suggest that at least 5000, and probably close to but not more than 30,000, genes are expressed as mRNAs in the brain, and that expression of a large portion of these is specific to or highly enriched in the nervous system. Recent studies have the potential to address the issue of the number of genes directly, while at the same time providing clones of RNAs in a less biased way than is possible by subtractive hybridization, which must always be model-based, since the investigator must choose a particular tissue to serve as a subtraction driver.
B. Nearly Complete Chromosome-Specific Genomic Libraries
Clones of human and mouse chromosome-specific genomic fragments have been isolated from cultured human × hamster or mouse × hamster somatic cell hybrid lines using the species-specific hybridization of repetitive genomic elements to detect clones (50, 51). Hybridization conditions have been developed that, in the presence of excess unlabeled hamster DNA, allow radioactively labeled mouse repetitive DNA to be used to detect 98% of mouse genomic clones and no hamster clones; thus, chromosome-specific mouse clones can be isolated nearly quantitatively with essentially no hamster background (52). These conditions have been used to screen a genomic library prepared in a phage λ vector with DNA from a mouse-hamster somatic cell line containing only mouse chromosome 16. Two genome equivalents of chromosome 16 (14,000 clones) were placed in microtiter trays. These clones, which contained sequences from nearly all chromosome-16 genes, could be arrayed onto only 50 filters.
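The library figures above lend themselves to a quick coverage check. The sketch below is a hedged illustration, not a calculation from (52): it applies the standard Clarke-Carbon sampling formula, P = 1 - (1 - 1/n)^N, which is brought in here as an assumption and presumes that clones are drawn at random and represent all sequences equally. Only the clone count, the number of genome equivalents, and the number of filters come from the text; the function names are hypothetical.

```python
def clones_per_genome_equivalent(total_clones, genome_equivalents):
    """Number of clones whose inserts add up to one copy of the chromosome."""
    return total_clones / genome_equivalents


def probability_represented(total_clones, clones_per_equivalent):
    """Clarke-Carbon estimate: P = 1 - (1 - 1/n)^N for N random clones."""
    return 1.0 - (1.0 - 1.0 / clones_per_equivalent) ** total_clones


TOTAL_CLONES = 14_000      # from the text
GENOME_EQUIVALENTS = 2     # from the text
FILTERS = 50               # from the text

n = clones_per_genome_equivalent(TOTAL_CLONES, GENOME_EQUIVALENTS)
p = probability_represented(TOTAL_CLONES, n)
clones_per_filter = TOTAL_CLONES // FILTERS

print(f"{n:.0f} clones per chromosome equivalent")
print(f"P(a given point is cloned) = {p:.3f}")
print(f"{clones_per_filter} clones per filter")
```

Under these assumptions the value of P applies to a single point in the chromosome. Because a gene spans many points and only part of it needs to be cloned for it to be detected, the chance that a gene is represented by at least one clone should be higher, which is in line with the statement that nearly all chromosome-16 genes are present.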
C. Detection of Transcribed Genes
The 50 filters could be screened with radioactive cDNA probes, from which repetitive sequences had been removed by hybridization to immobilized mouse repetitive DNA, to identify expressed genes. In pilot studies (52), 145 chromosome-16 genes expressed in the brain or the liver were detected. The level of detection in these experiments was roughly 0.01% of the mRNA mass. Since chromosome 16 represents 4% of the total mouse genome, the 145 genes extrapolate to approximately 4000 genes in the genome expressed at 0.01% or greater of the mRNA mass. This is already approaching the lower estimates for the number of genes. Probes enriched by subtractive hybridization should allow at least another 10-fold increase in sensitivity and provide the genes known to exist that are expressed as only 0.001% of the mRNA mass. It should be possible, by hybridization to such an array with cDNAs made from various tissues, to learn in which tissues each detectable gene, already in hand as a genomic clone, is expressed. Once identified, the genes can be put on the genetic map by RFLP methods, and local physical maps can be made. Such studies will provide nearly complete sets of candidate genes to be considered whenever a new genetic defect is encountered. They will also confront us directly with the daunting task of finding out the physiological role of each of these thousands of genes.
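The extrapolation in this section is simple proportional scaling, and it can be written out explicitly. The following Python sketch only restates the arithmetic; it assumes, as the text implicitly does, that expressed genes are spread across chromosomes in proportion to chromosome size. The constant and function names are hypothetical.

```python
from fractions import Fraction


def extrapolate_to_genome(genes_detected, chromosome_share):
    """Scale a per-chromosome gene count up to the whole genome."""
    return genes_detected / chromosome_share


GENES_ON_CHR16 = 145               # from the pilot study cited in the text
CHR16_SHARE = Fraction(4, 100)     # chromosome 16 as a share of the genome
THRESHOLD = Fraction(1, 10_000)    # 0.01% of the mRNA mass
SENSITIVITY_GAIN = 10              # expected from subtracted probes

genome_estimate = extrapolate_to_genome(GENES_ON_CHR16, CHR16_SHARE)
improved_threshold = THRESHOLD / SENSITIVITY_GAIN

print(int(genome_estimate))                  # genes at or above the threshold
print(float(improved_threshold) * 100)       # new threshold, percent of mRNA mass
```

The scaled count lands a little under the rounded figure of 4000 quoted above and below the lower bound of 5000 from clone frequency analysis, which is what one would expect if many genes are expressed below the 0.01% threshold.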
VI. Reprise
One of the major lessons from classical prokaryotic molecular biology studies is that mutants can be used to help elucidate the molecular processes underlying biological phenomena. The same is true for eukaryotic systems: The problems, such as how the mammalian nervous system works, are more complex, but available technology is much more potent. Until recently, the study of mutations affecting mammalian organisms has been restricted to those that nature offered, either as chance or mutagen-induced abnormalities. To be recognized as abnormal, a mutant animal had to manifest an overt phenotype, the exception being humans, to whose neurobehavioral abnormalities we are more sensitive. Prior to the advent of approaches based on recombinant DNA, elucidating the molecular basis for a genetic defect was essentially limited to the candidate protein strategy: defining the components of an affected tissue and searching for variant structures between mutant and wild-type. Now it has become possible to isolate a mutant locus based on its chromosomal position. In approaches discussed in this article, it is also possible to use subtractive hybridization to define RNAs that are exclusive to a particular tissue and to treat these as candidates for the product of a mutant gene. Alternatively, it is possible to identify the expressed genes on a particular chromosome and to determine in which tissues each is expressed. These genes then form a menu of candidates for any mutation that maps to that chromosome. Finally, there are neither known functions nor mutants for most genes. There are, however, methods for identifying and describing their protein products. For these genes, ablation experiments are required to illuminate their contributions to normal physiology.
ACKNOWLEDGMENTS
We thank our colleagues Elena Battenberg, Floyd Bloom, Dean Bok, Miles Brennan, Frank Burton, Patria Danielson, Dan Gerendasy, Martin Godbout, Karl Hasel, Ute Hochgeschwender, Dave Kingsley, Rob Milner, Hans-Peter Ottiger, and Ken Wong, whose studies we have reviewed here, and the National Institutes of Health for research support.
REFERENCES
1. J. G. Sutcliffe, Annu. Rev. Neurosci. 11, 157 (1988).
2. R. L. Sidman, in “Genetics of Neurological and Psychiatric Disorders” (S. S. Kety, L. P. Rowland, R. L. Sidman and S. W. Matthysse, eds.), p. 19. Raven, New York, 1983.
3. J. L. Noebels, Trends NeuroSci. 8, 827 (1985).
4. J. M. Rommens, M. C. Iannuzzi, B.-S. Kerem, M. L. Drumm, G. Melmer, M. Dean, R. Rozmahel, J. L. Cole, D. Kennedy, N. Hidaka, M. Zsiga, M. Buchwald, J. R. Riordan, L.-C. Tsui and F. S. Collins, Science 245, 1059 (1989).
5. J. R. Riordan, J. M. Rommens, B.-S. Kerem, N. Alon, R. Rozmahel, Z. Grzelczak, J. Zielenski, S. Lok, N. Plavsic, J.-L. Chou, M. L. Drumm, M. C. Iannuzzi, F. S. Collins and L.-C. Tsui, Science 245, 1066 (1989).
6. B.-S. Kerem, J. M. Rommens, J. A. Buchanan, D. Markiewicz, T. K. Cox, A. Chakravarti, M. Buchwald and L.-C. Tsui, Science 245, 1073 (1989).
7. M. R. Wallace, D. A. Marchuk, L. B. Andersen, R. Letcher, H. M. Odeh, A. M. Saulino, J. W. Fountain, A. Brereton, J. Nicholson, A. L. Mitchell, B. H. Brownstein and F. S. Collins, Science 249, 181 (1990).
8. R. M. Cawthon, R. Weiss, G. Xu, D. Viskochil, M. Culver, J. Stevens, M. Robertson, D. Dunn, R. Gesteland, P. O’Connell and R. White, Cell 62, 193 (1990).
9. J. G. Sutcliffe, Trends Genet. 3, 73 (1987).
10. G. Lemke, Neuron 1, 535 (1988).
11. A. Roach, N. Takahashi, D. Pravtcheva, F. Ruddle and L. Hood, Cell 42, 149 (1985).
12. M. Kimura, H. Inoko, M. Katsuki, A. Ando, T. Sato, T. Hirose, H. Takashima, S. Inayama, H. Okano, K. Takamatsu, K. Mikoshiba, Y. Tsukada and I. Watanabe, J. Neurochem. 44, 692 (1985).
13. S. M. Molineaux, H. Engh, F. De Ferra, L. Hudson and R. A. Lazzarini, PNAS 83, 7542 (1986).
14. K.-A. Nave, F. E. Bloom and R. J. Milner, J. Neurochem. 49, 1873 (1987).
15. R. Van Nie, D. Ivanyi and P. Demant, Tissue Antigens 12, 106 (1978).
16. S. Sanyal and H. Jansen, Neurosci. Lett. 21, 23 (1981).
17. H. G. Jansen and S. Sanyal, J. Comp. Neurol. 224, 71 (1984).
18. A. I. Cohen, Invest. Ophthalmol. Visual Sci. 24, 832 (1983).
19. S. Sanyal, G. Chader and G. Aguirre, in “Retinal Degeneration: Experimental and Clinical Studies,” p. 239. Liss, New York, 1985.
20. S. Sanyal and G. H. Zeilmaker, Exp. Eye Res. 39, 231 (1984).
21. S. Sanyal, C. Dees and G. H. Zeilmaker, J. Embryol. Exp. Morphol. 98, 111 (1986).
22. R. K. Hawkins, H. G. Jansen and S. Sanyal, Exp. Eye Res. 41, 701 (1985).
23. G. H. Travis, M. B. Brennan, P. E. Danielson, C. A. Kozak and J. G. Sutcliffe, Nature 338, 70 (1989).
24. G. H. Travis and J. G. Sutcliffe, PNAS 85, 1696 (1988).
25. C. Begy and C. D. Bridges, NARes 18, 3058 (1990).
26. G. H. Travis, L. Christerson, P. E. Danielson, I. Klisak, R. S. Sparkes, T. P. Dryja and J. G. Sutcliffe, Genomics, in press (1991).
27. G. Connell and R. S. Molday, Bchem 29, 4691 (1990).
28. G. H. Travis, J. G. Sutcliffe and D. Bok, Neuron 6, 61 (1991).
29. I. Nir and D. S. Papermaster, Invest. Ophthalmol. Visual Sci. 27, 836 (1986).
30. J. Usukura and D. Bok, Exp. Eye Res. 45, 501 (1987).
31. H. G. Jansen, S. Sanyal, W. J. DeGrip and J. J. Schalken, Exp. Eye Res. 44, 347 (1987).
32. S. J. Fliesler, M. E. Rayborn and J. G. Hollyfield, J. Cell Biol. 100, 574 (1985).
33. R. J. Milner and J. G. Sutcliffe, NARes 11, 5497 (1983).
34. J. B. Watson, E. F. Battenberg, K. K. Wong, F. E. Bloom and J. G. Sutcliffe, J. Neurosci. Res. 26, 397 (1990).
35. J. Bernal, M. Godbout, K. W. Hasel, G. H. Travis and J. G. Sutcliffe, J. Neurosci. Res. 27, 153 (1990).
36. D. R. Clayton, M. E. Heucas, E. Y. Sinclair-Thompson, K. L. Nastiuk and F. Nottebohm, Neuron 1, 249 (1988).
37. H.-P. Ottiger, E. F. Battenberg, A.-P. Tsou, F. E. Bloom and J. G. Sutcliffe, J. Neurosci. 10, 3135 (1990).
38. H.-H. Gerdes, P. Rosa, E. Phillips, P. A. Baeuerle, R. Frank, P. Argos and W. B. Huttner, JBC 264, 12009 (1989).
39. C. Blatt, L. Weiner, J. G. Sutcliffe, M. N. Nesbitt and M. I. Simon, Cytogenet. Cell Genet. 40, 583 (1985).
40. L. B. Russell, Mutat. Res. 11, 107 (1971).
41. D. M. Kingsley, E. M. Rinchik, L. B. Russell, H.-P. Ottiger, J. G. Sutcliffe, N. G. Copeland and N. A. Jenkins, EMBO J. 9, 395 (1990).
42. M. R. Capecchi, Trends Genet. 5, 70 (1989).
43. J. G. Izant and H. Weintraub, Cell 36, 1007 (1984).
44. M. Katsuki, M. Sato, M. Kimura, M. Yokoyama, K. Kobayashi and T. Nomura, Science 241, 593 (1988).
45. J. Haseloff and W. L. Gerlach, Nature 334, 585 (1988).
46. N. Sarver, E. M. Cantin, P. S. Chang, J. A. Zaia, P. A. Ladne, D. A. Stephens and J. J. Rossi, Science 247, 1222 (1990).
47. N. D. Hastie and J. O. Bishop, Cell 9, 961 (1976).
48. J. A. Bantle and W. E. Hahn, Cell 8, 139 (1976).
49. D. M. Chikaraishi, Bchem 18, 3249 (1979).
50. J. F. Gusella, C. Keys, A. Varsanyi-Breiner, F.-T. Kao, C. Jones, T. T. Puck and D. Housman, PNAS 77, 2829 (1980).
51. M. Kasahara, F. Figueroa and J. Klein, PNAS 84, 3325 (1987).
52. U. Hochgeschwender, J. G. Sutcliffe and M. B. Brennan, PNAS 86, 8482 (1989).
Lens Proteins and Their Genes
HANS BLOEMENDAL AND WILFRIED W. DE JONG
Department of Biochemistry
University of Nijmegen
6500 HB Nijmegen, The Netherlands
I. The Lens and Its Proteins
   A. α-Crystallin Is Related to the Small Heat-shock Proteins
   B. Enzymes as Structural Lens Proteins
   C. Evolutionary Inferences
   D. The Stress Connection
II. The Lens and Its DNA
   A. Lenticular cDNAs
   B. Lenticular Genes
III. Concluding Remarks
References
Since Darwin’s days, the vertebrate eye has been a paradigm in evolutionary debates. It is indeed one of the most stunning and appealing examples of evolutionary ingenuity. The vertebrate eye is developmentally and structurally completely different from any invertebrate eye. It must have originated in the fishlike ancestral vertebrates, approximately 450 million years ago. No satisfactory ideas exist about the evolutionary processes that modeled the morphology of this delicate organ and its components. In the center of the eye is the transparent lens, which focuses the incoming light on the retina. At the molecular level, we are now beginning to understand which genes and proteins were used as building blocks to create the lens, a novel organ in evolution. Also, the mutational mechanisms used in the process can be reconstructed from comparisons of present-day genes and proteins.
I. The Lens and Its Proteins
The lens of the eye is an organ with many unusual features (for reviews, see 1-5). It originates in the embryo by invagination of the head ectoderm.
The resulting lens vesicle becomes a solid organ by elongation of the posterior cells. Throughout life, the anterior epithelial cells continue to migrate laterally, elongate, and differentiate into fiber cells, which form the body of the lens; these cells extend from the anterior to the posterior poles of the lens. Newly formed fiber cells overlay the older ones in concentric layers (Fig. 1). The older cells are not broken down or replaced during aging, and, as a consequence, there is a gradient of increasingly older cells toward the lens nucleus. Since cell nuclei and other organelles are lost in the process of fiber cell formation, there is little turnover and renewal of lens proteins. This makes the lens a favorite object for studies of the aging of proteins. The loss of organelles avoids light scattering by such structures. The absence of blood vessels and nerves is likewise necessary for transparency of the lens. As a consequence, all nutrients and metabolites must come from the aqueous and vitreous fluids surrounding the lens. Transparency and proper light refraction require that protein concentrations be very high in the lens fiber cells (i.e., between 20% and 60% of wet weight) and that the proteins be closely and regularly packed (6, 7). Protein concentration increases toward the lens nucleus, causing a refractive index gradient that increases the dioptric power and reduces spherical aberration. More than 90% of the protein in the lens consists of a limited group of proteins, the crystallins (1-5). The crystallins are considered to be water-soluble, structural proteins that occur in high concentrations in the cytoplasm of the lens fiber cells. Classically, four major groups of crystallins have been distinguished, on the basis of size, charge, and immunological properties. The α- and β-crystallins occur in all vertebrate classes. The γ-crystallins are also widespread, although avian lenses have little or none, and are instead characterized by the presence of δ-crystallin. The proportions in which these crystallins occur vary greatly, not only between species but also between different layers within the same lens. Additional types of crystallins have recently been described, often occurring only in restricted taxonomic groups, especially among birds and lower vertebrates. Until recently, the crystallins were thought to be lens-specific, but, as will be seen, this notion has been abandoned in most cases.

FIG. 1. Cross-section through a mammalian lens. (Labeled parts: lens bow, epithelium, lens cortex.)

The growth pattern and structure of the lens dictate that the crystallins must be long-lived and stable. The integrity of the proteins, moreover, should not be affected too easily by the deleterious influences of, for example, radiant light, radicals, and heat. Some crystallins indeed display an extraordinary stability. α-Crystallin does not denature at temperatures up to 100°C (8), and βB2-crystallin, while undergoing reversible conformational changes, does not precipitate from a boiling solution (9). Otherwise, the different crystallin families vary considerably in their structural properties and in their subunit compositions (1-3). Various features contribute to the variety in crystallin composition. Not only gene duplications, but also alternative splicing and the use of different initiation codons, give rise to the synthesis of related proteins with different primary structures (5, 10-12). The resulting primary gene products, moreover, can be modified by a number of post-translational processes, both enzymatic and nonenzymatic. Recently, the evolutionary origins of most of the eye lens crystallins have been elucidated. This can be helpful in understanding the significance of the variety in crystallin structure and functional properties.
A. α-Crystallin Is Related to the Small Heat-shock Proteins
α-Crystallin forms large aggregates, of around 800 kDa, composed of two types of related subunits: αA and αB. It can constitute up to 50% of the total protein in mammalian lenses. α-Crystallin was the first lens protein for which a relationship with other proteins could be established. Surprisingly, the α-crystallin chains were found to be homologous with the small heat-shock proteins (HSPs) of Drosophila, soybean, nematodes, Xenopus, and humans (13, 14). The homology is most conspicuous among the carboxy-terminal halves of the proteins, corresponding to the two 3’ exons of the α-crystallin genes. This led initially to the suggestion that exon-shuffling, combining a carboxy-terminal HSP domain with an unrelated amino-terminal domain, had been involved in the evolutionary origin of the ancestral α-crystallin gene (15). The recombination of various structural and functional domains, by means of exon-shuffling, into novel proteins is indeed a well-known evolutionary mechanism in vertebrates. However, more extensive sequence comparisons also revealed significant similarities in the amino-terminal halves of α-crystallin and small HSPs. This led to the conclusion that α-crystallin has originated from the small HSP family by the classical mechanism of gene duplication and subsequent divergence, allowing the adaptation to novel functions (14). The initial divergence between small HSPs and α-crystallin probably occurred well before the eye lens originated in evolution. This is suggested by the recent finding that the αB-crystallin gene is expressed in tissues outside the lens, most notably in heart tissue (16-18). In fact, αB-crystallin is expressed at higher levels in scrapie-infected hamster brain, suggesting a relationship with scrapie-induced cellular stress conditions (19). In astrocytes in the brains of patients with Alexander’s disease, there are granular inclusions composed of αB-crystallin (20). αB-Crystallin thus may be a constitutive HSP-cognate in different tissues, being selected and promoted as an abundant lens structural protein because of some suitable properties. αA-Crystallin has not been encountered outside the lens. The fact that αB is the predominant α-crystallin subunit in lens epithelial cells, while αA dominates in the fiber cells (21), also seems to indicate that αA is the more specialized lens protein, having originated later in evolution. Interestingly, αB, but not αA, has conserved in the promoter region of its gene a perfect heat-shock consensus element (22), although the gene seems not to be induced by heat shock of lenses in organ culture (23). A further relative of the small HSP and α-crystallin family is a 40-kDa major egg antigen from Schistosoma mansoni (24), which contains a tandemly repeated region of homology with the common domain of α-crystallin and small HSPs (14). Very recently, an 18-kDa immunodominant antigen from Mycobacterium leprae has been added to this protein family, extending its roots into the prokaryote realm (25). There is indeed an intriguing relationship between stress proteins and the immune response to infectious diseases (26).
Both α-crystallin and small HSPs occur as large polydisperse aggregates, from 200 to 800 kDa in size, forming spherical structures of 15-20 nm in diameter (27, 28). In both groups of proteins, cAMP-dependent phosphorylation does occur (27, 29). The size and intracellular location of the small HSPs depend on the physiological state of the cell (30). It is generally assumed that the small HSPs form some kind of structure, protecting cells from stress-induced damage, but the mechanism of such protection is still unknown (25, 31). It has also been speculated that the small HSPs are enzymes whose activity is regulated by polymerization (32). This activation might be important for a cell’s recovery from stress. For other HSPs, it is known that they are involved in protein folding and assembly (25). The small HSPs may also play a role in the normal processes of development (30). Many stress proteins are indeed developmentally induced (25). It has been proposed that small HSPs in chloroplasts protect against light damage (33), as has also been suggested for the unrelated 70-kDa HSPs in the retina (34). Protection against the deleterious effects of intense light would be a useful property for α-crystallin in the lens. Perhaps α-crystallin can be considered a constitutive stress protein, conferring protection against the multitude of endogenous and exogenous insults to which the lens is exposed. This is all the more plausible because the lens lacks the possibilities for rapid homeostatic responses through endocrine or neural pathways. Moreover, in the deeper, pycnotic layers of the lens, where metabolic activities are still important, there are no means to respond to stress by the induction of HSP genes. In addition to the putative protective properties of α-crystallin, it may also be the favorable stability properties of the small HSPs that make them suitable for functioning in the lens (5).
B. Enzymes as Structural Lens Proteins In the past 2 years, the origins and relationships of most other crystallins have been revealed in rapid succession (Table I). Surprisingly, three
TABLE I OCCURRENCE AND IDENTIFICATION OF EYE LENSCRYSTALLINS ~
Crystallin
~~
Occurrence
a
All vertebrates
P
All vertebrates All vertebrates (low in birds and reptiles) Birds and reptiles
Y
6
Many birds and crocodiles Rabbits and hares
L
A
Lampreys. some fishes, birds, and reptiles Frogs (Rana)
7
P
5 Sn, ~~~
Guinea pig and Octodon Squid ~
oND, Not determined.
Identification
Non-lens expression
Reference
Related to small heatshock proteins Similarities with Protein S of Myxococcus xanthus
aA, No aB, Yes No No
13, 14, 16, 37 17, 25, 42 51, 54, 56
Identical or very similar to argininosuccinate lyase Active lactate dehydrogenase B4 Related to hydroxyacylCoA dehydrogenase Active a-enolase
Yes
10, 40
Yes
35, 36
Yes
48
Yes
39
N Da
44-47
Yes
49, 50, 52
ND
5 . 63
Related to aldose and aldehyde reductase and prostaglandin F synthase Related to alcohol deh ydrogenase Related to glutathione S-transferase
HANS BLOEMENDAL AND WILFRIED W. DE JONG
crystallins are active enzymes. ε-Crystallin occurs in the lenses of many birds and crocodiles, and reaches levels of up to 23% of total lens protein in some species (35). Its amino-acid sequence is very similar to that of lactate dehydrogenase (LDH) B4, the heart-type isozyme (EC 1.1.1.27). Moreover, ε-crystallin from duck lenses has almost normal LDH activity. In chicken lenses, ε-crystallin is not present at appreciable levels. In both duck and chicken, only a single Ldh-B-type gene is present (36). While this gene is expressed at normal “housekeeping” levels in the heart, and also in the chicken lens, it is highly overexpressed in the duck lens. A single gene product is thus simultaneously used as a glycolytic enzyme, at low levels in many tissues, and as an abundant structural component in the lens. The LDH B4 of birds and reptiles is more thermostable than that of other vertebrates (37), as is true for duck lens ε-crystallin (38), which may give it suitable structural stability for functioning in the lens. Comparable observations have been made for τ-crystallin, which occurs at high levels as a lens protein among birds, reptiles, fishes, and lampreys. τ-Crystallin appears to be identical to α-enolase (EC 4.2.1.11), although its enzymatic activity is greatly reduced due to post-translational modifications and aging in the lens. Again, in ducks, a single τ-crystallin/α-enolase gene is present, which thus encodes a protein with dual functions (39). A somewhat different situation exists for δ-crystallin (10, 40-42). This protein is abundant in the lenses of almost all birds and reptiles, reaching levels of 70% in chicken and duck lenses. δ-Crystallin is highly homologous with argininosuccinate lyase (ASL) (EC 4.3.2.1), an enzyme of the urea cycle. Duck lens δ-crystallin especially has high ASL activity. In chickens and ducks, δ-crystallin is encoded by two closely linked and very similar genes, δ1 and δ2.
It appears that δ2 is the direct homolog of the single ASL gene in mammals (41) and that the δ2 gene product, highly expressed in duck lens, has retained the ASL activity. Chicken δ-crystallin is approximately 1% encoded by the δ2 gene and has little ASL activity. Gene duplication thus may have allowed one copy of the δ genes, δ1, to code for a protein that has lost ASL activity, while maintaining its suitability as a lens structural protein. Three other crystallins are more distantly related to various enzyme families, and have, themselves, no known enzymatic activities. All of these appear to be very restricted in their occurrence. ρ-Crystallin has been found only in the frog genus Rana (44). It shows highest sequence identity (58%) with bovine lung prostaglandin F synthase (EC 1.14.99.1) (45), and 50% and 43% sequence identity, respectively, with rat lens aldose reductase (EC 1.1.1.21) and human liver aldehyde reductase (EC 1.1.1.2) (46). Moreover, it is 37% identical to a yeast protein of unknown function (47). λ-Crystallin has only been observed in the lenses of rabbits and hares, where it reaches levels of 7 4 % of total protein (48). It has 30% homology with hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) and 26% homology with
enoyl-CoA hydratase (EC 4.2.1.17) plus 3-hydroxyacyl-CoA dehydrogenase. These enzymes are involved in fatty-acid metabolism in mitochondria and peroxisomes, respectively. In the guinea pig lens, about 10% of the water-soluble protein is contributed by ζ-crystallin (49). The same lens protein is also present in the rodent Octodon degus (50), but has not been found in other rodent species (51), nor in any other mammalian or submammalian group. ζ-Crystallin is distantly related to the zinc-containing long-chain alcohol dehydrogenases (EC 1.1.1.1), having about 25% homology with the mouse enzyme (52, 53). ζ-Crystallin lacks the residues that bind the catalytic zinc atom of the alcohol dehydrogenase, and consequently has no dehydrogenase activity. In contrast, the coenzyme-binding fold has been conserved particularly well.
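The percentages quoted in this section (for example, 58% identity between ρ-crystallin and prostaglandin F synthase) are summary statistics over aligned sequences. As a minimal sketch of how such a figure can be obtained, the Python function below counts identical residues over the ungapped columns of a pairwise alignment. The function name, the gap symbol, and the toy sequences are hypothetical and chosen only for illustration; the cited studies may have used different alignment methods and denominators, so this should not be read as the procedure behind the published values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over aligned columns without gaps.

    Both inputs must come from the same pairwise alignment and therefore
    have equal length. Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns where both sequences carry a residue
    identical = 0  # compared columns where the residues match
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not real crystallin data).
    seq_a = "MKTAY-LV"
    seq_b = "MKSAYGLI"
    print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Counting only ungapped columns is one common convention; dividing by the length of the shorter sequence or by the full alignment length gives lower values, which is one reason identity figures from different reports are not always directly comparable.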
C. Evolutionary Inferences
The recent accumulation of novel findings about the crystallins allows a reconstruction of the molecular events that occurred during the origin and evolution of the lens. The ubiquitous lens proteins, α-, β-, and γ-crystallins, must have been recruited at the very beginning of lens evolution. α-Crystallin, as an abundant lens protein, originated by increased expression of the already existing αB gene, while αA probably arose as a lens-specific protein after the αA gene diverged by duplication from the αB gene. αA and αB occur in different proportions in the heteropolymeric α-crystallin aggregates of different vertebrates. The γ-crystallins are monomeric and the β-crystallins are oligomeric members of the same protein family (5, 54). In most species investigated, multiple β and γ genes are present. The functional significance of the multiple β- and γ-crystallin products in a single species is unknown, and they might in part be considered redundant; witness also the fact that several γ genes have been “silenced” in humans (54). On the other hand, the alteration of βB2 in the mouse results in a hereditary cataract (55). The β- and γ-crystallins have not been found outside the lens, nor are they related to any known eukaryotic protein. However, a structural similarity exists with the coat protein of Myxococcus xanthus (56), and it probably is only a matter of time before the eukaryotic ancestor of the β- and γ-crystallins is discovered. The fact that α-, β-, and γ-crystallins have been maintained in all vertebrate classes is witness to their universal and persistent success in functioning in lenses of widely varying consistency and refractive power. Also, the overexpression of α-enolase as τ-crystallin in the lens must have been an early invention in the common vertebrate ancestor, maintained in lampreys and several higher vertebrate groups.
τ-Crystallin, however, has been lost again as a lens protein in mammals and amphibians, and in many birds, reptiles, and fishes. At later stages of evolution, additional genes have come to high expression in the lenses of certain vertebrate lineages. The ASL gene gave rise to the appearance of δ-crystallin in the common ancestor of reptiles and birds, becoming abundant in many groups, and largely replacing the other crystallins, notably γ-crystallin. In a single species, as in the chimney swift, it may have disappeared again (57). The success of δ-crystallin as a lens protein may have allowed, after gene duplication, the δ1 gene product to lose enzyme activity and thus be maintained, like αA, solely for functioning as a lens protein. Interestingly, nitrogen excretion by way of the urea cycle has been lost in birds and most reptiles. It thus appears that the loss of ASL in the urea cycle coincided with the origin of high lens expression of the ASL gene as δ-crystallin (41). The Ldh-B gene developed its high expression in the lens, as ε-crystallin, in the last common ancestor of crocodiles and birds. Subsequently, the expression of ε-crystallin was again lost haphazardly in many avian orders (35). ρ-Crystallin apparently originated in the ancestral lineage of the old and widespread frog genus Rana. Considering the distant homology, ρ-crystallin itself is probably not an aldehyde reductase (EC 1.1.1.1 or .2) or a prostaglandin F synthase (EC 1.14.99.1), but rather a diverged offshoot of this family. It is not yet known whether it is lens-specific or not. More recently, less than 30-40 million years ago, the high lens expression of λ-crystallin in the family Leporidae (rabbits and hares), and of ζ-crystallin in the rodent families Caviidae and Octodontidae, must have originated. Both of these crystallins, or close relatives thereof, can be detected in small amounts in non-lens tissues, notably in liver tissue (48, 50). In the case of ε-crystallin/LDH B4 and τ-crystallin/α-enolase, a single gene product is used for different purposes, enzymatic or structural, depending on the level of expression.
This example of “evolutionary opportunism” (42) has also been described as “gene sharing” (40). Gene sharing also applies for ASL and δ2-crystallin, and for αB-crystallin in and outside the lens. αA-, δ1-, and probably also the β- and γ-crystallins have apparently evolved into truly lens-specific proteins. In the case of ρ-, ζ-, and λ-crystallins, for which enzymatic properties are not known, and which are only distantly related to known enzymes, it is unclear to what extent they represent examples of gene sharing. These genes, of course, present very interesting cases of the development of tissue-specific expression (10). In all crystallin genes, the high lens expression must have developed by the acquisition of specific regulatory nucleotide sequences. However, common promoter or enhancer sequences for the crystallins have not been recognized (10). In some genes, as in Ldh-B, these regulatory sequences, once developed in the crocodilian-avian ancestor, were subsequently lost in many lineages. In the case of λ-crystallin, the high lens expression has apparently only been developed and maintained in one gene from a multigene family, and in a single lineage (48). In fact, the
current findings of the enzyme-related crystallins explain the earlier reported examples of non-lens expression of several crystallins (58). It appears that chance events lead occasionally to a gradual or sudden increase of expression of a gene in lens cells. In certain instances, as for α-crystallin and δ-crystallin, the results are highly successful, giving the lens improved properties, which are maintained by positive selection. In other cases, as for τ-, i-, or λ-crystallin, the high level of a protein can be moderately advantageous or perhaps even just neutral. It has indeed been established that superfluous genes can linger for tens of millions of years in the genome without being silenced and lost (59). Such crystallins can therefore be maintained as selectively neutral characters, or be lost again without any adverse effects. Finally, of course, in many instances, the increased level of a given protein may be detrimental for the cell, and the evolutionary trial will be aborted by negative selection.
D. The Stress Connection
The question is, which proteins are advantageous in the lens cytoplasm, or at least compatible with the transparency and longevity of the lens? Is there a common denominator in the properties of the various proteins that have been recruited as crystallins? This is not apparent at first glance; witness the great variety of structural and, where known, functional properties (1-5). An intriguing pattern emerges, however, when one compares the background and relationships of the crystallins. As mentioned already, α-crystallin shares many structural properties with its relatives, the small HSPs, and αB-crystallin may even function as a stress protein outside the lens. It is not yet known how the small HSPs exert their protective role (25, 31). However, other HSPs are involved in the prevention or disruption of inappropriate protein-protein interactions, and may help in the refolding of denatured proteins (60), which certainly would be useful functions in the lens. As for the enzyme crystallins, it is remarkable that most of them are related or identical to glycolytic enzymes or to enzymes closely associated with carbohydrate metabolism. Interestingly, many glycolytic and related enzymes appear to be induced by stress (see 61). Enolase is increased in heat-shocked yeast cells; glyceraldehyde phosphate dehydrogenase (EC 1.2.1.12) is a heat-shock protein in both yeast and Xenopus embryos; pyruvate kinase (EC 2.7.1.40) possibly is hsp62 in Xenopus embryos; an isozyme of lactate dehydrogenase (EC 1.1.1.27) is a major anoxic stress protein in fibroblasts (62); and in maize seedlings, alcohol dehydrogenase (EC 1.1.1.1 or .2), aldolase (4.1.2.13), pyruvate decarboxylase (4.1.1.1), and glucose phosphate isomerase (5.3.1.9) occur as anaerobic stress proteins. Heat shock and anoxia result in inhibition of oxidative phosphorylation and consequently in the reduction of ATP production, requiring the cell to shift to alternative
sources of energy, such as anaerobic glycolysis. The demand for alternative energy sources may be met by increasing the levels of the enzymes involved. Because of the particular anatomy of the lens, most of the required metabolic energy is supplied by anaerobic glycolysis (3). Transparency of the lens depends on an intact energy metabolism and on proteins that are prevented from unfolding and forming undesired associations. The lens, therefore, would benefit from being in a permanent state of “stress tolerance,” that is, the ability to survive and cope with deleterious influences that would otherwise be lethal. Cells normally acquire thermotolerance by the induction of heat-shock proteins, but the lens, with the diminished protein-synthesizing capacity in the fiber cells, largely lacks this facility, and should at all times be prepared to resist stress by safeguarding its energy supply and warranting its structural integrity. Of course, the levels of the crystallins as stress proteins and glycolytic enzymes are far beyond any physiological need, even in times of stress. Perhaps these exuberantly high levels can be reached because the regulatory mechanisms of these genes are readily switched on under the conditions of development and homeostasis that prevail in the lens, and occasionally may “bolt” in the course of evolution, without being stopped because their products are harmless for the functioning of the lens. Perhaps, therefore, it is not so much the structural properties of the crystallins per se that explain their presence in such large quantities, but rather their suitable properties in combination with the propensity for easily induced high gene expression. There thus appears to be a general connection between the crystallins and stress- or development-dependent gene expression (5).
The association of ASL and hydroxyacyl-CoA dehydrogenase with stress conditions is not apparent, and their overexpression in lenses as δ- and λ-crystallin, respectively, may be just a chance event, not related to basic requirements of the lens. Finally, the distant similarity of the β- and γ-crystallins with the calcium-binding protein S of M. xanthus does not directly provide a clue to their suitability as lens proteins. However, their abundance and ubiquity are witness to their successful functioning in the lens. Both the β- and the γ-crystallins are, in contrast to the other crystallins, encoded by multigene families. This allows the synthesis of many variations on the same structural theme, possibly adapting them to different functional requirements. Also, enzyme-related proteins are used as lens proteins in invertebrate lenses. The major proteins of the squid lens are related to glutathione S-transferase (EC 2.5.1.18) (5, 63). In fact, glutathione S-transferase also nicely fits the “stress connection,” because it is important in the lens for detoxification and possibly protection against peroxidation damage. Screening of lens
proteins in disparate vertebrate and invertebrate groups undoubtedly will reveal additional examples of overexpressed enzymes, and will further expose the evolutionary principles underlying this phenomenon. It is likely that also in other cell types, proteins may be expressed at high levels because this makes additional functions possible, or just because their presence happens to be selectively neutral.
II. The Lens and Its DNA
After almost a century of lens protein studies, recombinant DNA technology has given a new impulse to lens research. Analyses of cDNA sequences have allowed the prediction of the amino-acid sequences of a variety
TABLE II
VARIOUS CRYSTALLIN cDNAs AND GENES REPORTED IN THE LITERATURE
Crystallin
α
β
γ
δ ε
λ ρ
ζ
Species Rat Mouse Hamster Calf Frog Chicken Human Rat Mouse Calf Human Chicken Frog Rat Mouse Hamster Calf Human Frog Fish Chicken Duck Birds/reptiles Rabbit Frog Guinea pig
cDNA (reference)
65, 66 67, 69
Gene (reference)
111,112 15, 113-115
85 133 65, 72 68 81, 82, 82a 77, 78 93 65, 66, 70 69
138 107, 108 138, 139 140 101-104, 137
83, 84, 87
90,91 96, 97 64, 74, 75 36, 99 48 45, 98 52
134, 135 104, 140 119-122, 128 128
of crystallins from different species, and thus the elucidation of the primary structure, in much less time than was necessary for the classical methods of direct amino-acid sequencing. Moreover, cDNA probes have enabled the detection of crystallin genes and the establishment of their structure and organization.
A. Lenticular cDNAs
In the early 1980s, the first reports of crystallin cDNAs constructed in bacterial plasmids appeared in the literature (Table II) (64-70).
1. MURINE cDNAs
By positive hybridization selection and translation, recombinant plasmids containing cDNA sequences encoding several rat crystallin subunits were identified. cDNA clones for αA, αAins, βB1a, and several γ-crystallins were obtained (65). At the same time, it was found that the rat αA mRNA comprised more than 50% of noncoding nucleotide sequences (66). A similar phenomenon is the observation of more noncoding than coding sequences in mouse αA-crystallin messenger (67). β-Crystallin mRNA sequences in mice have been determined by Inana et al. (68). cDNA cloning has been used to demonstrate multiple mouse γ-crystallin RNAs (69). High intragenic sequence homology in two different rat lens γ-crystallin cDNAs reflected a duplication event in the primordial gene (70). Both the internal duplication and the fact that β- and γ-crystallins form a protein family were demonstrated at the protein level for the first time by Driessen et al. (71). Two rat β-crystallin sequences have been reported, among which is the complete sequence of rat βB1, derived from the corresponding cDNA. Homologies of 37% with rat γ-crystallin and about 50% with bovine β-crystallin were noted (72).
2. AVIAN cDNAs
δ-Crystallin is a major lenticular protein in birds and reptiles. It is undetectable in the lenses of all other vertebrates. A detailed account of this protein and its nucleic acid has been given by Piatigorsky (73). Full-length cDNA coding for δ-crystallin was obtained by reverse transcription of chicken mRNA (74). The nucleotide sequence was consistent with various tryptic peptides, the total amino-acid composition of the protein, and the molecular weight as estimated by gel electrophoresis with sodium dodecyl sulfate. RNA sequences complementary to cloned chicken δ-crystallin cDNA reveal size heterogeneity (75). Two different δ-crystallin polypeptides could be derived from a δ1-crystallin cDNA. As an interpretation of this phenomenon, it has been suggested that the heterogeneity arises at the translational level (76). The levels of δ- and β-crystallin mRNAs could be determined by Northern blotting. Temporal and spatial changes during development were noticed. The experiments provided a quantitative basis for exploring differential expression of the δ- and β-crystallin gene families in chick lens. Four different cloned β-crystallin cDNAs were used to differentiate members of the chicken β-crystallin family. It appeared that the β-crystallin messengers accumulate near the end of embryogenesis, whereas the δ mRNA accumulates during early embryonic development (77). Preferential conservation of the globular domains of the βA3/A1-crystallin polypeptides of chicken lens was observed in cDNA sequencing (78). There is evidence that γ-crystallin is absent in the chicken (79). This was confirmed at the nucleotide level with the aid of a cloned γ-crystallin probe (80).
3. CATTLE cDNAs
Identification and isolation of clones containing the following crystallin-encoding sequences have been described (81): βB1, βB3, βBp, and βA3/A1. An alternating Pro-Ala sequence in the NH2-terminal extension of the βB1 subunit was discovered. A cow β-crystallin cDNA that revealed 96% homology with the murine β23, but only 43% homology with cow βBp, has been isolated and sequenced (82). Recently, the amino-acid sequence of the remaining primary β-crystallin gene products has been determined, using two cDNA clones for βA2 and βA4, respectively (82a). A complete calf γ-crystallin cDNA sequence has also been reported (83). The coding region comprised 522 bp in addition to 30 bp and the 5’ end, respectively. Interestingly, a 32-bp sequence of the latter showed 70% complementarity with the first monomer unit of the consensus Alu-I DNA. Two different bovine γ-crystallin cDNA clones have also been characterized (84), the untranslated 5’ regions of which were dissimilar. The leader sequence of one clone was strikingly homologous to part of a rabbit immunoglobulin α-heavy-chain mRNA. The nucleotide sequence of bovine αA-crystallin cDNA, comprising the total coding region in addition to 5’- and 3’-noncoding sequences, has been determined (85). However, the complete sequence of the 173 amino-acid residues of this molecule was already known (86). Sequence analysis of a βs-crystallin cDNA clone (87), now designated γs (88), revealed that the amino-terminal amino-acid residue is serine, not tryptophan (89).
4. AMPHIBIAN cDNAs
Tomarev et al. were the first to detect cDNA clones coding for frog γ-crystallin (90, 91). The mRNA corresponding to one of the γ-crystallins showed the presence of an internal duplication (92), as originally detected in calf γ-crystallin (71). The same group also reported the isolation and structure of a cDNA clone encoding a frog β-crystallin (93). From comparison of the surface amino acids of frog βA1-crystallin predicted from cDNA clones with mammalian and bird βA1, the conclusion could be drawn that these exposed residues are more highly conserved than the buried amino acids (94). In contrast to the long 3’-noncoding region in hamster αA mRNA (15), cDNA sequence determination indicated that the corresponding messenger in the frog lacks this untranslatable stretch (95).
5. FISH cDNAs
Carp βs [γs in the new nomenclature agreement (88)] closely resembles the corresponding bovine crystallin with the exception of a small four-amino-acid amino-terminal extension, which is lacking in this fish (96). These results were obtained by cDNA sequence determination. The nucleotide sequences of γ-crystallin cDNAs cloned from the carp have also been determined. The amino-acid sequences derived consist of two polypeptides with 177 and 172 amino-acid residues for γ-m1 and γ-m2, respectively. They exhibit unusually high methionine contents: 12.4% for γ-m1 and 14% for γ-m2. Comparison of both fish γ-crystallins with bovine γ-II crystallin revealed similarity in structure (30% of the surface hydrophobic groups are composed of methionine) (97).
6. NOVEL CRYSTALLIN cDNAs
A novel type of lens protein in the frog, originally named ε-crystallin, with a molecular mass of 35 kDa, showed no homology with other known vertebrate crystallins (98). Later, this protein was renamed ρ-crystallin (46) to avoid confusion with a newly detected ε-crystallin in birds and reptiles (99). The sequence of the cloned ε-crystallin cDNA of the latter lens protein is identical to that of LDH B4 (36), whereas the frog lens crystallin shows homology with aldose reductase (46). Watanabe et al. (45) demonstrated the structural similarity of frog lens ρ-crystallin to bovine lung prostaglandin F synthase. Specifically, they found that identical residues and conservative substitutions together accounted for 77% of positions, without any deletions or additions. λ-Crystallin detected in rabbits is another novel crystallin species (48). This protein, which also occurs in hares, is related to hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35). Still another enzyme crystallin, found in and purified from the turtle, is τ-crystallin. This protein may be considered a major component, as in a number of species it amounts to about 10% of the total water-soluble lens protein (39). Peptide analysis and cDNA sequencing indicate a striking similarity with α-enolase. ζ-Crystallin is another enzyme-related protein that amounts to about 10% of the α-, β-, and γ-crystallins. It has been detected in the guinea pig (49). Analysis of a cDNA clone revealed a sequence similarity to alcohol dehydrogenase (52). The finding of all of these “enzyme crystallins” allows the conclusion that the lens is relatively rich in multifunctional proteins that, because of their high concentration, represent part of the structural elements, like the classical crystallins, but may at the same time serve as enzyme sources when needed in a particular metabolic process. In this connection, it should be mentioned that chicken and duck δ-crystallins have ASL activity (40), whereas squid SIII-crystallin is similar to glutathione S-transferase (63, 100).
B. Lenticular Genes
1. MURINE GENES
The nature of the rat γ-crystallin genes was established with the aid of two different cDNA clones. The results showed a strict colinearity between gene organization and protein-folding domains, as well as the intragenic duplication of these genes (101). Blot hybridization experiments with mouse γ-crystallin cDNAs revealed the presence of at least four related genes. From the detailed structure of the gene encoding γ4-crystallin, a common structure with rat γ-crystallin could be deduced (102). The occurrence of six different γ-crystallin genes has been demonstrated (103). All have the same mosaic structure. The first exon contains a short 5’-noncoding region and the first 9 bp of the coding sequence. The second exon encodes protein motifs I and II, while protein motifs III and IV are encoded by the third exon. From the sequence of the latter in the different genes, an evolutionary tree of the gene family was constructed. This tree suggested that three of the present γ-crystallin genes are derived from genes that originated from a tandem duplication of a two-gene cluster. Two duplications of the last gene of the four-gene cluster eventually yielded the other γ-crystallin genes. The transcription initiation sites of the γ-crystallin genes were mapped by a combination of primer extension data and S1-nuclease mapping (104). It appeared that four of the six genes have multiple transcription sites. A cloned mouse γ-crystallin promoter showed activity in chick lens tissue, but was inactive in various nonlenticular cells (105). The offspring of transgenic mice carrying the γ2-crystallin promoter fused to the coding part of the bacterial lacZ gene expressed high levels of the enzyme in the central nuclear lens fiber cells (106). It could be deduced that the γ2 sequences for tissue-specific expression must be localized between positions -759 and +45.
A clear relationship between exon distribution and motifs at the protein level was demonstrated for a 23-kDa mouse β-crystallin (107). Each of the four exons encodes a separate motif. The results supported the homology concept between β- and γ-crystallins deduced previously from protein sequence studies (71). The rat βB1-crystallin gene has also been characterized (108). The complete structure of the hamster αA gene has been determined by van den Heuvel et al. (15). This gene comprises four exons and exists as a single-copy gene in the hamster genome. The insert typical for murine αA-crystallins, discovered originally in protein sequence studies (109, 110) and arising by alternative RNA splicing (111), could be traced at the gene level. It is located between exons 1 and 3 (15) (Fig. 2). The mouse αA gene is closely linked to the major histocompatibility complex (H-2) in the region between glyoxalase (Glo-1) and H-2K on chromosome 17 (112). In contrast, the human αA-crystallin gene is located on chromosome 21 (113), whereas the αB-crystallin gene is on chromosome 11 (114, 115).

FIG. 2. Schematic representation of the exons on the hamster αA-crystallin gene (boxes indicated by Roman numerals). The positions of transcription and translation signals (TATA box, start codon, stop codon, poly(A) signal) are also shown, including a peculiar 16-fold CCTT repeat. Below the gene are depicted the two mRNAs that are transcribed from the same gene in rodents by alternative splicing.

Like the αA-crystallin gene, the αB-crystallin gene occurs as a single-copy gene (113). The coding sequences are spread over three exons with a total length of 709 nucleotides. The exon/intron distribution is similar to that of the αA-crystallin gene, except for the 69 nucleotides encoding the 23 inserted amino acids in αAins-crystallin. The 3’-noncoding region of αB-crystallin mRNA is short (i.e., 140 nucleotides) compared to the extremely long noncoding 3’ end of αA-crystallin mRNA (520 nucleotides). The expression of the αB-crystallin gene has been studied in great detail in mice (17) and in rats (116). Significant levels of αB-crystallin mRNA occur not only in the lens, but also in the heart, skeletal muscle, kidneys, and lungs; low levels could be detected in the brain and the spleen. A mouse αB-crystallin minigene has been constructed and introduced into the germ line of mice. It was expressed in parallel with the endogenous gene. One of the peculiar aspects of the lens is that normally no tumors arise in this tissue. However, transgenic mice carrying a hybrid gene comprising the αA-crystallin promoter fused to the nucleotide sequence of the coding region for the simian virus 40 (SV40) T antigen did develop tumors. This remarkable result shows that carcinogenesis in the lens can be provoked by the gene manipulation described (117, 118). In later studies, two different lines of transgenic mice that express the αA-crystallin/SV40 tumor antigen
fusion gene in the lens were described (118a). In one line, the transgene was expressed very early in development, whereas in the other, expression occurred only after primary lens fiber differentiation. Interaction between two distinct regulatory sequences of the mouse αA-crystallin 5’-noncoding region activates the αA promoter for lens-specific transcription. These studies were conducted by transfection of αA-crystallin-CAT (chloramphenicol acetyltransferase) hybrids into explanted embryonic chicken lens epithelia (118b).
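The nucleotide and residue counts quoted for the α-crystallin genes above can be cross-checked with the triplet code: the 69 nucleotides of the insert exon correspond to the 23 extra residues of αAins. The short Python sketch below shows that arithmetic. The function names are hypothetical, and the sketch assumes an uninterrupted reading frame in which every three nucleotides encode one residue and no stop codon is included in the count.

```python
def residues_encoded(n_nucleotides: int) -> int:
    """Number of amino-acid residues encoded by an in-frame stretch.

    Assumes a continuous reading frame with no stop codon in the count,
    so the length must be a non-negative multiple of three.
    """
    if n_nucleotides < 0:
        raise ValueError("length must be non-negative")
    if n_nucleotides % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    return n_nucleotides // 3


def nucleotides_required(n_residues: int) -> int:
    """Number of coding nucleotides needed for a given number of residues."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return 3 * n_residues


if __name__ == "__main__":
    # Insert exon of the alphaA gene, as quoted in the text: 69 nt, 23 residues.
    print(residues_encoded(69))
    print(nucleotides_required(23))
```

Because the insert length is a whole number of codons, including or skipping this exon leaves the downstream reading frame unchanged, which is consistent with both mRNAs yielding functional polypeptides.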
2. AVIAN GENES
Southern blotting experiments led to the conclusion that at least two different δ-crystallin genes occur in the chicken genome (119-121). The complete sequence of one of the two nonallelic δ-crystallin genes was elucidated by Ohno et al. (122). The occurrence of 17 exons and 16 introns could be established. The length of the encoded protein could be deduced to be 465 amino acids, longer by 19 residues than predicted previously from cDNA nucleotide sequences. As mentioned before, there is no δ-crystallin in mammalian lens cells. However, if a chicken δ-crystallin gene is microinjected into the nucleus of a mouse cell, the gene is expressed in the lens epithelium as efficiently as in the homologous chicken cells. From these results, one may conclude that lens-specific expression of the δ-crystallin gene takes place in the xenogeneic environment, as non-lens tissues, in general, cannot express the gene (123). On the other hand, the δ-crystallin gene is transcribed in non-lens cells derived from chicken embryo. For example, δ-crystallin mRNA can be detected in neural retina, brain, and limb buds (124). Furthermore, it appears that transgenic mice carrying the chicken δ-crystallin gene do express the protein also in the cerebrum, particularly in the pyramidal neurons of the piriform cortex (125). Moreover, chicken δ-crystallin DNA sequences could be demonstrated by in situ hybridization in mouse teratocarcinoma cell lines that carried the chicken gene stably incorporated (126). Differences in stability and efficiency of δ-crystallin expression have been observed upon transfection and microinjection of the gene into the nuclei of mouse cells (127). About 90% of the daughter cells showed expression of the protein after transfection. On the other hand, microinjection yielded only 30% of expression. Hybridization and sequence data revealed homology between chicken and duck δ-crystallin genes, both in exons and introns and in the 5’-noncoding region.
The introns showed about 30% sequence similarity (128). Simultaneous injection of GC-box-containing fragments from δ-crystallin, simian virus early, and herpes simplex virus type 1 tk-promoters suppressed δ-crystallin expression in lens. However, co-injection with DNA
fragments lacking the GC box did not have this effect. The suppression phenomenon was ascribed to an SP-1-like transcription factor (129). Lens-specific enhancer activity involved in the regulation of δ-crystallin gene expression has been found in the third intron of the chicken δ1-crystallin gene. Removal of this sequence totally abolished the expression of the corresponding protein, whereas reinsertion of the enhancer, either internally or in the upstream or downstream position, restored the expression (130). There are various methylation sites in the δ-crystallin gene (131). Hypomethylation of δ-crystallin genes occurs in post-mitotic lens cells, a result that presumably implies some kind of excision mechanism in lenticular tissue (132). Increased sensitivity of the δ-crystallin gene to endogenous DNase has been observed upon terminal differentiation of chick lens (132a). A similar effect could also be demonstrated for β-tubulin and vimentin. Insulin and insulin-like growth factor I play a role in δ-crystallin expression in developing chick lens (132b). Okazaki et al. (133) determined the tissue-specific expression of a chick α-crystallin gene in mouse lens cells by introduction of a hybrid α/δ-crystallin gene into cultured mouse lens cells in primary culture. DNA sequences located 242-189 bp upstream from the transcription initiation site were required for high-level expression in lens cells. Replacement of this sequence by the enhancer sequences of Moloney murine leukemia virus resulted in expression of the hybrid gene in both lens cells and fibroblasts.
3. HUMAN GENES

Originally, only two γ genes were identified in human lens tissue. Nevertheless, it appears that the human γ-crystallins are encoded by at least seven different genes. By chromosome mapping with genomic DNA probes containing human γ-crystallin gene sequences, six of these genes were mapped, leading to the assignment of this human crystallin gene family to chromosome 2 (134). The cluster presumably occurs on the long arm of the chromosome. Sequence analyses of three of the human γ-crystallin genes revealed that three genes are potentially active, whereas two represent closely related pseudogenes (135). Two of the genes and one of the pseudogenes are oriented in a head-to-tail fashion within 22.5 kilobases (136). One of the pseudogenes has been assumed to be functional before inactivation, which suggests that their identical mutation was generated by gene conversion. In a comparison of the coding sequences of six human genes, six rat genes, and four mouse genes encoding γ-crystallin, a uniform rate of evolution of all regions of the corresponding proteins was observed (137). The same authors reported the linkage of βB2- and βB3-crystallin genes in humans and rats (138). Characterization of the human βA3/βA1
gene again confirmed ancestral relationships among the βγ-crystallin superfamily. The sequence of the corresponding human βA3/βA1-crystallin was predicted from the derived mRNA sequence (139).
4. AMPHIBIAN GENES

With the aid of cloned cDNA probes, it was shown that at least four structurally similar, but not identical, γ-crystallin genes exist in the frog (140).
III. Concluding Remarks

Recent developments in lens research have demonstrated that certain enzyme proteins occurring in nonlenticular tissues in minor quantities exist in the lens in high concentration and seem therefore to be recruited as structural elements in evolution. Furthermore, it appears that at least one of the typical structural proteins of the vertebrate lens, αB-crystallin, is by no means lens-specific. For example, heart, brain, skeletal muscle, kidney, and possibly other tissues express this characteristic component of the α-crystallin aggregate. Furthermore, we are beginning to understand how genes are expressed and regulated in the eye lens. Studies using gene constructs containing crystallin 5′-regulating sequences enabled the mapping of positive and negative control elements responsible for tissue-specific expression. However, many features, such as post-transcriptional modifications, still have to be clarified at this level. The use of transgenic animals undoubtedly will be a most valuable tool, not only to elucidate the regulation of lens gene expression, but also to serve as a means to unravel the metabolic disorders leading to opacification in the lens. One may even speculate that, as far as the inherited forms of cataract are concerned, gene dissection and/or replacement will eventually lead to therapy for such a disorder, which is still the major cause of blindness in the Third World (141).
REFERENCES

1. H. Bloemendal (ed.), “Molecular and Cellular Biology of the Eye Lens.” Wiley, New York, 1981.
2. H. Bloemendal, CRC Crit. Rev. Biochem. 12, 1 (1982).
3. J. J. Harding and M. J. C. Crabbe, in “The Eye” (H. Davson, ed.), p. 207. Academic Press, Orlando, Florida, 1984.
4. H. Maisel (ed.), “The Ocular Lens.” Dekker, New York, 1985.
5. G. J. Wistow and J. Piatigorsky, ARB 57, 479 (1988).
6. C. Slingsby, TIBS 10, 281 (1985).
7. M. Delaye and A. Tardieu, Nature 302, 415 (1983).
8. M. Maiti, M. Kono and B. Chakrabarti, FEBS Lett. 236, 109 (1988).
9. J. Horwitz, M. McFall-Ngai, L.-L. Ding and O. Yaron, in “The Lens: Transparency and Cataract” (G. Duncan, ed.), p. 227. Eurage, Rijswijk, The Netherlands, 1986.
10. J. Piatigorsky, FASEB J. 3, 1933 (1989).
11. W. W. de Jong and W. Hendriks, J. Mol. Evol. 24, 121 (1986).
12. J. Piatigorsky and G. J. Wistow, Cell 57, 197 (1989).
13. T. D. Ingolia and E. A. Craig, PNAS 79, 2360 (1982).
14. W. W. de Jong, J. A. M. Leunissen, P. J. M. Leenen, A. Zweers and M. Versteeg, JBC 263, 5141 (1988).
15. R. van den Heuvel, W. Hendriks, W. Quax and H. Bloemendal, JMB 185, 273 (1985).
16. S. P. Bhat and C. N. Nagineni, BBRC 158, 319 (1989).
17. R. A. Dubin, E. F. Wawrousek and J. Piatigorsky, MCBiol 9, 1083 (1989).
18. T. Iwaki, A. Kume-Iwaki and J. E. Goldman, J. Histochem. Cytochem. 38, 31 (1990).
19. J. R. Duguid, R. G. Rohwer and B. Seed, PNAS 85, 5738 (1988).
20. T. Iwaki, A. Kume-Iwaki, R. K. H. Liem and J. E. Goldman, Cell 57, 71 (1988).
21. A. J. M. Vermorken and H. Bloemendal, Nature 271, 779 (1978).
22. M. A. Thompson, J. W. Hawkins and J. Piatigorsky, Gene 56, 173 (1987).
23. W. W. de Jong, W. A. Hoekman, J. W. M. Mulders and H. Bloemendal, J. Cell Biol. 102, 104 (1986).
24. V. Dene, D. W. Dunne, K. S. Johnson, D. W. Taylor and J. S. Cordingley, Mol. Biochem. Parasitol. 21, 179 (1986).
25. S. Lindquist and E. A. Craig, ARGen 22, 631 (1988).
26. D. Young, R. Lathigra, R. Hendrix, D. Sweetser and R. A. Young, PNAS 85, 4267 (1988).
27. A.-P. Arrigo, J. P. Suhan and W. J. Welch, MCBiol 8, 5059 (1988).
28. J. F. Koretz and R. C. Augusteyn, Curr. Eye Res. 7, 25 (1988).
29. A. Spector, R. Chiesa, J. Sredy and W. Garner, PNAS 82, 4712 (1985).
30. J. M. Rossi and S. Lindquist, J. Cell Biol. 108, 425 (1989).
31. M. J. Schlesinger, J. Cell Biol. 103, 321 (1986).
32. N. C. Collier, J. Heuser, M. A. Levy and M. Schlesinger, J. Cell Biol. 106, 1131 (1988).
33. G. Schuster, D. Even, K. Kloppstech and I. Ohad, EMBO J. 7, 1 (1988).
34. M. F. Barbe, M. Tytell, D. J. Gower and W. J. Welch, Science 241, 1817 (1988).
35. G. J. Wistow, J. W. M. Mulders and W. W. de Jong, Nature 326, 622 (1987).
36. W. Hendriks, J. W. M. Mulders, M. A. Bibby, C. Slingsby, H. Bloemendal and W. W. de Jong, PNAS 85, 7114 (1988).
37. A. C. Wilson, N. O. Kaplan, L. Levine, A. Pesce, M. Reichlin and W. S. Allison, FP 23, 1258 (1964).
38. S.-H. Chiou, W.-P. Chang and H.-K. Lin, BBA 957, 313 (1988).
39. G. J. Wistow, T. Lietman, L. A. Williams, S. O. Stapel, W. W. de Jong, J. Horwitz and J. Piatigorsky, J. Cell Biol. 107, 2729 (1988).
40. J. Piatigorsky, W. E. O’Brien, B. L. Norman, K. Kalumuck, G. J. Wistow, T. Borrás, J. M. Nickerson and E. F. Wawrousek, PNAS 85, 3479 (1988).
41. L.-S. Yeh, A. Elzanowski, L. T. Hunt and W. C. Barker, Comp. Biochem. Physiol. 89B, 433 (1988).
42. R. F. Doolittle, Nature 336, 18 (1988).
43. T. Matsubasa, M. Takiguchi, Y. Amaya, I. Matsuda and M. Mori, PNAS 86, 592 (1989).
44. G. G. Gause, Jr., S. I. Tomarev, R. D. Zinovieva, K. G. Arutyunyan and S. M. Dolgilevich, in “The Lens: Transparency and Cataract” (G. Duncan, ed.), p. 171. Eurage, Rijswijk, The Netherlands, 1986.
45. K. Watanabe, Y. Fujii, K. Nakayama, H. Ohkubo, S. Kuramitsu, H. Kagamiyama, S. Nakanishi and O. Hayaishi, PNAS 85, 11 (1988).
46. D. Carper, C. Nishimura, T. Shinohara, B. Dietzchold, G. Wistow, C. Craft, P. Kador and J. H. Kinoshita, FEBS Lett. 220, 209 (1987).
47. U. Oechsner, V. Magdolen and W. Bandlow, FEBS Lett. 238, 123 (1988).
48. J. W. M. Mulders, W. Hendriks, W. M. Blankesteijn, H. Bloemendal and W. W. de Jong, JBC 263, 15462 (1988).
49. Q. L. Huang, P. Russell, S. H. Stone and J. S. Zigler, Jr., Curr. Eye Res. 6, 725 (1987).
50. J. S. Zigler, Jr., and X.-Y. Du, Invest. Ophthalmol. Visual Sci. 30 (Suppl.), 266 (1989).
51. W. Hendriks, J. Sanders, L. de Leij, F. Ramaekers, H. Bloemendal and W. W. de Jong, EJB 174, 133 (1988).
52. A. Rodokanaki, R. K. Holmes and T. Borrás, Gene 78, 215 (1989).
53. T. Borrás, B. Persson and H. Jörnvall, Bchem 28, 6133 (1989).
54. N. H. Lubsen, H. J. M. Aarts and J. G. G. Schoenmakers, Prog. Biophys. Mol. Biol. 51, 47 (1988).
55. M. Nakamura, P. Russell, D. A. Carper, G. Inana and J. H. Kinoshita, JBC 263, 19218 (1988).
56. G. Wistow, L. Summers and T. Blundell, Nature 315, 771 (1985).
57. G. Wistow, T. Lietman, A. Anderson and J. Piatigorsky, Invest. Ophthalmol. Visual Sci. 30 (Suppl.), 267 (1989).
58. R. M. Clayton, J. C. Jeanny, D. J. Bower and L. H. Errington, Curr. Top. Dev. Biol. 20, 137 (1986).
59. S. Ohno, Trends Genet. 1, 160 (1985).
60. H. Pelham, Nature 332, 776 (1988).
61. R. W. Nickells and L. W. Browder, J. Cell Biol. 107, 1901 (1988).
62. G. R. Anderson and B. K. Farkas, Bchem 27, 2187 (1988).
63. S. I. Tomarev and R. D. Zinovieva, Nature 336, 86 (1988).
64. S. P. Bhat and J. Piatigorsky, PNAS 76, 3299 (1979).
65. H. J. Dodemont, P. M. Andreoli, R. J. M. Moormann, F. C. S. Ramaekers, J. G. G. Schoenmakers and H. Bloemendal, PNAS 78, 5320 (1981).
66. R. J. M. Moormann, H. M. W. van der Velden, H. J. Dodemont, P. M. Andreoli, H. Bloemendal and J. G. G. Schoenmakers, NARes 9, 4813 (1981).
67. C. R. King, T. Shinohara and J. Piatigorsky, Science 215, 985 (1982).
68. G. Inana, T. Shinohara, J. V. Maizel, Jr., and J. Piatigorsky, JBC 257, 9064 (1982).
69. T. Shinohara, E. A. Robinson, E. Appella and J. Piatigorsky, PNAS 79, 2783 (1982).
70. R. J. M. Moormann, J. T. den Dunnen, H. Bloemendal and J. G. G. Schoenmakers, PNAS 79, 6876 (1982).
71. H. P. C. Driessen, P. Herbrink, H. Bloemendal and W. W. de Jong, Exp. Eye Res. 31, 243 (1980).
72. J. T. den Dunnen, R. J. M. Moormann and J. G. G. Schoenmakers, BBA 824, 295 (1985).
73. J. Piatigorsky, MCB 59, 33 (1989).
74. K. Yasuda, N. Nakajima, T. Isobe, T. S. Okada and Y. Shimura, EMBO J. 3, 1397 (1984).
75. D. J. Bower, L. H. Errington, N. R. Wainwright, C. Sime, S. Morris and R. M. Clayton, BJ 201, 339 (1981).
76. E. F. Wawrousek, J. M. Nickerson and J. Piatigorsky, FEBS Lett. 205, 235 (1986).
77. J. F. Hejtmancik, D. C. Beebe, H. Ostrer and J. Piatigorsky, Dev. Biol. 109, 72 (1985).
78. C. A. Peterson and J. Piatigorsky, Gene 45, 139 (1986).
79. D. S. McDevitt and L. R. Croft, Exp. Eye Res. 25, 473 (1977).
80. J. A. Treton, R. E. Jones, C. R. King and J. Piatigorsky, Exp. Eye Res. 39, 513 (1984).
81. Y. Quax-Jeuken, C. Janssen, W. Quax, R. van den Heuvel and H. Bloemendal, JMB 180, 457 (1984).
82. M. B. Gorin and J. Horwitz, Curr. Eye Res. 3, 939 (1984).
82a. G. L. M. van Rens, W. W. de Jong, C. Slingsby and H. Bloemendal, Gene in press (1991).
83. S. P. Bhat and A. Spector, DNA 3, 287 (1984).
84. R. E. Hay, W. D. Woods, R. L. Church and J. M. Petrash, BBRC 146, 332 (1987).
85. R. E. Hay and J. M. Petrash, BBRC 148, 31 (1987).
86. F. J. van de Ouderaa, W. W. de Jong and H. Bloemendal, EJB 39, 207 (1973).
87. Y. Quax-Jeuken, H. Driessen, J. Leunissen, W. Quax, W. de Jong and H. Bloemendal, EMBO J. 10, 2597 (1985).
88. H. Bloemendal, J. Piatigorsky and A. Spector, Exp. Eye Res. 48, 465 (1989).
89. L. R. Croft, Ciba Found. Symp. 19, 207 (1973).
90. S. I. Tomarev, S. M. Dolgilevich, G. L. Kogan, R. D. Zinovieva and G. G. Gause, Dokl. Akad. Nauk SSSR 261, 1476 (1981).
91. S. I. Tomarev, R. D. Zinovieva, S. M. Dolgilevich, A. S. Kraev, K. G. Skryabin and G. G. Gause, Dokl. Biochem. 273, 388 (1987).
92. S. I. Tomarev, A. S. Kraev, K. G. Skryabin, G. G. Gause and A. A. Bayev, Dokl. Akad. Nauk SSSR 263, 1485 (1982).
93. S. V. Luchin, S. I. Tomarev, S. M. Dolgilevich, A. S. Kraev, A. S. Skriabin and G. G. Gause, Dokl. Akad. Nauk 279, 233 (1984).
94. S. V. Luchin, R. D. Zinovieva, S. I. Tomarev, S. M. Dolgilevich, G. G. Gause, B. Bax, Jr., H. Driessen and T. L. Blundell, BBA 916, 163 (1987).
95. S. I. Tomarev, R. D. Zinovieva, S. M. Dolgilevich, A. S. Krayev, K. G. Skryabin and G. G. Gause, Jr., FEBS Lett. 162, 47 (1983).
96. T.-N. Chang and W.-C. Chang, BBA 910, 89 (1987).
97. T. Chang, Y.-J. Jiang, S.-H. Chiou and W.-C. Chang, BBA 951, 226 (1988).
98. S. I. Tomarev, R. D. Zinovieva, S. M. Dolgilevich, S. V. Luchin, A. S. Krayev, K. G. Skryabin and G. G. Gause, Jr., FEBS Lett. 171, 297 (1984).
99. S. O. Stapel, A. Zweers, H. J. Dodemont, J. H. Kan and W. W. de Jong, EJB 147, 129 (1985).
100. G. Wistow and J. Piatigorsky, Science 236, 1554 (1987).
101. L. Mulleners, P. Andreoli, H. Bloemendal and J. G. G. Schoenmakers, JMB 171, 353 (1983).
102. S. Lok, L.-C. Tsui, T. Shinohara, J. Piatigorsky, R. Gold and M. Breitman, NARes 12, 4517 (1984).
103. J. T. den Dunnen, R. J. M. Moormann, N. H. Lubsen and J. G. G. Schoenmakers, JMB 189, 37 (1986).
104. P. M. M. Kastrop, K. E. P. van Roozendaal and J. G. G. Schoenmakers, EJB 157, 203 (1986).
105. S. Lok, M. Breitman, A. Chepelinsky, J. Piatigorsky, R. J. M. Gold and L.-C. Tsui, MCBiol 5, 2221 (1985).
106. D. R. Goring, J. Rossant, S. Clapoff, M. L. Breitman and L.-C. Tsui, Science 235, 456 (1987).
107. G. Inana, J. Piatigorsky, B. Norman, C. Slingsby and T. Blundell, Nature 302, 310 (1983).
108. R. J. M. Moormann, R. Jongbloed and J. G. G. Schoenmakers, Gene 29, 1 (1984).
109. L. H. Cohen, L. W. Westerhuis, W. W. de Jong and H. Bloemendal, EJB 89, 259 (1978).
110. H. Bloemendal, L. H. Cohen and W. W. de Jong, Interdiscip. Top. Gerontol. 12, 261 (1978).
111. C. R. King and J. Piatigorsky, Cell 32, 707 (1983).
112. L. C. Skow and M. E. Donner, Genetics 110, 723 (1985).
113. Y. Quax-Jeuken, W. Quax, G. van Rens, P. Meera Khan and H. Bloemendal, PNAS 82, 5819 (1985).
114. J. T. Wijnen, M. Oldenburg, H. Bloemendal and P. Meera Khan, Int. Workshop Hum. Gene Mapping, 10th, New Haven, Conn. (1989).
115. R. H. Brakenhoff, A. H. M. Geurts van Kessel, M. Oldenburg, J. T. Wijnen, H. Bloemendal, P. Meera Khan and J. G. Schoenmakers, Hum. Genet. 85, 237 (1990).
116. T. Iwaki, A. Kume-Iwaki and J. E. Goldman, J. Histochem. Cytochem. 38, 31 (1990).
117. K. A. Mahon, A. B. Chepelinsky, J. S. Khillan, P. A. Overbeek, J. Piatigorsky and H. Westphal, Science 235, 1622 (1987).
118. A. B. Chepelinsky, B. Sommer and J. Piatigorsky, MCBiol 7, 1807 (1987).
118a. T. Nakamura, K. A. Mahon, R. Miskin, A. Dey, T. Kuwabara and H. Westphal, New Biol. 1, 193 (1989).
118b. A. B. Chepelinsky, J. S. Khillan, K. A. Mahon, P. A. Overbeek, H. Westphal and J. Piatigorsky, Environ. Health Perspect. 75, 17 (1987).
119. S. P. Bhat, R. E. Jones, M. A. Sullivan and J. Piatigorsky, Nature 284, 234 (1980).
120. R. E. Jones, S. P. Bhat, M. A. Sullivan and J. Piatigorsky, PNAS 77, 5879 (1980).
121. K. Yasuda, H. Kondoh, T. S. Okada, N. Nakajima and Y. Shimura, NARes 10, 2879 (1982).
122. M. Ohno, H. Sakamoto and K. Yasuda, NARes 13, 1593 (1985).
123. H. Kondoh, K. Yasuda and T. S. Okada, Nature 301, 440 (1983).
124. K. Agata, K. Yasuda and T. S. Okada, Dev. Biol. 100, 222 (1983).
125. H. Kondoh, K. Katoh, Y. Takahashi, H. Fujisawa, M. Yokoyama, S. Kimura, M. Katsuki, M. Saito and T. Nomura, Dev. Biol. 120, 177 (1987).
126. Y. Takahashi, T. S. Okada and H. Kondoh, Dev. Growth Differ. 27, 606 (1985).
127. H.-X. Xie, Cell Struct. Funct. 8, 315 (1983).
128. J. Piatigorsky, B. Norman and R. E. Jones, J. Mol. Evol. 25, 308 (1987).
129. S. Hayashi and H. Kondoh, MCBiol 6, 4130 (1986).
130. S. Hayashi, K. Goto, T. S. Okada and H. Kondoh, Genes Dev. 1, 818 (1987).
131. R. E. Jones, D. DeFeo and J. Piatigorsky, JBC 256, 8172 (1981).
132. C. H. Sullivan and R. M. Grainger, PNAS 84, 329 (1987).
132a. A. S. Muel, M. Laurent, E. Chaudun, J. Alterio, R. Clayton, Y. Courtois and M. F. Counis, Mutat. Res. 219, 157 (1989).
132b. J. Alemany, P. Zelenka, J. Serrano and F. de Pablo, JBC 264, 17559 (1989).
133. K. Okazaki, K. Yasuda, H. Kondoh and T. S. Okada, EMBO J. 4, 2589 (1985).
134. H. F. Willard, S. O. Meakin, L.-C. Tsui and M. L. Breitman, Somatic Cell Mol. Genet. 11, 511 (1985).
135. S. O. Meakin, R. P. Du, L.-C. Tsui and M. L. Breitman, MCBiol 7, 2671 (1987).
136. S. O. Meakin, M. L. Breitman and L.-C. Tsui, MCBiol 5, 1408 (1985).
137. H. J. M. Aarts, J. T. den Dunnen, J. Leunissen, N. H. Lubsen and J. G. G. Schoenmakers, J. Mol. Evol. 27, 163 (1988).
138. H. J. M. Aarts, J. T. den Dunnen, N. H. Lubsen and J. G. G. Schoenmakers, Gene 59, 127 (1987).
139. D. Hogg, L.-C. Tsui, M. Gorin and M. L. Breitman, JBC 261, 1240 (1986).
140. S. I. Tomarev, R. D. Zinovieva, P. Chalovka, A. S. Krayev, K. G. Skryabin and G. G. Gause, Gene 27, 301 (1984).
141. C. R. Dawson and I. R. Schwab, Bull. W.H.O. 59, 493 (1981).